MIST: Distributed Indexing and Querying in Sensor Networks using Statistical Models

Report ID: 
Arnab Bhattacharya, Anand Meka and Ambuj K. Singh
2007-04-01 05:00:00


The modeling of high level semantic events from low level sensor signals assists in the understanding of distributed phenomena. For such content-modeling purposes, transformation of numeric data into symbols and the modeling of resulting symbolic sequences can be achieved using statistical modelsMarkov Chains (MCs) and Hidden Markov Models (HMMs). We consider the problem of distributed indexing and semantic querying over such sensor models. Specifically, we are interested in efficiently answering (i) range queries: return all sensors that have observed an unusual sequence of symbols with a high likelihood, (ii) top-1 queries: return the sensor that has the maximum probability of observation of a given sequence, and (iii) 1-nn queries: return the sensor (model) which is most similar to a query model. All the above queries can be answered at the centralized base station, if each sensor transmits its model to the base station. However, this is communication-intensive. We present a much more efficient alternativea distributed index structure, MIST, and accompanying algorithms for answering the above queries. MIST aggregates two or more constituent models into a single composite model, and constructs an in-network hierarchy over such composite models. We develop two kinds of composite models: the first kind captures the average behavior of the underlying models and the second kind captures the extreme behavior of the underlying models. Using the index parameters maintained at the root of a subtree, we bound the probability of observation of a query sequence from a sensor in the subtree. We also bound the distance of a query model to a sensor model using those parameters. Extensive experimental evaluation on both real-world and synthetic data sets show that the MIST schemes scale well in terms of network size and number of model states. We also show its superior performance over the centralized schemes in terms of update, query, and total communication costs up to a factor of five.


PDF icon 2007-02.pdf