Networked data-centric sensor applications have become popular in recent years. Sensors sample their surrounding physical environment and produce data that is then processed, aggregated, filtered, and queried by the application. Sensors are often untethered, necessitating efficient use of their energy resources to maximize application lifetime. Consequently, energy-efficient data management is a key problem in sensor applications.
Data management approaches in sensor networks have centered around two competing philosophies. Early efforts such as Directed Diffusion [11] and Cougar [23] espoused the notion of the sensor network as a database. This framework assumes that intelligence is placed at the sensors and that queries are pushed deep into the network, possibly all the way to the remote sensors. Direct querying of remote sensors is energy efficient, since query processing is handled at (or close to) the data source, thereby reducing communication needs. However, the approach assumes that remote sensors have sufficient processing resources to handle query processing, an assumption that may not hold in untethered networks of inexpensive sensors (e.g., Berkeley Motes [19]).
In contrast, efforts such as TinyDB [13] and acquisitional query processing [3] from the database community have adopted an alternate approach. These efforts assume that intelligence is placed at the edge of the network, while keeping the sensors within the core of the network simple. In this approach, data is pulled from remote sensors by edge elements such as base stations, which are assumed to be less resource- and energy-constrained than remote sensors. Sensors within the network are assumed to be capable of performing simple processing tasks such as in-network aggregation and filtering, while complex query processing is left to base stations (also referred to as micro-servers or sensor proxies). In acquisitional query processing [3], for instance, the base station uses a spatio-temporal model of the data to determine when to pull new values from individual sensors; data is refreshed from remote sensors whenever the confidence intervals on the model predictions exceed query error tolerances.
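The model-driven pull policy described above can be illustrated with a minimal sketch. It is not the model used in [3]; it assumes, purely for illustration, that the sensed value behaves like a random walk, so the uncertainty of the base station's prediction grows with the square root of the time elapsed since the last pull. The names `should_pull`, `pull_schedule`, `step_std`, and `tolerance` are hypothetical.

```python
import math


def should_pull(elapsed_steps, step_std, tolerance, z=1.96):
    """Return True when the confidence half-width exceeds the query tolerance.

    Assumption (illustrative only): random-walk data, so the prediction's
    standard deviation after k steps without a pull is step_std * sqrt(k).
    z = 1.96 corresponds to a 95% confidence interval.
    """
    half_width = z * step_std * math.sqrt(elapsed_steps)
    return half_width > tolerance


def pull_schedule(num_steps, step_std, tolerance):
    """Return the time steps at which the base station pulls a fresh value."""
    pulls = []
    last_pull = 0  # assume the sensor value is known exactly at time 0
    for t in range(1, num_steps + 1):
        if should_pull(t - last_pull, step_std, tolerance):
            pulls.append(t)
            last_pull = t  # a pull resets the model's uncertainty
    return pulls


if __name__ == "__main__":
    # A looser tolerance lets the base station wait longer between pulls.
    print(pull_schedule(20, step_std=0.5, tolerance=2.0))
    print(pull_schedule(20, step_std=0.5, tolerance=1.0))
```

Under these assumptions, tightening the query error tolerance shortens the interval between pulls, which is the source of the energy cost at the remote sensors.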
While both of these philosophies inform our present work, existing approaches have several drawbacks:
Need to capture unusual data trends: Sensor applications need to be alerted when unusual trends are observed in the sensor field; for instance, a sudden increase in temperature may indicate a fire or a breakdown in air-conditioning equipment. Although such trends are rare, it is imperative for applications, particularly those used for monitoring, to detect them with low latency. Both TinyDB [13] and acquisitional query processing [3] rely on a pull-based approach to acquire data from the sensor field. A pure pull-based approach can never guarantee that every unusual pattern will be detected, since an anomaly may be confined to the interval between two successive pulls. Further, increasing the pull frequency to raise the probability of detecting anomalies has the harmful side effect of increasing energy consumption at the untethered sensors.
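The limitation of pure pull-based acquisition can be made concrete with a small sketch. It assumes, for illustration only, that the base station pulls at fixed times 0, P, 2P, ... and that an anomaly is visible only during a closed interval of integer time units; the function names are hypothetical and do not correspond to any system cited here.

```python
def detected_by_pull(anomaly_start, anomaly_end, pull_period):
    """Return True if a periodic pull lands inside the anomaly interval.

    Pulls are assumed to occur at times 0, pull_period, 2 * pull_period, ...
    All arguments are non-negative integers in the same time unit.
    """
    # Ceiling division gives the index of the first pull at or after the start.
    first_pull = -(-anomaly_start // pull_period) * pull_period
    return first_pull <= anomaly_end


def pulls_needed(duration, pull_period):
    """Number of pulls issued over [0, duration]; a rough proxy for energy cost."""
    return duration // pull_period + 1


if __name__ == "__main__":
    # An anomaly lasting from t=12 to t=17 falls between pulls at t=10 and t=20.
    print(detected_by_pull(12, 17, pull_period=10))
    # Halving the period catches it (pull at t=15) but issues more pulls.
    print(detected_by_pull(12, 17, pull_period=5))
    print(pulls_needed(60, 10), pulls_needed(60, 5))
```

In this simplified setting, any anomaly shorter than the pull period can fall entirely between two pulls, and shortening the period to catch it increases the number of pulls in proportion.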
Support for archival queries: Many existing efforts focus on querying and processing current (live) sensor data, since this is the data of most interest to the application. However, support for querying historical data is also important in many applications such as surveillance, where the ability to retroactively "go back" is necessary, for instance, to determine how an intruder broke into a building. Similarly, archival sensor data is often useful for conducting postmortems of unusual events so that they can be better understood in the future. Architectures and algorithms for efficiently querying archival sensor data have not received much attention in the literature.
Adaptive system design: Long-lived sensor applications need to adapt to data and query dynamics while meeting user performance requirements. As data trends evolve over time, the system needs to adapt accordingly to minimize sensor communication overhead. Similarly, as the workload (query characteristics and error tolerances) changes over time, the system needs to adapt by updating the parameters of the models used for data acquisition. Such adaptation is key to enhancing the longevity of the sensor application.