Report ID
2002-29
Report Authors
Ahmet Bulut and Ambuj K. Singh
Report Date
Abstract
The problem of statistics and aggregate maintenance overdata streams has gained popularity in recent years especially intelecommunications network monitoring, trend-related analysis,web-click streams, stock tickers, and other time-variant data.The amount of data generated in such applications can become toolarge to store, or if stored too large to scan multiple times.We consider queries over data streams that are biased towardsthe more recent values.We develop a technique that summarizes a dynamic streamincrementally at multiple resolutions. This approximationcan be used to answer point queries, range queries, and inner product queries.Moreover, the precision of answers can be changed adaptively by a client.Later, we extend the above technique to work in a distributedsetting, specifically in a large network where a main sitesummarizes the stream and clients ask queries. We minimize themessage overhead by deciding what and where to replicate by usingan adaptive replication scheme. We maintain a hierarchy ofapproximations that change adaptively based on the query andupdate rates. We show experimentally that our technique performsbetter than existing techniques: up to $50$ times better in termsof approximation quality, up to four orders of magnitude timesbetter in response time, and up to five times better in terms ofmessage complexity.
Document
2002-29.ps798.53 KB