MD-HBase: A Scalable Multi-dimensional Data Infrastructure for Location Aware Services

TitleMD-HBase: A Scalable Multi-dimensional Data Infrastructure for Location Aware Services
Publication TypeConference Paper
Year of Publication2011
AuthorsNishimura, S, Das S, Agrawal D, El Abbadi A
Conference NameMobile Data Management (MDM)
Date Published06/2011
PublisherIEEE Computer Society
Abstract

The ubiquity of location enabled devices has resulted
in a wide proliferation of location based applications and services.
To handle the growing scale, database management systems
driving such location based services (LBS) must cope with high
insert rates for location updates of millions of devices, while supporting
efficient real-time analysis on latest location. Traditional
DBMSs, equipped with multi-dimensional index structures, can
efficiently handle spatio-temporal data. However, popular opensource
relational database systems are overwhelmed by the high
insertion rates, real-time querying requirements, and terabytes
of data that these systems must handle. On the other hand,
Key-value stores can effectively support large scale operation,
but do not natively support multi-attribute accesses needed to
support the rich querying functionality essential for the LBSs.
We present MD-HBase, a scalable data management system for
LBSs that bridges this gap between scale and functionality. Our
approach leverages a multi-dimensional index structure layered
over a Key-value store. The underlying Key-value store allows
the system to sustain high insert throughput and large data
volumes, while ensuring fault-tolerance, and high availability. On
the other hand, the index layer allows efficient multi-dimensional
query processing. We present the design of MD-HBase that
builds two standard index structures—the K-d tree and the Quad
tree—over a range partitioned Key-value store. Our prototype
implementation using HBase, a standard open-source Key-value
store, can handle hundreds of thousands of inserts per second
using a modest 16 node cluster, while efficiently processing multidimensional
range queries and nearest neighbor queries in realtime
with response times as low as hundreds of milliseconds.

Refereed DesignationRefereed