Transactional Data Stores in the Cloud
Cloud computing has emerged as a powerful paradigm for hosting Internet-scale applications in large computing infrastructures. Major enabling features of the cloud include pay-per-use pricing with no up-front investment, the perception of unlimited resources and infinite scalability, and elasticity of resources. There is a widespread migration of IT services from enterprise-scale computing infrastructures (i.e., networked clusters of servers) to cloud computing infrastructures (i.e., large data centers with thousands to tens of thousands of machines). Since one of the primary uses of the cloud is hosting a wide range of applications, large-scale data management systems are a crucial technology component of the cloud infrastructure. Relational databases have been extremely successful in the enterprise setting over the last two decades. But data management infrastructures in the cloud require systems to be scalable, elastic, fault-tolerant, self-managing, and self-healing, features which traditional databases lack. As a result, key-value stores such as Bigtable, PNUTS, Dynamo, and their open-source counterparts are considered the preferred data stores in the cloud. Although key-value stores have many of the desired features, they provide minimal consistency guarantees and significantly reduced functionality, because their access guarantees extend only to single keys. Consequently, there is a wide gap between the available options when choosing a data store for the cloud: relational databases offer rich functionality and strong consistency but lack the required scalability and elasticity, while key-value stores scale but offer little beyond single-key access. We propose the design of two scalable data stores for cloud computing infrastructures. ElasTraS is an elastic transactional data store which is scalable and fault-tolerant, and is targeted towards enterprise applications requiring a relational data model. G-Store is a key-value store which provides scalable and consistent multi-key accesses over dynamic groups of keys, and is targeted towards applications which favor the data model of key-value stores but require transactional access to larger granules of data beyond single keys.
The success of this project would have an impact on both research and practice: it will bring forth research solutions for designing and implementing scalable data management systems, and act as a building block for developing commodity solutions that deal with the growing scale of the Internet.
This is an umbrella project which outlines our endeavor to build consistent, scalable, elastic, and autonomous database systems for the cloud. The following are the two major sub-projects under the umbrella project.