ElasTraS: Elastic Transaction Management in the Cloud
ElasTraS targets the design space of scalable, elastic, fault-tolerant, self-managing, transactional relational databases for the cloud. It scales out across a cluster of commodity machines and supports both classes of database needs in the cloud: (i) large databases partitioned across a set of nodes, and (ii) a large number of small, independent databases, as is common in multi-tenant settings. ElasTraS borrows from the design philosophy of scalable Key-Value stores to minimize distributed synchronization and remove scalability bottlenecks, while leveraging decades of research on transaction processing, concurrency control, and recovery to support rich functionality and transactional guarantees.
Cloud computing has emerged as a pervasive paradigm for hosting Internet-scale applications in large computing infrastructures. Major enabling features of the cloud include elasticity of resources, pay-per-use pricing, short time to market, and the perception of unlimited resources and infinite scalability. As a result, there has been a widespread migration of IT services from enterprise computing infrastructures (i.e., networked clusters of expensive servers) to cloud infrastructures (i.e., large data centers with hundreds of thousands of commodity servers). Since one of the primary uses of the cloud is hosting a wide range of web applications, the scalable data management systems that drive these applications are a crucial technology component in the cloud.
The ElasTraS project aims to fill the gap in data management for applications in the cloud. ElasTraS is designed to support a relational data model and foreign key relationships among the tables in a database. Given a set of partitions of the same or different databases, ElasTraS can:
- handle workload changes by growing or shrinking the cluster,
- load balance the partitions,
- recover from node failures,
- if configured for dynamic partitioning, split hot or very large partitions and merge consecutive small partitions, using a partitioning scheme specified in advance, and
- provide transactional access to the database partitions.
All of this happens without any human intervention, which considerably reduces the administrative overhead attributed to partitioned databases while scaling to large amounts of data and large numbers of concurrent transactions.
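The dynamic partitioning behaviour in the list above can be illustrated with a minimal sketch. It is not the ElasTraS implementation: it assumes range partitions over integer keys kept in sorted order, and the `Partition` fields, the `rebalance` function, the three thresholds, and the midpoint split are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    lo: int    # inclusive lower bound of the key range
    hi: int    # exclusive upper bound of the key range
    size: int  # hypothetical size measure, e.g. number of rows
    load: int  # hypothetical load measure, e.g. transactions per second


def rebalance(parts, max_size, max_load, merge_size):
    """One pass over partitions sorted by key range.

    Splits partitions that are too big or too hot, then merges
    consecutive partitions whose combined size and load stay small.
    Assumes merge_size <= max_size so a merge never undoes a split.
    """
    split = []
    for p in parts:
        too_big_or_hot = p.size > max_size or p.load > max_load
        if too_big_or_hot and p.hi - p.lo > 1:
            # Assumption: split at the midpoint and divide size and load evenly.
            mid = (p.lo + p.hi) // 2
            split.append(Partition(p.lo, mid, p.size // 2, p.load // 2))
            split.append(Partition(mid, p.hi,
                                   p.size - p.size // 2,
                                   p.load - p.load // 2))
        else:
            split.append(p)

    merged = []
    for p in split:
        prev = merged[-1] if merged else None
        if (prev is not None
                and prev.hi == p.lo  # only consecutive key ranges may merge
                and prev.size + p.size <= merge_size
                and prev.load + p.load <= max_load):
            merged[-1] = Partition(prev.lo, p.hi,
                                   prev.size + p.size,
                                   prev.load + p.load)
        else:
            merged.append(p)
    return merged
```

Calling `rebalance(parts, max_size=500, max_load=50, merge_size=100)` on a sorted list of partitions returns a new list that covers the same key ranges. A real system would also have to move data and keep transactions consistent during the change, which this sketch leaves out entirely.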