Abstract

Recent technological advances in hardware and software have facilitated explosive growth in the production of digital information. Cloud systems offer tremendous scale, resource availability, and ease of use, with which we can process this data in the pursuit of scientific, financial, social, and technological advances. However, the many available systems differ in numerous ways, including public versus private cloud support, data management interfaces, programming languages, supported feature sets, fault tolerance, consistency guarantees, and configuration and deployment processes.
In this paper, we focus on technologies for structured data access (database/datastore systems) in cloud systems. Our goal is to simplify the use of these systems through automation and to facilitate their empirical evaluation using real-world applications. To enable this, we provide a cloud platform abstraction layer that decouples a data access API from its implementation. Applications that use this API can use any datastore that “plugs into” our abstraction layer, which makes them portable across datastores. We use this layer to extend the functionality of multiple datastores without modifying the datastores directly. Specifically, we provide support for ACID transaction semantics for popular key-value stores, none of which provides this feature. We integrate this layer into AppScale, an open-source cloud platform that executes cloud applications written in Python, Java, and Go over virtualized cluster resources and infrastructure-as-a-service systems (Eucalyptus and Amazon EC2). We use this system to investigate the overhead of providing this application portability layer for disparate datastores and the impact of extending them via the layer with distributed transaction support.
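
To make the layering idea concrete, the following is a minimal Python sketch of a datastore-agnostic interface with a transaction wrapper built on top of it. It is an illustration under stated assumptions, not the AppScale implementation: the names `Datastore`, `InMemoryStore`, `TransactionLayer`, and `transfer` are hypothetical, the in-memory store stands in for a real key-value store, and a single process-local lock stands in for a distributed transaction protocol. The sketch therefore shows only all-or-nothing, isolated updates within one process; it does not provide durability or crash safety.

```python
import threading
from abc import ABC, abstractmethod

_DELETED = object()  # sentinel marking a key deleted inside a transaction


class Datastore(ABC):
    """Hypothetical plug-in interface; any key-value store can implement it."""

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def delete(self, key): ...


class InMemoryStore(Datastore):
    """Stand-in for a real key-value store (assumption for this sketch)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class _Txn:
    """Buffers writes so that nothing reaches the store until commit."""

    def __init__(self, store):
        self._store = store
        self._writes = {}

    def get(self, key):
        # Read our own buffered writes first, then fall back to the store.
        if key in self._writes:
            value = self._writes[key]
            return None if value is _DELETED else value
        return self._store.get(key)

    def put(self, key, value):
        self._writes[key] = value

    def delete(self, key):
        self._writes[key] = _DELETED

    def commit(self):
        for key, value in self._writes.items():
            if value is _DELETED:
                self._store.delete(key)
            else:
                self._store.put(key, value)


class TransactionLayer:
    """Adds all-or-nothing updates on top of any Datastore, unmodified."""

    def __init__(self, store):
        self._store = store
        self._lock = threading.Lock()  # simplification: one global lock

    def run(self, fn):
        with self._lock:
            txn = _Txn(self._store)
            result = fn(txn)  # an exception here discards buffered writes
            txn.commit()
            return result


def transfer(txn, src, dst, amount):
    """Example application logic written only against the layer's API."""
    txn.put(dst, (txn.get(dst) or 0) + amount)
    balance = txn.get(src) or 0
    if balance < amount:
        raise ValueError("insufficient funds")
    txn.put(src, balance - amount)
```

Because `transfer` depends only on the `get`/`put`/`delete` interface, swapping `InMemoryStore` for another `Datastore` implementation would not require changing the application code, which is the portability property the abstraction layer is meant to provide.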