Clouded Data: Comprehending Scalable Data Management Systems

Report ID

2008-18

Report Authors

Sudipto Das, Shyam Antony, Divyakant Agrawal, Amr El Abbadi

Report Date

2009-11-01

Abstract

Managing petabytes of data for millions of users has been a challenge for large Internet-based enterprises such as Google, Yahoo!, and Amazon. Even though database management systems have a long history of managing enterprise-level data and information, they are deemed unsuitable in this context. This has resulted in an architectural redesign of data management systems with an eye towards the requirements of high scalability, high availability, and low latency, at the cost of weaker consistency and lower application generality. In this paper, we try to comprehend what is fundamentally different about Internet-scale applications that has allowed these data management systems to achieve orders of magnitude higher scalability than traditional databases. With an understanding of these modern systems, we also attempt to predict future application requirements and raise two fundamental questions: where do we really stand in terms of scalable data management, and how far are we from providing scalable data management as a service, just as computing is provided as a service in large-scale infrastructures?

Document

2008-18.pdf (157.9 KB)