Course Syllabus

CMPSC 274: Advanced Topics on Databases

Data Management issues for Data-intensive Computing

FALL 2009: TR 1:00 - 3:00  932 101

Class Website: http://www.cs.ucsb.edu/~agrawal

Course Description

Data management systems and technologies have historically played a pivotal role in computing environments that involve large volumes of data and information. In fact, database management systems (DBMSs) are critical components of any data-intensive application infrastructure. Furthermore, the underlying technologies, in terms of both language and query models and system architectures, have reached a level of maturity that enables their use as plug-and-play components without detailed knowledge of their internals.

Recently, however, the entire area of data management, especially as it pertains to large-scale data arising from Internet and Web-based applications, has reached a crossroads. The main question in this debate is the effectiveness of the old DBMS paradigms: declarative query languages, independence of the logical and physical data models, and the computational framework based on the transaction concept. Several large Internet companies, such as Google, Yahoo, and Amazon, have put forth competing solutions both for building scalable data-intensive applications over the Internet/Web and for large-scale data analytics.

During this quarter, we will begin a joint exploration to gain the deeper understanding needed to participate in this debate. In particular, the following topics will be covered:

The detailed lecture organization for the course appears below.

Prerequisites: CMPSC 170.

Required Textbook: Transactional Information Systems by Gerhard WEIKUM and Gottfried VOSSEN

Instructor: Divy Agrawal, agrawal AT cs.ucsb.edu

Office hours: MW 14:30 - 15:30, 3117 Engineering I, and by appointment.

Teaching Assistant:

Grading:

CMPSC 274 Course Outline (approximate):
Date | Topic | Related Reading | Comments
Th: 9/24/2009 | Data Management Issues in Data-intensive Computing | Lecture #1 Notes | Historical Overview & Motivation
Tu: 9/29/2009 | Data Management for Enterprise Applications | Lecture #2: Overview; Lecture #2: Correctness | Database Correctness
Th: 10/1/2009 | Data Management for Enterprise Applications | Lecture #3: Serializability | Correctness Models for Transaction Execution; Homework #1 Assigned
Tu: 10/6/2009 | Data Management for Enterprise Applications | Lecture #4: Two Phase Locking; Lecture #4: Variants of 2PL | Concurrency Control Protocols
Th: 10/8/2009 | Data Management for Enterprise Applications | Lecture #5: Non-locking Protocols | Concurrency Control Protocols; Homework #1 Due
Tu: 10/13/2009 | Data Management for Enterprise Applications | Lecture #6: Multiversion Data | Multiversion Synchronization; Homework #2 Assigned
Th: 10/15/2009 | Data Management for Enterprise Applications | Lecture #7: Transaction Failures | Transaction Failures and Recoverability
Tu: 10/20/2009 | Data Management for Enterprise Applications | Lecture #8: Crash Failures | Database Recovery from Crash Failures
Th: 10/22/2009 | Data Management for Enterprise Applications | Lecture #9: Recovery Protocols | Database Recovery from Crash Failures; Homework #2 Due
Tu: 10/27/2009 | Data Management for Enterprise Applications | Lecture #10: Distributed Recovery | Data Distribution & Data Replication
Th: 10/29/2009 | Project Discussion | |
Tu: 11/3/2009 | Data Management for Internet Applications | Powerpoint Slides | Yahoo's PNUTS & Amazon's Dynamo
Th: 11/5/2009 | Data Management for Internet Applications | Powerpoint Slides | Google Solution Stack: Chubby & BigTable
Tu: 11/10/2009 | Data Management for Internet Applications | Powerpoint Slides | Google's BigTable & Chubby Lock Service
Th: 11/12/2009 | Data Management for Internet Applications | Powerpoint Slides | Correctness Semantics & Future Outlook
Tu: 11/17/2009 | Large-scale Data Analysis in the Enterprise Context | Data Warehousing | Data Warehousing Fundamentals
Th: 11/19/2009 | Large-scale Data Analysis in the Enterprise Context | OLAP & Data Cube | Online Analytical Processing and the Data Cube Model
Tu: 11/24/2009 | Large-scale Data Analysis in the Internet Context | MapReduce | The MapReduce Paradigm
Th: 11/26/2009 | NO CLASS | Thanksgiving Holiday |
Tu: 12/1/2009 | Macro-trends in Computing Infrastructures | Cloud Computing | Cloud Computing, SaaS, PaaS, and IaaS
Th: 12/3/2009 | Micro-trends in Computing Infrastructures | Transactional Memory | Parallel Computing Paradigms
12/10/2009 | Project Demonstrations | By Appointment |