Set-Serializability: A Formal Theory for Partitioned Data

Report ID: 
1993-06
Authors: 
Gustavo Alonso and Amr El Abbadi
Date: 
1993-01-01 04:00:00

Abstract

High transaction throughput in a distributed database system can be achieved byincreasing the autonomy of each site. This can be achieved by making more dataavailable locally. One way to ensure this is to partition data items inseveral subsets and allocate one subset to each site. As long as a transactioncan execute locally using the data items allocated to a single site, it willnot need to access data at other sites. An example of partitioned data is aticket reservation system in which instead of locating all the tickets for anevent in one central site, they are divided into several lots and each lot isallocated to a site. In that way, sites can proceed locally withoutcommunication with the central site or with other sites, speeding up theprocessing of transactions. To guarantee correct executions, constraints mustbe imposed on the operations that can be executed in parallel over thepartitioned data items. These constraints can be represented as distributedassertions and a mechanism must be provided to maintain them across thedistributed system. The fact that the data has been partitioned may lead tosituations in which an operation can not be performed over a partition of thedata but it could be done if the data was not partitioned. As a consequence,Data Partitioning may reduce the overall availability of data, which is clearlyundesirable. To solve this problem, a protocol has to be developed todynamically redefine a partition of data in such a way that the data isavailable at the site where it is needed.In this paper we introduce a general theory for distributed assertions and aprotocol for maintaining the validity of these assertions over a distributeddatabase. The operations of the protocol are executed as an atomictransaction, which allows an analysis of the problem using an extension ofserializability theory. Set-serializability is defined as the correctnesscriteria for these type of databases. It has the advantage of allowingstandard serializable histories and histories that, due to the execution of theprotocol for partitioned data, contain cyclic dependencies among transactions.Such transactions must cooperate to maintain the correctness of the databaseand the validity f the distributed assertions. Thus, set-serializabilityguarantees the correctness of the database both from the serializability pointof view and from the assertions point of view.Our approach is novel and makes several significant contributions. Our theorymodels both partitioned and non-partitioned data objects and we proposesolutions that integrate both concurrency control and recovery. In particular,we propose the notion of transactions sets and set history, that allow a set ofindependent transactions to cooperate to maintain the database consistency andto terminate conditionally to each other. Thus, the set of executions allowedby set-serializability is a superset of of that allowed by serializability.Previous approaches considered a restricted set of constraints, ignored theissue of recovery and did not address the problem of integrating partitioneddata with the standard non-partitioned data. Our approach provides an uniformand comprehensive framework for partitioned data in distributed databasesystems.

Document

File 1993-06.ps