|
|

Hardware-based approaches to accelerate CPU intensive operations that
arise in the context of non-standard database applications are proposed. Three
hardware technologies, namely, Micro-Electro-Mechanical Systems (MEMS), off-the-shelf graphics cards(GPU) and content
addressable memories (CAM) that are used in contemporary Network Routers for very fast lookups are identified. Advances in Nanotechnology hold significant promise to overcome many of the physical
limitations of traditional magnetic hard disks and transistor-based memory chips. Given the significant potential MEMS has for large and fast non-volatile storage of data, databases can benefit tremendously from such devices. The graphics functionality designed primarily for display applications can be exploited for polygon intersection
in spatial databases and for distance-based query processing. Content
addressable memories (CAM) can be gainfully deployed for accelerating join operations in databases. In the context of new database applications, the join constraint which is generally based on equality is being expanded
to encompass other conditions such as set-containment. The major goal
of this research is to develop a general framework for
enabling fine-grained hardware acceleration within a commercial DBMS.
Principal Investigators
Graduate Student Researchers
- Nagendar Bandi (Ph.D. 2006)
Currently at Oracle
Dissertation: New Hardware Support for Compute-Intensive Database and Data Stream Operations
Abstract
High performance database systems require both the software and
hardware components of the system to deliver their best possible
performance. While years of research by the database community has
primarily focused on improving performance by developing better
software techniques, relatively little effort has gone into hardware
level innovations. Several important database problems such as
polygon-related spatial database queries, conditional database joins
and one-pass database summarization queries, are still computationally
quite expensive to solve. Although several architecture-conscious
solutions have been proposed, there is a limit to the performance
improvement that can be achieved on current architectures.
In this thesis, we take an alternative approach and propose novel
hardware-based techniques using off-the-shelf graphics and networking
hardware to augment the capabilities of modern database systems. We
first develop the dual-threaded framework, which enables the
integration of hardware-based techniques into a commercial
database. Using modern graphics hardware as a co-processor, we propose
a spatial query operator for answering intersection queries over
polygon datasets. This operator, built using the dual-threaded
framework, complements the existing index structures and optimizations
of a spatial database.
We next focus on developing faster database algorithms using Terenary
Content Addressable Memories (TCAMs), which enable giga-bit rate
forwarding at network routers. We propose the CAM-Cache architecture
which integrates an off-the-shelf TCAM into the memory hierarchy of a
conventional processor through the PCI interface. Using this
architecture, we develop a fast sorting algorithm, which also
functions as an index structure. We then develop a fast conditional
join algorithm which uses the sorting algorithm as a basic component.
Finally, we present fast TCAM-based solutions for solving the heavy
hitters and heavy distinct hitters problems over data streams. We
analyze several popular software solutions for detecting heavy hitters
and heavy distinct hitters, discuss their bottlenecks and propose
several TCAM-conscious solutions which address these issues.
- Ozgur Sahin (Ph.D. 2005)
Currently at Google
Dissertation: Supporting Complex Queries Over Structured P2P
Networks
Abstract
Traditional distributed systems generally use the client-server model, where
data is served by dedicated servers and clients are consumers. This model has
limited scalability since the servers can become performance bottlenecks and
the maintenance costs are high. Peer-to-peer (P2P) systems have emerged as a
scalable model for distributed computing as they allow the sharing of resources
available at the client machines in a decentralized manner. They gained a lot of
popularity with file sharing applications such as Napster and Gnutella. These
applications, however, do not provide efficient search mechanisms and do not
scale well with the increasing number of users. As a result, Distributed Hash
Table (DHT) based structured P2P systems have been proposed. DHTs support
efficient exact key lookups in addition to offering very desirable properties
such as scalability, dynamic node insertion and departure, fault tolerance, and
efficient routing. However their applicability is limited as they only support
exact key lookups. We believe that DHTs can be extended to support more
complex querying operations, and thus provide the infrastructure for a larger
set of applications. In this thesis, we describe how various complex querying
operations can be supported over DHTs. Among the operations we explored
are range queries, content-based event dissemination, and similarity search. We
also describe several applications that can benefit from these operations such
as publish/subscribe systems and distributed Web service discovery.
- Alireza Aghili (Ph.D. 2005)
Currently at Teradata, a division of NCR
corporation
Dissertation: Sequence and Structure Similarity Search in
Biological and XML Databases
Abstract
The unprecedented growth of the Internet and biological databases has introduced
challenging and complex data formats and hence furnishing unique collaborative
venues for scientists of various disciplines. The set of such complex
databases includes, 1) XML (eXtended Markup Language) databases, 2) DNA
and Protein sequence and structure databases, 3) Microarray gene expressions, 4)
Biomedical images, and 5) Sensor data stream and Time series databases. Given a
source query pattern and a target database, the similarity search (range query or
top-k) seeks to identify those records of the database which match the given query.
The problem of similarity search in biological and textual databases has received
substantial attention in the past decade. Numerous filtration and indexing techniques
have been proposed to address the scalability issues and reduce the curse
of dimensionality. However, complex applications demand special customization
based on the inherent and underlying dynamics of the data. In this work, we study
the integration of various transformation and shape summarization techniques on
biological sequence and protein structure data, as well as path encoding in the
tree-structured XML data, for more efficient similarity search query processing.
- Hailing Yu (Ph.D. 2004)
Currently at Oracle
Dissertation: Architecture Conscious Information Management
Systems
Abstract
Due to continuous advances in semiconductor manufacturing, the memory hierarchy has suffered from problems of latency, bandwidth, and cost gaps for decades. The processor-to-memory gap has been partially mitigated by the integration of multi-level caches. However the memory-to-disk gap has been constantly growing. This has resulted in a significant performance bottleneck for data intensive applications.
The goal of this dissertation is to address the I/O bottleneck by exploiting the symbiosis between advanced technologies in hardwares and algorithms capable of maximizing the benefits of the new features provided by these technologies. To this end, we study methods to replace disks with MEMS-based storage devices, which have emerged as the leading candidate for next generation storage devices. Since MEMS-based storage devices have very different characteristics from disks, we exploit their inherent properties when integrating them into existing systems. As a first step, we study the I/O scheduling problem and propose a new scheduling algorithm for MEMS-based storage. Next, we explore MEMS-based storage in the context of relational databases, two-dimensional datasets, and vector sets. We propose the Flexible Retrieval Model (FRM) to place relational data on MEMS-based storage, which enables data retrieval in both row-wise and column-wise manner. We evaluate the performance of FRM using the TPC-H benchmark and show significant I/O improvement. In the context of two-dimensional datasets, we re-evaluate the optimal cost of range queries based on the current disk technology. We establish that it is mpossible to achieve the new optimal cost using disk devices, because the new optimal cost requires two-dimensional sequential access. Therefore we propose to use MEMS-based storage and design a new placement scheme, which almost achieves the new optimal cost.
Considering the I/O bottleneck purely from the perspective of an application, we propose methods for reducing the number of I/O transfers for aggregation ranking queries in OLAP data cubes. These queries aggregate information over a specified range and return the ranked order of the aggregated values. Existing techniques for range aggregate queries are not able to process aggregation ranking queries efficiently due to the large number of I/O transfers. We handle this problem through a new ranking algorithm, which reduces I/O cost dramatically.
- Abhishek Gupta (Ph.D. 2004)
Currently at Google
Dissertation: Attribute-based Data Access over P2P
Systems
Abstract
Peer-to-peer (P2P) systems provide a distributed and scalable alternative to
access services and data over the network which is in contrast to the
traditional
client-server model. The basic idea in these systems is to harness the
collective
storage resources available at the large number of peers, obviating
the need for a
few servers with large storage. In recent years, P2P computing has emerged as a
powerful distributed computing model for large scale systems. P2P systems have
been widely used for large scale file sharing and instant messaging
applications.
The initial P2P designs could either not scale to match their
popularity or caused
a lot of overhead on the networks. More recently Distributed Hash Table based
structured designs for P2P systems have been proposed that exhibit excellent
scalability and routing and lookup performance. These designs are however
limited in the kind of applications they can support because of the simple hash
table interface they provide. DHT systems simply rely on the names of the data
objects to distribute, index and locate them. In this thesis, we present designs
of DHT based P2P systems that extend the interface to support attribute based
data access by extracting semantic information from the queries. We describe
solutions for approximate as well as exact answers to range queries. We apply
the designs to support Select, Project, and Join based SQL queries and also
present an application to Publish/Subscribe systems. We have also developed
a prototype system for the proposed Publish/Subscribe system and evaluated
its performance on PlanetLab machines distributed across the world.
- Chengyu Sun (Ph.D. 2004)
Currently at California State University, Los Angeles
Dissertation: Improving Access Efficiency of Spatial Databases
Abstract
Due to the interactive nature of the applications and the complexity of the
data, spatial databases face some unique challenges compared to more
traditional DBMS. As part of the Alexandria Digital Library Project, our
work addresses some of these challenges with a focus on efficient access of
large spatial datasets. At the collection level, we develop three
approximation algorithms based on the Euler Histogram, and for the first
time enable users to quickly browse a spatial dataset with a variety of
spatial relations. In addition to browsing, we also generalize the Euler
Histogram to provide accurate selectivity estimation for spatial joins. At
the object level, we propose a novel approach to accelerate spatial
operations using the highly optimized rendering and searching capabilities
of modern graphics hardware. Experiments with real world datasets show
that by combining hardware and software methods, the overall computational
cost can be reduced substantially for both spatial selections and joins.
Acknowledgement
This work was supported by NSF award IIS-02201152 ITR: Hardware Acceleration of Database Operations
Selected
Publications:
- Fast Data Stream Algorithms using Associative Memories. Nagender Bandi, Divyakant Agrawal, Amr El Abbadi, Ahmed Metwally. SIGMOD 2007.
abstract
pdf
- Fast Computation of Spatial Selections and Joins Using Graphics Hardware. Nagender Bandi, Chengyu Sun, Divyakant Agrawal, Amr El Abbadi. Information Systems 2007.
- TCAM-Conscious Algorithms for Data Streams. Nagender Bandi, Ahmed Metwally, Divyakant Agrawal, Amr El Abbadi. ICDE 2007.
abstract
pdf
- MEMS-based Storage Architecture for Relational Databases. Hailing Yu, Divyakant Agrawal, Amr El Abbadi. VLDB Journal 16(2):251-268,2007.
abstract
pdf
- Fast Computation of Database Operations Using Content-Addressable Memories. Nagender Bandi, Divyakant Agrawal, Amr El Abbadi. DEXA 2006.
abstract
- Exploiting Sequential Access when Declustering Data over Disks and MEMS-based Storage. Hailing Yu, Divyakant Agrawal, Amr El Abbadi. Distributed and Parallel Databases, 19(2-3), 2006.
abstract
- New Hardware Support for Database Operations. Nagender Bandi, Chengyu Sun, Divyakant Agrawal, Amr El Abbadi. IEEE Data Engineering Bulletin, vol.28, 2003.
ps
- Hardware Acceleration in Commercial Databases: A Case Study of Spatial Operations. Nagender Bandi, Chengyu Sun, Divyakant Agrawal, Amr El Abbadi. VLDB 2004.
pdf
- Declustering two-dimensional Datasets over MEMS-based Storage Devices Hailing Yu, Divyakant Agrawal, Amr El Abbadi. EDBT 2004.
abstract
pdf
- Tabular Placement of Relational Data on MEMS-based Storage Devices. Hailing Yu, Divyakant Agrawal, Amr El Abbadi. VLDB 2003.
pdf
- Hardware Acceleration for Spatial Selections and Joins. Chengyu Sun, Divyakant Agrawal, Amr El Abbadi. SIGMOD 2003.
abstract
pdf
- Selectivity Estimation for Spatial Joins with Geometric
Selections. Chengyu Sun, Divyakant Agrawal, Amr El
Abbadi. EDBT 2002: 609-626.
abstract
pdf
- Exploring Spatial Datasets with Histograms.
Chengyu Sun, Divyakant Agrawal, Amr El Abbadi. ICDE 2002(Best
Paper Award).
abstract
- Hardware Acceleration for Spatial Selection and
Join. Chengyu Sun and Divyakant Agrawal and Amr El
Abbadi, UCSB Technical Report (2002-17).
abstract
ps
- Towards Optimal I/O Scheduling for MEMS-based
Storage. Hailing Yu, Divyakant Agrawal, and Amr El
Abbadi. Twentieth IEEE/Eleventh NASA Goddard Conference on Mass
Storage Systems & Technologies (MSST 03), 2003.
abstract
pdf

Copyright(c) 2007
DSL. All rights reserved. dsl@cs.ucsb.edu
|