Decentralized Object Location and Routing: A New Networking Paradigm

by

Ben Yanbin Zhao

B.S. (Yale University) 1997
M.S. (University of California at Berkeley) 2000


A dissertation submitted in partial satisfaction of the
requirements for the degree of
Doctor of Philosophy

in

Computer Science

in the

GRADUATE DIVISION
of the
UNIVERSITY OF CALIFORNIA, BERKELEY

Committee in charge:
Professor John D. Kubiatowicz, Co-Chair
Professor Anthony D. Joseph, Co-Chair
Professor Ion Stoica
Professor John Chuang

Fall 2004



Abstract


The growth of the Internet has led to technological innovations in a variety of fields. Today, the Internet provides a wide variety of valuable services to end host clients via well-known DNS hosts. These hosts serve up content ranging from maps and directions to online shopping to database and application servers. Along with the growth of the Internet, network applications are also growing in client population and network coverage. This is exemplified by new applications that support requests from hundreds of thousands of users and scale across the entire Internet.

Our work seeks to facilitate the design, implementation and deployment of these applications by building a communication and data management infrastructure for global-scale applications. The applications we target share several defining characteristics. First, they support large user populations, potentially in the millions. Second, their components span large portions of the global Internet. Finally, they expect to support requests from users with wide-ranging resources, from wireless devices to well-connected servers.

We identify the key application requirement as scalable location-independent routing. For any large-scale network application, both communication and data management distill down to the problem of communicating with a desired endpoint via a location-independent name. For communication with a node, the endpoint is the node's location-independent name. To locate a data object, the endpoint is the name of a node holding a current and nearby replica of the object. Additionally, a node can use the latter scenario to announce its membership in a group, allowing others to rendezvous with it using the group name.
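To make the three uses of a location-independent name concrete, the following is a minimal sketch in Python. It is a toy, centralized, in-memory illustration only: the class ToyDirectory, its methods publish, unpublish and locate, and the caller-supplied distance table are all hypothetical names introduced here, not the interface or mechanism of the system described in this thesis. A decentralized overlay would spread this state across many nodes; the sketch makes no attempt to show how.

```python
# Toy illustration of location-independent naming.
# Assumption: every name below is hypothetical and exists only for this sketch.
from dataclasses import dataclass, field


@dataclass
class ToyDirectory:
    # object or group name -> set of node names registered under it
    locations: dict = field(default_factory=dict)
    # (node, node) -> assumed network distance, supplied by the caller
    distance: dict = field(default_factory=dict)

    def publish(self, name, node):
        """Announce that `node` holds a replica of, or belongs to, `name`."""
        self.locations.setdefault(name, set()).add(node)

    def unpublish(self, name, node):
        """Withdraw a previous announcement; a no-op if none exists."""
        self.locations.get(name, set()).discard(node)

    def locate(self, name, client):
        """Return the registered node closest to `client`, or None."""
        candidates = self.locations.get(name)
        if not candidates:
            return None
        # Sorting first makes ties break deterministically by node name.
        return min(sorted(candidates), key=lambda n: self._dist(client, n))

    def _dist(self, a, b):
        if a == b:
            return 0
        # Unknown pairs are treated as infinitely far apart.
        return self.distance.get((a, b), self.distance.get((b, a), float("inf")))


directory = ToyDirectory(distance={("client", "nodeA"): 5, ("client", "nodeB"): 2})

# Data location: two replicas of one object; the client reaches the nearer one.
directory.publish("report", "nodeA")
directory.publish("report", "nodeB")
nearest_replica = directory.locate("report", "client")

# Group rendezvous: a node announces membership under the group name.
directory.publish("chat-group", "nodeA")
group_contact = directory.locate("chat-group", "client")
```

In this sketch the caller never names a host directly: it asks for "report" or "chat-group" and receives whichever registered node is closest under the assumed distances.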

The key goals for our infrastructure are scalability, efficiency, and reliability.

Our approach is to build this scalable, efficient, reliable communication and data location infrastructure in the form of a structured peer-to-peer overlay network called Tapestry. Tapestry is one of the original structured peer-to-peer overlay systems. In its design and implementation, we provided one of the first application platforms for large-scale Internet applications, removing the existing scale limitations of unstructured networks. In the process, we also gained a better understanding of the interfaces these overlays provide to the applications, and the implications these interfaces had on application performance. Finally, we developed a number of techniques to enhance the efficiency and resiliency of Tapestry on top of the dynamic and failure-prone wide-area Internet.

In this thesis, we present details on the motivation, mechanisms, architecture and evaluation of Tapestry. We highlight the key differences between Tapestry and its contemporary counterparts in interface design, efficiency and resiliency mechanisms. We evaluate Tapestry using a variety of platforms, including simulations, microbenchmarks, cluster emulation, and wide-area deployment, and find that Tapestry provides a flexible, efficient and resilient infrastructure for building wide-area network applications.