Introduction
The ocean contains many tons of gold, but the gold atoms are too diffuse
to extract usefully. Idle cycles on the Internet, like gold atoms in the
ocean, seem too diffuse to extract usefully. If we could effectively
harness the vast quantities of idle cycles, we could greatly accelerate our
acquisition
of scientific knowledge, successfully undertake grand challenge
computations,
and reap the rewards in physics, chemistry, bioinformatics, and
medicine,
among other fields of knowledge. An opportunity is suggested by the
following
trends, taken as a whole:
- The number of networked computing devices is increasing
- Computation is getting faster and cheaper
- The number of unused cycles per second is growing rapidly
- Bandwidth is increasing and getting cheaper
- Communication latency is not decreasing
- Humans are getting neither faster nor
cheaper.
These trends and other technological advances lead to opportunities
whose
surface we have barely scratched. It now is technically feasible to
undertake
"Internet computations" that are technically infeasible
for
a network of supercomputers in the same time frame. The maximum
feasible
problem size for "Internet computations" is growing more rapidly than
that
for supercomputer networks. The SETI@home project reveals an emerging
global computational organism, bringing "life" to Sun Microsystems' phrase
"The network is the computer". The underlying concept holds the promise
of a huge computational capacity, in which users pay only for the
computational
capacity actually used, increasing the utilization of existing
computers.
Project Goals
In the Jicos project, we are designing an open, extensible computation
exchange that can be instantiated privately, within a single
organization
(e.g., a university, distributed set of researchers, or corporation),
or
publicly as part of a market in computation, including charitable
computations
(e.g., AIDS or cancer research, SETI). Application-specific computation
services constitute one kind of extension, in which computational
consumers
directly contact specialized computational producers, which provide
computational
support for particular applications. The system must enable application
programmers to design, implement, and deploy large computations, using
computers on the Internet. It must reduce human administrative costs,
such
as costs associated with:
- downloading and executing a program on heterogeneous sets
of machines
and
operating systems
- distributing software component upgrades.
It should reduce application design costs by:
- giving the application programmer a simple but general
programming
abstraction
- freeing the application programmer from concerns of
interprocessor
communication
and fault tolerance.
System performance must scale both up and down, despite communication
latency,
to a set of computation producers whose size varies widely even within
the execution of a single computation. It must serve several clients
concurrently,
associating different clients with different priorities. It should
support
computations of widely varying lifetimes, from a few minutes to several
months. Hosts must be secure from the code they execute. Discriminating
among clients must be supported, both for security and privacy, and for
prioritizing the allocation of resources, such as compute hosts. After
initial installation of system software, no human intervention should be
required to upgrade those components. The computational model must enable
general task
decomposition
and composition with a restrictive shared state that is appropriate to
the medium. The API must be simple but general. Communication and fault
tolerance must be transparent to the user. Hosts' interests must be
aligned with their clients' interests: computations are completed according
to how highly they are valued.
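To make the idea of task decomposition and composition concrete, here is a minimal sketch in Java. It is not the Jicos API: the names `Task`, `SumTask`, and `LocalRunner`, and the threshold of 4, are assumptions introduced only for illustration, and the runner executes everything in a single JVM rather than on remote hosts.

```java
import java.util.ArrayList;
import java.util.List;

// All names here are hypothetical; this is not the Jicos API.
interface Task<T> {
    boolean isAtomic();              // small enough to solve directly?
    T solve();                       // called only when isAtomic() is true
    List<Task<T>> decompose();       // called only when isAtomic() is false
    T compose(List<T> results);      // combine the subtasks' results
}

// Example task: sum the slice data[lo, hi) by recursive halving.
final class SumTask implements Task<Long> {
    private static final int THRESHOLD = 4;   // assumed atomic-task size
    private final int[] data;                 // shared input, never modified
    private final int lo, hi;

    SumTask(int[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    public boolean isAtomic() { return hi - lo <= THRESHOLD; }

    public Long solve() {
        long sum = 0;
        for (int i = lo; i < hi; i++) sum += data[i];
        return sum;
    }

    public List<Task<Long>> decompose() {
        int mid = lo + (hi - lo) / 2;
        List<Task<Long>> parts = new ArrayList<>();
        parts.add(new SumTask(data, lo, mid));
        parts.add(new SumTask(data, mid, hi));
        return parts;
    }

    public Long compose(List<Long> results) {
        long total = 0;
        for (long r : results) total += r;
        return total;
    }
}

// Stand-in for a distributed runtime: walks the task tree in one JVM.
final class LocalRunner {
    static <T> T run(Task<T> task) {
        if (task.isAtomic()) return task.solve();
        List<T> results = new ArrayList<>();
        for (Task<T> sub : task.decompose()) results.add(run(sub));
        return task.compose(results);
    }
}
```

In this sketch the subtasks communicate only through their return values and a read-only input array, which is one plausible reading of a "restrictive shared state"; a real system would ship the subtasks to different hosts instead of recursing locally.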
Some Fundamental Issues
It is a challenge to achieve the goals of this system with respect to
performance,
correctness, ease of use, incentive to participate, security, and
privacy.
Although this introduction does not focus on security and privacy, the Java
security model {Gong} and the "Davis" release of Jini, which addresses
network security {Scheifler} (covering authentication, confidentiality, and
integrity), clearly are intended to support such concerns. Our choice of
the Java programming system and Jini implicitly reflects these benefits.
In this introduction, we present the HostingServiceProvider (HSP)
subsystem of Jicos, focusing on its design with respect to application
programming complexity, administrative complexity, and performance.
Application
programming complexity is managed by presenting the programmer with a
simple,
compact, general API, briefly presented in the API
section. Administrative complexity is managed by using the Java
programming
system: Its virtual machine provides a homogeneous platform on top of
otherwise
heterogeneous sets of machines and operating systems. We use a small
set
of interrelated RMI (soon to be Jini) clients and services to further
simplify
the administration of system components, such as the distribution of
software
component upgrades. The HSP is a service that interfaces with every
other
Jicos client and service. In this introduction, however, we focus on the
Task Server and the Host. Performance issues can be decomposed into
several
sub-issues.
- Heterogeneity of machines/OS
- The goal is to overcome the administrative complexity
associated with
multiple
hardware platforms and operating systems, incurring an acceptable loss
of execution performance. The tradeoff is between the efficiency of
native
machine code vs. the universality of virtual machine code. For the
targeted applications (which exclude, e.g., real-time applications), Java
JIT compilers reduce the advantage of native machine code: Java wins by
reducing application programming complexity and administrative complexity,
whose costs are not declining as fast as execution times.
- Communication latency
- There is little reason to believe that technological
advances will
significantly
decrease communication latency. Hiding latency, to the extent that it
is
possible, thus is central to our design.
- Scalability
- The architecture must scale to a higher degree than
existing
multiprocessor
architectures, such as workstation clusters. Login privileges must not
be required for the consumer to use a machine; such an administrative
requirement
limits scalability.
- Robustness
- An architecture that scales to thousands of computational
producers
must
tolerate faults, particularly when participating machines, in addition
to failing, can disengage from an ongoing computation.
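The robustness requirement above can be illustrated with a small sketch of one possible bookkeeping scheme: the server remembers which tasks each host currently holds and returns them to the ready queue when that host fails or disengages. The class `ReassigningTaskQueue` and its methods are hypothetical names, not part of Jicos, and the sketch is single-threaded and in-memory only.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the Jicos task server.
final class ReassigningTaskQueue {
    private final Deque<String> ready = new ArrayDeque<>();
    private final Map<String, Set<String>> leased = new HashMap<>();
    private final Set<String> done = new HashSet<>();

    void submit(String taskId) { ready.addLast(taskId); }

    // Lease the next ready task to a host; returns null if none is ready.
    String take(String hostId) {
        String taskId = ready.pollFirst();
        if (taskId != null) {
            leased.computeIfAbsent(hostId, h -> new HashSet<>()).add(taskId);
        }
        return taskId;
    }

    // Record a result, but only from the host that currently holds the lease.
    void complete(String hostId, String taskId) {
        Set<String> held = leased.get(hostId);
        if (held != null && held.remove(taskId)) done.add(taskId);
    }

    // A host failed or left: put its unfinished tasks back at the front.
    void hostDeparted(String hostId) {
        Set<String> held = leased.remove(hostId);
        if (held != null) {
            for (String taskId : held) ready.addFirst(taskId);
        }
    }

    int readyCount() { return ready.size(); }
    int completedCount() { return done.size(); }
}
```

Because a host may hold more than one lease in this sketch, the same structure would also let a host request its next task before finishing the current one, which is one simple way to overlap communication with computation.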
Ease of use
The computation consumer distributes code/data to a heterogeneous set
of
machines/OSs. This motivates using a virtual machine, in
particular,
the JVM. Computational producers must download/install/upgrade system
software
(not just application code). Use of a screensaver/daemon obviates the
need
for human administration beyond the one-time installation of host
software.
The screensaver/daemon is a wrapper for a (soon to be Jini) client that
will download a "task server" service proxy every time it starts,
automatically distributing system software upgrades.
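The upgrade-on-start idea can be sketched without any networking. In the sketch below, a `Supplier` stands in for the remote lookup (RMI or Jini in the real system), and the names `TaskServerProxy` and `HostDaemon` are assumptions made for illustration only. The point it shows is that the installed wrapper never changes, while the proxy it runs is fetched fresh at each start.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the downloaded "task server" service proxy.
interface TaskServerProxy {
    String version();
}

// Hypothetical host-side wrapper, installed once and never upgraded itself.
final class HostDaemon {
    private final Supplier<TaskServerProxy> lookup;  // stands in for a remote lookup
    private TaskServerProxy proxy;

    HostDaemon(Supplier<TaskServerProxy> lookup) {
        this.lookup = lookup;
    }

    // Called each time the screensaver/daemon starts: fetch the current proxy.
    void start() {
        proxy = lookup.get();
    }

    String proxyVersion() {
        return proxy == null ? "not started" : proxy.version();
    }
}
```

Under these assumptions, publishing a new proxy on the server side is enough to upgrade every host the next time its daemon starts; a running daemon keeps the proxy it fetched until it is restarted.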