Give a man a program, frustrate him for a day. <i>Teach</i> a man to program, frustrate him for a lifetime.

Assignment 2: A Basic Compute Farm

Purpose

Motivation

Each large Internet computing project, such as SETI@HOME, tackles some problem that has a simple parallel decomposition. We call such "embarrassingly parallel" problems piecework-parallel, indicating that a problem in this class has a piecework decomposition: The problem decomposes into objects that implement Task, and whose call methods return values that can be composed into a solution to the original problem.

Piecework Decomposition

Fig. 1: Piecework task decomposition topology

Specification

In this assignment, you build a basic compute farm infrastructure for hosting piecework-parallel problems. The client decomposes the problem, constructing a set of Task objects. These tasks are passed to a Space, which makes them available to compute servers that function much like those in your first assignment. The results computed by Computers are returned to the Space. The client retrieves results from the Space, composing them into a solution to the original problem.

The API

Task

Task.java
package api;
import java.io.Serializable;
import java.util.concurrent.Callable;

/**
 *
 * @author Peter Cappello
 * @param <V> the task return type.
 */
public interface Task<V> extends Serializable, Callable<V> 
{ 
    @Override
    V call(); 
}
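To make the interface concrete, here is a hypothetical Task implementation (SumTask is an illustrative example, not part of the assignment's API; the Task interface is copied locally so the sketch is self-contained):

```java
import java.io.Serializable;
import java.util.concurrent.Callable;

// Local copy of the api.Task interface, so this sketch compiles on its own.
interface Task<V> extends Serializable, Callable<V> { @Override V call(); }

// Hypothetical example task: sums an inclusive range of integers.
class SumTask implements Task<Long> {
    private final long lo, hi;                    // inclusive range bounds
    SumTask(long lo, long hi) { this.lo = lo; this.hi = hi; }

    @Override public Long call() {
        long sum = 0;
        for (long i = lo; i <= hi; i++) sum += i;
        return sum;
    }
}

public class TaskDemo {
    public static void main(String[] args) {
        System.out.println(new SumTask(1, 100).call()); // 5050
    }
}
```

Because Task extends Serializable, instances like this can be marshalled across the RMI boundary to the Space and on to a Computer.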

An immutable Result container

The Result class is an immutable container holding a task's return value and the task's run time. A definition follows:

Result.java
package api;
import java.io.Serializable;

/**
 *
 * @author Peter Cappello
 * @param <T> type of return value of corresponding Task.
 */
public class Result<T> implements Serializable
{
    private final T taskReturnValue;
    private final long taskRunTime;

    public Result( T taskReturnValue, long taskRunTime )
    {
        assert taskReturnValue != null;
        assert taskRunTime >= 0;
        this.taskReturnValue = taskReturnValue;
        this.taskRunTime = taskRunTime;
    }

    public T getTaskReturnValue() { return taskReturnValue; }

    public long getTaskRunTime() { return taskRunTime; }
    
    @Override
    public String toString()
    {
        StringBuilder stringBuilder = new StringBuilder();
        stringBuilder.append( getClass() );
        stringBuilder.append( "\n\tExecution time:\n\t" ).append( taskRunTime );
        stringBuilder.append( "\n\tReturn value:\n\t" ).append( taskReturnValue.toString() );
        return stringBuilder.toString();
    }
}

The Computer interface

The Computer interface is not part of the API; the client interacts with the Space, which interacts with Computers as its backend.

The Space interface

Space.java
package api;

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.List;
import system.Computer;

/**
 *
 * @author Peter Cappello
 */
public interface Space extends Remote 
{
    public static int PORT = 8001;
    public static String SERVICE_NAME = "Space";

    void putAll( List<Task> taskList ) throws RemoteException;

    Result take() throws RemoteException;
    
    void register( Computer computer ) throws RemoteException;
}

The client is responsible for decomposing the problem into a set of Task objects, and passing them to the Space via the putAll method. In principle, these task objects can be processed in parallel by Computers.

(Alternatively, one could put the tasks in one at a time, using a putTask( Task task ) remote method. What are the tradeoffs here?)

Since the client puts tasks into the Space and Computer proxies (see below) take tasks from the Space, these interacting threads fit the Producer-Consumer design pattern.

After passing all the task objects to the Space, the client retrieves the associated Result objects via the take method: This method blocks until a Result is available to return to the client. Since Computers put Result objects into the Space and the client "consumes" them, we again see the Producer-Consumer design pattern.
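Ignoring RMI, the putAll/take protocol can be sketched with an in-memory stand-in for the Space. The queues, tasks, and "computer" thread below are hypothetical, chosen only to show the Producer-Consumer flow described above:

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.LinkedBlockingQueue;

// In-memory stand-in for the Space: a task queue and a result queue.
public class ClientFlowDemo {
    static final BlockingQueue<Callable<Integer>> taskQueue = new LinkedBlockingQueue<>();
    static final BlockingQueue<Integer> resultQueue = new LinkedBlockingQueue<>();

    static int runJob() throws Exception {
        // Client: decompose the problem and pass all tasks to the "space".
        List<Callable<Integer>> taskList = List.of(() -> 1 + 2, () -> 3 + 4);
        taskQueue.addAll(taskList);                                // putAll
        // A "computer": consume tasks, produce results (Producer-Consumer).
        Thread computer = new Thread(() -> {
            try {
                for (int i = 0; i < 2; i++) resultQueue.put(taskQueue.take().call());
            } catch (Exception e) { Thread.currentThread().interrupt(); }
        });
        computer.start();
        // Client: take each result as it becomes available, then compose.
        int solution = resultQueue.take() + resultQueue.take();   // take, take
        computer.join();
        return solution;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runJob()); // 10
    }
}
```

In the real system, the two queues live inside the Space implementation, and the client and Computers reach them through Remote methods.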

If a particular Result needs to be associated with a particular Task (e.g., a Mandelbrot Result), this information is passed as a component of the Task's call method's return value. Based on this association, when it matters, the client composes the result values into a solution to the original problem.

The Job interface & JobRunner class

From an OOD perspective, you may wish to follow the design ideas in the compute farm paper: Have Job and JobRunner classes that function in much the same way. The client then would use these classes to perform the task decomposition/composition, and to interact with the Space on the client's behalf.

The main advantage of doing so is to increase code reuse among different clients. The Don't Repeat Yourself (DRY) design maxim is an energy investment strategy that generally yields dividends.
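One possible shape for such an abstraction is sketched below. The Job interface and SumJob example are hypothetical, in the spirit of the compute farm paper rather than a prescribed design: a Job owns both decomposition into tasks and composition of their results, so that logic is written once and shared:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

public class JobDemo {
    // Hypothetical Job abstraction: decompose into tasks, compose results.
    interface Job<V, Solution> {
        List<Callable<V>> decompose();
        Solution compose(List<V> results);
    }

    // Example: summing 1..n by splitting the range into two half-range tasks.
    static class SumJob implements Job<Long, Long> {
        private final long n;
        SumJob(long n) { this.n = n; }

        @Override public List<Callable<Long>> decompose() {
            return List.of(() -> range(1, n / 2), () -> range(n / 2 + 1, n));
        }

        @Override public Long compose(List<Long> results) {
            return results.stream().mapToLong(Long::longValue).sum();
        }

        private static long range(long lo, long hi) {
            long s = 0;
            for (long i = lo; i <= hi; i++) s += i;
            return s;
        }
    }

    public static void main(String[] args) throws Exception {
        SumJob job = new SumJob(100);
        List<Long> results = new ArrayList<>();
        for (Callable<Long> task : job.decompose()) results.add(task.call());
        System.out.println(job.compose(results)); // 5050
    }
}
```

A JobRunner would then take a Job, put its decomposed tasks into the Space, take the Results back, and hand them to compose.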

The SpaceImpl class

Faulty Computers

For the purposes of this assignment, a computer is defined to be faulty when a Remote method invoked on it throws a RemoteException. The Space accommodates faulty computers: If a computer that is running a task throws a RemoteException, the task is assigned to another computer.

The Space implementation's main method instantiates a Space object and binds it into its rmiregistry.
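A minimal, runnable sketch of that main method follows. SpaceStub stands in for the real Space interface so the example is self-contained; a real SpaceImpl would implement putAll, take, and register, and would serve forever instead of unexporting itself:

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Stand-in for the api.Space interface, with its PORT and SERVICE_NAME.
interface SpaceStub extends Remote {
    int PORT = 8001;
    String SERVICE_NAME = "Space";
}

public class SpaceMainDemo extends UnicastRemoteObject implements SpaceStub {
    SpaceMainDemo() throws RemoteException {}

    static boolean bindAndCheck() throws Exception {
        // Create this JVM's own rmiregistry and bind the Space into it.
        Registry registry = LocateRegistry.createRegistry(SpaceStub.PORT);
        SpaceMainDemo space = new SpaceMainDemo();
        registry.rebind(SpaceStub.SERVICE_NAME, space);
        boolean bound = registry.lookup(SpaceStub.SERVICE_NAME) != null;
        // A real Space now serves forever; unexport so this demo can exit.
        UnicastRemoteObject.unexportObject(space, true);
        UnicastRemoteObject.unexportObject(registry, true);
        return bound;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Space is ready: " + bindAndCheck());
    }
}
```

Creating the registry in-process (rather than launching a separate rmiregistry) keeps deployment to a single command per machine.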

ComputerProxy

The Space's implementation of register should instantiate a ComputerProxy, which runs in its own thread. This thread's run method loops forever: it removes a task from a task queue, invokes the associated Computer's execute method with the task as its argument, and puts the returned Result object in a data structure for retrieval by the client. These data structures need to be thread-safe. (Why?) The Java BlockingQueue interface may be useful, as well as its LinkedBlockingQueue implementation.
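The proxy loop, including the reassignment of a task whose Computer throws a RemoteException, can be sketched in a single JVM. The Computer interface here is a local stand-in, and the int-valued tasks are hypothetical:

```java
import java.rmi.RemoteException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ProxyDemo {
    // Local stand-in for the remote Computer interface.
    interface Computer { int execute(int task) throws RemoteException; }

    static final BlockingQueue<Integer> taskQ = new LinkedBlockingQueue<>();
    static final BlockingQueue<Integer> resultQ = new LinkedBlockingQueue<>();

    // One ComputerProxy thread per registered Computer.
    static Thread proxyFor(Computer computer) {
        Thread proxy = new Thread(() -> {
            while (true) {
                Integer task = null;
                try {
                    task = taskQ.take();                  // consume a task
                    resultQ.put(computer.execute(task));  // produce its result
                } catch (RemoteException e) {
                    taskQ.add(task);                      // faulty Computer: reassign
                    return;                               // retire this proxy
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        proxy.setDaemon(true);
        proxy.start();
        return proxy;
    }

    static int demo() throws InterruptedException {
        taskQ.add(7);
        proxyFor(task -> { throw new RemoteException("faulty"); }).join(); // fails, re-queues 7
        proxyFor(task -> task * task);                                     // picks 7 back up
        return resultQ.take();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // 49
    }
}
```

The blocking take/put calls are what make the proxy a well-behaved consumer: it sleeps when the task queue is empty rather than spinning.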

The Computer Implementation

Fig. 2: The client-Space-Computer architecture. If there is an arc from A to B, then A has a remote reference to B.

Thread Safety

When an object implements a Remote method, the JVM allows that method to be invoked concurrently by multiple threads. Synchronizing access less than necessary leads to race conditions. One way to avoid race conditions is to declare all of the object's methods synchronized. However, this is not always desirable. For example, if the object implements the Runnable interface, declaring its run method synchronized would hold the object's lock for the entire loop. In that case, when the run method accesses the object's state, put that code fragment in a synchronized block. Synchronizing more than necessary may lead to deadlock or livelock: Synchronization must be used carefully.
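A minimal illustration of guarding shared state with a synchronized block inside run (the Counter class is an illustrative example, not part of the assignment):

```java
public class Counter implements Runnable {
    private int count = 0;                   // shared mutable state

    @Override public void run() {
        for (int i = 0; i < 100_000; i++) {
            synchronized (this) {            // guard only the state access
                count++;
            }
        }
    }

    public synchronized int count() { return count; }

    public static void main(String[] args) throws InterruptedException {
        Counter c = new Counter();
        Thread t1 = new Thread(c), t2 = new Thread(c);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(c.count()); // 200000
    }
}
```

Without the synchronized block, the two threads race on count++ (a read-modify-write), and the final count would usually be less than 200000.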

Basic design process for a thread-safe class:

Task classes

For each of the Task classes that you defined in the 1st assignment, define a corresponding Task class that solves part of the original problem. The decompositions need not be masterpieces of efficiency. For the TSP, partition the set of all possible tours. For example, if there are n cities, you can partition the set of tours into n - 1 parts: fixing a starting city, one part for each choice of the city visited second.
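That decomposition can be sketched as follows (TspTask is a hypothetical holder for the partition parameters, not the assignment's actual class; cities are numbered 0 to n - 1 with city 0 fixed as the start):

```java
import java.util.ArrayList;
import java.util.List;

public class TspDecompose {
    // Hypothetical task descriptor: tours starting at startCity whose
    // second city is secondCity.
    static class TspTask {
        final int startCity, secondCity;
        TspTask(int startCity, int secondCity) {
            this.startCity = startCity;
            this.secondCity = secondCity;
        }
    }

    // Partition the tours into n - 1 parts, one per choice of second city.
    static List<TspTask> decompose(int n) {
        List<TspTask> tasks = new ArrayList<>();
        for (int second = 1; second < n; second++)
            tasks.add(new TspTask(0, second));
        return tasks;
    }

    public static void main(String[] args) {
        System.out.println(decompose(5).size()); // 4 tasks for 5 cities
    }
}
```

Each task then searches only the tours in its part and returns the best one it finds, leaving the client a simple minimum to compose.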

The clients

Define a client for each application that:

Experiments

Repeat the above steps for c = 1 and 2, where c denotes the number of Computers.

For the case of c = 2, the computers run on different machines.

For each problem type (e.g., Mandelbrot visualization), plot the completion time (ordinate) for the 2 experiments (abscissa).

Analysis

  1. If the runtime when using 2 computers is not approximately 1/2 of that when using 1 computer, speculate as to why.
  2. To improve your understanding of what is going on, what other measurements would you like to see included?

Deliverables

Directories

Files



 cappello@cs.ucsb.edu © Copyright 2010 Peter Cappello                                           2016.04.12