CS170 Please Hand Me That Wrench.


Background

This lab started out as a question, the answer to which I'm hoping you will provide. Imagine that you have a primitive that lets you write a stream of bytes into one end and read them out of the other. Network programming (e.g., sockets) works this way, more or less. You open a socket between two processes (typically on two separate machines). One process writes into the socket on its machine, and the other process reads from the socket on the other machine. Pretty much all of the cool things you like to do on the Internet rely on socket programming. We won't study it directly, but we will get the "feel" for it in this lab.

Another primitive that works like a socket, but is local, is called a pipe. Unix pipes can be opened between processes running on the same machine, and once opened, they work like sockets: one process writes data into the pipe and another process reads data from the other end. It is called a pipe for two reasons. The first is that data is delivered in First-In-First-Out (FIFO) order. That is, if a single writer writes 10 bytes into a pipe, a single reader will see them in the order they were written.
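If you want to see FIFO order for yourself, the sketch below uses a real Unix pipe (the POSIX pipe(), write(), and read() calls) inside a single process rather than between two. The helper name fifo_roundtrip is made up for illustration. The sketch also assumes the message is small enough to fit in the kernel's pipe buffer, so the write can finish before anyone reads.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: push len bytes through a real POSIX pipe inside one
 * process and read them back. Assumes len fits in the kernel's pipe buffer
 * (a few KB is safe), so write() cannot block waiting for a reader.
 * Returns the number of bytes read back, or -1 on error. */
ssize_t fifo_roundtrip(const char *msg, char *out, size_t len)
{
    int fds[2];                      /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) == -1)
        return -1;

    ssize_t w = write(fds[1], msg, len);   /* writer puts bytes in one end */
    close(fds[1]);                         /* no more writers: reader will see EOF */
    if (w != (ssize_t)len) {
        close(fds[0]);
        return -1;
    }

    size_t got = 0;
    ssize_t r;
    while (got < len && (r = read(fds[0], out + got, len - got)) > 0)
        got += (size_t)r;                  /* reader takes them out the other end */
    close(fds[0]);
    return (ssize_t)got;
}
```

Calling fifo_roundtrip("0123456789", out, 10) should hand back the ten digits in exactly the order they went in.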

The second reason for the name is that the rate at which data flows in need not be the same as the rate at which data flows out. Think of trying to take a drink of water from a water hose. If you drink fast enough, the hose will go dry, because water is flowing in more slowly than you are drinking it out. If you drink too slowly, either you miss some, or the water has to "back up" while it waits for you to drink more. Pipes, as we will study them, will be designed so that you do not miss any data if you are slow in draining them. We'll discuss how.
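Here is one way to picture the "back-up" without any threads at all: a tiny fixed-size buffer (hypothetically called struct ring, with a deliberately small capacity of 8 bytes) that refuses bytes it has no room for instead of dropping them. The caller keeps whatever was refused and tries again after the reader has drained some. This is only a sketch under those assumptions, not the interface your lab pipe has to provide. In a pthread pipe, a writer facing a full buffer would wait rather than return early.

```c
#include <assert.h>
#include <stddef.h>

#define RING_CAP 8   /* assumed, deliberately tiny capacity */

/* Hypothetical single-threaded byte pipe. */
struct ring {
    unsigned char data[RING_CAP];
    size_t head;    /* index of the oldest unread byte */
    size_t count;   /* number of bytes currently stored */
};

/* Copy up to len bytes in; return how many were accepted. Bytes that do
 * not fit stay with the caller ("backed up"), so nothing is dropped. */
size_t ring_put(struct ring *r, const unsigned char *src, size_t len)
{
    size_t n = 0;
    while (n < len && r->count < RING_CAP) {
        r->data[(r->head + r->count) % RING_CAP] = src[n++];  /* wrap around */
        r->count++;
    }
    return n;
}

/* Copy up to len bytes out in FIFO order; return how many were removed. */
size_t ring_get(struct ring *r, unsigned char *dst, size_t len)
{
    size_t n = 0;
    while (n < len && r->count > 0) {
        dst[n++] = r->data[r->head];
        r->head = (r->head + 1) % RING_CAP;
        r->count--;
    }
    return n;
}
```

Putting 11 bytes into an empty ring accepts only 8. After the reader takes 3, the remaining 3 fit, and draining then returns all 11 bytes in their original order.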

Another important feature of both sockets and pipes is that they can be shared by multiple processes on the same machine. That is, you can have several processes writing into a single pipe and, at the same time, several processes reading data from the other end. Of course, they must take turns, but you can arrange that. The important thing to know, however, is that the pipe or socket promises to deliver each byte only once.

For example, if two processes are reading a pipe that has 100 bytes waiting in it, and the first reader reads 25 bytes, the second reader will read from byte 26 onward. The 25 bytes that the first reader read are removed from the pipe after the read (much like the water we discussed earlier). The same is true on the write side. If one writer writes 30 bytes and another writes 40 bytes, both writes go into the pipe and neither overwrites the other.
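The once-only delivery rule is easy to check with a real Unix pipe. In the sketch below, two back-to-back writes of 30 and 40 bytes go in. The first "reader" takes 25 bytes, and the second reader gets whatever is left. The names share_pipe_demo and read_upto are hypothetical. Both "readers" here are just successive reads in one process. With separate processes or threads the kernel interleaves the reads, but each byte is still handed out only once.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read until len bytes arrive or EOF/error. */
static size_t read_upto(int fd, unsigned char *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t r = read(fd, buf + got, len - got);
        if (r <= 0)              /* 0 = EOF (all writers closed), <0 = error */
            break;
        got += (size_t)r;
    }
    return got;
}

/* Two writes (bytes numbered 0..29, then 30..69), then two readers.
 * Reports the first byte the second reader sees and how many it got.
 * Returns 0 on success, -1 on error. */
int share_pipe_demo(unsigned char *second_first, size_t *second_count)
{
    int fds[2];
    unsigned char w1[30], w2[40], r1[25], r2[100];

    if (pipe(fds) == -1)
        return -1;
    for (int i = 0; i < 30; i++) w1[i] = (unsigned char)i;
    for (int i = 0; i < 40; i++) w2[i] = (unsigned char)(30 + i);

    /* Neither write overwrites the other: 70 bytes end up queued in order. */
    if (write(fds[1], w1, sizeof w1) != (ssize_t)sizeof w1 ||
        write(fds[1], w2, sizeof w2) != (ssize_t)sizeof w2) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);               /* all writers done, so reads end at EOF */

    /* First reader takes 25 bytes; they are gone from the pipe afterwards. */
    if (read_upto(fds[0], r1, sizeof r1) != sizeof r1) {
        close(fds[0]);
        return -1;
    }

    /* Second reader drains the rest, starting at the 26th byte. */
    *second_count = read_upto(fds[0], r2, sizeof r2);
    *second_first = *second_count > 0 ? r2[0] : 0;
    close(fds[0]);
    return 0;
}
```

The second reader should get 45 bytes, and the first of them is the byte numbered 25 (the 26th byte written).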

In this lab, you will be implementing a restricted version of pipes that works with pthreads. It will have some of the features of Unix pipes and sockets, but not all of them, so don't panic. Our pipes will be simpler.

The Question

I said that this all started with a question. Well, here it is. Let's say you have correctly implemented pipes for pthreads already. How fast can you make a pipe run on our machines? Answering this question is a little tricky, but I'm sure you'll be creative. Notice a couple of things. First, the number of readers and the number of writers of a single pipe may affect the speed of the transfer. Why? Say there is one writer thread and one reader thread. The reader thread has to wait while the writer thread puts data in the pipe. Then the reader starts emptying the pipe, and the writer must stop. If there were two readers, one might be able to drain the pipe while the other is processing the data. This idea (one reads while the other works) is sometimes called "double buffering," and it works really well in network settings. Does it work well with pthreads? You tell me.
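One way to start answering the question is to time a transfer. The harness below is a sketch that uses the kernel's Unix pipe as a stand-in for the pthread pipe you will write. The calling thread acts as the single writer, sending total bytes in chunk-byte writes, while nreaders threads drain the other end. The names time_pipe and reader, and the limits MAX_READERS and MAX_CHUNK, are my own assumptions, not part of the lab. Compile with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

#define MAX_READERS 16      /* assumed upper bound on reader threads */
#define MAX_CHUNK 65536     /* assumed largest read/write size in bytes */

struct reader_arg {
    int fd;          /* read end of the pipe, shared by all readers */
    size_t chunk;    /* bytes requested per read() */
    size_t bytes;    /* bytes this reader actually consumed */
};

/* Reader thread: drain the pipe until EOF (all write ends closed). */
static void *reader(void *p)
{
    struct reader_arg *a = p;
    unsigned char buf[MAX_CHUNK];
    ssize_t r;

    while ((r = read(a->fd, buf, a->chunk)) > 0)
        a->bytes += (size_t)r;
    return NULL;
}

/* Send total bytes through a Unix pipe in chunk-sized writes while nreaders
 * threads drain it. Returns bytes received by all readers combined (should
 * equal total), or -1 on error. Elapsed wall-clock seconds go in *secs. */
long long time_pipe(size_t total, size_t chunk, int nreaders, double *secs)
{
    static const unsigned char wbuf[MAX_CHUNK];  /* payload contents don't matter */
    pthread_t tid[MAX_READERS];
    struct reader_arg args[MAX_READERS];
    struct timespec t0, t1;
    int fds[2], started = 0;
    size_t sent = 0;
    long long received = 0;

    if (chunk == 0 || chunk > MAX_CHUNK || nreaders < 1 || nreaders > MAX_READERS)
        return -1;
    if (pipe(fds) == -1)
        return -1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (; started < nreaders; started++) {
        args[started] = (struct reader_arg){ fds[0], chunk, 0 };
        if (pthread_create(&tid[started], NULL, reader, &args[started]) != 0)
            break;
    }
    if (started == 0) {          /* nobody to drain the pipe: give up */
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    while (sent < total) {       /* the calling thread is the one writer */
        size_t n = total - sent < chunk ? total - sent : chunk;
        ssize_t w = write(fds[1], wbuf, n);
        if (w <= 0)
            break;
        sent += (size_t)w;
    }
    close(fds[1]);               /* readers see EOF once the pipe drains */

    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        received += (long long)args[i].bytes;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fds[0]);

    *secs = (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return received;
}
```

Dividing total by the elapsed seconds gives a throughput figure. Try varying nreaders and chunk and see what changes. Numbers from the kernel pipe will not match those of your own pipe, so treat them only as a point of comparison.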

Notice also that the size of the data buffers that are moved around can affect the speed. If the writer writes one byte and then tells the reader that a byte is there, the overhead associated with the communication will be assessed on each byte. Let's say that overhead is 10 ms (I don't know what the real number is, and on current hardware it may well be much smaller, but the exact value does not matter for this argument). You would be "charged" 10 ms for each byte moved. Now let's say that the 10 ms overhead is constant, no matter how much data is moved. The more data you move each time a reader or writer runs, the faster you will run, because the fixed overhead is spread over more bytes and your "cost" per byte is lower.
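The cost argument is simple arithmetic, sketched below using the paragraph's illustrative 10 ms overhead (a guess, not a measured number). Moving 1,000,000 bytes one byte at a time takes 1,000,000 transfers, or 10,000 seconds of overhead (close to three hours). Moving the same data in 4096-byte chunks takes 245 transfers, or 2.45 seconds. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Model from the text: each transfer (one reader or writer "turn") costs a
 * fixed overhead, no matter how many bytes it moves. */

/* Number of transfers needed to move total bytes, chunk bytes at a time. */
size_t transfers_needed(size_t total, size_t chunk)
{
    assert(chunk > 0);
    return (total + chunk - 1) / chunk;   /* ceiling division */
}

/* Total overhead in ms for moving total bytes in chunk-sized transfers. */
double total_overhead_ms(size_t total, size_t chunk, double overhead_ms)
{
    return (double)transfers_needed(total, chunk) * overhead_ms;
}

/* Overhead charged to each byte when every transfer is a full chunk. */
double per_byte_ms(size_t chunk, double overhead_ms)
{
    return overhead_ms / (double)chunk;
}
```

Doubling the chunk size halves the per-byte charge, which is exactly why larger transfers run faster under this model.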

If this cost business isn't crystal clear the first time you read it, don't worry -- it will be when you are finished.


What You Need to Do

Okay -- so what does this all mean to you? You will need to do three things: