CS 276: Distributed Computing and Computer Networks
Homework Assignment #1

Introduction

In this assignment, you will implement a reliable, sequenced message transport protocol (RMTP) on top of UDP. ``Reliable'' means that messages are delivered reliably ``end to end'' in the face of packet drops within the network, while ``sequenced'' means that messages are delivered to the receiver in the order they were generated at the sender. The RMTP protocol described herein supports simplex (i.e., unidirectional) data transfer and is robust against packet reordering, duplication, and corruption. RMTP also supports a simple congestion control strategy that modulates the sender's window to limit its transmission rate. Your assignment is to implement an RMTP sender and all of the required services.

Unlike TCP, RMTP ``preserves'' message boundaries and is thus not a logical byte stream. In addition, RMTP is asymmetric: the sender and the receiver are distinct entities that behave differently. Data flows in the forward direction and acknowledgements flow in the reverse direction. Data packets cannot be sent from the receiver to the sender and acknowledgements cannot be sent from the sender to the receiver.

Recall that UDP is stateless and unreliable. In order to implement reliability on top of UDP, an RMTP conversation requires connection state at both communication end-points. Hence, RMTP must be a connection-oriented protocol and must include phases for connection setup and teardown.

You will format RMTP packet headers on top of the simple UDP socket interface. In this approach, the RMTP sender ``fills in'' header fields in the first part of the UDP payload, sends the UDP packet over the network (via the socket interface) to the receiving host, which reads the UDP packet, parses the RMTP headers, and takes appropriate action. Normally, an arbitrary ``application'' invokes a protocol like RMTP, but for our purposes, we will simplify the configuration so that a single ``file transfer function'' is rather tightly integrated into our protocol implementation.

RMTP

The RMTP packet header layout is given in the adjacent figure. As with TCP, all RMTP packet headers look alike, but unlike TCP, we use a type field rather than flags to distinguish different packet types. The TYPE field says whether the packet is a data packet, acknowledgement, SYN, FIN, or reset. The type codes are as follows:
        DATA = 1
        ACK = 2
        SYN = 3
        FIN = 4
        RST = 5
The WINDOW field is valid only for packets of type 2 (ACK) and indicates the receiver's advertised window. This field must be set to 0 for all non-ACK packets.

The SEQNO field indicates the sequence number of the packet. For type 1 packets (DATA), the sequence number increases by one for each consecutive packet. For type 2 packets (ACK), the sequence number indicates the sequence number of the next packet expected by the receiver. For type 3 and 4 packets (SYN and FIN), the sequence number indicates the sequence number of the first data packet of the connection (SYN) or one past the last data packet of the connection (FIN). In other words, the SYN and FIN packets demarcate the sequence boundary of the connection incarnation.

The packet header, along with a number of other useful definitions, is included in rmtp.h. A sample C structure that defines the RMTP header is as follows:

        struct rmtphdr {
                u_int32_t type;
                u_int32_t window;
                u_int32_t seqno;
        };
The maximum message size in RMTP is 1024 bytes. The actual message size can be smaller than this, but it is not explicitly carried in the RMTP header because the lower layer protocols (i.e., IP and UDP) carry the packet length.
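Because the header carries no length field, the payload length has to be derived from the length of the received UDP datagram. The following is a minimal sketch of that arithmetic, assuming the 12-byte header shown above (three 32-bit fields). The names RMTP_HDR_LEN, RMTP_MAX_MSG, and rmtp_payload_len are hypothetical and are not taken from rmtp.h.

```c
#define RMTP_HDR_LEN    12      /* assumed: type, window, seqno at 4 bytes each */
#define RMTP_MAX_MSG  1024      /* maximum message size stated in the text */

/*
 * Given the byte count returned by recv()/recvfrom(), return the length
 * of the RMTP message that follows the header, or -1 if the datagram is
 * too short to hold a header or carries more than RMTP_MAX_MSG bytes.
 */
long rmtp_payload_len(long datagram_len)
{
    if (datagram_len < RMTP_HDR_LEN)
        return -1;
    if (datagram_len - RMTP_HDR_LEN > RMTP_MAX_MSG)
        return -1;
    return datagram_len - RMTP_HDR_LEN;
}
```

A datagram of exactly 12 bytes is therefore a valid packet with an empty message, which is what a control packet such as a SYN or ACK would look like under this assumption.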

Note that we are ``borrowing'' the port fields from UDP. Rather than implement packet demultiplexing in RMTP, we simply let UDP carry out demultiplexing and implement RMTP on top of this.

State Diagram

Because RMTP is connection-oriented, a handshake protocol must be devised to initiate connection setup and teardown. During connection setup, initial sequence numbers are exchanged.

The state diagram for RMTP is as follows:

Note that the asymmetry of the protocol leads to separate finite state machines for the sender and the receiver.

Initially, the receiver is in the CLOSED state. Upon issuing a (passive) open, it enters the LISTEN state, and at that point, waits for a SYN packet to arrive (on the specified UDP port). When the SYN arrives, the receiver ACKs the SYN and transitions to the EST (established) state. In this state, the data transfer ensues under the normal sliding window protocol with congestion control. Upon reception of a FIN packet, the receiver ACKs the FIN packet and enters the TIME_WAIT state. While in TIME_WAIT, the receiver ACKs retransmitted FIN packets but ignores all other packets. After two maximum segment lifetimes (MSLs), the receiver leaves TIME_WAIT and enters the CLOSED state. For this assignment, assume an MSL is 20 seconds.

As with the receiver, the sender starts in the CLOSED state. Upon issuing an (active) open, the sender sends a SYN packet to the receiver (on the specified address and port) and enters the SYN_SENT state. The SYN packet's sequence number contains the initial sequence number (ISN) of the conversation. The receiver should be prepared to handle any initial sequence number represented in 32 bits. If the sender's SYN packet is not ACK'd within a reasonable amount of time, the SYN must be retransmitted. If the SYN is not ACK'd after three retransmission attempts, an error is returned to the application and a RST packet is sent to the destination address and port. If the SYN is acknowledged correctly (i.e., the ACK sequence number is ISN + 1), the sender enters the EST state and starts sending data packets. When the sending application is finished generating data, it issues a ``close'' operation on the RMTP layer. This causes the sender to enter the CLOSING state. At this point, there may or may not be unacknowledged data buffered at the sender. Once all the data has been transmitted and acknowledged, the sender sends a FIN packet and enters the FIN_WAIT state. The FIN packet's sequence number must be one more than the sequence number of the last data packet. Once the FIN packet is acknowledged (the sequence number of the acknowledgement must be one larger than that of the FIN packet), the sender moves directly to the CLOSED state.
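The control-packet transitions of the sender described above can be sketched as a small state machine. This is only one possible arrangement: the names sender_state, sender_conn, sender_on_ack, sender_on_syn_timeout, and MAX_SYN_RETRIES are hypothetical, and ACKs for data packets in the EST and CLOSING states are left to the sliding window code.

```c
#include <stdint.h>

enum sender_state { S_CLOSED, S_SYN_SENT, S_EST, S_CLOSING, S_FIN_WAIT };

#define MAX_SYN_RETRIES 3       /* "three retransmission attempts" */

struct sender_conn {
    enum sender_state state;
    uint32_t isn;               /* sequence number carried in the SYN */
    uint32_t fin_seqno;         /* sequence number carried in the FIN */
    int syn_retries;            /* SYN retransmissions performed so far */
};

/* Apply an incoming ACK to the handshake states; returns the new state. */
enum sender_state sender_on_ack(struct sender_conn *c, uint32_t ack_seqno)
{
    switch (c->state) {
    case S_SYN_SENT:            /* SYN is acknowledged with ISN + 1 */
        if (ack_seqno == (uint32_t)(c->isn + 1))
            c->state = S_EST;
        break;
    case S_FIN_WAIT:            /* FIN is acknowledged with FIN seqno + 1 */
        if (ack_seqno == (uint32_t)(c->fin_seqno + 1))
            c->state = S_CLOSED;
        break;
    default:                    /* data ACKs are handled elsewhere */
        break;
    }
    return c->state;
}

/*
 * Called when the SYN timer expires. Returns 1 if the SYN should be
 * retransmitted, or 0 if the caller should give up (send a RST and
 * report an error to the application).
 */
int sender_on_syn_timeout(struct sender_conn *c)
{
    if (c->state != S_SYN_SENT)
        return 0;
    if (c->syn_retries < MAX_SYN_RETRIES) {
        c->syn_retries++;
        return 1;
    }
    c->state = S_CLOSED;
    return 0;
}
```

The casts to uint32_t keep the ``+ 1'' arithmetic modulo 2^32, so an ISN of 0xffffffff is correctly acknowledged by an ACK carrying sequence number 0.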

If the protocol detects that the other side is misbehaving, it should reset the connection by sending a RST packet. The sequence number and window fields should be 0 in this type of packet. Note that the other side might appear to be misbehaving because of a delayed or duplicate control packet (e.g., a SYN or FIN). In this case, the message should be gracefully ignored. If the message could only have come from another incarnation of the connection or a buggy implementation, the connection should in fact be reset. You can assume that the 2 MSL wait guarantees that multiple incarnations do not overlap (even though you do not have to implement all of the mechanism needed to guarantee this). For example, if a data packet arrives when the receiver is in the LISTEN state, then the source must be misbehaving because it should not send any data packets until its SYN packet is acknowledged, which would imply that the receiver is in the EST state (assuming separate connections do not overlap). But, if the receiver receives a SYN while in the EST state, then it should be ignored because it simply might be a delayed, duplicate packet.
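The two receiver-side examples above, together with the TIME_WAIT rule from the state diagram discussion, can be written down as a small decision function. This is an illustrative sketch only: the enum and function names are hypothetical (rmtp.h may name the type codes differently), and every combination not covered by the text is simply passed on for normal processing.

```c
enum rmtp_type  { RMTP_DATA = 1, RMTP_ACK = 2, RMTP_SYN = 3,
                  RMTP_FIN = 4, RMTP_RST = 5 };
enum recv_state { R_CLOSED, R_LISTEN, R_EST, R_TIME_WAIT };
enum verdict    { V_PROCESS, V_IGNORE, V_RESET };

/* Decide how the receiver should treat a packet of the given type. */
enum verdict classify_packet(enum recv_state st, enum rmtp_type type)
{
    /* Data before the SYN was acknowledged: the source is misbehaving. */
    if (st == R_LISTEN && type == RMTP_DATA)
        return V_RESET;
    /* A SYN on an established connection may be a delayed duplicate. */
    if (st == R_EST && type == RMTP_SYN)
        return V_IGNORE;
    /* In TIME_WAIT only retransmitted FINs are ACKed. */
    if (st == R_TIME_WAIT && type != RMTP_FIN)
        return V_IGNORE;
    return V_PROCESS;
}
```

The same style of table can be built for the sender's states, where the choice between ``ignore'' and ``reset'' follows the same reasoning about delayed duplicates versus impossible packets.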

Implementation

We will provide you with an implementation of the RMTP receiver process and it is your task to implement the sender. To use the protocol, you'll need to implement a very simple ``application programming interface'' (API) consisting of three routines. These procedures carry out the obvious functions: opening and establishing the connection, sending data over the connection, and closing, flushing, and tearing down the connection.

Normally, a protocol implementation consists of a number of concurrent processes, each executing some portion of the protocol, and all synchronizing together to effect the distributed protocol. To simplify matters, you should instead implement the sender protocol using a single thread of control within a "polling loop" inside rmtp_send. Feel free to implement this as you see fit, but one possible decomposition for rmtp_send is as follows:

If we assume that rmtp_send is continuously invoked, then the data transfer will proceed as we expect. Eventually, rmtp_close is called, and the protocol should finish processing any outstanding data and then initiate connection teardown. You might want to start from our skeleton sender-template.c (rename it to sender.c) and use the input file testInput.
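A single-threaded polling loop needs some way to wait for an ACK without blocking forever, so that a retransmission timer can still fire. One common approach, sketched here under the assumption that the standard select() call is acceptable for this assignment, is to wait on the socket with a timeout. The helper name wait_readable is hypothetical.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Wait until fd has data to read or timeout_ms milliseconds elapse.
 * Returns 1 if fd is readable, 0 on timeout, and -1 on error.
 */
int wait_readable(int fd, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int n;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return n > 0 ? 1 : 0;
}
```

Inside the loop, a return value of 1 would lead to reading and processing an ACK, while a return value of 0 would be treated as a retransmission timeout.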

You should use our Makefile to build two programs: our reference program called ``receiver'' and the program you must write called ``sender''. They should both take exactly three command line arguments consisting of a destination address (a), a destination port (p), and a source port (q). The respective process (either sender or receiver) transmits all RMTP packets to the IP destination ``a'' using UDP destination port ``p'' and source port ``q'', and receives any RMTP packet sent to it from the IP host with address ``a'' on UDP port ``q''. The sender should use RMTP to transfer the contents of a file called ``testInput'' (in the current directory) to the receiver. The receiver stores the output in a file called ``testOutput''. You can diff the contents of the two files to make sure your implementation functions correctly (at least in part).
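The three arguments map naturally onto the socket calls: resolve ``a'' and ``p'' into a destination address, bind the local socket to ``q'', and connect the UDP socket so that plain send() and recv() can be used. The sketch below shows one way to do this for IPv4. The function names rmtp_resolve and rmtp_udp_open are hypothetical, and the provided template may already set up its socket differently.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolve host "a" and port "p" to an IPv4 address; 0 on success, -1 on failure. */
int rmtp_resolve(const char *host, const char *port, struct sockaddr_in *out)
{
    struct addrinfo hints, *res;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;          /* IPv4 only in this sketch */
    hints.ai_socktype = SOCK_DGRAM;     /* UDP */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    memcpy(out, res->ai_addr, sizeof *out);
    freeaddrinfo(res);
    return 0;
}

/* Create a UDP socket bound to local port "q" and connected to dst. */
int rmtp_udp_open(const struct sockaddr_in *dst, unsigned short src_port)
{
    struct sockaddr_in local;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&local, 0, sizeof local);
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = htons(src_port);
    if (bind(fd, (struct sockaddr *)&local, sizeof local) < 0 ||
        connect(fd, (const struct sockaddr *)dst, sizeof *dst) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Connecting a UDP socket does not exchange any packets. It only records the peer address, so datagrams from other hosts or ports are not delivered to the socket.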

Your sender protocol must include a congestion control algorithm. For this assignment, you should implement slow-start and congestion avoidance (additive increase, multiplicative decrease) in the spirit of TCP. A simple scheme is sufficient. You need not worry about more advanced techniques like fast retransmit, fast recovery, or selective ACKs.
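As a rough illustration of such a ``simple scheme'', the sketch below tracks a congestion window measured in packets. It grows by one packet per ACK during slow-start, by one packet per window of ACKs during congestion avoidance, and collapses to one packet on a timeout. The structure and function names are hypothetical, and the initial threshold of 64 packets is an arbitrary assumption rather than a value required by the assignment.

```c
struct cc_state {
    unsigned cwnd;      /* congestion window, in packets */
    unsigned ssthresh;  /* slow-start threshold, in packets */
    unsigned acked;     /* ACKs counted toward the next additive increase */
};

void cc_init(struct cc_state *cc)
{
    cc->cwnd = 1;
    cc->ssthresh = 64;  /* assumed starting threshold */
    cc->acked = 0;
}

/* Call once for every ACK that acknowledges new data. */
void cc_on_new_ack(struct cc_state *cc)
{
    if (cc->cwnd < cc->ssthresh) {
        cc->cwnd++;                         /* slow-start */
    } else if (++cc->acked >= cc->cwnd) {
        cc->cwnd++;                         /* additive increase */
        cc->acked = 0;
    }
}

/* Call when a retransmission timeout signals a loss. */
void cc_on_timeout(struct cc_state *cc)
{
    unsigned half = cc->cwnd / 2;

    cc->ssthresh = half > 2 ? half : 2;     /* multiplicative decrease */
    cc->cwnd = 1;                           /* restart from slow-start */
    cc->acked = 0;
}

/* The sender may have at most this many unacknowledged packets in flight. */
unsigned cc_send_window(const struct cc_state *cc, unsigned advertised)
{
    return cc->cwnd < advertised ? cc->cwnd : advertised;
}
```

The effective window is the smaller of the congestion window and the window advertised by the receiver in its ACK packets.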

To recap, your assignment is to convert sender-template.c into sender.c and implement all of the necessary functions. You can build receiver from all of the parts available via the Assignment #1 WWW page. You'll need these 5 files:

Just type ``make receiver''. Once you've created sender.c from sender-template.c, you can run ``make sender'' to build the sender application.

Simplifications

Even a simple ARQ protocol like RMTP requires a fair amount of work, so we will make a number of additional simplifying assumptions:

Sequence Number Comparisons

Because sequence numbers wrap, you must be prepared to compare their values using modular arithmetic. For example, 0xffffffff is two sequence numbers before 1 (for 32-bit sequence numbers), but an unsigned integer comparison would give the wrong result. The following macros, from rmtp.h, can be used to compare sequence numbers in a fairly portable way:
        #define SEQ_LT(a,b)     ((int)((a)-(b)) < 0)
        #define SEQ_LEQ(a,b)    ((int)((a)-(b)) <= 0)
        #define SEQ_GT(a,b)     ((int)((a)-(b)) > 0)
        #define SEQ_GEQ(a,b)    ((int)((a)-(b)) >= 0)
You might consult receiver.c to see how these macros are utilized therein.
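As a small usage sketch, the macros can be combined to test whether a sequence number falls inside the sender's current window even when the window straddles the wrap-around point. The helper seq_in_window is hypothetical, and the two macros are repeated here (guarded by #ifndef) only so that the fragment stands alone without rmtp.h.

```c
#include <stdint.h>

#ifndef SEQ_LT
#define SEQ_LT(a,b)     ((int)((a)-(b)) < 0)
#define SEQ_GEQ(a,b)    ((int)((a)-(b)) >= 0)
#endif

/* Returns 1 if seq lies in [base, base + size) modulo 2^32, else 0. */
int seq_in_window(uint32_t seq, uint32_t base, uint32_t size)
{
    return SEQ_GEQ(seq, base) && SEQ_LT(seq, (uint32_t)(base + size));
}
```

With base 0xfffffffe and size 4, the window covers 0xfffffffe, 0xffffffff, 0, and 1, so sequence number 1 is inside the window and sequence number 2 is not.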

Byte Order

Different computer systems represent multi-byte integers using different conventions for byte order. In ``little-endian'' machines, for example, the least significant byte appears at the lowest memory address, while ``big-endian'' machines use the opposite convention. Because of this, care must be exercised when sending multi-byte packet fields over the network. In particular, agreement must be reached on what format and byte-order convention is adopted for network transmission. Most often, the big-endian convention is adopted and little-endian machines must perform format conversion when reading and writing network data. In order to write portable code, a set of system functions/macros has been standardized that do nothing on big-endian machines and expand to the proper conversion code on little-endian machines. Look at the man pages for ntohl, htonl, ntohs, and htons for more details.
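Applied to the RMTP header, this means converting each 32-bit field with htonl before sending and with ntohl after receiving. The sketch below assumes that the reference receiver expects all three fields in network (big-endian) byte order, which is worth confirming in receiver.c. The structure is redeclared under the hypothetical name rmtphdr_sketch, with uint32_t in place of u_int32_t, so that the fragment does not depend on rmtp.h.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

struct rmtphdr_sketch {         /* same layout as struct rmtphdr above */
    uint32_t type;
    uint32_t window;
    uint32_t seqno;
};

/* Write a header into buf (at least 12 bytes) in network byte order. */
void rmtp_hdr_encode(unsigned char *buf, uint32_t type,
                     uint32_t window, uint32_t seqno)
{
    struct rmtphdr_sketch h;

    h.type = htonl(type);
    h.window = htonl(window);
    h.seqno = htonl(seqno);
    memcpy(buf, &h, sizeof h);
}

/* Read a header from buf and convert it back to host byte order. */
void rmtp_hdr_decode(const unsigned char *buf, struct rmtphdr_sketch *out)
{
    memcpy(out, buf, sizeof *out);
    out->type = ntohl(out->type);
    out->window = ntohl(out->window);
    out->seqno = ntohl(out->seqno);
}
```

Copying with memcpy rather than casting the buffer pointer avoids alignment problems when the receive buffer is a plain byte array.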

Testing

Debugging and testing your protocol will not be easy. Designing a complete set of test cases is part of the assignment. Some hints are described below.

You should probably do most of your initial testing in a local environment without actually going over the network. You can do this easily by sending packets to ``localhost'', which gets directed to a special software interface called the ``loopback interface''. All packets sent to the loopback interface are looped back to the local host. Consequently, you can test your program by running the receiver in one window or command shell:

        % ./receiver localhost 3001 3000
and the sender in another window or command shell:
        % ./sender localhost 3000 3001
Note that the port numbers are swapped in the two cases so that the two programs plug together in the natural fashion.

You'll probably need to do some debugging, so be prepared to run the above programs from within a debugger like gdb. For reference, see gdb1 and gdb2. Once you have your program running, you'll need to find a way to generate an interesting network environment where congestion occurs in a well defined fashion and where packet losses and duplicates sometimes happen. The easiest approach is to add functions to the receiver that randomly drop packets (data or ACK packets), duplicate them, delay them, etc. You can be sure that we will test your program to see that your protocol functions correctly.
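One simple way to build such a fault injector is a function that decides the fate of each packet from a random draw. The sketch below is a possible starting point, not part of the provided receiver. The names chan_action and chan_pick are hypothetical, and the probabilities are parameters you would choose for each test run.

```c
#include <stdlib.h>

enum chan_action { CHAN_DELIVER, CHAN_DROP, CHAN_DUPLICATE };

/*
 * Pick what to do with one packet. p_drop and p_dup are probabilities
 * in [0, 1] whose sum should not exceed 1. Call srand() once beforehand
 * with a fixed seed to make a test run repeatable.
 */
enum chan_action chan_pick(double p_drop, double p_dup)
{
    /* r is uniformly distributed in [0, 1) */
    double r = (double)rand() / ((double)RAND_MAX + 1.0);

    if (r < p_drop)
        return CHAN_DROP;
    if (r < p_drop + p_dup)
        return CHAN_DUPLICATE;
    return CHAN_DELIVER;
}
```

For example, calling chan_pick(0.1, 0.05) before handling each incoming data packet would drop roughly one packet in ten and process roughly one in twenty twice.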

What to turn in

After you implement, debug, and thoroughly verify your protocol implementation, you should submit your program via the turnin program before midnight on the due date. Your program should build against our Makefile and the other available source files. Your grade will be based on the actual code (which should include comments for full credit) as well as the resulting behavior. We will test your sender against a more elaborate version of our receiver (which might drop, duplicate, or otherwise corrupt the packet stream). We will also check that you've implemented a sound congestion control strategy.
 

How to turn in

Use the turnin tool to turn in your homework. From any GSL or CSIL machine, type:
prompt>   turnin  hw1@cs276   files-or-directories
 

Hints

Start early. Writing and debugging distributed protocols is hard! This assignment should give you an appreciation for the difficulty of protocol and congestion control implementation. Good luck!