CS170: Project 2 - User Mode Thread Library (20% of project score)

Project Goals

The goals of this project are:

to understand the idea of threads
to implement independent, parallel execution within a process

Administrative Information

The project is an individual project. It is due on uesday, April 30, 2019, 23:59:59 PST (no deadline extensions or late turn ins).

Implement a basic thread subsystem in Linux user mode

The goal of this project is to implement a basic thread system for Linux. As discussed in class, threads are independent units of execution that run (virtually) in parallel in the address space of a single process (and thus, share the same heap memory, open files, process identifier, ...). Each thread has its own context, which consists of (a) the set of CPU registers and (b) a stack. The goal of a thread subsystem is to provide applications that want to use threads a set of library functions (an interface) that the application can use to create and start new threads, terminate threads, or manipulate threads in different ways.

The most well-known and wide-spread standard that defines the interface for threads on Unix-style operating systems is called POSIX threads (or pthreads). The pthreads interface defines a set of functions, a few of which we want to implement for this project. Of course, there are different ways in which the pthreads interface can be realized, and systems have implemented a pthreads subsystem both in the OS kernel and in user mode. For this project, we aim to implement a few pthreads functions in user mode (as a library) on Linux.

More specifically, for this project, we want to implement the following POSIX thread functions (prototypes and explanations partially taken from the respective man pages):

      int pthread_create(pthread_t *restrict thread, const pthread_attr_t *restrict attr, void *(*start_routine)(void*), void *restrict arg);

The pthread_create() function shall create a new thread within a process. Upon successful completion, pthread_create() shall store the ID of the created thread in the location referenced by thread. In our implementation, the second argument (attr) shall always be NULL. The thread is created executing start_routine with arg as its sole argument. If the start_routine returns, the effect shall be as if there was an implicit call to pthread_exit() using the return value of start_routine as the exit status. Note that the thread in which main() was originally invoked differs from this. When it returns from main(), the effect shall be as if there was an implicit call to exit() using the return value of main() as the exit status.

      void pthread_exit(void *value_ptr);

The pthread_exit() function shall terminate the calling thread. In our current implementation, we ignore the value passed in as the first argument (value_ptr) and clean up all information related to the terminating thread. The process shall exit with an exit status of 0 after the last thread has been terminated. The behavior shall be as if the implementation called exit() with a zero argument at thread termination time.

      pthread_t pthread_self(void);

The pthread_self() function shall return the thread ID of the calling thread.

The goal of this project is to implement the three functions introduced above. For more details about error handling, please refer to the respective man pages. The code for these three functions should be compiled into an object file called threads.o, which will act as our thread library. Note that your code will not contain a main routine, and hence, cannot run as a stand-alone executable. To create the thread library, simply call the compiler like this:

      gcc/g++ -c -o threads.o [list of your source files]

You should include the pthreads header file (#include <pthread.h>) in your source(s) so that you have the definitions of types such as pthread_t or pthread_attr_t available. The object file that was created can then be combined with a (third party) application that requires threads (and hence, calls the three functions that you have implemented in your library). This is done by running:

      gcc/g++ -o application [list of application object files] threads.o

To test your thread implementation, we will compile your thread library with our test application. This test application will call into your pthread functions and check whether threads are properly started and terminated.

Implementation

Implementing a user space thread library seems like a daunting task at first. This section contains a number of hints and directions on how you could approach the problem.

First, you will need a data structure that can store information about a single thread. This data structure will likely need to hold, at least, information about the state of the thread (its set of registers), information about its stack (e.g., a pointer to the thread's stack area), and information about the status of the thread (whether it is running, ready to run, or has exited). This data structure is often called a thread control block (TCB). Since there will be multiple threads active at the same time, the thread control blocks should be stored in a list or a table (array). To make it easier, you can assume that a program will not create more than 128 threads.

You will likely need a routine that initializes your thread subsystem. This init routine should be called when the application calls pthread_create for the first time. Before that, there is only one thread running (the main program), and there is not much you need to do.

Once multiple threads are running, you will need a way of switching between different threads. To do this, you could use the library functions setjmp and longjmp. In a nutshell, setjmp saves the current state of a thread into a jmp_buf structure. longjmp uses this structure to restore (jump back) to a previously saved state. You will find these functions very useful, since they do exactly what is needed when switching from one thread to another and later resuming this thread's execution. More precisely, you can have the currently executing thread call setjmp and save the result in the thread's TCB. Then, your thread subsystem can pick another thread, and use the saved jmp_buf of that thread together with longjmp to resume the execution of the new thread.

The process of picking another thread is called scheduling. In our system, we want to have a very simply scheduler. Just cycle through the available threads in a round robin fashion, giving an equal, fair share to each thread.

Following the discussion about setjmp and longjmp, there is one question that should immediately arise: How can we force the thread of a program to call setjmp every once in a while (and thus, giving up control of the CPU)? In particular, application developers do not need to know about your thread implementation, so it is unlikely that they will periodically call setjmp and allow your scheduler to switch to a new process. To solve this issue, we can make use of signals and alarms. More precisely, we can use the ualarm or the setitimer function to set up a periodic timer that sends a SIGALRM signal every X milliseconds (for this project, we choose X to be 50ms). Whenever the alarm goes of, the operating system will invoke the signal handler for SIGALRM. So, you can install your own, custom signal handler that performs the scheduling (switching between threads) for you. For installing this signal handler, you should use sigaction with the SA_NODEFER flag (read the man page for sigaction for details). Otherwise (e.g., when using the deprecated signal function, alarms are automatically blocked while you are running the signal handler. This is something that we clearly do not want for this project.

Note that we require that your thread system supports thread preemption and switches between multiple threads that are ready. It is not okay to run each individual thread to completion before giving the next one a chance to execute.

A final problem that needs to be addressed is how to create a new thread. We have previously discussed means to switch between two threads once they have been created. However, there must also be a mechanism to create a new thread "from scratch." To this end, the system has to properly initialize the TCB for the new thread. This means that a new thread ID needs to be created. In addition, the system has to allocate a new stack. This can be done easily using malloc. For this project, we want to allocate for each thread a stack of 32,767 bytes. The last step is to initialize the thread's state so that it "resumes" execution from the start function that is given as argument to the pthread_create function. For this, we could use setjmp to save the state of the current thread in a jmp_buf, and then, modify this jmp_buf in two important ways. First, we want to change the program counter (the EIP) to point to the start function. Second, we want the stack pointer (the ESP) to point to the top of our newly allocated stack.

To modify the jmp_buf directly, we have to first understand that it is a very operating system and processor family-specific data structure that is typically not modified directly. On the CSIL machines, we can see the definition of the jmp_buf here in this header file: /usr/include/bits/setjmp.h. Given that we work on a 64-bit machine on CSIL, the jmp_buf is defined as an array of 8 long (64-bit) integers. Moreover, libc (x86_64/jmpbuf-offsets.h) defines the following constants as the eight integer elements of this structure:

  #define JB_RBX   0
  #define JB_RBP   1
  #define JB_R12   2
  #define JB_R13   3
  #define JB_R14   4
  #define JB_R15   5
  #define JB_RSP   6
  #define JB_PC    7

We can see that the stack pointer (RSP) has index 6 and the program counter (PC) has index 7 into the jmp_buf. This allows us to easily write the new values for ESP and EIP into a jmp_buf. Unfortunately, there is a small complication on the Linux systems in the lab. These machines are equipped with a libc that includes a security feature to protect the addresses stored in jump buffers. This security feature "mangles" (i.e., encrypts) a pointer before saving it in a jmp_buf. Thus, we also have to mangle our new stack pointer and program counter before we can write it into a jump buffer, otherwise decryption (and subsequent uses) will fail. To mangle a pointer before writing it into the jump buffer, make use of the following function:

   static long int i64_ptr_mangle(long int p)
   {
        long int ret;
        asm(" mov %1, %%rax;\n"
            " xor %%fs:0x30, %%rax;"
            " rol $0x11, %%rax;"
            " mov %%rax, %0;"
        : "=r"(ret)
        : "r"(p)
        : "%rax"
        );
        return ret;
   }

We are almost there. Now, we just have to remember that the start routine of every new thread expects a single argument (a void pointer called arg). Also, when the start routine returns, it should perform an implicit jump to pthread_exit. We first have to understand how arguments are passed between functions. When you think back to your compiler class, you might remember that arguments were passed on the stack. Indeed, this was the way it was done on "old" 32-bit x86 machines. However, we now have 64-bit machines, and these machines have more registers. To improve the performance of function calls, compiler writers decided to leverage these additional registers. More specifically, the function calling convention was changed, and the first six arguments are passed in registers. Only the 7th argument and onwards are passed on the stack. This page provides some more details about this. Now, the question is how we can pass the required argument to the start function of the thread? One possible solution is to introduce a wrapper function. This wrapper function could grab the argument from the TCB and then invoke the actual start function, passing it the required argument. In addition, the wrapper function also provides a convenient way to handle the case when the start function returns and you have to call pthread_exit. For this, you can simply invoke pthread_exit after the thread returns from the previous call to the start function. When you use a wrapper function, you need to make the program counter (EIP) point to this function instead of the actual start function. And you need to find a way to make the address of the start function available to the wrapper.

Once your stack is initialized and the (mangled) stack pointer (ESP) and program counter (EIP) are written to the jmp_buf, your new thread is all ready to go, and when the next SIGALRM arrives, it is ready to be scheduled!

This is a complicated project. Take it one step at a time, and make sure that everything works before moving on. Also, the debugger (gdb) is your friend. Expect that the code will crash. At this point, use the debugger to understand what is going on. For this project, you might want to look at the gdb commands disassemble and stepi. For example, your code might crash when you invoke longjmp for the first time to switch to a new thread, and you don't understand why. To debug this problem, you could set a breakpoint in __longjmp (say yes when the debugger asks whether you want to "make breakpoint pending on future shared library load"). Since __longjmp is a libc library function, you don't have the source directly available to the debugger. However, you can issue the disassemble command to show the assembly code for this (short) function. And you can use stepi to step forward for a single machine (assembly) instruction. This allows you to check if you restore the correct register values and also see the address where you end up jumping to. You can use the commands info registers to list the content of all the CPU registers, and you can use x [address] to print out the value that is stored in the memory at [address]. This is nice when you want to see where your stack pointer points to.

Deliverables

Please follow the instructions below exactly!

We use gradescope to manage your project submissions and to communicate the results back to you. You will submit all files that are part of your project via the gradescope web interface.
All your files must be in a directory named threads. The name of the threads library that we will test must be threads.o, and the POSIX function implementation must be done in C/C++. Of course, you cannot leverage any of the existing pthread library code to implement your thread library.
All files that you need to build your library must be included (sources, headers, makefile) in that folder. We will just call make and expect that the object file threads.o is built from your sources. Please do not include any object or executable files.
Gradescope does support built-in autograding, but, currently, we do not intend to use it. Instead, we will test your projects in our own environment. So, do not worry if you don't get immediate feedback or if the system tells you that the autograder is not running.
Your project must compile on a CSIL machine. If you worked on a Windows machine or your laptop at home, then make sure it still works on CSIL or modify it appropriately!
Include a README with this project. Explain what you did in the README. If you had problems, tell us why and what.

Created by Christopher Kruegel (© 2008, using Apache Cocoon).