CS290I Lecture notes -- Threads

  • Rich Wolski
  • Lecture notes: http://www.cs.ucsb.edu/~rich/class/cs290I-grid/Threads/index.html
  • For code examples, use this makefile.
  • Solaris versions of the binaries and the source examples are in /cs/faculty/rich/public_html/class/cs290I-grid/notes/Threads
    In this lecture, we'll discuss threads in general, and Posix threads in particular. A great deal of the discussion and most of the code examples given here owe their origin to Dr. James Plank at the University of Tennessee whose lecture notes on the subject are something to behold. My take on things is somewhat different, but you should visit his teaching page for an alternative view.

    What is a thread?

    It sounds strange to ask this question since the term gets bandied about so much, particularly since languages like Java declared "threads" to be a fundamental programming abstraction. However, despite their popularity, a universal definition remains elusive. My definition is fairly simple and will suffice for this class, but caveat emptor.

    A thread consists of four subcomponents:

      • a program counter
      • local state
      • global (shared) state
      • synchronization primitives

    Since the definition of these subcomponents is not obvious (at least to me) they warrant some elucidation as well.

    A program counter in this context refers to a logical index into some body of executable code that is incremented sequentially after each instruction except instructions that modify it explicitly (e.g. jump and branch instructions). That is, the program counter determines the instructions a thread will execute.

    Variables that can be accessed only by the thread are called local state. Some definitions of "thread" refer to this local state as "a stack" since it is often implemented using stack frames in the compiler sense. Logically, though, stacks don't really enter into the picture since they are really an implementation methodology. What is key, however, is that the variable names must be scoped locally. That is, different threads can each have their own version of a variable, but each of those threads must be able to refer to that variable by the same name.

    Variables that can be accessed by more than one thread are typically called global state but even this definition is somewhat problematic. Technically, global state need not be completely global. It is possible to devise a perfectly good thread model in which variables shared between threads are not shared by all threads (i.e. globally). The protected notion implemented in some object-oriented languages embodies this more complex sharing relationship. A better name for these variables, then, is shared state but because of the scoping rules used by C, I may slip and occasionally refer to this component as global state.

    Threads must be able to control, deterministically, the program counters of other threads. If they cannot, it is not possible for the programmer to use more than one of them to accomplish a single task. The term synchronization primitive is used to denote this capability because when it is exercised, the programmer is entitled to reason that the thread program counters have all reached known locations in the program (i.e. they have all reached a pre-defined "time" in the life of the program).

    Implicit in this definition is the notion that multiple threads are, in some way, cooperating to execute a single "program." This is, in fact, a necessary implication in my opinion as long as the term "program" is defined rather loosely to be "a set of operations desired by the programmer or user." Obviously, we could go on all day about whether there is one programmer or many, etc. We'll stop here, though, and simply point out that if you are using a threaded model, presumably you are doing so because you want to use more than one thread, and the threads you are using should cooperate.

    What are They Good For?

    state encapsulation -- The primary reason to use threads is that they encapsulate and localize program state. The general programming methodology is to decompose the problem into independent tasks that must be synchronized at well-defined points, and then to assign a thread to each task. The state necessary to accomplish each task is held (in a well-written threaded program) in local state, and shared state is used to communicate only the data that is necessary between threads.

    parallelism -- Since the threads are logically independent in between synchronization points, they may be executed in parallel if multiple CPUs are available without violating the semantics of the program. In fact, good sequential thread programming practice is to imagine that each thread is executing on its own processor concurrently for the purposes of synchronization.

    asynchrony -- The state encapsulation capability inherent in the threaded model makes programming for asynchronous events (e.g. messages) much easier. The technique here is to create a thread that is responsible for "handling" a particular event, and then to use the synchronization mechanisms that are present to allow the thread to wait for the event to occur. When the thread wakes up, all of its local state is intact and unchanged by any other thread. The local state, then, carries the information necessary to service the event throughout the life of the thread.

    To see how this all works in concrete terms, think of writing a server that is to handle requests sent to it via messages from potentially independent clients. The simplest way to write this application is to have one thread field all incoming messages and dispatch them, one each, to other threads executing in the program. Each dispatched thread, then, is responsible for handling the message it has been given.

    processes -- All of this may become a little clearer when we get to the concrete discussion of POSIX threads below, but before we do, it is probably best to spend a little time discussing threads and Unix in the abstract. In particular, at first blush, threads and Unix processes seem to have the same purpose. Under Unix, a process is a program counter, an address space, and some synchronization primitives, so what are the differences? Semantically (as opposed to implementationally) the chief difference is that threads are able to share variables, but processes (in their standard definition) cannot. When a Unix process forks, the child gets a complete copy of the parent (logically, only the process ID is different after the fork), but the copies are completely disjoint. A change to a global variable in the child does not affect any of the variables, global or local, in the parent.

    Another useful way to view the relationship between processes and threads is to think of a process as being a collection of resources (CPUs, address space, file descriptors, etc.) and threads being the "active" execution entities within the process. A Unix process then, is really a process with a single thread executing inside it.


    POSIX Threads

    POSIX threads (pthreads) are an implementation of threads that has been defined as a standard by the IEEE. There are many references on the web describing how to use them (like this one), some good, some bad. From our perspective, however, pthreads is the most portable thread implementation available for Unix. You might get an argument out of the Java community over this last statement, but as of today, I stand by it. Just for completeness, it is important to realize that most commercial Unix vendors (Sun, SGI, DEC, HP, etc.) have or have had a customized threads package that works only on their respective operating system. Pthreads, however, is available for all of them.

    To make use of pthreads in your program, you need to have the following include directive:

    #include <pthread.h>
    
    You also have to link libpthread.a with your object files. Some versions of Linux include Posix threads as part of the standard C library, but even in that case you must still add "-lpthread" to your load line:
    UNIX> cc -c main.c
    UNIX> cc -o main main.o -lpthread
    
    You can use gcc too. There's a lot of junk in the pthread library. You can read about it in the various man pages. Start with ``man pthreads'' (Solaris only). The two basic primitives we need, one to create a thread and one to synchronize with it, are the following in Posix threads:
         int pthread_create(pthread_t *new_thread_ID,
                            const pthread_attr_t *attr,
                            void * (*start_func)(void *), 
                            void *arg);
    
         int pthread_join(pthread_t target_thread, 
                          void **status);
    
    The pthread_create() command creates a thread that will begin executing by calling the function start_func and passing it a single void *arg as an argument. The attr argument allows the creating thread to influence some of the characteristics of how the system will treat the created thread (e.g. how it will be scheduled in a pre-emptive system), and the first argument pthread_t *new_thread_ID is an out parameter that returns an identifier that can be used to address the thread in the future. The return code indicates success or failure (0 on success).

    The second command, pthread_join(), is a synchronization primitive. The calling thread specifies the thread ID (returned from pthread_create()) of a thread for which the caller wishes to wait. While the thread specified in the argument exists, the calling thread will block. When the "target" thread (as it is called in the man page -- you did read the man page, right?) terminates, the calling thread is unblocked. The argument status is an out parameter returning the value that comes back when the target thread exits (allowing the target thread to pass back a return value to any thread waiting). Only one thread may be waiting to join with a given target thread at a time. If a thread is already waiting, a second call to pthread_join() by another thread will terminate with an error.

    In all the Posix threads calls, an integer is returned. If it is zero, everything went ok. Otherwise, an error has occurred. As with system calls, it is always good to check the return values of these calls to see if there has been an error. In my code here in the lecture notes, I'll omit error checking, but it is in the files, and you should do it.

    How does a thread exit? By calling return or pthread_exit().

    Ok, so check out the following program (in hw.c):

    
    /*
     * hw.c -- hello world with posix threads
     */
    
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>   /* for exit() */
    
    void *printme(void *arg)
    {
      printf("Hello world\n");
      return NULL;
    }
    
    int main()
    {
      pthread_t tcb;
      void *status;
    
      if (pthread_create(&tcb, NULL, printme, NULL) != 0) {
        perror("pthread_create");
        exit(1);
      }
      if (pthread_join(tcb, &status) != 0) { perror("pthread_join"); exit(1); }
    
    }
    
    Try copying hw.c to your home area, compiling it, and running it. It should print out ``Hello world''.
    For most if not all of the code examples I will present, I will have a makefile in the directory associated with the lecture. You may have to modify this makefile to fit your environment (I do a lot of development under Linux while the department's machines are primarily Solaris) but I'm assuming that you are proficient enough with makefiles so that this will not be a problem.

    Forking multiple threads

    Now, look at print4.c. This forks off 4 threads that print out ``Hi. I'm thread n'', where n is the TCB identifier. Notice that this might be an integer or an address, depending on the implementation, but that it doesn't matter which. The TCB is the unique "name" of the thread within your program. This should give you a good idea of how the pthread library works. Feel free to play with this library to get a feeling for how a thread system works. Since Unix is not multithreaded, and since your machines are not multiprocessors, the threads don't get you any extra performance. They just let you use threads as an approximation of parallelism and as a way of keeping track of program state.

    Here's the output of print4.c when run on Solaris:

    Hi.  I'm thread 1
    I'm 1 Trying to join with thread 4
    Hi.  I'm thread 4
    Hi.  I'm thread 5
    Hi.  I'm thread 6
    Hi.  I'm thread 7
    1 Joined with thread 4
    I'm 1 Trying to join with thread 5
    1 Joined with thread 5
    I'm 1 Trying to join with thread 6
    1 Joined with thread 6
    I'm 1 Trying to join with thread 7
    1 Joined with thread 7
    
    So what happened is the following. The main() program got control after forking off the four threads. It called pthread_join for thread 4 and blocked. Then thread 4 got control, printed its line, and exited. Next came threads 5, 6 and 7. When they finished, the main() thread got control again and since thread 4 was done, its pthread_join() call returned. Then it made the pthread_join() calls for threads 5, 6 and 7, all of which returned since these threads were done. When main() returns, all the threads are done, and the program exits. Two things to note. The main program is implicitly, itself, a thread. Notice that thread 1 was never created -- only threads 4, 5, 6, and 7 (due to the four pthread_create calls). Secondly, notice that the Solaris version just happens to skip 2 and 3 for reasons it is not obligated to tell us. Under Red Hat Linux, the following output is generated from the same program:
    Hi.  I'm thread 1025
    Hi.  I'm thread 2050
    Hi.  I'm thread 1024
    I'm 1024 Trying to join with thread 1025
    1024 Joined with thread 1025
    I'm 1024 Trying to join with thread 2050
    1024 Joined with thread 2050
    I'm 1024 Trying to join with thread 3075
    Hi.  I'm thread 3075
    1024 Joined with thread 3075
    I'm 1024 Trying to join with thread 4100
    Hi.  I'm thread 4100
    1024 Joined with thread 4100
    
    This output also illustrates another point. The thread IDs do not have to be low-valued, monotonically increasing integers. The system is free to use whatever representation it wishes to implement pthread_t, although most systems will be using some form of unsigned integer in practice. Do not count on that fact, however.

    exit() vs pthread_exit()

    In pthreads there are two things you should know about thread/program termination. The first is that pthread_exit() makes a thread exit, but keeps the task alive, while exit() terminates the task. If all threads (and the main() program should be considered a thread) have terminated, then the task terminates. So, look at p4a.c.

    Here, all threads, including the main() program exit with pthread_exit(). You'll see that the output is the same as print4. Notice, however, that the main thread cannot call printme() and get the same output since printme() calls pthread_exit(). p4b.c illustrates what happens when we replace the printf statement at line 37 with a call to printme() which contains a pthread_exit(). The output (for Solaris) is:

    Hi.  I'm thread 1
    Hi.  I'm thread 4
    Hi.  I'm thread 5
    Hi.  I'm thread 6
    Hi.  I'm thread 7
    
    You'll note that none of the "Joining" lines were printed out because the main thread had exited. However, the other threads ran just fine, and the program terminated when all the threads had exited.

    The second thing you need to know is that when a forked thread returns from its initial calling procedure (e.g. printme() in print4.c), then that is the same as calling pthread_exit(). However, if the main() thread returns, then that is the same as calling exit(), and the task dies. That's why there is no output in p4c.c. All of the threads have been created when the main thread exits, but they haven't run yet. When the main thread returns, the task is terminated, and thus the threads do not run.

    Finally, look at p4d.c. Here, the threads call exit() instead of pthread_exit(). You'll note that the output is:

    Hi.  I'm thread 1
    I'm 1 Trying to join with thread 4
    Hi.  I'm thread 4
    
    This is because the task is terminated by thread 4's exit() call.

    A word about Linux: your mileage may vary. If you run these programs under different versions of Linux you may see different output because Linux pthreads decides to run the main thread at a different time than the Solaris version. Since the threads are assumed to execute in parallel, there is no prescribed order that the threads must execute in -- each implementation is free to choose. If you are having trouble, stick with Solaris as it is easiest to understand.


    Preemption versus non-preemption

    Now, take a look at iloop.c. Here, four threads are forked off, and then the main() thread goes into an infinite loop. When you execute it, you see nothing. Threads zero through 3 are never executed. This is because the threads system on our machines is non-preemptive. In other words, there is one CPU, and unless a thread voluntarily gives up the CPU (via a blocking call like pthread_join or by terminating), it will retain the CPU. In a preemptive system, threads may be interrupted and rescheduled at any time, and iloop will actually have threads 0 through 3 print out their id's (although the program will never terminate). There are some machines that have multiple CPU's attached to a single memory. These systems are by nature preemptive, since different threads will actually execute on different CPU's. Such machines will also have threads 0 through 3 print out their id's (although the program will never terminate).

    We'll talk more about pre-emption in the next section, but iloop.c illustrates an important point regarding the implementation of pthreads and what you can assume to be true from system to system. If you run the program under Solaris, you get no output. Here is the output of the program when it is run on my Debian Linux laptop:

    Hi.  I'm thread 1026
    Hi.  I'm thread 2051
    Hi.  I'm thread 3076
    Hi.  I'm thread 4101
    
    (the program does not terminate, of course). Why are they different? We'll see, in the next section, that the reason is that Linux creates threads as pre-emptable by default (notice that in the call to pthread_create() the attr argument is NULL). However, there is another possible explanation. When a call to pthread_create() is made, it is possible that the calling thread yields the processor to allow another, non-pre-emptable thread to run. Once running, the thread cannot be pre-empted, but it is up to the implementation as to whether there is an explicit yield built in. It turns out that under Linux, I can't figure out how to create anything but a pre-emptive thread. That won't be true on all implementations, however, so we learn two things from this experiment. First, we learn that "standard" doesn't actually mean "standard." Secondly, if we wish to write portable thread code, we must write assuming pre-emption and consider non-pre-emption a degenerate case.

    Pre-emption

    It turns out that there are two kinds of threads in Solaris: user-level threads, and system-level threads. User-level threads exist solely in the running process -- they have no operating system support. That means that if a program has many user-level threads, it looks the same to the operating system as a ``normal'' Unix program with just one thread. In Solaris, user-level threads are non-preemptive. In other words, when a thread is running, it will not be interrupted by another user-level thread unless it voluntarily blocks, through a call such as pthread_exit() or pthread_join().

    When one thread stops executing and another starts, we call that a thread context switch. To restate the above then, user level threads only context switch when they voluntarily block. If you think about it, you can implement thread context switching with setjmp()/longjmp(). What this means is that you don't need the operating system in order to do thread context switching. This in turn means that context switching between user-level threads can be very fast, since there are no system calls involved.

    So what is a system-level thread? It is a unit of execution as seen by the operating system. Standard non-threaded Unix programs are each managed by a separate system-level thread. The operating system performs time-slicing by periodically interrupting the system-level thread that is currently running, saving its state, and running a different system-level thread. This is how you can have multiple programs running simultaneously. Such an action is also called context switching.

    When you call pthread_create() under Solaris, you create a new user-level thread that is managed by the same system-level thread as the calling thread. These two threads will run non-preemptively in relation to each other. In fact, whenever a collection of user-level threads is serviced by the same system-level thread, they all run non-preemptively in relation to each other.

    Take a look at preempt1.c. This is a program that forks off two threads, each of which runs an infinite loop. When you run it under Solaris:

    UNIX> preempt1
    thread 0.  i =          0
    thread 0.  i =          1
    thread 0.  i =          2
    thread 0.  i =          3
    thread 0.  i =          4
    thread 0.  i =          5
    ...
    
    You'll see that only thread 0 runs. (If you can't kill this with control-c, go into another window and kill the process with the kill command). The reason that thread 1 never runs is that thread 0 never voluntarily gives up the CPU. This is called starvation.

    Note that under Linux, the default scheduling behavior supports pre-emption. If you run the same program under Red Hat, you get

    thread 0.  i =          0
    thread 1.  i =          0
    thread 0.  i =          1
    thread 1.  i =          1
    thread 1.  i =          2
    thread 0.  i =          2
    thread 0.  i =          3
    thread 1.  i =          3
    thread 0.  i =          4
    thread 1.  i =          4
    thread 1.  i =          5
    thread 0.  i =          5
    ...
    
    Both threads are running. Notice that they don't necessarily alternate.

    Now, you can explicitly bind different user-level threads to different system-level threads. This means that if one user-level thread is running, then at some point the operating system will interrupt it and run another user-level thread. This is because the two user-level threads are bound to different system level threads (which is the default behavior under Linux but not under Solaris).

    One way to bind a user-level thread to a different system level thread is to call pthread_create() in a different way. Look at preempt2.c. You'll see that you give an ``attribute'' to pthread_create() that says ``create this thread with a different system-level thread.'' Now when you run it on a Solaris system, you'll see that the two threads interleave -- every now and then, the running thread is preempted, and the other thread gets to run:

    UNIX> preempt2
    thread 0.  i =          0
    thread 1.  i =          0
    thread 0.  i =          1
    thread 1.  i =          1
    thread 1.  i =          2
    thread 0.  i =          2
    thread 0.  i =          3
    thread 1.  i =          3
    thread 0.  i =          4
    thread 1.  i =          4
    thread 0.  i =          5
    thread 1.  i =          5
    
    Note that this is the same output as preempt1.c on a Linux system.

    Now, here's the tricky part. If a thread makes a blocking system call, then if there are other user-level threads bound to the same system-level thread, a new system-level thread is created and the blocking thread is bound to it. What this does is let the other user-level threads run while the thread is blocked. This state of affairs is true for both the Solaris and Linux implementations.

    So, look at preempt3.c. First, you should see that the threads are created as user-level threads bound to the same system-level thread. Next, you'll see that thread 0 first reads a character from standard input before beginning its loop. This is a blocking system call. Therefore, it results in this thread being bound to a separate system thread from the main thread and thread 1. Therefore, while it blocks, thread 1 can run. Go ahead and run it:

    UNIX> preempt3
    Thread 0: stopping to read
    thread 1.  i =          0
    thread 1.  i =          1
    thread 1.  i =          2
    thread 1.  i =          3
    ..
    
    So, thread 0 is blocked, and thread 1 is running. They are thus bound to separate system threads. Now, type RETURN, and thread 0 will start up again, and you'll see that they interleave as in preempt2:
    ...
    thread 1.  i =          3
                                    ( RETURN was typed here )
    Thread 0: Starting up again
    thread 0.  i =          0
    thread 1.  i =          4
    thread 0.  i =          1
    thread 1.  i =          5
    thread 0.  i =          2
    thread 1.  i =          6
    thread 0.  i =          3
    ...
    
    That's user/system level threads and preemption in a nutshell. Go over these examples again if you are confused. If you are not getting what you expect from Linux, revert to Solaris as the Sun implementation uses more standard default scheduling settings.

    Race conditions and mutexes

    Look at race1.c. Its usage is
    race1 nthreads stringsize iterations
    
    This is a pretty simple program. The command line arguments call for the user to specify the number of threads, a string size and a number of iterations. Then the program does the following. It allocates an array of stringsize+1 characters (the +1 accounts for the null terminator). Then it forks off nthreads threads, passing each thread its id, the number of iterations, and the character array. Each thread is a user-level thread, so threads are non-preemptive. Now each thread loops for the specified number of iterations. At each iteration, it fills in the character array with one character -- thread 0 uses 'A', thread 1 uses 'B' and so on. At the end of an iteration, the thread prints out the character array. So, if we call it with the arguments 4, 4, 1, we'd expect the following output, and indeed that is what we get:
    UNIX> race1 4 4 1
    Thread 0: AAAA
    Thread 1: BBBB
    Thread 2: CCCC
    Thread 3: DDDD
    
    Similarly, the following make sense:
    UNIX> race1 4 4 2
    Thread 0: AAAA
    Thread 0: AAAA
    Thread 1: BBBB
    Thread 1: BBBB
    Thread 2: CCCC
    Thread 2: CCCC
    Thread 3: DDDD
    Thread 3: DDDD
    UNIX> race1 4 30 2
    Thread 0: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    Thread 0: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    Thread 1: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
    Thread 1: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
    Thread 2: CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
    Thread 2: CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
    Thread 3: DDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
    Thread 3: DDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
    
    Now, look at race2.c. The only difference here is that the threads are all bound to different system-level threads. This means that they may be preempted.

    Look at the output of the same calls to race2:

    UNIX> race2 4 4 1
    Thread 0: AAAA
    Thread 1: BBBB
    Thread 2: CCCC
    Thread 3: DDDD
    
    This looks the same as before, but what's wrong with this picture?
    UNIX> race2 4 4 2
    Thread 0: AAAA
    Thread 0: AAAA
    Thread 1: BBBB
    Thread 1: BBBB
    Thread 2: CCCC
    Thread 3: DDDD
    Thread 3: DDDD
    Thread 2: DDDD
    
    Or this one?
    UNIX> race2 2 40 1
    Thread 0: BBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAAAAAA
    Thread 1: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
    UNIX> 
    

    What is happening is that threads can be preempted anywhere. In particular, they may be preempted while they are filling in the string buffer (which is shared among all threads -- see how 's' is passed to each thread in race2.c) which means that another thread can modify s, and then when the original thread actually calls the printf() statement, the values of s are not what the thread thought they were.

    These kinds of bugs or race conditions are extremely difficult to debug. Consider the output from

    UNIX> race2 4 40 2
    Thread 2: CCCCCCCCCCCCCCBBBBAAAAAAAAAAACCCCCCCCCCC
    Thread 0: CCCCCCCCCCCCCCBBBBAAAAAAAABBBBBBBAAAAAAA
    Thread 1: AAACCCCCCCCCCCCCCAAAAAAAAABBBBBBBAAAAAAB
    Thread 1: DDDDDDDDDDDDDBBBBBBBBBBBBBBBBBBBBBBBBBBB
    Thread 3: DDDDDDDDDDDDDBBDDDDDDDAAAAAAAAAAADDDDDDD
    Thread 2: DDDDDDDDDDDDDBBDDDDDDDAAAAAAAAACCCCCCCCC
    Thread 0: DDDDDDDDDDDDDDDDDDDDDDDDDDDAAAACCAAAAAAA
    Thread 3: DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
    UNIX>
    
    In the code, the array s is always filled from left to right. Assume that printf() is atomic (cannot be interrupted) and that the lines come out in the order shown. How is this pattern possible, given that the last write is what is shown? The line:
    Thread 2: CCCCCCCCCCCCCCBBBBAAAAAAAAAAACCCCCCCCCCC
    
    can be explained as follows. "A" gets written first out to the last "B" and then thread 0 stops. Then "B" is written out to the first "B" by thread 1. Thread 2 now runs and writes Cs all the way through s. Thread 0 wakes, writes some As, and gets pre-empted by thread 1, which writes some Bs, and then thread 2 prints s. This all makes sense until
    Thread 1: AAACCCCCCCCCCCCCCAAAAAAAAABBBBBBBAAAAAAB
    
    which is printed after the previous line. How did the As wind up at the beginning?

    I'm not really sure of the answer, but I suspect it has to do with printf() and whether/when it copies its arguments completely onto the stack. We won't go into this any farther. Let it suffice to say that even for a simple program, figuring out the behavior of race conditions can be somewhat difficult.

    In our race program, we can fix the race condition by enforcing that no thread can be interrupted by another thread when it is modifying and printing s. This can be done with a mutex, sometimes called a ``lock'' or sometimes a ``binary semaphore.'' There are three procedures for dealing with mutexes in pthreads:

    int pthread_mutex_init(pthread_mutex_t *mutex, const pthread_mutexattr_t *attr);  /* attr may be NULL */
    int pthread_mutex_lock(pthread_mutex_t *mutex);
    int pthread_mutex_unlock(pthread_mutex_t *mutex);
    
    You create a mutex with pthread_mutex_init(). Then any thread may lock or unlock the mutex. When a thread locks the mutex, no other thread may lock it. If other threads call pthread_mutex_lock() while the mutex is locked, then they will block until the mutex is unlocked. Only one thread may hold the lock on the mutex at a time.

    So, we fix the race program with race3.c. You'll notice that a thread locks the mutex just before modifying s and it unlocks the mutex just after printing s. This fixes the program so that the output makes sense:

    UNIX> race3 4 4 1
    Thread 0: AAA
    Thread 1: BBB
    Thread 2: CCC
    Thread 3: DDD
    UNIX> race3 4 4 2
    Thread 0: AAA
    Thread 0: AAA
    Thread 2: CCC
    Thread 2: CCC
    Thread 1: BBB
    Thread 1: BBB
    Thread 3: DDD
    Thread 3: DDD
    UNIX> race3 4 70 1
    Thread 0: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    Thread 1: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
    Thread 2: CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
    Thread 3: DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
    UNIX> race3 10 70 100 > output3
    

    Condition Variables

    Locks, implemented as "mutexes" in pthreads, ensure regions of mutual exclusion, but it is frequently true that more complicated forms of synchronization are desirable. For example, consider how you might modify race3.c so that the threads must execute round robin, starting with thread 0. You can do this with mutexes, but it is cumbersome, as is evidenced by rr_mutex.c. Here is the code:
    
    /*
     * this code is a modification of race3.c to use mutexes to ensure
     * round robin scheduling
     */
    
    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>
    
    int MyTurn = 0;
    
    typedef struct {
      pthread_mutex_t *lock;
      int id; 
      int size;
      int iterations;
      char *s;
      int nthreads;
    } Thread_struct;
    
    void *infloop(void *x)
    {
      int i, j, k;
      Thread_struct *t;
     
      t = (Thread_struct *) x;
    
      for (i = 0; i < t->iterations; i++) {
        /*
         * don't try this at home
         */
    
        pthread_mutex_lock(t->lock);
        while((MyTurn % t->nthreads) != t->id)
        {
    	    pthread_mutex_unlock(t->lock);	/* give it up */
    	    pthread_mutex_lock(t->lock);	/* get it again */
        }
    
        for (j = 0; j < t->size-1; j++) {
          t->s[j] = 'A'+t->id;
          for(k=0; k < 50000; k++);		/* delay loop */
        }
        t->s[j] = '\0';
        printf("Thread %d: %s\n", t->id, t->s);
        MyTurn++;
        pthread_mutex_unlock(t->lock);
      }
      return NULL;
    }
    
    int main(int argc, char **argv)
    {
      pthread_mutex_t lock;
      pthread_t *tid;
      pthread_attr_t *attr;
      Thread_struct *t;
      void *retval;
      int nthreads, size, iterations, i;
      char *s;
    
      if (argc != 4) {
        fprintf(stderr, "usage: race nthreads stringsize iterations\n");
        exit(1);
      }
    
      pthread_mutex_init(&lock, NULL);
      nthreads = atoi(argv[1]);
      size = atoi(argv[2]);
      iterations = atoi(argv[3]);
    
      tid = (pthread_t *) malloc(sizeof(pthread_t) * nthreads);
      attr = (pthread_attr_t *) malloc(sizeof(pthread_attr_t) * nthreads);
      t = (Thread_struct *) malloc(sizeof(Thread_struct) * nthreads);
      s = (char *) malloc(sizeof(char) * size);
    
      for (i = 0; i < nthreads; i++) {
        t[i].nthreads = nthreads;
        t[i].id = i;
        t[i].size = size;
        t[i].iterations = iterations;
        t[i].s = s;
        t[i].lock = &lock;
        pthread_attr_init(attr+i);
        pthread_attr_setscope(attr+i, PTHREAD_SCOPE_SYSTEM);
        pthread_create(tid+i, attr+i, infloop, t+i);
      }
      for (i = 0; i < nthreads; i++) {
        pthread_join(tid[i], &retval);
      }
    }
    
    Note that both the increment and test of MyTurn take place while one thread is in the mutual exclusion region, so there is no race condition for the counter. Why would there be otherwise?

    This approach has two drawbacks:

    relies on fairness -- The while loop in infloop() directs each thread to wait its turn, assuming that the implementation schedules threads fairly. If it does not, the program can stop making progress: a thread that constantly grabs and releases the lock may starve out the thread whose turn it is.

    efficiency -- Even if the implementation is fair, the threads that are waiting to fill the s buffer "burn" their timeslice by executing nothing but the lock-test-unlock sequence. A more efficient synchronization primitive would allow a thread to block without consuming time slices until it is released by another thread.

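    Pthreads solves the fairness and efficiency problems using a synchronization abstraction known as a condition variable. A condition variable allows a thread to block, giving up a lock it holds while it sleeps, until another thread signals that it may continue.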

    Here are the relevant calls:
    int  pthread_cond_init(pthread_cond_t  *cond, pthread_condattr_t *cond_attr);
    int  pthread_cond_wait(pthread_cond_t  *cond, pthread_mutex_t *mutex);
    int  pthread_cond_signal(pthread_cond_t *cond);
    
    The first call initializes a condition variable (I will let you read about the attribute argument). The second takes the condition variable and a mutex as arguments. It is expected that the caller will have successfully acquired the specified mutex. When pthread_cond_wait() is called, the calling thread is put to sleep and the mutex specified as the second argument is released. When a different thread calls pthread_cond_signal(), one of the threads waiting on the condition variable is selected and reawakened. It then re-acquires the mutex, so when it returns from pthread_cond_wait() it once again holds the lock. A related call, pthread_cond_broadcast(), takes the same argument as pthread_cond_signal() but wakes every thread waiting on the condition variable.
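
    Here is a minimal sketch of the wait/signal pattern just described, separate from the round-robin example. The names ready, waiter, and wait_for_signal are hypothetical. One thread sleeps until a flag is set; the other sets the flag and signals:

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical example: a flag guarded by a mutex and a condition variable. */
static pthread_mutex_t ready_lock;
static pthread_cond_t ready_cond;
static int ready;      /* the condition being waited for */
static int observed;   /* what the waiter saw after it woke up */

static void *waiter(void *x)
{
  (void) x;
  pthread_mutex_lock(&ready_lock);
  while (!ready)                                  /* always re-test after waking */
    pthread_cond_wait(&ready_cond, &ready_lock);  /* releases the lock while asleep */
  observed = ready;                               /* the lock is held again here */
  pthread_mutex_unlock(&ready_lock);
  return NULL;
}

/* Start a waiter, set the flag, signal it; return the value the waiter observed. */
int wait_for_signal(void)
{
  pthread_t tid;

  ready = 0;
  observed = 0;
  pthread_mutex_init(&ready_lock, NULL);
  pthread_cond_init(&ready_cond, NULL);

  if (pthread_create(&tid, NULL, waiter, NULL) != 0)
    abort();

  pthread_mutex_lock(&ready_lock);
  ready = 1;                          /* change the condition under the lock */
  pthread_cond_signal(&ready_cond);   /* wake one waiter, if there is one */
  pthread_mutex_unlock(&ready_lock);

  pthread_join(tid, NULL);
  pthread_cond_destroy(&ready_cond);
  pthread_mutex_destroy(&ready_lock);
  return observed;
}
```

    The while loop matters. If the signal is sent before the waiter ever reaches pthread_cond_wait(), the waiter finds ready already set and does not sleep at all. If it is woken for any other reason, it tests the flag again before going on.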

    The utility of these semantics can be a little obscure until you've used them a bit. Consider the code in rr_condvar.c:

    
    /*
     * CS170: rr_condvar.c
     * this code is a modification of race3.c to use condition variables to ensure
     * round robin scheduling
     */
    
    #include <unistd.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <pthread.h>
    
    int MyTurn = 0;
    
    typedef struct 
    {
    	pthread_mutex_t *lock;
    	pthread_cond_t *wait;
    	int id; 
    	int size;
    	int iterations;
    	char *s;
    	int nthreads;
    } Thread_struct;
    
    void *infloop(void *x)
    {
    	int i, j, k;
    	Thread_struct *t;
     
    	t = (Thread_struct *) x;
    
    	for (i = 0; i < t->iterations; i++) 
    	{
    		/*
    		 * do try this at home
    		 */
    
    		pthread_mutex_lock(t->lock);
    		while((MyTurn % t->nthreads) != t->id)
    		{
    			pthread_cond_wait(t->wait,t->lock);
    		}
    
    		for (j = 0; j < t->size-1; j++) 
    		{
    			t->s[j] = 'A'+t->id;
    			for(k=0; k < 50000; k++);	/* delay loop */
    		}
    		t->s[j] = '\0';
    		printf("Thread %d: %s\n", t->id, t->s);
    		MyTurn++;
    
    		pthread_cond_broadcast(t->wait);
    		pthread_mutex_unlock(t->lock);
    
    	}
    
    	return(NULL);
    }
    
    int
    main(int argc, char **argv)
    {
    	pthread_mutex_t lock;
    	pthread_cond_t wait;
    	pthread_t *tid;
    	pthread_attr_t *attr;
    	Thread_struct *t;
    	void *retval;
    	int nthreads, size, iterations, i;
    	char *s;
    
    	if (argc != 4) 
    	{
    		fprintf(stderr, "usage: race nthreads stringsize iterations\n");
    		exit(1);
    	}
    
    	pthread_mutex_init(&lock, NULL);
    	pthread_cond_init(&wait, NULL);
    	nthreads = atoi(argv[1]);
    	size = atoi(argv[2]);
    	iterations = atoi(argv[3]);
    
    	tid = (pthread_t *) malloc(sizeof(pthread_t) * nthreads);
    	attr = (pthread_attr_t *) malloc(sizeof(pthread_attr_t) * nthreads);
    	t = (Thread_struct *) malloc(sizeof(Thread_struct) * nthreads);
    	s = (char *) malloc(sizeof(char) * size);
    
    	for (i = 0; i < nthreads; i++) 
    	{
    		t[i].nthreads = nthreads;
    		t[i].id = i;
    		t[i].size = size;
    		t[i].iterations = iterations;
    		t[i].s = s;
    		t[i].lock = &lock;
    		t[i].wait = &wait;
    		pthread_attr_init(&(attr[i]));
    		pthread_attr_setscope(&(attr[i]), PTHREAD_SCOPE_SYSTEM);
    		pthread_create(&(tid[i]), &(attr[i]), infloop, (void *)&(t[i]));
      	}
    	for (i = 0; i < nthreads; i++) 
    	{
    		pthread_join(tid[i], &retval);
    	}
    
    	return(0);
    }
    
    
    
    Here, every thread blocks on the one shared condition variable until another thread finishes its turn and calls pthread_cond_broadcast(). Each awakened thread re-tests MyTurn, and all but the thread whose turn it is go back to sleep. Notice that each thread calls pthread_cond_wait() while holding the lock. If pthread_cond_wait() did not release the lock, the code would deadlock as soon as a thread entered the critical section out of order.
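
    The listing for rr_condvar.c passes the same condition variable to every thread and wakes all of them with pthread_cond_broadcast(). One possible variant, sketched below and not part of the class code, gives each thread its own condition variable so that a thread finishing its turn can wake only its successor with pthread_cond_signal(). All of the names here (rr_shared, rr_worker, run_round_robin) are hypothetical, and recording the thread id in an array stands in for filling and printing s:

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not part of rr_condvar.c. */
typedef struct {
  pthread_mutex_t lock;
  pthread_cond_t *turn;   /* turn[i] is waited on only by thread i */
  int my_turn;            /* plays the role of MyTurn */
  int nthreads;
  int iterations;
  int *order;             /* order[k] = id of the thread that took turn k */
} rr_shared;

typedef struct {
  rr_shared *sh;
  int id;
} rr_arg;

static void *rr_worker(void *x)
{
  rr_arg *a = (rr_arg *) x;
  rr_shared *sh = a->sh;
  int i;

  for (i = 0; i < sh->iterations; i++) {
    pthread_mutex_lock(&sh->lock);
    while ((sh->my_turn % sh->nthreads) != a->id)
      pthread_cond_wait(&sh->turn[a->id], &sh->lock);

    sh->order[sh->my_turn] = a->id;   /* stand-in for filling and printing s */
    sh->my_turn++;

    /* wake only the thread whose turn comes next */
    pthread_cond_signal(&sh->turn[sh->my_turn % sh->nthreads]);
    pthread_mutex_unlock(&sh->lock);
  }
  return NULL;
}

/*
 * order must have room for nthreads * iterations ints.
 * Returns the number of turns taken, or -1 if setup fails.
 */
int run_round_robin(int nthreads, int iterations, int *order)
{
  rr_shared sh;
  pthread_t *tid;
  rr_arg *args;
  int i, turns;

  if (nthreads <= 0)
    return -1;
  tid = malloc(sizeof(pthread_t) * nthreads);
  args = malloc(sizeof(rr_arg) * nthreads);
  sh.turn = malloc(sizeof(pthread_cond_t) * nthreads);
  if (tid == NULL || args == NULL || sh.turn == NULL) {
    free(tid); free(args); free(sh.turn);
    return -1;
  }

  pthread_mutex_init(&sh.lock, NULL);
  sh.my_turn = 0;
  sh.nthreads = nthreads;
  sh.iterations = iterations;
  sh.order = order;
  for (i = 0; i < nthreads; i++)
    pthread_cond_init(&sh.turn[i], NULL);

  for (i = 0; i < nthreads; i++) {
    args[i].sh = &sh;
    args[i].id = i;
    if (pthread_create(&tid[i], NULL, rr_worker, &args[i]) != 0)
      abort();
  }
  for (i = 0; i < nthreads; i++)
    pthread_join(tid[i], NULL);

  turns = sh.my_turn;
  for (i = 0; i < nthreads; i++)
    pthread_cond_destroy(&sh.turn[i]);
  pthread_mutex_destroy(&sh.lock);
  free(tid); free(args); free(sh.turn);
  return turns;
}
```

    The tradeoff is one condition variable per thread in exchange for fewer wakeups: with broadcast, every sleeping thread wakes, grabs the lock, tests MyTurn, and all but one go back to sleep. I have not measured whether this makes a noticeable difference for this program.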

    how much more efficient? -- Here is an example of the performance difference: rr_mutex 40 30 3 ran in 53.3 seconds on ella.cs.ucsb.edu, while rr_condvar 40 30 3 took only 5.1 seconds. I'm not exactly sure why the difference is that big, but there you have it. Condition variables are your friend.