A thread consists of four subcomponents:
A program counter in this context refers to a logical index into some body of executable code that is incremented sequentially after each instruction except instructions that modify it explicitly (e.g. jump and branch instructions). That is, the program counter determines the instructions a thread will execute.
Variables that can be accessed only by the thread are called local state. Some definitions of "thread" refer to this local state as "a stack" since it is often implemented using stack frames in the compiler sense. Logically, though, stacks don't really enter into the picture since they are really an implementation methodology. What is key, however, is that the variable names must be scoped locally. That is, different threads can each have their own version of a variable, but each of those threads must be able to refer to that variable by the same name.
Variables that can be accessed by more than one thread are typically called global state but even this definition is somewhat problematic. Technically, global state need not be completely global. It is possible to devise a perfectly good thread model in which variables shared between threads are not shared by all threads (i.e. globally). The protected notion implemented in some object-oriented languages embodies this more complex sharing relationship. A better name for these variables, then, is shared state but because of the scoping rules used by C, I may slip and occasionally refer to this component as global state.
Threads must be able to control, deterministically, the program counters of other threads. If they cannot, it is not possible for the programmer to use more than one of them to accomplish a single task. The term synchronization primitive is used to denote this capability because when it is exercised, the programmer is entitled to reason that the thread program counters have all reached known locations in the program (i.e. they have all reached a pre-defined "time" in the life of the program).
Implicit in this definition is the notion that multiple threads are, in some way, cooperating to execute a single "program." This is, in fact, a necessary implication in my opinion as long as the term "program" is defined rather loosely to be "a set of operations desired by the programmer or user." Obviously, we could go on all day about whether there is one programmer or many, etc. We'll stop here, though, and simply point out that if you are using a threaded model, presumably you are doing so because you want to use more than one thread, and the threads you are using should cooperate.
state encapsulation -- The primary reason to use threads is that they encapsulate and localize program state. The general programming methodology is to decompose the problem into independent tasks that must be synchronized at well-defined points, and then to assign a thread to each task. The state necessary to accomplish each task is held (in a well-written threaded program) in local state, and shared state is used to communicate only the data that is necessary between threads.
parallelism -- Since the threads are logically independent in between synchronization points, they may be executed in parallel if multiple CPUs are available without violating the semantics of the program. In fact, good sequential thread programming practice is to imagine that each thread is executing on its own processor concurrently for the purposes of synchronization.
asynchrony -- The state encapsulation capability inherent in the threaded model makes programming for asynchronous events (e.g. messages) much easier. The technique here is to create a thread that is responsible for "handling" a particular event, and then to use the synchronization mechanisms that are present to allow the thread to wait for the event to occur. When the thread wakes up, all of its local state is intact and unchanged by any other thread. The local state, then, carries the information necessary to service the event throughout the life of the thread.
To see how this all works in concrete terms, think of writing a server that is to handle requests sent to it via messages from potentially independent clients. The simplest way to write this application is to have one thread field all incoming messages and dispatch them, one each, to other threads executing in the program. Each dispatched thread, then, is responsible for handling the message it has been given.
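Here is a minimal sketch of that dispatch structure, using the pthread calls introduced later in these notes. The "messages" are just strings in an array, and the names Request, handle() and dispatch_all() are made up for illustration; a real server would read its requests from a network connection.

```c
#include <pthread.h>
#include <string.h>

#define NMSG 3

typedef struct {
    const char *text;   /* the "message" handed to one handler thread */
    size_t length;      /* result, filled in by that handler */
} Request;

/* Handler thread: everything it needs arrives through its argument. */
static void *handle(void *arg)
{
    Request *r = arg;               /* local state: this thread's request */
    r->length = strlen(r->text);
    return NULL;
}

/* Dispatcher: hands one message to each handler thread, waits for all of
 * them, and returns the total number of bytes handled (-1 on failure). */
long dispatch_all(void)
{
    Request req[NMSG] = { {"GET a", 0}, {"PUT bb", 0}, {"DEL ccc", 0} };
    pthread_t tid[NMSG];
    long total = 0;
    int started = 0;
    int i;

    for (i = 0; i < NMSG; i++) {
        if (pthread_create(&tid[i], NULL, handle, &req[i]) != 0)
            break;
        started++;
    }
    /* Join only the threads that were actually created. */
    for (i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        total += (long)req[i].length;
    }
    return (started == NMSG) ? total : -1;
}
```

Each handler touches only the request it was given, so the handlers need no synchronization among themselves; the joins are the only synchronization points.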
processes -- All of this may become a little clearer when we get to the concrete discussion of POSIX threads below, but before we do, it is probably best to spend a little time discussing threads and Unix in the abstract. In particular, at first blush, threads and Unix processes seem to have the same purpose. Under Unix, a process is a program counter, an address space, and some synchronization primitives, so what are the differences? Semantically (as opposed to implementationally) the chief difference is that threads are able to share variables, but processes (in their standard definition) cannot. When a Unix process forks, the child gets a complete copy of the parent (logically, only the process ID is different after the fork), but the copies are completely disjoint. A change to a global variable in the child does not affect any of the variables, global or local, in the parent.
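The disjoint-copy behavior of fork() is easy to check for yourself. The following is a small sketch; the variable counter and the function counter_after_child_update() are hypothetical names chosen for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 10;    /* a global, present in parent and child */

/* Forks a child that changes its copy of counter, waits for the child,
 * and returns the value the parent sees afterwards (-1 on error). */
int counter_after_child_update(void)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        counter = 99;       /* modifies only the child's copy */
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) < 0)
        return -1;
    return counter;         /* the parent's copy is untouched */
}
```

If the child were a thread instead of a forked process, the assignment would be visible to the creating thread, which is exactly the semantic difference described above.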
Another useful way to view the relationship between processes and threads is to think of a process as being a collection of resources (CPUs, address space, file descriptors, etc.) and threads being the "active" execution entities within the process. A Unix process then, is really a process with a single thread executing inside it.
POSIX threads (pthreads) are an implementation of threads that has been defined as a standard by the IEEE. There are many references on the web describing how to use them (like this one), some good, some bad. From our perspective, however, pthreads is the most portable thread implementation available for Unix. You might get an argument out of the Java community over this last statement, but as of today, I stand by it. Just for completeness, it is important to realize that most commercial Unix vendors (Sun, SGI, DEC, HP, etc.) have or have had a customized threads package that works only on their respective operating system. Pthreads, however, is available for all of them.
To make use of pthreads in your program, you need to have the following include directive:
#include <pthread.h>
And you have to link libpthread.a to your object files. (Even if you are using a version of Linux that includes Posix threads as part of the standard C library, you must still add "-lpthread" to your load line.)
UNIX> cc -c main.c
UNIX> cc -o main main.o -lpthread
You can use gcc too. There's a lot of junk in the pthread library. You can read about it in the various man pages. Start with ``man pthreads'' (Solaris only). The two basic primitives defined above are the following in Posix threads:
int pthread_create(pthread_t *new_thread_ID,
const pthread_attr_t *attr,
void * (*start_func)(void *),
void *arg);
int pthread_join(pthread_t target_thread,
void **status);
The pthread_create() command creates a thread that will begin
executing by calling the
function start_func and passing it a single void *arg as an
argument. The attr argument allows the creating thread to influence
some of the characteristics of how the system will treat the created thread
(e.g. how it will be scheduled in a pre-emptive system), and the first
argument pthread_t *new_thread_ID is an out parameter that returns
an identifier that can be used to address the thread in the future. The return
code indicates success or failure (0 on success).
The second command, pthread_join(), is a synchronization primitive. The calling thread specifies the thread ID (returned from pthread_create()) of a thread for which the caller wishes to wait. While the thread specified in the argument exists, the calling thread will block. When the "target" thread (as it is called in the man page -- you did read the man page, right?) terminates, the calling thread is unblocked. The argument status is an out parameter returning the value that comes back when the target thread exits (allowing the target thread to pass back a return value to any thread waiting). Only one thread may be waiting to join with a given target thread at a time. If a thread is already waiting, a second call to pthread_join() by another thread will terminate with an error.
In all the Posix threads calls, an integer is returned. If zero, everything went ok. Otherwise, an error has occurred. As with system calls, it is always good to check the return values of these calls to see if there has been an error. In my code here in the lecture notes, I'll omit error checking, but it is in the files, and you should do it.
How does a thread exit? By calling return or pthread_exit().
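Here is a sketch that puts the last few paragraphs together: an argument goes in through pthread_create(), a value comes back through pthread_join(), and both return codes are checked. The functions add_one() and run_add_one() are hypothetical. One detail worth knowing is that the pthread calls return the error number directly rather than setting errno, so the sketch formats it with strerror().

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Thread body: the argument carries a small integer in the pointer. */
static void *add_one(void *arg)
{
    intptr_t n = (intptr_t)arg;

    if (n < 0)
        pthread_exit((void *)(intptr_t)-1);   /* exit explicitly */
    return (void *)(n + 1);                   /* or just return */
}

/* Runs add_one(n) in a new thread and returns what the thread passed
 * back. Also returns -1 if thread creation or the join fails. */
int run_add_one(int n)
{
    pthread_t tid;
    void *status;
    int rc;

    rc = pthread_create(&tid, NULL, add_one, (void *)(intptr_t)n);
    if (rc != 0) {
        fprintf(stderr, "pthread_create: %s\n", strerror(rc));
        return -1;
    }
    rc = pthread_join(tid, &status);
    if (rc != 0) {
        fprintf(stderr, "pthread_join: %s\n", strerror(rc));
        return -1;
    }
    return (int)(intptr_t)status;
}
```

Squeezing an integer through a void * this way is a common shortcut for small values; for anything larger, pass a pointer to a struct that outlives the thread.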
Ok, so check out the following program (in hw.c):
/*
 * hw.c -- hello world with posix threads
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

void *printme(void *arg)
{
  printf("Hello world\n");
  return NULL;
}

int main(void)
{
  pthread_t tcb;
  void *status;

  /* pthread calls return an error number; they do not set errno */
  if (pthread_create(&tcb, NULL, printme, NULL) != 0) {
    fprintf(stderr, "pthread_create failed\n");
    exit(1);
  }
  if (pthread_join(tcb, &status) != 0) {
    fprintf(stderr, "pthread_join failed\n");
    exit(1);
  }
  return 0;
}
Try copying hw.c to your home area, compiling it, and running it. It should print out ``Hello world''.
Here's the output of print4.c when run on Solaris:
Hi. I'm thread 1
I'm 1 Trying to join with thread 4
Hi. I'm thread 4
Hi. I'm thread 5
Hi. I'm thread 6
Hi. I'm thread 7
1 Joined with thread 4
I'm 1 Trying to join with thread 5
1 Joined with thread 5
I'm 1 Trying to join with thread 6
1 Joined with thread 6
I'm 1 Trying to join with thread 7
1 Joined with thread 7
So what happened is the following. The main() program got control after forking off the four threads. It called pthread_join for thread 4 and blocked. Then thread 4 got control, printed its line, and exited. Next came threads 5, 6 and 7. When they finished, the main() thread got control again and since thread 4 was done, its pthread_join() call returned. Then it made the pthread_join() calls for threads 5, 6 and 7, all of which returned since these threads were done. When main() returns, all the threads are done, and the program exits. Two things to note. The main program is implicitly, itself, a thread. Notice that thread 1 was never created -- only threads 4, 5, 6, and 7 (due to the four pthread_create calls). Secondly, notice that the Solaris version just happens to skip 2 and 3 for reasons it is not obligated to tell us. Under Red Hat Linux, the following output is generated from the same program:
Hi. I'm thread 1025
Hi. I'm thread 2050
Hi. I'm thread 1024
I'm 1024 Trying to join with thread 1025
1024 Joined with thread 1025
I'm 1024 Trying to join with thread 2050
1024 Joined with thread 2050
I'm 1024 Trying to join with thread 3075
Hi. I'm thread 3075
1024 Joined with thread 3075
I'm 1024 Trying to join with thread 4100
Hi. I'm thread 4100
1024 Joined with thread 4100
This output also illustrates another point. The thread IDs do not have to be low-valued, monotonically increasing integers. The system is free to use whatever representation it wishes to implement pthread_t, although most systems will be using some form of unsigned integer in practice. Do not count on that fact, however.
Here, all threads, including the main() program exit with pthread_exit(). You'll see that the output is the same as print4. Notice, however, that the main thread cannot call printme() and get the same output since printme() calls pthread_exit(). p4b.c illustrates what happens when we replace the printf statement at line 37 with a call to printme() which contains a pthread_exit(). The output (for Solaris) is:
Hi. I'm thread 1
Hi. I'm thread 4
Hi. I'm thread 5
Hi. I'm thread 6
Hi. I'm thread 7
You'll note that none of the "Joining" lines were printed out because the main thread had exited. However, the other threads ran just fine, and the program terminated when all the threads had exited.
The second thing you need to know is that when a forked thread returns from its initial calling procedure (e.g. printme() in print4.c), that is the same as calling pthread_exit(). However, if the main() thread returns, then that is the same as calling exit(), and the task dies. That's why there is no output in p4c.c. All of the threads have been created when the main thread exits, but they haven't run yet. When the main thread returns, the task is terminated, and thus the threads do not run.
Finally, look at p4d.c. Here, the threads call exit() instead of pthread_exit(). You'll note that the output is:
Hi. I'm thread 1
I'm 1 Trying to join with thread 4
Hi. I'm thread 4
This is because the task is terminated by thread 4's exit() call.
A word about Linux: your mileage may vary. If you run these programs under different versions of Linux you may see different output because Linux pthreads decides to run the main thread at a different time than the Solaris version. Since the threads are assumed to execute in parallel, there is no prescribed order that the threads must execute in -- each implementation is free to choose. If you are having trouble, stick with Solaris as it is easiest to understand.
We'll talk more about pre-emption in the next section, but iloop.c illustrates an important point regarding the implementation of pthreads and what you can assume to be true from system to system. If you run the program under Solaris, you get no output. Here is the output of the program when it is run on my Debian Linux laptop:
Hi. I'm thread 1026
Hi. I'm thread 2051
Hi. I'm thread 3076
Hi. I'm thread 4101
(The program does not terminate, of course.) Why are they different? We'll see, in the next section, that the reason is that Linux creates threads as pre-emptable by default (notice that in the call to pthread_create() the attr argument is NULL). However, there is another possible explanation. When a call to pthread_create() is made, it is possible that the calling thread yields the processor to allow another, non-pre-emptable thread to run. Once running, the thread cannot be pre-empted, but it is up to the implementation as to whether there is an explicit yield built in. It turns out that under Linux, I can't figure out how to create anything but a pre-emptive thread. That won't be true on all implementations, however, so we learn two things from this experiment. First, we learn that "standard" doesn't actually mean "standard." Secondly, if we wish to write portable thread code, we must write assuming pre-emption and consider non-pre-emption a degenerate case.
When one thread stops executing and another starts, we call that a thread context switch. To restate the above then, user level threads only context switch when they voluntarily block. If you think about it, you can implement thread context switching with setjmp()/longjmp(). What this means is that you don't need the operating system in order to do thread context switching. This in turn means that context switching between user-level threads can be very fast, since there are no system calls involved.
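To make the idea concrete, here is a sketch of a voluntary context switch done entirely in user space. Note that it swaps in the ucontext calls (getcontext(), makecontext(), swapcontext()) in place of setjmp()/longjmp(), because they make it easy to give the second thread of control its own stack. The names task(), trace and run_yield_demo() are made up for this example, and I am assuming a system (such as Linux with glibc) that still provides the ucontext interface. Be aware that some implementations of swapcontext() also save and restore the signal mask, which can cost a system call that a tuned user-level threads package would avoid.

```c
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;
static char task_stack[64 * 1024];   /* private stack for the task */
static int trace[4];                 /* records the order of execution */
static int ntrace;

/* A "user-level thread": runs, yields voluntarily, then finishes. */
static void task(void)
{
    trace[ntrace++] = 2;
    swapcontext(&task_ctx, &main_ctx);   /* yield back to the caller */
    trace[ntrace++] = 4;                 /* resumes right here */
}   /* falling off the end switches to uc_link, i.e. main_ctx */

/* Switches between the caller and task() twice and returns the recorded
 * order of the four steps packed into one decimal number. */
int run_yield_demo(void)
{
    ntrace = 0;
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &main_ctx;
    makecontext(&task_ctx, task, 0);

    trace[ntrace++] = 1;
    swapcontext(&main_ctx, &task_ctx);   /* run task until it yields */
    trace[ntrace++] = 3;
    swapcontext(&main_ctx, &task_ctx);   /* let task run to completion */

    return trace[0] * 1000 + trace[1] * 100 + trace[2] * 10 + trace[3];
}
```

Nothing here ever interrupts task(); control moves only at the swapcontext() calls, which is the non-preemptive behavior described above.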
So what is a system-level thread? It is a unit of execution as seen by the operating system. Standard non-threaded Unix programs are each managed by a separate system-level thread. The operating system performs time-slicing by periodically interrupting the system-level thread that is currently running, saving its state, and running a different system-level thread. This is how you can have multiple programs running simultaneously. Such an action is also called context switching.
When you call pthread_create() under Solaris, you create a new user-level thread that is managed by the same system-level thread as the calling thread. These two threads will run non-preemptively in relation to each other. In fact, whenever a collection of user-level threads is serviced by the same system-level thread, they all run non-preemptively in relation to each other.
Take a look at preempt1.c. This is a program that forks off two threads, each of which runs an infinite loop. When you run it under Solaris:
UNIX> preempt1
thread 0. i = 0
thread 0. i = 1
thread 0. i = 2
thread 0. i = 3
thread 0. i = 4
thread 0. i = 5
...
You'll see that only thread 0 runs. (If you can't kill this with control-c, go into another window and kill the process with the kill command). The reason that thread 1 never runs is that thread 0 never voluntarily gives up the CPU. This is called starvation.
Note that under Linux, the default scheduling behavior supports pre-emption. If you run the same program under Red Hat, you get
thread 0. i = 0
thread 1. i = 0
thread 0. i = 1
thread 1. i = 1
thread 1. i = 2
thread 0. i = 2
thread 0. i = 3
thread 1. i = 3
thread 0. i = 4
thread 1. i = 4
thread 1. i = 5
thread 0. i = 5
...
Both threads are running. Notice that they don't necessarily alternate.
Now, you can explicitly bind different user-level threads to different system-level threads. This means that if one user-level thread is running, then at some point the operating system will interrupt it and run another user-level thread. This is because the two user-level threads are bound to different system level threads (which is the default behavior under Linux but not under Solaris).
One way to bind a user-level thread to a different system level thread is to call pthread_create() in a different way. Look at preempt2.c. You'll see that you give an ``attribute'' to pthread_create() that says ``create this thread with a different system-level thread.'' Now when you run it on a Solaris system, you'll see that the two threads interleave -- every now and then, the running thread is preempted, and the other thread gets to run:
UNIX> preempt2
thread 0. i = 0
thread 1. i = 0
thread 0. i = 1
thread 1. i = 1
thread 1. i = 2
thread 0. i = 2
thread 0. i = 3
thread 1. i = 3
thread 0. i = 4
thread 1. i = 4
thread 0. i = 5
thread 1. i = 5
Note that this is the same output as preempt1.c on a Linux system.
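The attribute in question can be set up as in the sketch below. It uses pthread_attr_setscope() with PTHREAD_SCOPE_SYSTEM, the same calls that appear in rr_condvar.c later in these notes; I am assuming preempt2.c does something similar. The functions noop() and create_system_scope_thread() are hypothetical.

```c
#include <pthread.h>

static void *noop(void *arg)
{
    return arg;
}

/* Creates and joins one thread with system contention scope.
 * Returns 0 on success, otherwise the error number of the call
 * that failed (or -1 if the scope did not stick). */
int create_system_scope_thread(void)
{
    pthread_attr_t attr;
    pthread_t tid;
    int scope = -1;
    int rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Ask for a thread that the operating system schedules itself. */
    rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    if (rc == 0)
        rc = pthread_attr_getscope(&attr, &scope);
    if (rc == 0 && scope != PTHREAD_SCOPE_SYSTEM)
        rc = -1;
    if (rc == 0)
        rc = pthread_create(&tid, &attr, noop, NULL);
    if (rc == 0)
        rc = pthread_join(tid, NULL);

    pthread_attr_destroy(&attr);
    return rc;
}
```

An implementation is allowed to reject a scope it does not support, which is one more reason to check the return codes.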
Now, here's the tricky part. If a thread makes a blocking system call, then if there are other user-level threads bound to the same system-level thread, a new system-level thread is created and the blocking thread is bound to it. What this does is let the other user-level threads run while the thread is blocked. This state of affairs is true for both the Solaris and Linux implementations.
So, look at preempt3.c. First, you should see that the threads are created as user-level threads bound to the same system-level thread. Next, you'll see that thread 0 first reads a character from standard input before beginning its loop. This is a blocking system call. Therefore, it results in this thread being bound to a separate system thread from the main thread and thread 1. Therefore, while it blocks, thread 1 can run. Go ahead and run it:
UNIX> preempt3
Thread 0: stopping to read
thread 1. i = 0
thread 1. i = 1
thread 1. i = 2
thread 1. i = 3
...
So, thread 0 is blocked, and thread 1 is running. They are thus bound to separate system threads. Now, type RETURN, and thread 0 will start up again, and you'll see that they interleave as in preempt2:
...
thread 1. i = 3
( RETURN was typed here )
Thread 0: Starting up again
thread 0. i = 0
thread 1. i = 4
thread 0. i = 1
thread 1. i = 5
thread 0. i = 2
thread 1. i = 6
thread 0. i = 3
...
That's user/system level threads and preemption in a nutshell. Go over these
examples again if you are confused. If you are not getting what you
expect from Linux, revert to Solaris as the Sun implementation uses
more standard default scheduling settings.
race1 nthreads stringsize iterations
This is a pretty simple program. The command line arguments call for the user to specify the number of threads, a string size and a number of iterations. Then the program does the following. It allocates an array of stringsize+1 characters (the +1 accounts for the null terminator). Then it forks off nthreads threads, passing each thread its id, the number of iterations, and the character array. Each thread is a user-level thread, so threads are non-preemptive. Now each thread loops for the specified number of iterations. At each iteration, it fills in the character array with one character -- thread 0 uses 'A', thread 1 uses 'B' and so on. At the end of an iteration, the thread prints out the character array. So, if we call it with the arguments 4, 4, 1, we'd expect the following output, and indeed that is what we get:
UNIX> race1 4 4 1
Thread 0: AAAA
Thread 1: BBBB
Thread 2: CCCC
Thread 3: DDDD
Similarly, the following make sense:
UNIX> race1 4 4 2
Thread 0: AAAA
Thread 0: AAAA
Thread 1: BBBB
Thread 1: BBBB
Thread 2: CCCC
Thread 2: CCCC
Thread 3: DDDD
Thread 3: DDDD
UNIX> race1 4 30 2
Thread 0: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Thread 0: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Thread 1: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
Thread 1: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
Thread 2: CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
Thread 2: CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
Thread 3: DDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
Thread 3: DDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
Now, look at race2.c. The only difference here is that the threads are all bound to different system-level threads. This means that they may be preempted.
Look at the output of the same calls to race2:
UNIX> race2 4 4 1
Thread 0: AAAA
Thread 1: BBBB
Thread 2: CCCC
Thread 3: DDDD
This looks the same as before, but what's wrong with this picture?
UNIX> race2 4 4 2
Thread 0: AAAA
Thread 0: AAAA
Thread 1: BBBB
Thread 1: BBBB
Thread 2: CCCC
Thread 3: DDDD
Thread 3: DDDD
Thread 2: DDDD
Or this one?
UNIX> race2 2 40 1
Thread 0: BBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAAAAAA
Thread 1: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
UNIX>
What is happening is that threads can be preempted anywhere. In particular, they may be preempted while they are filling in the string buffer (which is shared among all threads -- see how 's' is passed to each thread in race2.c) which means that another thread can modify s, and then when the original thread actually calls the printf() statement, the values of s are not what the thread thought they were.
These kinds of bugs or race conditions are extremely difficult to debug. Consider the output from
UNIX> race2 4 40 2
Thread 2: CCCCCCCCCCCCCCBBBBAAAAAAAAAAACCCCCCCCCCC
Thread 0: CCCCCCCCCCCCCCBBBBAAAAAAAABBBBBBBAAAAAAA
Thread 1: AAACCCCCCCCCCCCCCAAAAAAAAABBBBBBBAAAAAAB
Thread 1: DDDDDDDDDDDDDBBBBBBBBBBBBBBBBBBBBBBBBBBB
Thread 3: DDDDDDDDDDDDDBBDDDDDDDAAAAAAAAAAADDDDDDD
Thread 2: DDDDDDDDDDDDDBBDDDDDDDAAAAAAAAACCCCCCCCC
Thread 0: DDDDDDDDDDDDDDDDDDDDDDDDDDDAAAACCAAAAAAA
Thread 3: DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
UNIX>
In the code, the array s is always filled from left to right. Assume that printf() is atomic (cannot be interrupted) and that the lines come out in the order shown. How is this pattern possible given that the last write is what is shown? The line:
Thread 2: CCCCCCCCCCCCCCBBBBAAAAAAAAAAACCCCCCCCCCC
can be explained as follows. "A" gets written first out to the last "B" and then thread 0 stops. Then "B" is written out to the first "B" by thread 1. Thread 2 now runs and writes Cs all the way through s. Thread 0 wakes, writes some As, and gets pre-empted by thread 1, which writes some Bs, and then thread 2 prints s. This all makes sense until
Thread 1: AAACCCCCCCCCCCCCCAAAAAAAAABBBBBBBAAAAAAB
which is printed after the previous line. How did the As wind up at the beginning?
I'm not really sure of the answer, but I suspect it has to do with printf() and whether/when it copies its arguments completely onto the stack. We won't go into this any further. Let it suffice to say that even for a simple program, figuring out the behavior of race conditions can be somewhat difficult.
In our race program, we can fix the race condition by enforcing that no thread can be interrupted by another thread when it is modifying and printing s. This can be done with a mutex, sometimes called a ``lock'' or sometimes a ``binary semaphore.'' There are three procedures for dealing with mutexes in pthreads:
pthread_mutex_init(pthread_mutex_t *mutex, NULL);
pthread_mutex_lock(pthread_mutex_t *mutex);
pthread_mutex_unlock(pthread_mutex_t *mutex);
You create a mutex with pthread_mutex_init(). Then any thread may lock or unlock the mutex. When a thread locks the mutex, no other thread may lock it. If other threads call pthread_mutex_lock() while the mutex is locked, then they will block until the mutex is unlocked. Only one thread may lock the mutex at a time.
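As a warm-up that is smaller than the race programs, here is a sketch of several threads incrementing one shared counter under a mutex. The names counter, counter_lock, incr() and run_locked_counter() are made up for this example. Without the lock, the read-modify-write in counter++ could interleave between threads and lose updates.

```c
#include <pthread.h>

#define NTHREADS 4
#define NINCR 100000

static long counter;                  /* shared state */
static pthread_mutex_t counter_lock;  /* protects counter */

static void *incr(void *arg)
{
    int i;

    (void)arg;                        /* unused */
    for (i = 0; i < NINCR; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;                    /* only one thread at a time here */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs NTHREADS incrementing threads and returns the final count,
 * or -1 if a thread could not be created. */
long run_locked_counter(void)
{
    pthread_t tid[NTHREADS];
    int started = 0;
    int i;

    counter = 0;
    if (pthread_mutex_init(&counter_lock, NULL) != 0)
        return -1;
    for (i = 0; i < NTHREADS; i++) {
        if (pthread_create(&tid[i], NULL, incr, NULL) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    pthread_mutex_destroy(&counter_lock);

    return (started == NTHREADS) ? counter : -1;
}
```

The critical section is kept as short as possible: the lock is held only around the one statement that touches shared state.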
So, we fix the race program with race3.c. You'll notice that a thread locks the mutex just before modifying s and it unlocks the mutex just after printing s. This fixes the program so that the output makes sense:
UNIX> race3 4 4 1
Thread 0: AAA
Thread 1: BBB
Thread 2: CCC
Thread 3: DDD
UNIX> race3 4 4 2
Thread 0: AAA
Thread 0: AAA
Thread 2: CCC
Thread 2: CCC
Thread 1: BBB
Thread 1: BBB
Thread 3: DDD
Thread 3: DDD
UNIX> race3 4 70 1
Thread 0: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Thread 1: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
Thread 2: CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
Thread 3: DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
UNIX> race3 10 70 100 > output3
/*
 * this code is a modification of race3.c to use mutexes to ensure
 * round robin scheduling
 */
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

int MyTurn = 0;

typedef struct
{
  pthread_mutex_t *lock;
  int id;
  int size;
  int iterations;
  char *s;
  int nthreads;
} Thread_struct;

void *infloop(void *x)
{
  int i, j, k;
  Thread_struct *t;

  t = (Thread_struct *) x;
  for (i = 0; i < t->iterations; i++)
  {
    /*
     * don't try this at home
     */
    pthread_mutex_lock(t->lock);
    while((MyTurn % t->nthreads) != t->id)
    {
      pthread_mutex_unlock(t->lock); /* give it up */
      pthread_mutex_lock(t->lock); /* get it again */
    }
    for (j = 0; j < t->size-1; j++)
    {
      t->s[j] = 'A'+t->id;
      for(k=0; k < 50000; k++); /* delay loop */
    }
    t->s[j] = '\0';
    printf("Thread %d: %s\n", t->id, t->s);
    MyTurn++;
    pthread_mutex_unlock(t->lock);
  }
  return(NULL);
}

int
main(int argc, char **argv)
{
  pthread_mutex_t lock;
  pthread_t *tid;
  pthread_attr_t *attr;
  Thread_struct *t;
  void *retval;
  int nthreads, size, iterations, i;
  char *s;

  if (argc != 4)
  {
    fprintf(stderr, "usage: race nthreads stringsize iterations\n");
    exit(1);
  }
  pthread_mutex_init(&lock, NULL);
  nthreads = atoi(argv[1]);
  size = atoi(argv[2]);
  iterations = atoi(argv[3]);
  tid = (pthread_t *) malloc(sizeof(pthread_t) * nthreads);
  attr = (pthread_attr_t *) malloc(sizeof(pthread_attr_t) * nthreads);
  t = (Thread_struct *) malloc(sizeof(Thread_struct) * nthreads);
  s = (char *) malloc(sizeof(char *) * size);
  for (i = 0; i < nthreads; i++)
  {
    t[i].nthreads = nthreads;
    t[i].id = i;
    t[i].size = size;
    t[i].iterations = iterations;
    t[i].s = s;
    t[i].lock = &lock;
    pthread_attr_init(attr+i);
    pthread_attr_setscope(attr+i, PTHREAD_SCOPE_SYSTEM);
    pthread_create(tid+i, attr+i, infloop, t+i);
  }
  for (i = 0; i < nthreads; i++)
  {
    pthread_join(tid[i], &retval);
  }
  return(0);
}
Note that both the increment and test of MyTurn take place while one thread is in the mutual exclusion region, so there is no race condition for the counter. Why would there be otherwise?
relies on fairness -- The while loop that repeatedly unlocks and relocks the mutex directs each thread to wait its turn, assuming that the implementation schedules threads fairly. If it does not, this code will deadlock if a thread, constantly grabbing and releasing the lock, starves the other threads out.
efficiency -- Even if the implementation is fair, the threads that are waiting to fill the s buffer "burn" their timeslice by executing nothing but the lock-test-unlock sequence. A more efficient synchronization primitive would allow a thread to block without consuming time slices until it is released by another thread.
Pthreads solves the fairness and efficiency problems using a synchronization abstraction known as a condition variable. A condition variable allows a thread to go to sleep, releasing a lock it holds, until another thread wakes it up. The basic calls are:
int pthread_cond_init(pthread_cond_t *cond, const pthread_condattr_t *cond_attr);
int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
int pthread_cond_signal(pthread_cond_t *cond);
The first call initializes a condition variable (the attribute field, I will let you read about). The second takes the condition variable and a mutex lock as arguments. It is expected that the caller will have successfully acquired the specified lock. When pthread_cond_wait() is called, the calling thread is put to sleep and the lock specified as the second argument is released. When a different thread calls pthread_cond_signal(), one of the threads waiting on the condition variable is selected and reawakened. It then re-acquires the lock, so when it returns from pthread_cond_wait() it once again holds the lock.
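The smallest useful pattern built from these calls is "wait until some condition becomes true." Here is a sketch of it; the names ready, value, producer() and wait_for_value() are made up, and the static initializers PTHREAD_MUTEX_INITIALIZER and PTHREAD_COND_INITIALIZER are used in place of the init calls just to keep it short. The wait sits inside a while loop because a thread may return from pthread_cond_wait() without the condition actually holding, so it must re-test.

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cv = PTHREAD_COND_INITIALIZER;
static int ready;    /* the condition, protected by lock */
static int value;    /* the data, protected by lock */

static void *producer(void *arg)
{
    (void)arg;                          /* unused */
    pthread_mutex_lock(&lock);
    value = 42;
    ready = 1;                          /* make the condition true... */
    pthread_cond_signal(&ready_cv);     /* ...then wake a waiter */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Starts the producer, sleeps until it has published a value, and
 * returns that value (-1 if the thread could not be created). */
int wait_for_value(void)
{
    pthread_t tid;
    int seen;

    ready = 0;
    if (pthread_create(&tid, NULL, producer, NULL) != 0)
        return -1;

    pthread_mutex_lock(&lock);
    while (!ready)                          /* re-test after each wakeup */
        pthread_cond_wait(&ready_cv, &lock);   /* releases lock while asleep */
    seen = value;                           /* lock is held again here */
    pthread_mutex_unlock(&lock);

    pthread_join(tid, NULL);
    return seen;
}
```

It does not matter whether the producer runs before or after the waiter reaches the loop: if ready is already set, the waiter never sleeps, and if it is not, the signal cannot be missed because the test and the wait happen under the lock.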
The utility of these semantics can be a little obscure until you've used them a bit. Consider the code in rr_condvar.c:
/*
 * CS170: rr_condvar.c
 * this code is a modification of race3.c to use condition variables to ensure
 * round robin scheduling
 */
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

int MyTurn = 0;

typedef struct
{
  pthread_mutex_t *lock;
  pthread_cond_t *wait;
  int id;
  int size;
  int iterations;
  char *s;
  int nthreads;
} Thread_struct;

void *infloop(void *x)
{
  int i, j, k;
  Thread_struct *t;

  t = (Thread_struct *) x;
  for (i = 0; i < t->iterations; i++)
  {
    /*
     * do try this at home
     */
    pthread_mutex_lock(t->lock);
    while((MyTurn % t->nthreads) != t->id)
    {
      pthread_cond_wait(t->wait,t->lock);
    }
    for (j = 0; j < t->size-1; j++)
    {
      t->s[j] = 'A'+t->id;
      for(k=0; k < 50000; k++); /* delay loop */
    }
    t->s[j] = '\0';
    printf("Thread %d: %s\n", t->id, t->s);
    MyTurn++;
    pthread_cond_broadcast(t->wait);
    pthread_mutex_unlock(t->lock);
  }
  return(NULL);
}

int
main(int argc, char **argv)
{
  pthread_mutex_t lock;
  pthread_cond_t wait;
  pthread_t *tid;
  pthread_attr_t *attr;
  Thread_struct *t;
  void *retval;
  int nthreads, size, iterations, i;
  char *s;

  if (argc != 4)
  {
    fprintf(stderr, "usage: race nthreads stringsize iterations\n");
    exit(1);
  }
  pthread_mutex_init(&lock, NULL);
  pthread_cond_init(&wait, NULL);
  nthreads = atoi(argv[1]);
  size = atoi(argv[2]);
  iterations = atoi(argv[3]);
  tid = (pthread_t *) malloc(sizeof(pthread_t) * nthreads);
  attr = (pthread_attr_t *) malloc(sizeof(pthread_attr_t) * nthreads);
  t = (Thread_struct *) malloc(sizeof(Thread_struct) * nthreads);
  s = (char *) malloc(sizeof(char *) * size);
  for (i = 0; i < nthreads; i++)
  {
    t[i].nthreads = nthreads;
    t[i].id = i;
    t[i].size = size;
    t[i].iterations = iterations;
    t[i].s = s;
    t[i].lock = &lock;
    t[i].wait = &wait;
    pthread_attr_init(&(attr[i]));
    pthread_attr_setscope(&(attr[i]), PTHREAD_SCOPE_SYSTEM);
    pthread_create(&(tid[i]), &(attr[i]), infloop, (void *)&(t[i]));
  }
  for (i = 0; i < nthreads; i++)
  {
    pthread_join(tid[i], &retval);
  }
  return(0);
}
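Here, each thread blocks on the shared condition variable until another thread
finishes its turn and calls pthread_cond_broadcast(), which wakes every waiting
thread. Each one re-tests MyTurn, and all but the thread whose turn it is go
back to sleep. Notice that each thread calls pthread_cond_wait() while holding
the lock. If pthread_cond_wait() did not release the lock, the code
would deadlock as soon as a thread tried to acquire the critical section out
of order.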
how much more efficient? -- Here is an example of the performance difference: rr_mutex 40 30 3 ran in 53.3 seconds on ella.cs.ucsb.edu, while rr_condvar 40 30 3 took only 5.1 seconds. I'm not exactly sure why the difference is that big, but there you have it. Condition variables are your friend.