There are two kinds of threads in Solaris: user-level threads and system-level threads. The distinction between the two is kind of confusing, but I'll try to enlighten you. User-level threads exist solely in the running process -- they have no operating system support. That is, from the perspective of the operating system, a program containing many user-level threads looks exactly the same as a ``normal'' Unix program with just one thread. If you find this confusing, just remember that this sentence is here. It will become clear as we go along.
In Solaris, user-level threads are non-preemptive. In other words, when a thread is running, it will not be interrupted by another user-level thread unless it voluntarily blocks or exits, through a call such as pthread_exit() or pthread_join().
When one thread stops executing and another starts, we call that a thread context switch. To restate the above, then: user-level threads only context switch when they voluntarily block or exit.
So what is a system-level thread? It is a unit of execution as seen by the operating system. Standard non-threaded Unix programs are each managed by a separate system-level thread. The operating system performs time-slicing by periodically interrupting the system-level thread that is currently running, saving its state, and running a different system-level thread. This is how you can have multiple programs running simultaneously. Such an action is also called context switching.
When you call pthread_create(), you create a new user-level thread that is managed by the same system-level thread as the calling thread. These two threads will run non-preemptively in relation to each other. In fact, whenever a collection of user-level threads is serviced by the same system-level thread, they all run non-preemptively in relation to each other. All of the programs in the previous threads lecture work in this way.
Let's look at a few more. First, look at preempt1.c. This is a program that forks off two threads, each of which runs an infinite loop. When you run it:
UNIX> preempt1
thread 0. i = 0
thread 0. i = 1
thread 0. i = 2
thread 0. i = 3
thread 0. i = 4
thread 0. i = 5
...
You'll see that only thread 0 runs. (If you can't kill this with control-c, go into another window and kill the process with the kill command.) The reason that thread 1 never runs is that thread 0 never voluntarily gives up the CPU. This is called starvation.
Note that under Linux, the default scheduling behavior supports pre-emption. If you run the same program under Red Hat, you get
thread 0. i = 0
thread 1. i = 0
thread 0. i = 1
thread 1. i = 1
thread 1. i = 2
thread 0. i = 2
thread 0. i = 3
thread 1. i = 3
thread 0. i = 4
thread 1. i = 4
thread 1. i = 5
thread 0. i = 5
...
Both threads are running. Notice that they don't necessarily alternate.
Now, you can explicitly bind different user-level threads to different system-level threads. This means that if one user-level thread is running, then at some point the operating system will interrupt it and run another user-level thread. This is because the two user-level threads are bound to different system level threads (which is the default behavior under Linux but not under Solaris).
One way to bind a user-level thread to a different system level thread is to call pthread_create() in a different way. Look at preempt2.c. You'll see that you give an ``attribute'' to pthread_create() that says ``create this thread with a different system-level thread.'' Now when you run it on a Solaris system, you'll see that the two threads interleave -- every now and then, the running thread is preempted, and the other thread gets to run:
UNIX> preempt2
thread 0. i = 0
thread 1. i = 0
thread 0. i = 1
thread 1. i = 1
thread 1. i = 2
thread 0. i = 2
thread 0. i = 3
thread 1. i = 3
thread 0. i = 4
thread 1. i = 4
thread 0. i = 5
thread 1. i = 5
Note that this is the same kind of interleaved output that preempt1.c produces on a Linux system.
threads within threads
If you are like me (and I know I am), this state of affairs sounds a little weird. On the one hand, you are probably familiar with running multiple Unix processes at the same time on the same machine, using an "&". The processes (while they are running) will interleave as the machine timeshares the CPU. Try it. In proc1.c is a very simple program that iterates 10,000 times printing its process ID and the iteration number. Try compiling and running two copies from the same shell in the background. You should see output from one for a while followed by output from the other.
So, look at preempt3.c. First, you should see that the threads are created as user-level threads bound to the same system-level thread. Next, you'll see that thread 0 first reads a character from standard input before beginning its loop. This is a blocking system call. Therefore, it results in this thread being bound to a separate system-level thread from the main thread and thread 1. So, while it blocks, thread 1 can run. Go ahead and run it.
UNIX> preempt3
Thread 0: stopping to read
thread 1. i = 0
thread 1. i = 1
thread 1. i = 2
thread 1. i = 3
...
So, thread 0 is blocked, and thread 1 is running. They are thus bound to separate system threads. Now, type RETURN, and thread 0 will start up again, and you'll see that they interleave as in preempt2:
...
thread 1. i = 3
( RETURN was typed here )
Thread 0: Starting up again
thread 0. i = 0
thread 1. i = 4
thread 0. i = 1
thread 1. i = 5
thread 0. i = 2
thread 1. i = 6
thread 0. i = 3
...
That's user/system level threads and preemption in a nutshell. Go over these examples again if you are confused.
difference between a Unix process and a pthread
You are probably wondering why we are spending all of this time talking about threads when the examples described above generate more or less the same results using Unix processes. "What are the differences?" you might ask. If you have made this observation and asked this question, consider yourself astute, as it is an important question. Unix processes and pre-emptable Posix threads are similar in that they both have individual program counters and local state, and they are both context switched by Unix. They are different in that Unix processes do not share variables or memory locations between themselves. That is, there is no shared state between Unix processes.
Consider the use of an ATM at a bank. Somewhere, in the bowels of your bank's computer system, is a variable called "account balance" that stores your current balance. When you withdraw $200, there is a piece of assembly language code that runs on some machine that does the following calculation:
ld r1,@richs_balance
sub r1,$200
st r1,@richs_balance
which says (in a fictitious assembly language) "load the contents of richs_balance into register r1, subtract 200 from it and leave the result in r1, and store the contents of r1 back to the variable richs_balance." The code is executed sequentially, in the order shown.
So far, so good.
Your bank is a busy place, though, and there are potentially millions of ATM transactions all at the same time, but each to a different account variable. So when Bob withdraws money, the machine executes
ld r2,@bobs_balance
sub r2,$200
st r2,@bobs_balance
and Fred's transactions look like
ld r3,@freds_balance
sub r3,$200
st r3,@freds_balance
In each case, the register and the variable are different.
Now, let's assume that the bank wants to use threads as a programming convenience, and that the programmer has chosen pre-emptive threads as we have been discussing. Each set of instructions goes in its own thread:
thread_0                  thread_1                  thread_2
--------                  --------                  --------
ld r1,@richs_balance      ld r2,@bobs_balance       ld r3,@freds_balance
sub r1,$200               sub r2,$200               sub r3,$200
st r1,@richs_balance      st r2,@bobs_balance       st r3,@freds_balance
The thing about pre-emptive threads is that you don't know when pre-emption will take place. For example, thread_0 could start, execute two instructions, and suddenly be pre-empted for thread_1, which could be pre-empted for thread_2, and so on:
ld r1,@richs_balance ;; thread_0
sub r1,$200 ;; thread_0
**** pre-empt! ****
ld r2,@bobs_balance ;; thread_1
sub r2,$200 ;; thread_1
**** pre-empt! ****
ld r3,@freds_balance ;; thread_2
sub r3,$200 ;; thread_2
**** pre-empt! ****
st r1,@richs_balance ;; thread_0
**** pre-empt! ****
st r2,@bobs_balance ;; thread_1
**** pre-empt! ****
st r3,@freds_balance ;; thread_2
In fact (and this is the part to get)
any interleaving of instructions that preserves the sequential order of each individual thread is legal and may occur.
The system cannot choose to rearrange the instructions within a thread, but because threads can be pre-empted at any time, all interleavings of the instructions are possible.
Again, in this example, there is no real problem (yet). It doesn't matter where you put the pre-empts or whether you leave them out -- the ATM system will function properly.
Now let's say you've thought about this for a good long while and you come up with a scheme. You get a good friend, you give them your ATM PIN and a GPS synchronized watch, and you say "at exactly 12:00, withdraw $200." 12:00 rolls around and you and your friend both go to separate ATMs and simultaneously withdraw $200. Let's say, further, that you are lucky, your account contains $1000 to begin with, and that the bank's computers make two threads:
thread_0                  thread_1
--------                  --------
ld r1,@richs_balance      ld r2,@richs_balance
sub r1,$200               sub r2,$200
st r1,@richs_balance      st r2,@richs_balance
Because you are lucky and you've gotten the bank to launch both threads at the same time, the following interleaving takes place:
ld r1,@richs_balance ;; thread_0
*** pre-empt ***
ld r2,@richs_balance ;; thread_1
*** pre-empt ***
sub r1,$200 ;; thread_0
*** pre-empt ***
sub r2,$200 ;; thread_1
*** pre-empt ***
st r1,@richs_balance ;; thread_0
*** pre-empt ***
st r2,@richs_balance ;; thread_1
What are the contents of richs_balance when both threads finish?
It should be $600, right? Both you and your friend withdrew $200 each from your $1000 balance. If this were the way things worked at your bank, however, richs_balance would be $800.
Why?
Look at what happens step by step. The first ld loads 1000 into r1. thread_0 gets pre-empted and thread_1 starts. It loads 1000 into r2. Then it gets pre-empted and thread_0 runs again. r1 (which contains 1000) is decremented by 200 so it now contains 800. Then thread_1 pre-empts thread_0 again, and r2 (which contains 1000 from the last load of r2) gets decremented by 200 leaving 800. Then thread_0 runs again and stores 800 into richs_balance. Then thread_1 runs again and stores 800 into richs_balance and the final value is 800.
This problem is called a race condition. It occurs when there is a legal ordering of instructions within threads that can make the desired outcome incorrect. Notice that there are lots of ways thread_0 and thread_1 could have interleaved in which the final value of richs_balance would have been $600 (the correct value). It is just that you are lucky (or you tried this trick enough so that the law of averages eventually worked out for you) to cause one of the $200 withdrawals to disappear.
race1 nthreads stringsize iterations
This is a pretty simple program. The command line arguments call for the user to specify the number of threads, a string size and a number of iterations. Then the program does the following. It allocates an array of stringsize+1 characters (the +1 accounts for the null terminator). Then it forks off nthreads threads, passing each thread its id, the number of iterations, and the character array. Each thread is a user-level thread, so threads are non-preemptive. Now each thread loops for the specified number of iterations. At each iteration, it fills in the character array with one character -- thread 0 uses 'A', thread 1 uses 'B' and so on. At the end of an iteration, the thread prints out the character array. So, if we call it with the arguments 4, 4, 1, we'd expect the following output, and indeed that is what we get:
UNIX> race1 4 4 1
Thread 0: AAAA
Thread 1: BBBB
Thread 2: CCCC
Thread 3: DDDD
Similarly, the following make sense:
UNIX> race1 4 4 2
Thread 0: AAAA
Thread 0: AAAA
Thread 1: BBBB
Thread 1: BBBB
Thread 2: CCCC
Thread 2: CCCC
Thread 3: DDDD
Thread 3: DDDD
UNIX> race1 4 30 2
Thread 0: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Thread 0: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Thread 1: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
Thread 1: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
Thread 2: CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
Thread 2: CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
Thread 3: DDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
Thread 3: DDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
Now, look at race2.c. The only difference here is that I've inserted an artificial delay loop so that lucky happenstance is more likely.
Look at the output of the same calls to race2:
UNIX> race2 4 4 1
Thread 0: AAAA
Thread 1: BBBB
Thread 2: CCCC
Thread 3: DDDD
This looks the same as before, but what's wrong with this picture?
UNIX> race2 4 4 2
Thread 0: AAAA
Thread 0: AAAA
Thread 1: BBBB
Thread 1: BBBB
Thread 2: CCCC
Thread 3: DDDD
Thread 3: DDDD
Thread 2: DDDD
Or this one?
UNIX> race2 2 40 1
Thread 0: BBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAAAAAA
Thread 1: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
UNIX>
What is happening is that threads can be preempted anywhere. In particular, they may be preempted while they are filling in the string buffer (which is shared among all threads -- see how 's' is passed to each thread in race2.c) which means that another thread can modify s, and then when the original thread actually calls the printf() statement, the values of s are not what the thread thought they were.
These kinds of bugs or race conditions are extremely difficult to debug. Consider the output from
UNIX> race2 4 40 2
Thread 2: CCCCCCCCCCCCCCBBBBAAAAAAAAAAACCCCCCCCCCC
Thread 0: CCCCCCCCCCCCCCBBBBAAAAAAAABBBBBBBAAAAAAA
Thread 1: AAACCCCCCCCCCCCCCAAAAAAAAABBBBBBBAAAAAAB
Thread 1: DDDDDDDDDDDDDBBBBBBBBBBBBBBBBBBBBBBBBBBB
Thread 3: DDDDDDDDDDDDDBBDDDDDDDAAAAAAAAAAADDDDDDD
Thread 2: DDDDDDDDDDDDDBBDDDDDDDAAAAAAAAACCCCCCCCC
Thread 0: DDDDDDDDDDDDDDDDDDDDDDDDDDDAAAACCAAAAAAA
Thread 3: DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
UNIX>
In the code, the array s is always filled from left to right. Assume that printf() is atomic (cannot be interrupted) and that the lines come out in the order shown. How is this pattern possible, given that the last write is what is shown? The line:
Thread 2: CCCCCCCCCCCCCCBBBBAAAAAAAAAAACCCCCCCCCCC
can be explained as follows. "A" gets written first, out to the last "B", and then thread 0 stops. Then "B" is written out to the first "B" by thread 1. Thread 2 now runs and writes Cs to the second C. Thread 1 wakes and writes Bs all the way through s. Thread 0 wakes and writes As to the end. Finally, thread 2 wakes and writes Cs to the end. This all makes sense until
Thread 1: AAACCCCCCCCCCCCCCAAAAAAAAABBBBBBBAAAAAAB
which is printed after the previous line. How did the As wind up at the beginning?
I'm not really sure of the answer, but I suspect it has to do with printf() and whether/when it copies its arguments completely onto the stack. We won't go into this any farther. Let it suffice to say that even for a simple program, figuring out the behavior of race conditions can be somewhat difficult.
In our race program, we can fix the race condition by enforcing that no thread can be interrupted by another thread when it is modifying and printing s. This can be done with a mutex, sometimes called a ``lock'' or sometimes a ``binary semaphore.'' There are three procedures for dealing with mutexes in pthreads:
pthread_mutex_init(pthread_mutex_t *mutex, NULL);
pthread_mutex_lock(pthread_mutex_t *mutex);
pthread_mutex_unlock(pthread_mutex_t *mutex);
You create a mutex with pthread_mutex_init(). Then any thread may lock or unlock the mutex. When a thread locks the mutex, no other thread may lock it. If another thread calls pthread_mutex_lock() while the mutex is locked, it will block until the mutex is unlocked. Only one thread may hold the mutex at a time.
So, we fix the race program with race3.c. You'll notice that a thread locks the mutex just before modifying s and it unlocks the mutex just after printing s. This fixes the program so that the output makes sense:
UNIX> race3 4 4 1
Thread 0: AAA
Thread 1: BBB
Thread 2: CCC
Thread 3: DDD
UNIX> race3 4 4 2
Thread 0: AAA
Thread 0: AAA
Thread 2: CCC
Thread 2: CCC
Thread 1: BBB
Thread 1: BBB
Thread 3: DDD
Thread 3: DDD
UNIX> race3 4 70 1
Thread 0: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Thread 1: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
Thread 2: CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
Thread 3: DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
UNIX> race3 10 70 100 > output3
Race condition: the possibility, in a program consisting of potentially concurrent threads, that not all legal instruction orderings result in exactly the same output.
About the only thing that should be ambiguous to you here is the term "potentially concurrent." In our examples, we have been thinking about threads running on a single processor with pre-emption. Notice, though, that you can think of these threads as running on separate processors simultaneously. Think about it. If there were multiple processors and only one memory, all of the race conditions we have discussed so far would still exist.
virtual concurrency
For this reason, pre-emptive threads (and timesharing, for that matter) are sometimes referred to as implementing "virtual concurrency." Even though there is only one processor, because of pre-emption it appears to you, the programmer, that there are multiple processors or vice versa.
/*
 * CS170: rr_mutex.c
 * this code is a modification of race3.c to use mutexes to ensure
 * round robin scheduling
 */
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

int MyTurn = 0;     /* shared turn counter; only touched while holding the lock */

typedef struct
{
    pthread_mutex_t *lock;
    int id;
    int size;
    int iterations;
    char *s;
    int nthreads;
} Thread_struct;

void *infloop(void *x)
{
    int i, j, k;
    Thread_struct *t;

    t = (Thread_struct *) x;

    for (i = 0; i < t->iterations; i++)
    {
        /*
         * don't try this at home
         */
        pthread_mutex_lock(t->lock);
        /* spin, dropping and retaking the lock, until it is our turn */
        while((MyTurn % t->nthreads) != t->id)
        {
            pthread_mutex_unlock(t->lock); /* give it up */
            pthread_mutex_lock(t->lock); /* get it again */
        }
        /* we hold the lock and it is our turn: fill and print s */
        for (j = 0; j < t->size-1; j++)
        {
            t->s[j] = 'A'+t->id;
            for(k=0; k < 50000; k++); /* delay loop */
        }
        t->s[j] = '\0';
        printf("Thread %d: %s\n", t->id, t->s);
        MyTurn++;   /* pass the turn to the next thread */
        pthread_mutex_unlock(t->lock);
    }
    return(NULL);
}

int
main(int argc, char **argv)
{
    pthread_mutex_t lock;
    pthread_t *tid;
    pthread_attr_t *attr;
    Thread_struct *t;
    void *retval;
    int nthreads, size, iterations, i;
    char *s;

    if (argc != 4)
    {
        fprintf(stderr, "usage: race nthreads stringsize iterations\n");
        exit(1);
    }

    pthread_mutex_init(&lock, NULL);
    nthreads = atoi(argv[1]);
    size = atoi(argv[2]);
    iterations = atoi(argv[3]);

    tid = (pthread_t *) malloc(sizeof(pthread_t) * nthreads);
    attr = (pthread_attr_t *) malloc(sizeof(pthread_attr_t) * nthreads);
    t = (Thread_struct *) malloc(sizeof(Thread_struct) * nthreads);
    s = (char *) malloc(sizeof(char) * size);   /* one shared buffer of size chars */

    for (i = 0; i < nthreads; i++)
    {
        t[i].nthreads = nthreads;
        t[i].id = i;
        t[i].size = size;
        t[i].iterations = iterations;
        t[i].s = s;
        t[i].lock = &lock;
        pthread_attr_init(&(attr[i]));
        pthread_attr_setscope(&(attr[i]), PTHREAD_SCOPE_SYSTEM);
        pthread_create(&tid[i], &(attr[i]), infloop, (void *)&(t[i]));
    }
    for (i = 0; i < nthreads; i++)
    {
        pthread_join(tid[i], &retval);
    }
    return(0);
}
Note that both the increment and test of MyTurn take place while one
thread is in the mutual exclusion region, so there is no race condition for
the counter. Why would there be otherwise?
relies on fairness -- The while loop in infloop() directs each thread to wait its turn, assuming that the implementation schedules threads fairly. If it does not, this code will deadlock if a thread, constantly grabbing and releasing the lock, starves the other threads out.
efficiency -- Even if the implementation is fair, the threads that are waiting to fill the s buffer "burn" their timeslice by executing nothing but the lock-test-unlock sequence. A more efficient synchronization primitive would allow a thread to block without consuming time slices until it is released by another thread.
Pthreads solves the fairness and efficiency problems using a synchronization abstraction known as a condition variable. A condition variable allows a thread to block, without consuming time slices, until another thread signals it to continue. The basic calls are:
int pthread_cond_init(pthread_cond_t *cond, const pthread_condattr_t *cond_attr);
int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
int pthread_cond_signal(pthread_cond_t *cond);
The first call initializes a condition variable (the attribute field, I will let you read about). The second takes the condition variable and a mutex lock as arguments. It is expected that the caller will have successfully acquired the specified lock. When pthread_cond_wait() is called, the calling thread is put to sleep and the lock specified as the second argument is released. When a different thread calls pthread_cond_signal(), one of the threads waiting on the condition variable is selected and reawakened. It then re-acquires the lock, so when it returns from pthread_cond_wait() it once again holds the lock.
The utility of these semantics can be a little obscure until you've used them a bit. Consider the code in rr_condvar.c:
/*
 * CS170: rr_condvar.c
 * this code is a modification of race3.c to use condition variables to ensure
 * round robin scheduling
 */
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

int MyTurn = 0;     /* shared turn counter; only touched while holding the lock */

typedef struct
{
    pthread_mutex_t *lock;
    pthread_cond_t *wait;
    int id;
    int size;
    int iterations;
    char *s;
    int nthreads;
} Thread_struct;

void *infloop(void *x)
{
    int i, j, k;
    Thread_struct *t;

    t = (Thread_struct *) x;

    for (i = 0; i < t->iterations; i++)
    {
        /*
         * do try this at home
         */
        pthread_mutex_lock(t->lock);
        /* sleep (releasing the lock) until it is our turn */
        while((MyTurn % t->nthreads) != t->id)
        {
            pthread_cond_wait(t->wait,t->lock);
        }
        /* we hold the lock and it is our turn: fill and print s */
        for (j = 0; j < t->size-1; j++)
        {
            t->s[j] = 'A'+t->id;
            for(k=0; k < 50000; k++); /* delay loop */
        }
        t->s[j] = '\0';
        printf("Thread %d: %s\n", t->id, t->s);
        MyTurn++;   /* pass the turn to the next thread */
        pthread_cond_broadcast(t->wait);    /* wake every waiter to recheck MyTurn */
        pthread_mutex_unlock(t->lock);
    }
    return(NULL);
}

int
main(int argc, char **argv)
{
    pthread_mutex_t lock;
    pthread_cond_t wait;
    pthread_t *tid;
    pthread_attr_t *attr;
    Thread_struct *t;
    void *retval;
    int nthreads, size, iterations, i;
    char *s;

    if (argc != 4)
    {
        fprintf(stderr, "usage: race nthreads stringsize iterations\n");
        exit(1);
    }

    pthread_mutex_init(&lock, NULL);
    pthread_cond_init(&wait, NULL);
    nthreads = atoi(argv[1]);
    size = atoi(argv[2]);
    iterations = atoi(argv[3]);

    tid = (pthread_t *) malloc(sizeof(pthread_t) * nthreads);
    attr = (pthread_attr_t *) malloc(sizeof(pthread_attr_t) * nthreads);
    t = (Thread_struct *) malloc(sizeof(Thread_struct) * nthreads);
    s = (char *) malloc(sizeof(char) * size);   /* one shared buffer of size chars */

    for (i = 0; i < nthreads; i++)
    {
        t[i].nthreads = nthreads;
        t[i].id = i;
        t[i].size = size;
        t[i].iterations = iterations;
        t[i].s = s;
        t[i].lock = &lock;
        t[i].wait = &wait;
        pthread_attr_init(&(attr[i]));
        pthread_attr_setscope(&(attr[i]), PTHREAD_SCOPE_SYSTEM);
        pthread_create(&(tid[i]), &(attr[i]), infloop, (void *)&(t[i]));
    }
    for (i = 0; i < nthreads; i++)
    {
        pthread_join(tid[i], &retval);
    }
    return(0);
}
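Here, each thread blocks on the shared condition variable until another thread wakes it. Because every thread waits on the same condition variable, the code uses pthread_cond_broadcast(), which wakes all of the waiting threads; each one rechecks MyTurn, and all but the thread whose turn it is go back to sleep. Notice that each thread calls pthread_cond_wait() while holding the lock. If pthread_cond_wait() did not release the lock, the code would deadlock as soon as a thread tried to acquire the critical section out of order.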
how much more efficient? -- Here is an example of the performance difference: rr_mutex 40 30 3 ran in 48.4 seconds on ella.cs.ucsb.edu, while rr_condvar 40 30 3 took only 4.75 seconds. On homer.cs.ucsb.edu rr_mutex 40 30 3 took 10 minutes and 40 seconds, but rr_condvar 40 30 3 took a measly 2.8 seconds. I'm not exactly sure why the difference is that big, but there you have it. Condition variables are your friend.