The concept of semaphores as used in computer synchronization is due to the Dutch computer scientist Edsger Dijkstra. Semaphores have the advantage of being very simple, yet sufficient to construct just about any other synchronization function you would care to have; we will cover a few of them here. There are several versions of the semaphore idea in common use, and you may run into variants from time to time. The end of these notes briefly describes two of the most common, binary semaphores and the SYSV IPC semaphores.
A semaphore is an integer with a difference. Well, actually a few differences.
There are various ways that these operations are named and described, more or less interchangeably. This can be confusing, but such things happen in computer science when we try to use metaphors, especially multiple metaphors, to describe what a program is doing. Here are some:
typedef struct sem {
    int value;
    /* other stuff, e.g., a record of the threads blocked on this semaphore */
} *Sem;
There are two actions defined on semaphores: P(Sem s) and V(Sem s). (The book calls them P() and V().) P and V are the first letters of two Dutch words, proberen (to test) and verhogen (to increment), which, on balance, makes about as much (or as little) sense as any other set of monikers. The inventor of semaphores was Edsger Dijkstra, who was very Dutch.
initialize(Sem s, int i)
{
    s->value = i;
    return;
}
P(Sem s)
{
    s->value--;
    if (s->value < 0)
        block on semaphore
    return;
}
V(Sem s)
{
    s->value++;
    if (s->value <= 0)
        unblock one process or thread that is blocked on semaphore
    return;
}
You should understand these examples to be protected somehow from preemption, so that no other process could execute between the decrementing and testing of the semaphore value in the P() call, for instance.
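The pseudocode leaves "block" and "unblock" abstract. As a sketch only, assuming a POSIX system with Pthreads, here is one way to give Sem those semantics in C: a Pthreads mutex supplies the protection from interleaving described above, a condition variable supplies the blocking, and an extra wakeups counter (our addition, not part of the definition) records V() calls that have not yet released a blocked thread. The function names sem_create() and sem_free() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Counting semaphore with the P()/V() semantics above: when value is
 * negative, -value threads are blocked in P(). */
typedef struct sem {
    int value;
    int wakeups;            /* V()s that have not yet released a blocked P() */
    pthread_mutex_t lock;   /* makes each P() and V() atomic */
    pthread_cond_t cond;    /* blocked threads wait here */
} *Sem;

/* initialize(i), returning a freshly allocated semaphore */
Sem sem_create(int i)
{
    Sem s = malloc(sizeof *s);
    assert(s != NULL);
    s->value = i;
    s->wakeups = 0;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    return s;
}

void P(Sem s)
{
    pthread_mutex_lock(&s->lock);
    s->value--;
    if (s->value < 0) {
        /* block on semaphore; the loop guards against spurious wakeups */
        do {
            pthread_cond_wait(&s->cond, &s->lock);
        } while (s->wakeups < 1);
        s->wakeups--;
    }
    pthread_mutex_unlock(&s->lock);
}

void V(Sem s)
{
    pthread_mutex_lock(&s->lock);
    s->value++;
    if (s->value <= 0) {
        /* unblock one thread that is blocked on the semaphore */
        s->wakeups++;
        pthread_cond_signal(&s->cond);
    }
    pthread_mutex_unlock(&s->lock);
}

void sem_free(Sem s)
{
    pthread_mutex_destroy(&s->lock);
    pthread_cond_destroy(&s->cond);
    free(s);
}
```

The wakeups counter matters because a condition variable may wake a thread spuriously; without it, a thread could leave P() even though no V() was issued for it.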
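If you consider semaphores carefully, you might decide that they are like condition variables, except that they don't "lose" extra signals: a V() performed when no thread is waiting is remembered in the count, whereas a pthread_cond_signal() with no waiting thread has no effect. This is a good way to look at them, but not the only way.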
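To see the "remembered signal" behavior concretely, here is a small sketch that assumes Linux, where the POSIX unnamed semaphores in <semaphore.h> (sem_init(), sem_post(), sem_trywait()) are available. These are a separate POSIX facility, not part of the Pthreads API. The sketch posts twice before anyone waits and then counts how many non-blocking waits succeed; the function name remembered_signals() is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

int remembered_signals(void)
{
    sem_t s;
    int rc = sem_init(&s, 0, 0);     /* unnamed, shared between threads, value 0 */
    assert(rc == 0);

    sem_post(&s);                    /* V() with nobody waiting... */
    sem_post(&s);                    /* ...is recorded in the count */

    int passed = 0;
    while (sem_trywait(&s) == 0)     /* a P() that fails instead of blocking */
        passed++;
    assert(errno == EAGAIN);         /* the remembered signals are used up */

    sem_destroy(&s);
    return passed;
}
```

Both posts survive until someone waits, so two waits succeed and the third reports EAGAIN.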
As you can see, the definition of counting semaphores is simple. This has its advantages. For one thing, many hardware platforms provide primitive instructions that make the implementation easy and efficient. For another, there are no complications to confuse the programmer. As a result, with some care, solutions implemented with semaphores can have a clarity of purpose that keeps the code clean and minimizes the chances for bugs to creep in. It is critical to understand, however, that the semaphore operations P() and V() must be performed atomically. That is, the manipulation of the counters and the blocking and unblocking operations must be non-interruptible. Can you see why? If you can't immediately, spend a little more time on it, since the question makes an excellent test question.
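As an illustration of how a hardware primitive can make the test-and-decrement in P() indivisible, here is a sketch using C11 <stdatomic.h>, whose atomic_compare_exchange_weak() is typically compiled to the processor's compare-and-swap (or load-linked/store-conditional) instructions. It is a busy-waiting variant, not the blocking semaphore defined above: value never goes negative, and a thread that cannot proceed spins, yielding the CPU with POSIX sched_yield(), instead of being queued. The SpinSem type and the spin_* names are hypothetical.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

typedef struct {
    atomic_int value;       /* never negative in this variant */
} SpinSem;

void spin_init(SpinSem *s, int i)
{
    assert(i >= 0);
    atomic_init(&s->value, i);
}

void spin_P(SpinSem *s)
{
    int v = atomic_load(&s->value);
    for (;;) {
        /* Test and decrement in one indivisible step: the exchange succeeds
         * only if value still equals v; otherwise v is refreshed with the
         * current value and we try again. */
        if (v > 0 && atomic_compare_exchange_weak(&s->value, &v, v - 1))
            return;
        if (v <= 0) {
            sched_yield();                 /* give a V()-ing thread a chance */
            v = atomic_load(&s->value);
        }
    }
}

void spin_V(SpinSem *s)
{
    atomic_fetch_add(&s->value, 1);        /* never blocks */
}
```

If the check and the decrement were two separate steps, two threads could both see value == 1 and both decrement it, which is exactly the kind of interleaving the atomicity requirement rules out.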
Much of the research that went into the design of these primitives centered on how "elegantly" they solved different synchronization problems that appear to be common to many asynchronous systems. In addition to the bounded buffer problem, there are a few others.
For example, consider the design of a web server. You probably want one kind of thread to be responsible for reading a request from the network, checking its validity, making sure it is from an IP address you recognize, and so on, and a second kind of thread to be responsible for servicing the request. The request-checker thread must run before the servicer thread. To pull this off, your program needs synchronization that ensures the servicer thread will not try to service a request before a checker thread has checked it. Think of the situation as a bounded buffer problem where the buffer size is one: the request checker produces a request that the servicer must consume. A bit more generally, let's call the two threads A and B, and assume that the operation in thread A has to happen first. We can use a semaphore with an initial value of 0:
Initialization: sem = 0

| Thread A | Thread B |
|---|---|
| a1 statement | b1 sem.P() |
| a2 sem.V() | b2 statement |
Notice that the signal from A to B correctly implements the semantics you are hoping for, regardless of when A and B actually run. When you write threaded code, this type of reasoning is exactly what you must go through for each and every synchronization opportunity. If the outcome depends on the order in which the threads happen to execute, and the program is meant to be deterministic, you have a race condition.
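Here is the same signalling pattern as a runnable sketch, assuming Linux and the POSIX unnamed semaphores from <semaphore.h>, with sem_post() playing the role of V() and sem_wait() the role of P(). The shared integer stands in for a checked request; the names thread_a(), thread_b(), and run_signal_demo() are hypothetical. B is started first on purpose, so it usually reaches sem_wait() before A has signalled; the result is the same either way.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t sem;              /* the semaphore from the table, initialized to 0 */
static int request;            /* hypothetical data produced by thread A */
static int observed;           /* what thread B saw at statement b2 */

static void *thread_a(void *arg)
{
    (void)arg;
    request = 42;              /* a1: "check" the request */
    sem_post(&sem);            /* a2: sem.V() */
    return NULL;
}

static void *thread_b(void *arg)
{
    (void)arg;
    sem_wait(&sem);            /* b1: sem.P() blocks until A has signalled */
    observed = request;        /* b2: guaranteed to run after a1 */
    return NULL;
}

int run_signal_demo(void)
{
    pthread_t a, b;
    int rc = sem_init(&sem, 0, 0);
    assert(rc == 0);
    request = 0;
    observed = -1;
    pthread_create(&b, NULL, thread_b, NULL);   /* start B first on purpose */
    pthread_create(&a, NULL, thread_a, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    sem_destroy(&sem);
    return observed;
}
```

POSIX lists sem_post() and sem_wait() among the functions that synchronize memory, so B is guaranteed to see the value A wrote before signalling.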
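A slightly harder problem is a rendezvous: each thread must wait for the other to arrive, so that a4 cannot run before b1 has run, and b4 cannot run before a1 has run. A first attempt uses two semaphores:

Initialization: aArrived = 0, bArrived = 0

| Thread A | Thread B |
|---|---|
| a1 statement | b1 statement |
| a2 bArrived.P() | b2 aArrived.P() |
| a3 aArrived.V() | b3 bArrived.V() |
| a4 statement | b4 statement |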
What's wrong? Because each thread waits before it signals, neither thread can ever reach the point of signalling. This is a classic deadlock, and it always happens, so it's not even a race condition. We can fix it by switching the order of the signal and wait calls in either thread A or thread B. Here, we'll switch statements b2 and b3.
Initialization: aArrived = 0, bArrived = 0

| Thread A | Thread B |
|---|---|
| a1 statement | b1 statement |
| a2 bArrived.P() | b2 bArrived.V() |
| a3 aArrived.V() | b3 aArrived.P() |
| a4 statement | b4 statement |
This is better. Thread A may block without having signalled thread B, but thread B will eventually signal and wake up thread A, so things will proceed. But this solution is still not the best: on a single-processor system, the processor can end up switching between the two threads more often than is strictly necessary. We really should put the signal before the wait in both threads.
Initialization: aArrived = 0, bArrived = 0

| Thread A | Thread B |
|---|---|
| a1 statement | b1 statement |
| a2 aArrived.V() | b2 bArrived.V() |
| a3 bArrived.P() | b3 aArrived.P() |
| a4 statement | b4 statement |
Now we've got it right. We'll revisit this idea a bit later for the full-scale barrier problem.
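A runnable sketch of the final rendezvous, again assuming Linux and POSIX unnamed semaphores from <semaphore.h>. Each thread records whether its statement 4 saw the other thread's statement 1 already done; the flag variables and the function name rendezvous_once() are hypothetical instrumentation, not part of the pattern.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t aArrived, bArrived;
static int a1_done, b1_done;       /* set by statements a1 and b1 */
static int a4_saw_b1, b4_saw_a1;   /* what a4 and b4 observed */

static void *thread_a(void *arg)
{
    (void)arg;
    a1_done = 1;                   /* a1 statement */
    sem_post(&aArrived);           /* a2 aArrived.V() */
    sem_wait(&bArrived);           /* a3 bArrived.P() */
    a4_saw_b1 = b1_done;           /* a4 statement */
    return NULL;
}

static void *thread_b(void *arg)
{
    (void)arg;
    b1_done = 1;                   /* b1 statement */
    sem_post(&bArrived);           /* b2 bArrived.V() */
    sem_wait(&aArrived);           /* b3 aArrived.P() */
    b4_saw_a1 = a1_done;           /* b4 statement */
    return NULL;
}

/* Runs one rendezvous; returns 1 if each thread's statement 4 saw the
 * other thread's statement 1 already done. */
int rendezvous_once(void)
{
    pthread_t a, b;
    a1_done = b1_done = a4_saw_b1 = b4_saw_a1 = 0;
    int rc = sem_init(&aArrived, 0, 0);
    assert(rc == 0);
    rc = sem_init(&bArrived, 0, 0);
    assert(rc == 0);
    pthread_create(&a, NULL, thread_a, NULL);
    pthread_create(&b, NULL, thread_b, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    sem_destroy(&aArrived);
    sem_destroy(&bArrived);
    return a4_saw_b1 && b4_saw_a1;
}
```

Because each thread signals before it waits, neither can deadlock, and neither can reach statement 4 before the other has finished statement 1.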
In the previous two examples, the semaphores were initialized to zero, so that if the first operation on a semaphore was a P() call, the calling process would block. For a mutex, however, we want the first process to proceed, but to block any subsequent P() call until the first process uses V() to indicate that it is finished with the critical section of code. For that, we initialize the semaphore to one instead of zero. We'll call our semaphore mutex to show how we're using it; the balance calculation between P() and V() is the critical section.
Initialization: mutex = 1

| Thread A | Thread B |
|---|---|
| a1 mutex.P() | b1 mutex.P() |
| a2 wolski.balance = wolski.balance - 400 | b2 wolski.balance = wolski.balance - 400 |
| a3 mutex.V() | b3 mutex.V() |
The first thread to reach the P() call decrements mutex to 0 and proceeds into the critical section. If the other thread arrives at its P() call before the first one leaves the critical section, it decrements mutex to -1 and blocks. This second thread becomes unblocked when the first thread calls V(). This should all look very familiar, because it is exactly what our Pthreads mutexes were doing.
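The same idea as a runnable sketch, assuming Linux and a POSIX unnamed semaphore (<semaphore.h>) initialized to 1. Several threads repeatedly withdraw 400 from a shared balance; the thread and iteration counts, the balance variable standing in for wolski.balance, and run_withdrawals() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#define NTHREADS 4
#define WITHDRAWALS 10000

static sem_t mutex;          /* counting semaphore used as a mutex: starts at 1 */
static long balance;         /* hypothetical stand-in for wolski.balance */

static void *withdraw(void *arg)
{
    (void)arg;
    for (int i = 0; i < WITHDRAWALS; i++) {
        sem_wait(&mutex);               /* mutex.P() */
        balance = balance - 400;        /* critical section */
        sem_post(&mutex);               /* mutex.V() */
    }
    return NULL;
}

/* Starts NTHREADS withdrawing threads and returns the final balance. */
long run_withdrawals(long start)
{
    pthread_t t[NTHREADS];
    int rc = sem_init(&mutex, 0, 1);
    assert(rc == 0);
    balance = start;
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, withdraw, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&mutex);
    return balance;
}
```

Without the sem_wait()/sem_post() pair, concurrent read-modify-write updates could overwrite each other and the final balance would usually come out too high.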
Here's what the code looks like, in the format that we were using in the printer simulation lecture notes, if you make allowances for the liberties I'm taking by pretending that semaphores are provided by the Pthreads package.
/* NOTE: this code is notional only; the Pthreads package does not include
 * support for semaphores. They are imagined here for purposes of
 * presentation only. This code would never compile, let alone run.
 */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include "printqsim.h"
typedef struct {
    Job **jobs;
    int head;
    int tail;
    pthread_semaphore_t *headmutex;
    pthread_semaphore_t *tailmutex;
    pthread_semaphore_t *full;
    pthread_semaphore_t *empty;
} Buffer;
void initialize_state(SimParameters *p)
{
    Buffer *b;
    b = (Buffer *) malloc(sizeof(Buffer));
    b->jobs = (Job **) malloc(sizeof(Job *)*p->bufsize);
    b->head = 0;
    b->tail = 0;
    b->headmutex = (pthread_semaphore_t *) malloc(sizeof(pthread_semaphore_t));
    b->tailmutex = (pthread_semaphore_t *) malloc(sizeof(pthread_semaphore_t));
    b->full = (pthread_semaphore_t *) malloc(sizeof(pthread_semaphore_t));
    b->empty = (pthread_semaphore_t *) malloc(sizeof(pthread_semaphore_t));
    pthread_semaphore_init(b->headmutex, 1);
    pthread_semaphore_init(b->tailmutex, 1);
    pthread_semaphore_init(b->full, 0);
    pthread_semaphore_init(b->empty, p->bufsize);
    p->state = (void *) b;
}
void submit_job(Agent *s, Job *j)
{
    SimParameters *p;
    Buffer *b;
    /*
     * get the sim parameters from the agent
     */
    p = s->p;
    /*
     * get the queue from the sim parameters
     */
    b = (Buffer *) p->state;
    /*
     * wait until the job will fit
     */
    pthread_semaphore_P(b->empty);
    /*
     * insert it at the head; protect the head pointer
     */
    pthread_semaphore_P(b->headmutex);
    b->jobs[b->head] = j;
    b->head = (b->head + 1) % p->bufsize;
    pthread_semaphore_V(b->headmutex);
    /*
     * signal one additional slot has a job
     */
    pthread_semaphore_V(b->full);
    return;
}
Job *get_print_job(Agent *s)
{
    SimParameters *p;
    Buffer *b;
    Job *j;
    /*
     * get the sim parameters
     */
    p = s->p;
    /*
     * get the buffer from the parameters
     */
    b = (Buffer *)p->state;
    /*
     * wait for work
     */
    pthread_semaphore_P(b->full);
    /*
     * get the one at the tail; protect the tail pointer
     */
    pthread_semaphore_P(b->tailmutex);
    j = b->jobs[b->tail];
    b->tail = (b->tail + 1) % p->bufsize;
    pthread_semaphore_V(b->tailmutex);
    /*
     * signal an additional slot is empty
     */
    pthread_semaphore_V(b->empty);
    return j;
}
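For comparison, here is a runnable sketch of the same structure using real POSIX unnamed semaphores (<semaphore.h>, Linux assumed) in place of the imaginary pthread_semaphore_t. Since printqsim.h is not available here, Job * is replaced by a plain int, and the Buffer layout, the buffer_* functions, and the two-producer, two-consumer driver are hypothetical stand-ins for submit_job() and get_print_job().

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>

typedef struct {
    int *items;                 /* stands in for Job ** */
    int size;
    int head;                   /* next slot to fill (producers only) */
    int tail;                   /* next slot to empty (consumers only) */
    sem_t headmutex, tailmutex;
    sem_t full, empty;          /* counts of full and empty slots */
} Buffer;

Buffer *buffer_create(int size)
{
    Buffer *b = malloc(sizeof *b);
    assert(b != NULL);
    b->items = malloc(sizeof(int) * size);
    assert(b->items != NULL);
    b->size = size;
    b->head = b->tail = 0;
    sem_init(&b->headmutex, 0, 1);
    sem_init(&b->tailmutex, 0, 1);
    sem_init(&b->full, 0, 0);
    sem_init(&b->empty, 0, size);
    return b;
}

void buffer_put(Buffer *b, int item)        /* like submit_job() */
{
    sem_wait(&b->empty);                    /* wait until the item will fit */
    sem_wait(&b->headmutex);                /* protect the head index */
    b->items[b->head] = item;
    b->head = (b->head + 1) % b->size;
    sem_post(&b->headmutex);
    sem_post(&b->full);                     /* one more slot holds an item */
}

int buffer_get(Buffer *b)                   /* like get_print_job() */
{
    sem_wait(&b->full);                     /* wait for work */
    sem_wait(&b->tailmutex);                /* protect the tail index */
    int item = b->items[b->tail];
    b->tail = (b->tail + 1) % b->size;
    sem_post(&b->tailmutex);
    sem_post(&b->empty);                    /* one more slot is empty */
    return item;
}

void buffer_destroy(Buffer *b)
{
    sem_destroy(&b->headmutex);
    sem_destroy(&b->tailmutex);
    sem_destroy(&b->full);
    sem_destroy(&b->empty);
    free(b->items);
    free(b);
}

typedef struct { Buffer *b; int count; long sum; } Worker;

static void *producer(void *arg)
{
    Worker *w = arg;
    for (int i = 1; i <= w->count; i++)
        buffer_put(w->b, i);
    return NULL;
}

static void *consumer(void *arg)
{
    Worker *w = arg;
    for (int i = 0; i < w->count; i++)
        w->sum += buffer_get(w->b);
    return NULL;
}

/* Two producers each put 1..n into a 4-slot buffer; two consumers each take
 * n items. Returns the consumers' total, n * (n + 1) if nothing is lost. */
long run_buffer_demo(int n)
{
    Buffer *b = buffer_create(4);
    pthread_t t[4];
    Worker w[4];
    for (int i = 0; i < 4; i++) {
        w[i] = (Worker){ b, n, 0 };
        pthread_create(&t[i], NULL, i < 2 ? producer : consumer, &w[i]);
    }
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    long total = w[2].sum + w[3].sum;
    buffer_destroy(b);
    return total;
}
```

As in the notional version, producers touch only head and consumers only tail, so the two sides contend only on full and empty.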
The pattern in general looks like this:
Initialization: inputmutex = 1, outputmutex = 1, fullslots = 0, emptyslots = queue capacity

| Source threads | Consumer threads |
|---|---|
| s1 emptyslots.P() | c1 fullslots.P() |
| s2 inputmutex.P() | c2 outputmutex.P() |
| s3 add to queue | c3 remove from queue |
| s4 inputmutex.V() | c4 outputmutex.V() |
| s5 fullslots.V() | c5 emptyslots.V() |
You may notice that the while-loops, the if-then-else conditions and the condition variables that were present in the real Pthreads solution are gone from this solution, mostly because the semaphores take care of counting for us. Waiting on the b->full semaphore waits for a slot that has a job, as well as taking care to count the job as removed as soon as the process proceeds.
Another point to note about this solution is that user threads no longer share data with printer threads, and they don't use the same mutex semaphores. Accordingly, user threads and printer threads don't block each other except when the required resources (empty and full queue slots, respectively) are truly unavailable. Users insert jobs using the head pointer and protect it from concurrent access with the headmutex mutex, but that only blocks other user threads. The printer threads only use the tail pointer and protect it with the tailmutex mutex, but again that only blocks other printer threads. The count of jobs is kept implicitly in the empty and full semaphores, and the threads incrementing those semaphores never block, because V() is a non-blocking operation.
All these wonderful qualities of the solution do not relieve us of the responsibility to be careful about race conditions, however. You should convince yourself that this solution works. In doing so, it may help to first convince yourself of the following intermediate properties: the slots in the queue are filled and emptied in order; each decrement of empty is matched by an increment of full, and vice versa; the counts of full and empty slots never overstate the corresponding state of the queue; and temporarily understating them does no harm.
To build a barrier, which holds every thread until all of the threads have arrived, we'll need an integer to count threads, a mutex to protect the count, and a semaphore to use like the signalling pattern from the first example. The signal will be given only when all the threads have arrived at the barrier. We just have to be careful, or we'll get it wrong, as in this example:
Initialization: int count = thread_count; semaphore mutex = 1; semaphore barriersignal = 0

Thread code (run by every thread):

1 mutex.P()
2 count--
3 mutex.V()
4 if (count == 0):
5     barriersignal.V()
6 barriersignal.P()
7 barriersignal.V()
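So what's wrong this time? There are two problems. The first is that the semaphore barriersignal might be signalled only once, by the last thread to enter this code, but the barrier needs one signal for each thread that should pass. The second is a little more subtle. Because count is a shared global variable and line 4 reads it outside the critical section, more than one thread may read the same value, and reading zero is particularly troublesome. To see why, imagine the next-to-last thread coming through the critical section and decrementing count to 1. Say it exits the critical section and then gets descheduled due to preemption before reaching line 4. The last thread then decrements count to 0, and when the next-to-last thread resumes, it too sees count == 0, so both threads signal barriersignal. The fix is to copy the count into a local variable while still inside the critical section: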
Initialization: int global_count = thread_count; semaphore mutex = 1; semaphore barriersignal = 0

Thread code (run by every thread):

1 mutex.P()
2 global_count--
3 local_count = global_count
4 mutex.V()
5 if (local_count == 0):
6     global_count = thread_count
7     barriersignal.V()
8 else:
9     barriersignal.P()
10    barriersignal.V()
A P() followed immediately by a V(), as in lines 9 and 10, is called a turnstile, because it lets one thread through at a time. It's like a mutex with an empty critical section, used purely for traffic control. Each thread except the last one to decrement global_count calls P(). After each thread returns from its P() operation, it calls V() so that another thread can complete its P(). The last thread kicks off this sequence, though, by calling V(), indicating that the first, uh, P()ing thread can proceed.
Note that you can't reuse this barrier in a loop. Why? When the barrier is initialized, the barriersignal semaphore is set to 0. Notice, though, that P() will be called thread_count - 1 times, whereas V() will be called thread_count times. Thus after all threads have passed the barrier, the value of barriersignal will be 1, not 0. If the threads were to loop around and use this code again, the P() calls in the else branch would no longer hold back all but the last thread: the first thread to arrive would pass straight through.
Think about how you might fix this issue.
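The one-shot barrier can be checked with a runnable sketch, assuming Linux and POSIX unnamed semaphores (<semaphore.h>). The numbered comments follow the lines of the barrier code. The arrived counter and seen array are hypothetical instrumentation (they reuse the barrier's mutex for convenience), and sem_getvalue() confirms the leftover value of barriersignal discussed above. The names one_shot_barrier(), worker(), and run_barrier_once() are ours.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#define THREAD_COUNT 8

static sem_t mutex, barriersignal;
static int global_count;
static int arrived;                  /* hypothetical: threads that reached the barrier */
static int seen[THREAD_COUNT];       /* value of arrived each thread saw after passing */

static void one_shot_barrier(void)
{
    sem_wait(&mutex);                         /* 1 mutex.P() */
    global_count--;                           /* 2 */
    int local_count = global_count;           /* 3 read while holding the mutex */
    sem_post(&mutex);                         /* 4 mutex.V() */
    if (local_count == 0) {                   /* 5 last thread to arrive */
        global_count = THREAD_COUNT;          /* 6 */
        sem_post(&barriersignal);             /* 7 open the turnstile */
    } else {                                  /* 8 */
        sem_wait(&barriersignal);             /* 9 pass through the turnstile... */
        sem_post(&barriersignal);             /* 10 ...and let the next thread in */
    }
}

static void *worker(void *arg)
{
    int id = *(int *)arg;
    sem_wait(&mutex);
    arrived++;
    sem_post(&mutex);
    one_shot_barrier();
    sem_wait(&mutex);
    seen[id] = arrived;
    sem_post(&mutex);
    return NULL;
}

/* Runs the barrier once, checks that no thread passed early, and returns
 * the final value of barriersignal. */
int run_barrier_once(void)
{
    pthread_t t[THREAD_COUNT];
    int ids[THREAD_COUNT];
    int value;
    global_count = THREAD_COUNT;
    arrived = 0;
    sem_init(&mutex, 0, 1);
    sem_init(&barriersignal, 0, 0);
    for (int i = 0; i < THREAD_COUNT; i++) {
        ids[i] = i;
        pthread_create(&t[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < THREAD_COUNT; i++)
        pthread_join(t[i], NULL);
    for (int i = 0; i < THREAD_COUNT; i++)
        assert(seen[i] == THREAD_COUNT);      /* nobody got past the barrier early */
    sem_getvalue(&barriersignal, &value);
    sem_destroy(&mutex);
    sem_destroy(&barriersignal);
    return value;
}
```

The returned value is 1 (THREAD_COUNT posts against THREAD_COUNT - 1 waits), which is exactly why a second pass through the same code would let the first thread straight through.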
For one thing, the SYSV IPC semaphores are created in groups, and you can operate on more than one at a time atomically. This means that you can release (signal) one or more semaphores at the same time that you get (wait for) one or more others, and that none of this happens unless it all happens. You can do the same thing with the semaphores we have been discussing, but it is complicated to do it right.
For another thing, you can wait or signal by amounts other than 1 at a time. This is not common, but when you need it (for instance, to obtain or release more than one of a given resource), this feature is very handy.
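As a sketch of both features, assuming Linux (where semctl(2) requires the caller to define union semun), the following creates a private set of two semaphores with semget(), then uses a single semop() call to take three units from semaphore 0 and signal semaphore 1 together. It then repeats the same pair with the IPC_NOWAIT flag, so that it fails instead of blocking, when only two units remain; the whole call fails with EAGAIN and semaphore 1 is left untouched. The "free buffers" and "done" roles and the function name sysv_group_demo() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

union semun {                      /* caller-defined on Linux; see semctl(2) */
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Semaphore 0 counts free buffers (initially 5); semaphore 1 counts
 * completed requests (initially 0). Returns 0 on success, -1 on error. */
int sysv_group_demo(int *free_bufs, int *done, int *second_errno)
{
    int id = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    union semun arg;
    arg.val = 5;
    int rc = semctl(id, 0, SETVAL, arg);
    assert(rc == 0);
    arg.val = 0;
    rc = semctl(id, 1, SETVAL, arg);
    assert(rc == 0);

    /* Take 3 buffers AND signal "done" once, as one atomic operation. */
    struct sembuf ops[2] = {
        { .sem_num = 0, .sem_op = -3, .sem_flg = 0 },
        { .sem_num = 1, .sem_op = +1, .sem_flg = 0 },
    };
    if (semop(id, ops, 2) == -1) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }

    /* Try the same pair again: only 2 buffers remain, so with IPC_NOWAIT
     * the call fails and neither semaphore changes. */
    ops[0].sem_flg = IPC_NOWAIT;
    ops[1].sem_flg = IPC_NOWAIT;
    *second_errno = (semop(id, ops, 2) == -1) ? errno : 0;

    *free_bufs = semctl(id, 0, GETVAL);
    *done = semctl(id, 1, GETVAL);
    semctl(id, 0, IPC_RMID);           /* remove the whole set */
    return 0;
}
```

After the run, semaphore 0 holds 2 and semaphore 1 holds 1: the failed second call did not signal "done" even though that half of it could have succeeded on its own.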
Moreover, you are allowed to obtain the current value of the semaphore so you can decide whether your thread wishes to perform an operation. For example, you could decide to V() a semaphore only when enough threads have blocked. This feature is often very handy.
Finally, you can use semaphores in a non-blocking mode, getting an error condition back in the cases when the semaphore would block.
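A sketch of the last two features, again assuming Linux: semctl() with GETVAL reads a semaphore's value without changing it, GETNCNT reports how many processes are blocked waiting for the value to increase (the number you would check before deciding to V()), and a semop() whose sem_flg includes IPC_NOWAIT fails with errno set to EAGAIN rather than blocking. The helper names try_P() and sysv_nowait_demo() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

union semun {                     /* caller-defined on Linux; see semctl(2) */
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Non-blocking P(): returns 0 if it decremented the semaphore, or -1 with
 * errno == EAGAIN if an ordinary P() would have blocked. */
static int try_P(int id, unsigned short num)
{
    struct sembuf op = { .sem_num = num, .sem_op = -1, .sem_flg = IPC_NOWAIT };
    return semop(id, &op, 1);
}

/* Creates one semaphore with value 1, then P()s it twice without blocking.
 * Reports the value before and after, the number of processes waiting for
 * it to increase, and the errno of the second attempt. Returns -1 on error. */
int sysv_nowait_demo(int *before, int *after, int *waiting, int *second_errno)
{
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id == -1)
        return -1;
    union semun arg;
    arg.val = 1;
    int rc = semctl(id, 0, SETVAL, arg);
    assert(rc == 0);

    *before = semctl(id, 0, GETVAL);     /* read the value without changing it */
    int first = try_P(id, 0);            /* succeeds: 1 -> 0 */
    assert(first == 0);
    *second_errno = (try_P(id, 0) == -1) ? errno : 0;   /* would have blocked */
    *after = semctl(id, 0, GETVAL);
    *waiting = semctl(id, 0, GETNCNT);   /* processes blocked in P() */

    semctl(id, 0, IPC_RMID);
    return 0;
}
```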