Controlling the Order of Execution in Nachos
Many bugs in concurrent code are dependent on the order in which
threads happen to execute at runtime. Sometimes the program will run
fine; other times it will crash out of the starting gate. A program
that works one time may fail the next time because the system happened
to run the threads in a different order. The exact interleaving may
depend on all sorts of factors beyond your control, such as the OS
scheduling policies, the exact timing of external events, and the
phases of the moon. The Nachos labs require you to write a lot of
properly synchronized code, so it is important to understand how to
test your code and make sure that it is solid.
Context Switches
On a multiprocessor, the executions of threads running on different processors may
be arbitrarily interleaved, and proper synchronization is even more important. In
Nachos, which is uniprocessor-based, interleavings are determined by the timing of
context switches from one thread to another.
On a uniprocessor, properly synchronized code should work no matter when and in what order the
scheduler chooses to run the threads on the ready list.
The best way to find out if your code
is "properly synchronized" is to see if it breaks when you run it repeatedly in a way that
exhaustively forces all possible interleavings
to occur.
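
To make "all possible interleavings" concrete, here is a minimal sketch in standalone C++. It is a hypothetical model, not Nachos code: the names Sim, Step, Explore, and AllFinalValues are invented for illustration. Two simulated threads each increment a shared counter as a separate load and store, and the program enumerates every order in which those four steps can run.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model (not part of Nachos): two "threads" each increment a
// shared counter in two separate steps, a load followed by a store.
struct Sim {
    int shared = 0;        // the shared counter
    int reg[2] = {0, 0};   // each thread's private copy of the counter
    int pc[2] = {0, 0};    // next step per thread: 0 = load, 1 = store, 2 = done
};

// Execute the next step of thread t.
void Step(Sim &s, int t) {
    assert(s.pc[t] < 2);
    if (s.pc[t] == 0) {
        s.reg[t] = s.shared;        // load
    } else {
        s.shared = s.reg[t] + 1;    // store
    }
    s.pc[t]++;
}

// Try every possible choice of "which thread runs next" and record the
// final counter value reached by each complete schedule.
void Explore(Sim s, std::vector<int> &finals) {
    if (s.pc[0] == 2 && s.pc[1] == 2) {
        finals.push_back(s.shared);
        return;
    }
    for (int t = 0; t < 2; t++) {
        if (s.pc[t] < 2) {
            Sim next = s;           // copy the state so the branches stay independent
            Step(next, t);
            Explore(next, finals);
        }
    }
}

// Final counter values for all schedules of the two threads.
std::vector<int> AllFinalValues() {
    std::vector<int> finals;
    Explore(Sim{}, finals);
    return finals;
}
```

In this toy model there are six schedules, and four of them lose an update (the counter ends at 1 instead of 2). Real code has far too many interleavings to enumerate this way, which is why the techniques below sample them instead.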
To experiment with different interleavings,
you must somehow control when the executing program makes context switches.
Context switches can be either voluntary or involuntary. Voluntary context
switches occur when the thread that is running explicitly calls Thread::Yield
or some other routine that causes the scheduler to switch to another thread.
Note that the thread must be
running within the Nachos kernel in order to make a voluntary context switch. A thread running in the
kernel might initiate a voluntary switch
for any of a number of reasons, e.g., as part of
an implementation of some higher-level facility, or maybe because the programmer was just being nice.
In contrast, involuntary context switches occur when the inner Nachos modules
(Machine and Thread)
decide to switch to another thread all by themselves. In a real system,
this might happen when a timer interrupt signals that the current thread is hogging
the CPU. Nachos performs involuntary context switches by taking an interrupt from a
simulated timer and calling (you guessed it) Thread::Yield when the timer
interrupt handler returns.
Voluntary Context Switches with Thread::Yield
One way to test concurrent code is to pepper it with voluntary context switches
by explicitly calling Thread::Yield at various interesting points in the
execution.
These voluntary context switches emulate what would happen if the
system just happened to do an involuntary context switch via a timer
interrupt at that exact point.
Properly synchronized concurrent code should run
correctly no matter where the yields happen to occur.
At the lowest levels of the system, there is some code that absolutely cannot
tolerate an unplanned context switch,
e.g., the context switch code itself. This code protects
itself by calling a low-level primitive to disable timer interrupts.
However, you should be able to put an explicit call to Thread::Yield
anywhere that interrupts are enabled, without causing your code
to fail in any way.
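
The sketch below illustrates how a single well-placed yield can expose a synchronization bug. It is standalone C++ rather than Nachos code: ToyThread, RunToCompletion, and RunConsumerTest are hypothetical names, and the scheduler is a simplified stand-in that runs one thread at a time and switches only at yield points. Two consumer threads each check whether an item is available and then take it, with no lock around the check and the take.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

// Hypothetical stand-in for a kernel thread: a list of steps, where an empty
// std::function marks a voluntary yield point.
struct ToyThread {
    std::vector<std::function<void()>> steps;
    std::size_t next = 0;
};

// Simulate one CPU: the running thread keeps the CPU until it reaches a
// yield point or finishes; a thread that yields goes to the back of the
// ready list.
void RunToCompletion(std::vector<ToyThread> &threads) {
    std::deque<ToyThread *> ready;
    for (ToyThread &t : threads) ready.push_back(&t);
    while (!ready.empty()) {
        ToyThread *cur = ready.front();
        ready.pop_front();
        while (cur->next < cur->steps.size()) {
            std::function<void()> &step = cur->steps[cur->next++];
            if (!step) break;       // yield point: give up the CPU
            step();
        }
        if (cur->next < cur->steps.size()) ready.push_back(cur);
    }
}

// Two unsynchronized consumers share one item. Each checks for an item and
// then takes it; the flag controls whether a yield sits between the two.
int RunConsumerTest(bool yieldAfterCheck) {
    int items = 1;
    bool sawItem[2] = {false, false};
    std::vector<ToyThread> threads(2);
    for (int i = 0; i < 2; i++) {
        threads[i].steps.push_back([&items, &sawItem, i] { sawItem[i] = items > 0; });
        if (yieldAfterCheck) threads[i].steps.push_back(std::function<void()>());
        threads[i].steps.push_back([&items, &sawItem, i] { if (sawItem[i]) items--; });
    }
    RunToCompletion(threads);
    return items;                   // a negative count means an item was taken twice
}
```

Without the yield, the second consumer sees an empty buffer and the count ends at 0. With the yield, both consumers pass the check before either takes the item, and the count ends at -1. In Nachos itself the corresponding experiment would be a call to Thread::Yield between the check and the update.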
Involuntary Context Switches with the -rs Flag
To aid in testing, Nachos has a facility that causes involuntary
context switches to occur in a repeatable but unpredictable way. The
-rs command line flag causes Nachos to call
Thread::Yield on your behalf at semi-random times. The exact
interleaving of threads in a given Nachos program is determined by the
value of the "seed" passed to -rs. You can force different
interleavings to occur by using different seed values, but any
behavior you see will be repeated if you run the program again with the same seed value.
Using -rs with various argument values is an effective way to
force different orderings to occur deterministically.
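
The idea of "repeatable but unpredictable" scheduling can be sketched with a seeded pseudo-random generator. This is an illustrative standalone model only; it makes no claim about how Nachos implements -rs internally, and the names Lcg and RandomSchedule are hypothetical. A seed determines which of two simulated threads, A or B, runs at each step.

```cpp
#include <cstdint>
#include <string>

// Small linear congruential generator, used here so that a given seed
// produces the same sequence on every platform. (Illustrative only.)
struct Lcg {
    std::uint32_t state;
    std::uint32_t Next() {
        state = state * 1664525u + 1013904223u;   // unsigned arithmetic wraps mod 2^32
        return state >> 16;                       // use the higher-order bits
    }
};

// Build a schedule for two threads, A and B, that each have stepsPerThread
// steps to run. At every step the generator picks which thread goes next;
// if the chosen thread has already finished, the other one runs instead.
std::string RandomSchedule(std::uint32_t seed, int stepsPerThread) {
    Lcg rng{seed};
    int left[2] = {stepsPerThread, stepsPerThread};
    std::string trace;
    while (left[0] > 0 || left[1] > 0) {
        int t = static_cast<int>(rng.Next() & 1u);
        if (left[t] == 0) t = 1 - t;
        left[t]--;
        trace.push_back(t == 0 ? 'A' : 'B');
    }
    return trace;
}
```

Calling RandomSchedule twice with the same seed returns the same trace, so a failure seen under one seed can be replayed for debugging, while other seeds will generally produce other traces.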
In theory, the -rs flag causes Nachos to decide whether or not
to do a context switch after each and every instruction executes.
The truth is that -rs won't help much, if at all, for the first few assignments.
The problem is that Nachos only makes these choices for instructions executing on
the simulated machine, i.e., "user-mode" code in later assignments.
In the synchronization assignments, all of the code is executing within the Nachos "kernel".
Nachos may still interrupt kernel-mode threads "randomly" if -rs is
used, but these interrupts can only occur at well-defined times: as it turns
out, they can happen only when the code
calls a routine to re-enable interrupts on the simulated machine. Thus
-rs may change behavior slightly, but many
other damaging interleavings can be forced if you add explicit calls
to Thread::Yield in your code.
Jeff Chase