Controlling the Order of Execution in Nachos

Many bugs in concurrent code are dependent on the order in which threads happen to execute at runtime. Sometimes the program will run fine; other times it will crash out of the starting gate. A program that works one time may fail the next time because the system happened to run the threads in a different order. The exact interleaving may depend on all sorts of factors beyond your control, such as the OS scheduling policies, the exact timing of external events, and the phases of the moon. The Nachos labs require you to write a lot of properly synchronized code, so it is important to understand how to test your code and make sure that it is solid.

Context Switches

On a multiprocessor, the executions of threads running on different processors may be arbitrarily interleaved, and proper synchronization is even more important. In Nachos, which is uniprocessor-based, interleavings are determined by the timing of context switches from one thread to another. On a uniprocessor, properly synchronized code should work no matter when and in what order the scheduler chooses to run the threads on the ready list. The best way to find out whether your code is "properly synchronized" is to run it repeatedly in ways that force as many different interleavings as possible and see whether it breaks. To experiment with different interleavings, you must somehow control when the executing program makes context switches.

Context switches can be either voluntary or involuntary. Voluntary context switches occur when the thread that is running explicitly calls Thread::Yield or some other routine that causes the scheduler to switch to another thread. Note that the thread must be running within the Nachos kernel in order to make a voluntary context switch. A thread running in the kernel might initiate a voluntary switch for any of a number of reasons, e.g., as part of an implementation of some higher level facility, or maybe the programmer was just being nice.

In contrast, involuntary context switches occur when the inner Nachos modules (Machine and Thread) decide to switch to another thread all by themselves. In a real system, this might happen when a timer interrupt signals that the current thread is hogging the CPU. Nachos does involuntary context switches by taking an interrupt from a simulated timer, and calling (you guessed it) Thread::Yield when the timer interrupt handler returns.

Voluntary Context Switches with Thread::Yield

One way to test concurrent code is to pepper it with voluntary context switches by explicitly calling Thread::Yield at various interesting points in the execution. These voluntary context switches emulate what would happen if the system just happened to do an involuntary context switch via a timer interrupt at that exact point.
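To see why the placement of a yield matters, here is a minimal sketch of a lost-update race. It is a standalone simulation, not Nachos code: the Worker struct, the runIncrementers function, and the round-robin "scheduler" inside it are all hypothetical names invented for illustration. Each simulated thread increments a shared counter in two steps (load, then store), and the yieldBetween flag stands in for a Thread::Yield call placed between those two steps.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Standalone simulation (hypothetical names, not part of Nachos).
// Each worker increments a shared counter in two separate steps.
struct Worker {
    int step = 0;   // 0 = about to load, 1 = about to store, 2 = finished
    int local = 0;  // private copy of the shared counter
};

// Runs two workers to completion and returns the final counter value.
// If yieldBetween is true, a worker gives up the CPU after its load and
// before its store, as an explicit yield at that point would.
int runIncrementers(bool yieldBetween) {
    int shared = 0;
    std::vector<Worker> workers(2);
    std::size_t current = 0;
    int finished = 0;

    while (finished < 2) {
        Worker &w = workers[current];
        if (w.step == 0) {
            w.local = shared;                 // load the shared value
            w.step = 1;
            if (yieldBetween) {
                current = (current + 1) % workers.size();  // simulated yield
            }
        } else if (w.step == 1) {
            shared = w.local + 1;             // store the incremented value
            w.step = 2;
            ++finished;
            current = (current + 1) % workers.size();  // done; run the other
        } else {
            current = (current + 1) % workers.size();  // skip finished worker
        }
    }
    return shared;
}
```

In this sketch, runIncrementers(false) returns 2, while runIncrementers(true) returns 1: both workers load 0 before either stores, so one increment is lost. An unsynchronized read-modify-write in real kernel code can fail the same way when a yield lands between the read and the write.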

Properly synchronized concurrent code should run correctly no matter where the yields happen to occur. At the lowest levels of the system, there is some code that absolutely cannot tolerate an unplanned context switch, e.g., the context switch code itself. This code protects itself by calling a low-level primitive to disable timer interrupts. However, you should be able to put an explicit call to Thread::Yield anywhere that interrupts are enabled, without causing your code to fail in any way.

Involuntary Context Switches with the -rs Flag

To aid in testing, Nachos has a facility that causes involuntary context switches to occur in a way that is hard to predict but fully repeatable. The -rs command line flag causes Nachos to call Thread::Yield on your behalf at semi-random times. The exact interleaving of threads in a given Nachos program is determined by the value of the "seed" passed to -rs. You can force different interleavings to occur by using different seed values, but any behavior you see will be repeated if you run the program again with the same seed value. Using -rs with various seed values is an effective way to force different orderings to occur deterministically.
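The idea of a seed-determined interleaving can be illustrated with a small standalone simulation. This is not how Nachos implements -rs; the interleaving function and its coin-flip policy are assumptions made up for this sketch. Two simulated threads, A and B, each run a fixed number of steps, and after every step a seeded generator decides whether to switch to the other thread.

```cpp
#include <cassert>
#include <random>
#include <string>

// Standalone simulation (hypothetical, not the Nachos -rs implementation).
// Returns a trace such as "AABAB..." recording which thread ran each step.
std::string interleaving(unsigned seed, int steps) {
    std::mt19937 rng(seed);             // same seed => same sequence of choices
    int remaining[2] = {steps, steps};  // steps left for thread A and thread B
    int current = 0;                    // 0 = thread A, 1 = thread B
    std::string trace;

    while (remaining[0] > 0 || remaining[1] > 0) {
        if (remaining[current] == 0) {  // current thread is done; run the other
            current = 1 - current;
            continue;
        }
        trace += (current == 0 ? 'A' : 'B');
        --remaining[current];
        if (rng() % 2 == 1) {           // coin flip: context switch or not
            current = 1 - current;
        }
    }
    return trace;
}
```

Calling interleaving twice with the same seed produces the same trace, which is what makes a failure found under one seed reproducible. Different seeds will generally produce different traces, although nothing guarantees that any two particular seeds differ.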

In theory, the -rs flag causes Nachos to decide whether or not to do a context switch after each and every instruction executes. The truth is that -rs won't help much, if at all, for the first few assignments. The problem is that Nachos only makes these choices for instructions executing on the simulated machine, i.e., "user-mode" code in later assignments. In the synchronization assignments, all of the code is executing within the Nachos "kernel". Nachos may still interrupt kernel-mode threads "randomly" if -rs is used, but these interrupts can only occur at well-defined times: as it turns out, they can happen only when the code calls a routine to re-enable interrupts on the simulated machine. Thus -rs may change behavior slightly, but many other damaging interleavings can be forced if you add explicit calls to Thread::Yield in your code.
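The restriction described above can be modeled loosely as follows. This is a simplified standalone simulation, not Nachos code: SimCpu and all of its members are hypothetical, and the real Nachos interrupt machinery differs in detail. The sketch shows only the general pattern: a simulated timer tick that arrives while interrupts are disabled does not cause a context switch until interrupts are re-enabled.

```cpp
#include <cassert>

// Standalone simulation (hypothetical names, not part of Nachos).
struct SimCpu {
    bool interruptsEnabled = true;
    bool switchPending = false;  // a tick arrived while interrupts were off
    int switches = 0;            // context switches actually taken

    // Simulated timer interrupt: switch now if allowed, otherwise defer.
    void timerTick() {
        if (interruptsEnabled) {
            ++switches;
        } else {
            switchPending = true;
        }
    }

    void disableInterrupts() { interruptsEnabled = false; }

    // Re-enabling interrupts delivers any deferred tick at this point.
    void enableInterrupts() {
        interruptsEnabled = true;
        if (switchPending) {
            switchPending = false;
            ++switches;
        }
    }
};

// Runs a region with interrupts disabled and returns how many context
// switches happened while the region was executing.
int switchesInsideCriticalRegion(SimCpu &cpu) {
    cpu.disableInterrupts();
    int before = cpu.switches;
    cpu.timerTick();                     // tick arrives mid-region; deferred
    int inside = cpu.switches - before;  // still zero at this point
    cpu.enableInterrupts();              // the deferred switch happens here
    return inside;
}
```

In this model no switch occurs inside the protected region, and exactly one occurs at the moment interrupts are re-enabled. That is why, for kernel-only code, random interrupts alone explore far fewer interleavings than explicit yields placed throughout code that runs with interrupts enabled.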


Jeff Chase