6.033 2011 Lecture 7: Threads and Condition Variables

administrivia: quiz next friday

[first, finish up last lecture: how to implement acquire?]

topic: threads

last time: one CPU per program, locks to allow safe sharing of data structures
today: more programs than CPUs
  only one CPU (no busy looping!)
  or a few CPUs, many more programs
  also fewer programs than CPUs (CPUs may need to be idle!)
goal: virtualize the processor
  multiplex CPU among many "threads"

threads a fundamental O/S abstraction
  tool for concurrency -- doing more than one thing
  big influence on rest of O/S design

thread abstraction
  state of a runnable program
  two main operations:
    suspend - save cpu state -> mem
    resume - load state from mem -> cpu
  so CPU multiplexing == suspend X, resume Y, suspend Y, resume X
  what state needs to be saved?
  how to save state for suspend?
  how to resume from saved state?

send() from previous lecture illustrates why we want threads and multiplexing
  [send slide]
  loops waiting for BB to be non-full
  burns up a lot of CPU time
  if one CPU, maybe receive will never run!
  we'd like to let receive() run...

send w/ yield
  [send/receive slide]
  yield() gives up the CPU
    lets other threads run
    e.g. a receive() may have been waiting and called yield()
  someday yield() will return
    after other thread yield()s
    e.g. it tries to receive() but BB is empty

how to implement yield()?
  yield() is the guts of the thread implementation
    suspend one, resume another
  data:
    threads[] table: state, sp
    thread stack 0, stack 1, ...
    cpus[] table: thread
    t_lock (coarse granularity...)
  [yield version 1 slide]  (a C-like sketch follows the t_lock notes below)
  what happens in yield?
    send calls yield
    how does it know what thread is running?
      per-CPU register CPU() contains cpu #
      cpus[] says what's happening on each CPU
    RUNNING -> RUNNABLE
      RUNNING means some CPU is executing it now
      RUNNABLE means not executing, but could be
    save SP (the CPU register)
    look for a different thread to run
      ignore the RUNNING ones
    mark "new" thread as RUNNING, so no other CPU runs it
    restore saved SP of new thread
      that is, load it into CPU's SP register
    return
  Questions?

Example:
  cpu1                  cpu2
  ----                  ----
  A...                  B...
  yield()
    save A's SP
    set RUNNABLE
  runs C
                        yield()
                          save B's SP
                          SP = A's SP
                          set RUNNING
                          return
                        runs A

show:
  A's stack
    send()'s variables
    RA into send()
  B's stack
  threads[]
    A SP
    A state
    B SP
    B state

Questions:

Worth noting in example:
  enter yield() "in" one thread, return in another!
  2nd half of yield() really completes some *previous* yield()
  stack switch *is* the thread switch
  only need to save/switch SP, other state on stack
  saved PC on stack controls where yield() returns to

questions:
  this use of SP might not work, depends on compiler
    I'm assuming compiled code does not change SP in body of yield()
    so SP saved at start is still correct when restored at end
    SP manipulations in hand-coded machine code in real life
  what does t_lock protect?
    indivisible set of .state and .sp
    indivisible find RUNNABLE and mark it RUNNING
    don't let another CPU grab current stack until we've switched
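
a C-like sketch of yield() version 1 as just described -- illustrative only, not the slide code:
the table sizes and layouts, CPU(), and read_sp()/write_sp() are stand-ins for machine-dependent
things (real code hand-codes the SP part in assembly, as noted above), and acquire()/release()
are last lecture's spin lock written with a GCC test-and-set builtin so the sketch is self-contained.

  /* Sketch only: table layouts and helper names are illustrative.
     CPU(), read_sp(), write_sp() stand for machine-dependent operations;
     real code does the SP manipulation in hand-written assembly. */

  #define NTHREADS 64                     /* arbitrary sizes for the sketch */
  #define NCPUS    4

  enum state { RUNNING, RUNNABLE };

  struct thread { enum state state; unsigned long sp; };
  struct cpu    { int thread; };          /* index of the thread this CPU is running */
  struct lock   { volatile int locked; };

  struct thread threads[NTHREADS];
  struct cpu    cpus[NCPUS];
  struct lock   t_lock;                   /* one coarse lock for the whole thread table */

  int CPU(void);                          /* hypothetical: current CPU number   */
  unsigned long read_sp(void);            /* hypothetical: read the SP register */
  void write_sp(unsigned long sp);        /* hypothetical: load the SP register */

  /* last lecture's spin lock, using a GCC test-and-set builtin */
  void acquire(struct lock *l) {
      while (__sync_lock_test_and_set(&l->locked, 1) != 0)
          ;                               /* spin until we atomically flip 0 -> 1 */
  }
  void release(struct lock *l) {
      __sync_lock_release(&l->locked);    /* set locked back to 0 */
  }

  void yield(void)                        /* version 1 */
  {
      acquire(&t_lock);

      int id = cpus[CPU()].thread;        /* which thread is this CPU running?  */
      threads[id].state = RUNNABLE;       /* RUNNING -> RUNNABLE                */
      threads[id].sp = read_sp();         /* save my stack pointer              */

      do {                                /* find another RUNNABLE thread;      */
          id = (id + 1) % NTHREADS;       /* skip the RUNNING ones              */
      } while (threads[id].state != RUNNABLE);

      threads[id].state = RUNNING;        /* so no other CPU picks it           */
      cpus[CPU()].thread = id;
      write_sp(threads[id].sp);           /* the stack switch IS the thread switch */

      release(&t_lock);                   /* now runs on the new thread's stack;   */
  }                                       /* return address comes from that stack  */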

motivate notify / wait
  [send with yield slide]
  send() and receive() still chew up CPU time
    e.g. send() waits for receive to free up a slot in BB
    e.g. receive() waits for BB to be non-empty
    repeated yield() expensive if many threads waiting
  want send to suspend itself
  have receive wake it up when there is space
  i.e. suspend, wake up only when specific *condition* is true

"condition variable"
  cvar object names a condition that threads can wait for
  two methods:
    wait(cvar, lock)
      returns after some other thread calls notify(cvar)
      release lock
      mark thread WAITING
      yield() (until notify)
      acquire lock
    notify(cvar)
      wake up all threads currently in wait(cvar)
  notify has no memory: if no threads wait()ing, no effect at all
    wait() and then notify(): wait returns
    notify() and then wait(): wait does not return

each BB has two condition variables:
  bb.full (send waits on this if full)
  bb.empty (recv waits on this if empty)

[send with wait/notify]
  if full, waits; receive will someday free up a slot and notify(bb.full)
  waits in a loop, re-checks after wait returns
    maybe multiple senders waiting, but only one slot freed up
    that is, wait() returning is only a hint
    you always ought to explicitly check the condition
  notifies bb.empty in case one or more receives are waiting
    no harm if no-one is waiting
  holds lock across while test and use of buffer
    so no other send() can sneak in and steal buffer[] slot
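
a sketch of send()/receive() with wait/notify on the bounded buffer, matching the description
above -- field and type names (in, out, buf, lock, full, empty, struct cvar) are illustrative
stand-ins, not the slide code; wait()/notify() behave as specified above (wait atomically gives
up bb.lock, sleeps, and re-acquires it before returning), and their implementation is sketched
at the end of these notes.

  /* Bounded buffer with two condition variables, as described above.
     Field names are illustrative.  Uses struct lock / acquire / release
     from the earlier sketch; wait() and notify() are sketched at the end
     of these notes. */

  #define N 16                            /* buffer capacity */

  struct cvar { int unused; };            /* a cvar only names a condition; the
                                             waiters are recorded in threads[].cvar */

  void wait(struct cvar *cv, struct lock *l);
  void notify(struct cvar *cv);

  struct bb {
      long in, out;                       /* # messages sent / received so far */
      int  buf[N];
      struct lock lock;
      struct cvar full;                   /* senders wait here when buffer is full    */
      struct cvar empty;                  /* receivers wait here when buffer is empty */
  };

  void send(struct bb *bb, int msg)
  {
      acquire(&bb->lock);
      while (bb->in - bb->out == N)       /* full?  re-check after every wait:   */
          wait(&bb->full, &bb->lock);     /* wait() returning is only a hint     */
      bb->buf[bb->in % N] = msg;          /* lock still held: nobody steals slot */
      bb->in = bb->in + 1;
      notify(&bb->empty);                 /* in case receives are waiting        */
      release(&bb->lock);
  }

  int receive(struct bb *bb)
  {
      acquire(&bb->lock);
      while (bb->in == bb->out)           /* empty? */
          wait(&bb->empty, &bb->lock);
      int msg = bb->buf[bb->out % N];
      bb->out = bb->out + 1;
      notify(&bb->full);                  /* a sender may be waiting for space   */
      release(&bb->lock);
      return msg;
  }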

why does wait() release bb.lock?  why not have send() release it?
  i.e. why not:
    while True:
      if bb.in - bb.out < N:
        ...
      release
      wait(bb.full)
      acquire
  notify might occur between release and wait
    no effect, since no threads waiting at that point
  then send()'s wait() won't return, even though there's space for a msg!
  this is the "lost notify" problem

avoiding lost notifies
  wait(cvar, lock)
    caller must hold the lock
    wait() atomically releases lock and marks thread as waiting
      so no notify can intervene
    re-acquires lock before returning (not critical, just for symmetry)
  notify(cvar)
    caller must notify() after changing data structure
    of course, must hold the lock while changing data structure
    this ensures that, if data structure changed after check+wait(), notify will happen after as well
  so, implicitly, a condition variable is always associated with a particular lock

implementing wait  (a sketch of wait/notify and the revised yield appears at the end of these notes)
  thread table additions:
    new state: WAITING
    threads[].cvar (so notify can find us)
  big Q: where to release the lock?
  [wait() slide]
    acquire t_lock first, then release the lock, then WAITING
    b.t.w. need to modify yield()
  [wait+notify() slide]
    notify() called after updating bb; notify() acquires t_lock
  consider possible cases between send() and receive() threads
    if receive() frees up space before send() does check+wait:
      send()'s acquire waits until receive is done
      send() will see empty slot and not wait
    if after:
      receive()'s notify() will run after send went to sleep
      notify() will see WAITING send thread, and mark it RUNNABLE
    lock ensures receive() cannot free up space between send's check+wait

but now we must revisit yield()
  [yield v1 again]
  wait() acquires t_lock and sets state, so delete that from yield()
  yield might find there is nothing RUNNABLE!!! (harder)
    this thread WAITING, but receive() running on another CPU
    yield() loops forever while holding t_lock
    so no other CPU can execute notify()
    so no thread will ever be RUNNABLE
    system will hang
  how to fix yield()?

[yield version 2 slide]
  don't acq/rel t_lock, don't set state to RUNNABLE
  release+acquire in "idle loop"
  still spins indefinitely while no runnable threads
  BUT lets other CPUs execute notify()
  note I've also set the SP to a per-CPU stack, before the idle loop
    why?
    yield() v1 runs idle loop on calling thread's stack
    someone might notify() it
    some other CPU in idle loop might run the thread
    now two CPUs are executing on the same stack
      e.g. both calling functions like acquire, which modify the stack
    thus a per-CPU stack for yield() to use when not in any thread

--- for next lecture ---

pre-emptive scheduling
  what if a thread never calls yield()?
    we are in trouble, no way to multiplex that CPU
    compute-bound, or long code paths, or broken user programs
  too annoying to require programmer to insert yield()s
  we want forcible periodic yield
how to pre-empt?
  timer h/w generates an interrupt 100 times per second
  interrupt saves state on thread's stack, calls yield()
  timer interrupt:
    push PC
    push registers
    acquire(t_lock)
    threads[...].state = RUNNABLE
    yield()
    release(t_lock)
    pop registers
    pop PC
  what actually happens?
    yield() switches thread stacks
    so interrupt returns to a different thread
  what if timer interrupt while you are in yield already?
    would call yield recursively
    deadlock: already holding t_lock
    acquire should disable interrupts
    release should re-enable
    not just for here, but for all uses of locks
  what if timer interrupt after idle loop releases t_lock?
    again, recursive yield()
    but invalid cpus[CPU()].thread
    so fix yield() to null out .thread
    and fix timer interrupt to yield only if valid .thread

closing
  threads at the heart of most operating systems
  allow lots of concurrent activities on few CPUs
  e.g. let you run multiple programs
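
--- sketch referenced above: wait / notify / yield version 2 ---

same illustrative style as the earlier sketches, reusing their declarations (struct lock,
struct cvar, t_lock, acquire/release, CPU(), read_sp(), write_sp(), NTHREADS); names are
stand-ins, not the slide code.  the compiler/SP caveats above apply even more strongly here
(real code hand-codes the stack switch), and the "for next lecture" fixes (interrupt-disabling
acquire, nulling out cpus[].thread in the idle loop) are omitted.

  /* Revised versions of the earlier table declarations: a WAITING state,
     a per-thread cvar field, and a per-CPU stack for yield()'s idle loop. */

  enum state { RUNNING, RUNNABLE, WAITING };

  struct thread {
      enum state state;
      unsigned long sp;
      struct cvar *cvar;                  /* condition we are waiting on, if WAITING  */
  };

  struct cpu {
      int thread;                         /* index of the thread this CPU is running  */
      unsigned long stack;                /* top of a private stack for the idle loop */
  };

  void wait(struct cvar *cv, struct lock *lock)
  {
      acquire(&t_lock);                   /* t_lock first, ...                        */
      release(lock);                      /* ... then give up the caller's lock       */
      int id = cpus[CPU()].thread;
      threads[id].cvar = cv;              /* so notify() can find us                  */
      threads[id].state = WAITING;
      yield();                            /* called with t_lock held; returns once a  */
                                          /* notify() has made us RUNNABLE and some   */
                                          /* CPU has resumed us                       */
      release(&t_lock);
      acquire(lock);                      /* re-acquire before returning              */
  }

  void notify(struct cvar *cv)
  {
      acquire(&t_lock);
      for (int i = 0; i < NTHREADS; i++)  /* wake ALL threads currently in wait(cv)   */
          if (threads[i].state == WAITING && threads[i].cvar == cv)
              threads[i].state = RUNNABLE;
      release(&t_lock);
  }

  void yield(void)                        /* version 2: caller holds t_lock and has   */
  {                                       /* already set this thread's state          */
      int id = cpus[CPU()].thread;
      threads[id].sp = read_sp();         /* save this thread's SP                    */
      write_sp(cpus[CPU()].stack);        /* idle loop runs on the per-CPU stack, so  */
                                          /* no other CPU can share our old stack     */
                                          /* (caveat: assumes id stays in a register; */
                                          /* real code is hand-written assembly)      */
      do {
          release(&t_lock);               /* let other CPUs run notify() ...          */
          acquire(&t_lock);               /* ... then re-check the thread table       */
          id = (id + 1) % NTHREADS;
      } while (threads[id].state != RUNNABLE);

      threads[id].state = RUNNING;
      cpus[CPU()].thread = id;
      write_sp(threads[id].sp);           /* switch to the chosen thread's stack;     */
                                          /* we hold t_lock, and the resumed code's   */
                                          /* next step (in wait() or the timer        */
                                          /* handler) is release(t_lock)              */
  }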