6.005 — Software Construction
#### Software in 6.005
| Safe from bugs | Easy to understand | Ready for change |
|---|---|---|
| Correct today and correct in the unknown future. | Communicating clearly with future programmers, including future you. | Designed to accommodate change without rewriting. |
#### Objectives

After reading the notes and examining the code for this class, you should be able to implement a threadsafe datatype using synchronization, and be able to use message passing (with synchronous queues) instead of shared memory for communication between threads.

The material in this reading is inspired by an excellent book: Brian Goetz et al., *[Java Concurrency in Practice](http://jcip.net/)*, Addison-Wesley, 2006.

## Pairing two models for concurrency with two strategies for thread safety

In our introduction to concurrency, we saw [two models for concurrent programming](../17-concurrency/#two_models_for_concurrent_programming): *shared memory* and *message passing*.
*(figure: multiprocessor shared memory)*
In the **shared memory** model, concurrent modules interact by reading and writing shared objects in memory. Creating multiple threads inside a single Java process is our primary example of shared-memory concurrency.

In the **message passing** model, concurrent modules interact by sending messages to one another over a communication channel.
*(figure: network message passing)*
The [client/server pattern](../19-sockets-networking/#clientserver_design_pattern) uses message passing. Clients and servers run concurrently, often on different machines, and the communication channel is a [network socket](../19-sockets-networking/#network_sockets). The message passing model has many advantages over shared-memory concurrency. In this reading, we'll discuss how to implement message passing within a single process, as opposed to between processes over the network.

We [defined thread safety](../18-thread-safety/#what_threadsafe_means) for a data type or a function as *behaving correctly when used from multiple threads, regardless of how those threads are executed, without additional coordination*. Here's the general principle: **the correctness of a concurrent program should not depend on accidents of timing**.

To achieve that correctness, we enumerated [four strategies for making code safe for concurrency](../18-thread-safety/):

1. [**Confinement**](../18-thread-safety/#strategy_1_confinement): don't share data between threads.
2. [**Immutability**](../18-thread-safety/#strategy_2_immutability): make the shared data immutable.
3. [**Use existing threadsafe data types**](../18-thread-safety/#strategy_3_using_threadsafe_data_types): use a data type that does the coordination for you.
4. **Synchronization**: prevent threads from accessing the shared data at the same time. This is what we use to implement a threadsafe type, but we didn't discuss it at the time.

This reading has two topics:

+ Using **synchronization** to implement your own data type that is **safe for shared-memory concurrency**.
+ Using **blocking queues** (an existing threadsafe type) to implement **message passing** between threads within a process.

## [Part 1: Synchronization](synchronization/)

## [Part 2: Message Passing with Threads & Queues](message-passing/)

## Concurrency in practice

What strategies are typically followed in real programs?
+ **Library data structures** either use no synchronization (to offer high performance to single-threaded clients, while leaving it to multithreaded clients to add locking on top) or the monitor pattern.

+ **Mutable data structures with many parts** typically use either coarse-grained locking or thread confinement. Most graphical user interface toolkits follow one of these approaches, because a graphical user interface is basically a big mutable tree of mutable objects. Java Swing, the toolkit we'll be looking at in an upcoming class, uses thread confinement. Only a single dedicated thread is allowed to access Swing's tree; other threads have to pass messages to that dedicated thread in order to access the tree.

+ **Search** often uses immutable datatypes. Our [Boolean formula satisfiability search](../11-recursive-data-types/#another_example_boolean_formulas) would be easy to make multithreaded, because all the datatypes involved were immutable. There would be no risk of either races or deadlocks.

+ **Operating systems** often use fine-grained locks in order to get high performance, and use lock ordering to deal with deadlock problems.

We've omitted one important approach to mutable shared data because it's outside the scope of this course, but it's worth mentioning: **a database**. Database systems are widely used for distributed client/server systems like web applications. Databases avoid race conditions using *transactions*, which are similar to synchronized regions in that their effects are atomic, but they don't have to acquire locks, though a transaction may fail and be rolled back if it turns out that a race occurred. Databases can also manage locks, and handle locking order, automatically.

For more about how to use databases in system design, 6.170 Software Studio is strongly recommended; for more about how databases work on the inside, take 6.814 Database Systems.
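As a concrete illustration of the monitor pattern mentioned above, here is a minimal sketch. The class name and rep are hypothetical, invented for this example: every method that touches the rep acquires the lock on the object itself, so at most one thread can observe or mutate the rep at a time.

```java
// Hypothetical threadsafe counter using the monitor pattern:
// every method that reads or writes the rep is synchronized on `this`.
public class MonitorCounter {
    private long count = 0; // rep, guarded by this object's lock

    public synchronized void increment() {
        count = count + 1; // atomic with respect to other synchronized methods
    }

    public synchronized long value() {
        return count;
    }
}
```

Because `increment()` and `value()` both synchronize on the same lock, two threads calling `increment()` concurrently cannot interleave their read-modify-write steps, which is exactly the race that an unsynchronized `count++` would permit.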
And if you're interested in the **performance** of concurrent programs --- since performance is often one of the reasons we add concurrency to a system in the first place --- then 6.172 Performance Engineering is the course for you.

## Summary

Producing a concurrent program that is **safe from bugs**, **easy to understand**, and **ready for change** requires careful thinking. Heisenbugs will skitter away as soon as you try to pin them down, so debugging simply isn't an effective way to achieve correct threadsafe code. And threads can interleave their operations in so many different ways that you will never be able to test even a small fraction of all possible executions.

+ Make thread safety arguments about your datatypes, and document them in the code.
+ Acquiring a lock allows a thread to have exclusive access to the data guarded by that lock, forcing other threads to block --- as long as those threads are also trying to acquire that same lock.
+ The *monitor pattern* guards the rep of a datatype with a single lock that is acquired by every method.
+ Blocking caused by acquiring multiple locks creates the possibility of deadlock.
+ Rather than synchronize with locks, message passing systems synchronize on a shared communication channel, e.g. a stream or a queue.
+ Threads communicating with blocking queues is a useful pattern for message passing within a single process.
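The last bullet can be sketched with `java.util.concurrent`'s `LinkedBlockingQueue`: a worker thread that receives integers on one queue and sends their squares back on another, so the two sides interact only by messages, never by shared mutable state. The `Squarer` class here is an illustrative sketch, not code from Part 2.

```java
import java.util.concurrent.BlockingQueue;

// Minimal sketch of message passing between threads:
// the squarer takes requests from `in` and puts replies on `out`.
public class Squarer {
    private final BlockingQueue<Integer> in;
    private final BlockingQueue<Integer> out;

    public Squarer(BlockingQueue<Integer> in, BlockingQueue<Integer> out) {
        this.in = in;
        this.out = out;
    }

    public void start() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    int n = in.take();  // blocks until a request arrives
                    out.put(n * n);     // blocks if `out` is full
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // stop on interrupt
            }
        });
        worker.setDaemon(true); // don't keep the JVM alive for this worker
        worker.start();
    }
}
```

A client sends a request with `in.put(n)` and receives the reply with `out.take()`; both calls block until the message is delivered, so no additional locking is needed on the client side.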