6.033 2011 Lecture 18: Isolation

Recall: goal is transactions: all-or-nothing and before-or-after atomicity.
  Both forms are about hiding implementation details.
  Last week: techniques for achieving all-or-nothing atomicity.
    Shadow copy, logging.
  Today: how to achieve before-or-after atomicity.
    Much easier to reason about what happens when multiple actions run.
    Allows concurrent actions to have all-or-nothing atomicity.
      (If actions see each other's intermediate states, then not all-or-nothing.)
  In database terminology, called "serializability" or "serial equivalence".

One simple way to achieve before-or-after atomicity: serial execution.
  Run all transactions in some serial order.
  Use one system-wide lock to ensure no concurrent operations occur.
  Why do we want anything more?
    Performance: multiple CPUs, overlapping I/O.
  But manually inserting fine-grained locks is error-prone, requires
    significant effort.
  So we're going to come up with a more automated plan to acquire locks.

Consider two concurrent operations in a bank application:
  [ slide: concurrent actions ]
    xfer(A, B, amt):        int(rate):
      begin                   begin
      A = A - amt             for each account x:
      B = B + amt               x = x*(1+rate)
      commit                  commit
  What happens if we have A=100, B=50, and concurrent xfer(A,B,10) & int(0.1)?
  Serial executions:
    .. -> [xfer] -> A=90,  B=55 -> [int]  -> A=99,  B=66
    .. -> [int]  -> A=110, B=55 -> [xfer] -> A=100, B=65
  Unless there's some a priori ordering of xfer and int, either outcome is
    OK if xfer() & int() are executed concurrently.

What could happen if we run these operations concurrently?
    xfer:       int:
    RA [100]
    WA [90]
                RA [90]
                WA [99]
    RB [50]
    WB [60]
                RB [60]
                WB [66]
  This is called a "schedule" for these two transactions.
  The two transactions didn't run serially, but is this schedule serializable?
    Yes: outcome is the same as if they ran sequentially.

    xfer:       int:
    RA [100]
                RA [100]
    WA [90]
                WA [110]
    RB [50]
    WB [60]
                RB [60]
                WB [66]
  Not serializable: could not have possibly gotten (A=110, B=66) serially.
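The two schedules above can be replayed mechanically. A minimal sketch (not from the notes; the `run` helper and the (txn, op, var) step encoding are invented for illustration) in which each write uses the value that transaction last read, mimicking the R/W traces:

```python
# Replay an R/W schedule over accounts A=100, B=50.
# A write uses the value the same transaction last read, so a "lost
# update" shows up exactly as in the traces above.
def run(schedule, amt=10, rate=0.1):
    state = {"A": 100, "B": 50}
    read = {}  # (txn, var) -> value this txn last read
    for txn, op, var in schedule:
        if op == "R":
            read[(txn, var)] = state[var]
        else:  # "W"
            v = read[(txn, var)]
            if txn == "xfer":
                state[var] = v - amt if var == "A" else v + amt
            else:  # "int"
                state[var] = round(v * (1 + rate))
    return state

# First schedule: serializable, same outcome as xfer-then-int.
good = [("xfer", "R", "A"), ("xfer", "W", "A"),
        ("int", "R", "A"), ("int", "W", "A"),
        ("xfer", "R", "B"), ("xfer", "W", "B"),
        ("int", "R", "B"), ("int", "W", "B")]

# Second schedule: both read A=100 before either writes A.
bad = [("xfer", "R", "A"), ("int", "R", "A"),
       ("xfer", "W", "A"), ("int", "W", "A"),
       ("xfer", "R", "B"), ("xfer", "W", "B"),
       ("int", "R", "B"), ("int", "W", "B")]

print(run(good))  # {'A': 99, 'B': 66} -- matches a serial order
print(run(bad))   # {'A': 110, 'B': 66} -- matches no serial order
```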
How to tell if a schedule is equivalent to some serial schedule?
  (In other words, has "serial equivalence"?)
  Problem: both transactions read & wrote A; both read before the other wrote.
    In a serial execution, one reads & writes after another reads & writes.
  Formally: two read/write operations T1.o1 and T2.o2 are conflicting if:
    - both access the same object
    - at least one is a write [i.e., order of two reads doesn't matter]
    "Conflict" here just means a pair of operations whose order matters.
  A schedule is "conflict serializable" if:
    - for all pairs of transactions T1, T2,
    - for all conflicting op pairs T1.o1 and T2.o2,
    - they are ordered in the same way (either T1->T2 or T2->T1)
  If this condition isn't met, could have a problem: for some ops, T1
    appears to be first, and for others, T2 appears first.
  (Technically, could be serializable even if not conflict-serializable,
    but typically not used in practice.)
  [ draw conflicts as bi-directional arrows in above non-serializable schedule ]

Achieving conflict serializability: locking protocol.
  We don't want to rely on programmers to get locking correct.
  Need to devise a plan that guarantees serial-equivalent execution.
  Plan: associate a lock with every variable; grab lock before accessing var.
    [ slide: locking protocol ]
  Q: When should we release?
    Should not release right after read/write operation.
    Can we release after txn is done with some variable (e.g., A or B)?
    xfer:       int:
    RA [100]  \ T1 holds A.lock
    WA [90]   /
                RA [90]   \ T2 holds A.lock
                WA [99]   /
                RB [50]   \ T2 holds B.lock
                WB [55]   /
    RB [55]   \ T1 holds B.lock
    WB [65]   /
    No good: (99, 65) is not serial-equivalent.
  One solution that works is to release upon commit (or abort).
    [ slide: locking protocol with release ]
    Once T1 grabs A.lock, no other transaction that touches A can run
      before T1.

Deadlock
  What if the int() function accessed B first, then A?
  T1 (xfer) grabs A.lock, T2 (int) grabs B.lock.
  Now both want each other's locks, and neither can make progress.
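The release-upon-commit protocol can be sketched with one lock per variable (a minimal sketch, not from the notes; account and function names are illustrative). Note that both transactions here happen to acquire locks in the same order (A, then B), so the deadlock just described does not arise:

```python
import threading

accounts = {"A": 100, "B": 50}
locks = {name: threading.Lock() for name in accounts}

def xfer(a, b, amt):
    locks[a].acquire()          # grab lock before first access
    accounts[a] -= amt
    locks[b].acquire()
    accounts[b] += amt
    locks[a].release()          # release only at commit: no other txn
    locks[b].release()          # ever sees the intermediate state

def interest(rate):             # the notes' int(), renamed for Python
    for name in ("A", "B"):     # same lock order as xfer -> no deadlock
        locks[name].acquire()
        accounts[name] = round(accounts[name] * (1 + rate))
    for name in ("A", "B"):
        locks[name].release()

t1 = threading.Thread(target=xfer, args=("A", "B", 10))
t2 = threading.Thread(target=interest, args=(0.1,))
t1.start(); t2.start(); t1.join(); t2.join()
# Whichever thread wins the race for A.lock, the result matches one of
# the two serial orders: xfer-then-int gives (A=99, B=66);
# int-then-xfer gives (A=100, B=65).
print(accounts)
```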
What are the solutions for deadlock?
  - Lock ordering: not a great approach if set of locks not known a priori.
  - Abort one of the deadlocked txns: make use of all-or-nothing atomicity.
    How to detect deadlocks?
    - Form a wait dependency graph: what each transaction holds & wants.
        T1: holds A, wants B (-> T2)
        T2: holds B, wants A (-> T1)
      If a cycle is found, then we have deadlock.
      Requires a fair amount of book-keeping, analysis algorithm.
    - Assume that any acquire that takes more than some threshold is deadlock.
      Simple, but doesn't deal well with long-running transactions.

Optimization 1: releasing locks early.
  Observation: once a txn has acquired all of its locks, it will not block.
    The point at which all locks have been acquired is called the "lock point".
    Once T1 reaches its lock point, any conflicting transaction T2 will run later.
      Intuition: if any other txn wants one of T1's locks, it will have to
        run later, because T1 has already acquired all of its locks.
    The serial order we get is the order in which txns reach their lock points.
  If a txn reached its lock point and will no longer access x, can release x.lock.
    Will not affect serialization order.  Will not affect x.
  Our earlier example:
    xfer:       int:
    RA [100]  \ T1 holds A.lock
    WA [90]   /
    ...       > T1 acquires B.lock, lock point, releases A.lock
                RA [90]   \ T2 holds A.lock
                WA [99]   /
                lock B: wait
    RB [50]   \ T1 holds B.lock
    WB [60]   /
                RB [60]   \ T2 holds B.lock, lock point, had to wait for T1
                WB [66]   /
  This is called "two-phase locking" (2PL).
    Phase 1: acquire locks, until transaction reaches lock point.
    Phase 2: release locks, after lock point & done with object.
  To make this optimization work, need to know when we're done with an object.
    Requires input from programmer, or program analysis.

What happens if we release locks before commit/abort?
  Problem arises if some transaction doesn't commit (i.e., abort or crash).
  If T1 releases a read lock and then aborts, not a problem.
    Even if T2 acquired a read or write lock, and committed, still serializable.
  If T1 releases a write lock and then aborts, could reveal uncommitted values.
    E.g., in our example above, if T1 (xfer) aborts, we would get (A=99, B=55).
    Big problem: not all-or-nothing / before-or-after.
      T2 saw intermediate effects of concurrent execution; T1's abort is
        incomplete.
  Two possible solutions:
    - Hold write locks until commit (strict 2PL): do not reveal new values
      until we know the transaction commits.  Typical.
    - Abort any transaction that saw uncommitted results (e.g., T2).
      This strategy is known as cascading aborts.
      Requires bookkeeping to track readers.
      Commit sometimes blocks, if txn read values that weren't committed yet.
      Complicates crash recovery, ...

Optimization 2: reader-writer locks.
  Recall that a conflict required at least one operation to be a write.
  Multiple readers should be able to operate on same data concurrently.
  We can use the idea of reader-writer locks to implement this.
    [ slide: locking with rw locks ]
  Instead of one acquire function, we have two: acquire_reader() and _writer().
    A reader can acquire lock at the same time as other readers (but no writers).
    A writer can acquire lock only if no other readers or writers.
  One possible problem: fairness.
    What if readers keep acquiring the lock, and a writer is waiting?
    A naive impl might never allow the writer to proceed, if always 1+ reader.
    Typical solution: if a writer is waiting, new readers wait too.
  This is an instance of the more general lock compatibility idea.
    Determine what pairs of operations are safe to execute concurrently.
    Allow concurrent locks by those ops (& disallow any that are unsafe).
    In general, need to understand operation semantics to determine compat.
      Reader/writer locks don't require understanding semantics.

Summary.
  Technique for reasoning about concurrent actions: before-or-after
    atomicity, or serializability.
  Can check for serializability by considering ordering of conflicting ops.
  2PL can ensure serializability by using locks to order conflicting ops.
  Can recover from deadlock by aborting one of the deadlocked transactions.
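Deciding which transactions are deadlocked amounts to finding a cycle in the wait dependency graph described earlier. A minimal sketch (function name and graph encoding are invented for illustration):

```python
def has_deadlock(waits_for):
    """waits_for maps each txn to the set of txns holding locks it wants.
    There is a deadlock iff the graph has a cycle (found here by DFS)."""
    visited, on_stack = set(), set()

    def dfs(t):
        visited.add(t)
        on_stack.add(t)
        for u in waits_for.get(t, ()):
            if u in on_stack or (u not in visited and dfs(u)):
                return True  # back edge: a cycle of waiting txns
        on_stack.discard(t)
        return False

    return any(t not in visited and dfs(t) for t in waits_for)

# T1 holds A and wants B; T2 holds B and wants A: a cycle, so deadlock.
print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))  # True
# If T2 waits for nothing, T2 can finish and release B: no deadlock.
print(has_deadlock({"T1": {"T2"}, "T2": set()}))   # False
```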