- distributed transactions - prereqs - hierarchical transactions - rpc - at least once - at most once - 2pc - 1 coord, n workers - algo - coord and workers do pre-commit work/comm - e.g.: decrement account at A, increment account at B, ... (several more times) - coord bcasts PREPARE - means: "I'm ready to commit. Are you ready to commit?" - workers creates durable log of PREPARED (tentative commit) and sends PREPARED - means: "I'm ready to commit." - also can send ABORT: "No, I'm not ready to commit." - coord bcasts COMMIT - means: "I've committed. Thanks for your help." - failures - coord fails at any point before sending COMMIT - when coord restarts, persistent senders at workers notify coord of ongoing transaction; coord bcasts ABORT - worker fails at any point before logging PREPARED - when worker restarts, coord's persistent sending of PREPARE notifies worker of ongoing transaction - if coord doesn't hear from workers within some time: bcast ABORT - worker fails after logging PREPARED - worker will see this in its recovery and restart the persistent sending of PREPARED; coord will notify it if there was a COMMIT or ABORT - problem: once workers send PREPARED, they are stuck - must get a response from the coord - if coord aboard sinking ship, then hosed - can't abort, can't commit; the system is held up - dilemma of the two generals - only simultaneous attack will succeed (generals must commit to launching the attack) - how many acks do you need? turns out no such protocol exists - the problem: they need to attack at the same time. otherwise one could just go - consistency - ties in with transactions - models - strict consistency: nobody outside the modular boundaries will see data inconsistencies - eventual consistency - observers can see inconsistency; different observers can see different results - but once updates stop, make a best-effort drive toward consistency - when performance/availability > temporary inconsistency - e.g. web browser; lib catalog - cache coherence: for higher performance - read/write coherence - timer expiration: eventual consistency (eg DNS) - "don't cache me" (eg http, java's volatile) - snoopy cache - distributed replicas: for greater durability - replicated state machine - must ensure same inputs and same order: consensus (hard problem) - decay events can cause drifting apart - heterogeneous OS's or environments - homogeneity: bugs - single state machine - master makes update decisions; slaves obey - better if only occasional updates - single point of failure - distributed reads - for strict consistency: use quorums - Qr + Qw > num replicas - reconciliation - for occasionally connected replicas - pessimistic: lock/check out the files; block other update attempts - optimistic: detect & resolve conflicts on connect - system-wide generation numbers - system starts at sys#1; all files start at gen#0 - on updates: raise gen# to current sys# - on reconciliation - gen# != gen#: difference; easy, just propagate change - both gen# = sys#: conflict - finally, increment sys# - no decay detection; requires FS change