Preparation for Recitation 17
Read ARIES: A Transaction Recovery Method Supporting Fine-Granularity
Locking and Partial Rollbacks Using Write-Ahead Logging
(reading #14 in the course packet). Read Sections 1, 1.1, 2, and 3.
We will continue discussing this paper next recitation, when you will
be reading Sections 4, 5, and 6 through 6.3. Note that it is essential
that you have read Sections 9.A-C of the text before reading the ARIES paper,
as it makes heavy use of terminology defined in those sections.
ARIES is a log-based recovery scheme designed to be used with database
systems. For the purpose of this paper, think of a database as a collection
of tables, each of which consists of a number of records, where
a record is a single row of a table. Actions may read, insert, or
modify records in these tables. Note that the paper makes heavy
use of the term transaction, which, as noted on page 9-8
of the notes, is used by the database community to
refer to an action that is atomic, isolated, consistent, and durable.
We have not studied the consistency and durability properties yet
in 6.033. For the purposes of this paper, however,
it is sufficient to think of a transaction as an isolated atomic
action.
As you read the ARIES paper, you will encounter a number of terms you
may not be familiar with. Take note of them, and come to class prepared
to ask questions! Here are a few that are especially conspicuous:
- Latch : In database terminology, a latch is a lock that
is held for a short period of time (e.g., just to protect a variable
that may be concurrently read/modified by more than one thread); in
databases, locks are usually held until the end of the first
phase of the two-phase locking protocol. Note that even though ARIES
never explicitly says it is using a two-phase locking protocol, it
most certainly is!
- Granularity of Locking : This refers to the size of the objects protected by locks.
Common lock granularities include record, page, or table. A system
that uses record-level locking acquires a lock for every record that it accesses, whereas
a system that uses page- or table-level locks is only required to acquire a lock when
it accesses a particular disk page or table for the first time. Many more locking
operations will be required when using record-level locks, but there will be more
opportunities for concurrency since transactions will not have to wait as frequently
for each other's locks.
- Buffer pool (BP) : A buffer pool is a cache of pages that the database
system has in memory. Pages in the buffer pool may be dirty, indicating they have
been modified since they were last written to disk. Written pages may also be forced,
indicating that they are written directly to disk rather than being cached in the buffer pool.
Database designers have worked very hard to avoid force-writes because they slow down the
system significantly. ARIES uses the term buffer manager (BM) to refer to the
code that is responsible for deciding what to store in the buffer pool. The BM is said
to allow stealing pages if dirty pages can be written to disk before the transaction
that dirtied them has committed.
- Two-phase commit : Two-phase commit is a technique used to make a single atomic action
out of several nested atomic actions. One use of two-phase commit is to allow transactions to be
distributed across several machines. We will discuss this technique later in 6.033.
- Index : As in Google, database systems often maintain indices over tables so that values
that match a particular condition can be looked up quickly. ARIES needs to ensure that these
indices are recovered in addition to the data tables.
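The buffer pool terms above (dirty, forced, steal) can be made concrete with a small sketch. This is a toy model written for these notes, not code from ARIES: the class and method names (Page, BufferPool, write, flush, commit) are hypothetical, the disk is just a Python dict, and a forced write is modeled the way the buffer pool item describes it, as a write that goes straight to disk.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    data: str
    dirty: bool = False                         # changed since last written to disk
    writers: set = field(default_factory=set)   # uncommitted transactions that changed it


class BufferPool:
    """Toy buffer pool. All names are hypothetical, not taken from ARIES."""

    def __init__(self, disk, steal):
        self.disk = disk      # dict standing in for the disk: page id -> data
        self.steal = steal    # may pages dirtied by uncommitted transactions reach disk?
        self.pages = {}       # in-memory cache: page id -> Page

    def write(self, txn, page_id, data, force=False):
        page = self.pages.setdefault(page_id, Page(self.disk.get(page_id, "")))
        page.data = data
        page.writers.add(txn)
        if force:
            self.disk[page_id] = data   # forced write: goes straight to disk
            page.dirty = False
        else:
            page.dirty = True           # cached only; the disk copy is now stale

    def commit(self, txn):
        for page in self.pages.values():
            page.writers.discard(txn)

    def flush(self, page_id):
        """Try to write a cached page to disk; return True if it was written."""
        page = self.pages[page_id]
        if page.writers and not self.steal:
            return False                # no-steal: uncommitted changes stay in memory
        self.disk[page_id] = page.data
        page.dirty = False
        return True


disk = {"p1": "old"}
pool = BufferPool(disk, steal=False)
pool.write("T1", "p1", "new")
print(pool.pages["p1"].dirty, disk["p1"])   # True old
print(pool.flush("p1"))                     # False (T1 has not committed)
pool.commit("T1")
print(pool.flush("p1"), disk["p1"])         # True new
```

With steal=True the first flush would succeed, leaving T1's uncommitted update on disk. Keep this picture in mind when thinking about what the recovery procedure has to cope with after a crash.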
The introduction to the paper discusses a number of features described in the
later sections of the paper that we will not discuss. In particular, we will
not discuss topics of partial or nested roll-back, shadow pages, high-concurrency
locking, recovery logging, or nested top actions. Don't worry if you don't
completely understand what the paper says about these advanced topics!
As you read the paper, try to answer the following questions:
- What do the authors mean when they say ARIES "repeats history"? In what ways does repeating
history simplify the design of ARIES?
- What is the purpose of the CLR records that ARIES writes? What could go wrong if ARIES
didn't write CLR records?
- Suppose the buffer manager of the database system always forced all writes to disk. How would
that affect the work that had to be done during recovery? Suppose it never stole any pages. How
would that affect recovery?