6.893 Architecture of Database Systems

Assignment 2, due: March 22, 2002

Suppose a relational DBMS stores all data in "inverted file" format. Hence, each row in an N column table is given a record identifier (RID). Then, the storage system stores the table as N structures of the form {(value, RID)}, where each structure is sorted in value order and indexed using a B-tree.

For example, if you have the following collection of records:

(Mike,  29, 40000, shoe)
(Bill,  32, 20000, toy)
(Sam,   45, 30000, candy)

then you would assign record ids to the three records, say 1,2, and 3 and store the following structures:

(Bill, 2)  (29, 1)  (20000, 2)  (candy, 3)
(Mike, 1)  (32, 2)  (30000, 3)  (shoe, 1)
(Sam, 3)   (45, 3)  (40000, 1)  (toy, 2)

Each structure would be indexed on the data value,using a B-tree. The records themselves would NOT be stored at all. For better performance you can also assume, if you want to, that each of the above four structures are also stored in RID order in a second B-tree.

Sketch in 2 pages or less how a Selinger-style optimizer (Chapter 2, paper 6) would change to adapt to this environment. If you feel the urge, then sketch how you could improve the performance of the resulting system.