M.I.T. DEPARTMENT OF EECS

6.033 - Computer System Engineering (Spring 2007)

February 13, 2007

Please find additional clarifications and details in the DP1 FAQ.

6.033 Design Project 1: The Tag Storage System

I. Assignment

There are two deliverables for Design Project 1:

Two copies of a design proposal not exceeding 800 words, due on Tuesday, February 27, 2007.

Two copies of a design report not exceeding 2,500 words, due on Thursday, March 22, 2007.

A design report is a different beast from a quiz. As in real life, the 6.033 design projects are under-specified, and it is your job to complete the specification in a sensible way given the overall requirements of the project. As with designs in practice, the requirements often need some adjustment as the design is fleshed out. We strongly recommend that you get started early so that you can iterate your design. A good design is likely to take more than just a few days to put together.

II. The Problem

II.1. Introduction

Tagging has become a popular approach for cataloging information about pictures, sounds, video, and pages on the Web. It is used heavily in sites like Flickr and del.icio.us to allow users to annotate content with user-supplied information -- for example, a user might tag a picture of a marmoset (a kind of South American monkey) with the labels "monkey" and "funny." Then, other users who want to find pictures of monkeys might search for all images with the label "monkey" and get this picture back in return.

In this design project, your goal is to design a storage system for a large database of such tags. Rather than simply storing <filename,tag> pairs, your design will be for a more expressive tagging system. In particular, tags in your system will consist of triplets of the form <subject, relationship, object>. Each of these three elements is a string. These strings may encode references to files or other resources, or may simply be arbitrary constants. For example, the triplet:

  <file://marmoset.jpg, isa, monkey>

might be used to tag the marmoset image described above. (Note that we use Web-style "uniform resource identifiers" (URIs) like "file://" or "http://" to encode references.) Notice that this tagging approach allows arbitrarily complicated relationships to be represented because the object field can also be a reference. For example, the triplets:

  <uri://mysafarialbum, contains, file://marmoset.jpg>
  <uri://mysafarialbum, contains, file://bonobo.jpg>
  <uri://mysafarialbum, backgroundcolor, black>
  <uri://myalbumcollection, contains, uri://mysafarialbum>

indicate that uri://mysafarialbum contains two images, that when the photo album is displayed it should be rendered with a black background, and that the photo album is itself a part of a collection of albums.

Your storage system will have to manage a very large collection of these triplets -- far larger than can fit into the memory of a computer. For example, in Flickr, there are 890,000 images tagged with the term "people", 850,000 tagged with the term "flower", and 60,000 tagged with the term "monkey" -- if each triplet takes about 100 bytes to store, then the triplets for these three terms alone occupy almost 200 megabytes of storage! Flickr contains thousands of such terms, which occupy many gigabytes of storage. Thus, the primary challenge you face is to decide how to physically represent this triplet data on disk in such a way that it can be efficiently accessed and updated, just as the file systems we have studied in class (see class notes, Appendix 2.A.) attempt to lay out file data in an efficient manner.

However, there are many differences between this triplet storage system and a file system. In particular:

In a file system, each file has a name, and is stored in a given location in the directory hierarchy. In a tag storage system, each object (e.g., an image or a user) has a unique name (a URI) that identifies it, but the triplets themselves do not have names. They just represent the relationships between objects. Furthermore, these triplets are not stored according to any hierarchy, unlike files in a directory hierarchy.
Because triplets don't represent individual objects, users of a tag storage system are not usually interested in accessing a single triplet at a time. Instead, they want to access a collection of properties about a given object (e.g., all the triplets for a given image), or a collection of objects that share a given relationship type (e.g., all the images uploaded by a particular user).
Individual triplets usually contain far less data than a file. Hence, a disk block will typically contain many triplets.

Note that you are not required to design the file system that contains the URIs that triplets refer to (like marmoset.jpg), just the tag storage system.

In the remainder of this design project we describe the interfaces you must provide and design requirements that your system must satisfy.

II.2. Interfaces

Your system must implement an API for applications to use. We have specified that API for you. You will need to describe how your design implements this API, and argue that your implementation is efficient across a range of workloads (discussed in more detail in Section III.2).

We have also provided you with a basic interface to the computer's disk and memory that you may assume is available to your storage system.

II.2.1. API

Your storage system should support the following API. This API is designed to allow applications to look up triplets that match certain criteria, and to add and remove triplets. You may change this API if needed, but you should provide a good reason for doing so.

Your storage system should run as a server. Applications (clients) send requests to this server via RPC.

INSERT(subject String, relationship String, object String)

Add a new triple to the storage system. After INSERT is called, any subsequent call to FIND should return this new record.

REMOVE(subject String, relationship String, object String)

Remove a triple from the storage system. After REMOVE is called, no subsequent call to FIND should find this triple.

FIND(subject String, relationship String, object String, start integer, count integer) returns array of triple

triple structure {
  subject String;
  relationship String;
  object String;
}

Find all triplets that match the specified subject, relationship, and object. Each field may optionally be the wildcard string "*", which matches any value. The FIND call will return at most count of the matching triplets, beginning with triplet start.
For example, to find the first 1000 triplets that indicate that the subject has an "isa" relationship with the object "monkey", the following request could be issued:
results = FIND("*", "isa", "monkey",0,1000)
As another example, consider finding the first 1000 triplets that indicate objects posted by the user "uri://kaashoek":
results = FIND("uri://kaashoek", "postedby", "*",0,1000)
In both of these cases, the results are returned into an array that is in the address space of the calling application. No state needs to be kept in the storage system about the results of this request. You may assume that the application retrieves triples in segments that will fit in its own memory -- that is, if a really large dataset is needed, the application will step through it in chunks. Your design should ensure that as long as no intervening INSERT or REMOVE calls are made, repeated calls to FIND that iterate through all such chunks of a large result will eventually lead to the application receiving all triples that satisfy the FIND request.
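The wildcard and start/count semantics described above can be illustrated with a small in-memory reference model. This is only a sketch for checking one's reading of the API, not a storage design: the `TagStore` class, its list-backed storage, and the `find_all` helper are hypothetical names that do not appear in the assignment, and the choice to have `remove` drop every identical copy of a triple is an assumption.

```python
from typing import List, NamedTuple

WILDCARD = "*"


class Triple(NamedTuple):
    subject: str
    relationship: str
    object: str


class TagStore:
    """Hypothetical in-memory stand-in for the server (illustration only)."""

    def __init__(self) -> None:
        self._triples: List[Triple] = []

    def insert(self, subject: str, relationship: str, obj: str) -> None:
        self._triples.append(Triple(subject, relationship, obj))

    def remove(self, subject: str, relationship: str, obj: str) -> None:
        # Assumption: every identical copy is dropped, so FIND can no
        # longer return the removed triple.
        target = Triple(subject, relationship, obj)
        self._triples = [t for t in self._triples if t != target]

    def find(self, subject: str, relationship: str, obj: str,
             start: int, count: int) -> List[Triple]:
        pattern = (subject, relationship, obj)
        matches = [
            t for t in self._triples
            if all(p == WILDCARD or p == v for p, v in zip(pattern, t))
        ]
        # Return at most `count` matches, beginning with match number `start`.
        return matches[start:start + count]


def find_all(store: TagStore, subject: str, relationship: str, obj: str,
             chunk: int = 1000) -> List[Triple]:
    """Client-side loop that steps through a large result in chunks."""
    results: List[Triple] = []
    start = 0
    while True:
        page = store.find(subject, relationship, obj, start, chunk)
        results.extend(page)
        if len(page) < chunk:  # a short page means the result is exhausted
            return results
        start += chunk
```

Because the model keeps matches in a stable order between calls, stepping `start` forward by `chunk` visits every match exactly once as long as no INSERT or REMOVE intervenes, which is the property the paragraph above asks of a real design.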

SHUTDOWN()

Write all state of the storage system to disk in preparation for turning off the computer.

Note that we are not concerned with the correctness or performance of these methods under failure. If the system crashes, it is OK for the state of your tag system to be arbitrarily corrupted or garbled. In the second half of 6.033 we will discuss how to build systems that can tolerate various types of failures.

Note, however, that you do need to be able to support a clean shutdown, which will require you to put all of the state of your storage system on disk.

II.2.2. Hardware / OS Interface

You are given a block-based interface to disk (see sidebar 2-12 on page 2-15 of the course notes), as follows:

block byte[4096]
READ_BLOCKS(blocks array of block, start_block integer, num_blocks integer) returns integer
WRITE_BLOCKS(blocks array of block, start_block integer, num_blocks integer)

Here, blocks are 4 KB each, and these routines can be used to sequentially read or write an arbitrary number of blocks to or from disk. You should assume that a disk seek is always required between consecutive calls to either of these routines. Because one of the ways we will evaluate your design is on its efficiency (see Section VII below), it is very important for you to understand the difference between sequential and random disk I/O (see section A.7. of Chapter 6 of the course notes). The following numbers should help you:

disk access time (seek + rotation): 12.17 milliseconds
block read time: .06 milliseconds

Hence, the total time to read n blocks via a single READ_BLOCKS command would be 12.17 + 0.06n milliseconds, whereas the time to read n blocks via n READ_BLOCKS commands would be 12.23n milliseconds.
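The difference between the two access patterns can be worked out with a few lines of arithmetic. The sketch below uses the 12.17 ms access time and 0.06 ms block read time listed above; the function names are illustrative and not part of the provided interface.

```python
SEEK_MS = 12.17      # disk access time (seek + rotation), from the handout
TRANSFER_MS = 0.06   # time to read one 4 KB block, from the handout


def sequential_read_ms(n_blocks: int) -> float:
    """One READ_BLOCKS call covering n_blocks contiguous blocks: one seek."""
    return SEEK_MS + TRANSFER_MS * n_blocks


def random_read_ms(n_blocks: int) -> float:
    """n_blocks separate READ_BLOCKS calls of one block each: one seek per call."""
    return n_blocks * (SEEK_MS + TRANSFER_MS)
```

For n = 100, the single call costs about 18.17 ms while 100 separate calls cost about 1,223 ms, so the seek term dominates any access pattern that touches blocks one at a time.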

You can assume you have 1 TB (1,000 GB) of sequential disk blocks, numbered starting at 0, available for the exclusive use of your storage system. Assume that the disk system does no caching of blocks (so two consecutive calls to READ_BLOCKS requesting the same blocks require exactly the same amount of time.)

You can also assume that you have 1 GB of main memory available to your storage system, and that the total storage required for your database is about 100 GB (though inserts and deletes may cause it to grow or shrink substantially from this amount over time), and that each triplet occupies about 100 bytes -- hence the database contains about 1 billion triplets. You may use this main memory however you like -- for example, as a cache for triplets or to keep parts of your storage system resident in memory.
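These sizes imply some useful round numbers. The following back-of-the-envelope sketch assumes decimal gigabytes (as in "1 TB (1,000 GB)"), fixed 100-byte triplets that never span a block boundary, and no per-block header; all three are simplifying assumptions rather than requirements of the assignment.

```python
import math

BLOCK_BYTES = 4096
TRIPLET_BYTES = 100               # approximate size given in the handout
DATABASE_BYTES = 100 * 10**9      # about 100 GB of triplets
MEMORY_BYTES = 1 * 10**9          # 1 GB of main memory

# Total number of triplets in the database: about 1 billion.
triplets_total = DATABASE_BYTES // TRIPLET_BYTES

# Assumption: whole triplets only, no block header, so 40 fit in a block.
triplets_per_block = BLOCK_BYTES // TRIPLET_BYTES

# Blocks needed if every block is packed full: 25 million.
data_blocks = math.ceil(triplets_total / triplets_per_block)

# Share of the database that could be resident in memory at once: 1%.
memory_fraction = MEMORY_BYTES / DATABASE_BYTES

# Number of 4 KB blocks that fit in main memory: 244,140.
blocks_in_memory = MEMORY_BYTES // BLOCK_BYTES
```

In other words, memory can hold roughly one block in a hundred, so any structure that must be consulted on every request has to be far smaller than the data itself.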

III. Design Considerations

Your design report will need to describe how you implement the API given in Section II.2 as well as how data is represented on disk. You should support your design with diagrams, pseudocode, flowcharts, and examples.

III.1. Evaluation Criteria

We will be evaluating your design based on how you address the following questions and issues:

What is the overall structure of your storage system? Do all blocks have the same format, or are there directory or index blocks that are different than data blocks?
How do you keep track of which disk blocks are free and which are occupied? How do you keep track of how much of each disk block is currently occupied by triplets?
How do you support the FIND call described in the API? Do you have a way to efficiently discover blocks that contain triplets with particular subjects, objects, or relationships? Are blocks with similar subjects or objects co-located in some way so that you don't have to seek all over the disk to retrieve related records?
How do you insert triplets in the system? What block or blocks do newly inserted triplets go into? Do you ever need to re-organize blocks when an insert happens because they are full?
How do you delete triplets from the system? How do you locate the triplets that need to be deleted? When a triplet is deleted, do you need to reorganize the block or blocks that triplet is on in some way?
With respect to the workloads given in the next section, about how long would it take for the system to evaluate each workload? In evaluating your design on these workloads, you should try to provide back of the envelope calculations and intuitive arguments; you do not need to provide proofs of optimality, closed-form analytical models, or simulation or implementation results!
How does your design compare -- in terms of typical time to process the workloads -- to at least one obvious, naive approach? The simplest such naive approach is one that reads the entire collection of triplets one by one and identifies those triplets that satisfy a particular find or delete request.
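The cost of the naive full-scan baseline in the last question can be estimated as follows. This is a rough sketch: the 25,000,000-block figure assumes a 100 GB database of 100-byte triplets packed 40 to a 4 KB block, and the 200,000-block chunk size is a hypothetical choice made so that one READ_BLOCKS call fits within 1 GB of memory.

```python
import math

SEEK_MS = 12.17      # disk access time per READ_BLOCKS call, from Section II.2.2
TRANSFER_MS = 0.06   # time to read one block, from Section II.2.2


def naive_scan_ms(total_blocks: int, blocks_per_call: int) -> float:
    """Time to read every block once, issuing one seek per READ_BLOCKS call."""
    calls = math.ceil(total_blocks / blocks_per_call)
    return calls * SEEK_MS + total_blocks * TRANSFER_MS


TOTAL_BLOCKS = 25_000_000  # assumption: densely packed 100 GB database

chunked_ms = naive_scan_ms(TOTAL_BLOCKS, 200_000)   # large sequential reads
one_at_a_time_ms = naive_scan_ms(TOTAL_BLOCKS, 1)   # one block per call
```

Under these assumptions a scan in large sequential chunks takes roughly 1,500 seconds (about 25 minutes) per request, and a block-at-a-time scan takes roughly 305,750 seconds (about 3.5 days). Either figure gives a reference point against which an indexed or clustered design can be compared.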

It's okay, and sometimes desirable, to say, "no, my design doesn't do that" as long as you can identify what your design can and can't do, and explain why you made the trade-offs you did. Remember: a simple, correct design that meets the requirements laid out above is more important than a complex one that is hard to understand and does a flaky job. When in doubt, leave it out!

III.2. Workloads to consider

In addition to explaining your design, you will need to analyze its performance under different workloads. This section describes several ways in which your system will be used. You should describe how each of these workloads is handled and estimate the time taken to process them. You should use the hardware parameters (seek time and block read time) given in Section II.2.2 to estimate this processing time (you can assume that the only time consuming operation in your system is disk I/O -- modern CPUs and memories are so fast that operations with them are inconsequential in comparison to the time to access disk.) For the purposes of these performance estimates, you may ignore the effects of caching or other optimizations you propose, if doing so simplifies your analysis.

The workloads come from two applications, a photo-sharing application like Flickr and a library catalog application.

You shouldn't design two different systems for these two different applications, but should come up with a general purpose design. Your system may be better at one workload -- if so, that's OK, but you should be sure to describe why. Your system also shouldn't be so tailored to these two workloads that it couldn't possibly support other types of data or distributions of requests (though, of course, you don't need to optimize your design for such unforeseen workloads.)

You can assume that these workloads are such that there is some idle time in the system -- for example, for a few hours every night, the number of requests is low and the system is relatively unburdened (allowing you, for example, to occasionally reorganize the data, if your design calls for that.)

III.2.1. Flickr++ Application

In Flickr++, some users upload photos, create albums and tag photos. Other users look for photos with particular triplets and by particular users. Occasionally, administrators delete all the photos from a particular user, or delete particular photos.

Triplet structure:

All objects (e.g., image, album, user) are tagged with a type. For example:

  <file://marmoset.jpg, istype, image>
  <uri://mysafari, istype, album>
  <uri://kaashoek, istype, user>

Albums can contain images. Users can own images and albums:

  <uri://mysafari, contains, file://marmoset.jpg>
  <uri://kaashoek, owns, file://marmoset.jpg>
  <uri://kaashoek, owns, uri://mysafari>

Note that images and albums can have additional descriptive triplets associated with them via the "isa" relationship, e.g.:
```
  <file://marmoset.jpg, isa, monkey>
```

Workload:
INSERT, REMOVE, and FIND commands are mixed together in this workload. They occur with the following frequencies:

5% of the total requests involve adding or deleting an image. Note that to add an image to an album, at least three inserts are required (an "istype" triplet, a "contains" triplet, and an "owns" triplet.)
5% of the total requests involve tagging an image using INSERT to add a triplet that relates a given image as subject to a descriptive tag via an "isa" relationship.
75% of the total requests involve looking up images tagged with a particular "isa" triplet (e.g., FIND("*", "isa", "monkey", 0, 1000)).
15% of the total requests involve looking up all of the objects associated with a given user (e.g., FIND("uri://kaashoek", "owns", "*", 0, 1000)).
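One way to turn the request frequencies above into a single performance estimate is a weighted average of per-request costs. The sketch below is only a template: the request-type names are invented labels for the four bullet points, and the per-request block-access counts are placeholders that a real analysis would replace with numbers derived from the actual design.

```python
# Request frequencies taken from the workload description above.
FLICKR_MIX = {
    "add_or_delete_image": 0.05,
    "tag_image": 0.05,
    "find_by_tag": 0.75,
    "find_by_owner": 0.15,
}

RANDOM_ACCESS_MS = 12.17 + 0.06  # one seek plus one block, from Section II.2.2


def expected_ms_per_request(mix: dict, cost_ms: dict) -> float:
    """Frequency-weighted average cost of one request."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("request frequencies must sum to 1")
    return sum(freq * cost_ms[name] for name, freq in mix.items())


# PLACEHOLDER inputs, not results: hypothetical random block accesses
# per request type for some imagined design.
example_accesses = {
    "add_or_delete_image": 6,
    "tag_image": 2,
    "find_by_tag": 1,
    "find_by_owner": 1,
}
example_cost_ms = {k: n * RANDOM_ACCESS_MS for k, n in example_accesses.items()}
average_ms = expected_ms_per_request(FLICKR_MIX, example_cost_ms)
```

Because 75% of requests are tag lookups, the cost of that one request type dominates the average, which suggests where layout effort is likely to pay off first.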

III.2.2. Library Application

In the Library application, a library loads all of their digital library information into your system, and then makes a lookup system available to users so they can find books and authors of interest. Records are never deleted.

Triplet structure:

Books and authors are tagged with a type, as follows:

  <uri://mybook, istype, book>
  <uri://liskov, istype, author>

Authors have names and affiliations:

  <uri://liskov, name, Barbara Liskov>
  <uri://liskov, affiliation, MIT>

Books have authors, titles, and publishers:

  <uri://mybook, title, The Heart of Darkness>
  <uri://mybook, publisher, W.W. Norton>
  <uri://mybook, author, uri://conrad>

Workload:
First, library data is loaded, and then a number of FIND requests are performed.

Load: Books are loaded one at a time via a number of calls to INSERT. Each book requires several inserts (for its title, author(s), and publisher.) You can assume that URIs for books and authors are given to you.
Lookup system: Lookups take one of four forms:
35% of the requests are for all books of a given title. This will require a FIND request to find all books with the given title as the object.
30% of the requests are for the titles of all books by a given author. This will require a FIND request to find the URI of a given author name, followed by a FIND call to find books with an author triplet with that author's URI as the object.
25% of the requests are for the books from an author from a particular institution. This will require a FIND request to locate all authors with a given name, and a second FIND request to locate all authors from a particular institution. Then, the client will locate any author URIs that are in both result sets, and perform FIND requests for each of those common entries to list books by those authors.
10% of the requests are for the books from a given publisher.
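The multi-step lookup for books by an author from a particular institution can be sketched from the client's side as follows. The `find` argument stands in for the FIND RPC; `make_in_memory_find`, `all_matches`, and `books_by_author_at` are hypothetical helpers, and the sample triplets (including "uri://liskovbook") are made up for illustration.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]
Find = Callable[[str, str, str, int, int], List[Triple]]


def make_in_memory_find(triples: List[Triple]) -> Find:
    """Hypothetical stand-in for the FIND RPC, backed by a Python list."""
    def find(s: str, r: str, o: str, start: int, count: int) -> List[Triple]:
        pattern = (s, r, o)
        hits = [t for t in triples
                if all(p == "*" or p == v for p, v in zip(pattern, t))]
        return hits[start:start + count]
    return find


def all_matches(find: Find, s: str, r: str, o: str,
                chunk: int = 1000) -> List[Triple]:
    """Step through a result in chunks until a short page is returned."""
    out: List[Triple] = []
    start = 0
    while True:
        page = find(s, r, o, start, chunk)
        out.extend(page)
        if len(page) < chunk:
            return out
        start += chunk


def books_by_author_at(find: Find, author_name: str,
                       institution: str) -> List[str]:
    # Step 1: URIs of all authors with the given name.
    named = {t[0] for t in all_matches(find, "*", "name", author_name)}
    # Step 2: URIs of all authors affiliated with the institution.
    affiliated = {t[0] for t in all_matches(find, "*", "affiliation", institution)}
    # Step 3: one FIND per author URI present in both result sets.
    books: List[str] = []
    for author_uri in sorted(named & affiliated):
        books.extend(t[0] for t in all_matches(find, "*", "author", author_uri))
    return books


sample = [
    ("uri://liskov", "name", "Barbara Liskov"),
    ("uri://liskov", "affiliation", "MIT"),
    ("uri://liskovbook", "author", "uri://liskov"),
    ("uri://other", "name", "Barbara Liskov"),
    ("uri://other", "affiliation", "Elsewhere"),
    ("uri://otherbook", "author", "uri://other"),
]
print(books_by_author_at(make_in_memory_find(sample), "Barbara Liskov", "MIT"))
```

Note that the second FIND may return every author at a large institution, so the intermediate result can be much larger than the final answer; an analysis of this request type should account for the cost of retrieving that full set.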

IV. Your Design Proposal

The design proposal should be a concise summary (800 words) of your overall system design. It's a good idea to start out with a system diagram to show how you plan to make the overall system work. Your proposal should also make clear how your system represents triplets on disk, as well as how INSERT, REMOVE, and FIND are implemented at a high level.

You do not have to present a detailed rationale or analysis. However, if any of your design decisions is unusual (particularly creative, experimental, risky, or specifically warned against in the assignment), or you want to change the API substantially, it would be wise to describe such changes in your proposal.

You will receive feedback from both your TA and the Writing Program in time to adjust your final report.

V. Your Design Report

Here are some items we are interested in seeing in your final report.

Storage plan diagram: A diagram (or a series of diagrams) that shows the overall structure of your storage design. This should include the structure of blocks on disk, as well as the formats of different types of blocks. If you have any indexing structures or tables to support indirection, these should also be described.
Overview: An overview explanation that describes the design from a high level, referring to and introducing your diagrams.
API Implementation: A description of your proposed implementation of each of the API calls (INSERT, REMOVE, FIND, SHUTDOWN). If you have designed performance optimizations, use a modular approach: start by explaining the unoptimized case before describing how optimizations fit in.
Workload analysis: Describe how your system performs on the workloads given in Section III.2.
Rationale: Explain the reasoning behind your system design. How do elements of your design provide good performance on your workloads? Do you see any limitations to your design that might crop up, perhaps as the system scales?

VI. Your written work

The following are some tips on writing your design report. You can find other helpful suggestions in the 6.033 writing section "How to Write Design Reports" (scheduled for Friday March 16).

Audience: You may assume that the reader has read the DP1 assignment but has not thought carefully about this particular design problem. Give enough detail that your project can be turned over successfully to an implementation team.

Report Outline (use this organization for your report):

Title Page: Give your design report a title that reflects the subject and scope of your project. Include your name, recitation time and section, and the date on the title page.
No Table of Contents.
Introduction: Summarize the salient aspects of your design, the trade-offs you made, a brief rationale for the design you have chosen, and the results from your analysis.
Design Description and Analysis: Explain and elaborate your solution. Show how your solution satisfies the design constraints and solves the design problem (or how it fails to do so). Be sure to identify trade-offs you made, and justify them. Clearly describe your analysis and the conclusions you have drawn from your analysis. You will want to sub-divide this section into subsections and use figures or code, where needed, to support your design choices.
Conclusion: Provide a short conclusion that provides recommendations for further actions and a list of issues that must be resolved before the design can be implemented.
Acknowledgments and References: Give credit to individuals whom you consulted in developing your design. Reference any sources cited in your report. The style of your references should be in the format described in the Mayfield Handbook's explanation of the IEEE style.
Word count: Please indicate the word count of your document at the end of the report.

VII. How do we evaluate your report?

When evaluating your report, your recitation instructor will look at both content and writing and assign a letter grade. The writing program will separately grade the proposal and report for writing and assign a letter grade.

The most important aspect of your design is that we can understand how it works and that you have clearly answered the questions listed in Section III.1. Complicated designs that we cannot understand will not be graded favorably. A clear presentation with effective use of diagrams, pseudocode and / or flowcharts is essential.

On the other hand, excessively simple designs are unlikely to perform well and will also be penalized. For example, a design that requires you to scan every block in the storage system to resolve a FIND request is not a good design. You will need to devise some way to answer FIND and REMOVE requests such that reading all blocks of the disk is rarely required, since such scans are very expensive. Similarly, you should also think about techniques to keep down the number of disk seeks, since seeking to many different blocks to answer a FIND request is likely to be quite slow.

Some overall content considerations:

Does your solution actually address the stated problem?
How complex is your solution? Simple is better, yet sometimes simple will not do the job. On the other hand, unnecessary complexity is bad.
Is your analysis correct and clear?
Are your assumptions and decisions reasonable?

Some writing considerations:

Is the report well-organized? Does it follow standard organizational conventions for technical reports? Are the grammar and language sound?
Do you use diagrams and/or figures appropriately? Are diagrams or figures appropriately labeled, referenced, and discussed in the text?
Does the report use the concepts, models, and terminology introduced in 6.033? If not, does it have a good reason for using different vocabulary?
Does the report address the intended audience?
Are references cited and used correctly?

VIII. Collaboration

This project is an individual effort. You are welcome to discuss the problem and ideas for solutions with your friends, but if you include any of their ideas in your solution you should explicitly give them credit. You must be the sole author of your report.

IX. Tasks and Due Dates

Two copies of design proposal (800 words or fewer) Due: February 27, 2007.

Two copies of detailed design report (2500 words or fewer) Due: March 22, 2007.

Please choose a font size and line spacing that will make it easy for us to write comments on your report (e.g., 11 pt font or larger; 1.5-spaced or greater). Please use 1-inch margins.