M.I.T. DEPARTMENT OF EECS

6.033 - Computer System Engineering (Spring 2007)

February 13, 2007


Please find additional clarifications and details in the DP1 FAQ.

6.033 Design Project 1: The Tag Storage System

I. Assignment

There are two deliverables for Design Project 1:

  1. Two copies of a design proposal not exceeding 800 words, due on Tuesday, February 27, 2007.
  2. Two copies of a design report not exceeding 2,500 words, due on Thursday, March 22, 2007.

A design report is a different beast from a quiz. As in real life, the 6.033 design projects are under-specified, and it is your job to complete the specification in a sensible way given the overall requirements of the project. As with designs in practice, the requirements often need some adjustment as the design is fleshed out. We strongly recommend that you get started early so that you can iterate your design. A good design is likely to take more than just a few days to put together.

II. The Problem

II.1. Introduction

Tagging has become a popular approach for cataloging information about pictures, sounds, video, and pages on the Web. It is used heavily in sites like Flickr and del.icio.us to allow users to annotate content with user-supplied information -- for example, a user might tag a picture of a marmoset (a kind of South American monkey) with the labels "monkey" and "funny." Then, other users who want to find pictures of monkeys might search for all images with the label "monkey" and get this picture back in return.

In this design project, your goal is to design a storage system for a large database of such tags. Rather than simply storing <filename,tag> pairs, your design will be for a more expressive tagging system. In particular, tags in your system will consist of triplets of the form <subject, relationship, object>. Each of these three elements is a string. These strings may encode references to files or other resources, or may simply be arbitrary constants. For example, the triplet:

  <file://marmoset.jpg, isa, monkey>

might be used to tag the marmoset image described above. (Note that we use Web-style "uniform resource identifiers" (URIs) like "file://" or "http://" to encode references.) Notice that this tagging approach allows arbitrarily complicated relationships to be represented because the object field can also be a reference. For example, the triplets:

  <uri://mysafarialbum, contains, file://marmoset.jpg>
  <uri://mysafarialbum, contains, file://bonobo.jpg>
  <uri://mysafarialbum, backgroundcolor, black>
  <uri://myalbumcollection, contains, uri://mysafarialbum>

indicate that uri://mysafarialbum contains two images, that when the photo album is displayed it should be rendered with a black background, and that the photo album is itself a part of a collection of albums.

Your storage system will have to manage a very large collection of these triplets -- far larger than can fit into the memory of a computer. For example, in Flickr, there are 890,000 images tagged with the term "people", 850,000 tagged with the term "flower", and 60,000 tagged with the term "monkey" -- if each triplet takes about 100 bytes to store, then the triplets for these three terms alone occupy almost 200 megabytes of storage! Flickr contains thousands of such terms, which occupy many gigabytes of storage. Thus, the primary challenge you face is to decide how to physically represent this triplet data on disk in such a way that it can be efficiently accessed and updated, just as the file systems we have studied in class (see class notes, Appendix 2.A.) attempt to lay out file data in an efficient manner.

However, there are many differences between this triplet storage system and a file system. In particular:

Note that you are not required to design the file system that contains the URIs that triplets refer to (like marmoset.jpg), just the tag storage system.

In the remainder of this design project we describe the interfaces you must provide and design requirements that your system must satisfy.

II.2. Interfaces

Your system must implement an API for applications to use. We have specified that API for you. You will need to describe how your design implements this API, and argue that your implementation is efficient across a range of workloads (discussed in more detail in Section III.1)

We have also provided you with a basic interface to the computer's disk and memory that you may assume is available to your storage system.

II.2.1. API

Your storage system should support the following API. This API is designed to allow applications to look up triplets that match certain criteria, and to add and remove triplets. You may change this API if needed, but you should provide a good reason for doing so.

Your storage system should run as a server. Applications (clients) send requests to this server via RPC.

INSERT(subject String, relationship String, object String)

Add a new triple to the storage system. After INSERT is called, any subsequent call to FIND should return this new record.
REMOVE(subject String, relationship String, object String)
Remove a triple from the storage system. After REMOVE is called, no subsequent call to FIND should find this triple.
FIND(subject String, relationship String, object String, start integer, count integer) returns array of triple

triple structure {
  subject String;
  relationship String;
  object String;
}
Find all triplets that match the specified subject, object, and relationship. Each of field may optionally be the wildcard string "*" that will match any value. The FIND call will return only count of the matching triplets, beginning with triplet start.

For example, to find the first 1000 triplets that indicate that the subject has a "isa" relationship with the object "monkey", the following request could be issued:

     results = FIND("*", "isa", "monkey",0,1000)

As another example, consider finding triplets that indicate the first 1000 objects posted by the user "uri:\\kaashoek":

     results = FIND("uri:\\kaashoek", "postedby", "*",0,1000)

In both of these cases, the results are returned into an array that is in the address space of the calling application. No state needs to be kept in the storage system about the results of this request. You may assume that the application retrieves triples in segments that will fit in its own memory -- that is, if a really large dataset is needed, the application will step through it in chunks. Your design should ensure that as long as no intervening INSERT or DELETE calls are made, repeated calls to FIND that iterate through all such chunks of a large result will eventually lead to the application receiving all triples that satisfy the FIND request.

SHUTDOWN()
Write all state of the storage system to disk in preparation for turning off the computer.

Note that we are not concerned with the correctness or performance of these methods under failure. If the system crashes, it is OK for the state of your tag system to be arbitrarily corrupted or garbled. In the second half of 6.033 we will discuss how to build systems that can tolerate various types of failures.

Note, however, that you do need be able to support a clean shutdown, which will require you to put all of the state of your storage system on disk.

II.2.2. Hardware / OS Interface

You are given a block-based interface to disk (see sidebar 2-12 on page 2-15 of the course notes), as follows:

block byte[4096]
READ_BLOCKS(blocks array of block, start_block integer, num_blocks integer) returns integer
WRITE_BLOCKS(blocks array of block, start_block integer, num_blocks integer)

Here, blocks are 4 KBs, and these routines can be used to sequentially read or write an arbitrary number of blocks to or from disk. You should assume that a disk seek is always required between consecutive calls to either of these routines. Because one of the ways we will evaluate your design is on its efficiency (see Section VII below), it is very important for you to understand the difference between sequential and random disk I/O (see section A.7. of Chapter 6 of the course notes). The following numbers should help you:

disk access time (seek + rotation): 12.17 milliseconds
block read time: .06 milliseconds

Hence, the total time to read n blocks via a single READ_BLOCKS command would be 12 + .06n, whereas the time to read n blocks via n READ_BLOCKS commands would be 12.23n.

You can assume you have 1 TB (1,000 GB) of sequential disk blocks, numbered starting at 0, available for the exclusive use of your storage system. Assume that the disk system does no caching of blocks (so two consecutive calls to READ_BLOCKS requesting the same blocks require exactly the same amount of time.)

You can also assume that you have 1 GB of main memory available to your storage system, and that the total storage required for your database is about 100 GB (though inserts and deletes may cause it to grow or shrink substantially from this amount over time), and that each triplet occupies about 100 bytes -- hence the database contains about 1 billion triplets. You may use this main memory however you like -- for example, as a cache for triplets or to keep parts of your storage system resident in memory.

III. Design Considerations

Your design report will need to describe how you implement the API given in Section II.2 as well as how data is represented on disk. You should support your design with diagrams, pseudocode, flowcharts, and examples.

III.1. Evaluation Criteria

We will be evaluating your design based on how you address the following questions and issues:

It's okay, and sometimes desirable, to say, "no, my design doesn't do that" as long as you can identify what your design can and can't do, and explain why you made the trade-offs you did. Remember: a simple, correct design that meets the requirements laid out above is more important than a complex one that is hard to understand and does a flaky job. When in doubt, leave it out!

III.2. Workloads to consider

In addition to explaining your design, you will need to analyze its performance under different workloads. This section describes several ways in which your system will be used. You should describe how each of these workloads is handled and estimate the time taken to process them. You should use the hardware parameters (seek time and block read time) given in Section II.2.2 to estimate this processing time (you can assume that the only time consuming operation in your system is disk I/O -- modern CPUs and memories are so fast that operations with them are inconsequential in comparison to the time to access disk.) For the purposes of these performance estimates, you may ignore the effects of caching or other optimizations you propose, if doing so simplifies your analysis.

The workloads come from two applications, a photo-sharing application like Flickr and a library catalog application.

You shouldn't design two different systems for these two different applications, but should come up with a general purpose design. Your system may be better at one workload -- if so, that's OK, but you should be sure to describe why. Your system also shouldn't be so tailored to these two workloads that it couldn't possibly support other types of data or distributions of requests (though, of course, you don't need to optimize your design for such unforeseen workloads.)

You can assume that these workloads are such that there is some idle time in the system -- for example, for a few hours every night, the number of requests is low and the system is relatively unburdened (allowing you, for example, to occasionally reorganize the data, if your design calls for that.)

III.2.1. Flickr++ Application

In Flickr++, some users upload photos, create albums and tag photos. Other users look for photos with particular triplets and by particular users. Occasionally, administrators delete all the photos from a particular user, or delete particular photos.

Triplet structure:

Workload:
INSERT, REMOVE, and FIND commands are mixed together in this workload. They occur with the following frequencies:

III.2.2. Library Application

In the Library application, a library loads all of their digital library information into your system, and then makes a lookup system available to users so they can find books and authors of interest. Records are never deleted.

Triplet structure:

Workload:
First, library data is loaded, and then a number of FIND requests are performed.

IV. Your Design Proposal

The design proposal should be a concise summary (800 words) of your overall system design. It's a good idea to start out with a system diagram to show how you plan to make the overall system work. Your proposal should also make clear how your system represents triplets on disk, as well as how INSERT, DELETE, and FIND are implemented at a high level.

You do not have to present a detailed rationale or analysis. However, if any of your design decisions is unusual (particularly creative, experimental, risky, or specifically warned against in the assignment), or you want to change the API substantially, it would be wise to describe such changes in your proposal.

You will receive feedback from both your TA and the Writing Program in time to adjust your final report.

V. Your Design Report

Here are some items we are interested in seeing in your final report.

VI. Your written work

The following are some tips on writing your design report. You can find other helpful suggestions in the 6.033 writing section "How to Write Design Reports" (scheduled for Friday March 16).

Audience: You may assume that the reader has read the DP1 assignment but has not thought carefully about this particular design problem. Give enough detail that your project can be turned over successfully to an implementation team.

Report Outline (use this organization for your report):

VII. How do we evaluate your report?

When evaluating your report, your recitation instructor will look at both content and writing and assign a letter grade. The writing program will separately grade the proposal and report for writing and assign a letter grade.

The most important aspect of your design is that we can understand how it works and that you have clearly answered the questions listed in Section III.1. Complicated designs that we cannot understand will not be graded favorably. A clear presentation with effective use of diagrams, pseudocode and / or flowcharts is essential.

On the other hand, excessively simple designs are unlikely to perform well and will also be penalized. For example, a design that requires you to scan every block in the storage system to resolve a FIND request is not a good design. You will need to devise some way to answer FIND and REMOVE requests such that reading all blocks of the disk is rarely required, since such scans are very expensive. Similarly, you should also think about techniques to keep down the number of disk seeks, since seeking to many different blocks to answer a FIND request is likely to be quite slow.

Some overall content considerations:

Some writing considerations:

VIII. Collaboration

This project is an individual effort. You are welcome to discuss the problem and ideas for solutions with your friends, but if you include any of their ideas in your solution you should explicitly give them credit. You must be the sole author of your report.

IX. Tasks and Due Dates

Please choose a font size and line spacing that will make it easy for us to write comments on your report (e.g., 11 pt font or larger; 1.5-spaced or greater). Please use 1-inch margins.