6.033 Spring 2004: Design Project 1

M.I.T. DEPARTMENT OF EECS

6.033 - Computer System Engineering (Spring 2004)

Design Project 1 - Video Surveillance System

Check the DP1 FAQ for updates and clarifications. You may also want to read the proposal grading notes and the slides from the writing lecture.

Assignment

There are two deliverables for 6.033 Design Project #1 (details below):

a design proposal not exceeding 800 words (~2 pages), due on February 26, and:
a report not exceeding 5000 words (~12 pages), including an executive summary, due on March 18.

Introduction

For this project, you will design the user-level software for a surveillance system that retrieves and demultiplexes several live video streams arriving from cameras via a network, processes the video streams, and extracts a set of still images. The system serves these images to Web browsers via the HTTP protocol. The central computer's hardware, the remote cameras, the Web clients, and the video processing software have been developed by outside vendors. All of these components will have to be incorporated into your system design without modifications. You are in charge of designing the software for the central computer; your design should carefully specify how all the components of the system interact with one another.

You will need to integrate two software subroutines that have been developed by other programmers. The first subroutine is the video decoding software that converts the video from a proprietary format to a standard one. The second subroutine, written by your friends taking 6.034, is a fancy computer vision program that tries to detect anomalous snippets of video. These two subroutines have bugs in them, so it is important that your system achieve fault isolation to prevent program errors from propagating among modules and crashing the entire system.

Also, because the run time of the video processing and computer vision subroutines is variable, your design must be able to cope with and recover from sudden load spikes. Your design must ensure that every incoming camera video stream is checked periodically and that client requests are serviced promptly, even under uneven load conditions.

For the purpose of this project, you should focus on the robustness and performance of the software; you do not need to address user-interface (UI) issues. (This omission is not because the human interaction aspects of this system are unimportant! On the contrary, there are enough thorny UI issues here to fill a book, but we want to concentrate on your understanding of how operating system kernels work.) We will learn how to do recovery systematically in chapter 8 of the class notes; for this project, a human operator will notice module crashes and restart the crashed modules. Thus, you should focus on making sure that errors do not propagate among modules, rather than on correcting errors that occur in modules.

The application: Surveillance@Home

Large, spread-out infrastructure facilities, such as refineries and shipyards, are particularly vulnerable to sabotage or attack. Companies often cannot afford to hire security guards to patrol the entire compound all night, and even those with the resources to hire watchmen can't monitor every area of the compound. By necessity, some areas are left unattended for long periods. Surveillance cameras can help a security guard to monitor multiple areas simultaneously. However, there is a limit to the number of screens a single guard can watch attentively. In practice, the high cost of uniformed guards means that many facilities considered critical to the national infrastructure are left fairly wide open to terrorism or industrial sabotage.

According to "Priceline.com founder envisions a nation of Web cam watchers guarding homeland", a company is developing an Internet-based surveillance system to solve this problem. Digital security cameras are installed around the facilities in all areas where there are supposed to be no personnel ("no-man zones"). The cameras send their video feeds over the network to a central computer, where vision software processes the video streams to pick out possibly suspicious activity. For example, it might detect motion, infrared emission (heat), or human-seeming silhouettes.

When the system's software flags a suspicious camera feed, the system does not immediately raise an alarm, because today's computer vision algorithms are far from accurate. Instead, the system uses the results of the vision algorithm to select certain key still images (also called "frames") from the video. These images are sent over the Internet to the Web clients of human "spotters" monitoring the system from their home PCs.

For each frame, the spotters click on one of three buttons, labeled "Suspicious", "No problem", and "Uncertain", based on whether there are people, or vehicles, or other suspicious entities present in the picture. If multiple spotters agree that something is wrong, the central computer raises an alarm by contacting the police.

While it's clear that this technology could be used in ways that would cause serious privacy concerns, this should not be a problem in our application scenario, in which the system only monitors "no-man zones". Therefore, you should not worry about privacy concerns in your design.

Hardware and software components

The components of interest are (1) the camera hardware and network interface, (2) the proprietary transcoder code that converts from the camera's video format to a standard one, (3) the computer vision code, (4) the central computer's operating system, (5) the central computer's hardware, (6) the human spotters (and their software), and (7) the Web server and interface software you will design.

Camera hardware and network interface

A typical deployment of Surveillance@Home could have anywhere between 100 and 1000 cameras, all communicating with a single central computer.

The digital surveillance cameras used in the system are a commercial off-the-shelf product. You are not really in a position to demand changes to the camera's hardware or software; luckily, you have chosen a reasonably "smart" network-capable model that has an embedded Web server. Each camera contains its own CPU, which is responsible for both compressing the raw video data and for sending it over the network as a Web stream. The camera's vendor compresses the raw video into a proprietary format, so your central computer has to convert the incoming video data to a standard format. Fortunately, the vendor has given you a software module that accomplishes this task.

Your central computer's software should use HTTP to retrieve video from the cameras by making GET requests to the camera's Web server. When the camera's Web server receives an HTTP GET request for the URL "/video", it responds with a video feed, starting with the most recently captured frame. The video frames arrive in packets over a reliable transport (TCP) connection. (You may find the material in chapter 4 of the class notes useful for understanding TCP, although the details should not be relevant to this project.) You may assume that any sufficiently long (more than 1 second) sequence of packets is a valid video segment, although a single packet is typically too short to contain even a single full frame of video.

To stop receiving the video stream, the receiving client (which is the central system you are designing) can call close_stream() on the TCP connection, which will cause the stream to stop. To restart the stream, the client must send a new "GET /video" request.
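As a rough illustration of this request/read/close cycle, here is a minimal Python sketch. The FakeStream class is a hypothetical in-memory stand-in for a duplex stream_id, and bounding the snippet by a byte count (rather than by elapsed time) is an assumption made to keep the example short. The sketch also ignores the HTTP response headers that a real camera would send before the video data.

```python
REQUEST = b"GET /video HTTP/1.0\r\n\r\n"


class FakeStream:
    """Hypothetical in-memory stand-in for a duplex stream_id (illustration only)."""

    def __init__(self, payload):
        self.payload = payload    # bytes the "camera" will send
        self.position = 0
        self.written = b""        # bytes the client has sent so far
        self.closed = False

    def write_stream(self, data):
        self.written += data
        return len(data)

    def read_stream(self, size):
        data = self.payload[self.position:self.position + size]
        self.position += len(data)
        return data

    def close_stream(self):
        self.closed = True


def fetch_snippet(stream, max_bytes, read_size=4096):
    """Request the feed, collect at most max_bytes, then close the stream."""
    stream.write_stream(REQUEST)
    snippet = bytearray()
    while len(snippet) < max_bytes:
        data = stream.read_stream(min(read_size, max_bytes - len(snippet)))
        if not data:              # the other end has no more data
            break
        snippet += data
    stream.close_stream()         # stops the camera's stream
    return bytes(snippet)
```

Fetching the next snippet from the same camera would require a fresh stream and a new request, as described above.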

Your system's software should ensure that no camera is starved, because the last thing you want to do under high load is to leave parts of the compound unmonitored.

Camera video transcoder module

The camera sends data compressed using a proprietary algorithm. The camera vendor has given you a software module, compiled for the architecture and operating environment of your central computer, to decompress the video streams. This module, called the transcoder, converts a sequence of video packets in the vendor's proprietary format to a sequence of non-proprietary JPEG images. You can link the transcoder to any program you write and use the transcoder's API in your program. This API has one call you care about:

  transcode( camera_buffer, camera_buffer_size, jpeg_buffer, jpeg_buffer_size ) ;

A call to transcode reads the video frame buffer stored in the memory locations starting at the pointer camera_buffer and writes a sequence of JPEG images to the area of memory pointed to by the pointer jpeg_buffer. (Don't worry about figuring out how much memory space to allocate for the output; it's not really relevant to the goals of this project.)

As a trivial example, if the "proprietary" video format were nothing but a series of JPEGs, the transcoder could be simply a copy operation:

  transcode( camera_buffer, camera_buffer_size, jpeg_buffer, jpeg_buffer_size )
    for i from 0 to min( camera_buffer_size, jpeg_buffer_size ) - 1
      jpeg_buffer[ i ] <-- camera_buffer[ i ] ;

In reality, the transcoder implements some signal processing to perform the conversion. The bit rate of the incoming video stream is variable and depends on factors such as the amount of activity or movement in the video sequence. The incoming stream's rate varies between 50 kbps and 1 Mbps. You may assume that the time taken to transcode a snippet of video is proportional to the number of bytes in the snippet. Fortunately, your network hardware and operating system are able to keep up with the 1000 cameras each streaming at the peak rate of 1 Mbps. However, the transcoder and vision software vary in the time they take to process their inputs, and you find that they sometimes become performance bottlenecks and slow the entire system down.
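The figures above can be turned into a quick back-of-the-envelope calculation. The Python sketch below assumes decimal prefixes (1 kbps = 1000 bits per second) and ignores protocol overhead; the function and constant names are illustrative only.

```python
CAMERAS = 1000                 # upper end of the deployment range
MIN_RATE_BPS = 50_000          # 50 kbps
MAX_RATE_BPS = 1_000_000       # 1 Mbps
NIC_RATE_BPS = 1_000_000_000   # Gigabit Ethernet, nominal


def aggregate_rate_bps(cameras, per_camera_bps):
    """Total incoming bit rate if every camera streams at the given rate."""
    return cameras * per_camera_bps


def snippet_bytes(rate_bps, seconds):
    """Size in bytes of a snippet of the given duration at the given bit rate."""
    return rate_bps * seconds // 8
```

Under these assumptions, 1000 cameras at the peak rate add up to 1,000,000,000 bits per second, which equals the nominal capacity of the network interface. A one-second snippet ranges from 6,250 bytes at 50 kbps to 125,000 bytes at 1 Mbps, so transcoding time per second of video can vary by a factor of 20.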

Computer vision software

The computer vision software module's role is to flag a frame of video as "suspicious" if it indicates unusual activity. Unfortunately for you, your company decided to contract this work out to a 6.034 student group who are also doing it as their AI final project. They are happy to give you the source, but it's fifty thousand lines of undocumented spaghetti code, and you can't seem to change anything without causing stranger bugs to appear. Really, you're lucky that the code compiles at all and produces sensible output most of the time.

In any case, because the code seems to work well enough most of the time, you decide to use it. It does crash seemingly at random, usually scribbling all over memory when it does so. You can't come up with any way to predict crashes; sometimes they seem to be caused by particular input data, sometimes they are just mysterious. You don't have any hope of fixing all the bugs in this code, so you will have to isolate the module and protect the rest of the system from errors.

Do not worry about recovering from crashes in this code. Every few hours, the subroutine will crash on some input stream, and a system operator will have to move in and pick up the pieces manually, restarting the faulty pieces. Your goal should be to minimize the disruption to the system's other tasks, such as processing other input video streams and fulfilling client requests, by preventing the error from propagating into other modules.
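One possible way to contain such a crash, sketched here in Python on a UNIX-like system, is to run the unreliable routine in a forked child process and return its output through a pipe. This is an illustration under stated assumptions, not a prescribed design: run_isolated is a hypothetical helper, and Python's os.fork, os.pipe, and os.waitpid stand in for the fork, pipe, and wait syscalls described later in this document.

```python
import os


def run_isolated(func, data):
    """Run func(data) in a child process.

    Returns the child's bytes result, or None if the child crashed.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: compute, send the result through the pipe, and exit.
        os.close(read_fd)
        try:
            view = memoryview(func(data))
            while len(view) > 0:
                written = os.write(write_fd, view)
                view = view[written:]
            os._exit(0)
        except BaseException:
            os._exit(1)
    # Parent: read until the child closes its end of the pipe.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 65536)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
        return b"".join(chunks)
    return None   # child was killed by a signal or reported failure
```

Because the child has its own address space, stray writes made before a crash cannot reach the parent's memory. The parent only has to treat a missing or partial result as a failure.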

Like the transcoder, the vision software is provided as a precompiled module. The API contains one function:

  struct { threat, frame_pointer, frame_size } vision_result ;
  vision_result <-- detect_anomaly( jpeg_buffer, jpeg_buffer_size ) ;

A call to detect_anomaly processes a sequence of JPEG images stored in the memory region beginning at the jpeg_buffer pointer. It returns a structure which contains a number vision_result.threat between zero (0.0) and one (1.0). vision_result.threat indicates the likelihood that the sequence contains some sort of suspicious activity. The subroutine also chooses the single "most suspicious" frame from the sequence and returns a pointer vision_result.frame_pointer to the beginning of that JPEG image within the stream, along with the size vision_result.frame_size of that image.

A questionable implementation of this function might simply always return zero ("no threat") and the first image in the sequence:

  detect_anomaly( jpeg_buffer, jpeg_buffer_size )
    result.threat <-- 0.0 ;
    result.frame_pointer <-- jpeg_buffer ;
    result.frame_size <-- calculate_memory_size_of_first_image( jpeg_buffer ) ;
    return result ;
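Since detect_anomaly is known to be unreliable, a caller may want to sanity-check the returned structure before using it. The Python sketch below is one way to do that; the VisionResult class and extract_frame helper are hypothetical, and a byte offset into the buffer stands in for frame_pointer.

```python
from dataclasses import dataclass


@dataclass
class VisionResult:
    threat: float        # expected to lie between 0.0 and 1.0
    frame_offset: int    # stands in for frame_pointer (offset into jpeg_buffer)
    frame_size: int


def extract_frame(jpeg_buffer, result):
    """Return the selected frame, or raise ValueError if the result is implausible."""
    if not 0.0 <= result.threat <= 1.0:
        raise ValueError("threat out of range")
    end = result.frame_offset + result.frame_size
    if result.frame_offset < 0 or result.frame_size <= 0 or end > len(jpeg_buffer):
        raise ValueError("frame lies outside the input buffer")
    return jpeg_buffer[result.frame_offset:end]
```

A check like this catches only results that are obviously wrong; it does not protect against memory corruption inside the module itself.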

Unsurprisingly, detect_anomaly takes longer to run on longer video sequences; thus, your design may be able to control the amount of time it takes to process sequences by varying their lengths. However, keep in mind that detect_anomaly may take much longer to process some sequences than others due to features of the input. For example, an input that contains a scarecrow might require the silhouette detector to do a lot of work.

Operating system kernel

You are planning to use a fairly typical UNIX-like kernel on your central computer; your application-specific software will run as user processes and threads on this kernel. You may modify limited parts of the kernel for well-justified reasons if you want, but this is not necessary. If you decide to modify the kernel's interface or implementation, you should clearly describe the differences between your kernel and a typical kernel as well as your justifications for these differences. Do not write about how a typical kernel works; instead, conserve your precious report real estate by referencing the class notes or readings.

The default kernel is accessed from user-space via the following set of system calls ("syscalls"):

  n <--  read_stream( stream_id, buffer, size ) ;
  n <-- write_stream( stream_id, buffer, size ) ;
  close_stream( stream_id ) ;

  struct { read_stream_id, write_stream_id } pipe_streams ;
  pipe_streams <-- pipe() ;

  listen( local_port ) ;
  stream_id <-- accept_stream( local_port ) ;
  stream_id <-- open_stream( remote_host_address, remote_port ) ;

  pid <-- fork() ;
  pid <-- fork_thread() ;
    (fork_thread corresponds roughly to the clone() syscall)
  exit( status ) ;
  status <-- wait() ;
  resize_heap( size ) ;
    (resize_heap corresponds to the standard brk() syscall)

  microseconds <-- time() ;

You should recognize most of these system calls from the class notes and the UNIX paper.

The read_stream, write_stream, and close_stream syscalls are the same as in the UNIX paper: read_stream and write_stream transfer data between a stream and a region of memory, and close_stream indicates that a process is finished using a stream.

As described in the UNIX paper, pipe creates a pair of stream_ids which are connected to each other. Any data written to the write_stream_id can later be read from the read_stream_id.

The syscalls listen, accept_stream, and open_stream are similar to the stream API described in section 4.E of the class notes, and are also similar to the "sockets" API used by UNIX programs to access the network. The open_stream syscall, like the function described on page 4-75 of the class notes, establishes a TCP connection to the service running on the specified host and listening on the specified port. However, once the connection is established, our OS's open_stream returns a two-way ("duplex") stream_id. A duplex stream is like a two-way pipe: anything written to a duplex stream_id will be transmitted over the network and can later be read from the stream_id on the other end of the connection.

Calling listen(80) tells the OS that incoming TCP connections to port 80 will be handled by the current process. The process can then call accept_stream(80), which will block until another host attempts to connect to port 80 on this host (using the open_stream syscall). accept_stream will then return a duplex stream_id. The stream_id returned by accept_stream will be connected to the stream_id returned by open_stream on the other host, and can be used in the same way.
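For readers more familiar with the sockets API, the following Python sketch shows a roughly analogous sequence on the loopback interface. The mapping is approximate: bind plus listen plays the role of listen, accept plays the role of accept_stream, and socket.create_connection plays the role of open_stream. Port 0 asks the OS to pick any free port, and the helper names are illustrative.

```python
import socket
import threading


def read_all(conn):
    """Read from a duplex stream until the peer stops sending."""
    chunks = []
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def serve_one(listener):
    """Accept one connection and send back the request in upper case."""
    conn, _ = listener.accept()            # blocks until a peer connects
    with conn:
        conn.sendall(read_all(conn).upper())


def loopback_demo(message):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))        # port 0: let the OS choose
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_one, args=(listener,))
    server.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)    # signal that the request is complete
        reply = read_all(client)
    server.join()
    listener.close()
    return reply
```

Both ends read and write the same connection, which is the duplex behavior described above.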

The fork syscall behaves exactly as described in the UNIX paper: it creates a new process with its own address space and controlling thread, as an exact clone of the current process. Child processes inherit all of the stream_ids of their parents. The exit and wait syscalls also behave exactly as in the UNIX paper.

The fork_thread syscall creates a new thread running in the same address space as its parent thread. The OS automatically allocates a separate stack and a PID for the new thread, but otherwise the two threads are identical.
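The practical difference between the two calls can be seen in a small Python sketch, which assumes a UNIX-like system. Here threading.Thread stands in for fork_thread and os.fork for fork; the function names are illustrative.

```python
import os
import threading


def bump(state):
    state["count"] += 1


def bump_in_thread(state):
    """The new thread shares the caller's memory, so the update is visible."""
    worker = threading.Thread(target=bump, args=(state,))
    worker.start()
    worker.join()
    return state["count"]


def bump_in_child_process(state):
    """The child process updates only its own copy of the memory."""
    pid = os.fork()
    if pid == 0:
        try:
            bump(state)
        finally:
            os._exit(0)
    os.waitpid(pid, 0)
    return state["count"]
```

Starting from a count of zero, the thread's increment is visible to the caller, while the child process's increment is not. The same separation that hides the child's update also keeps a child's stray memory writes away from its parent.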

The resize_heap syscall directs the kernel to allocate additional memory pages for the current address space, to ensure that the memory available to a process is at least a certain size.

Finally, the time syscall simply returns the current time in microseconds since midnight, January 1, 1970.

Central computer hardware

Your central computer is similar in architecture to a modern commodity PC. Its key components are a central processing unit, a network interface card, a clock chip, and a bus which connects these three modules. While many of the underlying hardware features are abstracted away by the operating system calls, we summarize them here to eliminate ambiguity.

The central processing unit (CPU) consists of a modern microprocessor (e.g., Intel x86, PowerPC, or 6.004 Beta) coupled with a large main memory (as of February 2004, that would be several gigabytes). The microprocessor contains a memory management unit (MMU) as well as bus-triggerable interrupts; these features are described in sections 2.C and 2.D of the 6.033 class notes.

The network interface card (NIC) is a standard 1000 megabit-per-second Gigabit Ethernet card; although 6.033 does not cover Ethernet until week 5, the details are not important for this project. The NIC communicates with the CPU via a memory-mapped I/O interface (discussed in section 2.A of the class notes); it can operate in either a polling or an interrupt mode (see section 2.D) to notify the CPU of incoming packets.

The clock chip also communicates with the CPU via memory-mapped I/O and interrupts. The CPU can program the clock to send a regular "heartbeat" of interrupts with a programmable delay, or it can query the clock for the current time (in microseconds since booting).

You may specify that the computer has a hard disk drive installed as another device on the main bus, but this additional complexity should not be necessary. All of the data that the system handles ought to fit within, for instance, 4 gigabytes, which is a perfectly reasonable amount of main memory in 2004. The computer is able to download its boot image from the network via its NIC, so a hard drive is not necessary for booting.

Spotters

The system's "spotters" are simply employees sitting at home in front of their PCs; their role is to prevent the system from triggering false alarms. Because maintaining custom software on everyone's home PC would be inconvenient, the spotters simply use a standard Web browser as their client. Consequently, your central computer must act as a Web server, conforming to the HTTP specification.

The Web interface to the surveillance system is not unlike the popular "hotornot.com" site: the server presents an image to the spotters, along with the question "is there a person or vehicle in this picture? [YES/NO/UNCERTAIN]." (Or, perhaps a more vague question: "is there something dangerous in this picture?") The spotters answer the question by clicking on the appropriate button, and the server feeds them a new image and question. (To keep these employees awake and alert, the server might occasionally substitute an image from a library instead of a live image, but you do not need to worry about that.)

While addressing Web security would be an important part of a full system design, do not worry about that for now; again, we are focusing on fault isolation and performance for this project.

Web server

The Web server module's role is to accept incoming requests from spotters' browsers, using the listen and accept_stream syscalls, and to respond with an image that has recently been flagged as suspicious. Keep in mind that requests may arrive more quickly than you can send out the images one-by-one, so your server will need to serve responses concurrently.

The design of the Web server will be up to you. We recommend that you design it as a user-level process, but you may decide to move part or all of it into the kernel if you have compelling reasons. As with all design choices, you should justify your decision in your report.

Do not worry about the details of parsing and producing HTTP messages; you may assume that a 6.170 team will write code to do this in response to a spec you provide. However, you will still need to understand how HTTP works in order to design a system which receives and sends HTTP messages concurrently.

You may find the Flash paper useful for thinking about different styles of concurrency; however, keep in mind that our setting is different from the one examined in that paper, and you might want to make different design decisions than the Flash authors made.
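To make one of these styles concrete, here is a minimal thread-per-connection sketch in Python. It is only an illustration: the "/image" URL, the function names, and the fixed connection count are assumptions, and whether this style suits the surveillance system is a design question for your report.

```python
import socket
import threading

HEADER = b"HTTP/1.0 200 OK\r\nContent-Type: image/jpeg\r\nContent-Length: %d\r\n\r\n"


def handle(conn, image):
    """Serve one client: read (and ignore) the request, then send the image."""
    with conn:
        conn.recv(4096)
        conn.sendall(HEADER % len(image) + image)


def serve(listener, image, connections):
    """Accept a fixed number of connections, one worker thread per connection."""
    workers = []
    for _ in range(connections):
        conn, _ = listener.accept()
        worker = threading.Thread(target=handle, args=(conn, image))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()


def fetch(port):
    """Tiny client for trying out the server on the loopback interface."""
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"GET /image HTTP/1.0\r\n\r\n")
        chunks = []
        while True:
            chunk = client.recv(4096)
            if not chunk:
                return b"".join(chunks)
            chunks.append(chunk)
```

Because each connection has its own thread, a slow client delays only its own response. The trade-off is that all of the threads share one address space, so this style by itself provides no fault isolation.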

System composition

The Web server is only one subsystem among several; just as important is the way the components will be composed to form the whole system. Your primary task is to design the interconnections among the software modules on the central computer, i.e., both the interface definitions and any "glue" code (for example, queue managers) that will be necessary. (In practice, modern software systems frequently consist primarily of glue code which governs the interactions among functional subunits.)

When the computer is initially turned on, it will boot with a single user program image in memory and a single process running that image. You should describe how this initial process evolves into the steady-state system (by creating pipes, spawning threads and processes, setting up network connections, and so on).

Since modules within the computer must communicate amongst themselves via pipes or sockets, you should also specify the format of the messages they send and receive. You may find the discussion of remote procedure call (RPC) and marshaling in section 2.B of the class notes useful, although you should keep in mind that the full power of RPC may not be necessary for this project.
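As an example of what such a message format might look like, the Python sketch below frames each message with a fixed-size header followed by the image bytes. The field choice (camera id, threat value, frame size) and the 16-byte layout are assumptions made for illustration, not part of the assignment.

```python
import struct

# Network byte order: camera_id (uint32), threat (float64), frame_size (uint32).
HEADER = struct.Struct("!IdI")


def pack_message(camera_id, threat, frame):
    """Marshal one flagged frame into header plus payload."""
    return HEADER.pack(camera_id, threat, len(frame)) + frame


def unpack_messages(buffer):
    """Split a byte stream into complete messages.

    Returns (messages, leftover), where leftover holds any trailing
    partial message that needs more data before it can be decoded.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        camera_id, threat, size = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + size
        if end > len(buffer):
            break
        messages.append((camera_id, threat, buffer[offset + HEADER.size:end]))
        offset = end
    return messages, buffer[offset:]
```

An explicit length field matters because a pipe or socket delivers a stream of bytes with no message boundaries, so a single read may return part of a message or several messages at once.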

Design goals

Correctness: it has to work! (Do not underestimate this requirement!)
Fault isolation: errors should not propagate among the central computer's software components.
Scalability: the software should be able to handle as many cameras and web clients as possible, given the capabilities of the underlying hardware.
Graceful performance degradation: in the event of heavy load from web clients or from cameras, the system should:
1. continue to make progress on serving client requests, and:
2. continue to check each camera for suspicious activity as regularly as possible.
Note that it is OK for the system to take some performance hit from overload, as it is expected to be a temporary situation. (If the system is under chronic overload, then the hardware needs to be upgraded!) Achieving the second subgoal will require considering the issues on sharing resources raised in section 3.D of the class notes.
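One simple policy that serves the second subgoal is to visit cameras in round-robin order and, under overload, to keep only the newest data for each camera. The Python sketch below shows just that ordering policy; the class and function names, the camera labels, and the queue capacity are hypothetical. It does not address a module that runs for a long time on a single snippet, which requires other mechanisms.

```python
from collections import deque


class CameraQueue:
    """Holds at most `capacity` snippets for one camera; the oldest are dropped."""

    def __init__(self, capacity):
        self.snippets = deque(maxlen=capacity)

    def push(self, snippet):
        self.snippets.append(snippet)

    def pop_newest(self):
        return self.snippets.pop() if self.snippets else None


def schedule(queues, turns):
    """Visit cameras round-robin; return (camera_id, snippet) in processing order."""
    order = []
    ids = deque(sorted(queues))
    for _ in range(turns):
        if not ids:
            break
        camera_id = ids[0]
        ids.rotate(-1)            # move this camera to the back of the line
        snippet = queues[camera_id].pop_newest()
        if snippet is not None:
            order.append((camera_id, snippet))
    return order
```

With this policy, a camera that produces a burst of data gets one turn per round like every other camera, so a quiet camera is still checked regularly.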

Issues to address

Your design report should address the following questions.

If a call to transcode or detect_anomaly triggers a bug and causes a stray memory write (or a million), how would other modules be affected?
If a module crashes while processing one video stream, are any other streams affected as well? How many client requests are disrupted, if any?
Could CPU usage by the transcoder or vision modules interfere with the timely processing of network traffic? Could heavy network load prevent the CPU-bound modules from making progress?
If a blizzard in Minnesota suddenly causes detect_anomaly to run very slowly on video streams from Minnesota, will the system continue to process the inputs from cameras in Florida regularly, or will it starve them of attention?
What are the primary factors that contribute to:
- the time between the server receiving a Web client request and sending out the response?
- the latency between a camera capturing a frame and the server being able to send it out to spotters?
- the system's maximum throughput of video data and of Web requests?

Your design report should also include the following information:

a diagram of the overall architecture of the system and its decomposition into modules;
a description of how threads and address spaces correspond to modules;
a description of how modules communicate with each other;
a detailed diagram or description of the path a frame of video data takes through the central computer.

Required reading

Hypertext transfer protocol -- HTTP/1.1. RFC 2616.
Dennis M. Ritchie and Ken L. Thompson. The UNIX time-sharing system. In Bell System Technical Journal 57, 6, part 2 (1978) pages 1905-1930. (Reading #6 from the reading list.)
Vivek S. Pai, Peter Druschel, and Willy Zwaenepoel. Flash: An efficient and portable Web server. In Proc. of 1999 Annual Usenix Technical Conference, June, 1999, pp 199-212. (Reading #7 from the reading list.)

Your written work

We now provide some suggestions on writing style and outline the standard structure of a design report.

Suggestions on writing style

Who is the audience for this paper? Write for an audience consisting of colleagues who took 6.033 five years ago. These are readers who understand the underlying system and network concepts and have a fair amount of experience applying those principles, but they have not thought carefully about the particular problem you are dealing with. Assume that your paper will also be used to convince readers that you have a feasible design. Finally, give enough detail that your project can be turned over to an implementation team with some confidence that you will not be surprised by the result. One qualitative way that 6.033 reports are evaluated is by asking the question, "Do we want this person on our team? Can this designer provide us an accurate description of his/her design?"

Following are some tips on the organization of a design report. You can find other helpful suggestions on writing this kind of report in the 6.033 online lecture "How to Write Design Reports" (see the 2003 webpage). You may also want to look at the Mayfield Handbook's explanation of IEEE documentation style. A good book on writing style is: "The Elements of Style," by William Strunk Jr. and E. B. White, Third Ed., MacMillan Publishing Co., New York, NY, 1979. (Also available from the MIT libraries.)

Report Outline

Following is a suggested outline for your report. The full report (including executive summary) should be no longer than 5000 words (approximately 12 pages), single-spaced.

Executive Summary

Think of an executive summary as a long abstract. The executive summary for DP1 should be no more than 1200 words (approximately 3 pages) in length. The executive summary is a summary of the entire paper. It is not an outline of the organization of the paper! It states the essential points of your solution, the rationale for your approach, and a brief justification for your design. The executive summary will be read by your recitation instructor, so make sure that you provide him/her with an accurate, concise statement of your project. Write the executive summary after you have written your report. Include the title of your design, your name, recitation time and section, and the date on top of the executive summary.

Title Page

Give your design report a title that reflects the subject and scope of your project. Include your name, recitation time and section, and the date on the title page.

Table of Contents

1.0 Report Introduction

Explain the rationale for the project. Provide a problem statement and a statement of purpose. You may assume that the reader has read the DP1 assignment; you do not need to restate the problem in detail.

2.0 Design Overview

Explain the approach or architecture conceptually before delving into details, components, and mechanics. (Remember, you are not writing a mystery novel!) Present any analysis clearly and concisely. Make sure that you include a figure of your design architecture.

3.0 Design Description

Explain and elaborate your solution. Show how your solution satisfies the constraints and solves the problem (or how it fails to do so). Explain how your design choices are reasonable or desirable in the context of the problem statement. Include analysis of the estimated (or measured, if it applies) performance characteristics of your solution. (Some writers add a separate section to their report that specifically addresses performance analysis.) You may need to use figures or pseudocode in this section. If you use pseudocode to illustrate your solution, be sure to describe what the pseudocode does in English as well.

4.0 Feasibility

Describe the alternative approaches that you considered and rejected, and why you rejected those approaches. Your paper will be more convincing if you explain why your design is appropriate and why your design is better than the alternatives. (For example, if another approach would meet all of the objectives perfectly but the cost would be 100 times higher, then you should mention the exorbitant cost as a reason for choosing your less general but cheaper approach.) Comparisons with another design should not assume an incompetent version of the other design, but rather one that has had at least as much careful optimization as your design.

References

Document your sources, giving a list of all references (including personal communications). The style of your citations (references) and bibliography should be in IEEE format.

How do we evaluate your report?

When evaluating your report, your instructor will look at both content and writing.

Some content considerations:

Does your solution actually address the stated problem?
Do you explain your decisions and the trade-offs?
How complex is your solution? Simple is better, yet sometimes simple will not do the job. On the other hand, unnecessary complexity is bad.
Is your solution a good fit with the goals of the system?
Is your analysis clear?

Some writing considerations:

Does the report provide a context for the project before describing the design details?
Is the report well organized? Does it follow standard organizational conventions for technical reports?
Does the report use diagrams where appropriate? Are the diagrams appropriately labeled and referenced in the text?
Does the report use the concepts, models, and terminology introduced in 6.033? If not, does it have a good reason for using a different universe of discourse?
Does the report address the intended audience?
Is there a list of References?

Phase Two writing considerations (Seniors)

If you are enrolled in the 6.033 writing practicum, you do not need to do anything special; your practicum instructor will explain how the report will get you credit for the Phase II writing requirement.

CI-M Considerations (Juniors)

Your design report will constitute part of your writing grade for the Communication Intensive requirement.

Collaboration

This project is an individual effort. You are welcome to discuss the problem and ideas for solutions with your friends, but if you include any of their ideas in your solution you should explicitly give them credit. You must be the sole author of your report.

Tasks and Due Dates

Design proposal (800 words or less, approx. 2 pages). Due: February 26, 2004.
This should be a concise summary of your design choices and of the overall system design. Also, if any of your design decisions are "unusual" (particularly creative, experimental, or risky, or specifically warned against in the assignment), it would be wise to describe them here. You will receive feedback from both your TA and the Writing Program in time to adjust your final report.
Executive summary (1200 words or less, approx. 3 pages). Due: March 18, 2004.
This should be submitted with your full report.
Detailed technical report (5000 words or less, approx. 12 pages). Due: March 18, 2004.
The executive summary occupies the first three pages of your report. That is, the total number of pages should be 3+9 = 12, not 3+12 = 15.

Please use 1-inch margins and single spacing. Remember to use diagrams where appropriate.

Last updated 2004/03/13 00:04:04 by ctl