DP2 : Mars Rover

M.I.T. DEPARTMENT OF EECS

6.033 - Computer System Engineering (Spring 2004)

Design Project 2 - Mars Rover

Assignment

There are three deliverables for 6.033 Design Project #2 (details below):

a design proposal not to exceed 800 words (~2 pages), due on April 13,
a report not to exceed 5000 words (~12 pages), including an executive summary, due on May 6th, and
a presentation on May 11th (details will be given as the date approaches).

On this project, you will work in teams of 3 students (all of whom must have the same recitation instructor). Send your list of team members to your recitation instructor via email by 11:59 PM on April 5.

Each team should submit 2 copies of the proposal (one for your recitation instructor and one for your TA).

0. Updates

This design project was last updated April 25, 2004. For more information, please see the version history and FAQ.

1. Introduction

Distributed Rovers: Fast, Cheap, and Out of Control

The recent success of The National Aeronautics and Space Administration's (NASA's) Mars rovers, Spirit and Opportunity, has led NASA to begin planning another robotic mission to Mars. This time, NASA engineers have decided to send 50 small, cheap rovers that will be able to spread out and explore a 12-kilometer diameter impact crater. Because these rovers are small and inexpensive, they have limited hardware capabilities. In particular, NASA has decided not to equip the rovers with costly communication hardware that can transmit off of the planet. Instead, rovers are equipped only with short range radios, capable of transmitting a hundred meters at a time at low data rates. To get the data off-planet, rovers must navigate to a control center in the middle of the crater and radio data to it. The control center, which is much more expensive and fault-tolerant, then uses a powerful radio to contact Earth. The advantage of this design is that the costly control center is stationary, so can be large and well protected. Additionally, lack of moving parts make the control center less failure prone, since mechanical parts tend to fail much earlier than other components. Furthermore, although rovers are individually unreliable, their large number means it will take a long time for all of them to fail, and that a large portion of the crater can be explored.

Your job is to design the command and communication systems for the rovers and the control center. NASA engineers have already taken care of the hardware design and many of the low-level robotics problems. Each rover is equipped with a navigational system that can guide the rover to a particular location on the planet's surface, as well as a set of low-level communications primitives that can broadcast radio messages (details on these subsystems are given below). You must integrate these components into a larger system that provides three core functions:

Mission Distribution. NASA will periodically upload new "missions" to the control center for rovers to execute. Each mission consists of a list of geographic coordinates to visit, as well as a number of "experiments" to run. Experiments consist of tasks such as taking a photograph at a certain heading or obtaining a mineral sample. Details of the mission format are given below. You must ensure that missions are distributed from the control center to rovers such that any missions NASA uploads will eventually be executed (assuming there are still any working rovers.)
Mission Execution. Once a rover has been given one or more missions, it can begin executing them. To complete a mission, rovers will have to travel long distances, out of radio communications range of other rovers or the control center. This travel time can take days or hours. Each mission experiment can also take many hours to complete.
Result Collection. Eventually, after completing part or all of a mission, rovers should deliver their data back to the control center. Rovers may pass mission results to other rovers, rather than delivering them directly to the control center.

There are a number of additional problems and issues that you must consider when designing your solution. These include:

Rover failure. A rover can permanently fail for one of several reasons (for example, batteries can malfunction and cease to charge, software can crash, etc.) The control center needs to detect that a rover has failed and reallocate its missions to other rovers. It is also possible that a rover can fail while communicating with the control center or another rover; you must design your system to account for such failures.
Unreliable Communication. Because messages can be lost or garbled when communicating, you must devise a way of making communications reliable.
Faulty Sensors. The sensors used in some experiments will produce incorrect readings. NASA identifies experiments where this is particularly likely to happen; you will need to devise a method for reducing the probability of delivering incorrect data in such cases.
Mission amendments. NASA engineers will sometimes change the specifications of ("amend") missions that have already been uploaded to the control center. Some of these amendments will be to missions that have already been distributed to and partially or completely executed by rovers. Amendments do not usually invalidate an entire mission but instead add additional experiments or alter a few existing experiments. Amendments do not apply to missions that have passed their commit point. The commit point of a mission is when the Control Center confirms that it has accepted all of that mission's results.
The mission and amendment format is given below.
Competing Missions. The European Space Agency (ESA) has launched a follow-on to its recently failed Beagle 2 rover, called "Poodle". NASA and the ESA have hired the same sub-contractors so that the two rovers share many common hardware components. As a result, with Poodle rovers also (possibly) on the planet, NASA engineers are concerned that messages from Poodles not be mixed with data from their rovers. Poodles are not, however, hostile; the concern is simply that NASA's data be distinguishable from the ESA's.

2. Hardware and Software Components

The hardware components of the system include the rovers and control center. The software components of the system consist of the navigational subsystem and the communication subsystem.

2.1 The Rovers

There are a number of architectural aspects of rover design that you may find useful in designing your system:

Rovers are powered by solar cells and backup batteries. The batteries have enough energy storage capacity that the rovers may continue to operate through the night.
Rovers have a 1 gigabyte flash memory.
Rovers have a low power radio that can transmit about 100 meters.
Rovers have a mean time to failure (MTTF) of 100 (Martian solar) days (regardless of whether the rover is idle or active.) Failures are fail-fast, and rovers cannot be repaired.

2.2 The Control Center

The control center's architectural features include:

Solar cells and battery backup as with the rovers.

20 gigabytes of (very expensive!) fault tolerant Flash memory.
A radio interface capable of communication with nearby rovers.
A high-gain antenna tuned to communicate with Earth.

NASA's engineers have designed the control center such that they expect it to outlast all of the rovers, so you do not need to worry about control center failures.

Most of the time, it takes no more than 12 hours for Mars to rotate such that it is aligned with Earth; however, NASA expects that Martian dust storms and solar flares will sometimes knock out connectivity between the control center and Earth for days at a time. Thus, the control center must be able to make intelligent, autonomous decisions about how to dispatch the rovers on the planet's surface.

2.3 Navigational Subsystem

NASA has equipped the rovers with a sophisticated navigation system that can (usually) guide the rovers from any location in the crater to any other location. You are not required to design any aspect of the navigation system. Your application will make non-blocking requests to the navigation system. When a destination is reached, an upcall is made from the navigation system into the application.

To navigate to a specific location, the application calls the navigateTo procedure, as follows.

procedure navigateTo(long, lat)

Where long and lat are coordinates of a location on the planet's surface.

When the rover has reached a specific destination, the navigation subsystem calls the destinationReached procedure in the application, as follows:

procedure destinationReached(long, lat)

Note that it can take several days to travel across the crater. The navigational system also provides a simple API that can be used to estimate the travel time from one location to another, as follows:



procedure travelTimeInHours(src_long,
src_lat, dest_long, dest_lat) returns
integer

Each rover has a procedure currentLocation() that returns its current coordinates, as well as a global variable CC_LOC that contains coordinates of the control center. You may assume that two rovers that navigate to the same location (or a rover that navigates to CC_LOC) can communicate with each other (or with the control center.)

2.4 Communication Subsystem

Rovers and the control center include basic communications software that provides a simple API, as follows:

procedure sendPacket(data, length)

This command sends a message, msg (of length length) as a broadcast message over the radio. This message may be received by one or more rovers or the control center if they are within communication range (about 100 meters). The sendMsg command is a blocking call that returns to the caller when the message has been sent. The exact probability of any receiver successfully receiving this message depends on the distance between the sender and receiver, as well as surrounding geographic features; you may assume that the probability is 90% or better per in-range receiver per attempt. Messages may be up to 1500 bytes in length. You will need to design a protocol to segment experimental results into packets, and to properly deliver results despite loss. You may use a TCP-like reliable protocol if you wish; be sure to state why such a protocol is or is not appropriate for your design.

You may assume the time to communicate (even if sending large amounts of data) is negligible compared to the hours and days required to travel and run experiments.

When a message is received by a rover or the control center, an upcall to a function deliver_message is made from the communication subsystem into the application. (See page 4-74 of the class notes for an example of a similar interface between the end-to-end layer and the application in the 6.033 network stack.) The specification for this function is:

procedure deliver_message(packet, length);

You will need to design several features on top of this basic interface. In particular, you should consider:

How you will name the control center and rovers in messages.
How a rover or the control center will determine what other devices are nearby.
How you will ensure that missions and mission results are successfully exchanged between pairs of devices despite high communication loss rates.

2.4.1 Message Authentication

Because of the presence of the Poodle rovers, NASA wants to ensure that messages from Poodles are not accidentally interpreted as a message from one of its rovers or the control center. You will need to design a message authentication mechanism that can be used on all messages transmitted by your system. You do not need to seal messages, but must assure (with very high probability) that any message received by a rover or the control center was sent by another (NASA) rover or the control center.

3. Mission Format

Missions consist of a list of experiments to be run in a specified order, each at a particular geographic location, as well as information about the expected execution time of each experiment. The total mission execution time is the sum of the execution time of all the experiments, plus the travel time between each of the experiment sites. Each mission has a unique identifier as well as a "type" (see Section 3.1: Mission Types). For some experiments, there is a non-zero probability that a sensor fault will occur (see Section 3.2: Faulty Sensors). The mission format is shown in Table 1. For purposes of illustration, missions and experiments are numbered. You may use a different identification scheme if you desire.

Field

Example Value

Mission ID

Mission Type

Random

Location

Lat: 37.33, Lon: -121.03

Prob(sensor fault)

Experiments

< ID 1: panoramic photo from 220° to 270° (3 hours) at 37.33, -121.03,

ID 2: drill hole at depth of 8 centimeters (5 hours) at 37.40, -121.70,

ID 3: acquire mineral spectrograph at depth 8 centimeters (2 hours) at 37.40, -121.70,

...

Table 1: Mission format and example values

Recall that missions may be amended. Amendments consists of a list of experiments to be added and a list to be deleted from a particular mission. Each new experiment includes the ID of the experiment they should follow (see Table 2).

Field

Example Value

Mission ID

Deleted Experiments

{2}

Added

< ID 2: drill hole at depth of 12 centimeters (6 hours) at 37.40, -121.70; follows experiment ID 1

...

Table 2: Amendment format and example values

3.1 Mission Types

The mission type field specify information about the geographic distribution of experiments in missions. There are three types:

Scattered. Experiments are not expected to be located near each other.
Clustered: Experiments are tightly grouped in a small geographic area.
Scattered-Clusters: Experiments consist of several clusters that are geographically distributed.

You may wish to use the different mission types as a way to decide when to save partially completed missions, either by offloading results to another rover or bringing them back to the control center. (See Sections 5 and 6 below for a discussion of mission execution and result collection.)

Figure 1 illustrates these three types of missions.

Figure 1: Diagram of circular crater with control center at the middle. Examples of the three types of missions are illustrated, with black lines showing order of experiments.

3.2 Faulty Sensors

Many of the experiments involve sensors that measure, scan, survey, or otherwise acquire information about the environment. For each sensor, NASA provides a range and accuracy specification, and a probability of sensor failure. Thus the specification for a thermometer might be:
range: -55C to +225C
accuracy: +/- 1.5C
failure probability: 0.07 per reading

The faults that cause sensor failures are transient, and when they occur the sensor fails by producing a random output value, selected uniformly from that sensor's specified range. When a sensor failure occurs, there is no direct warning that the sensor result is bad. The rover continues to function normally, and if it repeats the experiment the probability of failure on next use of the same sensor is completely independent of what happened on the previous use.

Even complex sensors such as cameras conform to this model. A camera, for example, might, in addition to the data of the photograph, return a measure of the variance of the picture density, with a range of zero to 1.0, an accuracy of +/- 0.1 and a probability of failure of .15 per photograph taken. Just as with the thermometer, two photographs of the same scene taken at about the same time would be expected to have the same density variance measure, within the accuracy specification, and if the camera fails, the density variance measure will have a random value within the specified range.

NASA has asked you to design, as a part of your system, a technique to detect bad sensor readings and thereby assure that no more than 5% of the experimental results returned to Earth are victims of sensor failure. It may be necessary for a rover to run some experiments more than once to accomplish this goal.

3.3 Result Sizes and Format

Missions are constructed such that the entire set of experimental results from a mission requires about 600 megabytes of storage. Thus, each rover can store results from one entire mission, plus the results from a few experiments from other missions. You may assume that each experiment produces between 1 and 100 megabytes of data.

You will need to describe the mission result format data structure in your report. Remember that amendments can require you to discard some already collected results, so you will need to show how amendments are applied to your results data structure.

4. Mission Distributor

The mission distributor runs on the control center and is responsible for handing missions out to rovers. You must design this system, including a description of:

Your approach for allocating missions to rovers. Deploying additional rovers for a given mission may improve your ability to tolerate faults or increase reliability.
How you will deliver mission amendments. Bear in mind that amendments can arrive at any time, after a mission has been dispatched to a rover or even after a mission has been committed. Consequently, some experiments deleted by an amendment are run anyway; you must ensure that no experiments added by amendments before the commit point fail to be run. Because experiments are required to be run in a particular order, when an amendment is received, all experiments following any added or deleted experiment must be rerun. You will need to design data structures for both the rovers and control center that track which experiments still need to be run, given a set of amendments and experimental results from fully or partially executed missions. Missions may be amended multiple times, and the probability that an amendment will arrive during any period of time is independent of when an amendment was last received (so there is no benefit to waiting to execute a mission until after an amendment has been received.)
The protocol used to communicate missions and amendments between the control center and rovers or between pairs of rovers. You may assume a mission can be described in a single radio message. Make sure you describe how your protocol deals with the possibility of transmission failure, including what it does about the possibility of duplicates.

NASA will continue uploading missions as long as there are still unfailed rovers. At any given time, however, there will be about 25 missions either waiting to be distributed or being executed. As mission results are sent back to Earth, NASA engineers analyze the experimental results and transmit new missions.

5. Mission Execution

You must also design the software that executes the mission plans on the rovers. Mission execution requires that the rovers travel to each of the experimental execution sites, and carry out the mission experiments for each site. Experiments must be executed in the order specified in the mission, as they represent serial actions on the planet's surface. As with travel time, the experiments in a mission can take days to execute. You may wish to consider having several rovers cooperate to complete a mission.

You may assume that rovers are equipped with a high level API for executing each of the experiments in the mission. These are blocking function calls, such as:

procedure takePanorama(startAngle, endAngle) returns PanoramicPhoto

You will need to make sure to address the following issues:

How does the control center detect that a rover has failed to execute a mission? How does it respond to failures?
How are mission amendments for currently executing missions handled?
How do you use mission type to help with mission execution, if the additional information about types is used at all?
What data structure will you use for collecting experimental results?

6. Result Collection

You will need to design the protocol for delivering experimental results from mission execution between rovers and the control center, or (if needed) for exchanging results between a pair of rovers. This protocol must be robust to failures in one (or both) of the rovers. You should make it clear how your system assures that results from completed experiments eventually reach the control center (even if the rover carrying experimental results fails while running experiments, while in transit, or while transferring results to the control center). Make sure you describe how your protocol deals with the possibility of transmission errors, including what it does about the possibility of duplicate messages generated by any retransmissions. Also, describe how you will handle sensor faults.

Each experiment in a mission produces some results, which must be delivered back to the control center (see Section 6 below.) Experimental results for a single mission are not required to be delivered together. If a mission is partially executed when a rover fails, but the rover has managed to offload results from some experiments before failing, another rover may restart experiments from where the failed rover left off (of course, that other rover must have to have some way of knowing where it should restart.)

The control center will deliver results to Earth when the Earth is visible from the control center. You do not need to design this portion of the system; you may assume that the control center has enough bandwidth in a typical day to deliver all of the results stored in it at a given time. Once results have been transmitted, they are deleted from the memory of the control center.

Note that mission amendments will not arrive after the results for a mission have been offloaded to Earth. You should make sure that your solution for applying amendments works for committed missions, however, as amendments may arrive while mission results are waiting to be sent to Earth.

7. Your Design

In this section, we discuss the critical aspects of your design that you should be sure to cover in your writeup.

7.1 Design Goals

The primary goal of your design should be to ensure that missions uploaded to the control center are successfully executed and the results will be delivered back to the control center, despite failed rovers, including any amendments made to missions.
You should also try to mitigate faults and avoid repeating work as much as possible, as NASA wants to be able to explore as much of the crater as it can before all the rovers fail.

7.2 Issues to Address

In addition to the issues mentioned in Sections 4, 5, and 6, make sure your report addresses the following additional topics:

Does your report clearly describe the following protocols (using timing diagrams, pseudocode, and English descriptions)? There may be others, but all design projects should include:

A protocol for distributing missions to rovers.
A protocol for distributing amendments to rovers.
A protocol for transferring experimental results from the rovers to the control center.

How do you name rovers for communication purposes (see Section 2.4)?
How do your protocols for exchanging missions and experimental results tolerate lossy communication? When can a rover delete results for a particular experiment from its memory? Can results be delivered multiple times? Can missions be executed multiple times?
How well does your approach continue to function as rovers fail and their total number dwindles?
When an amendment is applied to a partially completed mission, how do you ensure that any parts of the mission following the amended portions are redone?
How does your approach assure that received messages actually originated from another over or the control center (and not a Poodle rover)?

Your report should describe an integrated system design that clearly describes how you will address the three key problems (mission distribution, mission execution, and result collection). Be sure to describe your solution in enough detail that we can clearly see that it will work. You may want to include:

a diagram of the overall architecture of the system and its decomposition into modules;
pseudocode for appropriate algorithms (such as the algorithm for allocating missions to rovers [see Section 4]);
protocol diagrams, as described above;
data structures (e.g, for managing missions and amendments).

When possible, you should rely on protocols or algorithms from the text or papers rather than reproducing them in your report (but be sure to tell us why they are appropriate). Finally, please refrain from extensive mathematical modeling. It is fine to include back-of-the-envelope calculations to justify a design decision, but (at most!) you should need only simple algebra and basic probability theory.

8. Recommended Reading

In addition to concepts covered in 6.033 before Spring Break, this project involves concepts from chapters 6, 7, and 8, some of which you will not have covered in class before you begin working on this project. Rather than reading all of the chapters before starting your project, you may wish to use a different strategy: look up unfamiliar terms in the indexes of those chapters and read just the sections where they are defined or described. In addition, review the glossary definitions of those terms. Also look at the schedule of reading assignments and lecture topics to see when relevant topics will be covered in class.

If particular, you may want to look up the following terms in the text:

Chapter 6:	sign, authentication
Chapter 7:	fault tolerance design process, forward error correction, fault, failure, error, fail-vote, fail-fast, masking, detectable error, mean time to failure, transient failure
Chapter 8:	commit point, undo, redo, checkpoint, recoverability, idempotent, atomic

For additional background, see this Wired.com article on how Mars Opportunity and Spirit Rovers get data back to Earth:
http://www.wired.com/news/technology/0,1282,62409,00.html

9. Your written report

We now provide some suggestions on writing style and outline the standard structure of a design report.

Suggestions on writing style

Who is the audience for this paper? Write for an audience consisting of colleagues who took 6.033 five years ago. These are readers who understand the underlying system and network concepts and have a fair amount of experience applying those principles, but they have not thought carefully about the particular problem you are dealing with. Assume that your paper will also be used to convince readers that you have a feasible design. Finally, give enough detail that your project can be turned over to an implementation team with some confidence that you will not be surprised by the result. One qualitative way that 6.033 reports are evaluated is by asking the question, "Do we want this person on our team? Can this designer provide us an accurate description of his/her design?"

Following are some tips on the organization of a design report. You can find other helpful suggestions on writing this kind of report in the 6.033 online lecture "How to Write Design Reports". You may also want to look at the Mayfield Handbook's explanation of IEEE documentation style. A good book on writing style is: "The Elements of Style," by William Strunk Jr. and E. B. White, Third Ed., MacMillan Publishing Co., New York, NY, 1979. (Also available from the MIT libraries.)

Report Outline

Following is a suggested outline for your report. The full report (including executive summary) should be no longer than 5000 words (approximately 12 pages), single-spaced.

Executive Summary

Think of an executive summary as a long abstract. The executive summary for DP2 should be no more than 1200 words (approximately 3 pages) in length. The executive summary is a summary of the entire paper. It is not an outline of the organization of the paper! It states the essential points of your solution, the rationale for your approach, and a brief justification for your design. You may include a figure, if appropriate, in your Executive Summary.

The Executive Summary is a stand alone document that is placed BEFORE the title page. Because it is placed before the title page, your name and title of your project should be placed on the Executive Summary. The Executive Summary should be organized into enumerated chunks and given descriptive headings. Do not staple the Executive Summary to the report. Write the executive summary after you have written your report.

Title Page
Give your design report a title that reflects the subject and scope of your project. Include the names of your team members, recitation time and section, and the date on the title page.

1.0 Design Overview
State the design purpose, list specific design considerations and then briefly state your approach to each of those considerations. You may assume that the reader has read the DP2 assignment; you do not need to restate the problem in detail. In your Design Overview or early in your Design Description, provide a figure of your design architecture. Your Design Overview will probably be no longer than 1 or 2 paragraphs.

2.0 Design Description
Explain and elaborate your solution. A clear, detailed design description is paramount to convincing readers that your design choices are well-justified. Show how your solution satisfies the constraints and solves design problems (or how it fails to do so). Explain how your design choices are reasonable in the context of the problem statement.

Design descriptions also typically explain alternative approaches that you considered and rejected, and why you rejected those approaches. Finally, design description often include an analysis of the estimated (or measured, if it applies) performance characteristics of your solution.

You may need to use figures or pseudocode in this section. If you use pseudocode to illustrate your solution, be sure to describe what the pseudocode does in English as well. The Design Description should be chunked into sections with subheadings and organized by design function or system module.

3.0 Conclusion
Evaluate your design in the Conclusion. Summarize design problems you solved,identify problems in your design, and justify why your design does not address these problems.

References
Document your sources, giving a list of all references (including personal communications). The style of your citations (references) and bibliography should be in IEEE format.

How do we evaluate your report?

Your report will be evaluated for content and writing fluency.
Some content considerations:

Is the design described unambiguously?
Does the design achieve design goals?
Are your design decisions well justified?
Do you fulfill all the design requirements?

Some writing considerations:

Is the report well-organized within and across sections?
Is it professionally presented?
Are text and figures integrated?
Is the writing clear?

Collaboration

On this project, you will work in teams of 3 students. Send your list of team members to your recitation instructor via email by 11:59 PM on April 5.

Tasks and Due Dates

Design proposal (800 words or less, approx. 2 pages). Due: April 13, 2004.
This should be a concise summary of your design choices and of the overall system design. Also, if any of your design decisions are "unusual" (particularly creative, experimental, or risky, or specifically warned against in the assignment), it would be wise to describe them here.
Executive summary (1200 words or less, approx. 3 pages). Due: May 6, 2004.
The executive summary should be submitted with your full report.
Detailed technical report (5000 words or less, approx. 12 pages). Due: May 6, 2004.
The first three pages of your report include your executive summary. That is, the total number of pages should be 3+9 = 12, not 3+12 = 15.
a presentation on May 11th (details will be given as the date approaches).

Please use 1-inch margins and use single-sided printing. Include your name on each page of your report (the footer is a good place to add your name). Remember to use diagrams where appropriate.