6.173 Lab #1: The Traveling Salesman Problem

Deadlines

Useful links

Goal

The hardware for Beehive is synthesized from a Verilog description and run on a Xilinx XUPV5 board connected to the lab server. The XUPV5 board contains a large Virtex 5 FPGA, 2GB of local DDR2 DRAM, a gigabit Ethernet (used to initialize the 2GB main memory), and a serial connection (used for console input/output). Beehive implements a pipelined ring interconnect to which are attached

a number of RISC cores each with simple I & D caches and hardware for message passing and locks.
a modified RISC core connected to the gigabit Ethernet interface which performs packet transfers requested via intercore messages.
a modified RISC core that peforms DMA operations requested via intercore messages.
a custom module that controls the ring and performs cache-line reads/writes of main memory in response to memory requests from the RISC cores.

The software for Beehive is a simple multicore runtime written (mostly) in C. The tool chain includes a version of GCC, an assembler, a linker, and run-time library wrapping the hardware features of Beehive.

Development Environment

There are special hours around holidays -- please see the signs posted in the lab.

The tools will probably work from your own machine if you are running Ubuntu or Debian and have AFS access. Sorry, no guarantees though!

Task 1: Sign-up for lab server account

Please complete this task by Friday, 9/10.

In order that we may set up an account for you on the lab server, please email the following information to 6.173-labs@mit.edu:

Athena username
initial password (you can change it on first login)
(optional) Athena username of preferred partner

Since we have more students than XUPV5 boards, we'll be asking you to share access to a board with another student in the class. If you care who you are sharing with, please send the Athena username of your preferred partner. We'll be happy to assign a partner if you have no preference.

Task 2: Set up and try tool chain

On a debathena workstation, start a bash shell and initialize the environment for the 6.173 tool chain (you'll do this each time you login to work on 6.173):

eelab01% bash
you@eelab01$ add 6.173
you@eelab01$ source /mit/6.173/tools.sh

One time only: check out the files you'll need from our git repository -- this will create a subdirectory with the C, assembler and Verilog files for building the Beehive and main memory images:

you@eelab01$ cd ~/Private    (or wherever you want to work...)
you@eelab01$ git clone /mit/6.173/f2010_student.git 6.173_repo
you@eelab01$ cd 6.173_repo

Now perform the initial build of the Beehive runtime, which, among other things, will construct o/hello.img, the first program we'll run on the Beehive.

you@eelab01$ cd sw
you@eelab01$ make
... lots of print out
you@eelab01$ ls o
... listing of files you just built

The results of the compilations are left in the directory sw/o. The .img files are suitable for loading into main memory of the Beehive; the corresponding .out.map file shows the location of all the routines and data in main memory.

Next, synthesize the Beehive hardware from the Verilog source. This will take a while (15 mins?); in order to make things a little faster, you'll be building the version of Beehive with three RISC cores (there's also a 13-core version).

you@eelab01$ cd ../hwv5
you@eelab01$ make risc_3.bit
... lots of print out

Assuming your account has been set up on the lab server (beectl.csail.mit.edu), you're ready to copy the results of the compilation and synthesis to the server.

you@eelab01$ cd ..      (top-level of your class directory)
you@eelab01$ scp sw/o/hello.img beectl.csail.mit.edu:
...
you@eelab01$ scp hwv5/risc_3.bit beectl.csail.mit.edu:
...

Now ssh into the lab server and determine which XUPV5 board you've been assigned.

you@eelab01$ ssh beectl.csail.mit.edu
...
you@beectl:~$ groups
you boardXXX

where XXX is a number between 1 and 14. At this point it's convenient to define a bash script called run that will set up the board and download environment:

you@beectl:~$ cat >run
N=`groups | sed 's/.*board\([0-9]*\).*/\1/'`
source /opt/Xilinx/11.1/settings32.sh
cp $1 /tftpboot/board$N
sed -e "s%risc.bit%$2%" /home/boards/download.cmd >download.cmd
impact -batch download.cmd && kermit /home/boards/board$N.kermit
Ctrl-D
you@beectl:~$ chmod 755 run

Now you can use this script to set up the main memory image file for downloading, program the XUPV5 board with your .bit file, and then connect to the board's serial line. Remember to coordinate sharing of the board with your partners.

you@beectl:~$ ./run hello.img risc_3.bit
...print out from FPGA download...
...print out from kermit...

If all goes well with the FPGA programming and kermit initialization, at this point you're talking over the serial line to the Beehive shell:

Type <ENTER> and you should see the Beehive controller prompt: 0>
Type 1s<ENTER> (the s isn't echoed) to connect the console to core #1 and you should see the boot loader prompt: 1 2>
To initialize main memory using TFTP from the lab server type z (the z isn't echoed). You should see the spew from the TFTP loader followed by another prompt:
```
123456........   .......!
1 2>
```
Note that sometimes the TFTP will fail (you won't see the "..." or the "!"). Just try the z command several times until the download completes succesfully and you see the "!" at the end of the printout.
Now transfer control to the downloaded program by typing x1000g (the g isn't echoed). You should see the print out from the hello.c program:
```
3 cores, clock speed is 100 MHz
1: mqinit
corenum 1
corenum 2
corenum 3
```
You can terminate the kermit connection by typing Ctrl-\ followed by C.

If things went awry somewhere along the way, ask for help from a member of the course staff.

Whenever you want to run a program on the Beehive, just repeat the above starting with the invocation of the ./run script. There's no way to simply restart your program -- you have to repeat the process of running the script and the subsequent dialog over the serial line.

Task 3: Implement TSP in C for a single core

The Traveling Salesman Problem (TSP) is a "simple" optimization problem: given a list of towns and the length of any roads that connect pairs of towns, find a shortest tour that visits each town only once. A tour ends in a different town than the one it started in; it is not a cycle. TSP is NP-hard, meaning that the worst case running time of the best known solution algorithm grows exponentially with the number of towns.

The skeleton of a C program tsp.c is in lab1 in your git repo. You should complete the function tsp() so that it performs an exhaustive search to find a minimum cost tour. That is, you should modified it so that it explores all tours. Hint: call tsp() recursively. The argument len should store the length of the path so far (i.e., the number of towns visited so far). The argument cost should store the cost of the path so far (i.e., the sum of the weights on the edges). The argument path[] should store the path so far. The argument visited[] stores how many times we saw a tour of this length. The argument best_path[] stores the best tour find so far. The argument min is a pointer to a variable that stores the cost of the best path.

Complete the tsp function so that the program completes an exhaustive search of all possible tours. Compile and run the program on your workstation (not on the beehive). You can compile the program by typing:

you@eelab01$ gcc -o tsp tsp.c

If you've completed the code correctly, then after minute or so you should see something like the following, and the tour you find should have cost 245:

you@eelab01$ tsp
lowest path cost is 245
level  visited
0      1
1      11
2      110
3      990
4      7920
5      55440
6      332640
7      1663200
8      6652800
9      19958400
10     39916800
11     39916800
a best path found: 0 7 3 2 5 1 8 11 10 4 9 6

An exhaustive search is impractical when the number of towns is large -- the number of tours considered is (N-1)! where N is the number of towns. With 12 towns the number of visits to tours of length 11 is 11!.

We can prune the search tree considerably by abandoning the search (i.e., returning from tsp) when the cost of a (partial) tour exceeds the cost of the minimum cost tour found so far. This type of search optimization is called branch and bound. Change your program to incorporate this optimization. If you code up your exhaustive version nicely, you'll only need to add a single line!

Running your optimized program, you should see something like:

you@eelab01$ tsp
lowest path cost is 245
level  visited
0      1
1      11
2      110
3      975
4      7214
5      38865
6      132739
7      257345
8      251427
9      110983
10     17436
11     31
a best path found: 0 7 3 2 5 1 8 11 10 4 9 6

Incorporate the printouts from your original and optimized programs as comments in the front your code and submit your C file on-line (use the "Submit file" link on the course webpage). No need to submit the header file.

Task 4: Run TSP on the Beehive

The directory lab1 also contains a file beehive_main.c, which is a skeleton of a C program targeted for Beehive. Executive summary: to create a program that will run on Beehive add your one-time initialization code to the body of mc_init() and your multi-threaded code to the body of mc_main().

There are several differences worth noting about the Beehive C environment:

When running shared code on multiple cores, the cores can differentiate themselves one from another by calling the function corenum() which returns a unique integer in the range 1 to NCORES.
The Beehive runtime first calls mc_init() using core #1, which allows any shared data structures to be initialized before multicore execution begins. After mc_init returns, core #1 directs cores #2 and above to each call mc_main() concurrently. Then core #1 runs a RPC server that receives xprintf requests from other cores and serializes them onto the RS232 serial line.
Use a call to xprintf(format,args...) to print out a message from any core on the console (serial port). xprintf acquires a global lock to ensure that the messages from different cores aren't interleaved.
Main memory is shared, but there is no hardware mechanism that guarantees coherency between cores. This somewhat complicates the writing of multicore programs :) More about this in lecture...

Create a version of your optimized code from Task 3 in the file beehive_tsp.c, and modify beehive_main.c to call your code. To compile, run in the sw directory:

you@eelab01$ make

If your program compiles correctly, this will create a file o/tsp.img. Run it on the Beehive following the steps from Task 2, but using o/tsp.img instead of o/hello.img.

you@eelab01$ scp o/tsp.img beectl.csail.mit.edu:
...
you@beectl:~$ ./run tsp.img risc_3.bit

You'll need to ensure the program is run on only one of the cores, e.g., core 2. When your program executes successfully, incorporate the counts from the metering system as a comment at the beginning of your program and submit your code on-line.

Finally, in a separate text file lab1_answers.txt please enter the answers to the following questions and submit the file on-line. Consider the metering report generated when you ran your optimized tsp on the Beehive.

Explain when Dwrite requests are generated by a core. Why is the number of WriteData ring slots always a multiple of 8?
Explain how the number of I requests for each core is related to the number of instructions the core has executed.

6.173 Lab #1: The Traveling Salesman Problem

End of Lab #1!