6.173 Lab #3: Message-passing Parallel TSP

Deadlines

Useful links

Goal

Your goal in this lab will be to create a parallel version of your Lab 1 TSP for the Beehive, using messages for all interactions between cores. You'll think about partitioning, load balance, and inter-core coordination of minimum costs and termination.

We will announce who gets the best performance!


Task 1: Update your git repo

You must pull the lab 3 code into your git repository. As with lab 2, you should first commit your changes and then pull the new lab 3 files:

you@eelab01$ git status
the current state of your git repo
you@eelab01$ git add files that you want to commit
you@eelab01$ git commit -a
you@eelab01$ git pull

You should now see a sw/lab3 directory with a beehive_main.c, beehive_tsp.c, and 12_cities.h (identical to the 12_cities.h from lab 1).


Task 2: Time your uniprocessor TSP

Since this lab's goal is to run TSP faster on N cores than on a uniprocessor, you should start by timing how fast your lab 1 TSP executes. Your code can read cycleCounter on each core to see how many cumulative cycles that core has been executing. Add code to your lab 1 TSP to record the cycle count before the guts of your TSP computation, and after it is done, print the current cycle count minus the recorded value. You can see an example of this in sw/lab3/beehive_tsp.c.

Hint: don't include printf or xprintf statements between the reads of cycleCounter.

Warning: cycleCounter is 32 bits and thus wraps around after 4 billion cycles, so you can't time anything that takes longer than 42 seconds. You could use your wrist watch for times in excess of that.

Now that you know how long your uniprocessor TSP takes (the staff solutions are somewhat more than a billion cycles), you know your goal: ideally TSP on N Beehive cores would take 1/Nth of your uniprocessor time. That is, ideally each core in your N-core parallel TSP will use 1/Nth the number of cycles used by your uniprocessor TSP.


Task 3: Parallel message-passing TSP

Your job is to implement a parallel TSP for the Beehive. All interaction between cores should be via messages; please do not use shared memory.

You will need to partition the work among the cores. One possibility is to partition by path prefix (the first few cities in the path); each core would then explore all paths starting with its assigned prefixes.

The cores should announce new minimum costs to each other as they find them, to implement branch and bound.

Branch and bound may cause different partitions of the work to take different amounts of time, so you may need some form of load balancing. One possibility is a work list of unexplored path prefixes, from which each core takes a new prefix as soon as it has finished with the last one.

Your program should print just one result (cost and path) at the end, which may require coordination among the cores.

Remember, all interactions among cores should proceed via explicit message passing (message_send/recv or your lab 2 broadcast message support).

We will want you to report performance on a full-sized Beehive, which you can create with make risc_13.bit in hwv5. This will give you 13 cores on which to run your software.

Please report run-times for the topology in 12_cities.h, for both starting cities 0 and 1. You should report the longest of the numbers of cycles that any of the cores spend executing your code.

Hint: you probably don't want to modify global variables, since they are shared by all cores. Each core executes with a different stack, however, so it is safe to use stack variables.

Hint: one approach to load balancing is to dedicate one core (the master) to maintaining a work list, and have the other cores send it messages asking for work as needed.

Hint: one way to distribute the current minimum known cost for branch-and-bound is to piggyback it on messages to/from the master.

Hint: a more efficient way to distribute minimum known costs might be to broadcast them using your lab 2 hardware.

Hint: maybe the master could perform part of the TSP computation as well as maintaining the work list. Maybe you can avoid a master altogether and distribute the work list.

Start with a simple solution, get that working correctly, understand its performance, and then improve it incrementally. If you have more ideas than you can implement before the deadline, don't worry: you have another chance in lab 5 (parallel TSP with shared memory).


What to hand in

Please answer the following questions in a few sentences in a lab3_answers.txt file:

Submit lab3_answers.txt along with your beehive_tsp.c.

End of Lab #3!