6.173 Lab #2: Message passing and broadcast

Deadlines

Useful links

Goal

This lab familiarizes you with sending messages on Beehive. First, you will implement a function that broadcasts messages to other cores in software. This implementation sends n messages over the ring, where n is the number of cores in Beehive. Second, you will modify the Verilog of Beehive's messenger module to support broadcast directly in hardware. This implementation sends a single message to deliver a broadcast to n cores. In lab 3, you will use message passing and broadcast to implement a program that solves the traveling salesman problem in parallel.


Task 1: Update your git repo

You must pull the lab 2 code into your git repository, but you probably want to commit any changes you made for lab 1 first. Go to the directory that contains the 6.173_repo and type:

you@eelab01$ git status
the current state of your git repo
you@eelab01$ git add files that you want to commit
you@eelab01$ git commit -a

Now your previous changes are committed in your local git repository. You can see the history of your changes by typing:

you@eelab01$ gitk

Remind yourself of the changes you have made by exploring the history using gitk.

Now that lab 1 is nicely packaged up, run the following command in your local repository to get the updated lab 2 code:

you@eelab01$ git pull
... changes to your git repo

Now your repository will have a lab2 directory and some changes in sw/lib/ and a modified GNUMakefile. If you want to inspect the changes, run gitk again. (You may notice that your repo already had an implementation of software broadcast and that we removed it; just don't look at this removed version--it is better if you figure out for yourself how to do it.)

It's unlikely, but git may report a conflict, which means you and we have made conflicting changes and you will have to resolve the conflict first before proceeding. See the git manual to learn how to resolve conflicts.


Task 2: Software broadcast

Look in the file lib/msg.c, and you will see one unimplemented function: bcast_send. Your job is to complete the implementation using Beehive's primitives for sending and receiving messages from one core to another.

The function bcast_send broadcasts len words (stored in buf) to all cores except the initiator. Thus, if your board is running Beehive with 14 cores and you invoke bcast_send on core 2, then your implementation should deliver the message to cores 3 through 13. (Recall that core 1, the ethernet core, and the copier core don't run application code.)
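
To make the delivery rule concrete, here is a minimal host-side sketch of broadcast modeled as repeated point-to-point sends. It does not use Beehive's primitives and is not the intended contents of lib/msg.c: the names sw_bcast_model, send_fn, send_log, and log_send are hypothetical, and the range of application cores (2 through 13) is an assumption taken from the example above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical point-to-point send hook. On Beehive this role would be
   played by the real primitive declared in shared/intercore.h. */
typedef void (*send_fn)(unsigned dest, const unsigned *buf, unsigned len,
                        void *ctx);

/* Model of software broadcast: one point-to-point send per receiver.
   first_core..last_core is the ASSUMED range of application cores.
   Returns the number of sends issued. */
unsigned sw_bcast_model(unsigned self, unsigned first_core, unsigned last_core,
                        const unsigned *buf, unsigned len,
                        send_fn send, void *ctx)
{
    unsigned sent = 0;

    assert(first_core <= last_core);
    assert(send != NULL);

    for (unsigned c = first_core; c <= last_core; c++) {
        if (c == self)
            continue;            /* the initiator does not receive its own broadcast */
        send(c, buf, len, ctx);
        sent++;
    }
    return sent;
}

/* Recording hook for host-side experiments: remembers each destination. */
struct send_log {
    unsigned count;
    unsigned dests[32];
};

void log_send(unsigned dest, const unsigned *buf, unsigned len, void *ctx)
{
    struct send_log *rec = ctx;

    (void)buf;                   /* payload is ignored by this recorder */
    (void)len;
    assert(rec->count < 32);
    rec->dests[rec->count++] = dest;
}
```

Called with self = 2 and the core range 2 through 13, the model issues 11 sends, to cores 3 through 13, which matches the example above.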

The function test1 in lab2/beehive_main.c tests your implementation of broadcast. If you are unsure about the specification of bcast_send and bcast_recv, study how test1 uses them. Note that the test assumes that the len argument is the first word of the message.

Beehive provides primitives to send a message over the ring to a particular core and for another core to receive it. The interface is defined in shared/intercore.h along with an assembly stub that invokes the hardware messages. (Search intercore.h for 'send'.) The hardware is described in the Beehive document (see useful links above). You can see some example usages of sending and receiving messages in lab/beehive_main.c in the functions incdone, waitdone, and slave.

You can invoke make to compile and link bcast.img. Once you have successfully built this file, load it on your Beehive board, as in lab 1, and run it. Use the risc_13.bit file (i.e., the 13-core implementation of Beehive) from lab 1. Lab 1 describes how to build it and copy it to beectl.csail.mit.edu.

If your code is correct, you will see output similar to the following:



1 2>x1000
13 cores, clock speed is 100 MHz
1: mqinit
core #1 started
core #2 started
core #12 started
core #9 started
core #5 started
core #10 started
core #11 started
core #8 started
core #3 started
core #6 started
core #4 started
core #7 started
core #13 started
test1: start hw=0


**** METERING REPORT ****

Count of ring slots by type:
  554 ring slots of type "Token"
  220 ring slots of type "Address"
  40 ring slots of type "WriteData"
  9801 ring slots of type "Null"
  22 ring slots of type "Message"

Count of Address slots by core:
  core  1: I=0, Dwrite=0, Dread=0
  core  2: I=40, Dwrite=5, Dread=76
  core  3: I=7, Dwrite=0, Dread=2
  core  4: I=7, Dwrite=0, Dread=2
  core  5: I=7, Dwrite=0, Dread=2
  core  6: I=7, Dwrite=0, Dread=2
  core  7: I=7, Dwrite=0, Dread=2
  core  8: I=7, Dwrite=0, Dread=2
  core  9: I=7, Dwrite=0, Dread=2
  core 10: I=7, Dwrite=0, Dread=2
  core 11: I=7, Dwrite=0, Dread=2
  core 12: I=7, Dwrite=0, Dread=2
  core 13: I=7, Dwrite=0, Dread=2
  core 14: I=0, Dwrite=0, Dread=0
  core 15: I=0, Dwrite=0, Dread=0
test1: passed
test2: start hw=0
test2: passed
test1: start hw=1
wait_done: master shouldn't receive its bcast

2 is dead

Note that the program calls test1 and test2 twice, once with the software implementation and once with the hardware implementation. Since you haven't completed the hardware implementation yet (it is the topic of task 3), the second invocation of test1 fails and terminates the execution.

Recall that you can terminate the kermit connection by typing Ctrl-\ followed by Ctrl-C.


Task 3: Hardware broadcast

In task 3 you will modify the hardware to support broadcast. To see how applications can invoke the hardware broadcast, look again in the file lib/msg.c, and you will see the function hw_bcast_send. It invokes hardware broadcast by calling message_send with the destination equal to the sender, the current core. Your job is to ensure that if the destination is equal to the source, then the hardware should deliver the messages to all computing cores (except the sending core). That is, hw_bcast_send is a drop-in replacement for bcast_send.

To modify Beehive to support broadcast, you will have to study the file NewMessenger.v in the directory hwv5/RISCsrc of your beehive git repository. The file isn't large and you should try to understand it before attempting to design your solution. While studying the file, it will be helpful to have the Beehive document and the slides from lecture 2 open (see "Useful links" at the top of this page), so that you can match the description of the ring and the messenger module in the documentation to the Verilog in NewMessenger.v.

Once you have a good understanding of NewMessenger.v, consider ways you could modify the Verilog to support broadcast. There are several ways to implement broadcast cleanly; the staff has two solutions that each require modifying only a few lines in NewMessenger.v (~5 lines). You probably want to put each of your changes between ifdefs so that you can keep careful track of what you have changed and can enable or disable the changes easily. Verilog supports ifdefs as follows:

`ifdef SOL2
... modified line(s) of verilog ...
`else
... the unmodified line(s) of verilog ...
`endif

In designing your support for hardware broadcast, consider the following hint: we chose the destination address to be the source address, and remember that with point-to-point messages the destination core copies the message to its local queue and removes the packets from the ring. Be careful that you don't deliver a broadcast to the ethernet core, the copier core, or the core that initiated the broadcast. (If you don't know what the ethernet and copier cores are, you should read the Beehive document again.)
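
The delivery rules in the hint can be explored with a toy software model of the ring before touching any Verilog. This is a sketch under stated assumptions, not a description of NewMessenger.v: it assumes 15 ring nodes visited in core-number order, with compute cores 2 through 13, the ethernet core at 14, and the copier core at 15. The names RING_NODES, is_compute_core, and ring_deliver are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_NODES 15   /* ASSUMED: 13 cores plus ethernet (14) and copier (15) */

/* ASSUMED numbering for this sketch: application code runs on cores 2..13. */
static bool is_compute_core(unsigned node)
{
    return node >= 2 && node <= 13;
}

/* Sends one message from src around the modeled ring and records which
   nodes copy it into delivered[1..RING_NODES]. A message whose destination
   equals its source is treated as a broadcast. Returns the number of ring
   hops taken before the message is removed. */
unsigned ring_deliver(unsigned src, unsigned dest,
                      bool delivered[RING_NODES + 1])
{
    assert(src >= 1 && src <= RING_NODES);
    assert(dest >= 1 && dest <= RING_NODES);

    for (unsigned i = 0; i <= RING_NODES; i++)
        delivered[i] = false;

    bool broadcast = (dest == src);
    unsigned hops = 0;
    unsigned node = src;

    for (;;) {
        node = node % RING_NODES + 1;    /* next node, wrapping 15 -> 1 */
        hops++;
        if (node == src)
            break;                       /* back at the sender: remove the message */
        if (broadcast) {
            if (is_compute_core(node))
                delivered[node] = true;  /* copy, but keep forwarding */
        } else if (node == dest) {
            delivered[node] = true;      /* point-to-point: copy and remove */
            break;
        }
    }
    return hops;
}
```

In this model a broadcast from core 2 is copied by cores 3 through 13 and travels all 15 hops before the sender removes it, whereas a point-to-point message from core 2 to core 5 is removed after 3 hops.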

Build a version of risc_3.bit with your hardware broadcast implementation. Run bcast.img on your bit file; if everything works correctly, bcast.img will pass test1 and test2 twice: once for the software version and once for the hardware version.

After making sure the 3-core version works, build the 13-core version, risc_13.bit, and rerun the test program on it. Ideally it passes test1 and test2 twice. You might see output as follows:


1 2>x1000
13 cores, clock speed is 100 MHz
1: mqinit
core #1 started
core #2 started
core #3 started
core #7 started
core #11 started
core #4 started
core #5 started
core #6 started
core #8 started
core #9 started
core #10 started
core #12 started
core #13 started
test1: start hw=0


**** METERING REPORT ****

Count of ring slots by type:
 8991 ring slots of type "Startup"
 571 ring slots of type "Token"
 333 ring slots of type "Address"
 55 ring slots of type "Null"
 33 ring slots of type "Message"

Count of Address slots by core:
 core  1: I=0, Dwrite=0, Dread=0
 core  2: I=8, Dwrite=0, Dread=2
 core  3: I=8, Dwrite=0, Dread=2
 core  4: I=8, Dwrite=0, Dread=2
 core  5: I=8, Dwrite=0, Dread=2
 core  6: I=8, Dwrite=0, Dread=2
 core  7: I=8, Dwrite=0, Dread=2
 core  8: I=8, Dwrite=0, Dread=2
 core  9: I=8, Dwrite=0, Dread=2
 core 10: I=8, Dwrite=0, Dread=2
 core 11: I=8, Dwrite=0, Dread=2
 core 12: I=8, Dwrite=0, Dread=2
 core 13: I=155, Dwrite=0, Dread=68
 core 14: I=0, Dwrite=0, Dread=0
 core 15: I=0, Dwrite=0, Dread=0
test1: passed
test2: start hw=0
test2: passed
test1: start hw=1


**** METERING REPORT ****

Count of ring slots by type:
 7053 ring slots of type "Startup"
 454 ring slots of type "Token"
 222 ring slots of type "Address"
 25 ring slots of type "Null"
 3 ring slots of type "Broadcast"

Count of Address slots by core:
 core  1: I=0, Dwrite=0, Dread=0
 core  2: I=1, Dwrite=0, Dread=0
 core  3: I=1, Dwrite=0, Dread=0
 core  4: I=1, Dwrite=0, Dread=0
 core  5: I=1, Dwrite=0, Dread=0
 core  6: I=1, Dwrite=0, Dread=0
 core  7: I=1, Dwrite=0, Dread=0
 core  8: I=1, Dwrite=0, Dread=0
 core  9: I=1, Dwrite=0, Dread=0
 core 10: I=1, Dwrite=0, Dread=0
 core 11: I=1, Dwrite=0, Dread=0
 core 12: I=1, Dwrite=0, Dread=0
 core 13: I=145, Dwrite=0, Dread=66
 core 14: I=0, Dwrite=0, Dread=0
 core 15: I=0, Dwrite=0, Dread=0
test1: passed
test2: start hw=1
test2: passed

Check out the performance counters: did the first invocation of test1 send more messages than the second invocation? How many more? Does that make sense? Write up your answer in the file lab2_answers.txt.

Submit lab2_answers.txt along with your versions of NewMessenger.v and msg.c.

End of Lab #2!