M.I.T. DEPARTMENT OF EECS
|6.033 - Computer System Engineering||Valgrind Assignment|
Complete the following hands-on assignment. Do the activities described, and submit your solutions using the online submission site by 11:59p.
This assignment relates to the Eraser paper, and exposes you to parallel programming with Pthreads, and understanding and fixing concurrency errors. You might find it helpful to skim the Eraser paper, which is assigned for the next section. You also might want to refresh your memory on what race conditions are and the troubles that they can cause by revisiting sections 5.2.2, 5.2.3, and 5.2.4 of the textbook.
This assignment involves a small amount of programming.
no-knife:~> add 6.033; add gccDownload the program ph.c and store it in a file
ph.c. On a linux system, you might do:
no-knife:~> wget http://web.mit.edu/6.033/www/assignments/ph.cThen compile and run it with one thread:
$ gcc -g -o ph ph.c -pthread $ ./ph 1 completion time for put phase = 2.758351 0: 0 keys missing completion time for get phase = 2.270119
The 1 specifies the number of threads that execute put and get operations on
the the hash table. The program inserts
NKEYS into a hash table in
the put phase and then looks them all up in the get phase.
Let's run with 2 threads. Then each thread does
NKEYS/2 puts in
the put phase and
NKEYS/2 lookups in the get phase. The program is
written in C, the default language for low-level systems programming, and uses
pthreads for creating
several threads that can run in parallel if our machine has several cores
(most processors today have at least 2 cores).
If we achieve perfect parallelism, then the two phases complete in half the time. Here is the output:
~/classes/6033-2013.git/wwwdocs/assignments$ ./ph 2 completion time for put phase = 1.688109 0: 27 keys missing 1: 39 keys missing completion time for get phase = 3.215020
We see that put phase is faster now, so we were able to exploit 2 cores (but it didn't get twice as fast). The get phase, however, is slower with 2 cores, which is disappointing: using more cores, the program ran slower. More on that later, because there is a bigger problem.
You will likely observe that the code is incorrect. The application inserted 27+39 keys in phase 1 that phase 2 couldn't find. In your runs, there may be more or fewer keys missing. There may be even 0 keys missing in some runs. If you run with 1 thread, there will never be any keys missing.
Run the application with 4 threads:
~/classes/6033-2013.git/wwwdocs/assignments$ ./ph 4 completion time for put phase = 1.113811 0: 39 keys missing 2: 31 keys missing 3: 15 keys missing 1: 37 keys missing completion time for get phase = 3.436008
Three points: 1) the put runs faster with 4 cores than 2 cores, but not 4 times as fast as a single core; 2) the get phase ran even slower; 3) more keys are missing.
ph.c program is simple, and you may be able to spot the
mistake immediately, but we will use a tool,
helgrind that helps
us find such errors automatically.
helgrind is similar in spirit to
the Eraser system that you are reading about, and is part of a suite of tools
ph. You will see output as follows:
$ valgrind --tool=helgrind ./ph 2 ==5143== Helgrind, a thread error detector ==5143== Copyright (C) 2007-2013, and GNU GPL'd, by OpenWorks LLP et al. ==5143== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info ==5143== Command: ./ph 2 ==5143== ==5143== ---Thread-Announcement------------------------------------------ ==5143== ==5143== Thread #3 was created ==5143== at 0x515543E: clone (clone.S:74) ==5143== by 0x4E44199: do_clone.constprop.3 (createthread.c:75) ==5143== by 0x4E458BA: create_thread (createthread.c:245) ==5143== by 0x4E458BA: pthread_create@@GLIBC_2.2.5 (pthread_create.c:611) ==5143== by 0x4C30E0D: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so) ==5143== by 0x40108B: main (ph.c:148) ==5143== ==5143== ---Thread-Announcement------------------------------------------ ==5143== ==5143== Thread #2 was created ==5143== at 0x515543E: clone (clone.S:74) ==5143== by 0x4E44199: do_clone.constprop.3 (createthread.c:75) ==5143== by 0x4E458BA: create_thread (createthread.c:245) ==5143== by 0x4E458BA: pthread_create@@GLIBC_2.2.5 (pthread_create.c:611) ==5143== by 0x4C30E0D: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so) ==5143== by 0x40108B: main (ph.c:148) ==5143== ==5143== ---------------------------------------------------------------- ==5143== ==5143== Possible data race during read of size 4 at 0x1CE5728 by thread #3 ==5143== Locks held: none ==5143== at 0x400C1D: put (ph.c:61) ==5143== by 0x400E74: put_thread (ph.c:98) ==5143== by 0x4C30FA6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so) ==5143== by 0x4E45181: start_thread (pthread_create.c:312) ==5143== by 0x515547C: clone (clone.S:111) ==5143== ==5143== This conflicts with a previous write of size 4 by thread #2 ==5143== Locks held: none ==5143== at 0x400CB0: put (ph.c:64) ==5143== by 0x400E74: put_thread (ph.c:98) ==5143== by 0x4C30FA6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so) ==5143== by 0x4E45181: start_thread (pthread_create.c:312) ==5143== by 0x515547C: clone (clone.S:111) ==5143== Address 0x1ce5728 is 24000008 bytes inside data symbol "table" ==5143== ==5143== ==5143== More than 10000000 total errors detected. I'm not reporting any more. ==5143== Final error counts will be inaccurate. Go fix your program! ==5143== Rerun with --error-limit=no to disable this cutoff. Note ==5143== that errors may occur in your program without prior warning from ==5143== Valgrind, because errors are no longer being displayed. ==5143== ==5143== ==5143== For counts of detected and suppressed errors, rerun with: -v ==5143== Use --history-level=approx or =none to gain increased speed, at ==5143== the cost of reduced accuracy of conflicting-access information ==5143== ERROR SUMMARY: 10000000 errors from 1 contexts (suppressed: 0 from 0)
Question 1: Study
helgrind's output and the
ph.c program. Clearly something is wrong with
on line 61. Why are there missing keys with 2 or more threads, but not with 1
thread? Your explanation should refer to the given code and make use of theory
to justify your answer. Identify a sequence of events that can lead
to keys missing for 2 threads.
To avoid this sequence of events, insert lock and unlock statements in
put so that the number keys missing is always 0. The relevant
pthread calls are (for more see the manual pages, man pthread):
pthread_mutex_t lock; // declare a lock pthread_mutex_init(&lock, NULL); // initialize the lock pthread_mutex_lock(&lock); // acquire lock pthread_mutex_unlock(&lock); // release lock
get() has an example use of locks, and
main initializes it.
ph.c to make it correct and recompile with gcc. Test your
code first with 1 thread, then test it with 2 threads. Is it correct (i.e. have
you eliminated missing keys)? Check the correctness using
(Note that valgrind slows down
ph a lot, so you may want to modify
NKEYS to be some small number.)
Question 2: Describe your changes and argue why these changes ensure correctness. (Do not include the entire ph.c code, but do tell us which lines you have added and feel free to include the put code in your explanation.)
Question 3: Is the two-threaded version faster than the single-threaded version in the put phase? Report the running times for the two-threaded version and the single-threaded version. (Make sure to undo your NKEYS change.)
Question 4: Most likely (depending on your exact changes)
ph won't achieve any speedup. Why is that?
Remove the locks from
ph.c, recompile, and
rerun the program.
Question 5: You should observe speedup on several cores for the
get phase of
ph. Why is that?
Question 6: Why does
valgrind report no errors for
Question 7: Can you think of a way of modifying
ph.c so that you get speedup
for both phases and
valgrind won't report race conditions? (If you
have time, implement that plan and check.)
Go to 6.033 Home Page