M.I.T. DEPARTMENT OF EECS

6.033 - Computer System Engineering Database Hands-On Assignment

Hands-on 8: Databases

This hands-on assignment will introduce you to databases and transactions. Many applications store data in a database similar to the one you will be using in this exercise. For example, it is common for web applications to store data in an SQL database, such as MySQL or Postgres, and in this hands-on, we will be using Postgres in particular. One of the benefits of using a database is that it provide transactions, which have before-or-after atomicity (we called this "isolation" in lecture) and all-or-nothing atomicity (we just called this "atomicity" in lecture). See chapter 9 for details. Transactions provide guarantees that an ordinary file system does not provide. The focus in this hands-on is on the before-or-after atomicity (a.k.a. isolation).

Once you are done with this hands-on assignment, submit your answers, along with the commands you ran and the output you got, using the online submission site before 11:59p.

Intro to databases

For the purposes of this exercise, we have set up a Postgres database server on the machine ud1.csail.mit.edu. We have created an account for every student registered for 6.033 this term; check the online submission web site for a short text file (under the database hands-on assignment) that contains your username and password. Please use our Postgres server for this hands-on assignment, rather than some other Postgres server, since we made some changes to defaults on our server. In particular, we changed the default transaction isolation level, which specifies what kind of before-or-after atomicity is desired, for all user accounts to SERIALIZABLE.

You can access your database by running the following commands on a Linux-based Athena workstation or dialup. To find your assigned username and password, check the online submission site for a comment under the database hands-on assignment.

athena% psql -h ud1.csail.mit.edu -U username
Password for user username: password
psql (8.4.10)
SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
Type "help" for help.

username=> 

The psql command allows you to issue SQL queries to the database. To exit from this shell, type \q.

SQL databases allow users (such as application developers) to organize data in tables. Each table consists of record called rows. Each row, in turn, consists of several attributes called columns. All rows in the table share the same set of columns, and these columns must be defined ahead of time by the user when the table is created.

As an example, MIT's TechCash system, which handles payments using MIT ID cards, might construct a table that stores information about accounts, such as the account holder's Athena username, their full name, and the total amount of money in their account. You can visualize the table as containing the following information, with 4 rows and 3 columns:

username fullname balance
jones Alice Jones 82
bitdiddl Ben Bitdiddle 65
mike Michael Dole 73
alyssa Alyssa P. Hacker 79

In this hands-on, you will use the SQL language to issue queries to the database to perform operations on tables. Postgres has a good tutorial on SQL, as well as a more detailed SQL language manual that you can refer to.

Using SQL

Exercise 1. As a first step, use the SQL CREATE command to create the table shown above, and then execute several INSERT commands to insert each of the rows into the resulting table, as follows:

username=> create table accounts (username varchar(8), fullname varchar(128), balance int);
CREATE TABLE
username=> insert into accounts values ('jones', 'Alice Jones', 82);
INSERT 0 1
...

If you make a mistake, you can delete the accounts table by issuing a DROP TABLE accounts; command, and then starting over.

Exercise 2. Now, examine the table using the special \d command:

username=> \d accounts
            Table "public.accounts"
  Column  |          Type          | Modifiers 
----------+------------------------+-----------
 username | character varying(8)   | 
 fullname | character varying(128) | 
 balance  | integer                | 

username=> 

We can now read data from the table using the SELECT command. The SELECT command allows you to choose which rows to select (by specifying a predicate in a WHERE clause), and what data to return from each row (e.g., individual columns or aggregates such as SUM()):

username=> select username, fullname, balance from accounts;
...
username=> select fullname from accounts where balance > 75;
...
username=> select sum(balance) from accounts;
...
username=> 

Exercise 3. Run the above three commands. Also construct and run a command to display the full name of the person with username bitdiddl. Also construct and run a command to display the average account balance of people that have at least $70 in their account. You may want to refer to the Postgres SQL manual to find a suitable function for computing the average.

Exercise 4. Transfer $10 from jones to mike. You will find the UPDATE command useful; consult the Postgres SQL manual for more details. For this exercise, it will suffice to perform two updates: one to deduct $10 from Alice's balance, and another to add $10 to Michael's balance.

Transactions

To deal with concurrent operations, many SQL databases, including Postgres, support transactions, which allow several SQL statements to execute atomically (often meaning both before-or-after atomicity and all-or-nothing atomicity). To execute statements as a single transaction, SQL clients first issue a BEGIN command, then execute some SQL commands, and finally issue either a COMMIT command, which makes the transaction's changes permanent, or a ROLLBACK command, which reverts the changes from all of the commands in the transaction.

In this assignment, you will simulate concurrent database queries by issuing SQL statements over two different connections to the database. Open up two terminals, and run psql in each of them, to create two connections to the database. We will use two colors (blue and red) to indicate the two database sessions.

Start a transaction in the first (blue) terminal, and display a list of all accounts:

username=> begin;
BEGIN
username=> select * from accounts;
 username |     fullname     | balance 
----------+------------------+---------
 bitdiddl | Ben Bitdiddle    |      65
 alyssa   | Alyssa P. Hacker |      79
 jones    | Alice Jones      |      72
 mike     | Michael Dole     |      83
(4 rows)

username=> 

Now, in the second (red) terminal, also start a transaction and add an account for Chuck:

username=> begin;
BEGIN
username=> insert into accounts values ('chuck', 'Charles Robinson', 55);
INSERT 0 1
username=> 

Exercise 5. Generate a list of all accounts in the first (blue) terminal, and in the second (red) terminal. What output do you get? Are they the same or not? Why?

Now, commit the transaction in the second (red) terminal, by issuing the COMMIT statement:

username=> commit;
COMMIT
username=> 

Exercise 6. Generate a list of all accounts from the first (blue) terminal again. Does it include Chuck? Why or why not?

Exercise 7. Commit the transaction in the first (blue) terminal and generate a new list of all accounts:

username=> commit;
COMMIT
username=> select * from accounts;
...
What output do you get? Is it different than the output you received in exercise 6? Why or why not?

Now, let's try to modify the same account from two different transactions. In the first (blue) terminal, start a transaction and deposit $5 into Mike's account:

username=> begin;
BEGIN
username=> update accounts set balance=balance+5 where username='mike';
UPDATE 1

In the second (red) terminal, start a transaction and withdraw $10 from Mike's account:

username=> begin;
BEGIN
username=> update accounts set balance=balance-10 where username='mike';
UPDATE 1

Exercise 8. What happens to the execution of the second update? Why?

Let's try aborting a transaction: enter the ABORT command in the first (blue) terminal, undoing the $5 deposit:

username=> abort;
ROLLBACK
username=> 

Exercise 9. What happens to the second (red) terminal's transaction?

Exercise 10. If you now commit the transaction in the second terminal, what is the resulting balance in Mike's account?

Now let's perform an atomic transfer of $15 from Ben to Alyssa, using two UPDATE statements in a single transaction. In the first (blue) terminal, list the balances of all accounts. Then start a transaction and do a part of a transfer in the second (red) terminal:

username=> begin;
BEGIN
username=> update accounts set balance=balance-15 where username='bitdiddl';
UPDATE 1
username=> 

Exercise 11. If you now look at the list of all account balances in the first (blue) terminal, have the results changed compared to before you started the transaction in the second (red) terminal?

Exercise 12. Finish the transfer by executing the following commands in the second (red) terminal. After each command, list all of the account balances in the first (blue) terminal, to see at what point the effects of the second terminal's transaction become visible. What is that point, and why?

username=> update accounts set balance=balance+15 where username='alyssa';
UPDATE 1
username=> commit;
COMMIT
username=> 

Transaction isolation levels

Postgres supports different transaction isolation levels. So far, you have been using SERIALIZABLE transactions, meaning that the transactions appear to execute in some serial order. Postgres also has a READ COMMITTED isolation level, which allows one transaction to immediately see the results of any other committed transaction; you can read more about it in the Postgres manual here.

Exercise 13. By default, Postgres uses the READ COMMITTED isolation level (and for all of the above exercises, we explicitly configured our Postgres server to use the SERIALIZABLE isolation level instead). Why do you think Postgres developers chose to make the default isolation level READ COMMITTED, which allows non-serializable schedules?

Challenge exercise 14 (optional). Come up with two transactions that, when run concurrently (with some interleaving of commands that you get to determine) produce a non-serializable schedule. That is, the result that you obtain should be impossible to produce with SERIALIZABLE isolation, but is possible with READ COMMITTED.

To set the transaction isolation level to READ COMMITTED, you can issue the following command:

username=> set session characteristics as transaction isolation level read committed;
SET
username=> 

Keep in mind that the effects of this command last for the duration of a single session. This means you have to issue it every time you re-start psql, and issue it in every session of psql that you start, to get READ COMMITTED isolation between your transactions.


Go to 6.033 Home Page