Massachusetts Institute of Technology
Department of Urban Studies and Planning
11.521: Spatial Database Management and
Advanced Geographic Information Systems |
11.523: Fundamentals of Spatial Database
Management |
Problem Set C: Relational Database Design
Out: 15 March 2018 || Due: 23 March 2018
Overview: In this problem set you will: Construct a relational database design for a generic housing
authority
The Database Design
Imagine you work for the public housing agency of a city, and you have been
charged with keeping track of who is living in the agency's apartments over
time. To help you in this task, you have decided to use a relational
database for your record keeping. Your task is to design a database that
allows you to capture the facts described below:
- The city has three public housing developments. You want to record
their names, locations, the year they opened, their height in stories,
and the year in which they last had significant renovation.
- For each apartment unit in the development, you want to keep track of
the number of bedrooms, the number of bathrooms, whether the unit has a
separate kitchen or living room, and the square footage.
- The database should keep track of the households living in the units
as well as the individuals living within each household. For each member
of a household, you want to record their name, date of birth, sex, and
indicate whether or not they are the head of the household (more than
one person can share that distinction but, for the purpose of this
exercise, assume that each household has a single 'head').
- You also want to keep track of when a household moved into and out of
a particular unit. You want to be able to follow households as they move
from one unit to another or from one development to another. Think about
how you will find the unit that the household is currently
occupying (i.e., what query would you write to find the current unit of
each household).
You will invent data for the three developments; two units in each
development; and three families, one with 2 members, one with 3 members, and
one with 4 members. Include records for each housing unit and household
tracking the occupancy of each housing unit during a hypothetical period
beginning January 1, 2014, through the current time. Assume that all 6
housing units are empty but ready for occupancy on day 1. Move your three
families in at varying times and account for at least one move to another of
the 6 housing units between then and now. You may insert these data records
into your database using INSERT statements.
You may be interested to know that this assignment was originally
inspired by records kept by the Boston Housing Authority (during the late
1990s) about their developments. This type of problem has definite
real-world importance. The BHA is big enough to (eventually) hire
consultants to design and implement their tenant tracking system, but many
smaller non-profits and community groups face similar data management
problems (with time-varying data and one-to-many relationships) in order
to track their clients, projects, funding sources, workshops, etc. A
little relational thinking about how to structure the data can greatly
facilitate data maintenance, data integrity, and the practicality of
subsequent program analysis.
The Process
Follow this process while designing your database. Keep in mind that there
is not single 'right' answer (but lots of wrong ones). Also, you may think
of complexities that go far beyond what you need to think about for the
purpose of this homework set. For example, don't worry about how to record
births/deaths over time, renovations that change the size and attributes of
apartments, etc.. Focus on a schema that handles the basics by recording the
basic characteristics of housing units and individuals and keeping track of
which households occupy which units over time. The goal is to have a
relational schema that handles one-to-many and many-to-many relationships
appropriately and supports SQL queries that can answer basic questions like,
"Was household X ever in Unit Y?" or "Has Individual Z ever lived in more
than one housing unit?" or "What is the vacancy rate for a development
during the past two years?" or "How many households have lived in their
current housing unit for more than two years?"
- Think about the database design problem and identify the entities
involved, their attributes, and the relationships among them.
- Draw an entity-relationship diagram that captures your thinking. Turn
in your E-R diagram with your problem set. You may either create your
diagram by hand or use software such as MS-Access (but, even if done on
paper, we prefer that you scan the picture so that you can submit it
electronically). Often one starts on paper and then ends up with the
final E-R diagram generated by a tool such as that in MS-Access after
the table schema are finalized.
- Make sure that your schema adheres to at least second or
preferably third normal form (discussed in the lecture on
database design).
- In a text editor, construct a script that implements a schema that
represents your entities and relationships. Such a script for the
'websis' example we discussed in lecture is linked to those lecture
notes: http://mit.edu/11.521/www/lectures/lecture10/lecture10.html.
Name your script something like username_dbschema.sql, where
username is your Athena username. Record the output from
running your SQL script in the file username_dbschema_log.txt.
Save your script on your desktop - Yes, you eventually want to
save it in your network locker - and follow these guidelines as you
write your script:
- Copy all your queries into a text file.
- Include CREATE TABLE statements for all your tables. Remember to
name them <your initials>_tablename.
- Review your notes on referential integrity and write:
- ALTER TABLE xxx ADD CONSTRAINT yyy statements to define
each tables' primary keys.
- ALTER TABLE xxx ADD CONSTRAINT yyy statements to define each
tables' foreign keys.
- Consider including 'commit' statements at various places in your
script so that SQL error does not 'rollback' too many prior statements
when you cut-paste-and-run multiple SQL statements into the
browser. (Note, if you were running Postgres from a command line
rather than phppgadmin, you could feed an entire file of SQL commands
to Postgres for batch processing.
- Include DROP TABLE statements for all your tables (before the
'create table' statement). This will help if you have to run your
script more than once, which will almost certainly be necessary. You
may want to begin the 'drop' lines with '--' so you have the
statements available but the line is treated as a comment when you cut
and paste many lines from the text editor into the SQL window.
Include the term 'cascade' at the end of your 'drop' statement. This
makes sure any foreign key constraints that refer to this table are
dropped too. If these constraints are not dropped, Postgres will
report an error when you try to drop the table (because another table
may depend upon the existence of the table you are trying to drop).
For example:
- Include comments that indicate the meaning of your statements in
your text file. SQL servers ignore any text contained within /* and */
characters even if the text spans multiple lines. Single line comments
must begin with two dashes together (--). Everything after the dashes
is ignored on that line only. For example:
/*****************
These two lines and the one above and below are
entirely ignored.
******************/
-- This line is also a comment.
SELECT * FROM mytable;
-- However, the one line SELECT statement above is not part
-- of the comment and will be executed if this script is run.
- Review your database design using the database design lecture notes (in
lecture 10). Pay particular attention to the database design rules
of thumb. Are you breaking any of these rules? If so, why? Remember, the design rules sometimes conflict and breaking
the rules is often OK, provided you have a good reason. If you make some choices that
you think might raise some questions, make sure to address them in your
comments.
- Run your script in Postgres by cutting and pasting queries from your text editor into the SQL window of phppgadmin. Correct
any errors in your script and run it again. Keep trying until it works.
- Add some sample data to your tables. Create another script called username_dbinsert.sql.
Record the output from running your SQL script in the file username_dbinsert_log.txt.
Insert about five rows in each table using 'INSERT INTO table VALUES (...)'
statements. For example, consider the table mytable created by this
CREATE TABLE statement:
CREATE TABLE mytable (
mycomment VARCHAR2(10),
mylowvalue NUMBER,
myhighvalue NUMBER,
mydate DATE);
You could insert two rows into mytable this way:
INSERT INTO mytable
VALUES ('Hi Mom', 1, 5, '30-OCT-2012');
INSERT INTO mytable (mycomment, mydate)
VALUES ('Hi Dad', '1-NOV-2012');
Note that Postgres is fussy about date values and will, by default, accept
dates only in the format 'DD-MON-YYYY' within single quotes as shown
above. You may use the TO_DATE function to use a different format in a particular
instance, for example:
INSERT INTO mytable (mycomment, mydate)
VALUES ('Bye Mom', TO_DATE('11/10/2012', 'MM/DD/YYYY'));
You may want to 'Google' the keywords for the details of TO_DATE and date functions. Be forewarned: dabbling in date format models has
many potential gotchas (most notably confusing MM (months) with MI
(minutes)). Read carefully! (But keep in mind that you do *not* need to learn all the intricacies about date handling to complete this homework set.)
- Remember to use a COMMIT statement after creating tables or inserting rows to make sure
the data is saved permanently. A SQL error may cause an automatic ROLLBACK
of your changes. (You can use the ROLLBACK statement on purpose if you
want to undo changes since the last COMMIT.) Many statements include
implied COMMITs, and not all errors cause a ROLLBACK. Query your tables
often to make sure that your SQL statements are having the desired effect!
- Test all your primary and foreign key constraints with some
INSERT statements that are intended to fail. Try to insert a
duplicate primary key. Also try to insert a foreign key that does not
exist as a primary key in the referenced table. In general, for every constraint
you create, you should have an INSERT statement that tests the constraint.
For this exercise, however, just include two such 'intended-to-fail' insert statements (following a 'commit' statement) at the end of your username_dbinsert.sql.
Also record the output from running your SQL tests in your file username_dbinsert_log.txt.
- Extra credit: Write a series of statements that record the 'long transaction' associated with two households
swapping apartments. At least one transaction control statement such as COMMIT or ROLLBACK
is required.
What to Turn In
The breakdown of point values for this assignment is as follows:
- Entity relationship diagram for the housing database: 30 points
- Schema definition file and log file: 30 points
- Data insert file and log file: 20 points
- Constraint test file and log file: 20 points
- Extra credit: 5 extra credit points
Turn in your username_dbschema.sql, username_dbschema_log.txt,
username_dbinsert.sql, username_dbinsert_log.txt, and the entity relationship diagrams files.
Upload your answers to Stellar. The assignment is due at 5PM on Friday, March 23, 2018.
This is the last assignment for those who are taking only the 11.523 first half of the semester-long 11.521 class.
Home | Syllabus | Lectures | Labs | CRN | MIT
For more information about this page, please contact the
11.521 Staff Mailing List <11.521staff@MIT.EDU>.
Created 5 November 1998 by Tom
Grayson and Joe Ferreira; Last Modified: 14
March 2018 [rb]