Relational Database Design

Massachusetts Institute of Technology - Department of Urban Studies and Planning

11.521	Spatial Database Management and Advanced Geographic Information Systems
11.523	Fundamentals of Spatial Database Management

Referential Integrity and Relational Database Design

[Additional notes for use with Problem Set B]

Introduction to Relational Database Design (based on Lecture by Tom Grayson)

Review: The Relational Model
Review: Qualities of a Good Database Design
Introduction to Entity-Relationship Modeling
E-R Modeling Process
Database Normalization
Review: Database Design Rules of Thumb
Enforcing Referential Integrity in Oracle

[Optional: Correlated Updates - examples of UPDATEs only for SELECTed rows]

Review: The Relational Model

All data are represented as tables (relations)
Tables are comprised of rows and columns (tuples)
Rows are (officially) unordered (i.e., the order in which rows are referenced does not matter)
A proper relational table contains no duplicate rows.
Each table has a primary key, a unique identifier constructed from one or more columns
Most primary keys are a single column (e.g., OWNERNUM for OWNERS)
A table is linked to another by including the other table's primary key. Such an included column is called a foreign key

Review: Qualities of a Good Database Design

Reflects real-world structure of the problem
Can represent all expected data over time
Avoids redundant storage of data items
Provides efficient access to data
Supports the maintenance of data integrity over time
Clean, consistent, and easy to understand
Note: These objectives are sometimes contradictory!

Introduction to Entity-Relationship Modeling

Entity-Relationship (E-R) Modeling: A method for designing databases
A simplified version is presented here
Represents the data by entities that have attributes.
An entity is a class of distinct identifiable objects or concepts
Entities have relationships with one another
Result of the process is a normalized database that facilitates access and avoids duplicate data
Here's an ER diagram of portions of the 'parcels' test database. It is drawn by importing the tables from Oracle to MS-Access and then using the Tool/Relationships capability of MS-Access to draw the diagrams:

E-R Modeling Process

Identify the entities that your database must represent
Determine the cardinality relationships among the entities and classify them as one of
- One-to-one (e.g., a parcel has one address)
- One-to-many (e.g., a parcel may be involved in many fires)
- Many-to-many (e.g., parcel sales: a parcel may be sold many times by different owners, and an individual owner may sell many parcels)
Draw the entity-relationship diagram
Determine the attributes of each entity
Define the (unique) primary key of each entity
Define the relationships between primary keys in one table and foreign keys in another

From E-R Model to Database Design

Entities with one-to-one relationships can (and often should) be merged into a single entity
Each remaining entity is modeled by a table with a primary key and attributes, some of which may be foreign keys
One-to-many relationships are modeled by a foreign key attribute in the table representing the entity on the "many" side of the relationship (e.g., the FIRES table has a foreign key that refers to the PARCELS table)
Many-to-many relationships among two entities are modeled by defining a third table that has foreign keys that refer to the entities in each original table. These foreign keys should be included in the third table's primary key, if appropriate
Commercially available tools can automate the process of converting a E-R model to a database schema

Database Normalization

Much of formal database design is focused on normalizing the database and ensuring that the design conforms to a level of normalization (e.g., first normal form, second normal form, etc.). Although there are higher normal forms, Third Normal Form is generally considered good enough for typical applications. Normalization generally involves taking a design with fewer tables and many columns and transforming it into a design with more tables with fewer columns -- after conducting some tests and applying some rules. First normal forms requires that there be no multi-valued attributes. Second Normal Form requires that non-key fields must be dependent upon the entire key. Third Normal Form prohibits transitive dependencies (whereby a non-key attribute is dependent on another non-key attribute). Here are some online sites with useful database management concepts and tips including some good discussion of database normalization: and first, second, and third normal form:

Review: Database Design Rules of Thumb

Keep data items atomic (e.g., first and last names are separate). Concatenating columns together later on-the-fly is generally easy, but separating them is not. (First Normal Form)
- What is an example of where parsing subfields from a column may go awry?
- When might you want to include the combined fields in a column anyway?
Define the primary key first. Use a descriptive name (PARCELID, not ID)
In fact, use descriptive names that give a new user a decent chance of guessing what they mean for all your columns! (E.g., use PARCEL_COUNT rather than PACT)
Use a single column for the primary key whenever possible; multi-column primary keys are appropriate for many-to-many relationships
Use lookup tables rather than storing long values
Use numeric keys whenever possible (What about ZIP codes?)
Avoid intelligent keys (exception: lookup tables)
Avoid using multiple columns to represent a one-to-many relationship (e.g., columns such as CHILD1, CHILD2 in a table called PARENT rather than putting the children in a separate table. (First Normal Form)
For readability, use the primary key name for foreign keys unless the same foreign key is used multiple times in the same table (e.g., state of work and state of residence for a person might both be foreign keys that reference a table of states)
Do not include two columns whose values are linked together (e.g., county name and county ID) unless one of the columns is the primary key of the table (Third Normal Form)
Avoid allowing NULL values in columns that have a discrete range of possible values (e.g., integers between 1 and 10, inclusive)
- Not applicable to DBF files, which do not support NULLs
Avoid using multiple tables with similar structures that represent minor variants on the same entity (e.g., putting Boston parcels and Cambridge parcels in separate tables).
- Why is this rule often hard to practice with GIS?
Plan ahead for transferring data to a different database. For example, you may want to move data from Oracle to DBF, or Microsoft Access to Oracle.
- Avoid column names with characters with other than UPPER CASE letters (A-Z), digits (0-9), and the underscore (_). Other characters may not be accepted by a database. Some database systems may be case sensitive with regard to column names, while others are not.
- Keep your column names relatively short. Different databases support different numbers of characters in column names (e.g., 30 for Oracle, 64 for Microsoft Access, 10 for DBF). Try to make column names differ in the first few characters rather than at the end to avoid column name duplication if the names are truncated during the conversion process (e.g., use COL1 and COL2, not LONG_COLUMN_NAME_1 and LONG_COLUMN_NAME_2).
- Note that keeping column names short may be at odds with keeping your column names meaningful for neophytes. Be aware that you are making a tradeoff!
Remember that these are rules of thumb, not absolute laws! Bend the rules if you must but have a justification for your decision. The limitations of a GIS software package often provide a good reason.

Enforcing Referential Integrity in Oracle

See Tom Grayson's notes regarding the syntax of SQL statements to define primary and foreign keys and to constrain table manipulation in order to enforce referential integrity - that is, to keep you from making changes to tables that destroy their internal consistency.

Home | Syllabus | Lectures | Labs | CRN | 37-312 lab | MIT

For more information about this page, please contact the 11.521 Staff Mailing List <11.521staff@MIT.EDU>.
Written February, 2001 (Tom Grayson), Last Modified 11 March 2010 (jf)