Massachusetts Institute of Technology - Department of Urban Studies and Planning

11.520: A Workshop on Geographic Information Systems
11.188: Urban Planning and Social Science Laboratory

Lecture 5: Relational Databases

 

September 24, 2008, Joseph Ferreira, Jr.

(including notes by Visting Prof. Zhong-Rhen Peng who taught the class Fall, 2003)


NEW LECTURE ROOM: Wednesdays 2-3:30 in 14E-310 

Major topics

n Database concepts and issues in GIS

n Relational Database modeling

n Structured Query Language (SQL)

n Entity-Relationship Model

n SQL and ER-diagrams in MS-Access

n Joins and Relates in ArcGIS

 


Limitations of one 'flat-file' for attributes
  • One 'flat file' data table for each map layer is too simple a data model
    • From lab exercise #3: calculating percent-with-high-school education
      • cambbgrp is read-only ==> adding a column requires building a new table
      • after calculation, must add new table and join to original cambbgrp table
    • From lab exercise #3: one-to-many issue
      • some houses in sales89 may have sold more than once!
      • but, map has only one dot per house
      • what sale price is correct? average, most recent sale, inflation adjusted average, ...?
    • From lab exercise #4: one-to-many issue
      • In ArcMap:
        • Add matown00.shp shapefile from data locker
        • What is wrong with thematic mapof population density: pop90 / area
      • Mass towns with rivers and islands have more than one polygon
      • Data often are aggregated at city/town level
      • Hence, hard to compute population density: people / sum(town area)
    • Typical urban planning context: many-to-many issues
      • Parcels can be owned by one or more owner
      • Owners may own more than one parcel
  • Strategy: Relational Data Model (i.e., linked tables)
    • Retain tabular model for textual data but allow many tables that can be related to one another via common columns (attributes)
      • Allows augmenting attribute tables by adding additional data
      • Handles one-to-many issues (where rows have different meaning in joined tables)
    • Maintain one-to-one link between textual data and a class of objects
      • create 'summary' table with one row per town and a column with sum of area
        • then join back via town ID to shapefile with one row per polygon
      • create 'summary' table with one row per house and columns for sale count, maximum price, average price, etc.
        • then join back via house ID to sales tables with one row per sale
  • Relational Database Management (RDBMS) is underneath most computing
    • database-driven web pages: e-business catalogues, online newspapers
    • most transaction processing: airline reservation, ATM transaction, cellphone call logging
    • structured query language (SQL) for joining tables and specifying queries is the lingua franca of distributed database operations
    • Big business: Oracle, IBM (DB2), Microsoft (MS-Access, SQL-Server), ...
  • Rest of Lecture
    • Intro to theory and terminology of relational database management
    • Illustrate RDBMS using MS-Access and Lab #4 datasets

 

A Table (relation)

                                                  Attribute name

tuples  
 

 

 

 


 

 

Domains: data types like integer, strings, floats, dates, etc.

 

Relational Databases

n A relational database is a collection of tabular relations, or tables.

n A relation schema defines the structure of the relationship, and is composed of a set of attribute names and a mapping from each attribute name to a domain (that is, the possible values of that attribute)

n A relation is a finite set of tuples associated with a relational schema in a relational database (that is, a 'table' where each row is a tuple and the columns are the things that are related)

n A database schema is a set of relation schemas.

n A relational database is a set of relations.

n BEWARE: ArcGIS (and most lay folks) use different terms:

    • ArcGIS teminology:
      • Join two tables into one combined table by specifying the columns that link them
      • Relate two tables (but keep them separate) by specifying the columns that link them
    • Relational Database terminology:
      • Join two (or more) relations by matching tuples (rows) based on common values in the corresponding columns
      • 'Select' statements generate new relations (tables); 'views' save the query that will generate the combined relation (table)

 

A Relational database example

n  Relational schema:

n State (State_code, state_name, state_capital)

n  Database schema:

n State (State_id, state_name, state_capital)

n city (city_code, City_name, State_code)

n State_census (state, population, housing_unit_count, median_household_income, )

n  Primary Key: a minimal set of attributes whose value uniquely identifies each row (tuple).

 

Operations on Relations

  • Project operation [pick columns] applies to a single relation & returns a subset of attributes of the original.
    • SELECT state, population FROM census_table

  • Restrict operation [pick rows] return those tuples (rows) that match a 'where clause' condition
    • SELECT state, population FROM census_table
    • WHERE population > 0

n Join takes two relations as input and returns a single relation.

n Join (rel1, rel2, att1, att2)

  • Joins can get complicated (drop rows if no match?, join 3+ tables, ...)
  • Order of operation can affect results and and performance.
  • Language varies slightly across vendors:

    • SELECT state, population, state_code, state_capital
    • FROM census_table c, state_table s
    • WHERE population > 0
    •    AND c.state = state_name

 

Structured Query Language (SQL)

n To define the database scheme (data definition),

n To insert, modify, and retrieve (SELECT) data from the database (data manipulation).

SELECT columns

FROM tables

WHERE row conditions and joins are matched

GROUP BY non-aggregated columns

ORDER BY certain columns

n  Can get complex (subqueries in 'from' and 'where' clause, 3+ tables, ...)

n Can be especially powerful when rows in each table have different meaning

e.g., join city table to state table, parcel table to owner table, or house sales table to house table

 

Given the three tables below, answer the following queries

n What is the result if you associate STATE_CENSUS table with STATE table?

n What is the result if you associate STATE table with STATE_CENSUS table?

n What is the result if you associate STATE table with CITY table?

n What is the result if you associate CITY table with STATE table?

n We will examine and manipulate these tables in MS-Access: lec5_cities.mdb

 

CITY table

 

.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


.

STATE table

 

.

 

 

 

 

 

 

 

 

 


STATE_CENSUS table

  
 

.

 

 

 

 

 

 

 

 


n Using MS-Access to query these tables: lec5_cities.mdb

n Get feel for MS-Access basics - we will use it for large census tables in two weeks

n Focus on basic joins and querys: graphical query interface and equivalent SQL

n Shortchange discussion of best table structure (normalization) and complex join issues

n ArcGIS has basic join/query tools but runs out of gas for complex data manipulation

n MS-Access is better but still has problems with multi-user sharing and complex queries

n Next Monday's Lab #4 uses ArcGIS data manipulation tools to compute MA density

n Following Monday's Lab #5 uses MS-Access with census data (driving alone to work)

 


Other (abstract) Database Concepts (generally beyond scope of class)

 

Distributed databases and Federated databases

n Distributed databases refer to one database (or data replicates) that are distributed across multiple sites.

n Federated databases refer to many similar databases that are distributed across multiple sites but are more loosely coupled and additional rules may be needed to cross-reference tables meaningfully

n Federated databases are also called distributed relational database with fragmentations.

 

Relational data model (our focus)

n      Collection of interrelated tables

n       Highly structured data schemas and data types with data dictionary

n       We'll examine US Census Bureau examples in Lab #5

 

Relational Database as Entity-Relationship Model

n E-R model sees the world as inter-related entities (tables);

n Use MS-Access to build E-R diagram for the three city/state tables: lec5_cities.mdb.

n  Entities (or entity types like cities and states in our example) are related with each other by a relationship (linking primary and foreign keys - e.g, which city is in which state)

n  E-R model uses E-R diagrams to describe relations between entity types (generally one-to-many connections)

n E-R model describes the static state of the entity types.

 

Object-Oriented Model - one alternative to E-R for RDBMS

n Object-oriented model sees the world as inter-related objects.

n Object is dynamic and has its own lifespan. Hence OO model is used to deal with the dynamic nature of real-world object.

n Object = static state + functionality

n Object with similar behaviors are organized into types, a semantic concept.

n Object Class = data structure + methods, an implementation construct.


Back to RDBMS and relational tables in ArcGIS

 

Joining and Relating Tables in ArcGIS
n          Join in ArcGIS appends the attributes of the non-spatial table to the spatial (layer) attribute table.
n          Relate in ArcGIS does not append attributes; only establishes a logical relationship so that when you select one record in one table you can see the matching records in the other table.
Don't get confused between ArcGIS 'relate' (which describes the relationship between two tables, and RDBMS terminology where 'relation' = a table)
 
When to Use Join and Relate
n       Relate is preferred if the non-spatial table is maintained and updated constantly while the spatial data is not. (e.g., Mass towns shapefile plus summary table with data for each town)
n       Use relate when the relationship is many-to-many.
n       Use relate when you have a very large non-spatial table and you don't need all the attributes in the table.
n       Columns in 'related' table cannot be thematically mapped - they must first be 'joined' in
n       In other situations, you could use either.

Illustrate use of relational tables, joins, and ER diagrams

  • Use MS-Access to do data manipulation for Lab #4 (mapping Mass population density): lec5_mass.mdb
    • lab #4: 351 Mass cities and towns but 600+ polygons in shapefile - find density
    • Import DBF attribute table for Masss Town boundaries into MS-Access
    • Write query to sum areas of each part of a town
      • Note data type issues (integer could be too large...)
      • Try Plan B - read 'dbf' dataset into Excel and adjust data precision/scale/names/etc.
        • Adhere to DBF rules: 10-character names, set font to courier/12-pt and adjust column width,...
        • save with different name!
    • Create new table with Town name, total area, pop90 and pop90 density
    • Read MS-Access from ArcGIS and map density
  • Generate ER-diagram (of table relationships) for matown00.dbf and your new table
  • Understand SQL queries and graphical interface to assist in building/saving queries
  • ArcMap can import tables (and queries) from MS-Access but can be troublesome
    • Regular 'Add-Data' option sees MS-Access (mdb) tables but not queries
    • ODBC database connection sees queries as well but has data type problems (floats come across as integers)


Last modified 24 September 2008 [jf]. Back to the 11.520 Home Page.
Back to the CRON Home Page. For information about this page please send e-mail to 11.520staff@MIT.EDU