Massachusetts Institute of Technology - Department of Urban Studies and Planning

11.188: Urban Planning and Social Science Laboratory
11.205: Intro to Spatial Analysis (1st half-semester)
11.520: Workshop on GIS (2nd half-semester)

Lecture 3: Relational Databases & Coordinate Systems

 

February 22, 2017, Joseph Ferreira, Jr.

(including notes by Visting Prof. Zhong-Rhen Peng who taught the class Fall, 2003)

Administrative notes:

  • Lab Exercise #2 due on Stellar next Monday, Feb.. 27, at 2:30 pm
    • upload text, Word, RTF, or PDF formatted answer sheet to Stellar
    • Editing suggestion: save webpage answer sheet as text file and then add your answers using your favorite editor
  • Homework set #1 is online now and due in 2 weeks (Wed. March 8 via Stellar)
    • Examine relationships among eastern Mass Shopping Centers, major roads, and residential locations
    • Long before it is due, read it over, check out the datasets, try the methods - spread out the work
    • Waiting until the end will be frustrating and stressful!
  • Online learning tools for ArcGIS and MS-Access
    • ESRI 'Virtual Campus' ArcGIS tutorials: access codes available via MIT Libraries
      • Email gishelp@mit.edu with names of tutorials you wish to run
    • MS-Access tutorials and online help (see detailed notes below)
  • Optional supervised lab time in 9-554 on Friday 1:00 - 3:00 pm


Recall topics from last week

Powerpoint slides used by commercial firm to market site selection tools
by Edens & Avant and RPM consulting

n  Think about these marketing slides?

n  Who is the audience for these GIS services? What expertise do the consultants offer?

n  Why might firms contract out for these services rather than build in-house capacity?

n  Is the methodology or analytic scope overstated?

n  What considerations are omitted, shortchanged, badly measured?

n  From whose point of view is the siting service helpful or hurtful?

n  How might you do a different analysis for a different audience?

Major topics for today -Database concepts and spatial reference systems

n Projections and coordinate systems

n  Relational Database modeling

n Structured Query Language (SQL)

n Entity-Relationship Diagrams

n SQL and ER-diagrams in MS-Access

n Joins and Relates in ArcGIS


Map Projections and Coordinate Systems

 

(based on notes by Professors Joe Ferreira and Mike Flaxman, and former Visting Prof. Zhong-Rhen Peng with contributions from Prof. Peter Dana's (U. of Colorado) web pages)

Datums, Map Projections, & Coordinate Systems

      What is the minimum information needed to precisely determine
      location on the surface of the planet?

Need *both* a known coordinate system
and a known model of the earth’s surface

If you only know one, you can be hundreds of meters off target

      -literally

An Ellipsoid or a Datum are abstractions of the surface of the earth

WG84 (the World Geodetic System of 1984) is a standard ellipsoid.

In North America , the most recent ellipsoid data it is called the North American Datum of 1983 (NAD83) (the earlier version is NAD27).

Difference between measurements between NAD27 & NAD83 vary by location
but commonly 10 – 100 ft

 

Geographic Reference System: Latitude and Longitude

Axis: the center of earth rotation.

Equator: The plane through the center of mass perpendicular to the axis.

Longitude: lines slicing the earth parallel to the axis, and perpendicular to the plane of equator.

The line going through Greenwich has 0 longitude.

Range from 0 to 360 degrees, or 180 degree west (-) to 180 degree east (+).

Latitude

Latitude is defined based on ellipsoid representing the shape of the earth.

See: Prof. Peter Dana's notes on projections and coordinate systems ( U. of Colorado ) http://www.colorado.edu/geography/gcraft/notes/coordsys/coordsys_f.html

<Click the images below to enlarge...>

Latitude definition:

A line drawn through a point of interest perpendicular to the ellipsoid at that location, the angle made by this line with the plane of Equator is the latitude of that point.

Ranges from 90 degree south (-) to 90 degree north (+).

 

What do Latitude and Longitude mean?

Two points on the same longitude, separated by one degree of latitude are 1/360 of the circumference of earth apart, or about 111 km apart.

One minute latitude is 1.86 km.

One second latitude is 30 m.

For the same latitude, one minute of longitude separation corresponds to different distances depending on the latitude (111 km at equator, nothing at the poles!).

Nowadays, latitude/longitude often expressed in decimal degrees.

Distance calculation using latitude and longitude

  • Latitude -90≤Ø≤90
  • Longitude -180≤λ≤180
  • Arc distance between two points on the earth surface (spherical):

 

Cartesian Coordinate System

n       Assign two coordinates to every point on a flat surface.

Map Projections

n       Map projections transform the curved, 3-D surface of the planet into a flat, 2-D plane. Note, that Map projections distort map scale in various ways

n       Transform a position on the Earth’s surface identified by latitude and longitude (Ø, λ) into a position in Cartesian coordinates (x, y).

n        x = f (Ø, λ)

n        Y = g (Ø, λ)

n       Map projections necessarily distort the Earth and the map scale.


Example using Prof. Peter Dana's notes (
U. of Colorado )
http://www.colorado.edu/geography/gcraft/notes/mapproj/mapproj.html

NOTE: There is no need to memorize the formulas or properties of various coordinate systems. Just understand the concepts and how to find more information about specific cases and coordinate conversion.


Illustrative example using ArcMap

  • ESRI sample data map of 50 US states
    • United States boundary shapefile (from US Census, 2000)
    • This shapefile has only 3 files (the minimum number if no *.prj projection file)
      • st99_d00.dbf - attribute table
      • st99_d00.shp - geometry
      • st99_d00.shx - spatial index
  • Convert to Mass state plane and compare results
    • Distinguish coordinate system of
      • Data Frame
      • X/Y numbers stored on disk
      • Projection specified in *.prj file
    • Build a st99_d00.prj file using ArcMap
    • Compare US map in various Data Frame projections

 

Map Projection Classifications based on preservation properties

n       The conformal property, preserves the shapes of small features on the Earth’s surface (directions). This is useful for navigation. E.g., Mercator projection and Gnomonic projection.

n       The equal area property, preserves the areas. This is useful for analysis involving areas like the size of a land parcel, e.g., Goode’s projection.

n       Any projection can have either conformal property or equal area property, but not both.

 

Map Projection classifications based on physical surface models

n       Cylindrical projections -- wrapping a cylinder of paper around the Earth, projecting the Earth’s features onto it, and then unwrapping the cylinder;

n       Azimuthal or planar projections -- touching the Earth with a sheet of flat paper;

n       Conic projection -- wrapping a sheet of paper around the Earth in a cone.

n       All three types can have either conformal property or equal area property, but not both.

 

Unprojected projection: Plate Carrée or Cylindrical equidistance Projection

n       Just maps longitude as x and latitude as y.

n       Heavily distorts image of the Earth.

n       It does not have either conformal or equal area property.

n       But it maintains the correct distance between every point and the Equator.

n       Serious problems (distorted area, direction and other properties) can occur when doing analysis using this projection.

 

The Universal Transverse Mercator (UTM) Projection

n       Projected by wrapping a cylinder around the Poles, rather than around the Equator.

n       There are 60 zones. Each zone is 6 degree wide and wrapped along a particular line of longitude.

n       The projection is conformal, the scale is the same in all directions.

n       UTM coordinates are in meters, making it easy to make accurate calculations of short distances between points.

n       UTM projections have more problems at high latitudes.

 

State Plane Coordinates and other local systems

n       UTM is still not accurate enough for small area surveying.

n       During 1930s, each US state adopt its own projection and coordinate system, generally known as State Plane Coordinates (SPC).

n       Each state chose its own projection based on its shape to minimize distortion over the area of the state.

n       Some states have more than one internal zone.

n       The North American Datum 1983 (NAD83) is commonly used for SPC.

 

Converting Georeferences

n Two datasets can differ in both the projection and the datum, so it is important to know both for every dataset (and the data can be expressed in feet or meters with different origins!)


n      Use coordinate conversion to combine datasets that use different systems of georeferencing. Keep in mind, changing projections means the system must convert projected coordinates back to lat/lon (geographic) and then re-project into another projection/datum.


n       Convert into projections that have desirable properties, e.g., no distortion of area, for analysis.

More info on Coordinate systems and projections




Limitations of 'flat-file' data model & motivation for relational model (for attributes)
  • One 'flat file' data table for each map layer is too simple a data model
    • From lab exercises: one-to-many issues
      • some houses in sales89 may have sold more than once!
      • but, map has only one dot per house
      • what sale price is correct? average, most recent sale, inflation adjusted average, ...?

    • Using MassGIS map of Mass towns: another one-to-many issue
      • Add matown00.shp shapefile from data locker
      • Plot thematic map of density = pop90/area
      • What is wrong with thematic map of population density?

      • [show problem in ArcMap before discussing!]

         


        • Mass towns with rivers and islands have more than one polygon
        • Data often are aggregated at city/town level
        • Hence, hard to compute population density = people / sum(town area)

    • Typical urban planning context: many-to-many issues
      • Parcels can be owned by one or more owner
      • Owners may own more than one parcel
      • Owners change when parcels are sold
      • How do you find all owners with more than one foreclosed parcel since 2007
  • Strategy: Relational Data Model (i.e., linked tables)
    • Retain tabular model for textual data but allow many tables that can be related to one another via common columns (attributes)
      • Allows augmenting attribute tables by adding additional data
      • Handles one-to-many issues (where rows have different meaning in joined tables)
    • ArcMap strategy: (subset of fully relational model)
      • Maintain one-to-one link between textual data and associated geometry for a class of objects
      • Simplify one-to-many issues when joining in additional tables and hope for the best
        • Keep-or-remove feature when joined table has not match (e.g., homes without any sales transaction)
        • Ignore all but one match when more than one joined row matches a feature (e.g., multiple sales of a single home)
      • Create collapsed 'summary' table first, before joining new table to spatial features

      • For Mass density example:
        • create 'summary' table with one row per town and a column with sum of area
        • then join back via town ID to shapefile with one row per polygon
      • For multiple sales example:
        • create 'summary' table with one row per house and columns for number of sales, maximum price, average price, etc.
        • then join summary table via house ID to the original sales table

  • Relational Database Management (RDBMS) is underneath most computing
    • database-driven web pages: e-business catalogues, online newspapers
    • most transaction processing: airline reservation, ATM transaction, cellphone call logging
    • structured query language (SQL) for joining tables and specifying queries is the lingua franca of distributed database operations
    • Big business: Oracle, IBM (DB2), Microsoft (MS-Access, SQL-Server), PostgreSQL,...
  • Rest of Lecture
    • Intro to theory and terminology of relational database management
    • Illustrate RDBMS using MS-Access and Lab #4 datasets

 

A Table (relation)

                                                  Attribute name


tuples  
 

 

 

 


 

 




Rows: tuples that specify particular relations among values of each attribute
Columns: attributes describing a feature of interest
Domain: data types and range of values for an attribute (like integer, strings, floats, dates, city names, etc.
)

 

Relational Databases

n A relational database is a collection of tabular relations, or tables.

n A relation schema defines the structure of the relationship, and is composed of a set of attribute names and a mapping from each attribute name to a domain (that is, the possible values of that attribute)

n A relation is a finite set of tuples associated with a relational schema in a relational database (that is, a 'table' where each row is a tuple and the columns are the things that are related)

n A database schema is a set of relation schemas.

n A relational database is a set of relations.

n BEWARE: ArcGIS (and most lay folks) use different terms:

    • ArcGIS teminology:
      • Join two tables into one combined table by specifying the columns that link them
      • Relate two tables (but keep them separate) by specifying the columns that link them
    • Relational Database terminology:
      • Join two (or more) relations by matching tuples (rows) based on common values in the corresponding columns
      • 'Select' statements generate new relations (tables); 'views' save the query that will generate the combined relation (table)

 

A Relational database example

n  Relational schema:

n State (State_code, state_name, state_capital)

n  Database schema:

n State (State_id, state_name, state_capital)

n city (city_code, City_name, State_code)

n State_census (state, population, housing_unit_count, median_household_income, )

n  Primary Key: a minimal set of attributes whose value uniquely identifies each row (tuple).

Entity-Relation diagram for three sample tables (in MS-Access):
(These illustrative tables are available in MS-Access in the class data locker: lec5_cities.mdb

ER diagram for tables

Three sample tables in MS-Access:

Three simple tables

 

Operations on Relations

  • Project operation [pick columns] applies to a single relation & returns a subset of attributes of the original.
    • SELECT state, population FROM census_table

  • Restrict operation [pick rows] return those tuples (rows) that match a 'where clause' condition
    • SELECT state, population FROM census_table
    • WHERE population > 0


n Join takes two relations as input and returns a single relation.

n Join (rel1, rel2, att1, att2)

  • Joins can get complicated (drop rows if no match?, join 3+ tables, ...)
  • Order of operation can affect results and and performance.
  • Language varies slightly across vendors:

    • SELECT state, population, state_code, state_capital
    • FROM census_table c, state_table s
    • WHERE population > 0
    •    AND c.state = state_name

 

Structured Query Language (SQL)

n To define the database scheme (data definition),

n To insert, modify, and retrieve (SELECT) data from the database (data manipulation).

SELECT columns

FROM tables

WHERE row conditions and joins are matched

GROUP BY non-aggregated columns

ORDER BY certain columns

n  Can get complex (subqueries in 'from' and 'where' clause, 3+ tables, ...)

n Can be especially powerful when rows in each table have different meaning

e.g., join city table to state table, parcel table to owner table, or house sales table to house table

 

 

Given the three city/state/census tables above, answer the following queries

n (1a) What is the result if you associate STATE_CENSUS table with STATE table?

n  (1b) What is the result if you associate STATE table with STATE_CENSUS table?

n  (2) What is the result if you associate STATE table with CITY table?

n  (2b) What is the result if you associate CITY table with STATE table?

n  (3) List each city with its state, state capital, and statewide population density

 

n  We will examine and manipulate these tables in MS-Access: lec5_cities.mdb

 

 

Example #2b (ms-access SQL query):

Query view:

Join state and city

Table of results: (datasheet view)
city-state results

SQL query: (SQL view) city-state SQL
This SQL can be simplified via abreviations and doing 'join' in where clause:

SELECT c.City_code, c.City_name, c.State_code, s.State_name, s.State_Capital
  FROM CITY c, State s
WHERE c.State_code = s.State_Code;
Example #3: List state, city name, and statewide population density:

Query view: (Note, the expression builder window shows the population density computation.)

state_pop_density table

Table of results: (datasheet view)

state-pop-density table

Notice that there is one row per city with the state info duplicated for all the cities in the same state. If we wanted to list one row per state, we could replace the 'city_name' column with, say, a count of the number of cities in the city table that are in each state. To do that in ms-access, you need to add the 'totals' row to the query design and select the appropriate 'group by' and aggregate functions (such as 'count' or 'sum' or 'avg') that should be applied to the attributes in each column. Can you see how to do this? (It is not that hard, just more than we have spent time explaining at this point.)

SQL query: (SQL view)

state-pop-density SQL

This 3-way join is ugly and seems complicated but is straightforward with standard SQL and relational algebra. If we changed the query to get one row per state with the count of cities in each state (as suggested above), we would have to use a 'count' function in the 'select' clause and also add a 'group by' clause. (Again, this is not hard, just more than we have spent time explaining at this point.)

We want to get comfortable enough with MS-Access to be able to do basic join-summarize-manipulate queries on a few tables in order to prepare data for mapping and spatial analysis. Lab #4 on Monday will get us started.

 


 

n Using MS-Access to query tables: lec5_cities.mdb

n Get feel for MS-Access basics - we will use it for large census tables in two weeks

n Focus on basic joins and queries: graphical query interface and equivalent SQL

n (Ignore complexities regarding 'best' table structure (normalization) and complex join issues )

 

n ArcGIS has basic join/query tools but runs out of gas for complex data manipulation

n MS-Access is better but still has problems with multi-user sharing and complex queries

n Next Monday's Lab #4 uses ArcGIS data manipulation tools and buffering

n Later, we will use MS-Access with larger census datasets (to examine spatial patterns of workers driving alone to work)

 

Notes and Tutorials for MS-Access Help


<<< lecture ends here - additional notes below are extra >>>


 

Illustrate use of relational tables, joins, and ER diagrams (preview of lab #4 exercises)

  • Use MS-Access to do data manipulation for Lab #4 (mapping Mass population density): lec5_mass.mdb
    • lab #4: 351 Mass cities and towns but 600+ polygons in shapefile - find density
    • Import DBF attribute table for Masss Town boundaries into MS-Access
    • Write query to sum areas of each part of a town
      • Note data type issues (integer could be too large...)
    • Create new table with Town name, total area, pop90 and pop90 density
    • Read MS-Access from ArcGIS and map density
  • Generate ER-diagram (of table relationships) for matown00.dbf and your new table
  • Understand SQL queries and graphical interface to assist in building/saving queries
  • ArcMap can import tables (and queries) from MS-Access but can be troublesome
    • Regular 'Add-Data' option sees MS-Access (mdb) tables but not queries
    • ODBC database connection sees queries as well but has data type problems (floats come across as integers)

Other (abstract) Database Concepts (generally beyond scope of class)

 

Distributed databases and Federated databases

n Distributed databases refer to one database (or data replicates) that are distributed across multiple sites.

n Federated databases refer to many similar databases that are distributed across multiple sites but are more loosely coupled and additional rules may be needed to cross-reference tables meaningfully

n Federated databases are also called distributed relational database with fragmentations.

 

Relational data model (our focus)

n      Collection of interrelated tables

n       Highly structured data schemas and data types with data dictionary

n       We'll examine US Census Bureau examples in Lab #5

 

Relational Database as Entity-Relationship Model

n E-R model sees the world as inter-related entities (tables);

n Use MS-Access to build E-R diagram for the three city/state tables: lec5_cities.mdb.

n  Entities (or entity types like cities and states in our example) are related with each other by a relationship (linking primary and foreign keys - e.g, which city is in which state)

n  E-R model uses E-R diagrams to describe relations between entity types (generally one-to-many connections)

n E-R model describes the static state of the entity types.

 

Object-Oriented Model - one alternative to E-R for RDBMS

n Object-oriented model sees the world as inter-related objects.

n Object is dynamic and has its own lifespan. Hence OO model is used to deal with the dynamic nature of real-world object.

n Object = static state + functionality

n Object with similar behaviors are organized into types, a semantic concept.

n Object Class = data structure + methods, an implementation construct.

 


 

Other Notes about RDBMS and relational tables in ArcGIS

 

Joining and Relating Tables in ArcGIS
n          Join in ArcGIS appends the attributes of the non-spatial table to the spatial (layer) attribute table.
n          Relate in ArcGIS does not append attributes; only establishes a logical relationship so that when you select one record in one table you can see the matching records in the other table.
Don't get confused between ArcGIS 'relate' (which describes the relationship between two tables, and RDBMS terminology where 'relation' = a table)
 
When to Use Join and Relate
n       Relate is preferred if the non-spatial table is maintained and updated constantly while the spatial data is not. (e.g., Mass towns shapefile plus summary table with data for each town)
n       Use relate when the relationship is many-to-many.
n       Use relate when you have a very large non-spatial table and you don't need all the attributes in the table.
n       Columns in 'related' table cannot be thematically mapped - they must first be 'joined' in
n       In other situations, you could use either.

 



Last modified 21 February 2017 [jf]. Back to the 11.188 Home Page.
Back to the CRON Home Page.