Massachusetts Institute of Technology
Department of Urban Studies and Planning


11.521: Spatial Database Management and Advanced Geographic Information Systems

11.523: Fundamentals of Spatial Database Management

Lab 1: Introduction to SQL, GIS, and Lab Setup

5 February 2013


Due: Tuesday, February 12, 2013 ==> Extended to Thursday, Feb. 14 (due to storm...)

Motivation:

Urban sensing, pervasive computing, and the widespread availability of georeferenced data and geoprocessing tools have greatly expanded the capacity of urban planners to observe, analyze, visualize, and model the spatial deployment of people, infrastructure, and social and economic activity and interactions. While the availability of relevant data has expanded exponentially, good analysis is still hard and appropriate data integration and interpretation requires considerable technical skill, contextual understanding, and artful choices. For any particular urban planning problem, relevant data are often voluminous (but incomplete), dispersed across multiple agencies and sources, and rarely cross-referencable through simple one-to-one relationships. This subject (the full-semester 11.521, and half-semester 11.523) provides hands-on experience with such urban planning and spatial analysis settings using basic GIS and relational database management tools in a distributed computing environment.

Lab Objectives:

This lab exercise has two primary purposes: (1) to acquaint you with the computing environment and basic GIS and relational database tools that we will be using this semester, and (2) to introduce you to relational joins and database queries using the structured query language (SQL).  For many of the 11.523 lab exercises, we we use parcel data, sales data, and building footprints for East Boston to estimate and visualize the spatial variation in land value and some of the amenity and accessibility characteristics that influence land value. Today, we will get started by using ArcScene to visualize the built form of East Boston by extruding building footprints to the building heights that were estimated using LIDAR data. Developing useful estimates of land value involves more analysis and computation since zoning, building type, point-of-interest locations, and other factors can complicate the estimation and visualization. We will get to this in later labs.

If someone else has already assembled, cleaned, and cross-referenced all the data and tools that you need, then 'urban analytics' does not look too hard. But urban planners tend to ask questions that have not already been answered, and much of the task involves finding, filtering, manipulating, and merging data acquired from several different sources. For sizeable datasets, these steps require a deeper understanding of data management tools, location tagging, and data structure. To get started with some of these skills, we will focus on a 'relational' data model of linked tables and the use of 'structured query language' (SQL) to query relational databases. For these SQL queries, we will use a set of seven, small tables that are similar in structure and content to the types of information about property ownership and landuse that are maintained by most local assessing and planning offices.  We will use MS-Access, Excel, and, eventually, Oracle and PostgresSQL to examine this small 'toy' database of illustrative parcel, zoning, and tax tables.

For those who would like to brush up on your familiarity with ArcGIS, we suggest the first few lab exercises from past versions of 11.188 and/or 11.205 or 11.520 ( the prerequisites for this course) and/or some of the 'virtual campus' tutorials that ESRI provides. The first two 11.188 lab exercises from last Spring are here:
Exercise #1: http://mit.edu/11.188/www12/labs/lab1
Exercise #2: http://mit.edu/11.188/www12/labs/lab2
The exercises from 11.205 and 11.520 are available on Stellar: https://stellar.mit.edu/S/course/11/fa12/11.205/index.html
You can obtain passwords for the virtual campus exercises from the MIT library staff in the GIS lab location in Rotch Library. For m ore information of GIS assistance from the MIT Libraries see: http://libraries.mit.edu/gis

Additional Background Material:

Additional in-class lab notes are available here: http://mit.edu/11.521/www/11.521_day1_notes13.html

For an example of a recent DUSP dissertation that analyzed voluminous, spatially detailed datasets using many of the spatial data management and advanced GIS methods learned in this class, see Mi Diao's PhD dissertation, "Sustainable Metropolitan Growth Strategies: Exploring the Role of the Built Environment" (Sept. 2010).


Part I: Athena Space & ArcGIS Example

Class Lockers and WinAthena lab machines: We are going to use our class and personal network lockers within the 'Athena' space on the andrew network file system (AFS). All data we will use in the first half of the class (that is, pre-project) will be stored in /afs/athena.mit.edu/course/11/11.521/data. From WinAthena PCs on campus, this class data locker can be accessed via drive Z: [Navigate the directory tree to get to: Z:\athena.mit.edu\course\11\11.521\data ]. For your convenience, you can use Tools/Map-network-drive in Windows Explorer to mount the class locker as drive K:. You will save your own work in your personal Athena locker (which is automatically attached as drive I: when you log into the lab machines). The class exercises can also be done on your own laptop or desktop computer (both on and off campus) if your machine is properly configured (with ArcGIS, Oracle Client tools, AFS, Kerberos, and - if off campus - a virtual private network connection). All this software is free for MIT students but the installation can be tricky. MIT also supports a Citrix server that lets you run some of the exercises on the server from your latop or desktop even if the GIS and RDBMS software is not installed on your machine. We will provide instructions and limited support for such personal computer connections.

ARCGIS REVIEW

Running ArcGIS on CRON or WinAthena machines around campus can be painfully slow unless you take steps to insure that default files and datasets are *not* located on your network locker space. In general, we recommend that you copy all data, saved documents, and workspaces to a local hard drive before beginning lab exercises. (There will be exceptions later for large datasets stored in database engines on the net.) Rob Goodspeed (DUSP, PhD student) wrote a useful set of notes (for 11.205/11.520 last Fall) that explains how to setup your workstation to improve the ArcGIS performance. This document is available via Stellar on the class website: https://stellar.mit.edu/S/course/11/fa12/11.205/courseMaterial/topics/topic15/resource/11.520_Setup_Guide/11.520_Setup_Guide.pdf

and is also reproduced, for your convenience, in our class locker: 11.520_Setup_Guide.pdf

 

As a warmup for next week's ArcGIS exercises (in Lab #2), use Rob's notes to setup your ArcGIS environment and then copy this entire folder from our class locker to your local drive:

FROM this folder in class locker: Z:\athena.mit.edu/course/11/11.521/data/eboston05

TO this local drive: C:\temp

 

Within this local copy of the 'eboston05' folder is an ArcMap document, eboston05_lab2.mxd. Double click on this document to open ArcMap and load the East Boston parcels and building footprints. Your screen should look something like this:

East Boston startup map with web services

 

The ebos_parcels05 shapefile shows the boundaries of East Boston parcels as provided by the Boston Assessing Department for 2005. The ebos_buildings02 shapefile contains the building footprints for East Boston parcels as of 2002. Neither shapefile contains many attributes - mostly IDs that can be linked to other data tables. We will link the parcel shapefile to assessing data in next week's lab. But the building footprint shapefile does have estimates of the ground and roof heights of each building. These estimates were developed from LIDAR data that was funded by Homeland Security around 2002 and supported by local state agencies. MassGIS, the state GIS office, provides downloads of the building footprint layer with the height attributes from their public website:
http://www.mass.gov/anf/research-and-tech/it-serv-and-support/application-serv/office-of-geographic-information-massgis/

 

Now let's use ArcScene to visualize the East Boston buildings in 3D. Use Start/ArcGIS/ArcScene-10.1 to open the ArcScene window. Drag and Drop the ebos_buildings02 and ebos_parcels05 shapefiles from the table of contents in the ArcMap window to the table of contents in the ArcScene window. Make sure that the ebos_buildings02 layer is 'on top of' the parcel layer. Now right-click the ebos_buildings02 layer and choose 'properties.' We can click the 'Extrusion' tab and tell ArcScene to use the roof heights to extrude the building footprints up to the roof height. Since the ground height varies across East Boston, we should specify the extrusion to be [ROOF] - [GND], that is the difference between the roof and ground heights. Click OK after setting this extrusion expression and play with the visualization window to get used to the interface for 3D interaction. Often, users will want to exaggerate the height to facilitate visualization. You may want to reset the extrusion to be 5*[ROOF] - [GND] in order to get more height variation. The result should look something like this:

East Boston building extrusions

 

This is all that we are going to do with ArcScene and the East Boston data today. We are a long way from estimating building and land values from assessing data and we may also want to use a surface model that does not assume 'flatland' for East Boston. Once we have done more analysis to estimate parcel and building values we can tag our data with those estimates or generate 'hot spot' envelopes that help visualize spatial patterns. For now, just take a screen shot of your ArcScene window and submit that as part of your results for this exercise.

 

In addition, for those who have not taken 11.188, 11.205 or 11.520, we require that you complete lab 1 of 11.188 (http://mit.edu/11.188/www12/labs/lab1) and hand in the final layout you create. You may also want to review lab2 of 11.188 - See: \afs\athena.mit.edu\course\11\11.188\www12\labs\lab2\lab2.html - which is the same location as: http://mit.edu/11.188/www12/labs/lab2

PART II: Database Queries using MS-Access, Excel, and ORACLE

To facilitate learning basic relational database concepts, we will utilize a set of seven small tables that encode landuse and ownership information about PARCELS, OWNERS, SALES, FIRES, PERMITS, TAX, and ZONING. This is a 'toy' database with only a few rows in each table but with a table format (schema) and values that are typical of parcel-related data that are maintained by municipalities and counties. An MS-Access version of this database contains the seven tables and is called 11.521_lab1.mdb. It is available in the class data locker as 11.521_lab1.mdb (using the older 2002-2006 format for MS-Access databases for increased compatibility). The location of the file in the AFS network file system is /afs/athena.mit.edu/course/11/11.521/data/11.521_lab1.mdb. You can find this database by navigating down the directory tree on Drive Z: to reach Z:\athena.mit.edu\course\11\11.521\data\11.521_lab1.mdb or down the shorter path if you have mounted the class locker as Drive K:. You can also download and then run a copy of this database by entering this web address into a browser: http://mit.edu/11.521/data/11.521_lab1.mdb

 

Here is an Entity-Relationship diagram of the seven tables (prepared in MS-Access) together with links to more detailed listings of the entries in the seven tables.

 

Relationships among the Seven Tables
The 'Toy' Parcel Database
parcels_schema
  • Each row of the PARCELS table contains the parcel identifiers, address, owner, landuse, and lot size of each parcel.
  • Each row of the OWNERS table lists the name and (mailing) address of owner. The owner number (ownernum) in the OWNERS table matches the owner number (onum) in the PARCELS table.
  • Each row of the SALES, TAX, FIRES, PERMITS, and ZONING tables list transaction information relevant to particular parcels.
  • The SALES, TAX, FIRES, AND ZONING tables are linked to the PARCELS table via the parcel ID (parcelid).
  • The PERMITS table is linked to the PARCELS table by the combination of two columns (labeled pid and wpb).
  • Follow these links for a listing of the rows in each table:

    FIRES | OWNERS | PARCELS | PERMITS | SALES | TAX | ZONING

     

     

    For our introductory database query exercises we will use versions of this same 'toy' parcel database stored in different formats: the MS-Access database just mentioned and an Oracle database engine running on a virtual machine supported by CRON and named greenline.mit.edu. To access the tables in Oracle, we will use so-called 'client/server' technology and a simple client-side query tool called SQL*Plus-- a free Oracle client that can connect to a 'backend' oracle database engine using standard TCP/IP protocols, a standard query language (SQL), and access control. Most of this is transparent to the user.

     

    Using MS-Access with the 'toy' parcel database

    Steps

    A. Login to a WinAthena PC using your Athena (MIT email) account. Make sure the domain is set to “ATHENA.MIT.EDU (Kerberos Realm)”.

     

    Option 1: (retrieve the database through the file system)

    B. Attach the class locker as Drive K (as explained above) or navigate the AFS folders via Drive Z to find the DATA sub-directory in the class locker.

    C. Copy the MS-Access database file to somewhere on your local drive that is writeable by you. (e.g., C:\TEMP). NOTE that, on WinAthena machines, the DESKTOP is NOT stored on a local drive but is redirected to your Athena locker (on Drive I:). For the small 'toy' parcel dataset, it will not matter whether you use a truly local drive or your Athena locker. However, when we work with larger datasets and ArcGIS tools, having your data and scratch space on network storage will slow down the processing and greatly decrease the reliability of the operations.

    Option 2: (download the database form a browser)

    B. Enter this web address into a browser: http://mit.edu/11.521/data/11.521_lab1.mdb

    C. Save the downloaded file to somewhere on your local drive that is writeable by you. (e.g., C:\USERTEMP). NOTE that, on WinAthena machines, the DESKTOP is NOT stored on a local drive but is redirected to your Athena locker (on Drive I:).

    B. Double-click on your local copy of 11.521_lab1.mdb to open this database file in MS-Access

    Once MS-Access opens the database, you may see a security warning: "Certain content in the database has been disabled." This warning is because the database contains saved queries that could conceivably be a malicious program. Click 'options' and choose 'enable this content.'

     

    You will see listed in the table of contents each of the 7 tables listed in the above diagram. Double-click on the tables to take a look. You will also find the entity-relationship diagram via the Tools/Relationships menu.

     

    Highlight the 'Queries' choice under 'objects' in the main table-of-contents window and notice the two saved queries: q_midsize_parcels and q_parcels_with_fires. If the queries are not listed, left-click the 'Tables' heading and choose 'all access objects." The first query 'q_midsize_parcels' uses the parcels table to find all parcels whose square footage is between 10,000 and 100,000 square feet. The second query 'q_parcels_with_fires' joins the parcels table to the fires table (using PARCELID) and lists all the parcels that have had a fire along with the fire date and estimated loss.

     

    Double-click on the first query to see the results. Then, choose View/Design to see the GUI for building and editing the query. Also, take a look at the SQL view to see the more-or-less standardized text that describes the query using Structured Query Language syntax. Do the same for the other query. SQL is the lingua franca of client-server communication with online database engines and sits under the covers of database applications including most web services and online geoprocessing.

     

    Modify the first query to show all parcels with less than 100,000 square feet (not just those bigger than 10,000 or with missing data). Modify the second query to omit the pid, wpb, add1, and add2 fields and to omit parcels whose landuse is equal to 'R3'.

     

    Cut and paste the tables produced by your modified queries into an editor of your choice. Label them appropriately, add your name and them save this text document. Turn it in as part of your Lab #1 exercise.

     

    Using Excel with the 'toy' parcel database

    You can access databases saved in MS-Access databases (or in Oracle and other relational database management systems) from many desktop tools such as Excel.

     

    Before trying to access the data from Excel, make a second local copy of 11.521_lab1.mdb and rename it to lab1.mdb (since Excel will choke on MS-Access database names with more than 8 characters!).

     

    Open Excel and choose Data/Import-External-Data/New-Database-Query and select MS-Access Database when the dialog box opens up. Navigate to your local copy of lab1.mdb and choose all the columns from the parcels data base. Beware that the MS-Query tools used by Excel to interface with the database is fragile. Not only will it break if the MS-Access database name is too long, but it will choke on the path name if it contains special characters such as the period '.' in 11.521! When you try to open lab1.mdb, you may see a warning message, "Microsoft Office has identified a potential security ocncern." This is because a malicious program might conceivably be disguised as a MS-Access database. You will need to click 'enable' to get past this message and then select the MS-Access tables or queries that you wish to import into MS-Excel. Select the 'parcels' table.

     

    Use the business graphics tools of Excel to plot a bar graph showing the square feet of each parcel in the parcels table. Label the graph appropriately and include a jpeg or png copy of the graph in what you turn in for lab1. (A PDF file of your exercise results is also okay.)

     

    Excel (and MS-Access and ArcGIS) can also access data directly from our Oracle server (and most other relational database servers such as SQL Server, Postgres, MySql, etc.). We will use this capability in later exercises.

     

    Using SQL*Plus to run queries on Oracle tables

    We do not need to access the 'toy' dataset from Oracle as part of today's lab exercise. However, the same seven tables are stored in an Oracle database engine online and we will use this database engine (with much larger tables) later in the semester. Here are the steps needed to connect to our Oracle database engine using SQL*Plus from the WinAthena computers in our lab (Room 37-312) or from other CRON machines running Windows (or from any other device connected to the internet if you have the needed access permission and local software.) SQL*Plus is a barebones, command-line interface for querying data stored on Oracle database servers. It is provided for free by Oracle. There are many other client applications that provide more elaborate SQL-based graphical interfaces to relational database engines, including SQuirreL, an open source, java-based universal SQL client that is available at http://www.squirrelsql.org and can run on Macs. However, to get comfortable with the basics of SQL queries and relational thinking, we will start with SQL*Plus.

     

    Since most of you are unlikely to be familiar with SQL*Plus, we spell out some of the tips and tricks for using the application and connecting it to Oracle.

    Steps

    (A) After you are logged in to one of the lab machines, run the SQL*Plus application from the lab machines via: Start/Programs/Oracle/SQL Plus

     

     

    (B) Fill in the user name and password of an authorized Oracle account.During the first class, you are going to use the public account—user name “parcels” and host string = "CRL". The password will be given by the instructor in class. During the first lab, we will collect your MIT information in order to create a personal Oracle account for you. In subsequent labs you will use your own Oracle account to access the class databases.

     

    NOTE: On Windows-7 machines, the SQL*Plus application may not open this "LogOn" window. Instead, SQL*Plus will open a text-based "SQL Plus" window with a message something like this:

    SQL*Plus: Release 11.2.0.1.0 Production on Tue Feb 7 13:58:15 2012
    Copyright (c) 1982, 2010, Oracle. All rights reserved.
    Enter user-name:

    In this case, enter "parcels@CRL" as your user name. When prompted, enter your password and then you should become connected with the "SQL>" prompt as indicated below.

     

    (C) The interface of Oracle SQL Plus looks like the following (but with a later version of Oracle 11g.). “SQL>” is the prompt in Oracle SQL Plus which indicates that it is ready for Oracle SQL Plus to accept your commands.

     

    Running Basic SQL Select Statement Queries

    At this point, you have all the connections and windows in place and you are ready to begin doing SQL queries. Now let’s use our small, hypothetical parcel and property tax database to get a feel for how we can use SQL*Plus to run simple queries.  Elsewhere on the class help pages, we describe this sample parcel database and the basic structure of the SELECT command.  Type in some simple queries, or you can use cut-and-paste to grab pieces of queries shown in the table below. You will probably find it helpful to edit the queries in a separate window with a text editor such as 'Notepad'. When you are finished editing a query, you can run it by cutting and pasting the text into the SQL window. Remember to end each SQL statement with a semi-colon ';' before pressing the 'Enter' key. Try out each of the following SQL examples of SELECT statements - Do you understand what each query is doing? Does the SQL syntax make sense? [In addition to the select...from...where structure, note the use of wildcards and the treatment of missing values, arithmetics expressions, aggregate functions such as min(tax) and count(*) ]

     

    Query

    Tasks

    SQL Statements

    1

    The simplest query listing the TAX table

    SELECT * FROM tax;

    2

    A query with an expression and a column alias

    SELECT parcelid, (landval + bldval) AS tot_val, tax
      FROM tax;

    3

    A simple query that aggregates (groups) over a whole table

    SELECT COUNT(*), MIN(tax), MAX(tax), AVG(tax)
      FROM tax;

    4

    This example uses a literal string in the SELECT list

    SELECT parcelid, 'Total prop value is', (landval + bldval), tax
      FROM tax;

    5

    A simple query using the "IS NULL" syntax to list only those parcels with at least 20,000 sq. ft. and no missing value

    SELECT *
      FROM parcels
     WHERE sqft >= 20000 OR sqft IS NULL;

    6

    Compare the results of the query 5 with query 6. Why are they different?

    SELECT *
      FROM parcels
     WHERE sqft >= 20000 OR sqft = NULL;

    7

    A query to find values within a certain range

    SELECT *
      FROM parcels
     WHERE sqft >= 10000 AND sqft <= 100000;

    8

    Another way of writing the query above using the
    -- "BETWEEN" keyword

    SELECT *
      FROM parcels
     WHERE sqft BETWEEN 10000 AND 100000;

    9

    A simple join between PARCELS and FIRES. Note that only those parcels which had fires are listed.

    SELECT p.*, f.estloss
      FROM parcels p, fires f
     WHERE p.parcelid = f.parcelid;

    10

    A slight variation of query 9.

      SELECT p.parcelid, p.sqft, p.landuse, f.estloss
        FROM parcels p, fires f
     WHERE p.parcelid = f.parcelid
    ORDER BY f.estloss;


     

    TUNING SQL*PLUS AND SPOOLING OUTPUT TO A FILE

    The end of the 'SQL Help' web page contains some useful tips and tricks to customize your SQL*Plus environment, to save and retrieve SQL queries, and to spool SQL output tables into text files. Review these tips and tricks paying particular attention to

    1.            the use of set linesize nnn to set the line width to nnn characters

    2.            the use of COLUMN sss FORMAT A5 and the like to control formatting of column output, and

    3.            the use of the spool file command to start and stop the spooling of SQL*Plus output into whatever text path+file you specify.

    Regarding the spool command, if you do not specify a full path name for the file, it will be written in the default directory -- that is, your H drive. Try out the spool command by spooling to a path+filename, running one or two SQL queries, turning off the spooling, and then opening up the spooled file in your text editor window. [Note that the spooled copy of your output is not written to disk until after you turn spooling off.]

    PRINTOUT

    We cannot save or print the SQL statement and the query result directly from Oracle SQL Plus. There are two alternative ways to save or print them out.

    1)            Spool (as discussed above) the SQL statements to a "*.TXT" or “*.LST” file, open it in a text editor such as Notepad, then print from Notepad..

    2)           Use the mouse to copy your SQL statements and paste them into a text editor such as Notepad, then print from Notepad..

     

    'Print' out the SQL statements and the query results for Query 3, 7 and 10 in the above table of queries. Actually, there is no need to generate printed hardcopy. All the exercises will be submitted electronically to the class homework locker on Stellar. CRON machines have a 'printer' named 'Adobe PDF' that, instead of printing a document, saves it in PDF format within a folder of your choosing. All Athena clusters have access to physical printers via a system called 'Pharos' that controls the routing of your printout. See this website: http://kb.mit.edu/confluence/display/istcontrib/Pharos+Public+Student+Printing+at+MIT for further information about how to print from Athena machines. See this website: http://cronlasso.mit.edu/cron/p.lasso?t=4:2:0 for further information about printing from CRON machines. Basically, after 'printing' you must release a print job for printing on a particular machine.

     

    Next door to the 37-312 lab in Room 37-324 there are three printers including a good quality HP color laserjet printer that can print 11x17 inch pages. But, since it is free, it is often overused and often not working. In any event, there is no need to print anything today. There are also two printers available around the corner from the 9-251 lab: a grayscale printer named B9-235, and a color printer named C9-235-LTR. CRON also provides large format plotting capabilities.

     

    Your file(s) containing your lab exercise #1 can be uploaded to Stellar later this week when we have provided access to you to the class homework turn in area on Stellar.

     

    PART IV: WHAT TO TURN IN

    This first lab exercise is intended to acquaint you with the computing environment used for the class and with the tools available to move data among various desktop applications and database engines. Don't worry if you are not familiar enough at the outset the various tools and the steps needed to make the connections. We will discuss them a bit in today's lab and again in subsequent lectures and labs.

     

    You should turn in PDF copies of the the three 'printouts' described in Part II above (one each from ArcScene, MS-Access, and Excel) and the map you created in the ArcGIS review exercises (if you haven’t already taken 11.520 or 11.188). Make sure that your name, MIT email address and the date are on the submission. If you wish to include the PDFs as part of a Word document that you submit as a single file, that is okay too. If you wish to submit homework using some other file format, please check with the Instructor regarding the particular format. The lab assignment should be uploaded to the 11.521 Stellar website (https://stellar.mit.edu/S/course/11/sp11/11.521) by the start of next week's lab exercise at 4pm Tuesday, February 12 2013. (Note, you will not be able to access the stellar site until a few days from now when you have been added to the class list.)

    NOTE:

    (1) due to the snowstorm delay, we have extended the due date for this lab until Thursday, Feb. 14, 2013, at 4 pm.

    (2) For further information about the usage of the 37-312 cluster, please reference the following link: http://ist.mit.edu/services/athena/clusters


     Home | Syllabus | Lectures | Labs | CRN | MIT
     last modified: 10 Feb 2013 (jf)