Massachusetts Institute of Technology
Department of Urban Studies and Planning


11.521: Spatial Database Management and Advanced Geographic Information Systems

11.523: Fundamentals of Spatial Database Management

Lab 1: Introduction to Oracle, SQL, and ArcGIS 9.3

2 February 2010


This lab exercise has two primary purposes: (1) to acquaint you with the computing environment and basic GIS and relational database tools that we will be using this semester, and (2) to introduce you to relational joins and database queries using the structured query language (SQL).  For the SQL queries, we will use a set of seven, small tables that are similar in structure and content to the types of information about property ownership and landuse that are maintained by most local assessing and planning offices.  We will use MS-Access, Excel, and an Oracle query tools to examine a small 'toy' database of illustrative parcel, zoning, and tax tables. For the ArcGIS brushup, we will use the lab exercises from 11.520 (one of the prerequisites for this course).

Additional in-class lab notes: http://mit.edu/11.521/www/11.521_day1_notes10.html

Part I: Athena Space & the 'toy' Parcel database

We are going to use our class and personal network lockers within the 'Athena' space on the andrew network file system (AFS). All data we will use in the first half of the class (that is, pre-project) will be stored in /afs/athena.mit.edu/course/11/11.521/data. From WinAthena PCs on campus, this class data locker can be accessed via drive Z: For our convenience, we will also attach the class locker as drive K: and we'll save our own work in our personal Athena locker attached to drive I:. The class exercises can also be done on your own laptop or desktop computer (both on and off campus) if your machine is properly configured (with ArcGIS, Oracle Client tools, AFS, Kerberos, and - if off campus - a virtual private network connection). All this software is free for MIT students but the installation can be tricky. MIT also supports a Citrix server that lets you run some of the exercises on the server from your latop or desktop even if the GIS and RDBMS software is not installed on your machine. We will provide instructions and limited support for such personal computer connections.

 

To facilitate learning basic relational database concepts, we will utilize a set of seven small tables that encode landuse and ownership information about PARCELS, OWNERS, SALES, FIRES, PERMITS, TAX, and ZONING. This is a 'toy' database with only a few rows in each table but with a table format (schema) and values that are typical of parcel-related data that are maintained by municipalities and counties. Here is an Entity-Relationship diagram of the seven tables (prepared in MS-Access) together with links to more detailed listings of the entries in the seven tables. The MS-Access database contains the seven tables and is called 11.521_lab1.mdb. It is available in : /afs/athena.mit.edu/course/11/11.521/data/11.521_lab1.mdb

You can find this database by navigating down the directory tree on Drive Z: or down the shorter path if you have mounted the class locker as Drive K:. To mount the class locker as drive K:\ on the lab computers, open a DOS command window (Start/Run + 'cmd') and type the command: attach -Dk 11.521

 

Relationships among the Seven Tables
The 'Toy' Parcel Database
parcels_schema
  • Each row of the PARCELS table contains the parcel identifiers, address, owner, landuse, and lot size of each parcel.
  • Each row of the OWNERS table lists the name and (mailing) address of owner. The owner number (ownernum) in the OWNERS table matches the owner number (onum) in the PARCELS table.
  • Each row of the SALES, TAX, FIRES, PERMITS, and ZONING tables list transaction information relevant to particular parcels.
  • The SALES, TAX, FIRES, AND ZONING tables are linked to the PARCELS table via the parcel ID (parcelid).
  • The PERMITS table is linked to the PARCELS table by the combination of two columns (labeled pid and wpb).
  • Follow these links for a listing of the rows in each table:

    FIRES | OWNERS | PARCELS | PERMITS | SALES | TAX | ZONING

     

    PART II: Database Queries using MS-Access, Excel, and ORACLE 10g with SQLPlus 9i

    For our SQL exercises we will use databases stored in an Oracle database engine running on a CRL server, named bulfinch.mit.edu. We will use so-called 'client/server' technology to query these databases via MITnet using ArcGIS, Excel, MS-Access, and SQL*Plus-- an Oracle client that connects to the 'backend' database engine using Oracle's Net9 with standard TCP/IP protocols. Most of this is transparent to the user.

     

    Using MS-Access with the 'toy' parcel database

    Steps

    A. Login to a WinAthena PC using your Athena (MIT email) account. Make sure the domain is set to “ATHENA.MIT.EDU (Kerberos Realm)”.

    B. Attach the class locker as Drive K (as explained above) or navigate the AFS folders via Drive Z to find the DATA sub-directory in the class locker.

    C. Copy the MS-Access database file to somewhere on your local drive that is writeable by you. (e.g., C:\USERTEMP). NOTE that, on WinAthena machines, the DESKTOP is NOT stored on a local drive but is redirected to your Athena locker (on Drive I:).

    C. Double-click on your local copy of 11.521_lab1.mdb to open this database file in MS-Access

     

    You will find each of the 7 tables listed in the above diagram. Double-click on the tables to take a look. You will also find the entity-relationship diagram via the Tools/Relationships menu.

     

    Highlight the 'Queries' choice under 'objects' in the main window and notice the two saved queries: q_midsize_parcels and q_parcels_with_fires. The first query uses the parcels table to find all parcels whose square footage is between 10,000 and 100,000 square feet. The second query joins the parcels table to the fires table (using PARCELID) and lists all the parcels that have had a fire along with the fire date and estimated loss.

     

    Double-click on the first query to see the results. Then, choose View/Design to see the GUI for building and editing the query. Also, take a look at the SQL view to see the more-or-less standardized text that describes the query using Structured Query Language syntax. Do the same for the other query. SQL is the lingua franca of client-server communication with online database engines and sits under the covers of database applications including most web services and online geoprocessing.

     

    Modify the first query to show all parcels with less than 100,000 square feet (not just those bigger than 10,000 or with missing data). Modify the second query to omit the pid, wpb, add1, and add2 fields and to omit parcels whose landuse is equal to 'R3'.

     

    Cut and paste the tables produced by your modified queries into an editor of your choice. Label them appropriately, and your name and print them out. Turn in this prinout as part of your Lab #1 exercise.

     

    Using Excel with the 'toy' parcel database

    You can access databases saved in MS-Access databases (or in Oracle and other relational database management systems) from many desktop tools such as Excel.

     

    Before trying to access the data from Excel, make a second local copy of 11.521_lab1.mdb and rename it to lab1.mdb (since Excel will choke on MS-Access database names with more than 8 characters!).

     

    Open Excel and choose Data/Import-External-Data/New-Database-Query and select MS-Access Database when the dialog box opens up. Navigate to your local copy of lab1.mdb and choose all the columns from the parcels data base. Beware that the MS-Query tools used by Excel to interface with the database is fragile. Not only will it break if the MS-Access database name is too long, but it will choke on the path name if it contains special characters such as the period '.' in 11.521!

     

    Use the business graphics tools of Excel to plot a bar graph showing the square feet of each parcel in the parcels table. Label graph appropriately and include a printout of the graph in what you turn in for lab1.

     

    Excel (and MS-Access and ArcGIS) can also access data directly from our Oracle server (and most other relational database servers such as SQL Server, Postgres, MySql, etc.). We will use this capability in later exercises.

     

    Using SQL*Plus to run queries on Oracle tables

    Here are the steps needed to connect to our Oracle database engine using SQL*Plus from the WinAthena computers in our lab (Room 37-312) or from other CRON machines running Windows. SQL*Plus is a barebones, command-line interface for querying data stored on Oracle database servers. It is provided for free by Oracle. There are many other client applications that provide more elaborate SQL-based graphical interfaces to relational database engines, including SQuirreL, an open source, java-based universal SQL client that is available at http://www.squirrelsql.org and can run on Macs. However, to get comfortable with the basics of SQL queries and relational thinking, we will start with SQL*Plus.

     

    Since most of you are unlikely to be familiar with SQL*Plus, we spell out some of the tips and tricks for using the application and connecting it to Oracle.

    Steps

    A. Login to a WinAthena PC using your Athena (MIT email) account. Make sure the domain is set to “ATHENA.MIT.EDU (Kerberos Realm)”.

    B. Start/Programs/Oracle/SQL Plus

     

     

    C. Fill in the user name and password of your CRL Oracle account. Use CRL as the Host String. Then press OK. During the first class, you are going to use the public account—user name “parcels”, and the password will be given by the instructor in class. Then we will collect your information and create a CRL Oracle account for you. In the following labs you are going to use your own CRL Oracle account.

     

    D. The interface of Oracle SQL Plus looks like the following. “SQL>” is the prompt in Oracle SQL Plus which indicates that it is ready for Oracle SQL Plus to accept your commands.

     

    Running Basic SQL Select Statement Queries

    At this point, you have all the connections and windows in place and you are ready to begin doing SQL queries. Now let’s use our small, hypothetical parcel and property tax database to get a feel for how we can use SQL*Plus to run simple queries.  Elsewhere on the class help pages, we describe this sample parcel database and the basic structure of the SELECT command.  Type in some simple queries, or you can use cut-and-paste to grab pieces of queries from any text editors. You will probably find it helpful to edit the queries in a separate window with a text editor such as 'Notepad'. When you are finished editing a query, you can run it by cutting and pasting the text into the SQL window. Remember to end each SQL statement with a semi-colon ';' before pressing the 'Enter' key. Try out each of the following SQL examples of SELECT statements - Do you understand what each query is doing? Does the SQL syntax make sense? [In addition to the select...from...where structure, note the use of wildcards and the treatment of missing values, arithmetics expressions, aggregate functions such as min(tax) and count(*) ]

     

    Query

    Tasks

    SQL Statements

    1

    The simplest query listing the TAX table

    SELECT * FROM tax;

    2

    A query with an expression and a column alias

    SELECT parcelid, (landval + bldval) AS tot_val, tax
      FROM tax;

    3

    A simple query that aggregates (groups) over a whole table

    SELECT COUNT(*), MIN(tax), MAX(tax), AVG(tax)
      FROM tax;

    4

    This example uses a literal string in the SELECT list

    SELECT parcelid, 'Total prop value is', (landval + bldval), tax
      FROM tax;

    5

    A simple query using the "IS NULL" syntax to list only those parcels with at least 20,000 sq. ft. and no missing value

    SELECT *
      FROM parcels
     WHERE sqft >= 20000 OR sqft IS NULL;

    6

    Compare the results of the query 5 with query 6. Why are they different?

    SELECT *
      FROM parcels
     WHERE sqft >= 20000 OR sqft = NULL;

    7

    A query to find values within a certain range

    SELECT *
      FROM parcels
     WHERE sqft >= 10000 AND sqft <= 100000;

    8

    Another way of writing the query above using the
    -- "BETWEEN" keyword

    SELECT *
      FROM parcels
     WHERE sqft BETWEEN 10000 AND 100000;

    9

    A simple join between PARCELS and FIRES. Note that only those parcels which had fires are listed.

    SELECT p.*, f.estloss
      FROM parcels p, fires f
     WHERE p.parcelid = f.parcelid;

    10

    A slight variation of query 9.

      SELECT p.parcelid, p.sqft, p.landuse, f.estloss
        FROM parcels p, fires f
     WHERE p.parcelid = f.parcelid
    ORDER BY f.estloss;


     

    TUNING SQL*PLUS AND SPOOLING OUTPUT TO A FILE

    The end of the 'SQL Help' web page contains some useful tips and tricks to customize your SQL*Plus environment, to save and retrieve SQL queries, and to spool SQL output tables into text files. Review these tips and tricks paying particular attention to

    1.            the use of set linesize nnn to set the line width to nnn characters

    2.            the use of COLUMN sss FORMAT A5 and the like to control formatting of column output, and

    3.            the use of the spool file command to start and stop the spooling of SQL*Plus output to into whatever text path+file you specify.

    Regarding the spool command, if you do not specify a full path name for the file, it will be written in the default directory -- that is, your H drive. Try out the spool command by spooling to a file, running one or two SQL queries, turning off the spooling, and then opening up the spooled file in your text editor window.

    PRINTOUT

    We cannot print the SQL statement and the query result directly from Oracle SQL Plus. There are two alternative ways to print them out.

    1)            Spool the SQL statements to a "*.TXT" or “*.LST” file, open it in a text editor such as Notepad, then print from Notepad..

    2)           Use the mouse to copy your SQL statements and paste them into a text editor such as Notepad, then print from Notepad..

     

    Print out the SQL statements and the query results for Query 3, 7 and 10 in the above table of queries.

     

    The grayscale printer in (or next door to) 37-312 is named “acantha” and the color printer is "echo". The color printer is a good quality HP color laserjet printer that can print 11x17 inch pages. But, since it is free, it is often overused and not working.

    PART III: ARCGIS REVIEW

    In order to refresh your understanding of ArcGIS, you should review the GIS-related labs in 11.520. For those who have not taken 11.188 or 11.520, we require that you complete lab 1 of 11.520 (http://mit.edu/11.520/www/labs/lab1) and hand in the final layout you create. You may also want to review lab2 of 11.520 - See: \afs\athena.mit.edu\course\11\11.520\www\labs\lab2\lab2.html - which is the same location as: http://mit.edu/11.520/www/labs/lab2

    PART IV: WHAT TO TURN IN

    This first lab exercise is intended to acquaint you with the computing environment used for the class and with the tools available to move data among various desktop applications and database engines. Don't worry if you are not familiar enough at the outset the various tools and the steps needed to make the connections. We will discuss them a bit in today's lab and again in subsequent lectures and labs.

     

    You should turn in the three printouts described in Part II above (one each from MS-Access, Excel, and Oracle) and the map you created in Part III (if you haven’t already taken 11.520 or 11.188). Make sure that your name, MIT email address and the date are on the printout. The lab assignment is due at the end of the class on Tuesday, February 9, 2009.

    Notes: For further information about the usage of the 37-312 cluster, pleaser reference the following link : http://web.mit.edu/windows/cluster/

     


     Home | Syllabus | Lectures | Labs | CRN | MIT
     last modified: 2 Feb 2010 (jf)