Massachusetts Institute of Technology - Department of Urban Studies and Planning


11.521
Spatial Database Management and Advanced Geographic Information Systems
11.523
Fundamentals of Spatial Database Management

In-Lab Exam - 21 March 2017 


Instructions

This is an open-book, open-note examination. You are free to use the web (especially the class web pages and online SQL documentation) to help you. Like any examination, however, we expect your submission to reflect your work exclusively. Hence any dialogue via any means (e.g., voice, paper, e-mail, IM, texting) with anyone other than the class instructors is prohibited.

The exam starts at 4:05 PM in the Room W31-301 Computing Lab. It is intended to be finished by 6:00 PM but you can continue until 7:00 PM so that there is less time pressure. All the tables needed for the exam questions are accessible in Postgres using your personal account that we have been using for lab exercises and homework. These tables are also available in a personal geodatabase that can be opened using MS-Access (as well as ArcMap).

The exam is worth a total of 100 points.

An effective way for you to prepare your exam answers is to use the same familiar methods you have used for the lab assignments and homework. Create an ASCII text or HTML or Word file in your Athena locker to contain your answers. Name your exam username_exam.txt if you use text format or username_exam.html if you use HTML format. An MS-Word or PDF formatted document is also okay. For your own security, we suggest that you save the final draft of your work in the "Private" subdirectory of your locker (i.e., the user smith might create the text file on her I:\ drive from a WinAthena PC and name it I:\Private\smith_exam.txt). We strongly recommend that you retain the file containing your SQL queries, and final answers and work products until we return the graded exams to you.

Upload your exam to the homework section on Stellar: https://stellar.mit.edu/S/course/11/sp17/11.521/homework/

Finally, don't spend all your time on one or two questions. Start by looking over all the questions to get a general sense of what, and how much, is being asked, and the grade points associated with each question. Then start work on your answers but move on to the next question if you've spent more than 10 minutes on any one of them.

Good luck!


Data Preparation

Before beginning your work, copy the 'exam17_data' folder from the 'data' folder in the class locker into a local drive such as C:\TEMP. All the data that you will need for the exam can be found either in your local copy of this folder or in the Postgres database. Unless specified otherwise, all tables referenced in this document are the names of tables in the Postgres database, exam17, on the server that we have used for the class. All the queries required for the test can be done either in Postgres or in MS-Access but a few are easier in Postgres and the examples provided are tested in Postgres.

The datasets for this exam include newly released 2016 Boston data that are available on the city's open data portal: http://data.boston.gov We have downloaded datasets related to Boston parcels, assessed values, and building code violations and we have extracted data about East Boston for use in this test. All the information needed for the exam is provided in the 'exam17_data' folder containing the following:

Type of Data File or Table Name Description
ArcMap doc ./11.521_exam17_start.mxd ArcMap document, similar to the one we have been using since lab #2, with several shapefiles and web servers pre-loaded that will be a helpful starting point for the test.
Shapefile ./shapefiles/ebos_buildings02

The familiar East Boston shapefile of building footprints from 2002 that we have used in exercises. The coordinates are stored on disk in Mass State Plane (mainland) NAD83 (feet) coordinates.

Shapefile ./shapefiles/ebos_parcels05 The familiar East Boston shapefile of parcel boundaries from 2005. The coordinates are stored on disk in Mass State Plane (mainland) NAD83 (feet) coordinates.
Shapefile ./shapefiles/ebos_parcels16 A new shapefile of East Boston parcel boundaries as of fiscal year 2016. It is similar to the familiar ebos_parcels05 shapefile but stores boundaries on disk in lat-lon coordinates based on WGS84. )Since the shapefile includes projection information, ArcMap converts it on the fly to the Data Frame coordinate system for viewing.)
Shapefile ./shapefiles/ma_towns00 The familiar boundary shapefile of Massachusetts cities and towns that we have used in lab exercises. The coordinates are stored on disk in Mass State Plane (mainland) NAD83 (meter) coordinates.
Postgres Table
eboston.bos05eb
The familiar 2005 assessing records for East Boston that we have been using in lab exercises.
Postgres Table
eboston.bos16eb
The 2016 version of the same type of assessing records for East Boston that we have been using from bos05eb.
Postgres Table
eboston.bad16pids
A list of PID values in the bos16eb assessing tables that are considered 'bad' because they are repeated in more than one record even though the PID field is supposed to be unique in the assessing records.
Postgres Table
eboston.mass_landuse
The same lookup table of Massachusetts land use codes that we have used in lab exercises. Match 'ptype' in assessing tables with 'stclass_txt' in mass_landuse since both are stored as text.
Postgres Table
eboston.matown2000
Attribute table from the MATOWNS00 shapefile of Massachusetts city and town boundaries that we have used in lab exercises.
Postgres Table
eboston.
ebos_parcels05_wgs84
Attribute table from the EBOS_PARCELS05 shapefile together with a geometry column (that you will not need) that stores boundaries in WGS84 coordinates.
Postgres Table
eboston.
ebos_parcels16_wgs84
Attribute table from the EBOS_PARCELS16 shapefile together with a geometry column (that you will not need) that stores boundaries in WGS84 coordinates.
MS-Access database
./gdb_access/
11.521_exam17.mdb
A Personal Geodatabase containing the seven Postgres tables listed above. Note that this personal geodatabase is just an MS-Access database with some extra 'GDB_*' tables that are not needed for this exam.

 


Part I: Writing SQL Queries (35 points)

For Part I, we will use the bos16eb table that contains tax assessing data for real estate in East Boston. It is analagous to the bos05eb table with 2005 assessed valued that we have used for class exercises. A few columns are renamed or added but we will let you know ifthey matter for any questions that we ask in the test.

Let us start by examining the land use categories and property types that appear in bos16eb and compare them with our familiar bos05eb data. Both tables are available in and MS-Access. In Postgres, the bos16eb and bos05eb tables are in the eboston schema of the exam17 database. Make sure that your search path looks at the 'exam17' as well as 'public' schema or you will have to prefix all the Postgres tables names with 'exam17.'. For MS-Access, both tables are included in the 11.521_exam17.mdb database within the the exam17_data folder.

Show your SQL queries as part of your answers throughout the exam. Doing so will enable you. to obtain partial credit in the event your answer is not quite right.

Question I-1. (2 points each, 8 points total)

(I-1a): How many rows are there in bos16eb ___________? how many in bos05eb ________ ?

(I-1b): How many PID values are NULL in bos16eb ______? how many in bos05eb _______?

(I-1c): How many unique values of PID (the parcel identifier) are there in bos16eb ______? how many in bos05eb _________?

(I-1d): How many rows are there in the shapefiles for Boston parcels in 2016 and 2005:
in ebos_parcels16___________? and in ebos_parcels05 ____________?

Question I-2. (4 points each, 12 points total)

(I-2a): What is the most common property type (ptype) among all of the East Boston assessing records in bos16eb ___105____? How many records have this property type ___2142______?

(I-2b): What is the most common property type (ptype) among all of the East Boston assessing records in the 2005 table, bos05eb _______? How many records have this property type _______?

(I-2c): Write a SQL query that lists the property type code (ptype), the description (descript in mass_landuse), and the frequency of occurrence for the 2016 table, bos16eb. Be sure that you pay attention to data types when joining the tables.

Question I-3. (3 points each, 15 points total)

(I-3a): Let's examine which parcel identifiers (PID) exist in both the 2005 and 2016 tables. We try this query:

select count(*) as row_count, count(distinct a.pid) as pid_count
  from bos05eb a, eboston.bos16eb b
 where a.pid = b.pid;	 
row_count
pid_count
7351
7105

Explain briefly why the two number differ? How many parcel identifiers (PID) exist in both tables? _________?

(I-3b): How many parcel identifiers (PID) exist in bos16eb but not in bos05eb ________?

(I-3c): How many parcel identifiers (PID) exist in bos05eb but not in bos16eb ___________?

Among those parcel identifiers (PID) that exist in bos16eb but not in bos05eb, let us examine their property types (ptype).

(I-3d): What are the different property types (ptype) for the records with those parcel identifier records (PID) that exist in bos16eb but not in bos05eb?

I-3e): Write a SQL query that lists the frequency of occurrence of each property type (ptype) among those parcel identifier records (PID) that exist in bos16eb but not in bos05eb? Show the table of results for all property types that are associated with at least 3 assessment records. Sort the table by the record count in decreasing order.



Part II: Short Answer and 'Fix SQL' Questions (22 points)

Question II-1. (7 points each, 14 points total)

(II-1a): The following query is intended to list the Massachusetts municipalities along with their 1980-1990 change in population for those municipalities that have more than ten polygons associated with them in the shapefile. However, the query does not work properly. Briefly explain why and show the corrected SQL query. (Note: the Postgres table matown2000 is the attribute table associated with the ma_towns00.shp shapefile that we have used in lab exercises.)

(II-1b): Consider the following query:

select count(*), count(pop80), count(distinct town)
  from eboston.matown2000;

The query returns the counts: 631, 631, and 351 respectively. Explain briefly why all four numbers are not the same and what each number means.

Question II-2. (8 points)

(II-2): Explain briefly what is meant by the phrase 'many-to-many relationship.' How are many-to-many relationships typically handled when designing relational databases? Illustrate your points using one of the database schemas that we have considered in class lectures and exercises. (A verbal explanation is sufficient. No need for an ER diagram or table definitions.)



Part III: Thematic Mapping of SQL results (43 points - actually 45 so max=102)

In Part III, we will use the 2005 and 2016 parcel and assessment data to examine the nature and location of some of the real estate changes that have occurred during this interval. Before starting, however, we need to address some problems with the 2016 data. The parcel identifier, PID, in the bos16eb table is not unique. There are 91 different parcel identifiers (PID) that have more than one record in bos16eb and these instances account for 349 records. For your convenience we have provided a table, bad16pids, that lists PID, and the number of records, BADCOUNT, for each record that involves duplicate PID values.. The bos05eb table for 2005 assessment records has no duplicated PID values so there is no need for a 'badpids' table for 2005.

Question III-1. (7 points)

(III-1): Rewrite your SQL query from question I-3e so that it lists the frequency of occurrence of each property type (ptype) along with the text description of the property type for all parcel identifier records (PID) that are listed in bad16pids.

In lab exercises, we have used the shapefile, ebos_parcels05, to map ground parcels in East Boston. The corresponding shapefile for 2016 is called ebos_parcels16. In each case, the field, PID_LONG, in the shapefile identifies the ground parcel. The first 7 characters of PID_LONG can be joined to the first 7 characters of PID to match the assessing records to the corresponding ground parcel. (That is, just as in the earlier exercises, we can match the shapefile ground parcels to assessing records via: substr(ebos_parcels16.pid_long,1,7) = substr(bos16eb.pid,1,7) since the first 7 characters identify the ground parcel and the last 3 are either 000 or enumerate each unit of a condominium property).

For the 2005 shapefile ( ebos_parcels05), there are no duplicates of PID_LONG. However, in the 2016 shapefile (ebos_parcels16) one ground parcel (pid_long='0104126000' is duplicated. Upon inspection, we see that this record is for the large airport ground parcel including all the runways. However, there is also a second smaller parcel in the 2016 data with the same pid_long that is associated with a polygon that includes highway entrance and exit ramps near the entrance to the airport. For the purposes of this test, we can ignore this small ground parcel with a duplicated pid_long identifier. However, if you wanted to be sure to avoid having any duplicated ground parcels, you could exclude the one record for the entrance ramp parcel by omitting the parcel where pid_long='0104126000' and shapestlen < 2400).

Question III-2. (7 and 3 points, 10 total)

(III-2a): Draw a thematic map that highlights those ground parcels that have duplicate PID values in bos16eb. (That is, the parcels associated with the assessing records that have PID values included in bad16pids.) Use a color ramp to shade these parcels according to the number of bos16eb records that share the same ground parcel identifier. (That is: pid_long = substr(PID,1,7)||'000' ) You may want to zoom in on the more populated parts of East Boston (so some of the islands are excluded) in order for some of your smaller shaded parcels to be a little larger.

(III-2b): Comment briefly on any pattern that you see regarding the location of the bad16pids parcels and/or the frequency of duplication of PID.

Question III-3. ( 7 points)

(III-3): Create a view, xx_okparcels, that collects all the unique ground parcels (that is, the substr(pid,1,7) portion of PID) that appears in both bos16eb and bos05eb and are associated with at least one PID value that is not in bad16pids. Substitute your initials for 'xx' in xx_okparcels so you have a unique table name. Remember that your view needs to be in the 'public' schema rather than in the 'exam17' schema. (NOTE:This number of ground parcels is not the same as computed earlier in question I-1c for assessing records since that question referenced the full PID value and not just the first 7 characters.) How many ground parcel identifiers do you find in both tables: ________?

For all these ground parcels that appeared in both years (after excluding ground parcels that only contained PID values in the bad16pids table) let us determine how many ground parcels transitioned from non-condo in 2005 to condo in 2016. To determine these numbers, we can use the CM_ID column in the bos16eb and bos05eb assessing tables. The CM_ID value is NULL for non-condos and, for condos, is set equal to the ground parcel identifier.

Question III-4. ( 6 points each, 12 total)

(III-4a): Create a view, xx_noncondo05, that lists all the ground parcels from xx_okparcels that were not condos in 2005. Show your query and state the number of non-condo 2005 ground parcels that you find: __________.

(III-4b): How many of these non-condo 2005 ground parcels became condos by 2016? Create a view, xx_newcondo16 that selects these ground parcel identifiers and report both the number of 2016 ground parcels (from xx_okparcels) that became condos __________ and the total number of assessing records associated with these condo-converted ground parcels ___________.

Question III-5. ( 6 and 3 point, 9 total)

(III-5a): Create a thematic map showing those parcels that were converted to condos between 2005 and 2016. Shade each ground parcel based on the total number of assessing records that are associated with that ground parcel. (NOTE: Not all of the pid_long values in xx_newcondo16 will match to the polygons in ebos_parcels16. They should, but the shapefile and assessing records came from different sources and have not been fully reconciled. Do not worry about the xx_newcondo16 parcels that drop out of the match.)

(III-5b): Comment briefly on the number of parcels that converted to condos between 2005 and 2016 and any spatial pattern that you see in this thematic map of new condos.



Exam Turnin Checklist

Is your name and Athena ID at the top of your exam? If not, you'll lose a point!

Is your completed test saved in a plain text (ASCII) file named username_exam.txt or HTML (web page) file named username_exam.html, or MS-Word file name username_exam.doc, or PDF file named username_exam.pdf?

Did you upload your answers to Stellar: https://stellar.mit.edu/S/course/11/sp17/11.521/homework/

Did you confirm with the instructor that your exam was received?


Home | Syllabus | Lectures | Labs | CRN | 37-312 lab | MIT


Last modified: 19 March 2017 [jf]