Massachusetts Institute of Technology - Department of Urban Studies and Planning


11.521
Spatial Database Management and Advanced Geographic Information Systems
11.523
Fundamentals of Spatial Database Management

In-Lab Exam - 20 March 2018


Instructions

This is an open-book, open-note examination. You are free to use the web (especially the class web pages and online SQL documentation) to help you. Like any examination, however, we expect your submission to reflect your work exclusively. Hence any dialogue via any means (e.g., voice, paper, e-mail, IM, texting) with anyone other than the class instructors is prohibited.

The exam starts at 4:05 PM in the Room W31-301 Computing Lab. It is intended to be finished by 6:00 PM but you can continue until 7:00 PM so that there is less time pressure. All the tables needed for the exam questions are accessible in Postgres using your personal account that we have been using for lab exercises and homework. These tables are also available in a personal geodatabase that can be opened using MS-Access (as well as ArcMap).

The exam is worth a total of 100 points.

An effective way for you to prepare your exam answers is to use the same familiar methods you have used for the lab assignments and homework. Create a text, PDF or Word file in your Athena locker to contain your answers. Name your exam username_exam.txt (or *.pdf or *.docx).  For your own security, we suggest that you save the final draft of your work in the "Private" subdirectory of your locker (i.e., the user smith might create the text file on her I:\ drive from a WinAthena PC and name it I:\Private\smith_exam.txt). We strongly recommend that you retain the file containing your SQL queries, and final answers and work products until we return the graded exams to you.

Upload your exam to the homework section on Stellar: https://stellar.mit.edu/S/course/11/sp18/11.521/homework/

Finally, don't spend all your time on one or two questions. Start by looking over all the questions to get a general sense of what, and how much, is being asked, and the grade points associated with each question. Then start work on your answers but move on to the next question if you've spent more than 10 minutes on any one of them.

Good luck!


Data Preparation

Before beginning your work, copy the 'exam18_data' folder from the 'data' folder in the class locker into a local drive such as C:\TEMP. All the data that you will need for the exam can be found either in your local copy of this folder or in the Postgres database. Unless specified otherwise, all tables referenced in this document are the names of tables in the class521 Postgres database on the server that we have used for the class. All the queries required for the test can be done either in Postgres or in MS-Access but a few are easier in Postgres and the examples provided are tested in Postgres.

The datasets for this exam include newly released 2016 Boston data that are available on the city's open data portal: http://data.boston.gov We have downloaded datasets related to Boston parcels, assessed values, and we have extracted data about East Boston for use in this test.  These are the same 2016 Boston data that was used in last year's test, but the questions are different.  All the information needed for the exam is provided in the 'exam18_data' folder containing the following:

Type of Data File or Table Name Description
ArcMap doc ./11.521_exam18_start.mxd ArcMap document, similar to the one we have been using since lab #2, with several shapefiles and web servers pre-loaded that will be a helpful starting point for the test.  The Data Frame views layers in Mass State Plane (mainland) NAD83 (feet) coordinates.
Shapefile ./shapefiles/ebos_buildings02

The familiar East Boston shapefile of building footprints from 2002 that we have used in exercises. The coordinates are stored on disk in Mass State Plane (mainland) NAD83 (feet) coordinates. This layer may be useful for visualization but is not needed for the test.

Shapefile ./shapefiles/ebos_parcels96 The familiar East Boston shapefile of parcel boundaries from 1996. The coordinates are stored on disk in Mass State Plane (mainland) NAD83 (feet) coordinates.
Shapefile ./shapefiles/ebos_parcels16 A new shapefile of East Boston parcel boundaries as of fiscal year 2016. It is similar to the familiar ebos_parcels05 shapefile but stores boundaries on disk in lat-lon coordinates based on WGS84. )Since the shapefile includes projection information, ArcMap converts it on the fly to the Data Frame coordinate system for viewing.)
Shapefile ./shapefiles/ebos_blocks16w A new shapefile of East Boston blocks as of fiscal year 2016.  It is saved on disk in lat-lon coordinates based on WGS84 and converted on the fly to the Data Frame coordinate system for viewing. (Not needed to test but can be helpful for viewing.)
Shapefile ./shapefiles/ebos_blocks96w A new shapefile of East Boston blocks as of fiscal year 1996.  It is saved on disk in lat-lon coordinates based on WGS84 and converted on the fly to the Data Frame coordinate system for viewing. (Not needed to test but can be helpful for viewing.)
Shapefile ./shapefiles/ma_towns00 The familiar boundary shapefile of Massachusetts cities and towns that we have used in lab exercises. The coordinates are stored on disk in Mass State Plane (mainland) NAD83 (meter) coordinates.
Postgres Table boston96.parcel96 The familiar assessing records from 1996 for Boston parcels that we used in class exercises.
Postgres Table
boston96.sales
The familiar sales table of residential real estate transactions during the 1986-1995 period that we have used in class exercises. 
Postgres Table eboston.mass_landuse The same lookup table of Massachusetts land use codes that we have used in lab exercises. Match 'ptype' in assessing tables with 'stclass_txt' in mass_landuse since both are stored as text.
Postgres Table
boston16.bos16eb
The 2016 version of the same type of assessing records for East Boston that we have been using for 2005 data (bos05eb) in class exercises.
Postgres Table
boston16.bad16pids
A list of PID values in the bos16eb assessing tables that are considered 'bad' because they are repeated in more than one record even though the PID field is supposed to be unique in the assessing records. (You will not need this table for the test but it may be useful to help you sort out one-to-many relationships in the datasets.)
Postgres Table
boston16.
xref_parcel16_96
A new cross reference table associating 1996 and 2016 parcels (pid_long) and blocks (wpb) for 6552 East Boston parcels (in 2016).
MS-Access database
./gdb_access/
11.521_exam18.mdb
A Personal Geodatabase containing the six Postgres tables listed above, plus the attribute table (and indices) for 1996 and 2016 East Boston parcels and the boundaries of Massachusetts cities and towns. Note that this personal geodatabase is just an MS-Access database with some extra 'GDB_*' tables that are not needed for this exam.

 


Part I: Writing SQL Queries (39 points)

For Part I, we will use the bos16eb table that contains tax assessing data for real estate in East Boston. It is analogous to the bos05eb table with 2005 assessed values that we have used for class exercises. A few columns are renamed or added but we will let you know if they matter for any questions that we ask in the test.

Remember to prefix table names with schema names if the tables are not in the first schema of your search_path.  (e.g., boston16.bos16eb).   Show your SQL queries as part of your answers throughout the exam. Doing so will enable you. to obtain partial credit in the event your answer is not quite right.

Question I-1. (22 points total) Queries of bos16eb

(I-1a: 2 points): How many assessing records in bos16eb are for triple deckers (ptype='105') in East Boston? ____________

(I-1b: 6 points): For all these triple deckers, compute the average of their assessed values (av_total in US dollars) __________________, their total living area (living_area, measured in square feet) __________________, and their  assessed value per square foot _________________. 

(I-1c: 2 points): How many assessing records in bos16eb are for condos (ptype='102') in East Boston? ____________

(I-1d: 3 points): For all these condo units,  compute the sum of their assessed values (av_total in US dollars) ________________, their total living area (living_area is measured in square feet) __________________, and the average assessed value per square foot __________________. 

NOTE: Recall that, for non-condos, the 10-digit pid field in bos16eb identifies the ground parcel on which the triple decker is located.  The first 7 digits are the so-called base_id and the last three digits are 000 (except for a small number of parcels that we will avoid in this test).  You can grab the first seven digits with the substring function - e.g., substr(pid,1,7).  For condos, the second column (cm_id) of bos16eb records the ground parcel ID (comprising the base_id concatenated with three zeros '000'), and the first column (pid) records the unique condo unit ID (comprising the base_id concatenated with the three digit unit number).  There is also a separate record with further information about condo ground parcels but that record should have ptype='995' rather than 102 (and will not be needed).

(I-1e: 3 points): How many distinct ground parcels account for all the condo units that you identified in the previous question?  _____________.

(I-1f: 6 points): Count how many condo ground parcels have 1, 2, 3, etc. number of condo units on them. Order results by the number of units.

Question I-2. (17 points total): Queries of cross-reference table

We have built a cross-reference table that matches parcels in the 2016 assessing database with parcels in the 1996 assessing database.  The last two fields in the new bos16eb table contain the latitude and longitude of a parcel centroid for each assessing record.  We converted this to a point shapefile and then did a point-in-polygon operation to associate each 2016 parcel with the 1996 parcel (if any) that contained the point.  In those cases where the 2016 parcel centroid did not fall within a 1996 parcel, we computed the distance to the edge of the closest parcel.  (Note: this 'distance' is measured in decimal degrees since we did the computations using lat/lon coordinates.  Hence, the distance is distorted but quite sufficient for our purposes to distinguish close-by versus contained-within parcels.) The resulting cross-reference table is in the boston16 schema and is called xref_parcel16_96 with the following fields:

pid_long16 = 10-character 2016 parcel ID as text
wpb_2016 = 9 character 2016 block identifier as text (nn-nn-nnn)
pid_long96 = 10-character 1996 parcel ID as text
wpb_1996 = 9 character 1996 block identifier as text (nn-nn-nnn)
distance = 0 if 2016 centroid is inside of 1996 parcel, otherwise distance to closest 1996 parcel (in decimal degrees)


(I-2a: 2 points): How many 2016 parcels in xref_parcel16_96 are not inside any of the 1996 parcel boundaries? ____________

(I-2b: 3 points): There are 6552 rows in xref_parcel16_96 with no duplicates or null values for pid_long16.  How many unique 1996 parcels are matched with these 2016 parcels in the table?  ________________

(I-2c: 2 points): Some of the 1996 parcels are matched to more than one of the 2016 parcels.   How many unique 1996 parcels are matched more than once to a 2016 parcel?  ______________

(I-2d: 4 points): Write a query that lists all the 1996 parcels that are matched to more than one of the 2016 parcels along with a count of the number of 2016 parcel matches for each such 1996 parcel.

(I-2e: 3 points): How many distinct 1996 East Boston parcels are not matched to any 2016 parcel in boston16.xref_parcel16_96 ?  ___________
(Note: in boston96.parcel96shp the unique ground parcel ID is in parcel_i_1 and, for East Boston, the first two characters are '01'.)

(I-2f: 3 points): How many distinct 1996 East Boston parcels in xref_parcel16_96 are recorded as triple deckers (stateclass='105') in boston96.parcel96? _________________
(Note: for 1996, the assessing records that record property type are stored in boston96.parcel96.  The 10-character parcel ID is stored in parcel_id and the property type is stored in stateclass.)

Part II: Mapping the mismatched parcels (14 points)

Map the 2016 East Boston parcels and highlight those parcels that are *not* matched to any 1996 parcel (as determined in questions I-2c above).  Zoom in to the more dense, northwest part of East Boston and switch selection or toggle the highlighted 2016 parcels so you can see whether they tend to cluster together.  Turn on the 1996 parcel layer and play with the transparency of the 2016 parcel layer so you can see how they overlay and get a sense of the reasons why (a) some of the 2016 parcel centroids did not fall within a 1996 parcel, and (b) why some of the 1996 parcels were associated with more than one 2016 parcel. 

Question II-1. (10 points ):

Map #1: Turn in a map that displays the 2016 East Boston parcels and highlights those whose centroid is not within a 1996 parcel - that is, the 2016 parcels in xref_parcel16_96 where distance >0.  In addition, display only those 1996 parcels that are matched in xref_parcel16_96 to at least four 2016 parcels.  Adjust zoom level, shading, symbology, and transparencies so you can distinguish these 1996 parcel outlines from the 2016 parcels. 

Question II-2. (4 points ):

Explain briefly (a) any pattern that you see regarding the location of the 2016 parcels whose centroid is not within a 1996 parcel, and (b) why some of the 1996 parcel are associated with more than one 2016 parcel.

Part III: Concepts and Fixing SQL Queries  (10 points)

Question III-1. (3 points ):

What is the PRIMARY KEY for the Boston ZONING variance table (used in Lab #4)? ____._____________
Which field in ZONING is the FOREIGN KEY for the Boston NEIGHBRHD table (also used in Lab #4)?
____________________.

Explain briefly a few reasons why it is generally considered good practice to use integer codes linked to lookup
tables for attributes such as owner or town names rather than omit the lookup table and just insert the names in
the main table.

Question III-2. (3 points ):

Consider the following query:

select count(*), 
       count(distinct pid_long16), count(distinct wpb_2016), 
       count(distinct pid_long96), count(distinct wpb_1996)
  from boston16.xref_parcel16_96;
-- '6552','6552','194','6408','196'
Explain briefly what each of these number is counting.

Question III-3. (4 points ):

The following query using tables in the toy parcel database (toyparcels) is intended to list the owners who own more than one parcel together with a count of the number of parcels that they own.  However, the query is incorrect.  Briefly explain why and show your corrected query.

select ownernum, oname, count(parcelid)
  from owners n, parcels p
 where count(parcelid)>1
 group by ownernum, oname;

Part IV: Queries of 1986-1995 residential sales (37 points)

The residential sales data for 1986-1995 are stored in boston96.sales and use the same parcel IDs as the 1996 assessing and parcel shapefile tables.  Let's reuse some of our queries from Lab Exercise #5 to find all the triple deckers and condos that sold during the 1986-1995 period.  Then we can use the cross-reference table boston16.xref_parcel16_96 to match them to 2016 parcels (and assessing data) and answer some questions about what has happened to those parcels and what they are worth a few decades later.

In Lab #5, we created a view (public.xxx_holdsales3) listing each East Boston ground parcel that sold during the 1986-1995 decade along with the average sale price and number of sales.   Here is a modification of this query to find the average sale price and sales count for each East Boston ground parcel where the sale record indicates that the parcel is a triple decker (stateclass = 105') and the salecode indicates that the sale was an arm-length transaction (salecode in ('SX', 'W1', 'W2', 'X1')). (Note: this view groups parcels by the first 7 characters of their parcel identifier and then replaces the last three characters with '000'.  This will be sufficient to match ground parcels in the shapefiles with assessing records for almost all of the triple deckers and condos.  The relationship is more complicated for certain other land uses and for a few triple deckers that we we not consider for the purposes of these test questions. In last year's test we provided a bad16pids table with ninety-one 2016-year pid values that were not unique in boston16.bos16eb.  This table is available in boston16.bad16pids but not needed for this year's test.)

create view public.xxx_triplesales_arms as
select substr(parcel_id,1,7)||'000' as parcel_id, base_id,
avg(cast(price as numeric)) as avg_price, count(*) as scount from boston96.sales s, (select distinct substr(parcel_i_1,1,7) as parcelshp from boston96.parcel96shp where substr(parcel_i_1,1,2) = '01') p, boston96.salecode c where p.parcelshp = substr(parcel_id,1,7) and cast(to_char(s.saledate,'YYYY') as integer) > 1985 and s.salecode=c.salecode
and (s.stateclass = 105) and (s.salecode in ('SX', 'W1', 'W2', 'X1')) group by substr(parcel_id,1,7)||'000', base_id;

Question IV-1. (13 points ):

(IV-1a: 4 points): Run this SQL statement to create a view, public.xxx_triplesales_arms (substituting your initials for xxx), and then use it to write a new SQL query that computes the overall average sale price and number of sales for triple decker sales during the decade that were considered to be arms length transactions.

Arms-length triple decker sales:  count:_____________, average sale price:________________.

(IV-1b: 3 points): Adjust this view to create a new view, public.xxx_triplesales_fcl, for all triple decker sales that involved bankruptcy or foreclosure as determined by sales with (salecode in ('I1', 'L1')).  Show us your SQL for this view and use it to compute the overall average sale price and number of sales for triple decker sales during the decade that were considered to be bankrucpy or foreclosure sales.

Foreclosure triple decker sales: count:_______________, average sale price:________________.

(IV-1c: 6 points): Further adjust the views to create public.xxx_condosales_arms and public.xxx_condosales_fcl that do the same thing for condo sales (stateclass = 102') and then compute their sale counts and average prices.

Arms-length condo unit sales:  count:_____________, average sale price:______________.

Foreclosure condo unit  sales:  count:______________, average sale price:______________.

Question IV-2. (15 points ):

(IV-2a: 4 points): During the 1986-1996 decade, a number of triple deckers sold at least once via an arms length transaction and at least once via bankruptcy or foreclosure (as a triple decker). Write a query to count the number of triple decker parcels (parcel_id) that behaved in this way.

(IV-2b: 4 points): During the 1986-1995 decade, a few of triple deckers that sold at least once via an arms length transaction, also sold at least once, in another sale, as a condo via either arms length or foreclosure! Write a query to list the parcels (parcel_id) that behaved this way.

(IV-2c: 4 points): Let's examine the sales transactions of the ground parcel(s) that you identified in the previous query. List the parcel_id, saledate, stateclass, price, seller, buyer, salecode, and condotag (from the boston96.sales table) for each sale involving the ground parcels identified in the previous query.  Order the output by parcel_id and saledate.

(IV-2d: 3 points):  Briefly interpret the sequence of sales that you observe for these triple deckers in the previous question.  Comment on when they sold as a triple decker, when they sold as condos, and when the sale involved a foreclosure.

Question IV-3. (9 points ):

Now, let's use the cross-reference table, boston16.xref_parcel16_96,  to compare sales prices for East Boston parcels during 1986-1995 with their assessed values in 2016.  In question III-1 we have already computed the number of 1986-1995 sales and average sale price for all triple deckers and condos, broken down by arms-length sales and bankruptcy or foreclosure sales.  In question III-1a, you created a view, public.xxx_triplesales_arms, that listed all the triple decker parcels that sold via arms-length transactions during 1986-1995.  We can use the cross-reference table, boston16.xref_parcel16_96, to determine the corresponding 2016 ground parcels for these triple deckers with arms-length sales and compute the average of their 2016 assessed values (av_total) in boston16.bos16eb.-`

(IV-3a: 3 points): How many of the East Boston triple decker ground parcels in public.xxx_triplesales_arms, matched ground parcels in the 2016 aasessing records in boston16.bos16eb?  _____________________

(IV-3b: 3 points): How many of these ground parcels that were triple deckers in public.xxx_triplesales_arms are listed as condos (ptype='102') rather than triple deckers (ptype='105') in the 2016 assessing table? Number of ground parcels converted to condos ________________.

(IV-3c: 3 points): Now let's compute the average of the assessed values of all those ground parcels that appeared in public.xxx_triplesales_arms, are linked in boston16.xref_parcel16_96 to 2016 ground parcels, and are still listed as triple deckers in 2016.  (Note: A few pre-1996 triple deckers were converted to uses other than condos and a few triple decker ground parcels that had only one assessing record in 1996 ended up with more than one assessing record in 2016.  So do not worry if this count plus the count in the previous question do nut equal the number of rows in public.xxx_triplesales_arms.) Number of ground parcel meeting criteria: ______________  Average of their 2016 assessed value ____________________. 



Exam Turnin Checklist

Is your name and Athena ID at the top of your exam? If not, you'll lose a point!

Is your completed test saved in a plain text (ASCII) file named username_exam.txt or HTML (web page) file named username_exam.html, or MS-Word file name username_exam.doc, or PDF file named username_exam.pdf?

Did you upload your answers to Stellar: https://stellar.mit.edu/S/course/11/sp18/11.521/homework/

Did you confirm with the instructor that your exam was received?


Home | Syllabus | Lectures | Labs | CRN | 37-312 lab | MIT


Last modified: 18 March 2018 [jf]