| 11.521 | Spatial
Database Management and Advanced Geographic Information Systems |
| 11.523 | Fundamentals
of Spatial Database Management |
The exam starts at 4:15 PM in the Room 37-312 Computing Lab. It is intended to be finished by 5:45 PM but you can continue until 7:00 PM so that there is less time pressure. All the tables needed for the exam questions are accessible in Oracle via SQL*Plus using your personal Oracle account that we have been using for lab exercises and homework.
The exam is worth a total of 100 points.
An effective way for you to prepare your exam answers is to use the same familiar methods you have used for the lab assignments and homework. Create an ASCII text, a Word .doc, or an HTML file in your Athena locker to contain your answers. Name your exam username_exam.txt if you use text format or username_exam.html if you use HTML format. (An MS-Word or PDF-formatted document is also okay.) For your own security, we suggest that you save it in the "Private" subdirectory of your locker (i.e., user smith might create a text file on her I:\ drive from a WinAthena PC and name it I:\Private\smith_exam.txt). We strongly recommend that you retain the file containing your answers until we return the graded exams to you.
Finally, don't spend all your time on one or two questions. Start by looking over all the questions to get a general sense of what, and how much, is being asked, and the grade points associated with each question. Then start work on your answers but move on to the next question if you've spent more than 10 minutes on any one of them.
Consider the ZONING tables used in the Lab exercises. We wish to compute the zoning variance count by neighborhood (NEIGHBRHD) of all zoning variances that involved a change of occupancy (CHNGEOCCUP = 2). We try this query:
select n.NEIGHBRHD, count(*) as change_count
from zoning z, neighbrhd n
group by n.NEIGHBRHD
having CHNGEOCCUP = 2
order by n.NEIGHBRHD
Briefly explain why the query does not produce the intended result and show the corrected SQL query.
Answer: The two tables in the 'from' clause are not joined and the CHGEOCCUP condition should be in a 'where' clause:
select n.NEIGHBRHD, count(*) as change_count
from zoning z, neighbrhd n
where CHNGEOCCUP = 2 and z.neighbrhd = n.nei_code
group by n.NEIGHBRHD
order by n.NEIGHBRHD
Consider the table, JF.MATOWN2000, that contains attribute data for each polygon in the MassGIS shapefile for Masss Town boundaries. (This is one of the shapefiles that we used to illustrate the difficulties in computing meaningful Mass population densities.) Be sure to include your SQL queries on your answer sheet (to enable earning partial credit)..
Question I-2(a) (6 points): Which Massachusetts town is represented in JF.MATOWN2000 by the most number of polygons?__ Boston_____ How many polygons are in that Town? ____19________
select town, count(town) as count from jf.matown2000 group by town having count(town) > 10 order by count desc; TOWN COUNT --------------------- ---------- BOSTON 19 WESTPORT 17 BARNSTABLE 15 GLOUCESTER 15 GOSNOLD 15 IPSWICH 12 SCITUATE 12Question I-2(b) (6 points): Which Massachusetts town has the largest difference in area (in square meters) between the largest and smallest polygon within the Town? __Plymouth_____ What is that difference in area? __263.54 sq. km________
select town, max(area), min(area), (max(area)-min(area)) as difference from jf.matown2000 where area>0 group by town order by difference desc; TOWN MAX(AREA) MIN(AREA) DIFFERENCE --------------------- ---------- ---------- ---------- PLYMOUTH 263918480 381076.344 263537404 DARTMOUTH 160914992 9995.5957 160904996 BARNSTABLE 158369008 7564.40918 158361444 etc...Question I-2(c) (8 points): Consider the following query:
select count(*), count(pop80), count(town), count(distinct town) from jf.matown2000;The query returns the counts: 632, 632, 611, and 351 respectively. Explain briefly why all four numbers are not the same and what each number means.
From left to right, the numbers represent the number of rows (polygons) in the Mass town-boundary shapefile, the number of polygons with a non-null value in the pop80 column, the number of polygons with a non-null value in the Town column, and the number of different town names in the Town column. Note that the count(pop80) count is 632 because, while there are no null entries, there are a number of '0' entries for small areas that are not associated with a town. Note also that all 351 towns in Massachusetts are identified with at least one polygon.
Consider the seven illustrative PARCELS tables that we have used in lecture and lab exercises.
Question I-3(a) (8 points): What would you suggest as the primary key for the SALES table? Explain briefly. What foreign key in the PARCELS table would you use when joining that table with the SALES table?
For the SALES table, the primary key should be the combination of the parcel identifier (parcelid) and sale date (sdate) columns. The parcelid field in the sales table is the foreign key that matches the primary key parcelid in the parcels table. (The wording of this question is slightly miss-stated, so we will accept your answer if you label parcelid the foreign key in the parcel table.)
Note: In practice, a property can - and sometimes does - sell more than once within a single day. So we want the sale date to be a 'date' field that measures milleseconds past midnight as well as the Julian date. Our toy sales table only includes the day.
Question I-3(b) (10 points): Suppose we wanted to allow parcels to have more than one owner. How might you restructure the tables in the parcels schema to accomodate this possibility? You don't need to provide an E-R diagram but explain briefly and be sure to specify the table name, meaning, and primary and foreign keys of any tables that you create or restructure.
If we allow more than one owner, we will have a many-to-many relationship between the owners and parcels table. We would handle this by (a) removing the 'onum' owner number column from the parcels, (b) leaving the owners tables as is, and (c) adding an owner_match table with two columns: parcelid and ownernum.
For this question, we will examine some of the East Boston properties that went into foreclosure during the last downturn in the Boston housing market (around 1990). To do this, we will use the ASSESS_DATA.SALES table documenting 68,134 residential property sales in Boston between 1986 and 1995. We introduced this table in Lab #5 but you need not have done that lab to answer the questions below.
Here is a description of the schema for ASSESS_DATA.SALES:
describe assess_data.salesHere is a sample listing from the assess_data.sales (along with some SQL*Plus commands to resize and pause after each 'page' of printout. The where condition includes only those parcels that are in East Boston.
Name Null? Type ----------------------------------------- -------- -------------- SALE_ID NOT NULL NUMBER PARCEL_ID NOT NULL VARCHAR2(10) SALEDATE NOT NULL DATE STATECLASS NOT NULL NUMBER(3) PRICE NOT NULL NUMBER(38,2) SELLER VARCHAR2(20) BUYER VARCHAR2(20) SALECODE VARCHAR2(2) BASE_ID VARCHAR2(7) CONDOTAG VARCHAR2(1)
set pause 'more...'
set pause on
set pagesize 30
set linesize 130
select * from assess_data.sales
where substr(parcel_id,1,2)='01';
SALE_ID PARCEL_ID SALEDATE STATECLASS PRICE SELLER BUYER SA BASE_ID C
------- ---------- --------- ---------- ---------- -------------------- -------------------- -- ------- -
682 0102870000 12-NOV-92 105 27000 FEDERAL HOME LOAN MG JOSHI SHUBH RAMESH E L1 010287
683 0102877000 13-SEP-94 105 128000 CONTILLI EDWARD ETAL BUENRROSTRO JOSE X1 01028
684 0102881000 21-JUL-95 105 85000 SHIKLES JACKIE G IRAHETA GERMAN H W1 0102881 N
685 0102882000 14-DEC-90 104 148000 GALLO DANTE ET AL IANNELLI RALPH J JR D3 01
686 0102885000 15-DEC-87 105 75000 VISCAY RICHARD CUNNINGHAM JOHN O1 0102885 N
687 0102886000 15-DEC-87 105 75000 VISCAY RICHARD CUNNINGHAM JOHN O1 0102886 N
etc....
For our questions, we will focus on foreclosure sales of non-condo, 1-6 family residences. Foreclosures are indicated by SALECODE = 'L1' in the sales records and 1-6 family non-condo residences have STATECLASS in (101, 103, 104, 105, 111). To assist you in identifying the desired sales information, we have already created a view, jf.v_ebforeclose, that lists 321 sales in ASSESS_DATA.SALES that involve an East Boston non-condo, 1-6 family residence going into (or out of) foreclosure. The view is a little complicated since we also want to consider only those sales that occurred during the ten years (1986-1995) and have parcel IDs that match those in the Oracle table, parcel96shp, that includes the attributes from the 1996 parcel shapefile for East Boston (ebos_parcels96.shp). For the exam questions, you will not need to know the specific definition of the view but, for your information, here it is:
create view v_ebforeclose as
select parcel_id, saledate
from assess_data.sales s,
(select distinct parcel_i_1 as parcelshp
from parcel96shp
where substr(parcel_i_1,1,2) = '01') p
where s.parcel_id = p.parcelshp
and to_char(saledate,'YYYY') > 1985
and stateclass in (101, 103, 104, 105, 111)
and salecode = 'L1';
Question II-a (8 points): Many of the parcels in these 321 foreclosure sales show up more than once in the listing. (For example, if the foreclosure resulted in a Bank assuming ownership, then a subsequent sale would record someone else buying the property from the bank. Both transactions would be recorded as SALECODE = 'L1' to indicate that the saleprice was probably not a 'fair market value' since it involved going into or coming out of foreclosure.). Create a view (or table) called v_fcount that lists every parcel involved in these 321 foreclosure sales along with a count of the number of times that parcel_id is involved in a foreclosure. Be sure to include the SQL query on your answer sheet. How many unique parcels are involved in these 321 foreclosure sales? __220_____
create view v_fcount as select parcel_id, count(parcel_id) as fcount from jf.v_ebforeclose group by parcel_id order by fcount desc; select count(*) from v_fcount; COUNT(*) ---------- 220Question II-b (5 points): Write a SQL query that counts the number of parcels in v_ebforeclose that were involved in more than one foreclosure during the 10 years. What is this count? ___99______
select count(*) from v_fcount where fcount > 1; COUNT(*) ---------- 99Note: Only 3 of these 99 parcels had more than 2 foreclosures and 121 parcels were associated with a single foreclosureselect fcount, count(*) from v_fcount group by fcount order by fcount desc; FCOUNT COUNT(*) ---------- ---------- 3 2 2 97 1 121This result is all we need for the exam, but we might want to dig further. The 121 parcels with only one foreclosure is surprising and may deserve some further exploration. Do you see why this next query pulls all the sales during the 10 year period (ordered by parcel and saledate) for all the parcels that had at least one foreclosure?
select * from assess_data.sales s, v_fcount f where s.parcel_id = f.parcel_id and f.fcount = 1 order by s.parcel_id, saledate;We see that some parcels had their foreclosure late in the period, so coming out of foreclosure might occur after 1995. Others came out of foreclosure with a salecode other than 'L1' - e.g., 'N3' that is a multi-parcel sale in one deed and not a fair market value. In some cases, the salecode is missing.Question II-c (8 points): How many of the 321 foreclosure sales involved triple-deckers (stateclass = 105)? ___221____ Be sure to include your SQL query with your answer.
select count(*), count(distinct f.parcel_id) triple_count from jf.v_ebforeclose f, assess_data.sales s where f.parcel_id = s.parcel_id and f.saledate = s.saledate and s.stateclass = 105; COUNT(*) TRIPLE_COUNT ---------- ------------ 221 154So, 221 of the 321 foreclosures involve 154 different triple-deckers.Question II-d (5 points): Among all those parcels listed in jf.v_ebforeclose that had exactly 2 foreclosures, what is the parcel_id and saledate of the triple-decker (stateclass = 105) parcel that had the earliest foreclosure sale? __0106696000, 07-FEB-90______ Be sure to include your SQL query with your answer.
First, let's extract the rows in assess_data.sales that meet the desired conditions (triple-decker with 2 foreclosures in jf.v_ebforeclose):
create view v_triplef as select * from assess_data.sales where parcel_id in (select parcel_id from jf.v_ebforeclose) and parcel_id in (select parcel_id from v_fcount where fcount = 2) and stateclass = 105;The v_fcount view was defined earlier in part II-a. Next, find the foreclosure row in v_triplef with the earliest date and use that date to find the parcel_id associated with that row:
select parcel_id, saledate, salecode from v_triplef where saledate = (select min(saledate) from v_triplef where salecode = 'L1'); PARCEL_ID SALEDATE SA ---------- --------- -- 0106696000 07-FEB-90 L1There are many ways to arrive at this answer. For example:
create view v_earlydate as select min(saledate) as mindate from assess_data.sales where parcel_id in (select parcel_id from v_fcount where fcount = 2) and stateclass = 105 and salecode = 'L1'; select parcel_id, saledate from jf.v_ebforeclose where saledate = (select mindate from v_earlydate) and parcel_id in (select parcel_id from v_fcount where fcount = 2); PARCEL_ID SALEDATE ---------- --------- 0106696000 07-FEB-90However, there are subtleties in this question that could lead to different answers (and which I would accept with all or near-all credit if you included your SQL query). For example, a triple-decker with exactly two foreclosures in jf.v_ebforeclose might have had an earlier foreclosure (prior to 1986) that was not included in jf.v_ebforeclose. There also might be a triple-decker with exactly two foreclosures in jf.v_ebforeclose that had an earlier non-foreclosure listed in assess_data.sales. If your SQL first identified triple-decker parcels with 2 foreclosures and then queried assess_data.sales to find the earliest sale date, that date need not correspond to a foreclosure sale. Experiment with different SQL strategies to examine these possibilities. There are also some parcels in jf.v_ebforeclose that have only one 'L1' foreclosure but should have more. For example, try this query: select * from assess_data.sales where parcel_id = '0105229000' order by saledate; This parcel went in to foreclosure on Feb. 13, 1989, and appears to come out on Oct. 29, 1992, at twice the price after being transferred from a mortgage company to a bank in a transaction not listed in assess_data.sales. Now that we are in another period of high foreclosures, real estate transaction databases similar to assess_data.sales will be the basis for future studies of the fallout from the current crisis and the 'transparency' about winners, losers, and neighborhood effects that might help mitigate the adverse effects of future business cycles and lending practices.
Question II-e (8 points): Among all those parcels listed in jf.v_ebforeclose that had exactly 2 foreclosures, how many sold for more money on the latest saledate? ____16_______ Be sure to include your SQL query with your answer.
Again, there are several ways to construct the answer. Here is one that uses the 'self-join' idea to look twice at assess_data.slaes, once to find the earliest saledate and the second time to find the latest saledate. First, we create a new view from jf.v_ebforeclose that extracts only those qualifying parcels with exactly two foreclosures.
create view v_fcount2 as select * from jf.v_ebforeclose where parcel_id in (select parcel_id from v_fcount where fcount = 2); select f1.parcel_id, f1.saledate, s1.price, f2.parcel_id, f2.saledate, s2.price from v_fcount2 f1, assess_data.sales s1, v_fcount2 f2, assess_data.sales s2 where f1.parcel_id = s1.parcel_id and f1.saledate = s1.saledate and f2.parcel_id = s2.parcel_id and f2.saledate = s2.saledate and f1.parcel_id = f2.parcel_id and f1.saledate < f2.saledate and s1.price < s2.price order by s2.price - s1.price desc; PARCEL_ID SALEDATE PRICE PARCEL_ID SALEDATE PRICE ---------- --------- ---------- ---------- --------- ---------- 0101476000 02-SEP-93 40000 0101476000 05-NOV-93 135500 0104665000 25-SEP-92 2000 0104665000 07-FEB-94 55000 0106683000 07-JUN-94 35000 0106683000 09-AUG-94 59000 0107002000 20-NOV-91 50850 0107002000 23-APR-92 72767 0103355000 23-FEB-94 36763 0103355000 21-APR-94 56000 0103589000 07-MAR-90 75000 0103589000 31-AUG-92 91400 0103749000 04-NOV-92 5000 0103749000 27-JAN-93 20000 0106111000 05-AUG-94 24500 0106111000 14-NOV-94 38000 0106593000 31-MAY-94 30640 0106593000 30-DEC-94 43500 0101590000 09-NOV-93 83300 0101590000 22-AUG-94 95941 0102812000 06-MAY-93 42000 0102812000 27-AUG-93 53000 0102629000 10-NOV-93 10430 0102629000 12-JAN-94 17500 0103751000 01-SEP-94 50000 0103751000 05-OCT-94 55000 0107004000 20-NOV-91 71000 0107004000 23-APR-92 74235 0102825000 15-JAN-92 37500 0102825000 22-JUL-93 40000 0103024000 03-FEB-93 21000 0103024000 17-AUG-93 21900 16 rows selected.
For this question, we will use the 1996 East Boston parcel shapefile to map some of the parcels involved in the foreclosures.
Question III-a (8 points): Write a SQL query (creating a view or table called v_3dforeclose) that lists the parcel_id of each triple-decker (stateclass = 105) that had at least one foreclosure along with a count of the number of foreclosures. How many such parcels are there? ___154___ Be sure to include your SQL query with your answer..
Again, there is more than one way to do this. Here is one way:
create view v_3dforeclose as select parcel_id, count(*) fcount from jf.v_ebforeclose where parcel_id in (select parcel_id from assess_data.sales where stateclass = 105) group by parcel_id;The view contains all 154 parcels counted in question II-c above.. Note that the view implicitly assumes that parcels retain the same stateclass=105 status for all sales. That is the case for our dataset but need not be true if, for example, a triple-decker rental was converted to a condo. Do you see what might happen if a parcel had different stateclass codes associated with different sales in assess_data.sales?
Question III-b (12 points): Create a thematic map of East Boston Parcels (similar in extent to the maps you created in Lab #2) where those parcels involved in triple-decker foreclosures are shaded (darker or lighter) based on the number of times they were involved in foreclosures during the 10 year period. Use the same ebos_parcels96.shp shapefile that we are using in Lab #5 (so parcel_i_1 in ebos_parcels96.shp will match parcel_id in assess_data.sales). This shapefile is in the eboston05 subfolder of the class data locker: K:\data\eboston05. Submit a PDF-formatted map with sufficient readability to distinguish the degree of shading of various parcels. (You can be zoomed in a bit so that only a sizeable but incomplete part of East Boston is visible on the map.)
Home | Syllabus | Lectures | Labs | CRN | 37-312 lab | MIT
Last modified: 10 March 2009 [jf]