| 11.521 | Spatial
Database Management and Advanced Geographic Information Systems |
| 11.523 | Fundamentals
of Spatial Database Management |
The exam starts at 3:30 PM in the Room 37-312 Computing Lab. It is intended to be finished by 5:00 PM but you can continue until 6:30 PM so that there is less time pressure. All the tables needed for the exam questions are accessible in Oracle via SQL*Plus using your personal Oracle account that we have been using for lab exercises and homework.
The exam is worth a total of 100 points.
An effective way for you to prepare your exam answers is to use the same familiar methods you have used for the lab assignments and homework. Create an ASCII text or HTML file in your Athena locker to contain your answers. Name your exam username_exam.txt if you use text format or username_exam.html if you use HTML format. (An MS-Word formatted document is also okay.) For your own security, we suggest that you save it in the "Private" subdirectory of your locker (i.e., the user smith might create the text file on her I:\ drive from a WinAthena PC and name it I:\Private\smith_exam.txt). We strongly recommend that you retain the file containing your answers until we return the graded exams to you.
To turn in your exam, send it in an email message to jf@mit.edu.
Finally, don't spend all your time on one or two questions. Start by looking over all the questions to get a general sense of what, and how much, is being asked, and the grade points associated with each question. Then start work on your answers but move on to the next question if you've spent more than 10 minutes on any one of them.
Take a look at the query below, which is intended to list information about the parcels that had fires:
SELECT f.fdate, SUM(p.sqft) sqft
FROM parcels p, fires f
GROUP BY f.fdate
ORDER BY sqft;
This query returns the following erroneous results:
FDATE SQFT --------- ---------- 02-AUG-88 574800 26-JUL-89 574800 02-JUL-89 574800 02-APR-89 574800 26-JUL-90 574800 01-AUG-87 574800 6 rows selected.Question I-1a (6 points).
Why doesn't the query do what is intended?Answer: The PARCELS and FIRES tables are not joined together in the WHERE clause, so each fire date is associated with the total square footage of all parcels.
Question I-1b (6 points).
How would you fix this query to obtain the right result? Show your SQL and its output.Answer:
SELECT f.fdate, SUM(p.sqft) sqft FROM parcels p, fires f WHERE p.parcelid = f.parcelid GROUP BY f.fdate ORDER BY sqft; FDATE SQFT --------- ---------- 26-JUL-90 5500 26-JUL-89 5500 01-AUG-87 14000 02-JUL-89 50000 02-APR-89 100000 02-AUG-88 100000 6 rows selected
Explain briefly what is meant by primary key and foreign key. Illustrate your explanation in terms of the relationship between these two tables from the 'zoning' database: zoning and use. (The tables are available in Oracle and in 11.521_lab3.mdb in the class data locker.)
Answer: The primary key for a table comprises the one or more columns that uniquely identify a record (row) in the table. A foreign key in a table comprises one or more columns that are the primary key in another table. A primary key is not permiited to have null values or duplicates. A foreign key can have duplicates or null values. For the zoning table, the primary key is the casenumber column. For the use table, the primary key is the use_code column. Both the existuse and the prpseduse columns in the zoning table are each foreign keys that refer to the use_code primary key in the use table.
The query below computes the average assessed value per condo unit and the average assessed value per gross square foot of building area for each of the condominiums (with known assessing and gross area values) that are recorded in the Fiscal Year 2005 assessing data for East Boston. The table, bos05eb, is available in Oracle and in the MS-Access database used for lab #2 (see: \data\eboston05\bos05eb_lab2.mdb within the class data locker). We have included some column formatting commands for readability and we have sorted the output in descreasing order based on the value of the 3rd column.
COLUMN assess_per_sqft FORMAT 99990.00 COLUMN assess_per_unit FORMAT 99999990 SELECT cm_id, count(*) condo_units, sum(fy2005_total)/sum(gross_area) assess_per_sqft, sum(fy2005_total)/count(*) assess_per_unit from bos05eb where cm_id is not null and gross_area > 0 group by cm_id order by 3 desc;
Question I-3a (8 points): Modify this query (or create a view and use the view in a new query) to determine the average assessed value per gross square foot for (a) all condominiums (combined), and (b) all condominiums (combined) with more than 5 units.
Answer: There are (at least) two ways to interpret this question depending upon what 'average' you compute:
Method (I) is to average the 91 values computed in the SQL query provided above for each of the 91 condominium complexes.
create view v_condo_value as SELECT cm_id, sum(fy2005_total) fy2005_total, sum(gross_area) gross_area, count(*) units, sum(fy2005_total)/sum(gross_area) assess_per_sqft, sum(fy2005_total)/count(*) assess_per_unit from bos05eb where cm_id is not null and gross_area > 0 group by cm_id order by 5 desc; SELECT avg(assess_per_sqft) avg1_assess_per_sqft FROM v_condo_value; AVG1_ASSESS_PER_SQFT
--------------------
160.368694Method (II) is to compute the average price across all individual condominium units (of which there are 640 in total)
SELECT sum(fy2005_total) fy2005_total, sum(gross_area) gross_area, count(*) units, sum(fy2005_total)/sum(gross_area) avg2_assess_per_sqft from bos05eb where cm_id is not null and gross_area > 0; FY2005_TOTAL GROSS_AREA UNITS AVG2_ASSESS_PER_SQFT
------------ ---------- ---------- --------------------
91682930 588258 640 155.854965 Do you understand why there is a difference? The average of the average for each condo is not the same as the average for all condo units taken individually. Here is the Method (I) average for those condo complexes that have more than 5 units. SELECT avg(assess_per_sqft) assess_per_sqft_for_big_condos FROM v_condo_value WHERE units > 5; ASSESS_PER_SQFT_FOR_BIG_CONDOS
------------------------------
162.952757 We will accept either method but I subtracted a point if you made no comment about having a choice.Question I-3b (10 points): Develop a thematic map showing East Boston parcels and building footprints at a scale of 1:15000 or larger. Shade the parcels thematically based on the assessed value per gross square foot (as calculated using the SELECT query shown above) for the 91 condominiums. Submit a PDF-formatted map with sufficient readability to identify the condo locations and the relatively high and low-valued locations.
Answer: See the thematic map below. (It is a JPEG to be small and readable on the web. I asked for a PDF in the test so I could zoom in if necessary to see the condo locations and colors on your maps.) The map uses quantile classification with 5 categories so it is easier to spot relatively high/medium/low values. In addition, the parcel outline and the building footprints are toned down to make it easier to see the color ramp for the assessed value per gross square foot. For the purposes of the test, we just wanted to be sure that you had grabbed the right 91 condo locations and had managed to join them up with the parcel map. For the record, however, I have prepared a map layout and suggested this interpretation: The 91 condominium complexes are spread through the residential parts of East Boston, and there does not appear to be a significant spatial pattern to the relatively high and low-valued locations. If we did a viewshed analysis, we might find that the condos units with a view of Boston (to the west and southwest) have higher values, but we would need to use a terrain model to account for the hills in the southwest and northeast parts of East Boston. The large condo complexes just below the center of the map have above-average value and have a cluster of smaller condos nearby that are also above-average. I believe these are relatively recent redevelopments and they are within walking distance of the Maverick T station. Of course, these valuations are based on assessed value not sales prices and those valuations may have reflected recent renovations more than general location.
Consider the table ZRESULT described below which lists each zoning variance (from Lab #3) for which the BRA recommendation and Board decision is known and labels them as 'same' or 'diff' depending upon whether the recommendation and decision are the same.
CREATE TABLE zresult AS
SELECT casenumber, brarecom, boarddecis, 'unknown' zgroup
FROM zoning
WHERE brarecom > 0 and boarddecis > 0;
update zresult
set zgroup = 'same'
where brarecom = boarddecis;
update zresult
set zgroup = 'diff'
where brarecom <> boarddecis;
Question II-1a (8 points): Write a SQL query using this 'zresult' table that counts the number of zoning variances with 'same' and 'diff' labels.
Answer:select zgroup, count(*) cases from zresult group by zgroup; ZGROUP CASES
------- ----------
diff 778
same 854Question II-1b (10 points): Write a SQL query that counts the number of cases associated with each possible combination of BRARECOM and BOARDDECIS included in ZRESULT. List only those combinations that have at least 10 cases. Include in your query the ZGROUP field and a column that counts the number of cases. Sort the output by this count (in descending order).
Answer: select brarecom, boarddecis, zgroup, count(*) cases from zresult group by brarecom, boarddecis, zgroup having count(*) >= 10 order by count(*) desc; BRARECOM BOARDDECIS ZGROUP CASES ---------- ---------- ------- ---------- 2 2 same 586 1 2 diff 154 1 1 same 116 6 2 diff 113 5 5 same 99 5 2 diff 92 2 1 diff 81 4 2 diff 60 5 4 diff 39 2 4 diff 35 6 5 diff 31 2 5 diff 31 6 4 diff 30 4 4 same 29 6 1 diff 20 4 5 diff 16 1 5 diff 14 5 1 diff 14 1 4 diff 13 6 6 same 13 7 7 same 10 21 rows selected.Question II-1c (8 points): Write a SQL query that counts the number of 'same' and 'diff' zoning variances that involve vacant land converted to housing (i.e., EXISTUSE= 10 and PRPSEDUSE=1).
Answer:select zgroup, count(*) cases from zresult z, zoning n where z.casenumber = n.casenumber and n.existuse = 10 and n.prpseduse = 1 group by zgroup; ZGROUP CASES ------- ---------- same 104 diff 86 select d.decision bra_recommend,
d2.decision board_decision, zgroup, count(*) cases
from zresult r, zoning z, decision d, decision d2
where z.casenumber = r.casenumber
and r.brarecom = d.dec_code
and r.boarddecis = d2.dec_code
and existuse = 10 and prpseduse = 1
group by d.decision, d2.decision, zgroup
having count(*) >= 10
order by count(*) desc; BRA_RECOMMEND BOARD_DECISION ZGROUP CASES
------------------------- ------------------------- ------- ----------
APPROVED WITH PROVISIONS APPROVED WITH PROVISIONS same 90
DEFERRED APPROVED WITH PROVISIONS diff 17
DEFERRED DENIED WITHOUT PREJUDICE diff 11
APPROVED WITH PROVISIONS DENIED WITHOUT PREJUDICE diff 10 The second query only lists combinations with at least 10 cases and joins in the 'decision' table as a lookup table (twice as a self-join) in place of the approval codes to increase readability. However, you receive full credit if you list the approval codes instead. Most of the vacant-land-to-housing variances were approved as recommended or deferred but about 8% (of the frequent combinations) were approved despite a BRA recommendation to deny.
For this question, we will use the table assess_data.sales from lab #5 containing residential property sales information for Boston between 1986 and 1995, together with the table mass_landuse (from lab #2) that explains the meaning of the state land use codes in the sales table.
Question II-2a (5 points): Write a SQL query to compute the average sale price (price)of all East Boston property sales in assess_data.sales that involved residential condos or one to three-family dwellings - that is, sales satisfying the condition: "stateclass in (101, 102, 104, 105)". Include in the output a count of the total number sales with any of those four stateclass codes. (Recall that East Boston parcels begin with '01' indicating Ward #1.)
select count(*) sale_count, avg(price) avgprice from assess_data.sales where stateclass in (101, 102, 104, 105) and substr(parcel_id,1,2) = '01' SALE_COUNT AVGPRICE ---------- ---------- 2788 95820.4042Question II-2b (7 points): Write another SQL query that computes the average sale price (as in part II-2a) but broken down by the four different stateclass codes considered in part II-2a. Include in the output a count of the number of sales within each of the 4 stateclass code categories. Also, use the mass_landuse lookup table to get the text describing each of the 4 included stateclass codes and include this 'descript' text in the output table of your query.
select s.stateclass, count(*), descript, avg(price) avgprice from assess_data.sales s, mass_landuse m where s.stateclass = m.stateclass and s.stateclass in (101, 102, 104, 105) and substr(parcel_id,1,2) = '01' group by s.stateclass, descript order by s.stateclass;STATECLASS COUNT(*) DESCRIPT AVGPRICE ---------- ---------- ---------------------------- ---------- 101 360 Single Fam Dwelling 99041.2556 102 1019 Residential Condo 85587.7468 104 403 Two-Fam Dwelling 103514.653 105 1006 Three-Fam Dwelling 101950.414
Consider these two SQL queries of the ZONING table used in class and homework:
SELECT count(*) rowcount, count(hringdate) non_null,
min(hringdate) early_d, max(hringdate) late_d
FROM zoning;
ROWCOUNT NON_NULL EARLY_D LATE_D
---------- ---------- -------- --------
1799 1799 19871021
SELECT min(hringdate)early_d FROM zoning
WHERE hringdate <> ' ';
EARLY_D
--------
19840402
Question II-3a (5 points): Evidently, the hearing dates are missing for some zoning variance cases and these missing dates have a blank (' ') in the field. Write a SQL query that counts the number of zoning variances for which the hearing date is missing. What is that number?
Answer: SELECT count(*) FROM zoning WHERE hringdate = ' '; COUNT(*) ---------- 19Question II-3b (7 points): The hearing dates are not encoded as a 'date' data type but are encoded as an 8-character string using year/month/day format - that is, yyyymmdd.. Write a SQL query that counts the number of zoning variances with hearing dates occurring during each year.
Answer: SELECT substr(hringdate,1,4) hyear, count(*) cases FROM zoning WHERE hringdate <> ' ' GROUP BY substr(hringdate,1,4) ORDER by substr(hringdate,1,4); HYEA CASES ---- ---------- 1984 2 1985 575 1986 870 1987 333Question II-3c (10 points): Write one or more SQL queries in order to determine the fraction of zoning variances (with known board decision) that were approved out of those zoning variances with hearing dates during 1985 and 1986. Your final query should display, for each of the two years, the count of zoning variances with known board decision, the number approved and the fraction (or percent) approved. (Consider 'approved' to mean boarddecis = 1 or 2).
Answer: denominator count (total known cases by year): ----------- CREATE VIEW v_denom1 as SELECT substr(hringdate,1,4) hyear, count(*) totcases FROM zoning WHERE (substr(hringdate,1,4) = '1985' OR substr(hringdate,1,4) = '1986') AND boarddecis > 0 GROUP BY substr(hringdate,1,4); numerator count (approved cases by year): --------- CREATE VIEW v_num1 as SELECT substr(hringdate,1,4) hyear, count(*) appcases FROM zoning WHERE (substr(hringdate,1,4) = '1985' OR substr(hringdate,1,4) = '1986') AND (boarddecis = 1 OR boarddecis = 2) GROUP BY substr(hringdate,1,4); fraction approved: -------- SELECT t.hyear, appcases, totcases, 100.0*(appcases/totcases) pct_approved FROM v_num1 a, v_denom1 t WHERE a.hyear = t.hyear AND (t.hyear = '1985' OR t.hyear = '1986') GROUP BY t.hyear, appcases, totcases ORDER BY t.hyear; HYEA APPCASES TOTCASES PCT_APPROVED ---- ---------- ---------- ------------ 1985 447 575 77.7391304 1986 665 851 78.1433608 OR, a combined SQL query: ---------------------- SELECT t.hyear, appcases, totcases, 100.0*(appcases/totcases) pct_approved FROM (SELECT substr(hringdate,1,4) hyear, count(*) appcases FROM zoning WHERE boarddecis = 1 OR boarddecis = 2 GROUP BY substr(hringdate,1,4)) a, (SELECT substr(hringdate,1,4) hyear, count(*) totcases FROM zoning WHERE boarddecis > 0 GROUP BY substr(hringdate,1,4)) t WHERE a.hyear = t.hyear AND (t.hyear = '1985' OR t.hyear = '1986') GROUP BY t.hyear, appcases, totcases ORDER BY t.hyear; HYEA APPCASES TOTCASES PCT_APPROVED ---- ---------- ---------- ------------ 1985 447 575 77.7391304 1986 665 851 78.1433608
Home | Syllabus | Lectures | Labs | CRN | 37-312 lab | MIT
Last modified: 12 March 2008 [jf]