Massachusetts Institute of Technology - Department of Urban Studies and Planning


11.521
Spatial Database Management and Advanced Geographic Information Systems
11.523
Fundamentals of Spatial Database Management

Answers to In-Lab Exam - 11 March 2008


Instructions

This is an open-book, open-note examination. You are free to use the web (especially the class web pages and Oracle documentation) to help you. Like any examination, however, we expect your submission to reflect your work exclusively. Hence any communication via any means (e.g., voice, paper, e-mail, IM, texting) with anyone other than the class instructors is prohibited.

The exam starts at 3:30 PM in the Room 37-312 Computing Lab. It is intended to be finished by 5:00 PM but you can continue until 6:30 PM so that there is less time pressure. All the tables needed for the exam questions are accessible in Oracle via SQL*Plus using your personal Oracle account that we have been using for lab exercises and homework.

The exam is worth a total of 100 points.

An effective way for you to prepare your exam answers is to use the same familiar methods you have used for the lab assignments and homework. Create an ASCII text or HTML file in your Athena locker to contain your answers. Name your exam username_exam.txt if you use text format or username_exam.html if you use HTML format. (An MS-Word formatted document is also okay.) For your own security, we suggest that you save it in the "Private" subdirectory of your locker (i.e., the user smith might create the text file on her I:\ drive from a WinAthena PC and name it I:\Private\smith_exam.txt). We strongly recommend that you retain the file containing your answers until we return the graded exams to you.

To turn in your exam, send it in an email message to jf@mit.edu.

Finally, don't spend all your time on one or two questions. Start by looking over all the questions to get a general sense of what, and how much, is being asked, and the grade points associated with each question. Then start work on your answers but move on to the next question if you've spent more than 10 minutes on any one of them.

Good luck!


Part I: Short Answer, 'Fix SQL' Questions, and Thematic Mapping (40 points)

Question I-1. (12 points total)

Take a look at the query below, which is intended to list information about the parcels that had fires:

    SELECT f.fdate, SUM(p.sqft) sqft
      FROM parcels p, fires f
     GROUP BY f.fdate
     ORDER BY sqft;
This query returns the following erroneous results:
FDATE           SQFT
--------- ----------
02-AUG-88     574800
26-JUL-89     574800
02-JUL-89     574800
02-APR-89     574800
26-JUL-90     574800
01-AUG-87     574800

6 rows selected.

Question I-1a (6 points).
Why doesn't the query do what is intended?

Answer: The PARCELS and FIRES tables are not joined together in the WHERE clause, so each fire date is associated with the total square footage of all parcels.

Question I-1b (6 points).
How would you fix this query to obtain the right result? Show your SQL and its output.

Answer:

  SELECT f.fdate, SUM(p.sqft) sqft
    FROM parcels p, fires f
   WHERE p.parcelid = f.parcelid
   GROUP BY f.fdate
   ORDER BY sqft;


   FDATE           SQFT
   --------- ----------
   26-JUL-90       5500
   26-JUL-89       5500
   01-AUG-87      14000
   02-JUL-89      50000
   02-APR-89     100000
   02-AUG-88     100000
   
   6 rows selected

 


Question I-2. (10 points total)

Explain briefly what is meant by primary key and foreign key. Illustrate your explanation in terms of the relationship between these two tables from the 'zoning' database: zoning and use. (The tables are available in Oracle and in 11.521_lab3.mdb in the class data locker.)

Answer: The primary key for a table comprises the one or more columns that uniquely identify a record (row) in the table. A foreign key in a table comprises one or more columns that are the primary key in another table. A primary key is not permiited to have null values or duplicates. A foreign key can have duplicates or null values. For the zoning table, the primary key is the casenumber column. For the use table, the primary key is the use_code column. Both the existuse and the prpseduse columns in the zoning table are each foreign keys that refer to the use_code primary key in the use table.


Question I-3: (18 points total)

The query below computes the average assessed value per condo unit and the average assessed value per gross square foot of building area for each of the condominiums (with known assessing and gross area values) that are recorded in the Fiscal Year 2005 assessing data for East Boston. The table, bos05eb, is available in Oracle and in the MS-Access database used for lab #2 (see: \data\eboston05\bos05eb_lab2.mdb within the class data locker). We have included some column formatting commands for readability and we have sorted the output in descreasing order based on the value of the 3rd column.

 COLUMN assess_per_sqft FORMAT 99990.00
 COLUMN assess_per_unit FORMAT 99999990
 

 SELECT cm_id, count(*) condo_units, 
   sum(fy2005_total)/sum(gross_area) assess_per_sqft,
   sum(fy2005_total)/count(*) assess_per_unit 
   from bos05eb
   where cm_id is not null and gross_area > 0
   group by cm_id
   order by 3 desc;

Question I-3a (8 points): Modify this query (or create a view and use the view in a new query) to determine the average assessed value per gross square foot for (a) all condominiums (combined), and (b) all condominiums (combined) with more than 5 units.

Answer: There are (at least) two ways to interpret this question depending upon what 'average' you compute:

Method (I) is to average the 91 values computed in the SQL query provided above for each of the 91 condominium complexes.

create view v_condo_value as
SELECT cm_id, sum(fy2005_total) fy2005_total, 
   sum(gross_area) gross_area, count(*) units, 
   sum(fy2005_total)/sum(gross_area) assess_per_sqft,
   sum(fy2005_total)/count(*) assess_per_unit 
  from bos05eb
 where cm_id is not null and gross_area > 0
 group by cm_id
 order by 5 desc;

SELECT avg(assess_per_sqft) avg1_assess_per_sqft
  FROM v_condo_value;
  
AVG1_ASSESS_PER_SQFT
--------------------
160.368694

Method (II) is to compute the average price across all individual condominium units (of which there are 640 in total)

SELECT sum(fy2005_total) fy2005_total, 
       sum(gross_area) gross_area, count(*) units, 
       sum(fy2005_total)/sum(gross_area) avg2_assess_per_sqft
  from bos05eb
 where cm_id is not null and gross_area > 0;

FY2005_TOTAL GROSS_AREA      UNITS AVG2_ASSESS_PER_SQFT
------------ ---------- ---------- --------------------
91682930 588258 640 155.854965 Do you understand why there is a difference? The average of the average for each condo is not the same as the average for all condo units taken individually. Here is the Method (I) average for those condo complexes that have more than 5 units. SELECT avg(assess_per_sqft) assess_per_sqft_for_big_condos FROM v_condo_value WHERE units > 5; ASSESS_PER_SQFT_FOR_BIG_CONDOS
------------------------------
162.952757 We will accept either method but I subtracted a point if you made no comment about having a choice.

Question I-3b (10 points): Develop a thematic map showing East Boston parcels and building footprints at a scale of 1:15000 or larger. Shade the parcels thematically based on the assessed value per gross square foot (as calculated using the SELECT query shown above) for the 91 condominiums. Submit a PDF-formatted map with sufficient readability to identify the condo locations and the relatively high and low-valued locations.

Answer: See the thematic map below. (It is a JPEG to be small and readable on the web. I asked for a PDF in the test so I could zoom in if necessary to see the condo locations and colors on your maps.) The map uses quantile classification with 5 categories so it is easier to spot relatively high/medium/low values. In addition, the parcel outline and the building footprints are toned down to make it easier to see the color ramp for the assessed value per gross square foot. For the purposes of the test, we just wanted to be sure that you had grabbed the right 91 condo locations and had managed to join them up with the parcel map. For the record, however, I have prepared a map layout and suggested this interpretation: The 91 condominium complexes are spread through the residential parts of East Boston, and there does not appear to be a significant spatial pattern to the relatively high and low-valued locations. If we did a viewshed analysis, we might find that the condos units with a view of Boston (to the west and southwest) have higher values, but we would need to use a terrain model to account for the hills in the southwest and northeast parts of East Boston. The large condo complexes just below the center of the map have above-average value and have a cluster of smaller condos nearby that are also above-average. I believe these are relatively recent redevelopments and they are within walking distance of the Maverick T station. Of course, these valuations are based on assessed value not sales prices and those valuations may have reflected recent renovations more than general location.

East Boston Value Map


Part II: Writing SQL Statements (60 points)

Question II-1: (26 points)

Consider the table ZRESULT described below which lists each zoning variance (from Lab #3) for which the BRA recommendation and Board decision is known and labels them as 'same' or 'diff' depending upon whether the recommendation and decision are the same.

CREATE TABLE zresult AS
SELECT casenumber, brarecom, boarddecis, 'unknown' zgroup
  FROM zoning
 WHERE brarecom > 0 and boarddecis > 0;

update zresult
   set zgroup = 'same'
 where brarecom = boarddecis;

update zresult
   set zgroup = 'diff'
 where brarecom <> boarddecis;

Question II-1a (8 points): Write a SQL query using this 'zresult' table that counts the number of zoning variances with 'same' and 'diff' labels.

Answer:
select zgroup, count(*) cases 
  from zresult
 group by zgroup;


ZGROUP       CASES
------- ----------
diff 778
same 854

Question II-1b (10 points): Write a SQL query that counts the number of cases associated with each possible combination of BRARECOM and BOARDDECIS included in ZRESULT. List only those combinations that have at least 10 cases. Include in your query the ZGROUP field and a column that counts the number of cases. Sort the output by this count (in descending order).

Answer:

select brarecom, boarddecis, zgroup, count(*) cases 
  from zresult
 group by brarecom, boarddecis, zgroup
having count(*) >= 10
order by count(*) desc; 

  BRARECOM BOARDDECIS ZGROUP       CASES
---------- ---------- ------- ----------
         2          2 same           586
         1          2 diff           154
         1          1 same           116
         6          2 diff           113
         5          5 same            99
         5          2 diff            92
         2          1 diff            81
         4          2 diff            60
         5          4 diff            39
         2          4 diff            35
         6          5 diff            31
         2          5 diff            31
         6          4 diff            30
         4          4 same            29
         6          1 diff            20
         4          5 diff            16
         1          5 diff            14
         5          1 diff            14
         1          4 diff            13
         6          6 same            13
         7          7 same            10

21 rows selected.

Question II-1c (8 points): Write a SQL query that counts the number of 'same' and 'diff' zoning variances that involve vacant land converted to housing (i.e., EXISTUSE= 10 and PRPSEDUSE=1).

Answer:
select zgroup, count(*) cases
 from zresult z, zoning n
 where z.casenumber = n.casenumber and 
        n.existuse = 10 and n.prpseduse = 1 
 group by zgroup;

ZGROUP       CASES
------- ----------
same           104
diff            86

select d.decision bra_recommend,
d2.decision board_decision, zgroup, count(*) cases
from zresult r, zoning z, decision d, decision d2
where z.casenumber = r.casenumber
and r.brarecom = d.dec_code
and r.boarddecis = d2.dec_code
and existuse = 10 and prpseduse = 1
group by d.decision, d2.decision, zgroup
having count(*) >= 10
order by count(*) desc; BRA_RECOMMEND BOARD_DECISION ZGROUP CASES
------------------------- ------------------------- ------- ----------
APPROVED WITH PROVISIONS APPROVED WITH PROVISIONS same 90
DEFERRED APPROVED WITH PROVISIONS diff 17
DEFERRED DENIED WITHOUT PREJUDICE diff 11
APPROVED WITH PROVISIONS DENIED WITHOUT PREJUDICE diff 10 The second query only lists combinations with at least 10 cases and joins in the 'decision' table as a lookup table (twice as a self-join) in place of the approval codes to increase readability. However, you receive full credit if you list the approval codes instead. Most of the vacant-land-to-housing variances were approved as recommended or deferred but about 8% (of the frequent combinations) were approved despite a BRA recommendation to deny.

Question II-2 (12 points total)

For this question, we will use the table assess_data.sales from lab #5 containing residential property sales information for Boston between 1986 and 1995, together with the table mass_landuse (from lab #2) that explains the meaning of the state land use codes in the sales table.

Question II-2a (5 points): Write a SQL query to compute the average sale price (price)of all East Boston property sales in assess_data.sales that involved residential condos or one to three-family dwellings - that is, sales satisfying the condition: "stateclass in (101, 102, 104, 105)". Include in the output a count of the total number sales with any of those four stateclass codes. (Recall that East Boston parcels begin with '01' indicating Ward #1.)

  select count(*) sale_count, avg(price) avgprice
    from assess_data.sales
   where stateclass in (101, 102, 104, 105)
     and substr(parcel_id,1,2) = '01'
	 
  SALE_COUNT   AVGPRICE 
  ---------- ----------
        2788 95820.4042
  

Question II-2b (7 points): Write another SQL query that computes the average sale price (as in part II-2a) but broken down by the four different stateclass codes considered in part II-2a. Include in the output a count of the number of sales within each of the 4 stateclass code categories. Also, use the mass_landuse lookup table to get the text describing each of the 4 included stateclass codes and include this 'descript' text in the output table of your query.

select s.stateclass, count(*), descript, avg(price) avgprice 
  from assess_data.sales s, mass_landuse m
 where s.stateclass = m.stateclass
   and s.stateclass in (101, 102, 104, 105)
   and substr(parcel_id,1,2) = '01'
 group by s.stateclass, descript
 order by s.stateclass;
STATECLASS   COUNT(*) DESCRIPT                       AVGPRICE
---------- ---------- ---------------------------- ----------
101            360    Single Fam Dwelling          99041.2556
102           1019    Residential Condo            85587.7468
104            403    Two-Fam Dwelling             103514.653
105           1006    Three-Fam Dwelling           101950.414 

Question II-3 (22 points total)

Consider these two SQL queries of the ZONING table used in class and homework:

SELECT count(*) rowcount, count(hringdate) non_null, 
       min(hringdate) early_d, max(hringdate) late_d
  FROM zoning;

    ROWCOUNT   NON_NULL EARLY_D  LATE_D
  ---------- ---------- -------- --------
        1799       1799          19871021
	  
SELECT min(hringdate)early_d FROM zoning
 WHERE hringdate <> ' ';

  EARLY_D
  --------
  19840402  

Question II-3a (5 points): Evidently, the hearing dates are missing for some zoning variance cases and these missing dates have a blank (' ') in the field. Write a SQL query that counts the number of zoning variances for which the hearing date is missing. What is that number?

Answer:

  SELECT count(*) FROM zoning
   WHERE hringdate = ' ';

    COUNT(*)
  ----------
          19

Question II-3b (7 points): The hearing dates are not encoded as a 'date' data type but are encoded as an 8-character string using year/month/day format - that is, yyyymmdd.. Write a SQL query that counts the number of zoning variances with hearing dates occurring during each year.

Answer: 

  SELECT substr(hringdate,1,4) hyear, count(*) cases
    FROM zoning
   WHERE hringdate <> ' '
   GROUP BY substr(hringdate,1,4)
   ORDER by substr(hringdate,1,4);

 HYEA      CASES
 ---- ----------
 1984          2
 1985        575
 1986        870
 1987        333 

Question II-3c (10 points): Write one or more SQL queries in order to determine the fraction of zoning variances (with known board decision) that were approved out of those zoning variances with hearing dates during 1985 and 1986. Your final query should display, for each of the two years, the count of zoning variances with known board decision, the number approved and the fraction (or percent) approved. (Consider 'approved' to mean boarddecis = 1 or 2).

Answer:

denominator count (total known cases by year):
-----------
  CREATE VIEW v_denom1 as
  SELECT substr(hringdate,1,4) hyear, count(*) totcases 
    FROM zoning
   WHERE (substr(hringdate,1,4) = '1985' OR substr(hringdate,1,4) = '1986')
     AND boarddecis > 0
   GROUP BY substr(hringdate,1,4);

numerator count (approved cases by year):
---------
  CREATE VIEW v_num1 as
  SELECT substr(hringdate,1,4) hyear, count(*) appcases
    FROM zoning
   WHERE (substr(hringdate,1,4) = '1985' OR substr(hringdate,1,4) = '1986')
     AND (boarddecis = 1 OR boarddecis = 2)
   GROUP BY substr(hringdate,1,4);

fraction approved:
--------

  SELECT t.hyear, appcases, totcases, 100.0*(appcases/totcases) pct_approved
    FROM v_num1 a, v_denom1 t
   WHERE a.hyear = t.hyear AND (t.hyear = '1985' OR t.hyear = '1986')
   GROUP BY t.hyear, appcases, totcases
   ORDER BY t.hyear; 

   HYEA   APPCASES   TOTCASES PCT_APPROVED
  ---- ---------- ---------- ------------
  1985        447        575   77.7391304
  1986        665        851   78.1433608

OR, a combined SQL query:
----------------------
  SELECT t.hyear, appcases, totcases, 100.0*(appcases/totcases) pct_approved
    FROM (SELECT substr(hringdate,1,4) hyear, count(*) appcases
            FROM zoning
           WHERE boarddecis = 1 OR boarddecis = 2
           GROUP BY substr(hringdate,1,4)) a,
         (SELECT substr(hringdate,1,4) hyear, count(*) totcases
            FROM zoning
           WHERE boarddecis > 0 
           GROUP BY substr(hringdate,1,4)) t
   WHERE a.hyear = t.hyear AND (t.hyear = '1985' OR t.hyear = '1986')
   GROUP BY t.hyear, appcases, totcases
   ORDER BY t.hyear;
 
   HYEA   APPCASES   TOTCASES PCT_APPROVED
  ---- ---------- ---------- ------------
  1985        447        575   77.7391304
  1986        665        851   78.1433608

Exam Turnin Checklist


Home | Syllabus | Lectures | Labs | CRN | 37-312 lab | MIT
Last modified: 12 March 2008 [jf]