Massachusetts Institute of Technology - Department of Urban Studies and Planning


11.521
Spatial Database Management and Advanced Geographic Information Systems
11.523
Fundamentals of Spatial Database Management

In-Lab Exam with solutions - 20 March 2018


Instructions

This is an open-book, open-note examination. You are free to use the web (especially the class web pages and online SQL documentation) to help you. Like any examination, however, we expect your submission to reflect your work exclusively. Hence any dialogue via any means (e.g., voice, paper, e-mail, IM, texting) with anyone other than the class instructors is prohibited.

The exam starts at 4:05 PM in the Room W31-301 Computing Lab. It is intended to be finished by 6:00 PM but you can continue until 7:00 PM so that there is less time pressure. All the tables needed for the exam questions are accessible in Postgres using your personal account that we have been using for lab exercises and homework. These tables are also available in a personal geodatabase that can be opened using MS-Access (as well as ArcMap).

The exam is worth a total of 100 points.

An effective way for you to prepare your exam answers is to use the same familiar methods you have used for the lab assignments and homework. Create a text, PDF or Word file in your Athena locker to contain your answers. Name your exam username_exam.txt (or *.pdf or *.docx).  For your own security, we suggest that you save the final draft of your work in the "Private" subdirectory of your locker (i.e., the user smith might create the text file on her I:\ drive from a WinAthena PC and name it I:\Private\smith_exam.txt). We strongly recommend that you retain the file containing your SQL queries, and final answers and work products until we return the graded exams to you.

Upload your exam to the homework section on Stellar: https://stellar.mit.edu/S/course/11/sp18/11.521/homework/

Finally, don't spend all your time on one or two questions. Start by looking over all the questions to get a general sense of what, and how much, is being asked, and the grade points associated with each question. Then start work on your answers but move on to the next question if you've spent more than 10 minutes on any one of them.

Good luck!


Data Preparation

Before beginning your work, copy the 'exam18_data' folder from the 'data' folder in the class locker into a local drive such as C:\TEMP. All the data that you will need for the exam can be found either in your local copy of this folder or in the Postgres database. Unless specified otherwise, all tables referenced in this document are the names of tables in the class521 Postgres database on the server that we have used for the class. All the queries required for the test can be done either in Postgres or in MS-Access but a few are easier in Postgres and the examples provided are tested in Postgres.

The datasets for this exam include newly released 2016 Boston data that are available on the city's open data portal: http://data.boston.gov We have downloaded datasets related to Boston parcels, assessed values, and we have extracted data about East Boston for use in this test.  These are the same 2016 Boston data that was used in last year's test, but the questions are different.  All the information needed for the exam is provided in the 'exam18_data' folder containing the following:

Type of Data File or Table Name Description
ArcMap doc ./11.521_exam18_start.mxd ArcMap document, similar to the one we have been using since lab #2, with several shapefiles and web servers pre-loaded that will be a helpful starting point for the test.  The Data Frame views layers in Mass State Plane (mainland) NAD83 (feet) coordinates.
Shapefile ./shapefiles/ebos_buildings02

The familiar East Boston shapefile of building footprints from 2002 that we have used in exercises. The coordinates are stored on disk in Mass State Plane (mainland) NAD83 (feet) coordinates. This layer may be useful for visualization but is not needed for the test.

Shapefile ./shapefiles/ebos_parcels96 The familiar East Boston shapefile of parcel boundaries from 1996. The coordinates are stored on disk in Mass State Plane (mainland) NAD83 (feet) coordinates.
Shapefile ./shapefiles/ebos_parcels16 A new shapefile of East Boston parcel boundaries as of fiscal year 2016. It is similar to the familiar ebos_parcels05 shapefile but stores boundaries on disk in lat-lon coordinates based on WGS84. )Since the shapefile includes projection information, ArcMap converts it on the fly to the Data Frame coordinate system for viewing.)
Shapefile ./shapefiles/ebos_blocks16w A new shapefile of East Boston blocks as of fiscal year 2016.  It is saved on disk in lat-lon coordinates based on WGS84 and converted on the fly to the Data Frame coordinate system for viewing. (Not needed to test but can be helpful for viewing.)
Shapefile ./shapefiles/ebos_blocks96w A new shapefile of East Boston blocks as of fiscal year 1996.  It is saved on disk in lat-lon coordinates based on WGS84 and converted on the fly to the Data Frame coordinate system for viewing. (Not needed to test but can be helpful for viewing.)
Shapefile ./shapefiles/ma_towns00 The familiar boundary shapefile of Massachusetts cities and towns that we have used in lab exercises. The coordinates are stored on disk in Mass State Plane (mainland) NAD83 (meter) coordinates.
Postgres Table boston96.parcel96 The familiar assessing records from 1996 for Boston parcels that we used in class exercises.
Postgres Table
boston96.sales
The familiar sales table of residential real estate transactions during the 1986-1995 period that we have used in class exercises. 
Postgres Table eboston.mass_landuse The same lookup table of Massachusetts land use codes that we have used in lab exercises. Match 'ptype' in assessing tables with 'stclass_txt' in mass_landuse since both are stored as text.
Postgres Table
boston16.bos16eb
The 2016 version of the same type of assessing records for East Boston that we have been using for 2005 data (bos05eb) in class exercises.
Postgres Table
boston16.bad16pids
A list of PID values in the bos16eb assessing tables that are considered 'bad' because they are repeated in more than one record even though the PID field is supposed to be unique in the assessing records. (You will not need this table for the test but it may be useful to help you sort out one-to-many relationships in the datasets.)
Postgres Table
boston16.
xref_parcel16_96
A new cross reference table associating 1996 and 2016 parcels (pid_long) and blocks (wpb) for 6552 East Boston parcels (in 2016).
MS-Access database
./gdb_access/
11.521_exam18.mdb
A Personal Geodatabase containing the six Postgres tables listed above, plus the attribute table (and indices) for 1996 and 2016 East Boston parcels and the boundaries of Massachusetts cities and towns. Note that this personal geodatabase is just an MS-Access database with some extra 'GDB_*' tables that are not needed for this exam.

 


Part I: Writing SQL Queries (39 points)

For Part I, we will use the bos16eb table that contains tax assessing data for real estate in East Boston. It is analogous to the bos05eb table with 2005 assessed values that we have used for class exercises. A few columns are renamed or added but we will let you know if they matter for any questions that we ask in the test.

Remember to prefix table names with schema names if the tables are not in the first schema of your search_path.  (e.g., boston16.bos16eb).   Show your SQL queries as part of your answers throughout the exam. Doing so will enable you. to obtain partial credit in the event your answer is not quite right.

Question I-1. (22 points total) Queries of bos16eb

(I-1a: 2 points): How many assessing records in bos16eb are for triple deckers (ptype='105') in East Boston? ___2142_________

select count(*)  from boston16.bos16eb where ptype='105';

(I-1b: 6 points): For all these triple deckers, compute the average of their assessed values (av_total in US dollars) ___420654_______________, their total living area (living_area, measured in square feet) _____2965_____________, and their  assessed value per square foot _____146____________. 

select count(av_total) ucount,
       round(avg(av_total),0) av_total,
   round(avg(living_area::integer),1) as living_area,
round(avg(av_total/living_area::integer),0) per_sqft
  from boston16.bos16eb
 where ptype='105';
-- '2142','420654','2965.1','146'
-- we also accepted (total value / total living area) which is a little lower (141)

(I-1c: 2 points): How many assessing records in bos16eb are for condos (ptype='102') in East Boston? ___1476_________

(I-1d: 3 points): For all these condo units,  compute the sum of their assessed values (av_total in US dollars) ____266559____________, their total living area (living_area is measured in square feet) ______938____________, and the average assessed value per square foot _____295_____________. 

select count(av_total) ucount, round(sum(av_total),0) av_total,
       round(sum(living_area::integer),1) as living_area,
       round(sum(av_total)/sum(living_area::integer),0) per_sqft
  from boston16.bos16eb
 where ptype='102'
   and av_bldg>0;
-- '1476','393441352','1384335.0','284'
-- The totals are hard to interpret.  Better to compute the averages
-- (sum(av_total)/sum(living_area)) yields $284 per square foot

select count(av_total) ucount, round(avg(av_total),0) av_total,
  round(avg(living_area::integer),1) as living_area,
round(avg(av_total/living_area::integer),0) per_sqft
  from boston16.bos16eb
 where ptype='102'
  and av_bldg>0;
-- '1476','266559','937.9','295'
-- this average is higher indicating that condos in smaller developments are higher priced.

NOTE: Recall that, for non-condos, the 10-digit pid field in bos16eb identifies the ground parcel on which the triple decker is located.  The first 7 digits are the so-called base_id and the last three digits are 000 (except for a small number of parcels that we will avoid in this test).  You can grab the first seven digits with the substring function - e.g., substr(pid,1,7).  For condos, the second column (cm_id) of bos16eb records the ground parcel ID (comprising the base_id concatenated with three zeros '000'), and the first column (pid) records the unique condo unit ID (comprising the base_id concatenated with the three digit unit number).  There is also a separate record with further information about condo ground parcels but that record should have ptype='995' rather than 102 (and will not be needed).

(I-1e: 3 points): How many distinct ground parcels account for all the condo units that you identified in the previous question?  ____245_________


select count(*), count(distinct substr(pid,1,7))
from boston16.bos16eb
where ptype='102';
-- '1476','245'
-- Should also work to count condo ground parcel directly (ptype='995')
select count(distinct pid), count(distinct substr(pid,1,7)) from boston16.bos16eb
where ptype like '995' and cm_id is not null;
-- '256','249'
-- 256 directly (11 more than previous method)
-- but only 249 have unique first 7 digits
-- still have 4 more so there must be a few empty condo ground parcels

We accepted 245, 249, 252, and 256 as correct answers since they differ only because Boston is not consistent in its numbering of parcels.

(I-1f: 6 points): Count how many condo ground parcels have 1, 2, 3, etc. number of condo units on them. Order results by the number of units.

-- find number of units in each condo
select substr(pid,1,7) base_id, count(*) units
  from boston16.bos16eb
 where ptype='102'
 group by substr(pid,1,7)

-- now get count for each number of units
-- select from above (as view) or do sub-query
select units, count(*) condo_count
  from (select substr(pid,1,7) base_id, count(*) units
         from
boston16 .bos16eb
         where ptype='102'
         group by substr(pid,1,7)) a
  group by units order by units;
'1','1'
'2','38'
'3','95'
'4','38'
'5','6'
'6','14'
'7','3'
'8','14'
'9','13'
'10','4'
'11','2'
'12','2'
'13','1'
'14','3'
'16','3'
'18','1'
'21','2'
'24','1'
'30','1'
'34','1'
'45','1'
'216','1'

-- more than a third of the condos (95 of them) have 3 units.

-- we also accepted the results using 'cm_id is not null' after subtracting 1 from row count:
select A.num_condo, count(*)
from (select count(pid)-1 as num_condo, cm_id from boston16.bos16eb
where cm_id is not null group by cm_id) as A
group by A.num_condo order by A.num_condo;

Question I-2. (17 points total): Queries of cross-reference table

We have built a cross-reference table that matches parcels in the 2016 assessing database with parcels in the 1996 assessing database.  The last two fields in the new bos16eb table contain the latitude and longitude of a parcel centroid for each assessing record.  We converted this to a point shapefile and then did a point-in-polygon operation to associate each 2016 parcel with the 1996 parcel (if any) that contained the point.  In those cases where the 2016 parcel centroid did not fall within a 1996 parcel, we computed the distance to the edge of the closest parcel.  (Note: this 'distance' is measured in decimal degrees since we did the computations using lat/lon coordinates.  Hence, the distance is distorted but quite sufficient for our purposes to distinguish close-by versus contained-within parcels.) The resulting cross-reference table is in the boston16 schema and is called xref_parcel16_96 with the following fields:

pid_long16 = 10-character 2016 parcel ID as text
wpb_2016 = 9 character 2016 block identifier as text (nn-nn-nnn)
pid_long96 = 10-character 1996 parcel ID as text
wpb_1996 = 9 character 1996 block identifier as text (nn-nn-nnn)
distance = 0 if 2016 centroid is inside of 1996 parcel, otherwise distance to closest 1996 parcel (in decimal degrees)


(I-2a: 2 points): How many 2016 parcels in xref_parcel16_96 are not inside any of the 1996 parcel boundaries? ____85________
select count(*) 
from boston16.xref_parcel16_96
 where distance > 0;

(I-2b: 3 points): There are 6552 rows in xref_parcel16_96 with no duplicates or null values for pid_long16.  How many unique 1996 parcels are matched with these 2016 parcels in the table?  _____6408___________

select count(*), count(distinct wpb_2016),
count(distinct pid_long16),
  count(distinct pid_long96), count(distinct wpb_1996)
  from boston16.xref_parcel16_96;
-- '6552','194','6552','6408','196'

(I-2c: 2 points): Some of the 1996 parcels are matched to more than one of the 2016 parcels.   How many unique 1996 parcels are matched more than once to a 2016 parcel?  ____86__________

There are no null values in pid_long96 and only 6408 unique 1996 parcels. So, some 1996 parcels must be matched to more than on 2016 parcel.  The next query lists all 86.

(I-2d: 4 points): Write a query that lists all the 1996 parcels that are matched to more than one of the 2016 parcels along with a count of the number of 2016 parcel matches for each such 1996 parcel.

select pid_long96, count(*) pcount
  from boston16.xref_parcel16_96
 group by pid_long96
having count(*)>1
order by count(*) desc;

-- 86 unique 1996 parcels were matched more than once.  Most are matched twice but one (0104126000) is matched 33 times.  When you map them, you will see that they are primarily along the waterfront where large 1996 parcels were subdivided with multi-unit condos built on them.

'0104126000','33'
'0105609000','8'
'0101757000','8'
'0104446000','5'
'0106246000','4'
'0101007000','3'
 ... 86 rows overall;


(I-2e: 3 points): How many distinct 1996 East Boston parcels are not matched to any 2016 parcel in boston16.xref_parcel16_96 ?  ____123_______
(Note: in boston96.parcel96shp the unique ground parcel ID is in parcel_i_1 and, for East Boston, the first two characters are '01'.)

select count(*), count(distinct parcel_i_1)
  from boston96.parcel96shp
 where parcel_i_1 not in
      (select pid_long96 from boston16.xref_parcel16_96)
   and substr(parcel_i_1,1,2)='01';

(I-2f: 3 points): How many distinct 1996 East Boston parcels in xref_parcel16_96 are recorded as triple deckers (stateclass='105') in boston96.parcel96? ______2160___________
(Note: for 1996, the assessing records that record property type are stored in boston96.parcel96.  The 10-character parcel ID is stored in parcel_id and the property type is stored in stateclass.)

select count(*), count(distinct parcel_id), count(distinct substr(parcel_id,1,7))
  from boston96.parcel96
 where parcel_id in
      (select pid_long96 from boston16.xref_parcel16_96)
   and stateclass=105;
-- '2160','2160','2124'
-- note that some 1996 parcels had last 3 digits of parcel ID changed from '000'

Part II: Mapping the mismatched parcels (14 points)

Map the 2016 East Boston parcels and highlight those parcels that are *not* matched to any 1996 parcel (as determined in questions I-2c above).  Zoom in to the more dense, northwest part of East Boston and switch selection or toggle the highlighted 2016 parcels so you can see whether they tend to cluster together.  Turn on the 1996 parcel layer and play with the transparency of the 2016 parcel layer so you can see how they overlay and get a sense of the reasons why (a) some of the 2016 parcel centroids did not fall within a 1996 parcel, and (b) why some of the 1996 parcels were associated with more than one 2016 parcel. 

Question II-1. (10 points ):

Map #1: Turn in a map that displays the 2016 East Boston parcels and highlights those whose centroid is not within a 1996 parcel - that is, the 2016 parcels in xref_parcel16_96 where distance >0.  In addition, display only those 1996 parcels that are matched in xref_parcel16_96 to at least four 2016 parcels.  Adjust zoom level, shading, symbology, and transparencies so you can distinguish these 1996 parcel outlines from the 2016 parcels. 

Here are a couple of example maps that have the correct visualization.

map1 map2

Question II-2. (4 points ):

Explain briefly (a) any pattern that you see regarding the location of the 2016 parcels whose centroid is not within a 1996 parcel, and (b) why some of the 1996 parcel are associated with more than one 2016 parcel.

The 2016 parcels whose centroid is not within a 1996 parcel tend to be on the periphery of East Boston - most likely along the dock areas where new multi-unit housing (mostly condos) has been built during the past few decades.  Several large 1996 parcels along the southwest waterfront have been subdivided into smaller parcels - presumably to support cluster housing and condo construction on what used to be large non-residential lots.

Part III: Concepts and Fixing SQL Queries  (10 points)

Question III-1. (3 points ):

What is the PRIMARY KEY for the Boston ZONING variance table (used in Lab #4)? ____CASENUMBER._____________
Which field in ZONING is the FOREIGN KEY for the Boston NEIGHBRHD table (also used in Lab #4)?
____NEIGHBRHD________________.

Explain briefly a few reasons why it is generally considered good practice to use integer codes linked to lookup
tables for attributes such as owner or town names rather than omit the lookup table and just insert the names in
the main table.

The primary key in the ZONING variance table from Lab #3 is the case number, CASENUMBER. The foreign key for neighborhood in the ZONING table is NEIGHBRHD (which matches the primary key NEI_CODE in the NEIGHBRHD table. In general, it is a bad idea to give a table the same name as one of the columns in a table, but Oracle can keep track of the difference if we do not get confused.

Using a numeric code generally saves storage space compared with using the full name. In addition, using a name invites misspelling errors and invites additional problems when real-world names get changed or abbreviated.

Question III-2. (3 points ):

Consider the following query:

select count(*), 
       count(distinct pid_long16), count(distinct wpb_2016), 
       count(distinct pid_long96), count(distinct wpb_1996)
  from boston16.xref_parcel16_96;
-- '6552','6552','194','6408','196'
Explain briefly what each of these number is counting.

The number of rows in the result; the number of distinct (unique NON-NULL) 2016 parcel identifiers, 2016 blocks, 1996 parcel identifiers, and 1996 blocks. 

Question III-3. (4 points ):

The following query using tables in the toy parcel database (toyparcels) is intended to list
the owners who own more than one parcel together with a count of the number of parcels that they own.  However, the query is incorrect.  Briefly explain why and show your corrected query.

select ownernum, oname, count(parcelid)
  from owners n, parcels p
 where count(parcelid)>1
 group by ownernum, oname;

The two tables are not joined and the count()>1 criterion should be later in a having clause.

select ownernum, oname, count(parcelid)
  from owners n, parcels p
 where n.ownernum=p.onum 
 group by ownernum, oname
having count(parcelid)>1;
18,'JOHN MCCORMACK','5' 32,'GERALD RAPPAPORT','5' 200,'VANDELAY INDUSTRIES','2'

Part IV: Queries of 1986-1995 residential sales (37 points)

The residential sales data for 1986-1995 are stored in boston96.sales and use the same parcel IDs as the 1996 assessing and parcel shapefile tables.  Let's reuse some of our queries from Lab Exercise #5 to find all the triple deckers and condos that sold during the 1986-1995 period.  Then we can use the cross-reference table boston16.xref_parcel16_96 to match them to 2016 parcels (and assessing data) and answer some questions about what has happened to those parcels and what they are worth a few decades later.

In Lab #5, we created a view (public.xxx_holdsales3) listing each East Boston ground parcel that sold during the 1986-1995 decade along with the average sale price and number of sales.   Here is a modification of this query to find the average sale price and sales count for each East Boston ground parcel where the sale record indicates that the parcel is a triple decker (stateclass = 105') and the salecode indicates that the sale was an arm-length transaction (salecode in ('SX', 'W1', 'W2', 'X1')). (Note: this view groups parcels by the first 7 characters of their parcel identifier and then replaces the last three characters with '000'.  This will be sufficient to match ground parcels in the shapefiles with assessing records for almost all of the triple deckers and condos.  The relationship is more complicated for certain other land uses and for a few triple deckers that we we not consider for the purposes of these test questions. In last year's test we provided a bad16pids table with ninety-one 2016-year pid values that were not unique in boston16.bos16eb.  This table is available in boston16.bad16pids but not needed for this year's test.)

create view public.xxx_triplesales_arms as
select substr(parcel_id,1,7)||'000' as parcel_id, base_id,
avg(cast(price as numeric)) as avg_price, count(*) as scount from boston96.sales s, (select distinct substr(parcel_i_1,1,7) as parcelshp from boston96.parcel96shp where substr(parcel_i_1,1,2) = '01') p, boston96.salecode c where p.parcelshp = substr(parcel_id,1,7) and cast(to_char(s.saledate,'YYYY') as integer) > 1985 and s.salecode=c.salecode
and (s.stateclass = 105) and (s.salecode in ('SX', 'W1', 'W2', 'X1')) group by substr(parcel_id,1,7)||'000', base_id;

Question IV-1. (13 points ):

(IV-1a: 4 points): Run this SQL statement to create a view, public.xxx_triplesales_arms (substituting your initials for xxx), and then use it to write a new SQL query that computes the overall average sale price and number of sales for triple decker sales during the decade that were considered to be arms length transactions.

Arms-length triple decker sales:  count:____400 _________, average sale price:_____140109 OR 140335___________.

create view public.jf_triplesales_arms as
select substr(parcel_id,1,7)||'000' as parcel_id,  base_id,
       avg(cast(price as numeric)) as avg_price, count(*) as scount
  from boston96.sales s,
      (select distinct substr(parcel_i_1,1,7) as parcelshp
         from boston96.parcel96shp
         where substr(parcel_i_1,1,2) = '01') p,
      boston96.salecode c
 where p.parcelshp = substr(parcel_id,1,7)
   and cast(to_char(s.saledate,'YYYY') as integer) > 1985
   and s.salecode=c.salecode
   and (s.stateclass = 105)
   and (s.salecode in ('SX', 'W1', 'W2', 'X1'))
  group by substr(parcel_id,1,7)||'000', base_id;

This section (Part IV-1) is particularly tricky because there is a column with number of sales and a column with average price in the view where each row represents a unique parcel. Note that the question asks for the number of sales and overall average sale price from the view. Hence, the sale count should be sum(scount) not count(scount), and the overall average sale price might or might not be computed by weighting each parcel's average sale price by the number of sales on that parcel. The sales-weighted average sale price is sum(avg_price*scount)/sum(scount) rather than avg(avg_price).  Since one could rationalize both the weighted and unweighted average prices, we accepted either average but you lost a point if you counted the number of parcels with sales rather than the total number of sales.

select 'triple_arms', count(scount) num_of_parcels,
       round(avg(avg_price),0) unweighted_avg_price,

       sum(scount) num_of_sales,
       round(sum(avg_price*scount)/sum(scount),0) weighted_avg_price
  from public.jf_triplesales_arms;
-- 'triple_arms','356','140109',
'400','140335'
-- several triples sold more than once and the sales-weighted average price
-- is almost the same as the unweighted average price

(IV-1b: 3 points): Adjust this view to create a new view, public.xxx_triplesales_fcl, for all triple decker sales that involved bankruptcy or foreclosure as determined by sales with (salecode in ('I1', 'L1')).  Show us your SQL for this view and use it to compute the overall average sale price and number of sales for triple decker sales during the decade that were considered to be bankrucpy or foreclosure sales.

Foreclosure triple decker sales: count:_______221________, average sale price:______58415 OR 58258__________.

create view public.jf_triplesales_fcl as
select substr(parcel_id,1,7)||'000' as parcel_id,  base_id,
       avg(cast(price as numeric)) as avg_price, count(*) as scount
  from boston96.sales s,
      (select distinct substr(parcel_i_1,1,7) as parcelshp
         from boston96.parcel96shp
         where substr(parcel_i_1,1,2) = '01') p,
      boston96.salecode c
 where p.parcelshp = substr(parcel_id,1,7)
   and cast(to_char(s.saledate,'YYYY') as integer) > 1985
   and s.salecode=c.salecode
   and (s.stateclass = 105)
   and (s.salecode in ('I1', 'L1'))
  group by substr(parcel_id,1,7)||'000', base_id; 

select 'triple_fcl', count(*) num_of_parcels, sum(scount) num_of_sales,
round(avg(avg_price),0) avg_price,

round(sum(avg_price*scount)/sum(scount),0) wgtd_avg_price
  from public.jf_triplesales_fcl; -- 'triple_fcl','154','221','58415','58258'
-- hmm, much less than half price for bankruptcy and foreclosures

(IV-1c: 6 points): Further adjust the views to create public.xxx_condosales_arms and public.xxx_condosales_fcl that do the same thing for condo sales (stateclass = 102') and then compute their sale counts and average prices.

Arms-length condo unit sales:  count:______483_______, average sale price:____89629 OR 92684__________.

Foreclosure condo unit  sales:  count:_______271_______, average sale price:_____60178 OR 53779_________.


create view public.jf_condosales_arms as
select substr(parcel_id,1,7)||'000' as parcel_id,  base_id,
       avg(cast(price as numeric)) as avg_price, count(*) as scount
  from boston96.sales s,
      (select distinct substr(parcel_i_1,1,7) as parcelshp
         from boston96.parcel96shp
         where substr(parcel_i_1,1,2) = '01') p,
      boston96.salecode c
 where p.parcelshp = substr(parcel_id,1,7)
   and cast(to_char(s.saledate,'YYYY') as integer) > 1985
   and s.salecode=c.salecode
   and (s.stateclass = 102)
   and (s.salecode in ('SX', 'W1', 'W2', 'X1'))
  group by substr(parcel_id,1,7)||'000', base_id; 

select 'condo_arms', count(*)
num_of_parcels,
round(avg(avg_price),0) unweighted_avg_price,
sum(scount) num_sales,
round(sum(avg_price*scount)/sum(scount),0) weighted_avg_price
from public.jf_condosales_arms; -- 'condo_arms','73','89629','483','92684'
-- so, many condo parcels had multiple sales and the ones with
-- more sales tended to be more expensive since the sales-weighted
-- average price is higher than the unweighted average price

create view public.jf_condosales_fcl as select substr(parcel_id,1,7)||'000' as parcel_id, base_id, avg(cast(price as numeric)) as avg_price, count(*) as scount from boston96.sales s, (select distinct substr(parcel_i_1,1,7) as parcelshp from boston96.parcel96shp where substr(parcel_i_1,1,2) = '01') p, boston96.salecode c where p.parcelshp = substr(parcel_id,1,7) and cast(to_char(s.saledate,'YYYY') as integer) > 1985 and s.salecode=c.salecode and (s.stateclass = 102) and (s.salecode in ('I1', 'L1')) group by substr(parcel_id,1,7)||'000', base_id;

select 'condo_arms', count(*) num_of_parcels,
       round(avg(avg_price),0) unweighted_avg_price,
       sum(scount) num_sales,
       round(sum(avg_price*scount)/sum(scount),0) weighted_avg_price
  from public.jf_condosales_fcl;
-- 'condo_arms','54','60178','271','53779'
-- condos that foreclosed sold for one-third less and the sales-weighted
-- average price is less than the unweighted average price indicating that
-- condos with more sales tended to have a lower price for their units

Question IV-2. (15 points ):

(IV-2a: 4 points): During the 1986-1996 decade, a number of triple deckers sold at least once via an arms length transaction and at least once via bankruptcy or foreclosure (as a triple decker). Write a query to count the number of triple decker parcels (parcel_id) that behaved in this way.

select count(*)
  from public.jf_triplesales_arms
 where parcel_id in (select parcel_id from public.jf_triplesales_fcl)
-- 66 triples sold at arms-length and also sold in bankruptcy or foreclosure

-- alternatively,
select count(*), count(distinct parcel_id) from public.jf_triplesales_arms a, public.jf_triplesales_afcl f where a.parcel_id=f.parcel_id;

-- 66 triples sold
at arms-length and also sold in bankruptcy or foreclosure

(IV-2b: 4 points): During the 1986-1995 decade, a few of triple deckers that sold at least once via an arms length transaction, also sold at least once, in another sale, as a condo via either arms length or foreclosure! Write a query to list the parcels (parcel_id) that behaved this way.

select count(*), count(distinct parcel_id) 
  from public.jf_triplesales_arms
 where parcel_id in (select parcel_id from public.jf_condosales_arms)
    or parcel_id in (select parcel_id from public.jf_condosales_fcl);
-- '3','3' different triples sold at arms-length and also sold in bankruptcy or foreclosure

select a.parcel_id, count(distinct c.parcel_id) condo_arms, count(distinct s.parcel_id) condo_fcl from public.jf_triplesales_arms a, public.jf_condosales_arms c, public.jf_condosales_fcl s where a.parcel_id = c.parcel_id or a.parcel_id = s.parcel_id group by a.parcel_id order by a.parcel_id; '0103002000','73','1' '0103121000','73','54' '0104488000','1','54'
-- the 'or' in the c/s join allows extra rows to match; one for each sale of the other type
-- so the 73 and 54 values are bogus, but not needed for the question.

(IV-2c: 4 points): Let's examine the sales transactions of the ground parcel(s) that you identified in the previous query. List the parcel_id, saledate, stateclass, price, seller, buyer, salecode, and condotag (from the boston96.sales table) for each sale involving the ground parcels identified in the previous query.  Order the output by parcel_id and saledate.

select *
  from boston96.sales
 where substr(parcel_id,1,7)in ('0103002','0103121','0103121')
 order by parcel_id, saledate;

'738','0103002000','1987-03-27 00:00:00-05','105','185000','ERSKINE HARRY ET AL','LOCHIATTO ANTHONY ET','W1','0103002','N'
'31377','0103002002','1992-10-20 00:00:00-04','102','99000','LOCHIATTO ANTHONY ET','PROGRESSIVE CONSUMER','L1','0103002','Y'
'31378','0103002004','1992-10-20 00:00:00-04','102','99000','LOCHIATTO ANTHONY ET','PROGRESSIVE CONSUMER','L1','0103002','Y'
'31379','0103002006','1992-10-20 00:00:00-04','102','99000','LOCHIATTO ANTHONY ET','PROGRESSIVE CONSUMER','L1','0103002','Y'
'802','0103121000','1986-02-28 00:00:00-05','105','125000','CAPOBIANCO THOMAS ET','THERIAULT ANNE ET AL','W1','0103121','N'
'31388','0103121002','1987-06-11 00:00:00-04','102','69000','THERIAULT ANNE ET AL','DIANGELIS-MARAGANIS','W1','0103121','Y'
'31389','0103121004','1987-06-11 00:00:00-04','102','70000','THERIAULT ANNE ET AL','HORTON ANDREA','W1','0103121','Y'
'31390','0103121006','1987-10-26 00:00:00-05','102','82425','114 TRENTON ST DEVEL','HUBBARD DIANE LYNN','W1','0103121','Y'
'31391','0103121006','1992-07-10 00:00:00-04','102','78500','HUBBARD DIANE LYNN','FEDERAL HOME LOAN MG','L1','0103121','Y'
'31392','0103121006','1993-01-07 00:00:00-05','102','21000','FEDERAL HOME LOAN MT','MURRAY JOHN B ET AL','SX','0103121','Y'

(IV-2d: 3 points):  Briefly interpret the sequence of sales that you observe for these triple deckers in the previous question.  Comment on when they sold as a triple decker, when they sold as condos, and when the sale involved a foreclosure.

They sold in 1986 or 1987 as a triple decker in arms-length transactions and were each converted to three condo units.  All subsequent sales were of condo units.  The buyer of the first parcel owned all three condo units but lost them in foreclosure in 1992.  The second parcel was bought by one person (Theriault Anne et al) in 1986, converted to condos, and sold as three separate condo units in 1987 in arms-length transactions.  One of these unit was sold via foreclosure in 1993 (for 21000, far less than the 82k and 78k prices for the other two units). 


Question IV-3. (9 points ):

Now, let's use the cross-reference table, boston16.xref_parcel16_96,  to compare sales prices for East Boston parcels during 1986-1995 with their assessed values in 2016.  In question III-1 we have already computed the number of 1986-1995 sales and average sale price for all triple deckers and condos, broken down by arms-length sales and bankruptcy or foreclosure sales.  In question III-1a, you created a view, public.xxx_triplesales_arms, that listed all the triple decker parcels that sold via arms-length transactions during 1986-1995.  We can use the cross-reference table, boston16.xref_parcel16_96, to determine the corresponding 2016 ground parcels for these triple deckers with arms-length sales and compute the average of their 2016 assessed values (av_total) in boston16.bos16eb.-`

(IV-3a: 3 points): How many of the East Boston triple decker ground parcels in public.xxx_triplesales_arms, matched ground parcels in the 2016 aasessing records in boston16.bos16eb?  _______356______________

select count(*) from public.jf_triplesales_arms
 where parcel_id in (select pid_long96 from boston16.xref_parcel16_96);
-- 356 of the triple deckers are matched to 2016 parcels

(IV-3b: 3 points): How many of these ground parcels that were triple deckers in public.xxx_triplesales_arms are listed as condos (ptype='102') rather than triple deckers (ptype='105') in the 2016 assessing table? Number of ground parcels converted to condos _______14_________.

select count(*) from public.jf_triplesales_arms
 where parcel_id in (select pid_long96 from boston16.xref_parcel16_96)
   and substr(parcel_id,1,7) in (select substr(pid,1,7) from boston16.bos16eb
                                  where ptype='102');
-- 14 triple deckers from the 356 in the 'arms-length' view are converted to condos     
                            
 

(IV-3c: 3 points): Now let's compute the average of the assessed values of all those ground parcels that appeared in public.xxx_triplesales_arms, are linked in boston16.xref_parcel16_96 to 2016 ground parcels, and are still listed as triple deckers in 2016.  (Note: A few pre-1996 triple deckers were converted to uses other than condos and a few triple decker ground parcels that had only one assessing record in 1996 ended up with more than one assessing record in 2016.  So do not worry if this count plus the count in the previous question do nut equal the number of rows in public.xxx_triplesales_arms.) Number of ground parcel meeting criteria: _____327_________  Average of their 2016 assessed value ____419798________________. 

select count(*) tparcels, round(avg(av_total),0) av_total
  from public.jf_triplesales_arms t, boston16.bos16eb a
 where parcel_id in (select pid_long96 from boston16.xref_parcel16_96)
   and substr(t.parcel_id,1,7) = substr(a.pid,1,7)
   and a.ptype='105';
-- '327','419798'

If you examine the data further, you will find that, whether or not they were sold and/or foreclosed during 1986-1995, the triple deckers and condos appreciated to about the same level by 2016 (for each unit type).  The triple deckers are worth more than individual condo units, but not three times as much, and the assessed value per square meter of living_area is considerably higher for condo units. 


Exam Turnin Checklist

Is your name and Athena ID at the top of your exam? If not, you'll lose a point!

Is your completed test saved in a plain text (ASCII) file named username_exam.txt or HTML (web page) file named username_exam.html, or MS-Word file name username_exam.doc, or PDF file named username_exam.pdf?

Did you upload your answers to Stellar: https://stellar.mit.edu/S/course/11/sp18/11.521/homework/

Did you confirm with the instructor that your exam was received?


Home | Syllabus | Lectures | Labs | CRN | 37-312 lab | MIT


Last modified: 18 March 2018 [jf]