11.521 |
Spatial Database Management and
Advanced Geographic Information Systems
|
11.523 |
Fundamentals of Spatial Database
Management
|
The exam starts at 4:05 PM in the Room W31-301 Computing Lab. It is intended to be finished by 5:30 PM but you can continue until 7:00 PM so that there is less time pressure. All the tables needed for the exam questions are accessible in Oracle via SQL*Plus using your personal Oracle account that we have been using for lab exercises and homework. These tables are also available in a personal geodatabase that can be opened using MS-Access (as well as ArcMap).
The exam is worth a total of 100 points.
An effective way for you to prepare your exam answers is to use the same familiar methods you have used for the lab assignments and homework. Create an ASCII text or HTML or Word file in your Athena locker to contain your answers. Name your exam username_exam.txt if you use text format or username_exam.html if you use HTML format. (An MS-Word formatted document is also okay.) For your own security, we suggest that you save it in the "Private" subdirectory of your locker (i.e., the user smith might create the text file on her I:\ drive from a WinAthena PC and name it I:\Private\smith_exam.txt). We strongly recommend that you retain the file containing your answers until we return the graded exams to you.
To turn in your exam, upload it to Stellar: https://stellar.mit.edu/S/course/11/sp16/11.521/homework
Finally, don't spend all your time on one or two questions. Start by looking over all the questions to get a general sense of what, and how much, is being asked, and the grade points associated with each question. Then start work on your answers but move on to the next question if you've spent more than 10 minutes on any one of them.
Before beginning your work, copy the 'exam16_data' folder from the 'data' folder in the class locker into a local drive such as C:\TEMP. All the data that you will need for the exam can be found either in your local copy of this folder or in the Oracle database. Unless specified otherwise, all tables referenced in this document are the names of tables in the Oracle database CRL that we have used for the class. All the queries required for the test can be done either in Oracle (via SQL*Plus) or in MS-Access (in 11.521_exam16_data..mdb). The 'exam16_data' folder contains the following:
Type of Data | File or Table Name | Description |
Shapefile | MA_TOWNS00.SHP | Shapefile of Mass municipalities (which we call Towns). This shapefile is a copy of the same one that we have used in class exercises. |
Shapefile | TAZ2727.SHP | Shapefile of 2,727 Traffic Analysis Zones (TAZ) in the Eastern Mass region included in the MetroFuture regional plan of Boston's Metropolitan Area Planning Council (MAPC). |
Oracle Table | TAZ2727SHP | Attribute table for traffic analysis zone shapefile of TAZ in MetroFuture area. |
Oracle Table | TAZ2727_MFDATA | Current population and housing unit estimates and projections to 2030 from alternative MetroFuture scenarios for 2,727 TAZ . |
Oracle Table | COMTYPE | Community type names and codes used by MAPC to categorize each community included in the MetroFuture scenario modeling. |
Oracle Table | BOS05EB | The same East Boston table of 2005 assessing data that we have used for lab exercises. |
Oracle Table | STCLASS | The same lookup table of State land use codes STCLASS that we have used in class exercises. In the MS-Access database the table is called MASS_LANDUSE and has an additional column, STCLASS_TXT, with the state class code expressed as a text string instead of as an integer. |
Spreadsheet | 11.521_exam16_dictionary.xlsx | Spreadsheet containing data dictionary and code sheets for the above tables. |
MS-Access database | 11.521_exam16_data..mdb | MS-Access database containing the five Oracle tables listed above, TAZ2727, TAZ2727_MFDATA, COMTYPE, BOS05EB, STCLASS. |
For your information, the Oracle tables used in this exam are owned by 'jf' (as in, JF.COMTYPE). These SQL commands are used to create a public synonym (so you do not have to prefix the table name with 'jf.') and then grant read-only (select) permission on the table to the public. For example:
create public synonym comtype for comtype;
grant select on comtype to public;
Question I-1a: The following query is intended to list the Massachusetts municipalities along with their 1980-1990 change in population for those municipalities that have more than ten polygons associated with them in the shapefile. However, the query does not work properly. Briefly explain why and show the corrected SQL query. (Note: the Oracle table jf.matown2000 is the attribute table associated with the ma_towns00.shp shapefile that we have used in lab exercises.)
SELECT town, count(town) tcount, (pop90 - pop80) pop_change FROM jf.matown2000 WHERE count(town) > 10 GROUP BY town ORDER BY tcount desc;There are two problems with the query: (a) the population change computation needs to use aggregation functions since it is not part of the 'group by', and (b) the count(town)>10 condition should be in a 'having clause' rather than the 'where' clause. A correct query is:
SELECT town, count(town) tcount, (min(pop90) - min(pop80)) pop_change FROM jf.matown2000 GROUP BY town HAVING count(town) > 10 ORDER BY tcount desc; TOWN TCOUNT POP_CHANGE --------------------- ---------- ---------- BOSTON 19 3564 WESTPORT 17 -9 BARNSTABLE 15 5869 GLOUCESTER 15 624 GOSNOLD 15 40 IPSWICH 12 307 SCITUATE 12 -487 7 rows selected.
Question I-1b: Consider the following query:
select count(*), count(pop80), count(town), count(distinct town) from jf.matown2000;The query returns the counts: 632, 632, 611, and 351 respectively. Explain briefly why all four numbers are not the same and what each number means.
The first two counts of 632 represent respectively the number of rows in the table and the number of rows with a non-null value in the 'pop80' column. The 611 count is the number of rows with a non-null value in the 'Town' column. The 351 count is the number of unique Town names in the 'Town' column.
Note, this result is common whereby numeric fields might have a 'zero' in cases that really should be null but strings, like Town name, will be null for those cases (polygons in the MassGIS shapefile in this case) that are not associated with a particular Town.
Question I-2: Explain briefly what is meant by the phrase 'many-to-many relationship.' How are many-to-many relationships typically handled when designing relational databases? Illustrate your points using one of the database schemas that we have considered in class lectures and exercises. (A verbal explanation is sufficient. No need for an ER diagram or table definitions.)
Many-to-many relationships exist when trying to relate two sets of data where particular records in one set can be associated with more than one row in the other set and vice versa. For instance, in the Websis example, students can take more than one class and any class can have more than one student. Many-to-many relationships are handled by creating two one-to-many relationships through the use of a matching table. For example, a student-class matching table is used with one row for every student-class pairing that occurs. The matching table pairs (as local foreign keys) the primary keys from each of the tables you wish to relate.
Consider the seven illustrative PARCELS tables that we have used in lecture and lab exercises.
Question I-3: What would you suggest as the primary key for the SALES table? Explain briefly. What foreign key in the SALES table would you use when joining that table with the PARCELS table?
For the SALES table, the primary key should be the combination of the parcel identifier (parcelid) and sale date (sdate) columns. The parcelid field in the sales table is the foreign key that matches the primary key parcelid in the parcels table. Note: In practice, a property can - and sometimes does - sell more than once within a single day. So we may want the sale date to be a 'date' field that measures milleseconds past midnight as well as the Julian date. Our toy sales table only includes the day.
In Part II, we will examine population and housing estimates for the metro Boston area that were generated by Boston's Metropolitan Area Planning Council (MAPC) as part of the recent MetroFuture planning process. The estimates were made for several development scenarios including a business-as-usual scenario called "Let It Be" (LIB) and an alternative that became the MetroFuture plan (abbreviated MF). The relevant data are available in the exam data directory (exam16_data) and contain the shapefiles and data tables listed earlier in this exam. Note that the shapefile, TAZ2727, showing the TAZ boundaries does not always line up precisely with Town boundaries in MA_TOWNS00. For questions involving which Town contains which TAZ, rely on the tabular attribute information in TAZ2727. All the data tables are stored in the MS-Access database, 11.521_exam16_data.mdb, and in Oracle tables that you can access from your individual Oracle accounts.
How many Mass Towns are included among the 2727 traffic analysis zones? ___164_____ Show your SQL query to count the number of unique Towns.
select count(distinct town_name) from taz2727shp order by town_name; COUNT(DISTINCTTOWN_NAME) ------------------------ 164 Since MS-Access does not support 'distinct' within a count() function, we can also do this: select count(*) from (select distinct town_name from taz2727shp); COUNT(*) ---------- 164 We could also apply the 'summarize' command within ArcMap to the 'Town_Name' column of taz2727bos shapefile.
Which Town contains the second largest number of traffic analysis zones? ____Cambridge_______ How many TAZ does that Town contain? ___104________ Show your SQL query.
select town_name, count(*) as taz_count from taz2727shp group by town_name having count(*) >50 order by count(*) desc; TOWN_NAME TAZ_COUNT -------------------- ---------- Boston 447 Cambridge 104 Newton 53 Quincy 52 Brockton 52 Note that Boston has four times as many as Cambridge, which has twice as many as the next town, Newton!
Which TAZ is expected to see the largest increase in housing units between 2000 and 2030 according to the LIB scenario? ____625___
How many housing units are projected to be added to this TAZ? ____3458______For this LIB scenario, which Town is this TAZ within? ___Cambridge!!_____
Show your SQL queries.
select taz, lib_hu2030-hu2000 as lib_delta_hu from taz2727_mfdata where mf_hu2030-hu2000 > 2000 order by mf_hu2030-hu2000 desc; TAZ LIB_DELTA_HUThis development occurs near Lechmere Square and the Science Museum at the edge of Cambridge, Somerville, and Boston and along the transportation corridor heading north out of North Station and including the planned Green Line extension into Somerville and Medford. These area is slated for significant mixed use development during this decade.
---------- ------------
625 3458.03
594 1448.53
1847 2372.76
757 129.5
145 666.25
1864 2088.52 select t.taz, town_name from taz2727_mfdata m, taz2727shp t where m.taz = t.taz and lib_hu2030-hu2000 in (select max(lib_hu2030-hu2000) from taz2727_mfdata); TAZ TOWN_NAME ---------- -------------------- 625 Cambridge Note that it was also okay just to look up 625 in the TAZ table to see that it was Cambridge: select * from taz2727shp where taz = 625; TAZ SQMTRTOT TOWN_NAME TOWN_CAPS ---------- ---------- -------------------- ---------- 625 425135.8 Cambridge CAMBRIDGE Note also that this same TAZ gets the most new housing units under the MF scenario: select taz, mf_hu2030-hu2000 as mf_delta_hu from taz2727_mfdata where mf_hu2030-hu2000 > 2000 order by mf_hu2030-hu2000 desc; TAZ MF_DELTA_HU ---------- ----------- 625 3902.71 594 2187.92 1847 2182.73 757 2165.84 145 2117.69 1864 2020
Now, let's examine the TAZ-level differences between scenarios in the estimated housing unit growth. Write a query that computes the difference in expected housing unit growth between the MF and LIB scenarios (from 2000 to 2030) for each TAZ. Call this difference MF_less_LIB. Show your SQL queries.
Which TAZ has the largest value of MF_less_LIB? ___757_____ For that TAZ, what is MF_less_LIB? __2036_________
select taz, (mf_hu2030 - lib_hu2030) as MF_less_LIB from taz2727_mfdata where (mf_hu2030 - lib_hu2030) > 500 order by (mf_hu2030 - lib_hu2030) desc; TAZ MF_LESS_LIB ---------- ----------- 757 2036.34 136 1794.32 20 1509.69 19 1509.69 145 1451.44 172 1257.7 ...Which TAZ has the smallest (most negative) value of MF_less_LIB? ___2715________ For that TAZ, what is MF_less_LIB? __-1054_________
select taz, (mf_hu2030 - lib_hu2030) as MF_less_LIB from taz2727_mfdata where (mf_hu2030 - lib_hu2030) < -500 order by (mf_hu2030 - lib_hu2030) asc; TAZ MF_LESS_LIB ---------- ----------- 2715 -1053.87 1223 -872.14 2233 -779.74 1596 -737.71 1256 -593.77 2149 -531.17 1592 -530.74
Next, let's aggregate these differences across TAZ that are considered suitable for development. The MetroFuture scenario (MF) targeted housing additions toward those communities that were considered to have areas suitable for development (in terms of infrastructure, access, and redevelopment opportunities). The attribute CODA in TAZ2727_MFDATA indicates which TAZ are considered to be suitable areas (CODA=1) for further development. Write a SQL query that reports the net difference in projected housing unit growth (MF_less_LIB) for each CODA group. Show your SQL query.
One way to write this query is to insert your previous answer as a sub-query in the 'from' clause:select coda, count(d.taz) tcount, sum(MF_less_LIB) as delta from taz2727_mfdata t, (select taz, (mf_hu2030 - lib_hu2030) as MF_less_LIB from taz2727_mfdata) d where t.taz=d.taz group by coda order by coda; CODA TCOUNT DELTA ---------- ---------- ---------- 0 909 -64248.07 1 1818 105980.32Alternatively, and even simpler in this case, you can write the query this way:
select coda, count(taz), sum(mf_hu2030 - lib_hu2030) as MF_less_LIB from taz2727_mfdata group by coda order by coda; CODA COUNT(TAZ) MF_LESS_LIB ---------- ---------- ----------- 0 909 -64248.07 1 1818 105980.32We see that the MetroFuture scenario allocated 64 thousand fewer housing units to the 909 TAZ that had CODA=0 TAZ, and 106 thousand more housing units to the 1818 TAZ in CODA=1 areas.
Now, let's aggregate these differences by Town. Write a query that computes the net difference in projected housing unit growth by Town from 2000-2030 between the two scenarios. Call this difference MF_less_LIB_byTown. Show your SQL queries..
Which Town has the largest value of MF_less_LIB_byTown? ___Boston____ For that Town, what is MF_less_LIB_byTown? ___31,588________
Which Town has the smallest (most negative) value of MF_less_LIB_byTown? __Plymouth_________ For that TAZ, what is MF_less_LIB_byTown? ___-1558________
select t.town_name, sum(mf_hu2030 - lib_hu2030) as MF_less_LIB_byTown from taz2727_mfdata d, taz2727shp t where d.taz = t.taz group by t.town_name order by sum(mf_hu2030 - lib_hu2030) desc; TOWN_NAME MF_LESS_LIB_BYTOWN -------------------- ------------------ Boston 31588.05 Somerville 3652.88 Lynn 3061.97 Cambridge 2778.33 Taunton 2498.16 ... Littleton -923.93 Dracut -926.38 Upton -967.24 Wrentham -968.93 Ashland -996.04 Plymouth -1558.37
Redo the previous question, but this time compute MF_less_LIB_byTown as a percentage of the Town's 2000 housing unit count (HU2000). Show your SQL queries and call this percentage difference MF_diff_pct_byTown
Which Town has the largest value of MF_diff_pct_byTown? ___Attleboro________ For that Town, what is MF_diff_pct_byTown? __14.5%_________
Which Town has the smallest (most negative) value of MF_diff_pct_byTown? __Upton_________ For that TAZ, what is MF_diff_pct_byTown? ____-47.4%_______
select t.town_name, 100*(sum(mf_hu2030 - lib_hu2030))/sum(HU2000) as pct_MF_less_LIB_byTown from taz2727_mfdata d, taz2727shp t where d.taz = t.taz and HU2000 > 0 group by t.town_name order by sum(mf_hu2030 - lib_hu2030)/sum(hu2000) desc; TOWN_NAME PCT_MF_LESS_LIB_BYTOWN -------------------- ---------------------- Attleboro 14.438015 Chelsea 12.0911003 Taunton 11.2149694 Boston 10.5025278 Lynn 9.13720868 ... Wrentham -28.481188 Middleton -28.645553 Littleton -31.213851 Mendon -34.047934 Bolton -42.938904 Upton -47.367287The above query is almost right. We need the "HU2000 > 0" criterion in order to avoid dividing by zero. While we know that each town had some population in 2000, it is possible for HU2000 to be zero in one or another TAZ but then to have growth during 2000-2030 in that TAZ. In that case some of the growth might not be counted in the numerator total for that town. To be sure to avoid such a situation, you can use the following syntax which also has the advantage that it works in MS-Access (where a ratio of sums is not allowed). For this query, we determine the net difference by town before computing the percentage so we can then compute a housing total for all towns that had some housing in 2000. In this case, we also converted the Town names to upper case to facilitate matching with the town names in ma_town00.shp. The upper case operator in Oracle is Upper() and in MS-Access is UCase():
select upper(n.town_name) as Town, 100*(MF_less_LIB_byTown / hu2000_bytown) as pct_mf_less_lib_bytown from (select town_name, sum(mf_hu2030 - lib_hu2030) as mf_less_lib_bytown from taz2727_mfdata m, taz2727shp t where t.taz = m.taz group by town_name) n, (select town_name, sum(hu2000) as hu2000_bytown from taz2727_mfdata m, taz2727shp t where t.taz = m.taz group by town_name) d where n.town_name = d.town_name and HU2000_bytown > 0 order by 2 desc; TOWN_NAME PCT_MF_LESS_LIB_BYTOWN -------------------- ---------------------- Attleboro 14.438015 Boston 13.1872378 Chelsea 12.4579408 Somerville 11.576232 Taunton 11.3320934 Lynn 9.13720868 ... Wrentham -28.481188 Middleton -28.645553 Littleton -31.213851 Mendon -34.047934 Bolton -42.938904 Upton -47.367287The sub-queries can be avoided by creating views for the numerator and denominator construction and then writing a third query that divides them. Note that there are a few changes in the results. If you look at the TAZ-level detail, you will find several dozen TAZ with no housing units in 2000 and non-zero growth projected. There are many more TAZ with no housing units whatsoever. These might be industrial areas or transportation hubs like the airport where no housing units exist but they are meaningful traffic analysis zones since they are travel destinations for employment, mode transfer, etc. Note also, that some TAZ have a population but no housing units. These might be dormatories, prisons, hospitals, or other such institutions.
Develop a thematic map of metro Boston Towns that is shaded to indicate the percentage difference, MF_diff_pct_byTown, computed in the previous question. Briefly discuss any pattern that you see and explain briefly your choice of classification method and color scheme to visualize the differences appropriately. (Note, if you had trouble with Question II-1f, then plot MF_less_LIB_byTown (or even HU2000 by town) so you can get partial credit for showing you can pull data into ArcMap for thematic mapping. Also, the town name is listed in upper case in some of the tables but with only leading capital letters in others. The TAZ2727SHP tables in Oracle and MS_Access includes both forms of the town name in order to facilitate table joins.)
On top of the Town-level map, overlay a TAZ level map that highlights only those two TAZ from Question II-1d that had the highest and lowest differences in added housing units, MF_less_LIB. Submit a PDF-formatted map with sufficient readability to see the highlighted TAZ and interpret whatever pattern you see in the Town-level map.
The map below is a good, readable map from submitted student work. Ten quantile categories are used so that about the same number of towns fall in each category and a color ramp is used that highlights high and low deviations from the norm. The most negative areas (new-MF << new-LIB) are primarily the outer suburbs near the second ring road (495) where the let-it-be scenario forecasts much more single-family, large lot housing development. The most positive areas (new-MF >> new-Lib) are the inner city towns and those in sub-centers and along the major transportation corridors. The highlighted TAZ are the one in Lynn (TAZ=757) with the largest value of MF_diff_pct_byTown and the TAZ in Plymouth (TAZ=1592) that had the lowest value.
![]()
This question uses the East Boston assessing data in BOS05EB that we used in labs #2 and #5. This table is available in Oracle as BOS05EB along with the lookup table for the state land use codes, STCLASS. The variable PTYPE in BOS05EB indicates the state land use code and matches the STATECLASS column in STCLASS. The column OWN_OCC in BOS05EB indicates 'Y' if the assessing record is for an owner-occupied unit and 'N" if it is not. We are interested in computing the percentage of assessing records that are for owner-occupied units and we would like to break this percentage down by PTYPE.
Write a SQL query that counts, for each value of PTYPE, the number of non-condo assessing records in BOS05EB that have OWN_OCC equal to 'Y'. Restrict your results to show only those PTYPE values that have at least 20 owner-occupied units associated with them. Show your SQL query.
select ptype, own_occ, count(*) ptotal from bos05eb where ptype is not null and own_occ ='Y' and cm_id is null group by ptype, own_occ having count(*) >= 20 order by ptype; PTY O PTOTAL --- - ---------- 013 Y 34 101 Y 783 104 Y 908 105 Y 1274 111 Y 65
Write one or more queries that counts the number of non-condo, OWN_OCC='Y' and OWN_OCC='N' assessing records for each PTYPE and computes the percentage of cases that are owner-occupied (OWN_OCC='Y') for each PTYPE. Restrict the results to show only those PTYPE values with at least 40 cases. Show your SQL query (or queries).
select y.ptype, (ycount+ncount) as ptotal, ycount, ncount, 100*ycount/(ycount+ncount) as pct_ownocc from (select ptype, count(*) as ycount from bos05eb where own_occ = 'Y'and cm_id is null group by ptype) y, (select ptype, count(*) as ncount from bos05eb where own_occ = 'N'and cm_id is null group by ptype) n where y.ptype = n.ptype and ycount +ncount >= 40 order by y.ptype; PTY PTOTAL YCOUNT NCOUNT PCT_OWNOCC --- ---------- ---------- ---------- ---------- 013 196 34 162 17.3469388 031 64 11 53 17.1875 101 982 783 199 79.7352342 104 1246 908 338 72.8731942 105 2199 1274 925 57.9354252 111 230 65 165 28.2608696 112 48 8 40 16.6666667 This same result can be reached without sub-queries by creating one view for 'Y' and another for 'N' and then combining them into a thrid query: create view v_ylist as select ptype, count(*) as ycount from bos05eb where own_occ = 'Y'and cm_id is null group by ptype; create view v_nlist as select ptype, count(*) as ncount from bos05eb where own_occ = 'N'and cm_id is null group by ptype; select y.ptype, (ycount+ncount) as ptotal, ycount, ncount, 100*ycount/(ycount+ncount) as pct_ownocc from v_ylist y, v_nlist n where y.ptype = n.ptype and ycount +ncount >= 40 order by y.ptype;
Is your name and Athena ID at the top of your exam? If not, you'll lose 2 points!
Is your completed test saved in a plain text (ASCII) file named username_exam.txt or HTML (web page) file named username_exam.html, or MS-Word file name username_exam.doc?
Did you upload your answers to Stellar: https://stellar.mit.edu/S/course/11/sp16/11.521/homework/
Did you confirm with an instructor that your exam was received?