MIT 11.521/11.523 Lab 3

Massachusetts Institute of Technology - Department of Urban Studies and Planning

11.521	Spatial Database Management and Advanced Geographic Information Systems
11.523	Fundamentals of Spatial Database Management

Lab 3: More SQL Exercises + Problem Set A
The Zoning Variance Database & Categorization via Lookup Tables

This Lab exercise #3 was posted on February 18 and will be the lab exercise for February 23. The first part of the lab is due two days later on Thursday, February 25. The rest of the lab exercise is called Problem Set A and is due (after the in-class test) on Thursday, March 11.

Introduction

This lab has three main purposes: (1) to improve our capacity for 'relational thinking' and SQL query construction using a Zoning Variance Database of property-related transactions recording the characteristics of some 1800 zoning variances filed during the mid 1980s building boom in Boston, (2) to learn how relational thinking and database support can help you accumulate local knowledge in ways that can facilitate the interpretion and analysis of large, shared 'read-only' datasets, and (3) to prepare for further analysis of East Boston land use and land value patterns by preparing a block-level basemap and constructing SQL statements that compute land value indicators..

The zoning variance database contains information collected by Clark Broida (MCP '87) about all Boston zoning variance applications from early 1984 through mid 1987. Zoning variances request permission for landuses that deviate from the established zoning and landuse regulations. Much of the development and rehabilitation that occurred in Boston during the 1980s required a zoning variance. Clark encoded these data while working as an MCP intern with the Boston Redevelopment Authority (BRA) in order to help understand typical zoning variance request patterns and to inform zoning changes. His MCP thesis (C. Broida, June, 1987) analyzed these data and the zoning review process. We will use the zoning database to practice SQL query construction and to examine the spatial pattern of (not)approved zoning variances. We will also use the zoning variance database to study a more sophisticated use of SQL's 'group by' capabilities. By creating local 'lookup' tables, we can categorize and reinterpret large, shared, 'read-only' datasets without needing 'write' access to the detailed, 'official' datasets.

We are already familiar with the East Boston parcel dataset. A set of class notes relevant to the zoning variance data related lectures and examples are available online. These notes are grouped under a 'ZONING Database' section in the SQL Notes and they include:

Zoning Variances	Schema of ZONING table (and related lookup tables with codes for Decision, Use, NSA, and Neighbrhd)
1980 Census data (by Boston NSA)	Schema of 1980 Boston Census data (grouped by neighborhood statistical area, NSA)
SQL examples using zoning variances	Annotated SQL queries of ZONING table
Grouping zoning applicants via 'lookup' tables	Annotated SQL queries illustrating use of lookup tables to categorize ownership of properties seeking zoning variances
Zoning Variance Database Evolution Chart	Stages of evolution of the ZONING variance database

The ZONING table and associated lookup tables are in our Oracle database and accessible to everyone in the class. In addition, the same tables are available as an MS-Access database in the class data locker: http://mit.edu/11.521/data/11.521_lab3.mdb.

PART 1: SQL Queries Exploring the Zoning Variance database

Logon to SQL*Plus with your own Oracle account in order to construct the SQL statements requested in this section of the lab exercise. Also, copy the MS-Access database in http://mit.edu/11.521/data/11.521_lab3.mdb into writeable (temporary) space on your local disk such as C:\USERTEMP. If you want to save queries you construct in the MS-Access database, remember to copy the database back to your Athena locker before you logout!

Review the sample SQL zoning queries in the class notes. Be sure that you understand the encoding of zoning variance information -- especially existing and proposed landuse, the handling of missing values, how particular code violations are encoded, and how the Board recommendations and decisions are encoded. Review -- and rerun -- the sample queries to get a feel for the data -- which types of code violations were typical, which ones were approved, how did they spread across neighborhoods, etc. The SQL queries in the examples are formatted for Oracle, but they will also work in the MS-Access database, 11.521_lab3.mdb, with minimal changes. To conform to the MS-Access syntax, for example, you will need to insert the 'as' keyword to rename a column. So a 'select' statement might be: "select count(*) AS rowcount from...". You may also need to examine the SQL statement in an editor before you paste it into MS-Access to be sure that it does not contain hidden characters (such as &nbsp or extra line feeds that might be invisible on a web page). Older versions of MS-Access may also complain if there are leading blanks in SQL statement lines that have been formatted for easy reading. Note also that the sample SQL statements specify table joins via constraints in the 'where' clause whereas the default MS-Access syntax involves 'inner join' statements in the 'from' clause. Nevertheless, MS-Access will correctly process the sample SQL statements. However, the Design view will not show the link between joined tables. Do you understand why?

In constructing your queries, pay attention to how you handle 'missing values'. When computing the fraction of variances approved, for example, be careful not to count variances for which the outcome is not known. You may prepare your SQL queries and output either in Oracle (using SQL*Plus) or in MS-Access.. For all questions, show your SQL statements and their resulting output.

[*]Question I-1 (do for lab #3). A few of the sample queries use the AVG functions (and a little arithmetic) to compute the percentage of variances that meet various conditions. Which are the most common yard violations: front, side, rear yard or setback violations? What fraction of all zoning variances (with known sideyard conditions) involved sideyard violations?

[*]Question I-2 (do for lab #3): List the case number, sub-neighborhood, existzonin, estimated cost, board recommendation (brarecom), board decision, offstreet parking indicator (#101), floor-area-ratio indicator (far151), and sideyard indicator (syard191) for all variances proposing to convert vacant land to housing in CHARLESTOWN (this is a neighborhood, not a sub-neighborhood). Use the column formatting commands to adjust column widths so the queries print on an 80-column page (or a Portraint-formatted page of output). Use the name ' CHARLESTOWN' in your query, not the numeric code that corresponds to this neighborhood. Sort by the case numbers in ascending order.

Question I-3: Fix the following SQL query so that it counts the proposed uses of zoning variances where the existing use is vacant land (code 10).

    SELECT PRPSEDUSE, LANDUSE, COUNT(*)
      FROM ZONING Z, USE U
     WHERE Z.PRPSEDUSE = U.USE_CODE
     GROUP BY PRPSEDUSE, LANDUSE
    HAVING EXISTUSE = 10;

Question I-4: Write a query that returns the name of the neighborhood (neighbrhd) along with the count of cases that propose to change the land use from vacant land to commercial. Sort by the count of cases in descending order, then the neighborhood names in ascending order. Use the names of the land uses ('COMMERCIAL', 'VACANT LAND') rather than numeric codes in your query.

Question I-5: Next, we want to compute the percentage of variances, by neighborhood, that proposed to convert vacant land to housing. This sounds easy, but isn't because it's hard to compute the numerator and denominator of this fraction in the same query. It is possible to do this in a single, complex SQL statement (that we will show you later on). However, it is less elegant but more easily understood to do this in three steps. First, generate a table (or, better yet, a view) for the denominator -- i.e., a table that counts (by zoning.neighbrhd) all zoning variances (with kown existing/proposed use). Next, compute the numerator -- i.e., a table (or view) showing the counts (by neighborhood) of those vacant-to-housing variances. Now combine these two temporary tables (or views) using a third SQL query to show the total and filtered counts (for each neighborhood) in side-by-side columns, along with their ratio (the desired fraction, expressed as a percent).

Include in your lab assignment all three SQL statements and the final result showing the total and filtered counts (and percents). Sort your results by the neighborhood names. Include both the name and number of the neighborhood in your results. (It is possible but hard to use SQL to generate such a table all in one step. We'll show you a way later on. Think about why this is the case.)

Question I-6: Now let's look at the percentage of variances (with a known outcome) that were approved. For all variances with a known BRARECOM and known BOARDDECIS, determine the number of variances and the percentage approved for each neighborhood (neighbrhd). Consider 'approved' to mean codes 1 and 2 (i.e., approved as submitted or approved with provisions).

Question I-7extra: Optional (just for fun): Here's another question involving percentages. Compute by neighborhood (neighbrhd), the percent of variances for which the existing land use is 'housing' and the proposed use is 'housing'. Include the neighborhood name (via the lookup table) in your results, and sort them by the neighborhood name.

Question I-8extra: Optional (material not yet covered): Repeat the query in I-4 above, only show all the neighborhoods, regardless of whether they had a qualifying case or not (i.e., the count should be zero for any neighborhoods that did not show up in the last query). Note: This query involves multiple SQL queries or a complex SQL query using techniques (outer joins and SQL statements in the 'from' clause) beyond what we have demonstrated in class; that's why it's an extra credit question. Don't spend a lot of time on it -- just understand why it isn't so easy to get the zero-case rows added to the table in one easy step (and we will explain how to do this later in the semester).

Question I-9extra: Optional (just for fun): Use the 1980 census table (aggregated to the 64 sub-neighborhood statistical areas, NSA) to determine which of the NSAs are above average in income. (First, think about and explain how the 'average' income that you choose to use here. This is not a trick question with a precise answer. There are several plausible ways to compute the 'average' (depending on whether you weight the NSAs) and you should think about which one to choose.) Next, develop SQL queries to determine the fraction of housing-to-housing variances (with a known decision) that were approved for these two groups of sub-neighborhoods: (a) those with above average income, and (b) those with below average income.

PART 2: Using Lookup Tables to Update and Categorize Read-Only Databases

In this part, you will categorize the zoning variances based on the type of land ownership of the parcels. We began discussing good and bad ways of doing this in class, and we have extensive online notes about the concepts and queries needed for the suggested strategy: "Grouping Zoning Applicants via Lookup Tables". Be sure you understand the concept behind using lookup tables and 'group by' queries to create and use your customized categorizations of variance without ever needing to change anything in the 'official' zoning table. Please also note the class reading that develops these ideas further using, as a context, the problem of correcting 'spelling errors' in owner names recorded in the official Boston parcel database: Ferreira, Joseph Jr., "Information Technologies that Change Relationships between Low-Income Communities and the Public and Non-profit Agencies that Serve Them," Chapter 7 in MIT Press Book from Spring 1996 DUSP colloquium on High Technology and Low-Income Communities,"High Technology and Low-Income Communities: Prospects for the Positive Use of Advanced Information Technology," edited by Donald A. Schön, Bish Sanyal, and William J. Mitchell, MIT Press, 1997. This Chapter 7 paper is online at: http://mit.edu/11.521/papers/techcity_7ferreira.pdf

[*]Question II-1 (do for lab #3): Using the lookup table technique, develop a new table that categorizes properties and property owners in the zoning database into groups that are some combination of public/institutional/individual ownership and residential/commercial/other land use. Don't try to be all-inclusive but focus on 4 or 5 major categories (with a good number of cases in each) and lump the rest together. There are many possibile groupings. Pick one with an eye towrd expecting different approval/denial rates across different categories. Provide an annotated list of all the SQL statements used to create your lookup table, initialize the values in the lookup table and implement one update 'rule' similar to this one (but using your table names and categories):

UPDATE apptype 
   SET agroup = 'non-housing-corp'
 WHERE (applicant is null or applicant LIKE '%TRUST%') and existuse <> 1;

Question II-2: Develop half a dozen additional update 'rules' based on owner name and/or land use code (either EXISTUSE or EXISTZONIN). Don't spend a lot of time building 'updates' to catch every last corporation, trust, or public/private land use. Once you have set up your lookup tables, settled on your categories, and made sure they properly handle your first update rule developed in the previous questions, go on to Part 3 of the lab. You can come back to this part later after you in you have finished Part 3 and all the other homework questions.

Question II-3: For each of your 5 ownership categories in II-1 above, prepare a summary table showing: the total number of variances with a known proposed use, the total number of variances with 'housing' as the proposed use, and the percentage of variances that involved a proposed use of housing. Sort by the name of your categories. Briefly indicate whether the results suggest any pattern that might be worth looking into.

Question II-3extra (optional: If you have extra time, try using 'group by' commands as illustrated in latter sections of the "Grouping Zoning Applicants via Lookup Tables" notes to produce frequency distributions of a few other zoning violations -- broken down by the ownership categories you created above. See if you can find categories of land ownership for which the rate of zoning variance approval is significantly different.

PART 3: Examining Block-Level Land Value patterns in East Boston

In this part, we return to the East Boston example that we considered in Lab #2. You will need the shapefiles, MS-Access database, and ArcMap document that you used in Lab #2. The original read-only copy is in the 'eboston05' folder of the class data locker: http://mit.edu/11.521/data/eboston05. In addition, we have given you 'select' rights to two Oracle tables containing the East Boston assessing data (BOS05EB) and the attribute table of ebos_parcels05.shp (EBOS_PARCELS05). You can develop your queries either in MS-Access or in Oracle. We want to use the East Boston parcel and assessing data to develop land value measures. We might expect land value per square foot to be a good measure - that is, the assessed land value (FY2005_LAND) divided by LOTSIZE. However, a little experimentation suggests that this measure is too volatile and that there are too many problematic or special cases as a result of missing values, inconsistent evaluation of tax exempt property, differences by land use category, etc. In order to address these complications, we will first 'dissolve' the East Boston parcel map up to the block level and then compute land value per square foot at the block level for subsets of all the parcels (based on land use and zoning).

[*]Question III-1 (do for lab #3): In this step, we will dissolve the East Boston parcel map to prepare a block-outline map. Open the ArcMap document that you saved at the end of your Lab #2 exercise. The WPB field in the attribute table of the ebos_parcels05 shapefile contains the ward-precinct-block number for the city block containing each parcel. Use the 'dissolve' command in ArcToolbox to erase the parcel boundaries within each block so you end up with a new shapefile containing only the block outlines and having the correct ward-precinct-block (WPB) identifier associated with each block. You will find the 'dissolve' command under Data-Management-Tools/Generalization/Dissolve within ArcToolbox. Choose only the WPB attribute to carry over into your new shapefile. That way, all parcel boundary arcs that have the same WPB value on each side of arc will be eliminated and you will end up with block boundaries. Remember to 'clear selected features' before dissolving. Also, beware that the original East Boston parcel layer (ebos_parcel205.shp) is saved on disk with Mass State Plane NAD83-feet coordinates. However, the ArcMap document we developed in Lab #2 added the Mass Towns layer (matowns00.shp) first and that layer uses Mass State Plane NAD83-meters. To be consistent, you will want to create the block layer (ebos_blocks05.shp) using the NAD83-feet coordinate system. You can be sure of doing this by running the 'dissolve' when the original ebos_parcels05 is the only layer that you have added to a new, empty data frame.

[*]Question III-2 (do for lab #3): Later on, it will be useful to know the area of each block. You will need to add a new attribute field (call it BlockArea to represent area in acres) which you can calculate using a few lines of visual basic script. (Although the shapefile attribute table does not already include an area field, ArcMap automatically computes areas for all polygons when shapefiles are added to ArcMap. The VBA script pulls these values from memory. Recent versions of ArcMap provide a 'calculate geometry' option that provides a wizard for calculating area without the need for VBA script). Here are the steps you need to calculate the area of each block using VBA script. (While the 'calculate geometry' wizard makes it easier to calculate areas, the VBA scripting capabilities are a more general and powerful way to access many ArcGIS tools. Hence, we choose to illustrate the use of VBA scripts for our example):

Open the attribute table of ebos_parcels05, and select the Option > Add Field menu from the bottom of the attribute table window and add the following field to the table:

BlockArea (Type: Float with 'precision' of 12 digits and 'scale' of 3 decimal places)

Click on the "BlockArea" field with the right mouse button.

Select Calculate Values... (this is where the 'calculate geometry' choice will avoid the need to use a VBA script)
Check the Advanced box
Cut and paste the following VBA statement into the VBA Script text box
This script retrieves the area of each block from the in-memory pArea property of the shapefile and puts into into the variable named dblArea within your script. Now, set convert dblArea (which is in square feet) to acres so we have

BlockArea = dblArea/43559.46

Field Calculator with Advanced Button checked

Right-click the BlockArea field once again. This time, use ArcMap's 'statistics' choice to compute the number of blocks in East Boston and the mean and standard deviation of the acreage of these blocks.

[*]Question III-3 (do for lab #3): In this step, we will prepare our land value measures and examine them in Excel. At the start of Part 3, we suggested that computing land value per acre for each parcel results in a measure that is too volatile with too many special cases. Let's take a look at the numbers. Design a query (in MS-Access or Oracle) that computes land value per acre (LVAL_PER_ACRE) for all parcels with FY2005_LAND >0 and LOTSIZE > 0. How many parcels met this criteria? What is the mean and standard deviation of the land value per acre for these parcels?

Use the Data/Import-External-Data/New-Database-Query option in Excel to pull your query results into Excel. Include in your table these fields from ebos_parcels05: pid_long, WPB, fy2005_land, lotsize, and lval_per_acre. Prepare a scattergram of fy2005_land by lotsize.

Now modify your query to compute land value per acre only for triple-deckers (3-family homes). Pull this table into Excel and plot fy2005_land vs. lotsize. Do you see any pattern? What do you think is going on?

Question III-4: Now let's compute land value per acre at the block level. To keep this step short while accounting for land use and zoning differences, let us focus only on triple-decker housing. Write a query, q_block_r3value, to compute sum(fy2005_land) / sum(lotsize) for those parcels on each block where the land use is triple decker housing and both the numerator and denominator are > 0. Write another query, q_block_r3share, to determine the fraction of each block's area that has triple decker land use and count the number of triple-decker parcels with a computed land value per acre. Finally, write a query, q_r3smoothvalue, that will associate with each triple-decker parcel (having known land value per acre) the average land value per acre for its block as long as the parcel is in a block with at least 25% of the block parcels being triple-decker and the block contains at least 4 triple-decker parcels with known land value per acre. Be sure to do this only those parcels that are triple-deckers with known land value per acre. Include in your assignment your SQL statements and these descriptive statistics: the number of blocks meeting the 25% and 4+ constraints; and the mean and standard deviation of land value per acre for those blocks.

Question III-4: In this step, we want to create a thematic map of our land value per acre measure computed at the block level in the previous question for triple deckers. Start by adding your queries or views (from MS-Access or Oracle) to your ArcMap window, joining the table to ebos_parcel05 and creating the thematic map. Then, pull the building footprint layer and shaded parcel map into ArcScene. Extrude the building footprints to the estimated building heights and make the buildings partially transparent so that it is easier to see the land value shading. Also, turn off the outline of the parcel boundaries so they don't clutter up the map. Turn in a screen shot from ArcScene that is zoomed in more or less as in the previous lab. Explain briefly whether you see any clear spatial pattern to the land values of triple deckers in East Boston?

Notes regarding Oracle access to East Boston data:

Several adjustments were necessary to upload the East Boston data from MS-Access into Oracle in a suitable manner. You do not have to repeat any of these steps but they are recorded here since they are common problems in moving data across applications.

As we indicated in an earlier class, the original field names in the BOS05EB assessing table had three fields names that had to be changed because the first 10 characters were the same. Also, there were some hard-to-see blank spaces in several field names. The bos05eb_lab2.mdb database retained one of the field names with blanks, 'fy2005_ land". The Oracle tables have the same field names as the MS-Access tables except that I removed that space in 'fy2005_land'..
When exporting data from MS-Access to Oracle, I capitalized the table names since the Oracle default is that internal table names are capitalized. If I had not done this, you would get a 'table does not exist' error from: Oracle with the statement: SELECT * FROM bos05eb; but would get the rows from either of these two statements: SELECT * FROM "bos05eb"; or SELECT * FROM BOS05EB;
A straight export into Oracle produces problematic data types for most MS-Access fields whose data type is 'double' and for text strings without a specified length. Oracle ended up with the data type, varchar(255), which is a variable length string of up to 255 characters. This is a nuisance since it yields wide printout columns and requires string-to-numeric conversion for fields like lotsize. I had to use the design tool in MS-Access to change the data types for selected fields into long integers or decimals with specified precision and scale. You can run the command, 'describe bos05eb' in Oracle to see the data types that I chose.
I uploaded the two East Boston tables into my Oracle account (jf) and called them BOS05EB_LAB3 and EBOS_PARCELS05_LAB3. Then I ran the some SQL statements to grant all other authorized users permission to 'select' from the table (that is, use it as read-only). Finally, created a public synonym so others can see the two tables as BOS05EB and EBOS_PARCELS05. Here are the SQL statements:

grant select on "EBOS_PARCELS05_LAB3" to public;
grant select on "BOS05EB_LAB3" to public;
create public synonym bos05eb for jf.bos05eb_lab3;
create public synonym ebos_parcels05 for jf.ebos_parcels05_lab3;

What to Turn in

The five Questions that have [*] preceding the Question number are the parts that constitute Lab Exercise #3 - turn in your answers to these questions before the start of class on Thursday, February 25. The rest of the questions constitute Problem Set A and are due at the start of class on Thursday, March 4.

Upload your answers to the homework section of the 11.521 Stellar site at: https://stellar.mit.edu/S/course/11/sp10/11.521/
We need your answers (including SQL statements) in digital format.

For more information about this page, please contact the 11.521 Staff Mailing List <11.521staff@MIT.EDU>.
Original Lab Created: November 1998 [jf], modified: 20 Feb 2008 [thg, jinhua, jf]