Masachusetts Institute of Technology - Department of Urban Studies and Planning


11.521
Spatial Database Management and Advanced Geographic Information Systems
11.523
Fundamentals of Spatial Database Management

Problem Set B:

Categorizing Parcel Ownership via Lookup Tables


We are releasing Problem Set B on Thursday, February 22, 2018.

This is due by 5PM on Thursday, March 15, 2018 via Stellar.

Purpose:

In this exercise, we will use our newly acquired relational thinking skills to build lookup tables that allow us to re-categorize types of parcel ownership in East Boston.

This problem set uses the ideas from my book chapter on re-interpreting city data to build a lookup table for re-categorizing East Boston parcel ownership. After regrouping zoning variances, we examine some of the characteristics of properties owned by the different groups that we construct.

Needed Info:

Be sure to include your SQL queries when you answer each of the questions. That way, you can get partial credit for getting part way to the answer and I can cut-and-paste them in pgAdmin to test them and offer advice if parts are wrong.


PSet B: Correcting and Re-categorizing East Boston Property Ownership

Using Lookup Tables to Update and Categorize Read-Only Databases

In this problem set, you will adjust and re-categorize the East Boston parcel ownership based on the owner name and the type of land ownership of the parcels. Use the bos05eb assessing table (which is in the Postgres database and also in the MS-Access bos05eb_lab2.mdb database that we used for Lab #2). Note that the owner name is given in the field labeled 'OWNER_FY05' and various land use designations (as discussed in Lab #2) are available in 'PTYPE' and 'LU' fields.

Such re-interpretation of the original Boston data may be necessary for at least two reasons: (1) Owners of more than one parcel might have their names spelled in different ways. Such differences will complicate any effort to identify common ownership and influential landowners. For institutionally-owned parcels, this issue can be quite important. For example, Boston University owns land in Boston under more than a dozen different spellings! (2) When studying land use and ownership patterns in a neighborhood, we may wish to group ownership into categories other than those used by the city. In each case, we would like to re-interpret the city data without changing our official copy of the city dataset. Please also note the class reading that develops these ideas further: Ferreira, Joseph Jr., "Information Technologies that Change Relationships between Low-Income Communities and the Public and Non-profit Agencies that Serve Them,"  Chapter 7 in MIT Press Book,"High Technology and Low-Income Communities:  Prospects for the Positive Use of Advanced Information Technology," edited by Donald A. Schön, Bish Sanyal, and William J. Mitchell, MIT Press, 1997. An earlier version of this Chapter 7 paper is online at: http://mit.edu/11.521/papers/techcity_7ferreira.pdf

We have begun discussing good and bad ways of re-categorizing a dataset in class, and we have extensive online notes about the concepts and queries that are needed. For example, the book chapter has illustrations using Boston parcel data. In addition, here are some notes and queries that apply the ideas to re-group the zoning variances: "Grouping Zoning Applicants via Lookup Tables". Be sure to read the book chapter for the basic concepts and then be sure you understand the relational query approach using the lookup tables and 'group by' queries to create and use your customized categorizations of variance (without ever needing to change anything in the 'official' zoning table). Note also that there are slight differences in the queries for Boston parcels and for the zoning variances depending on how various columns can be used to distinguish owners and whether we also want to include land use categories in the re-grouping. Your assignment is similar to the Boston parcel case in the book chapter, but not quite the same since you will be grouping by owner-type plus land use type (and not just owner type).

Question 1:  Using the lookup table technique, develop a new table that categorizes properties and property owners of East Boston parcels into groups that are some combination of public/institutional/individual ownership and residential/commercial/other land use. Don't try to be all-inclusive but focus on 5 or 6 major categories (out of the 3x3=9 possible combinations - and the many ways of dividing the cases into these groupings). Try to pick groups that have a good number of cases in each of your primary re-grouped categories (and then lump all of the rest of the cases into an additional 'other or unknown' category). There are many possible groupings. Pick one with an eye toward having some mix of owner-type and use-type combinations. Provide an annotated list of all the SQL statements used to create your lookup table, initialize the values in the lookup table and implement one update 'rule' similar to this one (which is appropriate for the zoning variance case discussed in lecture, but needs to be changed for use with East Boston parcel data and your owner-type+use-type categorization task).

update apptype 
   set agroup = 'non-housing-corp'
 where (applicant is null or applicant like '%TRUST%') and existuse <> 1;

Question 2: Develop half a dozen additional update 'rules' based on owner name and/or land use code. Don't spend a lot of time building 'updates' to catch every last corporation, trust, or triple-decker vs. apartment land use. Once you have set up your lookup tables, settled on your categories, and made sure they properly handle your first update rule developed in the previous question, go on to Part 3 of this question. You can come back to this part later and tweak your categorization by adjusting and adding to your update rules, and then rerun them to get a new classification - that is one of the benefits of using this approach.

Question 3:  Write a SQL query that tags each East Boston parcel with a dummy variable indicating whether or not that parcel is within a block for which at least two-thirds of the parcels on the block are zoned for residential use. Summarize the results in a table that shows the number of blocks where at least two-thirds of the parcels are residential and the total number of parcels that reside within each two of block.

Question 4:  For each of your 5-6 ownership categories in II-1 above, prepare a summary table showing: the total number of parcels within each of your ownership categories, the total number of triple-decker residential parcels within each of your ownership categories, and the percentage of parcels, in each of your ownership categories, that are within East Boston blocks for which at least two-thirds of the parcels on the block are residential (that is, the parcel tags that you constructed in 3 above). Briefly indicate whether the results suggest any pattern that might be worth looking into. [NOTE: depending on your categorization, the percentage could be 0 or 100% for some of your cases. That is okay - just write queries that would compute the percentage correctly for any grouping strategy.]

Question 3extra (optional):  If you have extra time, try using 'group by' commands as illustrated in latter sections of the "Grouping Zoning Applicants via Lookup Tables" notes to produce frequency distributions of a few other parcel characteristics related to land use, owner occupancy, and the like -- broken down by the ownership categories you created above. See if you can find categories of land ownership for which the rates for these various factors are significantly different. Also, the same queries can be used for all 100k Boston parcels, not just the 7k in East Boston. Try running your queries for all the parcels in Boston using the shapefile, parcel1996.shp, in the class data locker. (This shapefile is the same one used for the book chapter examples.)

Observation: For simple queries, it's easier to use ArcGIS's SQL tools to regroup your parcel or variance records directly -- or to use MS-Access or even to query *.dbf versions of the ZONING database and lookup tables (rather than do the queries in Postgres and bring them over with all the ODBC connection issues). But, suppose you wanted to categorize all parcels in Boston, or metro Boston, and then allow other people to use the institutional/landuse categories of ownership that you identified. In these cases, you would like multi-user access to your SQL queries and it could take a while to develop and accumulate all the SELECT and UPDATE SQL statements that built the cross-reference table. Hence, you'd like to leave all those queries and results on a network database server that could protect private data while letting you reach it quickly from any Athena or PC 'client' when you had a chance to work on it and update the results.


OPTIONAL Question (just for fun...)

Use the groupings explained in the "Grouping Zoning Applicants via Lookup Tables" and/or your grouping of East Boston parcel ownership to check for differences in zoning variances or other parcel characteristics across groups. For example, for all those variances whose board decision is known, what is the approval rate (code 1 or 2) broken down by your institutional/landuse categories from Lab #3? What fraction are in high income neighborhoods and tracts (as defined earlier) by each institutional/landuse category? What about for the vacant-to-housing subset? Map the locations of the variances in the various categories. Do you see any patterns worth pursuing? Are they closer to major roads, downtown, more/less dense residential neighborhoods,...? Is it the ownership type or landuse type (e.g., single family, two-family, triple-deckers, or apartments) that account for different approval rates (to the extent that you find differences)?


 Home | Syllabus | Lectures | Labs | CRON | MIT

Created by Joe Ferreira; Modified 12th Feb 2018 by Rounaq Basu