Massachusetts Institute of Technology - Department of Urban Studies and Planning


11.521
Spatial Database Management and Advanced Geographic Information Systems
11.523
Fundamentals of Spatial Database Management

Lab 2: Land Use and Land Value Patterns in East Boston

Getting Comfortable with Relational Queries and
Map-Database-Spreadsheet Interaction

9 February 2010


Due: Thursday, 18 February 2010 (because there is no lab on Tuesday, Feb. 16, which follows a Monday schedule)

PURPOSE

The primary focus of this lab is to examine land use and land value patterns in East Boston using a parcel map and property assessment data for Fiscal Year 2005 (which uses assessed values as of Jan. 1, 2004). To facilitate visualization of structures, we also use a building data layer with building footprints and building heights from a 2002 study based on aerial photos and LIDAR data (to get roof heights).

 

The lab helps us practice the relational thinking and data management skills needed to assemble and analyze typical urban basemaps and datasets using common desktop GIS and database management tools. Most of the typical complexities are hidden when we see land use maps in a GIS demo. For example, how can we handle multi-unit condos or mixed use properties? How can we maintain and share the data layers as property sales, redevelopment, and reassessment occur?

 

In this lab, we will use Mass municipal boundaries and Boston metro building footprints and roof heights from MassGIS; a parcel map showing property boundaries for East Boston parcels from Boston's assessing office; tabular data containing land use and ownership, property characteristics, and tax assessments from the Boston Department of Neighborhood Development; and land use classification codes and color schemes from the Mass Department of Revenue and the American Planning Association. We will use ArcMap, ArcScene, MS-Access and Excel to pull together the various data layers, to customize them to suit our interest, and to analyze and visualize some of the land use patterns in East Boston.

 

Since we assume some prior experience with ArcGIS and MS-Access, this lab does not provide step by step instructions about their user interface. For help with SELECT statement syntax, see the 'SQL Help' page within the SQL Notes section of the class website. For MS-Access help, see the online help that comes with MS-Access and Lab #5 from the Fall 11.520 class (http://mit.edu/11.520/www/labs/lab5/lab5.html). Since we have no lab next week, we will have two lectures before the next lab and we will use those lectures to clarify the concepts and techniques involved in this exercise (lab #2) and the next one (#3). Meanwhile, if you come to the lab before the 4:10 to 5 PM lab exercise presentation, read through the entire lab #2 exercise and see what you can do on your own. You will need to understand the 'Tables' and 'Queries' parts of MS-Access and, to help you get started, we have provided a few sample queries within the MS-Access database used in the lab exercise. In addition, MIT Information Services and Technology (IS&T) supports web-based computer training for the MIT community (via Element-K) that includes MS-Access tutorials. For further information about registering for and using the tutorials, see: http://web.mit.edu/ist/topics/training/wbt.html

 

Here is a link to the topic outline and notes for today's in-class lab presentation: lab2_inclass.txt

 

PART 1: SQL 'Warm-up' Exercises using the 'Toy' Parcel Database

Before starting to map East Boston data, we want to improve our skill with thinking relationally about data manipulation and SQL queries. Toward this end, we will use the seven small tables in the 'toy' parcel database that we introduced in Lab #1. We will develop queries using MS-Access and SQL*Plus to answer several questions about parcel characteristics, ownership, tax payments, and fires. Since the toy parcel database is small, it will be easy for you to view the various tables to check whether the queries are returning the intended results.

 

The entity-relationship diagram for the toy parcel database is here and the MS-Access database containing the seven tables is is called 11.521_lab1.mdb and is available in the class data locker: /afs/athena.mit.edu/course/11/11.521/data/11.521_lab1.mdb. You can access this locker via drive Z:\ on the lab computers.

 

As usual, copy the class dataset, 11.521_lab1.mdb, to a writeable folder on your local drive (C:\USERTEMP) before beginning your own queries. (Note: on WinAthena machines, drive I: and the Desktop are not local drives but reside on your network locker.) Two sample queries were included in 11.521_lab1.mdb, and the text of Lab #1 also listed several more queries written in standard SQL (structured query language). Additional SQL queries using the toy parcels database are given in the SQL Help pages. These SQL queries can be run in SQL*Plus when you are logged into the Oracle database engine that is available online for class use. Be sure you have examined the MS-Access tables and queries and run the lab #1 SQL queries in Oracle before you start this new lab #2 exercise. To use SQL*Plus with Oracle you can log in as 'parcels' using the password we gave you last week or you can log in using your own WinAthena userid and password. The 'parcels' account can see fewer Oracle tables and has more limited capabilities but can run all the queries needed for Part 1 of Lab #2 today.

 

For your information, the SQL statements used to create the 'parcels' tables within Oracle and set various permissions and aliases are shown in the 'PARCELS Database Schema' link from the 'SQL Notes' section of the class Web pages. For this exercise, you will not need to understand the syntax of these SQL statements to create the original tables and insert the rows.

Querying the 'Toy' Parcel Database

Construct queries to answer each of the following questions. We suggest that you run them both in MS-Access (using 11.521_lab1.mdb) and in Oracle (using SQL*Plus). For simple queries, the GUI in MS-Access is quite helpful. You may want to start with MS-Access, then look at the SQL statement after you use the GUI to get each query working properly. Finally, edit the MS-Access SQL statement in a text editor until it works as intended when cut-and-pasted into SQL*Plus. The biggest differences in syntax between Oracle and MS-Access SELECT statements is that they use slightly different default methods for expressing joins between tables, and MS-Access requires the word 'AS' between an expression and the alias name that you assign to the expression.. You will need to do some editing before all the Oracle queries in the help notes will run in MS-Access. Note that many of the 10 assigned queries require only a slight modification of earlier queries. Once you get used to the SQL syntax, you will find it handy to start with simple queries and then add clauses and qualifiers to exclude subsets, handle nulls, and the like. Hand in only the SQL statements that work in Oracle as well as a copy of the tables produced by the queries.

 
 
 
 
 
 
 
 
SELECT COUNT(*), COUNT(tax), MIN(tax), MAX(tax), AVG(landval)
  FROM tax;
SELECT COUNT(*), COUNT(tax), MIN(tax), MAX(tax), AVG(landval)
  FROM tax
 WHERE tax IS NOT NULL;
  

PART 2: Land Use and Land Value Patterns in East Boston

This part of the Lab examines land use and land value patterns in East Boston using a parcel map and assessed value data for Fiscal Year 2005. We focus on East Boston rather than metro Boston or just the City of Boston in order to use detailed parcel-level data without getting bogged down with slow running queries. There are 100+ thousand parcels in Boston but only 6,558 in East Boston. Once we debug our queries and analyses using East Boston, we could run them on the entire city. (As we shall see later on, using the Model Builder in ArcGIS helps us automate the processing steps so the test runs on a small sample of parcels can be reliably repeated for a much larger sample.)

For this part of the lab exercise, we will use shapefiles and additional tables saved within an MS-Access database. The data are availabe in the class data locker (Z:\athena.mit.edu\course\11\11.521\data) and described in the following table. Be sure to copy the Mass town boundary shapefile (all 7 files) and the entire eboston05 folder to C:\USERTEMP before you begin using any of the files.

File
Description

Shapefile: .\matowns00.shp

Massachusetts municipal boundaries (year 2000 version from MassGIS) - a more recent version is available from MassGIS (http://www.mass.gov/mgis/towns.htm); uses Mass State Plane Mainland, NAD83 meters
Shapefile: .\eboston05\ebos_buildings02.shp Building footprints and roof heights (in meters above ground) for East Boston buildings extracted from metro Boston footprint layer provided by MassGIS (http://www.mass.gov/mgis/lidarbuildingfp2d.htm) and available in the MIT Library Geodata Repository; uses Mass State Plane Mainland, NAD83 meters
Shapefile: .\eboston05\ebos_parcels05 East Boston parcel boundaries: pid_long is the 10-digit ID for each 'ground' parcel; uses Mass State Plane Mainland, NAD83 feet; contains 6,558 parcel polygons

MS-Access database:
.\eboston05\bos05eb_lab2.mdb

MS-Access database with 7,235 records of tabular assessing data for East Boston parcels and condominiums; also includes various lookup tables and sample queries.
Shapefile and APA symbol files:
.\eboston05\LBCS\
Sample shapefile (apatest.shp) and symbolization descriptions (*.avl files) for classifying and coloring land use maps using the Land-Based Classification Standard (LBCS) recommended by the American Planning Association (http://www.planning.org/lbcs).

ArcMap document:
.\eboston05\eboston05_lab2.mxd

Saved ArcMap document with MA towns, East Boston parcels and buildings in one Data Frame (using Mass State Plane Mainland, NAD83 meters) and the APA's LBCS color scheme examples in the other Data Frame

In this part of the Lab Exercise, we want to create a land use map for East Boston that shades land use based on the APA's land-based classification. A zoom-in of the results in ArcMap will look something like this (with the building outlines shaded in brown to help visualize the spatial structure):

East Boston land use map

However, the original parcel map layer has none of the land use information from the assessing data. Before we can join the assessing data to the parcel map, we need to handle some of the one-to-many complications (due to condos) and match the East Boston land use codes to the categories and symbology recommended by the American Planning association.

Preparing the Assessing Tables

Open your local copy of the East Boston MS-Access database, bos05eb_lab2.mdb. Note the descriptions of the various tables and queries already included in the database. Highlight the 'tables' object and double-click on bos05eb. This is the main table containing assessing information for East Boston land and properties. Highlight that table (by clicking once on the name) and then right-click on the table and choose the Design option in the menu to see the table schema and description of the attribute fields. (We have described the fields that will be useful for this lab.)

Creating a thematic map of East Boston land use is not hard - once we identify the appropriate information in the assessing tables and move it into ArcMap in a form that is ready for mapping. Here are three difficulties that we have to overcome:

Difficulty #1: Note that East Boston has 6,558 parcels but 7,235 records in the assessing table (for FY2005). The difference results because condominiums generally have more than one separately owned housing or business unit located on a single 'ground' parcel. In the case of Boston's assessing records, it gets even more complicated. The Boston assessing tables are not 'normalized' to insure that every record (that is every row) in the table has the same meaning as every other row for each attribute field (that is, column). In most cases, each row contains information about a single 'ground' parcel (and any property on the parcel). Every square foot of East Boston is contained in one or another of the non-overlapping ground parcels. The first column, PID, is usually the parcel ID for the ground parcel. However, in the case of condominiuims, the condo units are taxed separately and generally have different property characteristics and different owners. To record all this information, the assessing table handles condos by including one row for each condo unit plus an additional row for the ground parcel (and the assessment information about the land and property held in common by the condo association). The row with grond parcel information contains the ground parcel ID for the condo in the PID column (just as for non-condo parcel records). However, the rows describing individual condo units contain the condo unit ID in the PID column. This ID has the same first seven characters as the ground parcel ID and differs only in the last three digits (allowing for up to 1000 units in a condominium). You can tell which rows refer to condos by examining the second column, CM_ID (which stands for condominium-main ID). If the value in this column is null (that is, empty), then the row is not a condo, otherwise the value is ground parcel ID for the condo. We will handle these condo complexities on the MS-Access database side and pull into ArcMap a simpler assessing table that is ready for mapping.

Difficulty #2: There is one more data definition complication that will hamper our efforts to link the assessing data to the parcel map. ArcGIS shapefiles use a dBase format for encoding tabular data (the data tables ending in *.dbf) and the dBase format can distinguish only the first 10 characters of field names. (Actually, ArcMap uses a dBase format that *can* handle more than 10 characters in the name. However, the ODBC (open data base connect) driver that we use to connect ArcMap to MS-Access uses internal tools that cannot handle dBase files with field names longer than 10 characters. The East Boston assessing tables have three field names with the same first 10 characters. To overcome this difficulty, we have already shortened the three long field names to be: n_units_r, n_units_c, and n_units_rc instead of sec_num_units_r, etc. In fact, we will not even use these three fields in the exercise - but beware that we would have had trouble had we tried to copy the entire table into ArcMap. In fact, ArcMap would not complain about importing the table and it would look fine. However, if we tried to join the table to the parcel table, ArcMap would not complain but none of the rows would match!

Instead of trying to move the entire assessing table into ArcMap, we will construct a query to pull only the columns that we need. For large and complex table handling, this practice is often helpful - both for flexibility, ease of documentation, and to avoid problems such as the one just mentioned with importing data that we do not even need.

Difficulty #3: The East Boston assessing table uses the three-digit numeric state land use classification codes (called ptype in the table) that are required by the Mass Department of Revenue, DOR (http://www.mass.gov/Ador/docs/dls/bla/classificationcodebook.pdf ). We want to shade our land use map with the colors recommended by the APA land-based classification standards, LBCS (http://www.planning.org/lbcs). The APA standards are more complicated and distinguish land use instances that might apply to an activity, site, structure, function, or ownership category. To avoid making this lab exercise unduly long, we have already created a (simplistic) association between Mass DOR categories and the LBCS for land use activities. The 'lookup' table called MASS_LANDUSE in bos05eb_lab2.mdb lists all the DOR property type codes as well an approximate LBCS code. The mass_landuse table also helps us with one other complexity. The East Boston assessing table encodes property type (ptype) as a 3-character alphanumeric field but the original DOR mass_landuse table encodes property type (stateclass) as a 3-digit integer. We have already run an 'update' query to populate the last column of mass_landuse (stclass_txt) with state class codes that have been converted to a 3-character text field. (The update query is also saved in bos05eb_lab2.mdb for your information but you do not need to rerun it.)

PREPARE A NEW MS-ACCESS QUERY that does the following:

We suggest that you build this query in four stages. First prepare a query (q_noncondo) that pulls from the assessing table the desired columns for every row that refers to a non-condo ground parcel. In the case of condos, we only want the rows where LU=CM. (Ignore those few records where the CM_ID field is not null but the landuse is not CM. These are special cases that we do not consider in this lab!) Let's use this q_noncondo query to make a new table called t_noncondo. We choose the Query/Make-Table option and make a new table instead of just saving the q_noncondo as a query (without the 'make-table' option) since the 'append' command in the next step requires that you start with an actual table and add rows to it. Next, prepare a query (q_condosum) that pulls the desired columns from the assessing table for each of the condos and uses the SUM function to add up (in the correct columns) all of the square footage and tax amounts for each particular condominium association in order to include the totals in a single record for each condo association.. Third, combine the first two queries (using the APPEND type of MS-Access query) to expand t_noncondo so that it has one row for each mapable parcel in East Boston (i.e., one row for each condo plus all the original rows for the non-condos). Finally, join this appended table with 'mass_landuse' to lookup the LBCS classification code. Call this final query q_eb05new.

To speed up the lab exercise, we have already created the 'group by' query (called 'q_condosum) and the 'append' query (called q_ebunion) that combines q_noncondo and q_condosum. Note, in order to use the q_ebunion query to append the q_condosum rows to the t_noncondo table that you created, you must have created t_noncondo with the same columns in the same order as the table returned by the q_condosum query.

What to turn in: Include in your lab assignment, the text of your queries q_noncondo and q_eb05new.

Mapping the Assessing Data

ArcMap sometimes has trouble with data types when importing data directly from MS-Access queries (without first creating the table in MS-Access). To avoid these issues, you may wish to modify your query q_eb05new so that it saves a table (call it eb05new) inside MS-Access.

Open the ArcMap document called eboston05_lab2.mxd that is saved inside your local copy of the eboston05 folder. Add the table, eb05new, that you just created in MS-Access. Join the eb05new table to the mapable ebos_parcels05 layer.

Next, we need the color shading scheme that we want to use for the various land uses. The Data Frame labeled 'LBCS Color Scheme' shows the various colors that the APA recommends for land use maps. We don't need to manipulate anything in this Data Frame - they are just there so you can see the land use category names and major colors. (If you are interested, take a look at the APA specs (http://www.planning.org/lbcs) describing the standard. You are supposed to pick an intensity of color to differentiate within a category. For example, residential is yellow. You can shade a darker color for housing that is more dense and lighter yellow for single family homes on large lots.) For now, we have already included an (approximate) LBCS land use code in our eb05new table and we will simply load the symbol description layer that APA provides for shading the main land use categories. Make sure the East Boston Parcels Data Frame is active and double-click on the ebos_parcels05 layer. Click on the 'symbology' tab and set the 'fields' value to 'eb05new.lbcs'. Next, click the 'Import' label and click the radio button that gets the layer definitions from an ArcView 3 legend file (*.avl). Navigate to the LBCSAVL sub-folder and click an 'activity.avl'. Import the complete symbol definition and be sure to pick eb05new.lbcs as the field containing the land use code to be mapped. You may also want to set the properties for all symbols (click the 'symbol' label at the top of the legend) so that the parcel outlines are a light gray. The thematic map will now shade the parcels in a manner similar to that shown in the map above in this exercise.

In subsequent labs, we will return to these East Boston datasets and examine land value and population density in more detail. Think about what you would have to do at this point to shade the parcel map in a way that usefully represented land value and density.

PART 3: 3D Visualization of Land Use and Buildings in East Boston

Before wrapping up this exercise, let us use ArcScene to visualize our land use map in 3D. Leave ArcMap open, and start ArcScene. Copy and paste each of the layers from the ArcMap 'East Boston Parcels' Data Frame into the ArcScene Scene Layer. You may have to edit the properties of the data layers after pasting them into ArcScene so the data source can find the shapefiles.

Even after resetting the data sources, you will find that there is a problem with the parcel layer. That is because Boston saves its GIS files in Mass State Plane coordinates (just like the State) but uses the North American Datum (NAD) from 1983 that uses feet instead of meters. (There is a difference in the origin and not just the scale.) Since the first layer that you copied into ArcScene was the building footprints (in NAD83 meters), that is the coordinate system that ArcScene uses.

Go back to ArcMap, right click on the ebos_parcels05 layer and choose Data/Export and save all features using the coordinate system of the data frame into a local shapefile named ebos05_meters. Now, add this new shapefile (with all the joined attribute data) to ArcScene. You will see that the parcel layer is now visible and properly located within Boston!

Although ArcScene can do 3D rendering, we have not given any relief to our datasets and the map features are shown in 'flatland'. Edit the properties of the ebos_buildings02 layer, click the 'extrusion' tab and check the 'extrusions features in layer' option. Click the icon to the right of the 'extrusion value or expression' area and choose the ROOF field. This is the average height above ground of the roof of each building in the building layer (as estimated using the LIDAR data from 2002). Multiply this roof value by 3 to exaggerate the heights for easy viewing and click OK twice to get back to ArcScene. The image will take a moment to be rendered. Practice using the mouse to zoom in/out and rotate the image.

Finally, we would like to shade the parcels based on land use. Edit the properties of the ebos05_meters layer and import the same activity.avl definitions that we used earlier. Use these colors to shade the parcels based on their (approximate) LBCS code.

What to turn in: We want you to turn in some evidence that you have gotten this far and are able to shade parcels based on land use with the building footprints extruded to the building height to give a sense of the built form of East Boston. Since we want you to turn in the exercise in electronic form, the easiest way is to use the desktop tools to do a screen capture and then use the windows 'Paint' software to save the image as a JPG for inclusion in your assignment.

PART 4: Examining Parcel Land Values in Excel (optional)

Now that we have prepared tables in MS-Access that summarize assesing information at the ground parcel level, we might want to move the table into Excel for some exploratory statistical analysis. Since this lab exercise is getting long, this part is optional! Try pulling eb05new into Excel and comparing the land value per square foot (of lotsize) and the building value (using gross area or living area). You may want to further subset the data (in MS-Access or Excel) to focus only on residential properties or only on triple deckers on, say, lots under 12000 square feet (one quarter acre). Do you find any interesting relationship? (...either in Excel or when you map the indicators.)

Later in the semester, we will explore land and building value patterns in more detail. This is enough for now to give us a sense of the issues, difficulties, and work-arounds associated with mixing and matching typical parcel-scale datasets in order to explore land use and land value patterns within urban areas.

PART 5: WHAT TO TURN IN

You can use any text editor you wish to cut and paste your lab exercise answers (text, table, and images) into a single document that you turn in to me as an email attachment.

For Part 1 - SQL exercises: Turn in the SQL queries (using the standard SQL that runs in Oracle) and the tabular output to the 10 short questions using the toy parcel database.

For Part 2 - East Boston data preparation: Turn in the text of your queries q_noncondo and q_eb05new

For Part 3 - Mapping East Boston Parcels and Building: Turn in the screen shot of ArcScene with the LBCS shading of parcels and the building footprints extruded to the roof heights. Orient the view so that it is similar to the 2D view shown above with parts of the airport (lower right) and residential neighborhoods (center) are visible and zoomed in enough to distinguish buildings and see the parcel shading.

For Part 4 - Excel Exploration and Graphics: Nothing to turn in (Optional)

While not required, we encourage you to format your SQL statements on multiple lines to improve their readability. Use the examples we provide as a guide. Be careful to sort your results as stated in the questions.

We should have the Stellar account for the class set up by the Lab #2 due date so that you can upload your answers. We only need your answers in digital format (...save the trees!). The problem set is due at the start of class on Thursday, 18 February 2009.

 


 Home | Syllabus | Lectures | Labs | CRON | MIT


 SQL exercises: [2002-2008, thg, jinhua & jf]; East Boston exercise 2008 [jf]

Llast modified: 9 February 20109 [jf]