Massachusetts Institute of Technology - Department of Urban Studies and Planning


11.521
Spatial Database Management and Advanced Geographic Information Systems
11.523
Fundamentals of Spatial Database Management

Lab 2: Land Use and Land Value Patterns in East Boston

Getting Comfortable with Relational Queries and
Map-Database-Spreadsheet Interaction

  13 February 2018


Due: 5PM Thursday, 22 February, 2018 (since we follow Monday's schedule next Tuesday)

PURPOSE

The primary purposes of this lab are (1) to practice MS-Access and PostgreSQL query construction, and (2) to begin examining land use and land value patterns in East Boston using a parcel map and property assessment data for Fiscal Year 2005 (which uses assessed values as of Jan. 1, 2004). To facilitate visualization of structures, we also use a building data layer with building footprints and building heights from a 2002 study based on aerial photos and LIDAR data (to get roof heights).

 

The lab helps us practice the relational thinking and data management skills needed to assemble and analyze typical urban basemaps and datasets using common desktop GIS and database management tools. Most of the typical complexities are hidden when we see land use maps in a GIS demo. For example, how can we handle multi-unit condos or mixed use properties? How can we maintain and share the data layers as property sales, redevelopment, and reassessment occur?

 

In this lab, we will use Mass municipal boundaries and Boston metro building footprints and roof heights from MassGIS; a parcel map showing property boundaries for East Boston parcels from Boston's assessing office; tabular data containing land use and ownership, property characteristics, and tax assessments from the Boston Department of Neighborhood Development; and land use classification codes and color schemes from the Mass Department of Revenue and the American Planning Association. We will use ArcMap, ArcScene, MS-Access and Excel to pull together the various data layers, to customize them to suit our interest, and to analyze and visualize some of the land use patterns in East Boston.

 

Since we assume some prior experience with ArcGIS and MS-Access, this lab does not provide step by step instructions about their user interface. For help with SELECT statement syntax, see the 'SQL Help' page within the SQL Notes section of the class website. For MS-Access help, see the online help that comes with MS-Access and the MS-Access notes by Lulu Xue and Shan Jiang: rec2_database_datafile.pdf. (The 'Administrative' section of last Thursday's lecture notes has further information about MS-Access help and tutorials.) We will use the next few lectures to clarify the concepts and techniques involved in this exercise (lab #2).


Before doing the Part 1 queries, skim through the entire lab #2 exercise to get a sense of what you are asked to do. You will need to understand the 'Tables' and 'Queries' parts of MS-Access. We will be using two MS-Access databases today: the 11.521_lab1.mdb database with the 'toy' parcel database that we introduced in Lab #1 and the bos05eb_lab2.mdb database that contains the assessing data for East Boston parcels. The key tables in both databases are also available in Postgres, so we can practice SQL queries in both environments.

 

Here is a link to the topic outline and notes for today's in-class lab presentation: lab2_inclass.txt

PART 1: SQL 'Warm-up' Exercises using the 'Toy' Parcel Database

Before starting to map East Boston data, we want to improve our skill with thinking relationally about data manipulation and SQL queries. Toward this end, we will use the seven small tables in the 'toy' parcel database that we introduced in Lab #1. We will develop queries using MS-Access and phpPgAdmin to answer several questions about parcel characteristics, ownership, tax payments, and fires. Since the toy parcel database is small, it will be easy for you to view the various tables to check whether the queries are returning the intended results.

 

The entity-relationship diagram for the toy parcel database is here and the MS-Access database containing the seven tables is called 11.521_lab1.mdb and is available in the class data locker: /afs/athena.mit.edu/course/11/11.521/data/11.521_lab1.mdb. You can access this locker via drive Z:\ on the lab computers.

 

As usual, copy the class dataset, 11.521_lab1.mdb, to a writeable folder on your local drive (e.g., C:\TEMP) before beginning your own queries. (Note: on WinAthena machines, drive I: and the Desktop are not local drives but reside on your network locker.) Two sample queries were included in 11.521_lab1.mdb, and the text of Lab #1 also listed several more queries written in standard SQL (structured query language). Additional SQL queries using the toy parcels database are given in the SQL Help pages. These SQL queries can be run in PgAdmin on phpPgAdmin when you are logged into the Postgres database engine that is available online for class use. Be sure you have examined the MS-Access tables and queries and run the lab #1 SQL queries in Postgres before you start this new lab #2 exercise. To use phpPgAdmin with Postgrest you can log in as 'sqlguest' using the password we gave you last week or you can log in using your own WinAthena userid and password. The 'sql' account can run all the queries needed for Part 1 of Lab #2 today, but you should later use your own account for us to be able to see who owns each table or view that is created.

 

[ Extra, optional, information: The SQL statements used to create the 'parcels' tables within Postgres and set various permissions and aliases are shown in the 'PARCELS Database Schema' link from the 'SQL Notes' section of the class Web pages. For this exercise, you will not need to understand the syntax of the create statements that create the original tables and insert the rows. We will cover these operations later in the semester. ]

Querying the 'Toy' Parcel Database

Construct queries to answer each of the following questions. We suggest that you run them both in MS-Access (using 11.521_lab1.mdb) and in Postgres (using phpPgAdmin or PgAdmin III). For simple queries, the GUI in MS-Access is quite helpful. You may want to start with MS-Access, then look at the SQL statement after you use the GUI to get each query working properly. Finally, edit the MS-Access SQL statement in a text editor until it works as intended when cut-and-pasted into the SQL command line on phpPgAdmin. The biggest differences in syntax between Postgres and MS-Access SELECT statements is that they use slightly different default methods for expressing joins between tables, and MS-Access requires the word 'AS' between an expression and the alias name that you assign to the expression. You may need to do some editing before some queries in Postgres run in MS-Access. Note that many of the 9 assigned queries require only a slight modification of earlier queries from Lab #1. Once you get used to the SQL syntax, you will find it handy to start with simple queries and then add clauses and qualifiers to exclude subsets, handle nulls, and the like. Hand in only the SQL statements that work in Postgres as well as a copy of the tables produced by the queries.

 
 
 
 
 
 

PART 2: Land Use and Land Value Patterns in East Boston

This part of the Lab examines land use and land value patterns in East Boston using a parcel map and assessed value data for Fiscal Year 2005. We focus on East Boston rather than metro Boston or just the City of Boston in order to use detailed parcel-level data without getting bogged down with slow running queries. There are 100+ thousand parcels in Boston but only 6,558 in East Boston. Once we debug our queries and analyses using East Boston, we could run them on the entire city. (As we shall see later on, using the Model Builder in ArcGIS helps us automate the processing steps so the test runs on a small sample of parcels can be reliably repeated for a much larger sample.)

For this part of the lab exercise, we will use shapefiles and additional tables saved within an MS-Access database. The data are availabke in the class data locker (Z:\course\11\11.521\data - which is the same location as S:\11.521\data on CRON machines) and described in the following table. Be sure to copy the Mass town boundary shapefile (all 7 files) plus the entire eboston05 folder to C:\TEMP before you begin using any of the files.

File
Description

Shapefile: .\matowns00.shp

Massachusetts municipal boundaries (year 2000 version from MassGIS) - a more recent version is available from MassGIS (http://www.mass.gov/mgis/); uses Mass State Plane Mainland, NAD83 meters
Shapefile: .\eboston05\ebos_buildings02.shp Building footprints and roof heights (in meters above ground) for East Boston buildings extracted from metro Boston footprint layer provided by MassGIS (http://www.mass.gov/mgis/) and available in the MIT Library Geodata Repository; uses Mass State Plane Mainland, NAD83 meters
Shapefile: .\eboston05\ebos_parcels05 East Boston parcel boundaries: pid_long is the 10-digit ID for each 'ground' parcel; uses Mass State Plane Mainland, NAD83 feet; contains 6,558 parcel polygons

MS-Access database:
.\eboston05\bos05eb_lab2.mdb

MS-Access database with 7,235 records of tabular assessing data for East Boston parcels and condominiums; also includes various lookup tables and sample queries. Note that there is more than one assessing record for each of the 6,558 parcel records.
Shapefile and APA symbol files:
.\eboston05\LBCS\
Sample shapefile (apatest.shp) and symbolization descriptions (*.avl files) for classifying and coloring land use maps using the Land-Based Classification Standard (LBCS) recommended by the American Planning Association (http://www.planning.org/lbcs).

ArcMap document:
.\eboston05\eboston05_lab2start.mxd

Saved ArcMap document with MA towns, East Boston parcels and buildings in one Data Frame (using Mass State Plane Mainland, NAD83 meters) and the APA's LBCS color scheme examples in the other Data Frame

In this part of the Lab Exercise, we want to create a land use map for East Boston that shades land use based on the APA's land-based classification. Within the c:\temp\eboston05 folder that you copied from the class locker to c:\temp, you will find the ArcMap document, eboston05lab2start.mxd, with which to start the exercise. At this point, then, you should be able to double-click on the c:\temp\eboston05 copy of eboston05_lab2start.mxd to open ArcMap and load the shapefiles needed for today's exercise. In last week's Lab #1, you changed your default workspace and scratch space to be c:\temp and your default geodatabase to be c:\temp\scratch.gdb. If the machine you are using does not currently have a file geodatabase called c:\temp\scratch.gdb you can use ArcCatalog to create it.  (This is just a performance issue.  If ArcMap cannot find the default geodatabase, it will revert to a default geodatabase saved on your network H:\ drive - which will slow down performance a bit.)  So you should have ArcMap set to use the local c:\temp location for both datasets and working files. You may want to check the ArcMap document properties to be sure that your default geodatabase is set to c:\temp\scratch.gdb. You might also want to check Geoprocessing/environments to see if your workspace and scratchspace are also set to be c:\temp.

Once you have started ArcMap and loaded the datasets for this exercise, a zoom-in of the East Boston datasets in ArcMap will look something like this (with the building outlines shaded in light brown to help visualize the spatial structure). NOTE: you can get here faster by double clicking on this ArcMap document in your local copy of the datasets for this exercise: c:\temp\eboston05\eboston05_lab2start.mxd

East Boston land use map

The attribute table for the East Boston parcel map (ebos_parcels05) layer has only a few geographic identifiers and none of the land use information from the assessing data. Before we can join the assessing data to the parcel map, however, we need to handle some of the one-to-many complications (due to condos). We will also want to match the East Boston land use codes to the categories and symbology recommended by the American Planning Association.

Preparing the Assessing Tables

Open your local copy of the East Boston MS-Access database, bos05eb_lab2.mdb. Note the descriptions of the various tables and queries already included in the database. Highlight the 'tables' object and double-click on bos05eb. This is the main table containing assessing information for East Boston land and properties. Highlight that table (by clicking once on the name) and then right-click on the table and choose the Design option in the menu to see the table schema and description of the attribute fields. (We have described the fields that will be useful for this lab.) We have also added some metadata to explain the tables. Right-click on a table and choose 'properties' to see the description of any particular table.

Creating a thematic map of East Boston land use is not hard - once we identify the appropriate information in the assessing tables and move it into ArcMap in a form that is ready for mapping. However, there are three difficulties that we have to overcome:

Difficulty #1: One-to-many issues in handling condos: Note that East Boston has 6,558 ground parcels (in the shapefile) but 7,235 records in the assessing table (for FY2005). A few of the parcel IDs in the assessing data will not match any of the parcels in the shapefile. However, the main reason for the different numbers is because condominiums generally have more than one separately owned housing or business unit located on a single 'ground' parcel.

In the case of Boston's assessing records, it gets even more complicated. The Boston assessing tables are not 'normalized' to insure that every record (that is every row) in the table has the same meaning as every other row for each attribute field (that is, column). In most cases, each row contains information about a single 'ground' parcel (and any property on the parcel). Every square foot of East Boston is contained in one or another of the non-overlapping ground parcels. The first column, PID, is usually the parcel ID for the ground parcel. However, in the case of condominiums, the condo units are taxed separately and generally have different property characteristics and different owners. To record all this information, the assessing table handles condos by including one row for each condo unit plus an additional row for the ground parcel (and the assessment information about the land and property held in common by the condo association). The row with ground parcel information contains the ground parcel ID for the condo in the PID column (just as for non-condo parcel records), but the rows describing individual condo units contain the condo unit ID in the PID column. This ID has the same first seven characters as the ground parcel ID and differs only in the last three digits (allowing for up to 1000 units in a condominium). You can tell which rows refer to condos by examining the second column, CM_ID (which stands for condominium-main ID). If the value in this column is null (that is, empty), then the row is not a condo, otherwise the value is the ground parcel ID for the condo. We will handle these condo complexities on the MS-Access database side and pull into ArcMap a simpler assessing table that is ready for mapping.

Difficulty #2: dBase format limitations: There is one more data definition complication that will hamper our efforts to link the assessing data to the parcel map. ArcGIS shapefiles use a dBase format for encoding tabular data (the data tables ending in *.dbf) and the dBase format can distinguish only the first 10 characters of field names. Actually, ArcMap uses a dBase format that *can* handle more than 10 characters in the name,but the ODBC (open data base connect) driver that we use to connect ArcMap to MS-Access uses internal tools that cannot handle dBase files with field names longer than 10 characters.

The East Boston assessing tables have three field names with the same first 10 characters. To overcome this difficulty, we have already shortened the three long field names to be: n_units_r, n_units_c, and n_units_rc instead of sec_num_units_r, etc. We will not even use these three fields in the exercise - but beware that we would have had trouble had we tried to copy the entire table into ArcMap. In fact, ArcMap would not complain about importing the table and it would look fine, but if we tried to join the table to the parcel table none of the rows would match properly!

Instead of trying to move the entire assessing table into ArcMap, we will construct a query to pull only the columns that we need. For large and complex table handling, this practice is often helpful - both for flexibility, ease of documentation, and to avoid problems such as the one just mentioned with importing data that we do not even need.

Difficulty #3: Data type mismatches: The East Boston assessing table uses the three-digit numeric state land use classification codes (called ptype in the table) that are required by the Mass Department of Revenue, DOR (http://www.mass.gov/dor/docs/dls/bla/classificationcodebook.pdf). We want to shade our land use map with the colors recommended by the APA land-based classification standards, LBCS (http://www.planning.org/lbcs). The APA standards are more complicated and distinguish land use instances that might apply to an activity, site, structure, function, or ownership category. To avoid making this lab exercise unduly long, we have already created a (simplistic) association between Mass DOR categories and the LBCS for land use activities. The 'lookup' table called MASS_LANDUSE in bos05eb_lab2.mdb lists all the DOR property type codes as well an approximate LBCS code.

The mass_landuse table also helps us with one other complexity. The East Boston assessing table encodes property type (ptype) as a 3-character alphanumeric field but the original DOR mass_landuse table encodes property type (stateclass) as a 3-digit integer. To address this difficulty, we have already run an 'update' query to populate the last column of mass_landuse (stclass_txt) with state class codes that have been converted to a 3-character text field. (The update query is also saved in bos05eb_lab2.mdb for your information but you do not need to rerun it.) Note that this issue is more than a simple data type conversion. Codes < 100 are expressed as two-digit numbers but, for example, the code = 22 should be sorted as '022' rather than '220' and you could have trouble matching codes if you are not careful about how to convert. This is a common problem when cross-referencing our '02139' zipcode.

Difficulty #4: Spaces buried in column names: The East Boston assessing table (bos05eb) is provided exactly as it was made available by the Assessing Office. The column labeled 'FY2005_ LAND' comes with a space ' ' in between the underscore and 'L'. Not only is it hard to see that there is a space in the name, but also the space is an invitation to future trouble. When you manipulate the table in MS-Access, you will see that the name is included within square brakcets [] so that MS-Access is not confused. But when you try to pull the table into ArcGIS or Postgres, the space will cause troubles if you do not take care to enclose the column name within quotes (either single quotes or double quotes depending upon the software package!). Be aware of this issue and, for example, rename the column on the fly to an acceptable name whenever you pull this column into a SQL query.

PREPARE A NEW MS-ACCESS QUERY that does the following:

We suggest that you build this query in four stages. First prepare a query (q_noncondo) that pulls from the assessing table the desired columns for every row that refers to a non-condo ground parcel. In the case of condos, we only want the rows where LU=CM. (Ignore those few records where the CM_ID field is not null but the landuse is not CM. These are special cases that we do not consider in this lab!) Let's use this q_noncondo query to make a new table called t_noncondo. We choose the Query/Make-Table option and make a new table instead of just saving the q_noncondo as a query (without the 'make-table' option) since the 'append' command in the next step requires that you start with an actual table and add rows to it. Next, prepare a query (q_condosum) that pulls the desired columns from the assessing table for each of the condos and uses the SUM function to add up (in the correct columns) all of the square footage and tax amounts for each particular condominium association in order to include the totals in a single record for each condo association. Third, combine the first two queries (using the APPEND type of MS-Access query) to expand t_noncondo so that it has one row for each mapable parcel in East Boston (i.e., one row for each condo plus all the original rows for the non-condos). Finally, join this appended table with 'mass_landuse' to lookup the LBCS classification code. Call this final query q_eb05new.

To speed up the lab exercise, we have already created the 'group by' query (called 'q_condosum) and the 'append' query (called q_ebunion) that combines q_noncondo and q_condosum. Note, in order to use the q_ebunion query to append the q_condosum rows to the t_noncondo table that you created, you must have created t_noncondo with the same columns in the same order as the table returned by the q_condosum query.

What to turn in: Include in your lab assignment, the text of your queries q_noncondo and q_eb05new.

Mapping the Assessing Data

ArcMap sometimes has trouble with data types when importing data directly from MS-Access queries (without first creating the table in MS-Access). To avoid these issues, you may wish to modify your query q_eb05new so that it saves a table (call it eb05new) inside MS-Access.

Open the ArcMap document called eboston05_lab2.mxd that is saved inside your local copy of the eboston05 folder. Add the table, eb05new, that you just created in MS-Access. Join the eb05new table to the mapable ebos_parcels05 layer.

Next, we need the color shading scheme that we want to use for the various land uses. The Data Frame labeled 'LBCS Color Scheme' shows the various colors that the APA recommends for land use maps. We don't need to manipulate anything in this Data Frame - they are just there so you can see the land use category names and major colors. (If you are interested, take a look at the APA specs (http://www.planning.org/lbcs) describing the standard. You are supposed to pick an intensity of color to differentiate within a category. For example, residential is yellow. You can shade a darker color for housing that is more dense and lighter yellow for single family homes on large lots.) For now, we have already included an (approximate) LBCS land use code in our eb05new table and we will simply load the symbol description layer that APA provides for shading the main land use categories. Make sure the East Boston Parcels Data Frame is active and double-click on the ebos_parcels05 layer. Click on the 'symbology' tab and set the 'fields' value to 'eb05new.lbcs'. Next, click the 'Import' label and click the radio button that gets the layer definitions from an ArcView 3 legend file (*.avl). Navigate to the LBCS sub-folder and click on 'activity.avl'. Import the complete symbol definition and be sure to pick eb05new.lbcs as the field containing the land use code to be mapped. You may also want to set the properties for all symbols (click the 'symbol' label at the top of the legend) so that the parcel outlines are a light gray. The thematic map will now shade the parcels in a manner similar to that shown in the map above in this exercise.

In subsequent labs, we will return to these East Boston datasets and examine land value and population density in more detail. Think about what you would have to do at this point to shade the parcel map in a way that usefully represented land value and density.

PART 3: 3D Visualization of Land Use and Buildings in East Boston

Before wrapping up this exercise, let us use ArcScene to visualize our land use map in 3D. Leave ArcMap open, and start ArcScene. Copy and paste each of the layers from the ArcMap 'East Boston Parcels' Data Frame into the ArcScene Scene Layer. You may have to edit the properties of the data layers after pasting them into ArcScene so that ArcScene can find the shapefiles.

Even after resetting the data sources, you will find that there is a problem with the parcel layer. That is because Boston saves its GIS files in Mass State Plane coordinates (just like the State) but uses the North American Datum (NAD) from 1983 that uses feet instead of meters. (There is a difference in the origin and not just the scale.) Since the first layer that you copied into ArcScene was the building footprints (in NAD83 meters), that is the coordinate system that ArcScene uses.

Go back to ArcMap, right click on the ebos_parcels05 layer and choose Data/Export and save all features using the coordinate system of the data frame into a local shapefile named ebos05_meters. Now, add this new shapefile (with all the joined attribute data) to ArcScene. You will see that the parcel layer is now visible and properly located within Boston!

Although ArcScene can do 3D rendering, we have not given any relief to our datasets and the map features are shown in 'flatland'. Edit the properties of the ebos_buildings02 layer, click the 'extrusion' tab and check the 'extrusions features in layer' option. Click the icon to the right of the 'extrusion value or expression' area and choose the ROOF field. This is the average height above ground of the roof of each building in the building layer (as estimated using the LIDAR data from 2002). Multiply this roof value by 3 to exaggerate the heights for easy viewing and click OK twice to get back to ArcScene. The image will take a moment to be rendered, then you will see each building footprint extruded to the height of the roof providing a quick-and-dirty visualization of the building mass along each street. Practice using the mouse to zoom in/out and rotate the image.

Finally, we would like to shade the parcels based on land use. Edit the properties of the ebos05_meters layer and import the same activity.avl definitions that we used earlier. Use these colors to shade the parcels based on their (approximate) LBCS code.

What to turn in: We want you to turn in some evidence that you have gotten this far and are able to shade parcels based on land use with the building footprints extruded to the building height to give a sense of the built form of East Boston. Since we want you to turn in the exercise in electronic form, the easiest way is to use the desktop tools to do a screen capture and then use the windows 'Paint' software to save the image as a JPG for inclusion in your assignment.

PART 4: Examining Parcel Land Values in Excel (optional)

Now that we have prepared tables in MS-Access that summarize assessing information at the ground parcel level, we might want to move the table into Excel for some exploratory statistical analysis. Since this lab exercise is getting long, this part is optional! Try pulling eb05new into Excel and comparing the land value per square foot (of lot size) and the building value (using gross area or living area). You may want to further subset the data (in MS-Access or Excel) to focus only on residential properties or only on triple deckers on, say, lots under 12000 square feet (one quarter acre). Do you find any interesting relationship? (...either in Excel or when you map the indicators.)

Later in the semester, we will explore land and building value patterns in more detail. This is enough for now to give us a sense of the issues, difficulties, and work-arounds associated with mixing and matching typical parcel-scale datasets in order to explore land use and land value patterns within urban areas.

WHAT TO TURN IN

You can use any text editor you wish to cut and paste your lab exercise answers (text, table, and images) into a single document that you turn in to me as an email attachment.

For Part 1 - SQL exercises: Turn in the SQL queries (using the standard SQL that runs in Postgres) and the tabular output to the 9 short questions using the toy parcel database.

For Part 2 - East Boston data preparation: Turn in the text of your queries q_noncondo and q_eb05new

For Part 3 - Mapping East Boston Parcels and Building: Turn in the screen shot of ArcScene with the LBCS shading of parcels and the building footprints extruded to the roof heights. Orient the view so that it is similar to the 2D view shown above with parts of the airport (lower right) and residential neighborhoods (center) are visible and zoomed in enough to distinguish buildings and see the parcel shading.

For Part 4 - Excel Exploration and Graphics (OPTIONAL): Screen print of eb05new in Excel and data subsets in MS-Access

While not required, we encourage you to format your SQL statements on multiple lines to improve their readability. Use the examples we provide as a guide. Be careful to sort your results as stated in the questions. Upload your lab exercise results to the homework location on the Stellar site for the class: . We only need your answers in digital format (...save the trees!). Since the is no class next Tuesday (which follows a Monday schedule due to the Feb. 19 holiday), this problem set is not due until the start of class on Thursday, 22 February 2018.

 


 Home | Syllabus | Lectures | Labs | CRON | MIT


 SQL exercises: [2002-2008, thg, jinhua & jf]; East Boston exercise 2008 [jf]

Last modified: 11 February 2018 [rb]