Massachusetts Institute of Technology
Department of Urban Studies and Planning


11.188: Urban Planning and Social Science Laboratory

11.205: Intro to Spatial Analysis

Lab Exercise 3:
Working with American Community Survey Data

March 9, 2021 Lab Exercise (Due March 15)


Overview - Mapping the Percent of Workers in Metro Boston Who Drive to Work Alone

In this exercise you will use the American Community Survey (ACS) to create a thematic map indicating the percentage of workers in Eastern Mass block groups who drive alone to their jobs. The data about mode choice for workers comes from the 2009-2013 ACS data and the boundary files for MA census block groups comes from the 2010 TIGERline files. Our final map will look similar to other thematic maps that we have generated, but it takes much more effort and understanding when the data sets are large and not already integrated with the map files. Planners are generally asking new questions that require finding and integrating new data in a manner appropriate for the questions at hand. In this case, it takes some effort and understanding to identify and locate the relevant ACS variables, construct the appropriate indicator for each 2010 block group, and then join the new indicator to the block group map. This lab also covers new ground because you will be working with a different Census Bureau data product, the ACS, as opposed to the 2000 Decennial Census data that we used for the example developed in lecture. In 2000, the Census Bureau began phasing in the ACS to replace a large portion of the decennial Census. In 2010, the Census Bureau completed this task. The ACS is a bit different than the decennial census and it is well worth reading both the lecture notes about census data and the first 12 pages of A Compass for Understanding and Using the American Community Survey: What General Data Users Need to Know.

In order to develop this map, you will have to create a table that contains the percentage of workers who drive alone to work (by car, truck, or van) for each block group in Boston Metropolitan Area. But the raw ACS data provide counts of the number of workers within each block group that use each transportation mode to get to work. Unlike the simple example (involving median earnings) shown in class, computing this percentage will require normalizing the ACS counts. (Here's a link to some additional discussion about when and how to >normalize> the data.).

The in-lab discussion notes are here: Lab #3 notes.


Part I: Data and Resources for this Lab Exercise

Before beginning the QGIS work for this exercise, be sure to download the entire ./data/lab3_data folder from the class 'data' folder into C:\temp (or your flash drive or portable HD) and open your local copy of 11188_lab3.qgz that you find within that folder.  [Alternatively, you may download the one zipped file, ./data/lab3_data.zip, into C:\temp and then unzip it into that directory to uncompress all the files that we need.]

For this exercise, we will use a shapefile for the geometry for 2010 Massachusetts Block Groups and the 2009-2013 ACS data. We will also use one of the shapefiles that provide boundaries for 2010 Massachusetts counties. The ACS uses census boundaries (e.g. block groups, tracts, etc.) of the most recent Decennial Census upon the data's release. Since we will be using 2009-2013 ACS data, we want to use 2010 block groups. For your convenience, we downloaded these files, and others that you will need for the lab, and placed them into a sub-folder of the class locker, ./data/lab3_data/. We also included a QGIS project, 11188_lab3.qgz, in that folder that loads the county shapefile (MA_County_2010.shp) and the block group shapefile (MA_BG_2010.shp), with relative path addresses.


Part II: Set the Coordinate System

If you started QGIS by opening 11188_lab3.qgz, then your QGIS Data Frame includes the MA County shapefile layer and the Data Frame will display the data using MA State Plane coordinates. However, if you first opened a different map document, the Data Frame may display features using a different coordinate system and any new shapefiles that you add may be displayed in another coordinate systems - perhaps geographic coordinates (lat/long) instead of the desired projection (using Mass State Plane coordinates). If necessary, change the viewing projection in the project to "NAD83 / Massachusetts Mainland" (EPSG: 26986) using the project CRS button displaying the EPSG in the bottom right corner of your QGIS project window.

Add into QGIS (if you have not already done so) the MA_County_2010.shp shapefile of MA counties and the MA_BG_2010.shp shapefile of MA block groups.

Before we can compute the percentage of workers who drive alone to work, we need to find the relevant ACS data, add it into QGIS, and join it (using the summary level-state-county-tract-block-group ID) to the block group data layer. Before doing this, we will focus on eastern MA and export our own shapefile of the eastern MA block group boundaries.

We can change the names of our layers so that they are easier to read and use. For example, change the Massachusetts county layer to "MA County Layer" and the Massachusetts block group layer to "MA Block Group Layer" using the "Rename Layer" option that appears when you right-click on the layer.


Part III: Select Block Groups in the Boston Metropolitan Area and Export as your own Shapefile

STEP 1:

In order to work with a smaller file, we want to create a new shapefile that represents the Block Groups that fall within major Boston area counties (i.e., Barnstable, Bristol, Essex, Middlesex, Norfolk, Plymouth, Suffolk, and Worcester). We can select these block groups using the "Select By Expression" tool in QGIS and writing out a simple query, as shown below.

[Side Note]

The "original" Census block group shapefile has a 3-digit county code, but does not provide county names. However, the "original" Census county shapefile has a 5-digit FIPS ID (2-digit state ID followed by 3-digit county ID) along with the county name. Although you don't have to make these changes for this lab (as we are providing you with "nice" easy-to-use shapefiles), here's what we did along the way:


This exported file is what we provided as "MA_BG_2010.shp", which is different from the "original" file you will get when you download it from the Census website.


 

STEP 2:

Now, we need to export only the selected features as a separate layer. We will:


This opens up a new window "Save Vector Layer as", where we can specify the name and location of the layer we want to export along with the CRS.



After exporting and adding this layer of block groups in Eastern MA to the map, you may choose to remove (or at least deactivate by unchecking) the other layers on the map - all MA counties and all MA block groups.


Part IV: Working with ACS Data

These instructions explain how to identify a particular ACS variable of interest, find the relevant data table, and join the tabular ACS data to a geographic boundary file. In the following steps, we will first identify which ACS table and fields we are interested in. Second, we will import the table into QGIS to do some basic data processing. Lastly, we will join the tabular data to a geographic boundary file.

STEP 1: Determining the desired ACS variable.

As explained in lecture, the ACS data includes thousands of variable breakouts. Finding the desired variable, and then locating the file that contains that variable for a particular state, can be a time consuming task. The main US Census website for ACS summary file documentation is here: https://www.census.gov/programs-surveys/acs/technical-documentation/summary-file-documentation.2013.html. The "Sequence Number/Table Number Lookup File" section of that page contains links to lookup tables that help you locate relevant variables. The current name of the spreadsheet file linked to that section is called, "ACS_5yr_Seq_Table_Number_Lookup.xls," We have downloaded and saved in the class locker this Excel-formatted spreadsheet with such metadata for the 5-year 2009-2013 ACS. Open the ACS_5yr_Seq_Table_Number_Lookup.xls spreadsheet and search for the string/key words "means of transportation to work". Click through all the entries that are found until you see "B08301. MEANS OF TRANSPORTATION TO WORK". The '28' in Column C (Sequence Number) tells us that the several columns of data in the B08301 table are included in Summary File 0028 and the starting and ending Summary File positions in the raw data file are 157 and 177. Lets take a look at exactly what data is in this table. An Excel-formatted spreadsheet with more detailed information about these variables has been saved here: ACS2013_TableShells.xlsx in our class locker. This spreadsheet was downloaded using the 'Table Shells' link on the ACS documentation page (https://www.census.gov/programs-surveys/acs/technical-documentation/summary-file-documentation.2013.html). The explanation of table B08301 (in row 8,633 out of 36,955 rows in the spreadsheet!) explains that the universe of persons for whom 'means of transportation to work' is reported comprises workers 16 years of age and over. There are 21 different variables in the B08301 table. The first column, B08301001 reports the total count for the 'universe' and the third column reports the count of workers who drove to work alone (among those who drove in a car, truck, or van). So we need the first and third rows of this table to compute the fraction who drove alone.

STEP 2: Locating and importing the table of interest into QGIS.

Now lets take a look at the metadata for the ACS files. The file named "BG_Metadata_2013", can be viewed directly on the Census website here. The data are structured with the table name first, followed by either an "e" or an "m", followed by an integer. The "e" and "m" indicate whether the attribute is an estimate or a margin of error (unlike with the Decennial Census, the ACS estimates include a margin of error). The last integer is the row number in the table.

The table that has the B08301 data is called "X08_COMMUTING". We've already created a spreadsheet (formatted as an Excel *.xls file) that contains an abbreviated version of table X08 in ./data/lab3_data. The file is called "X08_Short.xls". To save a bit of time, we are going to have you import the comma separated value (csv) version of this file "X08Short.csv" into QGIS instead of exporting the X08_COMMUTING file from the ACS geodatabase into Excel and then loading it into QGIS.

[Side Note]

How did we create X08Short.xls? We created a copy of table X08 in the geodatabase. Then, we deleted the fields from X08 that we did not need for this lab (i.e. we got rid of all ACS data fields that we did not need but kept B08301e1, B08301e3, and GEOID). We then exported the shortened table to an MS-Excel *.xls format, and also saved a csv copy of the xls file.

STEP 3: Build a table containing the percentage of workers who drive alone to work in Metro Boston.

Feature IDs of various Census spatial units may vary depending on what conventions are followed. For example, you'll see in "X08_Short.csv" that there is a "long" column 'GEOID' and a shorter version 'STFID'. The former is what was provided by the ACS file. However, we need the latter to be able to join with "ma_blkgrp.shp". You'll notice that 'GEOID' is actually 'STFID' with a prefix of '15000US', so we performed a simple substring operation on 'GEOID' to remove this constant prefix and created a new column 'STFID' to help with the join.

We will now join the "X08short" file with the "ma_blkgrp.shp" shapefile using the 'STFID' and 'GEOID10' columns respectively, as shown below.

 


After the join is complete, export the layer as a new layer called "ma_blkgrp_acs.shp". You can choose to work in a virtual layer as well, but operations may be significantly slower due to the relatively large size of the shapefile (~5,000 features).

When you open up the attribute table of this file ("ma_blkgrp_acs.shp"), you'll see three unintuitive (and 'weird') column names at the end. GIS software have the tendency to prefix the column names of joined columns with the layer name to differentiate the original layer from the layer that is being joined. For example, 'GEOID' from 'X08Short' will become 'X08Short_GEOID'. Since the default length of a column name is 10, QGIS can only accommodate 10 characters and 'X08Short_GEOID' will be shortened to 'X08Short_G'.

To increase legibility, you may choose to rename the two ACS fields we will work with (although this is optional). Right click on the layer, select 'Properties', and go to the 'Manage fields' tab. There, you can enter 'edit' mode and rename one or more columns. I chose to rename 'B08301e1' as 'Tot_Commute' (total number of commuters) and 'B08301e3' as 'DA_Commute' (number of drive alone commuters). You could have renamed the 'original' columns in the "X08Short" file (before or after importing it to QGIS), but the name would have been abbreviated anyway after the join to "ma_blkgrp_acs.shp" due to the 10-character limit on column names.

After renaming these columns, we need to calculate the % of drive alone commuters, which is simply a ratio of these two columns. However, these two columns are of data type 'string', not 'decimal'! We will need to convert them to 'decimal' data type first before taking the ratio.

Open the attribute table of "ma_blkgrp_acs.shp" and click on the 'Toggle editing mode' icon. After entering 'edit' mode, click the 'Open Field Calculator' icon. Create a new field 'TotalC_real' by initializing it as to_real("Tot_Commute") within the expression window. Don't forget to change the default output field type of 'decimal' from 'integer'. Similarly, create another new field 'DAC_real' that is the decimal representation of the 'DA_Commute' column.



Now that we have our two columns as decimals, we can go ahead with calculating the ratio. Click the 'New Field' icon. Initialize a new field of decimal type called 'DA_Ratio' (drive alone ratio) as DAC_real / TotalC_real. The default precision value of 3 works fine in this case, but you may choose to change that for other projects.

You also want to be mindful of cases where the denominator (i.e., number of total commuters) is zero. There are around 17 block groups for which the denominator is zero (along with the numerator - Why do you think that's the case?). When we take the ratio with a zero denominator, QGIS produces a null output. Thus, we find these block groups to have DA_Ratio = NULL. How should we handle them during visualization?

STEP 4: Visualize the percentage of workers who drive alone to work in Metro Boston.

Now that we have created a separate column 'DA_Ratio', we can go to 'Symbology' within 'Properties' of the "ma_blkgrp_acs" layer.

This is what you'd expect to see at this stage.



Part V: Does your commute mode depend on where you live?

We have a hunch that the type of community that a block group is located in might impact the percentage of people who drive to work alone. The Metropolitan Area Planning Commission (MAPC) has categorized communities/towns in the Metro-Boston as being "inner core", "regional urban centers", "maturing suburbs", "developing suburbs", or "rural towns". They have released a table with the names of towns and the category of the town. We've already loaded this table into ./data/lab3_data for you as "ComType.csv". We've also loaded a "cross-walk" that lists what town every block group in Massachusetts is (primarily) located within, "BGLookup.csv".

Before we start working with these tables, lets take a quick look at them.


[Side Notes]

[1] The "original" ComType file released by MAPC had the town names (i.e., the 'Community' field) listed in title case (i.e., the first letter of each word was in upper case and other letters were in lower case). Alternatively, the BGLookup file has town names (i.e., the 'Town' field) listed in upper case. We had to change the 'Community' field in the MAPC file to upper case so these two files can be joined. How did we do that? Similar to the 'to_real()' command we showed earlier, the 'upper()' command in the Field Calculator does the trick. See here for details.

[2] How did we create the BGLookup file? We had separate shapefiles for block groups and towns in MA, on which we performed a spatial join (what's the best way to spatially join two layers - intersect, overlap, contains, nearest, ...? do you want one-to-one or one-to-many operations?). See here for details.

We want you to create a separate map where you start off with the map you created earlier in Part IV and overlay those MA cities and towns that are classified by MAPC as CommunityType = 'Regional Urban Centers'. Add the MA_Town_2010.shp shapefile from the class data locker, perform the appropriate joins with the community type and/or the block group lookup tables, and highlight the towns that are regional urban centers. The final map that you create will be a thematic map of the 'DA_Ratio' which also highlights regional urban centers. As always, include the appropriate cartographic elements (scale, legend, title, author & date, source(s), north arrow, and classification choice) in your map and create a layout.



Part VI: Getting Census data from the web (Optional)



Part VII. Assignment

Please submit your lab exercise to Stellar. We prefer that you submit one PDF file that includes all the parts of the lab assignment. The assignment is due on Monday, March 15, 2021.


Developed by Thomas H. Grayson and Joe Ferreira, 1998.
Modified 2000-2015 by Anne Kinsella Thompson, Thomas H. Grayson, Sarah Williams, Jeeseong Chung, Jinhua Zhao, Xiongjiu Liao, Mi Diao, Lulu Xue, Shan Jiang, Jingsi Xu Eric Schultheis, Melissa Chinchilla, and Juan Camilo Osorio.

Last Modified by Rounaq Basu on March 7 , 2021.

Back to the 11.188 Home Page. Back to the CRON Home Page.