11.188: Urban Planning and Social Science Laboratory 11.205: Intro to Spatial Analysis |
Lab Exercise 4:
Working with
American Community Survey Data
February
24, 2020 Lab Exercise (Due
March
9)
In this exercise you will use the American
Community Survey (ACS) to create a thematic map indicating the
percentage of workers in Eastern Mass block groups who drive alone to
their jobs. The data about mode choice for workers comes from the
2009-2013 ACS data and the boundary files for Mass census block groups
comes from the 2010 TIGERline files.
Our final map will look similar to other thematic maps that we have
generated, but it takes much more effort and understanding when the
data sets are large and not already integrated with the map files.
Planners are generally asking new questions that require finding and
integrating new data in a manner appropriate for the questions at
hand. In this case, it takes some effort and understanding to identify
and locate the relevant ACS variables, construct the appropriate
indicator for each 2010 block group, and then join the new indicator
to the block group map. This lab also covers new ground because you
will be working with a different Census Bureau data product, the ACS,
as opposed to the 2000 Decennial Census data that we used for the
example developed in lecture. In 2000, the Census Bureau began phasing
in the ACS to replace a large portion of the decennial Census. In
2010, the Census Bureau completed this task. The ACS is a bit
different than the decennial census and it is well worth reading both
the
lecture notes about census data and the first 12 pages of A
Compass for Understanding and Using the American Community Survey:
What General Data Users Need to Know.
In order to develop this map, you will have
to create a table that contains the percentage of
workers who drive alone to work (by car, truck, or van) for each block
group in Boston Metropolitan Area. But the raw ACS data provide counts
of the number of workers within each block group that use each
transportation mode to get to work. Unlike the simple example
(involving median earnings) shown in class, computing this percentage
will require normalizing the ACS counts. (Here's
a link to some additional discussion about when and how to >normalize>
the data.).
The in-lab discussion notes are here: Lab #4 notes.
Before beginning the ArcMap work for this exercise, be sure to download the entire ./data/lab4_ACS_09_13 folder from the class 'data' folder into C:\temp (or your flash drive or portable HD) and open your local copy of 11188_lab4_ACS_09_13.mxd that you find within that folder. [Alternatively, you may download the one zipped file, ./data/lab4_ACS_09_13.zip, into C:\temp and then unzip it into that directory to uncompress all the files that we need.]
For this exercise, we will use a geodatabase that contains the geometry for 2010 Massachusetts Block Groups and the 2009-2013 ACS data. You will also use one of the shapefiles that provide boundaries for 2010 Massachusetts counties. The ACS uses census boundaries (e.g. block groups, tracts, etc.) of the most recent Decennial Census upon the data's release. Since we will be using 2009-2013 ACS data, we want to use 2010 block groups. The geodatabase is called: ACS_2013_5YR_BG_25.gdb (the file is compressed and will need to be "unzipped" before use). The county file should be sourced as "2010 TIGER/Line Data." The geodatabasse should be sourced as "2010 Tiger/Line and 2009-2013 American Community Survey data." For your convenience, we have included the county shapefile from the Census Bureau's TIGER/Line FTP site. We downloaded the geodatabase from a section in the Census Bureau's site that provides TIGER/Line data joined to selected demographic and economic data. For your convenience, we downloaded these files, and others that you will need for the lab, and placed them into a sub-foler of the class locker, ./data/lab4_ACS_09_13/.We also included an ArcMap document, 11188_lab4_ACS_09_13.mxd, in that folder that loads the county shapefile (MA_County_2010.shp), with relative path addresses, and includes several geospatial web services.
Later in the Lab you will use the large (158 MB) geodatadatabase of 2009-2013 ACS, 5-year estimates for Massachusetts census block groups. A zipped version of this geodatabase is included in the files that we downloaded from the data locker. Before we can use this geodatabase, we need to "unzip" it. Using compression software (e.g. 7zip), "unzip" the file with the geodatabase, ACS_2013_5YR_BG_25.gdb, into a local directory. (Extracting or unzipping the file into C:\temp\lab4_ACS_09_13 using the default settings is fine, but will bury the actual geodatabase, ACS_2013_5YR_BG_25_MASSACHUSETTS.gdb, one level deeper into a folder with the same name.) Next, in ArcMap, after opening 11188_lab4_ACS_09_13.mxd, click the Add Data icon, navigate to the location of the "unzipped" geodatabase. You will notice that your geodatabase contains multiple files. The first item in the list, ACS_2013_5YR_BG_25_MASSACHUSETTS, contains the boundaries of the Massachusetts census block groups for the 2009-2013 ACS data. For now, just add this layer to your ArcMap document.
If you started ArcMap by opening 11188_lab4_ACS_09_13.mxd, then your ArcMap Data Frame includes the Mass County shapefile layer and the Data Frame will display the data using Mass State Plane coordinates. However, if you first opened a different map document, the Data Frame may display features using a different coordinate system and any new shapefiles that you add may be displayed in another coordinate systems - perhaps geographic coordinates (lat/long) instead of the desired projection (using Mass State Plane coordinates). If necessary, change the viewing projection in the Data Frame to "State Plane/NAD 1983/NAD_1983_StatePlane_Massachusetts_Mainland_FIPS_2001" from the "Coordinate System" tab in the Data Frame Properties window (right click on data frame, and select properties). In select a coordinate system window, expand Projected Coordinate Systems, followed by State Plane, NAD 1983 (meters), and NAD 1983 StatePlane Massachusetts FIPS 2001 (meters).
Add into ArcMap (if you have not already done so) the MA_County_2010.shp shapefile of MA counties and the geometry for Massachusetts block groups ACS_2013_5YR_BG_25_MASSACHUSETTS located in the geodatabase, ACS_2013_5YR_BG_25.gdb. You can see what is in a geodatabase by navigating to the geodatabase in ArcMap (or ArcCatalog). In ArcCatalog, you can preview each of the files in the geodatabase. In ArcMap, you can use the 'add data' button to add a layer or table from the geodatabase to your map just like you would with a shapefile.
The block group layer was stored in the geodatabase in geographic coordinates (lat/lon) but the Data Frame converts them on-the-fly to display the layer in Mass State Plane coordinates. The block group geometry boundary file has no ACS data attributes - just the geographic identifiers that identify the block group. (Take a look by opening the attribute table.) Before we can compute the percentage of workers who drive alone to work, we need to find the relevant ACS data, add it into ArcMap, and join it (using the summary level-state-county-tract-block-group ID) to the block group data layer. Before doing this, we will focus on eastern Mass and export our own shapefile of the eastern MA block group boundaries.
Let's change the names of our layers so that they are easier to use. Change the Massachusetts county layer to "Massachusetts County Layer". Change the Massachusetts block group layer to "Massachusetts Block Group Layer".
In order to work with a smaller file, we want to create a new shapefile that represents the Block Groups that fall within major Boston area counties (Essex, Middlesex, Worcester, Suffolk, Plymouth, Norfolk, Bristol, and Barnstable). We can select these block groups using the Select By Location tool in ArcMAP.
In the Massachusetts County layer, select Essex, Middlesex, Worcester, Suffolk, Plymouth, Norfolk, Bristol, and Branstable counties. If you are familiar with counties in Massachusetts, you can simply use the graphical selection tools to select the eight counties. If you aren't, use the Selection > Select By Attribute tool to select these eight counties by name from the attribute table of the county layer. (SQL will be similar to the following equation - "NAME10" ='Middlesex' OR "NAME10" ='Worcester' OR ...)
With the counties selected you are ready to use the Select By Location tool. This tool works on the currently active theme, so you will need to make sure the Massachusetts Block Groups Layer is active. Go to the Selection menu and select Select By Location.When the dialog window appears, choose to select features from your active layer (Massachusetts Block Groups 2010) using the (default) spatial selection method "Target layer(s) features intersect with the Source layer feature" where the 'source' is the Counties layer. Refer to lab 2 for how to use the Select By Location tool. What problems might occur due to selecting features that intersect with the counties layer? Could you use "have their centroids within" selection option instead? What problems might that cause? Since Census blocks add up to Census block groups which add up to Census tracts which add up to counties which add up to states, could you use a select by attribute instead? On what variable? [Note: the names of the layers in the Figures may differ from your shapefile names.] *Total number of selected features may differ from screen shot.
STEP 3: All the Block Groups that intersect with Essex, Middlesex, Worcester, Suffolk, Plymouth, Norfolk, Bristol, and Branstable should now be highlighted. You are ready to create a new file based on these selected features (right click the layer--"Massachusetts Block Groups" and then click ArcMap's Data > Export Data tool). Call this new layer b_blkgrp.shp, save it to your working directory and add this layer into ArcMAP. [Note: When exporting the data you can choose whether to use the same coordinate system as "this layer's source data" or "the Data Frame." Do you understand the difference?] As usual, these operations will be much faster if your 'working directory' is a truly local drive.
In your data frame, remove the original ACS_2013_5YR_BG_25_MASSACHUSETTS layer of block groups. You may also remove the County layer although it is small and useful to show county boundaries in Eastern Mass. From this point you will only need to use the b_blkgrp.shp layer. Be sure to save your ArcMap session at this point.
These instructions explain how to identify a
particular ACS variable of interest, find the relevant data table, and
join the tabular ACS data to a geographic boundary file. In the
following steps, we will first identify which ACS table and fields we
are interested in. Second, we will import the table into MS-Access to
do some basic data processing. Lastly, we will join the tabular data
to a geographic boundary file.
As explained in lecture, the ACS data includes thousands of variable breakouts. Finding the desired variable, and then locating the file that contains that variable for a particular state, can be a time consuming task. The main US Census website for ACS summary file documentation is here: https://www.census.gov/programs-surveys/acs/technical-documentation/summary-file-documentation.2013.html. The "Sequence Number/Table Number Lookup File" section of that page contains links to lookup tables that help you locate relevant variables. The current name of the spreadsheet file linked to that section is called, "ACS_5yr_Seq_Table_Number_Lookup.xls," We have downloaded and saved in the class locker this Excel-formatted spreadsheet with such metadata for the 5-year 2009-2013 ACS. Open the ACS_5yr_Seq_Table_Number_Lookup.xls spreadsheet and search for the string/key words "means of transportation to work". Click through all the entries that are found until you see "B08301. MEANS OF TRANSPORTATION TO WORK". The '28' in Column C (Sequence Number) tells us that the several columns of data in the B08301 table are included in Summary File 0028 and the starting and ending Summary File positions in the raw data file are 157 and 177. Lets take a look at exactly what data is in this table. An Excel-formatted spreadsheet with more detailed information about these variables has been saved here: ACS2013_TableShells.xlsx in our class locker. This spreadsheet was downloaded using the 'Table Shells' link on the ACS documentation page (https://www.census.gov/programs-surveys/acs/technical-documentation/summary-file-documentation.2013.html). The explanation of table B08301 (in row 8,633 out of 36,955 rows in the spreadsheet!) explains that the universe of persons for whom 'means of transportation to work' is reported comprises workers 16 years of age and over. There are 21 different variables in the B08301 table. The first column, B08301001 reports the total count for the 'universe' and the third column reports the count of workers who drove to work alone (among those who drove in a car, truck, or van). So we need the first and third rows of this table to compute the fraction who drove alone.
Now lets take a look at the metadata for the files within the ACS block group geodatabase, ACS_2013_5YR_BG_25.gdb, that we provided for this lab. The file named BG_METADATA_2013, can be viewed directly on the Census website: here, or you can use ArcCatalog to preview it within your copy of the geodatabase. The data are structured with the table name first, followed by either an "e" or an "m", followed by an integer. The "e" and "m" indicate whether the attribute is an estimate or a margin of error (unlike with the Decennial Census, the ACS estimates include a margin of error). The last integer is the row number in the table. Lets make sure that this is right. Add this table (BG_METADATA_2013) into your ArcMAP table of contents pane (or Drag-and-drop the table if you have navigated to it via ArcCatalog) and use "find" to search for B08301. As expected, B08301e1 is "MEANS OF TRANSPORTATION TO WORK: Total: Workers 16 years and over -- (Estimate)" and B08301e3 is "MEANS OF TRANSPORTATION TO WORK: Car, truck, or van: Drove alone: Workers 16 years and over -- (Estimate)"
Next, click the "add data" button in ArcMap and navigate to the ACS geodatabase. Click the geodatabase to look at the files inside it. There are a bunch of tables named by the type of data that they contain and one geographic boundary file. Looking at the table names, it seems like the table that has the B08301 data would be X08_COMMUTING. Add that table to the map. You could also preview this file in ArcCatalog to confirm that it contains the B08301 columns that we want and then drag-and-drop it into your ArcMap table of contents.
Lets look inside the table X08. First, make sure that "List by Source" is the selected view type in your table of contents. If you have "List by Drawing Order" selected you will not see the table in the table of contents. Next, right click the table and select "Open". We want to make sure that the table contains the fields B08301e1 and B08301e3. Does it?
ArcGIS can use two types of geodatabases, a file geodatabase or a personal geodatabase. There are several differences between the two geodatabase types, but the one that we need to note is that a personal geodatabase is already an MS-Access database (in the older *.mdb format and with some spatial index tables added by ArcGIS.). However, a file geodatabase is just a 'zipped up' collection of files. There are two ways we can determine what type of geodatabase the ACS geodatabase is. We could look at the site where we downloaded it from to see if they tell us or we could right click on the ACS geodatabase in ArcCatolog toolbar in ArcMap (or the stand alone ArcCatalog program), select properties, and look at the "General" tab.
Since the ACS geodatabase is a file geodatabse we will need to export the table X08_COMMUTING as a file type that can be imported into MS-Access (i.e. mdb., .txt, .xls, etc.). Using ArcCatalog, you could 'export' a table within the file geodatabase into other formats or directly into another personal geodatabase. To make things run smoothly, we've already created a spreadsheet (formatted as an Excel *.xls file) that contains an abbreviated version of table X08 in ./data/lab4_ACS_09_13. The file is called X08_Short.xls. To save a bit of time, we are going to have you import X08Short.xls into MS-Access instead of exporting the X08_COMMUTING file from the ACS geodatabase into Excel and then loading it into MS-Access.
How did we create X08Short.xls? We created a copy of table X08 in the geodatabase. Then, using the ArcGIS delete field tool we deleted the fields from X08 that we did not need for this lab (i.e. we got rid of all ACS data fields that we did not need but kept B08301e1, B08301e3, and GEOID). We then exported the shortened table to an MS-Excel *.xls format using ArcGIS.
Open MS-Access and open lab4_ma.mdb.
Select the external data tab (Open Other Files). Next, click the "Excel" option.
A pop-up window will appear. Navigate to X08Short.xls and import the table.
Double click X08Short on the left hand side to open the table you just imported into MS-ACCESS. The table has three fields. The first field, GEOID, is the field that lets us join the tabular data to our block group geometry file. Since we already figured out what ACS field names mean, we know that the second field, B08301e1, is a count of all workers 16 and over and the third field, B08301e3, is a count of workers who drove to work alone.
We want to calculate the percent of workers who drove to work alone. We have a hunch that the type of community that a block group is located in might impact the percentage of people who drive to work alone. The Metropolitan Area Planning Commission (MAPC) has categorized communities/towns in the Metro-Boston as being "inner core", "regional urban centers", "maturing suburbs", "developing suburbs", or "rural towns". They have released a table with the names of towns and the category of the town. We've already loaded this table into ./data/lab4_ACS_09_13 for you as Comtype. We've also loaded a "cross-walk" that lists what town every block group in Massachusetts is (primarily) located within, BGLookUp.
Before we start working with these tables, lets take a quick look at them. Open the X08Short table that you just imported from Excel into MS-Access. The table has 4,985 records. Why? Open ComType. How many records does it have? Open BGLookUp. The table has 4,981 records. (Why? Because, when we created BGLookUp, we excluded "off shore" block groups)
To calculate the percentage of workers who drove alone to work and associate the community type data with each block group, we are going to build two queries. As in many things, there are several different ways one could accomplish this task. One way would be to do it in ArcGIS. You could also do it in Excel. We are going to show you how to do it in MS-Access. When you are doing this type of task in the future, you will have to determine what method works best for your particular project.
We are now going to build our first query. Click the "Create" tab in the main menu and the "Query Design" to get a screen like the following:
Select tables BGLookUp and ComType. Add them in and close the "Show Table" window.
Now join the two tables by dragging and dropping the Town field in the BGLookUp table onto the COMMUNITY field in the ComType. (Do you understand what this does?) The query window will look something like the following.
Add all the fields in the BGLookUp table to the query by double-clicking the "*". Add the CommunityType field in the ComType table by double-clicking the field. Your query window will look something like the following. [NOTE: the 'Type#' column assigns community type numbers that are further refined into 9 categories whereas the CommunityType column distinguishes only five different types. We will use only these 5 types in this exercise.]
Click the "Run" button on the design tab's ribbon. Right-click the query tab and select "save". Call the query q_LookUpAndType.
We are now going to to create a query that joins the ACS data in XOShort and calculates the percentage of workers who drive alone to work. Click the "Create" tab in the main menu and the "Query Design".
Select the query we just made, q_LookUpAndType and XOShort (Look under Queries tab and Tables tab). Add them in and close the "Show Table" window.
Now join the query and the table by dragging and dropping the GEOID_Data field in the q_LookUpAndType query onto the GEOID field in the XOShort table. Do you understand what this does?
Add all the fields in the q_LookUpAndType query to the query by double-clicking the "*". Add the B08301e1 and B08301e1 fields in the XOShort. Now we are going to create a field in the query that calculates the percentage of workers who drive to work alone
In one empty column at the bottom of the query design view, right-click and choose "Build". In the expression builder that pops up, type in "DRatio: [X08Short]![B08301e3] / [X08Short]![B08301e1]" and click OK. Do you understand what this expression does? What would be wrong with entering "DRatio: [X08Short]![B08301e1] / [X08Short]![B08301e3]"? How about "DRatio [X08Short]![B08301e3] / [X08Short]![B08301e1]"? You can omit the table name"[X08Short]!" if the table associated with the column names are unambiguous. Your Query window will look something like the following.
You can preview your table by clicking "Run!" under the Query Menu. You should have 4978 rows in the resulting table. NOTE, however, that the ratio will be undefined for any block group that has no workers aged 16 and over because the denominator will be zero. Take a look at those rows in your results and you will see "#Num!" instead of a number. To avoid this problem, edit your query to include the criteria '>0' in the Criteria row under the B08301e1 column. Do you see why the criterion is applied to this column? When you run the revised query, you should have 4957 rows.
You can save this query (via file save) and MS-Access will save the SQL instructions that describe your query (rather than the output table itself). However, ArcGIS sometimes has problems with data types when importing queries from Access. In order to clarify data types, we will use this query to create a new Access table. Switch back to the 'Design View' for the query. (Click the 'View' button on the left of the main menu. Click the arrow below 'View' to see the choices.) In the 'Design' tab, choose "Make-table" query, enter the table name "DARatio" and click OK. At this point you have identified the name of the table you would like to save - but you have not yet created the table!
Under the Query Tools (Design) menu, click "Run!". One or two message windows will pop up as shown below. Click Yes to confirm both. The second message window will arise if you did not fix the divide-by-zero problems with your query. In that case, the message is complaining about the non-numeric result for 21 rows and changes these values to 'null' (i.e., missing) when saving the table. In our case, you have already fixed the query to select only those block groups for which B08301e1 is 0. [BEWARE: MS-Access will insert a '#Num' value (i.e., not numeric) for any block group without workers aged 16 and over, and it will properly calculate the fraction for the rest. However, when using ArcMap to do a similar calculation, ArcMap may stop processing records as soon as it encounters a 'divide by zero' error. All the remaining rows are then likely to contain a zero value. So it is better practice to do calculations involving division only on those rows for which the denominator is never zero or missing.]
Turn in a screen shot of your DARatio table in Access with a dozen or so rows visible along with the columns you used to calculate the ratio, the column with the calculated ratio, the the column with the unique geographic identifier, and the status line at the bottom of the MS-Access table that shows the total number of rows in the table.
Next, we want to add into ArcMap the table DARatio from the MS Access database. ArcMap will recognize MS-Access database files (ending in *.mdb) in the same way that you can 'Add' a shapefile to ArcMap by finding them in the file system. So, we add the table just as we would add a shapefile by navigating to the file directory where we stored lab4_ma.mdb. (It is also possible to define a new OLE database connection from within ArcCatalog or ArcMap using the Microsoft Jet driver in order to import MS-Access tables into ArcMap. That method will import MS-Access queries as well as tables but can have data type conversion problems). If you cannot see your MS-Access database, then it is probably not in the .mdb format. You will need to save the MS-Access database in the 2002-2003 version of their format - that is, the format for MS-Access files ending in .mdb.
After adding your DARatio table, join it to the Boston Block Group layer (b_blkgrp.shp) using the field GEOID_Data in the DARatio table and the field GEOID_Data from the Boston Block Group attribute table. You may receive some errors, just click through these for now.
Make a thematic map showing the ratio of drive-alone workers by block group using the "DRatio" field in the table (Choose a reasonable symbology method and explain your choice). The map (without any highlighting of Towns by Community Type) should be similar to the following one:
As always, include the appropriate cartographic elements (scale, legend, title, author & date, source(s), north arrow, and classification choice) in your map and create a layout. In addition, we would like you to highlight those Massachusetts cities and towns that are classified by MAPC as CommunityType="Regional Urban Centers." The community type information is in your DARatio table. Add the matowns00.shp shapefile from the class data locker and highlight the towns that are urban regional centers. (NOTE: We want a map that shows a thematic map of your ratio and also highlights the ma_towns that are regional urban centers. To do this, you need to determine which towns are in which 'community type.' This information is in the DARatio table that you created and also in the BGlookup table that came in the MS-Access database (lab4_ma.mdb) that we provided. You will need to choose an appropriate method to bring the 'CommunityType' info into the matowns00.shp so that you can highlight the correct towns.)
Also, in an annotation section, or on a separate page, write a few sentences that interpret your results (e.g., are the percentages driving alone surprisingly large or small; which classification method did you choose and why, do you see any pattern regarding block groups near and far from metropolitan centers or transportation corridors?) Do any of the MassGIS or ESRI web services help interpret the map by providing useful background or overlay data?
Export the final layout into a ".pdf" file, save it in your network locker and submit your map and explanatory text to us via Stellar as one file.
For this exercise, the attribute data will
come from the detailed census tabulations that are available online.
The census block group geographies are also available online but we
have provided a copy in the class locker (as discussed in Part I). The
MIT Libraries maintains a GeoWeb site that allows these and many other
geospatial data sets to be downloaded. You are not required to use
this online geospatial data repository for this census lab exercise,
but the GeoWeb is likely to be a useful data resource for your
individual project work later in the semester. You may want to check
out the GeoWeb now and get comfortable with the interface and
resources that it offers.
The MIT Library's GeoWeb is here and the homepage for all the
MIT Library GIS Services is: here.
For this optional lab exercise, you can use
the GeoData Search Tool to download Massachusetts Block Group 2000 and
Massachusetts County Boundaries 1991. You will need an account with
the Library and you will need an MIT web certificate to register for
the account. You should already have needed certificates installed in
your web browser. You will be asked to log in. Follow the prompts.
Use the MIT Library's GeoWeb in a browser to
download the needed shapefiles. Search for file names. Once located
select the file from search results. On the right hand side, select
"Download as." You will download as shapefile.
Save the shapefile in your working folder.
Open a new ArcMap window and import your file. You may have to unzip
the downloaded folder to access these files. Note: Your results may
look a little different from the screenshot since the MIT Libraries
have updated some of the names of their datasets. For this lab
exercise, we used two layers from the MIT GeoWeb:
The ArcMap window now should look like this:
Please submit to Stellar your lab exercise including your MS-Access screen shot from Part IV and your ArcMap layout from Part VI, plus your text discussing your results. We prefer that you submit ONE file that includes all the parts of the lab assignment. The assignment is due on Monday, March 9, 2020.
Developed by Thomas
H. Grayson and Joe Ferreira, 1998.
Modified 2000-2015 by Anne Kinsella Thompson, Thomas H.
Grayson, Sarah Williams, Jeeseong Chung, Jinhua Zhao, Xiongjiu
Liao, Mi Diao, Lulu Xue, Shan Jiang, Jingsi Xu Eric Schultheis,
Melissa Chinchilla, and Juan Camilo Osorio.
Last Modified by Rida Qadri on February 11 , 2020.
Back to the 11.188 Home Page. Back
to the CRON Home
Page.