11.520: A Workshop on Geographic Information Systems
11.188: Urban Planning and Social Science Laboratory
Lab Exercise 5:
Working with 2000 Census Data & MIT Geodata Repository
Due October 18, 2010
In this exercise you will use the Census 2000 Summary File 3 (SF3) files to create a thematic map showing the percentage of workers in block groups in and around Middlesex County who drive alone to their jobs. The data about mode choice for workers come from the 2000 US Census, and the boundary files for Massachusetts census block groups come from the MIT Geodata Repository maintained by the MIT Libraries. (The MIT Geodata Repository provides online access to many large, useful GIS datasets.)
In order to develop this map, you will have to create a table that contains the percentage of workers who drive alone to work (by car, truck, or van) for each block group in the Boston Metropolitan Area. However, the raw census data provide only counts of the number of workers within each block group who use each transportation mode to get to work. Unlike the simple example (involving median earnings) shown in class, computing this percentage will require normalizing the census counts. (Here's a link to some additional discussion about when and how to normalize the data.)
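The normalization described above amounts to dividing one count by another for each block group. The following is a minimal Python sketch of that arithmetic, not part of the lab workflow itself; the three block groups and their counts are made up for illustration, and the function name `drive_alone_share` is hypothetical.

```python
# Hypothetical counts for three block groups. These numbers are invented
# for illustration and are not taken from the census files.
block_groups = [
    {"id": "A", "total_workers": 800, "drove_alone": 600},
    {"id": "B", "total_workers": 1200, "drove_alone": 600},
    {"id": "C", "total_workers": 0, "drove_alone": 0},
]


def drive_alone_share(drove_alone, total_workers):
    """Return drove_alone / total_workers, or None when there are no workers."""
    if total_workers == 0:
        return None  # avoid dividing by zero for empty block groups
    return drove_alone / total_workers


# Map each block group id to its normalized share (or None).
shares = {
    bg["id"]: drive_alone_share(bg["drove_alone"], bg["total_workers"])
    for bg in block_groups
}
```

Block groups A and B have the same raw count of solo drivers but very different shares, which is why mapping the raw counts would be misleading.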
Useful Resources:
The in-lab discussion notes are here: Lab #5 notes, but they do not contain anything beyond what is in this lab exercise.
For this exercise, the attribute data will come from the detailed census tabulations that are available online, but the census block group geographies will come from a different source: an online geospatial data repository at the MIT Libraries. We are going to use two geographical layers from the MIT Geodata Repository: Massachusetts Block Group 2000 and Massachusetts County Boundaries 1991. We have already used the MIT Library's GeoWeb (http://web.mit.edu/geoweb) in the first lab exercise. The homepage for all the MIT Library GIS Services is here: http://libraries.mit.edu/gis/index.html, and the 'spatial data' webpage is here: http://libraries.mit.edu/gis/data/index.html.
In order to use the MIT Geodata Repository to download boundary files directly into ArcMap, you will need to register with the MIT Libraries and activate the MIT GeoData Search Tool extension to ArcGIS. The Search Tool extension is already installed on the Lab machines, so you will only need to install it on your personal machine. Detailed information about creating an account and downloading the search tool is here: http://libraries.mit.edu/gis/data/repository/about.html.
For this lab exercise you will need an account with the Library, and you will need an MIT web certificate to register for the account. At this point, you should create your account either by following the instructions in the above link, or by clicking on the registration button of the Search Tool's login window. (When registering, use your MIT Athena (Kerberos) ID, but we suggest that you choose a different password: your Athena account is your most personal and important MIT account, and you should not reuse its password for other accounts.)
Open ArcMap and make sure that the MIT Geodata Repository tool is available. Even though the MITGeodataTool.dll is installed, we may still have to tell ArcMap to include it in our own personal toolbar. If you do not see the MIT Geodata Repository tool, then right-click in the toolbar area of the ArcMap window and choose "MIT Geodata Repository Toolbar" in the context menu. You may also need to click the "MIT GeoData Repository" toolbar after choosing View/Toolbar in order for the toolbar to become visible. This toolbar looks something like this: (The version installed on lab machines may differ slightly from the images shown here.)
When you first click any of the items on the toolbar, you will be prompted to login via a screen like this:
You need to register for an account in the MIT Geodata Repository before you can use it. Click on "No Account? Register Now" and follow the instructions in the pop-up Internet Explorer window to create a user account if you have not already registered. [Note also that you must click 'Login to server' rather than just hitting 'Enter' after typing your password!]
There are several ways to locate the 2000 Census Block Group boundary files. One way is to use the MIT Library's GeoWeb in a browser (as explained in Lab Exercise #1) and choose 'save link to this map' once you have found the dataset. Then you can click 'Data from Geoweb' on the Geodata Repository toolbar and then (once you are logged in) you can paste the link into the window that opens up. The link will be used by ArcMap to pull the census block group boundary files directly into ArcMap from the Geodata Repository.
Alternatively, you can click on the "Search Metadata" option on the Geodata Repository toolbar and browse the data in the Repository. Once you have logged in, you will see this screen:
Choose "Keyword" for the search type and search for "Massachusetts block group". Click on "Search" to get a search result that includes the following:
Click on the row for "Massachusetts (Block Groups 2000)" and click "Add Selected Layer to Map". It will take a while for the layer to be added from the MIT Geodata Repository server. The ArcMap window should now look like this:
If your ArcMap Data Frame is empty when you add the Massachusetts census block groups from the Geodata Repository, the Data Frame will display the data using Mass State Plane coordinates (because that is how they are stored in the Repository). However, depending upon which shapefiles you may already have loaded into your ArcMap session, the Block Groups layer may be displayed in geographic coordinates (lat/lon) instead of the desired projection (Mass State Plane coordinates). If necessary, change the projection to "State Plane / NAD 1983 / NAD_1983_StatePlane_Massachusetts_Mainland_FIPS_2001" from the "Coordinate System" tab in the Data Frame Properties window. In the “Select a coordinate system” window, expand “Predefined”, then “Projected Coordinate Systems”, followed by “State Plane” and “NAD 1983”, and then choose “NAD 1983 StatePlane Massachusetts Mainland FIPS 2001”.
Add in the "Massachusetts County Boundaries 1991" layer from the MIT Geodata Repository in the same way as you added the "Block Groups" layer.
At this point, we have successfully added the 2000 census block group boundary files for Massachusetts into our ArcMap session. However, these boundary files have no census data attributes, just the geographic identifiers that identify each block group. (Take a look by opening the attribute table.) Before we can compute the percentage of workers who drive alone to work, we need to find those data, add them into ArcMap, and join them (using the state-county-tract-block-group ID) to the block group data layer. Before doing this, we will focus on eastern Mass and export our own shapefile of the eastern Mass block group boundaries.
In order to work with a smaller file, we want to create a new shapefile that represents the block groups that fall within the major Boston-area counties (Essex, Middlesex, Worcester, Suffolk, Plymouth, Norfolk, Bristol, and Barnstable). We can select these block groups using the Select By Location tool in ArcMap.
STEP 1. In the Massachusetts County Boundaries layer, select Essex, Middlesex, Worcester, Suffolk, Plymouth, Norfolk, Bristol, and Barnstable counties. If you are familiar with the counties in Massachusetts, you can simply use the graphical selection tools to select the eight counties. If you aren't, use the Selection > Select By Attributes tool to select these eight counties by name from the attribute table of the county layer.
STEP 2. With the counties selected, you are ready to use the Select By Location tool. This tool works on the currently active theme, so you will need to make sure the Massachusetts Block Groups 2000 layer is active. Go to the Selection menu and choose Select By Location. When the dialog window appears, choose to select features from your active layer (Massachusetts Block Groups 2000) that Intersect with the selected features of the Massachusetts County Boundaries layer. Refer to Lab 3 for how to use the Select By Location tool.
STEP 3: All the block groups that intersect with Essex, Middlesex, Worcester, Suffolk, Plymouth, Norfolk, Bristol, and Barnstable should now be highlighted. You are ready to create a new file based on these selected features using ArcMap's Data > Export Data tool (refer to the images below). Call this new layer b_blkgrp.shp, save it to your working directory, and add this layer into ArcMap. (As usual, these operations will be much faster if your 'working directory' is a truly local drive.)
STEP 4: In your data frame, remove the two layers from the MIT Geodata Repository. We no longer need these files and their size could affect processing speed. From this point on you will only need to use the b_blkgrp.shp layer. Be sure to save your ArcMap session at this point.
These instructions explain how to identify a particular census variable of interest and extract it from the freely distributed text files containing the raw SF3 2000 US census data. However, since the processing of the raw text files is cumbersome, we have already imported into MS-Access the raw datasets that contain the tables that you will need. We have retained the full set of instructions in case you want to identify and extract other variables (e.g., for your project later in the semester).
STEP 1: Determining the desired Census variables and the raw text file that includes the desired variables.
Open Summary File 3 : Technical Documentation (.pdf) and search the key word "means of transportation to work". On Page 439, you will find "P30. MEANS OF TRANSPORTATION TO WORK FOR WORKERS 16 YEARS AND OVER [16]". There are 16 columns in the P30 matrix, with variable P030001 reporting the total count of workers 16 years of age and over (that is the universe for P30), and variable P030003 reporting the count of workers who drove to work alone. So the fields P030001 and P030003 are the two columns of our interest for this lab. Next, we have to identify the raw text file which contains the data for these two variables for Massachusetts.
Table 2-2 (File/Table Segmentation) in Chapter 2 of the Summary File 3 : Technical Documentation (.pdf) provides the necessary cross-reference information associating the raw text files with particular census variables (specifically, the table or matrix numbers such as P30). We can see that the file "st00003.uf3" contains the fields from P25 to P37. The P030001 and P030003 fields that interest us are in this file. For your convenience, we have also provided a direct link to the cross-reference table that explains which Census variables appear in which dataset files: 2000 US Census Variable locator.
Hence, the raw text files for Massachusetts that we want are "ma00003.uf3" for the P030001 and P030003 variables plus "mageo.uf3" for the geographic identifiers. We have already loaded these raw text files into MS-Access and saved the result in M:\data\census2k\lab5_ma.mdb
STEP 2: Copy the Access template file, the geographical header file, and the data file "st00003.uf3" to your own working directory I:\11.520\lab5
(Note: The US Census Bureau provides its own detailed explanation of how to import the raw text files for SF3 census data into various databases such as MS-Access. Here is the link: http://www.census.gov/support/SF1ASCII.html. Steps 2 and 3 summarize the way to bring the raw data for Massachusetts into MS-Access.)
The st00003.uf3 file for Massachusetts is ma00003.uf3. All 76 text files, including st00003.uf3, are available (in 'zipped' form) at http://www2.census.gov/census_2000/datasets/Summary_File_3/Massachusetts/. The geographical header file for Massachusetts is mageo.uf3, and is also available at the same online site.
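If you later need other SF3 tables (for example, for your project), the state-specific file names follow the pattern seen above, where "st" in st00003.uf3 is replaced by the state abbreviation. The sketch below is a small Python illustration of that naming pattern; the function names are hypothetical, and the 1 to 76 segment range is an assumption based on the file count mentioned in this exercise.

```python
def sf3_segment_filename(state_abbr, segment):
    """Build a name such as 'ma00003.uf3' from a state abbreviation and segment.

    The pattern is inferred from the file names used in this exercise.
    """
    # Assumption: data segments are numbered 1 through 76.
    if not 1 <= segment <= 76:
        raise ValueError("segment is expected to be between 1 and 76")
    return f"{state_abbr.lower()}{segment:05d}.uf3"


def sf3_geo_filename(state_abbr):
    """Build the geographic header file name, such as 'mageo.uf3'."""
    return f"{state_abbr.lower()}geo.uf3"


# The two files this lab needs for Massachusetts.
needed_files = [sf3_segment_filename("MA", 3), sf3_geo_filename("MA")]
```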
For your convenience, both ma00003.uf3 and mageo.uf3 have been downloaded to our class data locker: M:\data\census2k. This census subdirectory is also visible online as: http://mit.edu/11.520/data/census2k
In order to import these text files into MS-Access, you will also need the MS-Access template file "sf3.mdb." This template file contains variable names, data formats, and the like and is available at http://www.census.gov/support/2000/SF3/. A copy of this sf3.mdb template file has already been downloaded to M:\data\census2k.
In order to manipulate the three files that you need, you should copy them from M:\data\census2k to a local drive on your computer. (You can put them in your network locker, e.g., in I:\11.520\lab5, but some of the files are large and will not be needed for long.)
STEP 3: Import ma00003.uf3 and mageo.uf3 to sf3.mdb
Before you import the raw text files ma00003.uf3 and mageo.uf3, change their file extensions to ".txt" so that MS-Access can recognize them as text files. Start MS-Access by double-clicking on your copy of the MS-Access database file sf3.mdb. You will see dozens of tables that are all empty but contain specifications for the structure of each possible census dataset (that is, the data schema for all the tables).
Then click the File-->Get External Data-->Import menu choice as shown below:
In the Import Window, change the file type to ".txt" files, browse to your working directory and find the "mageo.txt" file.
In the Import Text Wizard window, be sure to click "Advanced". After the Import Specification Window pops up, click "Specs...". Scroll to the end to find "SF3GEO Specification" and click Open. Then click OK to return to the Import Text Wizard and click "Finish". The sequence is illustrated below. (Note that the check boxes in the first few graphics below don't matter, since loading the "SF3GEO Specification" will change the file format type from however it starts out to 'Delimited' and the code page choice to '...ASCII'. That is what you want in order to match the format of the text files.)
Once the "mageo.txt" file is imported, a new table "Mageo" will appear in the MS-Access database control window. You need to import "ma00003.txt" in the same way except using "SF30003 Import Specification".
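The extension change that precedes the import can be done by hand in Windows Explorer, or scripted. The following Python sketch shows one way to do it with the standard library; the function name `rename_to_txt` is hypothetical, and the demonstration runs on empty placeholder files in a temporary folder rather than on the real census files.

```python
import tempfile
from pathlib import Path


def rename_to_txt(folder, names):
    """Rename each listed file in folder so that its extension becomes .txt."""
    renamed = []
    for name in names:
        source = Path(folder) / name
        target = source.with_suffix(".txt")  # e.g. mageo.uf3 -> mageo.txt
        source.rename(target)
        renamed.append(target.name)
    return renamed


# Demonstration on empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as work_dir:
    for name in ("ma00003.uf3", "mageo.uf3"):
        (Path(work_dir) / name).touch()
    new_names = rename_to_txt(work_dir, ["ma00003.uf3", "mageo.uf3"])
```

To use it on your own copies, you would pass your working directory in place of the temporary folder.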
Note: If you worked through all the optional parts of Steps 2 and 3, you have now built your own MS-Access database from the raw files, and you can now skip to Step 5.
STEP 5: Build a query to construct a table that contains the percentage of workers who drive alone to work for each block group in Metro Boston.
Note: The illustrations below use the MS-Access database from the lab5_ma.mdb database.
Click the "Create" tab in the main menu and then "Query Design" to get a screen like the following:
Select both the "Ma00003" and "Mageo" tables (do you understand why you need both?). Add them in and close the "Show Table" window.
Now join the two tables by dragging and dropping the 'LOGRECNO' field in one table onto the 'LOGRECNO' field in the other table. (Do you understand what this does?) The query window will look something like the following. (Note: It is more reliable to join the tables based on 'LOGRECNO,' the census-designated row number, rather than on 'ID,' the row ID in the table.)
Follow the instructions below to complete the query.
1. Double Click "SUMLEV" in “Mageo” and enter "150" in the Criteria box since the summary level that we want is for block groups (within the simpler geographic nesting hierarchy that does not include 'place').
2. Double Click on P030001 and P030003 in “Ma00003” so that both fields show up in the bottom columns of the query construction interface.
3. Next, we want to define a column that computes the fraction of workers who drive to work alone. In one empty column of the bottom table, right-click and choose "build". In the expression builder that pops up, type in "Ratio: [Ma00003]![P030003] / [Ma00003]![P030001]" and click OK.
4. We also need to construct a geographic identifier that matches the state+county+tract+blockgroup code in our blockgroup map. In another empty column of the bottom table, right click and again choose "build". In the expression builder, type in "BLK_ID: [Mageo]![STATE]+[Mageo]![COUNTY]+[Mageo]![TRACT]+[Mageo]![BLKGRP]" and click OK. (Yes, you can cut-and-paste the formula from this exercise!)
5. Link the two tables via the common field LOGRECNO (which stands for 'logical record number').
The final query design should look like the following: (Actually, add the additional constraint that P030001 > 0 so block groups with no workers are excluded.)
6. You can preview your table by clicking "Run!" under the Query Menu. You should have 5053 rows in the resulting table. You can also save the query (via file save). However, ArcGIS sometimes has problems with data types when importing queries from Access. In order to clarify data types, we will use this query to make a new Access table. Switch back to the 'Design View' for the query. (Click the 'View' button on the left of the main menu. Click the arrow below 'View' to see the choices.) In the 'Query Type' section of the main menu, choose "Make-table query", enter the table name "BlockGroupDriveAloneRatio" and click OK.
7. Under the Query Tools (Design) menu, click "Run!". Two message windows will pop up as shown below. Click Yes to confirm both. (Note that, when you use summary level = '150' rather than '090' in the Access query, you will paste 5053 rows rather than 6728 and the second warning message will not appear. Summary level '150' has a single row for each block group. Summary level '090' may have more than one row per block group: it has one row for each unique block group within a 'place'. Since place boundaries sometimes divide block groups into two or more parts, those split block groups will show up more than once in the summary level = '090' rows.) The second message window complains because some of the selected block groups have no workers aged 16 and over, in which case the percentage formula tries to divide by zero. You can avoid the error message by further restricting the query to select only those block groups for which P030001 > 0.
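For readers who find it easier to see the query as SQL, the steps above can be sketched outside MS-Access. The example below uses Python's built-in sqlite3 module, not Access, so the syntax differs slightly (SQLite concatenates text with `||` and needs a cast to avoid integer division). The table and column names come from this exercise, but the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Mageo (LOGRECNO TEXT, SUMLEV TEXT, STATE TEXT,
                        COUNTY TEXT, TRACT TEXT, BLKGRP TEXT);
    CREATE TABLE Ma00003 (LOGRECNO TEXT, P030001 INTEGER, P030003 INTEGER);
""")

# Made-up rows: two block-group rows (SUMLEV 150) and one 'place' split (090).
conn.executemany("INSERT INTO Mageo VALUES (?, ?, ?, ?, ?, ?)", [
    ("0000001", "150", "25", "017", "350100", "1"),
    ("0000002", "150", "25", "017", "350100", "2"),
    ("0000003", "090", "25", "017", "350100", "1"),
])
conn.executemany("INSERT INTO Ma00003 VALUES (?, ?, ?)", [
    ("0000001", 800, 600),
    ("0000002", 0, 0),      # no workers: excluded by P030001 > 0
    ("0000003", 400, 300),  # wrong summary level: excluded by SUMLEV = '150'
])

query = """
    SELECT g.STATE || g.COUNTY || g.TRACT || g.BLKGRP AS BLK_ID,
           d.P030001,
           d.P030003,
           CAST(d.P030003 AS REAL) / d.P030001 AS Ratio
    FROM Mageo AS g
    JOIN Ma00003 AS d ON g.LOGRECNO = d.LOGRECNO
    WHERE g.SUMLEV = '150' AND d.P030001 > 0
    ORDER BY BLK_ID
"""
rows = conn.execute(query).fetchall()
conn.close()
```

Only the first made-up row survives both filters, which mirrors how the summary-level and P030001 > 0 criteria shrink the real table.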
STEP 6: Clean up the MS Access database and save the smaller piece that you need.
A new table BlockGroupDriveAloneRatio has been created by the above steps. This table has all the information we need to finish the lab exercise. In order to save disk space in your own directory, you can export the 5053-row table that you just created into another, smaller MS-Access database and then delete your copy of the too-large lab5_ma.mdb (or "sf3.mdb") database by following the steps below.
Use MS-Access to create a new, empty MS-Access database in your working directory and name it "DriveAloneRatio.mdb." [***If you are using MS-Access 2007, when you create a new blank database, save it by choosing "Microsoft Office Access Databases (2002-2003 format) (*.mdb)" instead of "Microsoft Office Access 2007 Databases (*.accdb)".***] Then, open your local copy of the database named lab5_ma.mdb (and let MS-Access close the new, empty "DriveAloneRatio.mdb" database). In the main database window pane of this MS-Access database, find your BlockGroupDriveAloneRatio table in the "Tables" listing and click it once to highlight the table name. Now click the 'External Data' tab on the main menu, and then, in the 'Export' section, click the icon for the 'More' option and choose 'Access database.' In the 'Export - Access Database' window that pops up, browse your file system to find and choose your empty DriveAloneRatio.mdb database. When you choose to export your table to this database, you will see a screen like this:
Keep the same table name and click OK to confirm the export. In Windows Explorer, you can check the file size of the new database. It should be around 500 KB. Open the file DriveAloneRatio.mdb to check that it contains the correct data. Once you confirm that your new, small database is okay, you can delete your copy of the "lab5_ma.mdb" database to save disk space. For the rest of the lab, we only need the table in "DriveAloneRatio".
Next, we want to add into ArcMap the table BlockGroupDriveAloneRatio from the MS-Access database DriveAloneRatio.mdb that we just created. ArcMap recognizes MS-Access database files (ending in *.mdb), so we can add the table just as we would add a shapefile, by navigating to the directory where we stored DriveAloneRatio.mdb. (Remember from class lecture that you can also define a new OLE database connection (from within ArcCatalog or ArcMap) using the Microsoft Jet driver in order to import MS-Access tables into ArcMap. This method will import MS-Access queries as well as tables but can have data type conversion problems.)
After adding your BlockGroupDriveAloneRatio table, join it to the Boston Block Group layer using the field BLK_ID in the BlockGroupDriveAloneRatio table and the field "STFID" from the Boston Block Group attribute table. [Note: earlier versions of the census block group layer labeled the column 'BLK_KEY'. You want the column containing the state+county+tract+block-group ID that matches the one in your Access table.]
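If some polygons come up empty after the join, it can help to compare the two key columns directly. The sketch below is a hypothetical Python check, not an ArcMap feature: it assumes you have copied the BLK_ID and STFID values into two lists, and it relies on the fact that a block group ID has 12 characters (2 for state, 3 for county, 6 for tract, 1 for block group). The sample keys are made up.

```python
from collections import Counter


def check_join_keys(table_keys, layer_keys):
    """Summarize key problems that would leave polygons without data."""
    # Compare everything as text, since a number never equals a string.
    table_keys = [str(k) for k in table_keys]
    layer_keys = [str(k) for k in layer_keys]
    counts = Counter(table_keys)
    return {
        "duplicated_in_table": sorted(k for k, n in counts.items() if n > 1),
        "layer_without_data": sorted(set(layer_keys) - set(table_keys)),
        "wrong_length": sorted(k for k in set(table_keys) if len(k) != 12),
    }


# Made-up keys: one duplicate in the table, one polygon with no table row,
# and one key stored as a number instead of text.
table_blk_ids = ["250173501001", "250173501002", "250173501002", 250173501003]
layer_stfids = ["250173501001", "250173501002", "250173501003", "250173501004"]
report = check_join_keys(table_blk_ids, layer_stfids)
```

Duplicated keys are the symptom you would expect from using summary level '090' instead of '150'.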
Make a thematic map showing the ratio of drive-alone workers by block group using the "Ratio" field in the table. (Choose a reasonable symbology method and explain your choice.) The map should be similar to the following one, but with fewer no-data polygons. (The map shown below was generated using the wrong summary level, '090', and then ignoring those block groups that appeared more than once in the table as a result of being split by 'place' boundaries.)
As always, include the appropriate cartographic elements in your map and create a layout. In an annotation section, or on a separate page, write a few sentences that interpret your results (e.g., are the percentages driving alone surprisingly large or small; which classification method did you choose and why; do you see any pattern regarding block groups near and far from metropolitan centers or transportation corridors?).
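When explaining your choice of classification method, it may help to see how two common methods place their class breaks on the same values. The following Python sketch compares equal-interval and quantile breaks on a made-up list of ratios; the function names are hypothetical, and the quantile rule used here (the standard library's "inclusive" method) is an assumption that may not match ArcMap's own calculation exactly.

```python
import statistics


def equal_interval_breaks(values, classes):
    """Return the interior break values that split the range into equal widths."""
    low, high = min(values), max(values)
    width = (high - low) / classes
    return [low + width * i for i in range(1, classes)]


def quantile_breaks(values, classes):
    """Return interior break values that put roughly equal counts in each class."""
    return statistics.quantiles(values, n=classes, method="inclusive")


# Made-up drive-alone ratios, skewed toward high values.
sample_ratios = [0.10, 0.20, 0.30, 0.70, 0.75, 0.80, 0.85, 0.90]

interval_breaks = equal_interval_breaks(sample_ratios, 4)
count_breaks = quantile_breaks(sample_ratios, 4)
```

With skewed data like this, the equal-interval breaks are spread evenly across the range while the quantile breaks cluster where most of the values lie, so the two maps can look quite different.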
Export the final layout into a ".pdf" file, save it in your network locker and submit your map and explanatory text to us via Stellar.
Developed by Thomas H. Grayson and Joe Ferreira, 1998.
Modified 2000-2010 by Anne Kinsella Thompson, Thomas H. Grayson, Sarah Williams, Jeeseong Chung, Jinhua Zhao, Xiongjiu Liao, Mi Diao, Lulu Xue, Shan Jiang and Joe Ferreira.
Last Modified by Joe Ferreira 4 October 2010.