11.520: A Workshop on Geographic Information Systems |
11.188: Urban Planning and Social Science Laboratory |
Population Characteristics | Housing Characteristics |
Age | Tenure |
Gender | Value or Contract Rent |
Race | Vacancy Status |
Hispanic Origin | Number of Rooms |
Marital Status | Units in Structure |
Household Type | Congregate Housing |
Household Relationship | |
Population Characteristics | Housing Characteristics |
Social Characteristics | Age of Housing |
Education | Heating Fuel |
Citizenship | Facilities |
Ancestry | Vehicles |
Language | Mortgage Status |
Disability | |
Children | |
Place of Birth | |
Economic Characteristics | |
Income | |
Labor Force Status | |
Employment | |
Place of Work | |
Public Assistance | |
Retirement Income | |
The Census organizes and aggregates data into a series of geographic hierarchies
Standard Hierarchy of Census Geographic Entities (from Census 2000 Summary File 1 Technical Documentation, prepared by the U.S. Census Bureau, 2001, p. A-25)
Summary
LevelGeographic Unit 010 United States 020 Region: Northeast (NE), Midwest (MW), South (S) and West (W) Regions 030 Division:
Northeast: New England, Mid Atlantic
Midwest: East North Central, West North Central
South: South Atlantic, East South Central, West South Central
West: Mountain, Pacific040 State (includes Washington, D.C. & Puerto Rico) 050 County 060 County Subdivision 070 Place 080 Census Tract / Block Numbering Area
(average 4,000 persons)090 Block Group (average 1,000 persons) 100 Block (average 85 persons)
Summary Level Geographic Unit 040 State (includes Washington, D.C. & Puerto Rico) 050 County 140 Census Tract 150 Block Group
Summary Level | Geographic Unit |
400 | Urbanized Areas |
300 | Metropolitan Areas (MSAs, CMSAs) |
200 | American Indian and Alaska Native areas |
800 | ZIP codes |
Hierarchy of American Indian, Alaska
Native, and Native Hawaiian Entities (from Census 2000 Summary File 1 Technical Documentation, prepared by the U.S. Census Bureau, 2001, p. A-26) |
ArcView shapefiles and ArcInfo coverage formats are readily available for 1990 and 2000 Census geography boundaries
The most useful files distributed by the Census Bureau are the Summary Tape Files (now renamed simply Summary Files) that aggregate the individual census forms to various levels of census geography. American FactFinder provides a forms-based online interface to many US Census datasets including SF1 and SF3. The FactFinder website is convenient when you want data for a single census tract or a small number of areas. It is also convenient when you want a few percentages (such as percent owner-occupied) that would otherwise require downloading the numerator and denominator needed for your own calculation. If you need to download many variables or data for many areas, you may be better off accessing the core SF1 and SF3 datasets described above via the following links. (The GIS lab in the Rotch Library has many Census CDs and other third-party tools that may also be helpful. Later in the semester, we will also use the MIT Library's online geodata repository that contains direct ArcGIS access to many useful datasets including some US Census data.)
The Census Bureau distributed the 1990 Census files as DBF files on CD-ROMs. The Census Bureau has posted the contents of many 1990 CD-ROMs online. These are available via HTTP and FTP. Also,
In fact, the 1980 STF 1 and STF 3 are now online! You can obtain the 1980 STF 1 via HTTP or FTP and the 1980 STF 3 via HTTP or FTP. Documentation is available from the Odum Institute for Research in Social Science.
The Census Bureau is distributing the 2000 Census files on CD-ROMs, DVD-ROMs in a proprietary format and online in flat ASCII format via HTTP and FTP.
The 1980 STF 1 and STF 3 files had varieties similar to those of the 1990 Census.
- Here are detailed notes on determining unemployment rates from the raw US Census tables - including importing the raw text files into MS-Access (Note: All these steps are NOT needed for class and homework since we have already downloaded and formatted key Mass datasets. The extra steps preceeded by '###' are included here to clarify the complete process in case you have need for some other project to obtain other census tables besides the one used in our exercises.)
- How should we measure unemployment rate: Census definition is: " the fraction of adults aged 16 or over who are in the labor force and are unemployed (during the sample week in April 1999)"
- Find the relevant SF3 census 2000 variables: we use the SF3 technical documentation (Ch. 3) to find variable P43: employment status by sex, and the name of the text file that includes the raw data for this variable (ma00004.uf3)
- ### Find and download the zipped datafile that contains P43 for Massachusetts as an ASCII 'flat file' - this file is called: ma00004.uf3
- ### Find and download the zipped datafile that contains the geographic identifiers for Massachusetts - this file is called: mageo.uf3
- ### Find and download the MS-Access templates that will let you pull the ASCII plain-text data into MS-Access:
- Explained in the 'readme.txt' file in the same directory as the zipped data files. Note, that readme.txt also includes the cross-referencing of the census variables (such as P43...) with the text file that bundles the data (such as ma00003.uf3).
- The zipped template for MS-Access 2000 is here: http://www.census.gov/support/2000/SF3/Acc2000.zip
- ### Import the relevant Mass data into Access tables
- rename the unzipped text files to end in 'txt'
- Use the File/Get-external-data/Import option in MS-Access, with the file type set for text files, and select the unzipped file that you renamed with a 'txt' suffix;
- In the dialogue box that lets you tell MS-Access how to parse the text file, click 'Advanced' and choose the 'specs' that apply to the particular data file (for example, ma000043)
- Develop MS-Access query to join the geography and P43 tables.
- Here are the variable names that correspond to each of the 15 columns for P43 data
P43. SEX BY EMPLOYMENT STATUS FOR THE POPULATION 16 YEARS AND OVER [15] Universe: Population 16 years and overP043001: Total: P043002: Male: P043003: In labor force: P043004: In Armed Forces P043005: Civilian: P043006: Employed P043007: Unemployed P043008: Not in labor forceP043009: Female: P043010: In labor force: P043011: In Armed Forces P043012: Civilian: P043013: Employed P043014: Unemployed P043015: Not in labor force
- Join the tables using the 'logrecno' column
- Build a state+county+tract+blockgroup 12-digit block group identifier so you can join to the blockgroup map
- Compute the percent unemployed = 100*(P043007+P0430015)/(P043005+P043012)
- Choose appropriate summary level (150) in order to get right counts for block groups
- Refine and use query to pull relevant rows and columns for block groups in all of Mass (or just for Middlesex County if we only want Cambridge and its neighbors north of the Charles River (all of which are in Middlesex County).
- Join tabular data to map of blockgroups for Middlesex County (obtained use MIT geodata tool from Library SDE server)
The four steps above that are marked with ' #' are not needed for the class exercises since we have already built an MS-Access database with the Census variables needed for the lab and homework exercises.
This data extraction and mapping exercise is complicated because the datasets are so large and include so many variables and geographic identifiers. But it is illustrative of the issues and steps involved in (a) understanding very large and highly structured datasets, and (b) using desktop tools to find, download, and mix-n-match geometry and tabular data from different online sources.
Note that the US Census provides many online tools to obtain census data. Likewise, there are many third-party tools and CDs that repackage the data in smaller chunks, with or without maps, and sometimes in pre-processed forms (e.g., after normalizing to percent owner-occupied rather than just as the raw counts). These assorted tools fill many nitche markets. Relatively few census data users understand the data structure and raw files at the level described in these lecture notes - i.e., at the level needed to find and use any of the thousands of columns of data that are available at each level of geography..
Last modified 1 October 2008 by Joe Ferreira.
Back to the 11.520 Home Page. - Back to the CRON Home Page.