11.188: Urban Planning and Social Science
Laboratory 11.205:
Introduction to Spatial Analysis
|
March 3, 2021
Massachusetts viewed in Mass State Plane Coordinates (NAD83 meters) | Massachusetts viewed in lat/lon Geographic Coordinate System (WGS84) |
ESRI world shaded relief viewed in Mass State Plane (NAD83 meters) | ESRI world shaded relief viewed in lat/lon GCS (WGS84) |
Example: median 1999 personal earnings from the 2000 US Census
SELECT mageo.SUMLEV, mageo.LOGRECNO, mageo.STATE, mageo.COUNTY,
mageo.TRACT, mageo.BLKGRP,
mageo.STATE||mageo.COUNTY||mageo.TRACT||mageo.BLKGRP AS blkkey
FROM mageo
WHERE mageo.SUMLEV="150";
SELECT ma00007.LOGRECNO, cast(ma00007.P085001 as integer) inc_all,
cast(ma00007.P085002 as integer) inc_male,
cast(ma00007.P085003 as integer) inc_female
FROM ma00007;
P89. POVERTY STATUS IN 1999 BY AGE BY HOUSEHOLD TYPE [39] | |
Universe: Population for whom poverty status is determined | |
Total: |
P089001
|
Income in 1999 below poverty level: |
P089002
|
Under 65 years: |
P089003
|
In married-couple families |
P089004
|
In other families: |
P089005
|
Male householder, no wife present |
P089006
|
Female householder, no husband present |
P089007
|
Unrelated individuals |
P089008
|
65 to 74 years: |
P089009
|
In married-couple families |
P089010
|
In other families: |
P089011
|
... << continue for 39 separate poverty variables >> |
|
Income in 1999 at or above poverty level: |
P089021
|
..<< repeat breakdowns for non-poverty group >> |
|
75 years and over: |
P089034
|
...<< repeat breakdown within age group >> |
|
Unrelated individuals |
P089039
|
Population Characteristics | Housing Characteristics |
Age | Tenure |
Gender | Value or Contract Rent |
Race | Vacancy Status |
Hispanic Origin | Number of Rooms |
Marital Status | Units in Structure |
Household Type | Congregate Housing |
Household Relationship | |
Population Characteristics | Housing Characteristics |
Social Characteristics | Age of Housing |
Education | Heating Fuel |
Citizenship | Facilities |
Ancestry | Vehicles |
Language | Mortgage Status |
Disability | |
Children | |
Place of Birth | |
Economic Characteristics | |
Income | |
Labor Force Status | |
Employment | |
Place of Work | |
Public Assistance | |
Retirement Income | |
The Census organizes and aggregates data into a series of geographic hierarchies
Standard Hierarchy of Census Geographic Entities (from Census 2000 Summary File 1 Technical Documentation, prepared by the U.S. Census Bureau, 2001, p. A-25)
Summary
LevelGeographic Unit 010 United States 020 Region: Northeast (NE), Midwest (MW), South (S) and West (W) Regions 030 Division:
Northeast Region: New England, Mid Atlantic
Midwest Region: East North Central, West North Central
South Region: South Atlantic, East South Central, West South Central
West Region: Mountain, Pacific040 State (includes Washington, D.C. & Puerto Rico) 050 County 060 County Subdivision 070 Place 080 Census Tract / Block Numbering Area
(average 4,000 persons)090 Block Group (average 1,000 persons) 100 Block (average 85 persons)
Summary Level Geographic Unit 040 State (includes Washington, D.C. & Puerto Rico) 050 County 140 Census Tract 150 Block Group
Summary Level | Geographic Unit |
400 | Urbanized Areas |
300 | Metropolitan Areas (MSAs, CMSAs) |
200 | American Indian and Alaska Native areas |
800 | ZIP codes |
Hierarchy of American Indian, Alaska Native,
and Native Hawaiian Entities (from Census 2000 Summary File 1 Technical Documentation, prepared by the U.S. Census Bureau, 2001, p. A-26) |
The most useful files distributed by the Census Bureau are the Summary Tape Files (now renamed simply Summary Files) that aggregate the individual census forms to various levels of census geography. data.census.gov provides a forms-based online interface to many US Census datasets including SF1 and SF3. The FactFinder website is convenient when you want data for a single census tract or a small number of areas. It is also convenient when you want a few percentages (such as percent owner-occupied) that would otherwise require downloading the numerator and denominator needed for your own calculation. If you need to download many variables or data for many areas, you may be better off accessing the core SF1 and SF3 datasets described above via the following links. (The GIS lab in the Rotch Library has many Census CDs and other third-party tools that may also be helpful. Later in the semester, we will also use the MIT Library's online geodata repository that contains direct ArcGIS access to many useful datasets including some US Census data.)
The Census Bureau distributed the 1990 Census files as DBF files on CD-ROMs. The Census Bureau has posted the contents of many 1990 CD-ROMs online. These are available via HTTP and FTP. Also,
In fact, the 1980 STF 1 and STF 3 are now online!
The Census Bureau is distributing the 2000 Census files on CD-ROMs, DVD-ROMs in a proprietary format and online in flat ASCII format via HTTP and FTP.
The 1980 STF 1 and STF 3 files had varieties similar to those of the 1990 Census.
Some Cambridge homes sold more than once during 1989!
How can we change table to have one row per
unique house
-- Need QUERY to aggregate rows referencing the same house:
-- Use 'Layer / Create Layer / New virtual layer' with this query:
SELECT address, count(*) as scount, avg(realprice)as avg_price
FROM SALES89
group by address;
Here are detailed notes for determining unemployment rates from the raw 2000 US Census tables - including importing the raw text files into MS-Access
- In lecture, I will illustrate a simpler example: use 2000 census data to find and thematically map by census block group the median earnings of all Cambridge residents who are workers (and males and females separately)
- Variable P85 is MEDIAN EARNINGS IN 1999 (DOLLARS) BY SEX FOR THE POPULATION 16 YEARS\par AND OVER WITH EARNINGS
- All the data for this example are in ./11.188/data/census2k
- You do not need MS-Access software to access and use the raw Census data; you may import the raw data into other software that can read plain text of CSV-formatted files. the MS-Access templates facilitate adding the headers onto the data tables and specifying data types.
Note: All these steps are NOT needed for class and homework since we have already downloaded and formatted key Mass datasets. We provide you with MS-Access databases into which the needed census data tables have already been imported. The extra steps preceeded by '###' are included here to clarify the complete process in case you have need for some other project to obtain other census tables besides the one used in our exercises.
- How should we measure unemployment rate: Census definition is: " the fraction of adults aged 16 or over who are in the labor force and are unemployed (during the sample week in April 1999)"
- Find the relevant SF3 census 2000 variables: we use the SF3 technical documentation (Ch. 3) to find variable P43: employment status by sex, and the name of the text file that includes the raw data for this variable (ma00004.uf3)
- ### Find and download the zipped datafile that contains P43 for Massachusetts as an ASCII 'flat file' - this file is called: ma00004.uf3
- ### Find and download the zipped datafile that contains the geographic identifiers for Massachusetts - this file is called: mageo.uf3
- ### Find and download the MS-Access templates that will let you pull the ASCII plain-text data into MS-Access:
- Explained in the 'readme.txt' file in the same directory as the zipped data files. Note, that readme.txt also includes the cross-referencing of the census variables (such as P43...) with the text file that bundles the data (such as ma00003.uf3).
- The zipped template for MS-Access 2000 is available from links on this page: http://www.census.gov/support/cen2000_sf3_ascii.html
- ### Import the relevant Mass data into Access tables
- rename the unzipped text files to end in 'txt'
- Open the MS-Access database containing the template (it is called SF3.mdb in the class locker) and use the File/Get-external-data/Import option in MS-Access, with the file type set for text files, and select the unzipped file that you renamed with a 'txt' suffix; (For the 2007+ version of Access, use the External-Data tab and then the 'text file' choice in the 'import' section.)
- In the dialogue box that lets you tell MS-Access how to parse the text file, click 'Advanced' and choose the 'specs' that apply to the particular data file (for example, ma000043)
- Develop MS-Access query to join the geography and P43 tables.
- Here are the variable names that correspond to each of the 15 columns for P43 data
P43. SEX BY EMPLOYMENT STATUS FOR THE POPULATION 16 YEARS AND OVER [15] Universe: Population 16 years and overP043001: Total: P043002: Male: P043003: In labor force: P043004: In Armed Forces P043005: Civilian: P043006: Employed P043007: Unemployed P043008: Not in labor forceP043009: Female: P043010: In labor force: P043011: In Armed Forces P043012: Civilian: P043013: Employed P043014: Unemployed P043015: Not in labor force
- Join the tables using the 'logrecno' column
- Build a state+county+tract+blockgroup 12-digit block group identifier so you can join to the blockgroup map
- Compute the percent unemployed = 100*(P043007+P0430014)/(P043005+P043012)
- Choose appropriate summary level (150) in order to get right counts for block groups
- Refine and use query to pull relevant rows and columns for block groups in all of Mass (or just for Middlesex County if we only want Cambridge and its neighbors north of the Charles River (all of which are in Middlesex County).
- Join tabular data to map of blockgroups for Middlesex County (obtained use MIT geodata tool from Library SDE server)
The four steps above that are marked with ' #' are not needed for the class exercises since we have already built an MS-Access database with the Census variables needed for the lab and homework exercises.
This data extraction and mapping exercise is complicated because the datasets are so large and include so many variables and geographic identifiers. But it is illustrative of the issues and steps involved in (a) understanding very large and highly structured datasets, and (b) using desktop tools to find, download, and mix-n-match geometry and tabular data from different online sources.
Note that the US Census provides many online tools to obtain census data. Likewise, there are many third-party tools and CDs that repackage the data in smaller chunks, with or without maps, and sometimes in pre-processed forms (e.g., after normalizing to percent owner-occupied rather than just as the raw counts). These assorted tools fill many nitche markets. Relatively few census data users understand the data structure and raw files at the level described in these lecture notes - i.e., at the level needed to find and use any of the thousands of columns of data that are available at each level of geography..
Last modified: 8 March 2021 [jf]
Back to the 11.188 Home Page. - Back to the CRON Home Page.