Extracting & Querying Census Data plus Site Suitability Analysis
Distributed: | Wednesday, March 10, 2021 |
Due: (at start of class/lab) | Friday, March 19, 2021 - Question #1 |
Extra Notes for Part II |
The main focus of this homework is Question #2 where we undertake a classic Ian McHarg-style 'site suitability analysis'. See this note on spatial analysis "classics" by John Corbett, Ian McHarg: Overlay Maps and the Evaluation of Social and Environmental Costs of Land Use Change. For our hypothetical 'site suitability analysis' example (in question #2), we will find suitable locations for a Cambridge senior center by identifying the possible locations that meet each of several criteria: close to major roads, near the target population, far from hazards, etc.
However, this homework also serves to improve your facility with data manipulation tools, including use of the 'group by' operation in SQL queries in QGIS. Question #1 focuses on database manipulation and analysis using some of the 2010 census data involved in Question #2. There are only two questions in this homework set, but each is more complicated than you might think because of the difficulty of decomposing the problems into doable steps.
BEFORE YOU START WITH QGIS - AND LONG BEFORE THE DUE DATES FOR EACH PART OF THE HOMEWORK - PLEASE BACK AWAY FROM THE COMPUTER AND READ THROUGH THE WHOLE ASSIGNMENT. Getting the 'big picture' first will help you develop a better GIS strategy and will reduce time and energy wasted.
All the data needed for this homework set are available in the class 'data' locker on AFS as indicated in the text. In addition, we have copied the data into a zipped data package called hwk2-package.zip available in the 'Materials' section for the class on Stellar: https://stellar.mit.edu/S/course/11/sp21/11.188/index.html The following files are included in this package:
File | Description |
Census_ACS | Folder containing ACS 2009-2013 census data and boundary files |
ACS_2013_5YR_BG_25_MASS.shp | Shapefile of block group boundaries for Mass as of 2013 |
2010_Mass_Tracts.shp | Tract boundaries with only geographic identifiers |
cb_2013_us_state_500k.shp | 'Tiger' file road centerlines for Massachusetts for 2013 (not required but may help with visualization) |
ACS_0913_B17020.csv | CSV-formatted data-only table with ACS 2009-2013 census variables needed for the homework |
TractToTown.csv | CSV-formatted cross-reference table relating tract identifiers to Mass municipality names |
Shapefiles | Folder of other shapefiles used in homework |
ma_town00.shp | Shapefile of Massachusetts municipalities (same as used previously) |
majmhda1.shp | Shapefile of Massachusetts major roads from MassGIS |
camb_area_lu_1999.shp | Shapefile from MassGIS of land use in and around Cambridge (circa 1999) |
mass_tri_facilities.shp | Point Shapefile of Toxic Release Inventory (TRI) facilities in and around Cambridge (circa 2000) |
The following table of Senior Citizen poverty statistics was developed from the 2009-2013, 5-year, ACS estimates.
DATA
- Find the relevant demographic data from 2009-2013 ACS. Open the relevant 5-year Appendices and search the key word "POVERTY STATUS". You will find table "B17020.POVERTY STATUS IN THE PAST 12 MONTHS BY AGE". There are 16 columns in this table. In order to find out what columns contain what variable we need to examine the table shell for B17020. Read the description of the variables in this section carefully and decide which variables you want to use to calculate the percentage of the below-poverty-level seniors.
- To simplify your homework assignment, we have already downloaded the relevant 2009-2013 ACS table into a CSV-formatted text file, ACS_0913_B17020.csv, that is saved in the 'Census_ACS' sub-directory of the class data locker (Q:\data\Census_ACS\ACS_0913_B17020.csv) as well as in the homework-2 data package on Stellar. We have also included a table, TractToTown.csv, that cross-references all the tracts that are in the 5-town area in and around Cambridge. The construction of the table is explained below in the 'Appendix'. (We have also saved both tables in an MS-Access database, hw2_ACS_lite.mdb, and included that in the package for anyone familiar with MS-Access.)
- For this exercise, senior citizens are persons 60 years old or older.
Question 1a: Fill in the empty cells in the following partially completed table and answer the questions below. Instead of using QGIS, you are free to use other database management software (such as MS-Access, postgres, or even Excel or ArcMap). Note that you will have to make careful use of the 'group by' operation in your SQL queries (or equivalent 'total' or 'summarize' commands in MS-Access or ArcMap) in order to compute the correct values for many of the columns. Note, also, that all five towns are in the same County, namely Middlesex County in Massachusetts.
Summary of Senior Citizen Poverty Statistics for Cambridge and Abutting Towns, 2009-2013
Town | Number of tracts: Total | Number of tracts: Tracts with some population for whom poverty status is determined | Number of tracts: Tracts with seniors for whom poverty status is determined | Number of seniors with determined poverty status: # of seniors below the Federal Poverty Level | Number of seniors with determined poverty status: Percent of population with a determined poverty status that are poor seniors | Average of percentages calculated for each tract |
Arlington
8
631
Belmont
8
8
1.13
Cambridge
32
2.03
Somerville
18
1282
Watertown | 6 | 6 | 6 | 651 | 2.05 | 1.93 |
Overall
72
1.77
Now, take a look at the last two columns and note that the overall percentage of impoverished seniors within a town is generally different from the town average of the percentages that one can calculate separately for each tract within that town. Answer the following three questions (in a short paragraph or two each).
Question 1b: Explain briefly why the overall average (the second to last column) and the average of the tract percentages (the last column) could be different.
Question 1c: In this instance which method (that is which of the last two columns) tends to be larger? Why?
Question 1d: Which set of numbers are most appropriate for use in our analysis for Question #2? Why?
Turn in the table with all your computed values added and write at most a paragraph or two for each question - 1b, 1c, and 1d.
Tips: Filling in the table looks simple enough -- just run some of those 'Field calculator' commands for the requisite tracts. But it will take some thought to determine what each column is measuring, which census rows to include, and how to do the summarizing. Another tricky part of this question, which we've taken care of by providing TractToTown, is figuring out which tract is within which town. The tracts can span a town boundary, or the boundaries of towns and block groups might not line up exactly. For this exercise, we define a tract to be "in" a town if the centroid of the tract is inside the town boundary. (See Appendix I below for information about various ways to examine the spatial relationship between tracts and towns.)
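The joining and summarizing steps described in these tips can be sketched with a small SQL 'group by' query. The sketch below uses Python's built-in sqlite3 module and invented numbers. Only GEOID10 and TOWN come from TractToTown; the column names pop_determined and seniors_below are hypothetical stand-ins for whatever fields you derive from ACS_0913_B17020.csv, and the tract IDs and town names are made up. GEOID10 is stored as text so that identifiers match exactly in the join.

```python
import sqlite3

# Toy data only: IDs, town names, and counts are invented for illustration.
# pop_determined and seniors_below are hypothetical column names.
tracts = [
    ("T001", 1000, 10),
    ("T002", 4000, 120),
    ("T003", 0, 0),       # tract with no population of determined poverty status
    ("T004", 2000, 30),
]
tract_to_town = [
    ("T001", "TownA"),
    ("T002", "TownA"),
    ("T003", "TownA"),
    ("T004", "TownB"),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE acs (GEOID10 TEXT, pop_determined INTEGER, seniors_below INTEGER)"
)
con.execute("CREATE TABLE tract_to_town (GEOID10 TEXT, TOWN TEXT)")
con.executemany("INSERT INTO acs VALUES (?, ?, ?)", tracts)
con.executemany("INSERT INTO tract_to_town VALUES (?, ?)", tract_to_town)

query = """
SELECT t.TOWN,
       COUNT(*)                  AS n_tracts,
       SUM(a.pop_determined > 0) AS n_tracts_with_pop,
       SUM(a.seniors_below)      AS seniors_below,
       -- overall percentage: one ratio computed from the town totals
       100.0 * SUM(a.seniors_below) / SUM(a.pop_determined) AS overall_pct,
       -- average of tract percentages: tracts with zero population yield NULL
       -- and are skipped by AVG
       AVG(CASE WHEN a.pop_determined > 0
                THEN 100.0 * a.seniors_below / a.pop_determined END) AS avg_tract_pct
FROM acs AS a
JOIN tract_to_town AS t ON a.GEOID10 = t.GEOID10
GROUP BY t.TOWN
ORDER BY t.TOWN
"""
rows = con.execute(query).fetchall()
for row in rows:
    print(row)
```

The query returns both percentage measures side by side for each town so they can be compared. Which tracts belong in each count is still a decision you need to make and justify for the real data.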
A local non-profit group is interested in locating a site for building a senior center in Cambridge. Given your expertise in GIS, you are hired as a GIS analyst by this organization to help them locate the best site. After a long meeting with the organization and the community you agree to run some numbers in order to get a handle on the locations and characteristics of potentially suitable Cambridge sites. You settle on the following criteria to get rolling with your site selection process:
Use QGIS and the various layers that we have used in class exercises to undertake a basic site suitability analysis. (Feel free to augment your maps with some other map layers stored in the class data directory in order to improve the visual quality of the presentation - but stick with the site selection criteria listed above).
Question 2a: First, prepare 3 maps showing the locations that are acceptable based on the criteria identified above. Map these criteria separately:
- proximity to major roads and distance from the TRI Facilities.
- appropriate land use characteristics
- tracts having a high percentage of seniors below the poverty level.
Question 2b: Next, prepare a fourth map that shows the Cambridge locations that meet all these criteria as well as the 1-contiguous-hectare constraint.
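One way to think about combining the criteria with the contiguity constraint is as a raster overlay: each criterion becomes a true/false grid, the grids are ANDed cell by cell, and connected groups of suitable cells are kept only if they are large enough. The sketch below is plain Python on a tiny invented grid, not a QGIS procedure. The 50 m cell size (so 4 cells make up 1 hectare), the 4-neighbor definition of "contiguous", and the function names are all assumptions for illustration. In QGIS you would more likely do the equivalent with vector buffers, intersections, and area calculations.

```python
from collections import deque

def overlay_and(*masks):
    """Cell-by-cell logical AND of equally sized boolean grids."""
    return [[all(cells) for cells in zip(*rows)] for rows in zip(*masks)]

def contiguous_regions(mask):
    """Return lists of (row, col) cells forming 4-connected regions of True cells."""
    n_rows, n_cols = len(mask), len(mask[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    regions = []
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited suitable cell.
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions

CELL_SIZE_M = 50        # assumed cell size, so each cell covers 2,500 square meters
MIN_AREA_M2 = 10_000    # 1 hectare

T, F = True, False
# Invented criterion grids, one per map in Question 2a.
near_roads_far_from_tri = [[T, T, T, F], [T, T, T, F], [F, F, T, T]]
land_use_ok = [[T, T, F, F], [T, T, F, F], [F, F, T, T]]
high_senior_poverty = [[T, T, T, T], [T, T, T, T], [T, T, T, T]]

suitable = overlay_and(near_roads_far_from_tri, land_use_ok, high_senior_poverty)
big_enough = [
    region for region in contiguous_regions(suitable)
    if len(region) * CELL_SIZE_M ** 2 >= MIN_AREA_M2
]
print(big_enough)
```

In this toy grid two separate patches meet all three criteria, but only the four-cell patch reaches one hectare. The two-cell patch touches it only diagonally, so under the 4-neighbor assumption it is treated as a separate region and dropped.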
Question 2c: Along with the maps, provide a page or two (not a treatise) of discussion concerning:
- any choices or interpretations that you made in generating the suitable locations and which you feel bear some explanation, and
- your conclusions from this initial analysis regarding suitable sites. Regarding your conclusions, don't just pick one site--there isn't a definitive 'best site' given the criteria we've suggested. Be sure to include some interpretation regarding the extent to which the analysis helps you focus on one part of town, on one or another criterion, on a set of proximity issues that might forecast special interest concerns/complaints, etc. Would you suggest tightening or relaxing some of the criteria? Can you suggest important considerations that are not well captured in this suitability analysis?
Hand in a discussion of the answers to question 2c along with the four maps. That is, what you hand in should be a short report on your facility siting analysis that explains what you did, interprets your results, and discusses your conclusions - with the four maps referenced in your text and included in-line after each is mentioned (or all together on separate pages at the end of the text).
FAQ and other Suggestions
Need Help?
If you need help, and you think that your question might be of interest to the whole class, please post your question on Piazza. We strongly encourage you to use this option since it will be the quickest way to get help. If you would prefer to ask just the class staff, send e-mail to 11.188@mit.edu. Please don't be shy about asking for help if you are struggling to understand the assignment or having trouble with a particular aspect of QGIS. In the past, we've heard stories of students struggling for many hours over minor issues that had easy solutions. We'd like to avoid these misadventures, so please contact us sooner rather than later if you get stuck. (When you finally get unstuck, spend a moment reflecting on what you could have done to get unstuck earlier if you had had the vocabulary, GIS understanding, or roadmap of QGIS needed to use the various help files and references.) Also, we recognize the value of group work and encourage study group discussion of the homework as well as lab exercises, but we require that you turn in individual work that reflects your own learning and hands-on discovery.
As indicated in the homework text, the Massachusetts town boundaries do not line up precisely with the US Census tract boundaries even though, in almost all instances, the tracts fall within a single town. These differences are known as a 'sliver problem' and, in this case, the reason for the problem is that the tract boundaries are much less detailed and precise than the Massachusetts Town boundary layer. The following graphics illustrate the problem. [They use a shapefile, 5Towns.shp, containing the borders of the five municipalities in and around Cambridge and north of the Charles River. This shapefile was exported from the MassGIS ma_town00.shp shapefile.]
The image above shows the five towns in and around Cambridge (and north of the Charles river) together with all the tracts that are selected using "Select by Location" function with "intersect" specified as the topological relation between two layers. Many tracts outside of the five-town boundary are also included! If you zoom into a small area of a town boundary, as shown below, you will see why so many extra tracts are selected. The two themes have different levels of detail and there is a 'sliver' problem in trying to reconcile the common boundaries.
For your convenience, we have provided in the MS-Access database, hw2_ACS_lite.mdb, a cross-reference table, TractToTown, that was created by doing a "Select by Location" operation for each of the five towns, with "have their center in" as the topological relation. The table has two critical columns for this assignment: GEOID10 has the state+county+tract identifier and TOWN has the corresponding town name. For example, the image below shows the tracts that "have their center in" Cambridge. A new field "Town" is created and its value is set to "Cambridge" for those tracts.
This method works when only a handful of towns are considered. If you want to analyze hundreds of towns, you don't want to undertake this town-by-town selection process by hand. There are more advanced tools to solve this problem, such as creating a new shapefile that contains points representing the centroid for each tract in Middlesex County and then doing a spatial join between this new centroid layer and the town layer. However, creating the centroid layer from the tract shapefile is too much of a distraction for this homework set.
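The centroid-based spatial join mentioned above can be sketched in a few lines. This is a plain-Python illustration on invented rectangles, not the QGIS or MassGIS workflow. The function names are hypothetical, polygon_centroid uses the standard area-weighted (shoelace) formula, point_in_polygon uses ray casting, and every coordinate, tract ID, and town name is made up. It assumes simple polygons without holes.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross                 # accumulates twice the signed area
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)

def point_in_polygon(point, ring):
    """Ray-casting test: True if the point lies inside the simple polygon."""
    x, y = point
    inside = False
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        if (y0 > y) != (y1 > y):       # edge crosses the horizontal line through y
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside

def assign_towns(tracts, towns):
    """Map each tract ID to the first town whose polygon contains its centroid."""
    result = {}
    for tract_id, ring in tracts.items():
        centroid = polygon_centroid(ring)
        result[tract_id] = next(
            (name for name, town_ring in towns.items()
             if point_in_polygon(centroid, town_ring)),
            None,                      # centroid falls in none of the towns
        )
    return result

# Invented geometry: two adjacent square towns and two rectangular tracts.
towns = {
    "TownA": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "TownB": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
tracts = {
    "T1": [(1, 1), (9, 1), (9, 9), (1, 9)],       # entirely inside TownA
    "T2": [(8, 2), (16, 2), (16, 6), (8, 6)],     # straddles the shared boundary
}
assignment = assign_towns(tracts, towns)
print(assignment)
```

Tract T2 overlaps both towns, so an "intersect" selection would pick it up for each of them, while the centroid rule assigns it to exactly one town.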
Written in 1996-2001 by Kamal Azar, Joseph Ferreira, and Tom Grayson
Modified by Myounggu Kang in October 2002 to incorporate Census 2000 data
Modified by Eric Schultheis in February 2015 to incorporate 2009-2013 ACS data
Modified 2003-2015 by Jeeseong Chung, Jinhua Zhao, Shan Jiang, Eric Schultheis and Joe Ferreira.
Modified by Juan Camilo Osorio and Hongmou Zhang on 06 March, 2017.
Last modified by Joe Ferreira on March 9, 2021.