Massachusetts Institute of Technology
Department of Urban Studies and Planning


11.188: Urban Planning and Social Science Laboratory

11.205: Introduction to Spatial Analysis

In-Lab TEST - April 6, 2020 - with Answers


Test Instructions

Good luck!

Datasets for the Test

For this test, we will use US county-level data from the November 2016 US presidential election (between Donald Trump and Hillary Clinton). The relevant data for the test are bundled together in a folder called test20data in the AFS class locker.  The folder is also compressed into a 'zip' file called test20data.zip in our '11.188 2020 DATA' Dropbox folder.  Copy test20data.zip to a writeable space on your local drive (e.g., C:\TEMP) and extract all its files into a local folder: C:\temp\test20data.

Look in your local copy of test20data for an ArcMap document called 11.188_test20_start.mxd that already includes some of the shapefiles and tables that you will need. This ArcMap document uses one new Metropolitan Statistical Area shapefile from the US Census plus several shapefiles that were obtained from the MIT Library Geodata Repository and then projected to a North_America_Albers_Equal_Area_Conic (NAD 1983) projection (in meters). (The standard EPSG code for this coordinate system is 102008 so I have included it in the shapefile names, but you will not need to use this fact.) For each of these shapefiles, only the contiguous 48 United States were saved (the "lower 48").

Filename / Description

us_a1city_2000_lower48_epsg102008.shp
Point feature shapefile showing the location of major US cities (within the lower 48 states) having 2000 US Census population of at least 50,000.

us_f7states_2006_lower48_epsg102008.shp
Polygon feature shapefile showing state boundaries for the lower 48 states.

us_e25msa_2010_lower48_epsg102008.shp
Polygon feature shapefile from US Census Bureau of the US 2010 Census boundaries of metropolitan statistical areas (MSAs). The original shapefile has been limited to the lower 48 states and projected to the EPSG=102008 coordinate system.

us_f7counties_1996_lower48_epsg102008.shp
Polygon feature shapefile from US Census Bureau of the 1996 County boundaries within the lower 48 states together with the state, county name, Federal Information Processing Standard code (FIPS), and estimated 1990 and 1996 population.

2016_US_County_Level_Presidential_Results.csv
(Optional) A comma-separated-value (CSV) text file containing the 2016 US Presidential Election results by County showing the votes for Donald Trump and Hillary Clinton, the total votes, the percentage of votes for these two candidates, the difference in votes, and the percentage point difference in their votes.  The same data are duplicated within the personal geodatabase (below) in the election16_county table so you do not need to use this CSV file.

election16.mdb
An MS-Access database (that is usable as a personal geodatabase) containing two key tables: (1) election16_county with county-level presidential election results and a few 5-year 2015 ACS census variables, and (2) election16_county_data_dictionary explaining the meaning of each field in election16_county. The rest of the tables (GDB...) are extra geodatabase tables that ArcMap utilizes.

Within the ArcMap document, 11.188_test20_start.mxd, the several state, county, MSA, and city shapefiles have already been added. The counties are shown thematically using the default (Jenks Natural Breaks) classification broken into 10 categories based on the 1996 population of each county. The county boundaries are omitted in order to avoid map clutter and make the map more readable. Data are shown for the 3111 counties in the lower 48 states.

In addition to the shapefiles, spreadsheet, CSV file, and ArcMap document, the test20data folder also contains an MS-Access database, election16.mdb, with the ELECTION16_COUNTY table of county-level presidential election results along with selected census data extracted from the 5-year 2015 ACS census. For your information, the MS-Access database also contains a data dictionary, election16_county_meta, explaining the columns in the election16_county table.  Even if you do not have MS-Access installed on your local machine, you may connect to these tables from ArcMap.

The 2016 election dataset was obtained from https://github.com/tonmcg/County_Level_Election_Results_12-16. Portions of the CSV file with election data were then imported into MS-Access and joined with ACS census data to produce ELECTION16_COUNTY. The 5-year 2015 ACS census data included with the election data come from the S1701 table on Poverty Status and the S2301 table on Employment and have been downloaded from the ACS website: https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?src=bkmk. In addition, we have added population counts for years 2000 and 2010 to the MSA shapefile, us_e25msa_2010_lower48_epsg102008.shp. These population data were obtained from an Excel spreadsheet downloaded from: https://www.census.gov/population/www/cen2010/cph-t/cph-t-7.html. [In order to join the population data for Los Angeles with the most similar shapefile polygon, we had to change the CBSA_code for Los Angeles-Long_Beach-Santa_Ana from 31100 to 31080 (Los_Angeles-Long_Beach-Anaheim). This editing issue is not relevant for the test.]


Part I: Short Queries (32 Points)

Question I-1 (28 points total, 4 points each part) - Queries of election16_county

Question I-2 (4 points total, 2 points each) - Queries of us_a1city_2000_lower48_epsg102008.shp

These answers are obtained by sorting the attribute table or by using the Field/statistics and Field/summarize tools in ArcMap (or doing equivalent queries in MS-Access).


Part II: Mapping Concepts (6 Points)

Question II-1 (4 points)

When you first open the map document, the counties are thematically shaded using the default (Jenks Natural Breaks) classification broken into 10 categories based on the 1996 population of each county.  Explain briefly (a) why so much of the country is in the same shade of yellow, and (b) why the same map looks very different when shaded using quantile categorization with the same number of categories (10).

The vast majority of US counties have small populations and are lumped together with yellow shading when Jenks Natural Breaks is used.  Quantile classification puts an approximately equal number of counties in each of the 10 groups so there is much more differentiation in the shading but, because of the skewed distribution of population across counties, the population differences are not that large across many of the less populated counties that fall into several of the 10 quantile groupings.  
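The contrast between the two classification schemes can be sketched with a few lines of Python. This is only an illustration under stated assumptions: the population values are invented (not the test data), and Jenks Natural Breaks is not implemented here. Equal-width classes are swapped in for comparison because they also leave most counties in the lowest class when a few values are very large.

```python
import statistics

# Hypothetical, deliberately skewed county populations (NOT the test data):
# ninety small counties and ten very large ones.
populations = [1_000 * (i + 1) for i in range(90)]       # 1,000 .. 90,000
populations += [250_000 * (i + 1) for i in range(10)]    # 250,000 .. 2,500,000

# Quantile (decile) breaks: nine cut points that split the sorted values
# into ten classes holding roughly equal numbers of counties.
decile_breaks = statistics.quantiles(populations, n=10, method="inclusive")

# Equal-width breaks over the same range, for contrast (this is not Jenks).
lo, hi = min(populations), max(populations)
equal_breaks = [lo + (hi - lo) * k / 10 for k in range(1, 10)]


def classify(value, breaks):
    """Return the 0-based class index of value given ascending break points."""
    return sum(value > b for b in breaks)


quantile_counts = [0] * 10
equal_counts = [0] * 10
for p in populations:
    quantile_counts[classify(p, decile_breaks)] += 1
    equal_counts[classify(p, equal_breaks)] += 1

print("quantile class counts:   ", quantile_counts)
print("equal-width class counts:", equal_counts)
print("first three decile breaks:", decile_breaks[:3])
```

With data this skewed, the quantile classes each hold the same number of counties, but the lowest classes are separated by small population differences, which is the caveat raised in the answer above.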

Question II-1 (6 points total, 3 points each)

When you open the map document, the county map is displayed using a North_America_Albers_Equal_Area_Conic projection (NAD 1983) rather than in geographic latitude-longitude coordinates. As we know, local planning agencies typically use projected coordinate systems. For example, the MassGIS mapfiles that we have used are generally saved in the Massachusetts (mainland) State Plane coordinate system.

Part II-1a (3 points): Suppose we were to view the same US County map in latitude-longitude coordinates. Would the size of Washington state (in the northwest corner) appear to be larger, smaller, or the same relative to the size of a more southern state such as Florida? Explain briefly. [Note: Feel free to change the coordinate system to see the results. However, there is no need to change the coordinate system and submit a map. You need only provide a brief answer and explanation.]

Washington would appear relatively larger when displayed using lat/lon coordinates because it is farther north of the equator than Florida, so its east-west extent (measured in degrees of longitude) is exaggerated more than Florida's. The following graph from Wikipedia illustrates what circles of equal area look like at different locations on the map when plotted using latitude-longitude coordinates.
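The latitude effect can be made concrete with a rough calculation. When degrees of longitude and latitude are drawn at the same size, east-west distances are stretched by about 1/cos(latitude). The sketch below assumes approximate mid-latitudes for the two states (about 47.5 degrees N for Washington and 28 degrees N for Florida); these values are illustrative assumptions, not taken from the test data.

```python
import math


def east_west_stretch(latitude_deg):
    """Horizontal exaggeration factor when one degree of longitude is drawn
    as wide as one degree of latitude (unprojected lat/lon display)."""
    return 1.0 / math.cos(math.radians(latitude_deg))


# Approximate mid-latitudes, assumed here for illustration only.
washington_lat = 47.5
florida_lat = 28.0

wa = east_west_stretch(washington_lat)
fl = east_west_stretch(florida_lat)
print(f"Washington: {wa:.2f}x  Florida: {fl:.2f}x  ratio: {wa / fl:.2f}")
```

A ratio greater than 1 means Washington is stretched more than Florida, so it looks larger relative to Florida than it does in the equal-area projection.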

Part II-1b (3 points):  Suppose we were to view the same US County map in the Massachusetts (mainland) State Plane projection. Would the map be rotated clockwise or counter-clockwise relative to the original orientation when you opened the ArcMap document? Explain briefly. [Note: There is no need to change the coordinate system and submit a map. You need only provide a brief answer and explanation.]

It would be rotated clockwise, so Massachusetts is more-or-less horizontal (with North straight up) and the western states are rotated clockwise.


Part III: Election Map (30 Points)

Let's develop a choropleth map showing the percentage of votes for Donald Trump in each county. Add the election16_county table to ArcMap, join it to the shapefile of 'lower-48' counties and design your map. Note that election16_county lists the county FIPS code as text (field= fips_txt) as well as an integer (field=combined_fips). We added the fips_txt field for your convenience. It is not in the original CSV file. Be careful to select the appropriate data type when you join the table to your shapefile.

There are 3139 records in the election16_county table and 3111 records in us_f7counties_1996_lower48_epsg102008.shp. Note that election16_county is missing data for a few counties (such as Dade County, FL, FIPS 12025) and includes counties in states such as Alaska and Hawaii that are not part of the 'lower 48'. So *not* every row in election16_county will match a row in us_f7counties_1996_lower48_epsg102008.shp, and vice versa. After you join election16_county to the shapefile, you should see 3105 non-null values for the election results among the lower-48 counties.
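The data-type warning and the unmatched rows can be illustrated outside ArcMap with a minimal left join in plain Python. The rows below are hypothetical: only the field names fips_txt, combined_fips, and per_gop come from the test description; the shapefile key name "FIPS", the assumption that it is stored as text, and the per_gop values are made up for illustration.

```python
# Hypothetical county features; the key name "FIPS" and its text type are assumed.
county_shapes = [
    {"FIPS": "01001", "NAME": "Autauga"},
    {"FIPS": "12025", "NAME": "Dade"},   # no election row -> stays null after the join
    {"FIPS": "53033", "NAME": "King"},
]

# Hypothetical election rows; the per_gop values are placeholders, not real results.
election_rows = [
    {"combined_fips": 1001, "fips_txt": "01001", "per_gop": 0.70},
    {"combined_fips": 53033, "fips_txt": "53033", "per_gop": 0.20},
    {"combined_fips": 2013, "fips_txt": "02013", "per_gop": 0.50},  # Alaska: no lower-48 shape
]


def left_join(shapes, table, shape_key, table_key):
    """Keep every shape; attach per_gop where the keys are equal, else None."""
    lookup = {row[table_key]: row for row in table}
    joined = []
    for shape in shapes:
        match = lookup.get(shape[shape_key])
        joined.append({**shape, "per_gop": match["per_gop"] if match else None})
    return joined


by_text = left_join(county_shapes, election_rows, "FIPS", "fips_txt")
by_int = left_join(county_shapes, election_rows, "FIPS", "combined_fips")

matched_text = sum(r["per_gop"] is not None for r in by_text)
matched_int = sum(r["per_gop"] is not None for r in by_int)
print("matches using fips_txt:     ", matched_text)
print("matches using combined_fips:", matched_int)
```

In this sketch the text key matches two of the three shapes (Dade stays null, and the Alaska row is dropped), while the integer key matches none because "01001" and 1001 are different values. It also shows why the integer form loses the leading zero.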

In addition, please notice that election16_county has many columns, most of which are not needed for the test. We have made these available just in case you want to explore this data beyond the test. This may be interesting for some of you, depending on your research interests.

Question III-1 (14 points total)

MAP #1: Prepare and submit an ArcMap layout of the counties within the lower 48 states and shade the counties based on the percentage of votes that were for Donald Trump. Be sure to:

NOTE: This question is worth 14 points and provides an opportunity to demonstrate the cartographic skill that you have developed.

Sample maps selected from student submissions with names blocked (We use these maps for illustration. Please also read the comments on the sample maps).

Map #1-1




Map #1-2
 

Question III-2 (6 points total)

Part III-2a (3 points): Briefly discuss your choice of classification scheme and number of categories for your election results map.

Part III-2b (3 points): Briefly discuss any spatial pattern that you observe in your map regarding Trump's percentage of votes among the US counties.

Trump had a higher percentage of votes in counties outside of big cities, especially in the north-south swath along the Great Plains and in Appalachia, where his support was particularly strong.

Question III-3 (10 points total)

Part III-3a (2 points): What is the average (mean) value of the percentage of votes for Trump (per_gop) among all those 'lower 48' counties for which you have the voting results joined to your map: ___63.7%___?

Part III-3b (4 points): Select those counties that contain the cities that have POP>=250000. How many counties are selected: ___60 (59 counties is acceptable if you note that Miami is larger than 250k but is in Dade County, which has Null for the vote count)___? What is the average (mean) value of the percentage of votes for Trump (per_gop) among these counties with the big cities: ___33.3%___?

Part III-3c (4 points): Using the county-level ACS census data joined to your county shapefile, determine the average (mean) percentage of persons below the poverty level (pct_below_pov) for those counties containing the larger cities ____17.6%____ and for those counties outside the larger cities ___16.7%___.


Part IV: Proximity to Big Cities and Metro Areas (32 Points)

Next, let's examine the presidential voting pattern for counties in and around the larger cities.

Question IV-1 (6 points total)

Create a buffer of 50 km radius around the cities with POP>=250000. Select all the counties that intersect your buffer. (In particular, select the counties that have their centroid within your buffered large cities.) Note: 65 counties contain the 65 cities that have POP>=250000; but 248 counties have their centroid within the 50km buffer of these large cities; and 83 counties are completely within the buffered large cities.  However, 504 counties intersect the large city buffer using the default 'intersect the source layer feature' option for the spatial join.
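The three overlap criteria mentioned in the note (intersects, centroid within, completely within) can be sketched with simple geometry. This is a toy illustration, not the ArcMap workflow: counties are approximated as axis-aligned rectangles in projected meters, there is a single hypothetical city at the origin, and all coordinates are invented.

```python
import math

BUFFER_M = 50_000  # 50 km buffer radius in projected map units (meters)


def nearest_distance(px, py, rect):
    """Shortest distance from a point to a rectangle (0 if the point is inside)."""
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - px, 0.0, px - xmax)
    dy = max(ymin - py, 0.0, py - ymax)
    return math.hypot(dx, dy)


def centroid_distance(px, py, rect):
    """Distance from a point to the rectangle's centroid."""
    xmin, ymin, xmax, ymax = rect
    return math.hypot((xmin + xmax) / 2.0 - px, (ymin + ymax) / 2.0 - py)


def farthest_distance(px, py, rect):
    """Distance from a point to the rectangle's farthest corner."""
    xmin, ymin, xmax, ymax = rect
    return max(math.hypot(x - px, y - py) for x in (xmin, xmax) for y in (ymin, ymax))


city_x, city_y = 0.0, 0.0  # hypothetical city location
counties = {               # hypothetical rectangles: (xmin, ymin, xmax, ymax)
    "A": (-10_000, -10_000, 10_000, 10_000),
    "B": (10_000, -20_000, 70_000, 20_000),
    "C": (45_000, -40_000, 145_000, 40_000),
    "D": (60_000, 60_000, 100_000, 100_000),
}

intersects = {n for n, r in counties.items()
              if nearest_distance(city_x, city_y, r) <= BUFFER_M}
centroid_within = {n for n, r in counties.items()
                   if centroid_distance(city_x, city_y, r) <= BUFFER_M}
completely_within = {n for n, r in counties.items()
                     if farthest_distance(city_x, city_y, r) <= BUFFER_M}

print("intersects:       ", sorted(intersects))
print("centroid within:  ", sorted(centroid_within))
print("completely within:", sorted(completely_within))
```

The selections are nested in the same order as the counts quoted in the note: every county completely within the buffer also has its centroid within it, and every county with its centroid within the buffer also intersects it.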

Part IV-1a (4 points): How many counties fall within your buffered large cities: ___248 (out of 3,111)___?  If, instead, your overlap criteria were having any or all of the county area within your buffer, how many counties would satisfy that criterion:  _______504 (out of 3,111)________?

Note: We also allowed 247 and 503 since you get those numbers if you exclude Washington D.C. which is not technically a county.

Part IV-1b (2 points): Among those counties having any or all of the county within the large city buffer, what is the total number of votes for each candidate (i.e., the sum of votes_gop, and the sum of votes_dem): Trump votes: ___20,206,873___, and Clinton votes: ___31,531,543____?

Question IV-2 (14 points total)

Part IV-2a (4 points): Based on the 2010 MSA population (pop2010) field in the us_e25msa_2010_lower48_epsg102008 shapefile, select those Metropolitan Statistical Areas that had a population of more than one million persons. How many MSAs met this criterion: ___51 (out of 909)___? Which MSA in this million-plus group had the smallest population: CBSA_code = ___40380____? and Name = ____Rochester, NY_____?

Part IV-2b (2 points): Next, select those 'lower-48' counties that have their centroid within those MSA that have a population of one million or more. How many counties meet this criterion: ___429 (out of 3,111)___?

Part IV-2c (2 points): Among these counties that have their centroid within the larger MSAs, what is the average (mean) value of the percentage of votes for Trump (per_gop): _____53.6%______? and for Clinton (per_dem): _____41.7%______?

Part IV-2d (2 points): Among those counties within the larger MSAs, what is the total number of votes for each candidate (i.e., the sum of votes_gop, and the sum of votes_dem): Trump votes: ___27,513,667___, and Clinton votes: ____38,513,996____?

Part IV-2e (4 points): Upon examining the last two questions, we notice that the average vote percentage (in the counties within the larger MSAs) is higher for Trump, but Part IV-2d shows that Clinton earned more votes in total among those counties within the larger MSAs. Explain briefly what is going on that allows this to occur:

The total votes in each county are very different because of population size differences.  Low-population counties voted overwhelmingly for Trump. Hence, the unweighted average of per_gop is much higher than the population-weighted average.  In this case, Trump had a higher percentage of votes than Clinton in most counties, but Clinton won the overall vote by about 39 to 28 million in the counties within these 51 MSAs.
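A small numeric sketch shows how the unweighted mean of county percentages and the pooled vote share can point in opposite directions. The vote counts below are invented for illustration and, as a simplifying assumption, only two candidates are counted (the real per_gop and per_dem fields do not sum to 100%).

```python
# Hypothetical counties (made-up vote counts, not the test data):
# three small counties favoring one candidate and one large county favoring the other.
counties = [
    {"votes_gop": 7_000, "votes_dem": 3_000},
    {"votes_gop": 6_500, "votes_dem": 3_500},
    {"votes_gop": 8_000, "votes_dem": 2_000},
    {"votes_gop": 300_000, "votes_dem": 700_000},
]


def gop_share(county):
    """Two-candidate GOP share for one county (a simplification of per_gop)."""
    return county["votes_gop"] / (county["votes_gop"] + county["votes_dem"])


# Unweighted mean: every county counts equally, regardless of size.
unweighted_mean = sum(gop_share(c) for c in counties) / len(counties)

# Pooled share: equivalent to weighting each county's share by its total votes.
total_gop = sum(c["votes_gop"] for c in counties)
total_dem = sum(c["votes_dem"] for c in counties)
pooled_share = total_gop / (total_gop + total_dem)

print(f"unweighted mean of county shares: {unweighted_mean:.1%}")
print(f"pooled (vote-weighted) share:     {pooled_share:.1%}")
print(f"total votes: gop={total_gop:,} dem={total_dem:,}")
```

The one large county dominates the totals but contributes only one of four values to the unweighted mean, which is the same mechanism behind the answers to Parts IV-2c and IV-2d.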

Question IV-3 (12 points)

MAP #2: Prepare and submit a second ArcMap layout of the counties within the lower 48 states and once again shade the counties based on the percentage of votes that were for Donald Trump. This time, be sure to:

Sample maps selected from student submissions with names blocked (We use these maps for illustration. Please also read the comments on the sample maps).


Map #2-1


Map #2-2



Map #2-3


That's all for the test, but feel free to keep the election data and explore some of the election patterns further when you have time. We do not have time on the test to examine further the relationship between voting, demographics, and health exchange participation and subsidies.


Last modified 1 April 2020 by Hongmou Zhang & Juan Camilo Osorio.
