| 11.520: A Workshop on Geographic Information Systems |
| 11.188: Urban Planning and Social Science Laboratory |
For this test, we will use US county-level data from the recent November 4, 2008, presidential election. In the usual manner, mount the class locker as drive M, and copy the entire (readonly) folder from M:\test08data to writeable space on your local drive (e.g., C:\USERTEMP in Room 37-312 and C:\tmp or C:\workspace in Room 9-251). (In Room 9-251, the class locker may not mount as Drive M:\ and you may have to navigate down Drive Z to find Z:\afs\athena.mit.edu\course\11\11.520\test08data )
Look in your copy of the folder for an ArcMap document called 11.520test08_start.mxd that already includes many of the shapefiles and tables that you will need. This ArcMap document uses the following shapefiles, which were obtained from the MIT Geodata Repository and then projected to the North_America_Albers_Equal_Area_Conic (NAD 1983) projection (in meters). (The standard EPSG code for this coordinate system is 102008 so I have included it in the shapefile names, but you will not need to use this fact.) Only the contiguous 48 United States were saved (the "lower 48"). Four shapefiles and one dBase file are referenced:
| Filename | Description |
| us_a1city_2000_lower48_epsg102008.shp | Point feature shapefile showing the location of major US cities (within the lower 48 states) having 2000 US Census population of at least 50,000. [For your information only, not needed for any questions.] |
| us_f7states_2006_lower48_epsg102008.shp | Polygon feature shapefile showing state boundaries for the lower 48 states. |
| us_e25msa_2000_lower48_epsg102008.shp | Polygon feature shapefile of the US 2000 Census boundaries of metropolitan statistical areas (MSAs) within the lower 48 states. |
| us_f7counties_1996_lower48_epsg102008.shp | Polygon feature shapefile from US Census Bureau of the 1996 County boundaries within the lower 48 states together with the state, county name, Federal Information Processing Standard code (FIPS), and estimated 1990 and 1996 population. |
| ELECT08C.DBF | A dBase-formatted table containing two-digit state abbreviation (st_code), state name (state), county name (county), and county FIPS code plus the county-level vote count for John McCain (McCain) and Barack Obama (Obama). The vote counts come from MSNBC reporting of the 2008 Presidential Election data for each US County as published online at "ManyEyes" on 11-11-08: http://manyeyes.alphaworks.ibm.com/manyeyes/datasets/2008-vs-2004-presidential-election-r/versions/1 |
[This is an interesting IBM-sponsored site for sharing data and visualizations.]
Within the ArcMap document, 11.520test08_start.mxd, the election data in elect08c.dbf has already been joined to the county shapefile using the FIPS code. Data are shown for 3107 of the 3112 counties in the lower 48 states. Votes in the District of Columbia and Yellowstone National Park are not included. In addition, coding differences regarding the handling of county vs. center-city FIPS codes prevent the election data from matching up with the county shapefile for Miami-Dade County and two center-cities in Virginia: Clifton Forge and South Boston. The omission of these 5 counties will not matter for the purposes of this test.
In addition to the shapefiles, dBase table, and ArcMap document, the test08data folder also contains an MS-Access database, 11.520_election08.mdb, with two tables: election08_county is the same as ELECT08C.DBF and election04_county is a similar table with 2004 presidential election results for George W. Bush and John Kerry together with some additional demographic data about the US counties. The data dictionary for the columns in both tables is visible in the 'design view' of the tables within MS-Access. We have also included a raster grid layer, nc_grid, in the test08data folder that divides North Carolina into 10 km grid cells. This grid layer will be used in the last question.
When you open the ArcMap document, you will see, in the Data Frame, a thematic map of vote counts for McCain that uses the US county shapefile after it has been joined to the ELECT08C table. The thematic map shades the number of McCain votes received in each county using a red-to-blue color ramp with 5 categories of 'natural break' classification.
For some of the maps you create, we request a color ramp varying from red to blue (as in the ArcMap document that we provide). Note that ArcMap provides tools that allow you to 'flip' the color scale in the event that your red-to-blue scale has red at the wrong (Obama) end rather than at the Republican (McCain) end. For example, in the 'symbology' window for 'graduated symbols', you can click the 'Symbol' column heading and it will provide an option to 'flip symbols'. You may find these features helpful to get the shading that you want.
These answers are obtained by sorting the attribute table or by using the Field/statistics and Field/summarize tools in ArcMap (or doing equivalent queries in MS-Access).
This election map is displayed using a North_America_Albers_Equal_Area_Conic projection (NAD 1983). Local planning agencies typically use projected coordinate systems. For example, the MassGIS mapfiles that we have used are generally saved in the Massachusetts (mainland) State Plane coordinate system. Explain briefly (a) why local agencies prefer to display maps in projected coordinates rather than lat/lon, and (b) two noticeable changes in the visual appearance of the US election map if it were instead displayed in latitude/longitude coordinates (such as geographic coordinate system, World, WGS 1984).
I-2-A: Latitude/Longitude measurement allows one consistent coordinate system to be used for measurements anywhere on Earth. However, plotting lat/lon as coordinates on a flat two-dimensional surface leads to distortions of area, angle, and distance measurements. Local agencies use projected coordinates with a projection method that keeps North straight up in the local area and does a better job of preserving distance measurements or areal measurements in the local area that is their focus of attention.
I-2-B: One change is that the northern border of the US (and of North Carolina) would be horizontal instead of curved (and, for North Carolina, higher on the east side since the Albers projection is centered in the middle of the US). Another change is that the map would be flatter and wider in the lat/lon presentation because, for the latitudes of the US, one degree of latitude is a much longer distance on the ground than one degree of longitude (i.e., as you approach the North Pole, one degree of longitude becomes vanishingly small).
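The "flatter and wider" effect can be checked with a little arithmetic. The following is a minimal sketch, assuming a spherical Earth with a mean radius of 6371 km (an approximation, not the NAD 1983 ellipsoid); the function names are made up for this illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean spherical radius (approximation)

def km_per_degree_latitude():
    """Ground distance covered by one degree of latitude on a sphere."""
    return math.pi * EARTH_RADIUS_KM / 180.0

def km_per_degree_longitude(lat_deg):
    """Ground distance covered by one degree of longitude at a given latitude."""
    return km_per_degree_latitude() * math.cos(math.radians(lat_deg))

# Roughly the southern edge, the North Carolina latitude, and the northern border
for lat in (25, 36, 49):
    stretch = km_per_degree_latitude() / km_per_degree_longitude(lat)
    print(f"lat {lat}: east-west stretch factor when plotting raw lat/lon = {stretch:.2f}")
```

Plotting degrees directly as x/y treats a degree of longitude as if it were as long as a degree of latitude, so the east-west stretch factor grows toward the north.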
Part I-3A (5 points): In the ArcMap document, the 'Bad Election Map A' map looks strange. It shades votes for McCain using a natural-break classification. We all know that the Republicans are stronger in the rural areas and the news reports tend to use red colors where Republicans won and blue colors where Democrats won. This map does use red for the counties where McCain received more votes and blue where he received fewer votes. Yet the map is mostly blue in the rural parts of the country and red along some of the coasts. Explain briefly why this is the case.
The map shading is based on the raw vote count for McCain in each county. Counties are shaded red only if McCain receives a very high number of votes compared to his votes in all other counties. The number of votes in each county is highly skewed and McCain did relatively well in counties with relatively small populations. So most of the blue counties in Bad_Election_Map are small-population counties that McCain won despite receiving few votes relative to the large-population counties on the coasts. The distribution of votes is so skewed that the 'natural break' classification puts most counties at the low-vote end of the spectrum - and, therefore, shades them blue.
Part I-3B (5 points): Explain briefly your choice of attribute field, symbology, and classification method in order to display a thematic map using the data in ELECT08C.DBF that presents a better indication of the geographic pattern of the voting outcomes for the contest between John McCain and Barack Obama. (You do not need to turn in a map at this point - just explain what you would do and why.)
To portray the outcome of the election we should look not at the McCain votes in each county, but at the percentage of votes for McCain in each county. (If there were significant third party candidates, the choice of measure for McCain's 'support' could be more ambiguous.) In this case, I would choose to shade the map based on the McCain percent of votes with quantile classification so we see the upper, middle, and lower ranges across counties. Even so, many sparsely populated counties (in the central US) are large, and that fact leads to an areal distortion that exaggerates the apparent impact on the vote count of these larger, low-population counties. One way around this is to plot the vote as a pie chart for each county with the size (i.e. area) of the pie proportional to the county population and the shading of the pie showing the McCain/Obama split. One might argue for another choice of classification scheme or a slightly different measure. For example, you could look at the difference in votes between McCain and Obama and use standard deviations for the classification - this map would look less red/blue since most counties are closer than 60/40. We can accept more than one choice if reasonably argued. [If you would like to see further exploration of state and county level voting patterns check out the other maps and datasets on the 'Many Eyes' site: http://manyeyes.alphaworks.ibm.com/manyeyes/datasets/2008-vs-2004-presidential-election-r/versions/1] Your answer does not need to be this long-winded!
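The difference between shading by raw count and shading by percentage can be seen in a small sketch. The county names and vote counts below are invented for illustration and are not taken from ELECT08C.DBF.

```python
# (name, mccain_votes, obama_votes) -- hypothetical counties, invented numbers
counties = [
    ("Big Metro", 400_000, 600_000),
    ("Suburb", 90_000, 80_000),
    ("Rural A", 6_000, 3_000),
    ("Rural B", 2_500, 1_000),
]

def mccain_share(mccain, obama):
    """McCain's percentage of the two-party vote."""
    return 100.0 * mccain / (mccain + obama)

# Ranking by raw McCain votes puts the big county first, even though he lost it
by_count = sorted(counties, key=lambda c: c[1], reverse=True)
# Ranking by share puts the small counties he won decisively first
by_share = sorted(counties, key=lambda c: mccain_share(c[1], c[2]), reverse=True)

print("By raw count:", [c[0] for c in by_count])
print("By share:    ", [c[0] for c in by_share])
```

In this toy data the two rankings are exact reverses of each other, which is the same effect that makes 'Bad Election Map A' look backwards.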
For this portion of the test, you will want write access to some of the test data so be sure you have copied the test08data folder to a local drive before you use ArcMap to open your copy of the 11.520test08_start.mxd document on your local drive. Our questions will focus on North Carolina, one of the large 'battleground' states that was heavily contested in the recent election.
Create a 'Definition Query' for the US County map layer so that only those counties in North Carolina are included. (For the rest of the test we will focus only on North Carolina and this restriction will speed up the processing. For even faster processing, you can export the counties of North Carolina into a new shapefile for use in the remainder of the test.) Also change the coordinate system of the Data Frame so that, instead of North_America_Albers_Equal_Area_Conic projection, you use 'NAD_1983_StatePlane_North_Carolina_FIPS_3200'. (Hint: Set the properties of the Data Frame to be the appropriate pre-defined coordinate system.) Zoom in on North Carolina and shade the counties based on the percentage of the votes that Obama received. (Note that ELECT08C.DBF only has the votes for McCain and Obama and not for third-party candidates that were on the ballot in various states. For the purposes of this test, we will ignore all third party candidates so you should compute the percentage of votes for Obama as equal to (100 * [Obama votes]) / ([Obama votes] + [McCain votes]).) Use quantiles with 5 categories and a red-to-blue color range with blue indicating more support for Obama. Turn on the MSA layer with 50% transparency [see the Display tab in the Layer Properties dialog window] and an appropriate color or shading pattern so you can both read the thematic map and visually distinguish which counties fall within the MSAs.
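The percentage formula and the idea behind a 5-class quantile classification can be sketched as follows. This is only an illustration: the function names are made up, and ArcMap's own quantile classifier may place breaks and handle ties differently.

```python
def obama_percent(obama, mccain):
    """(100 * [Obama votes]) / ([Obama votes] + [McCain votes])."""
    return 100.0 * obama / (obama + mccain)

def quantile_classes(values, k=5):
    """Assign each value a class 0..k-1 so that classes hold roughly equal counts.

    Class 0 holds the lowest values (reddest), class k-1 the highest (bluest).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, i in enumerate(order):
        classes[i] = rank * k // len(values)
    return classes

# Hypothetical Obama percentages for ten counties
pcts = [38.0, 41.5, 44.0, 47.2, 49.9, 51.0, 55.3, 60.1, 68.4, 76.0]
print(quantile_classes(pcts))
```

Because each class receives about the same number of counties, quantiles spread the color ramp evenly across the counties, unlike natural breaks on a highly skewed variable.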
Turn in a PDF file showing a layout view of the North Carolina thematic map that you create. Be sure to have your name and Athena userid on the map. Also be sure to project the map to North Carolina State Plane coordinates (NAD83) and have the MSA layer on top with symbology that makes them clearly visible. Include a North Arrow and legend as well. The data sources are 2008 US Presidential Election (MSNBC data reported in ManyEyes) and the MIT GeoData Repository.
Note that the range for Obama's by-county percentage is 38-76% within North Carolina but 2.9-89% among the lower-48 states. The northern border of North Carolina is slightly curved but not rotated because the Data Frame is using the North Carolina projection. Using a hatch pattern for the MSAs helps to make them visible on top of the thematic map.
The newspapers have made a lot of the urban/rural dichotomy - Obama was strong in the cities and McCain was strong outside the cities. Let's look at the vote in counties that are within vs. outside the MSAs:
Part II-2A (8 points): Highlight those North Carolina counties that have their centroids outside the MSAs. How many North Carolina counties have their centroids outside MSAs? ___65_____ Turn in a PDF file showing a layout view of North Carolina counties and MSAs with those counties clearly highlighted that have their centroids outside the MSAs.
Use 'select by location' to find the counties that have their centroids within the MSAs, then switch the selection to get those outside. It was not necessary to shade the counties thematically for this map as long as you highlighted those counties in North Carolina that are outside the MSAs. Again, it is hard to get the highlighting and MSA shading to be visible along with the thematic mapping. Many different visualizations are acceptable as long as the key points are readable (i.e., the reader is able to identify which counties are outside the MSAs).
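The centroid test behind this selection can be sketched in plain Python. This is a simplified illustration, not ArcMap's implementation: it assumes simple polygons without holes, given as lists of (x, y) vertices, and the sample shapes are invented.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    area2 = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)

def point_in_polygon(pt, ring):
    """Ray-casting test: count edge crossings of a ray going right from pt."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical shapes: one square "MSA" and two rectangular "counties"
msa = [(0, 0), (10, 0), (10, 10), (0, 10)]
county_a = [(8, 2), (14, 2), (14, 6), (8, 6)]   # overlaps the MSA, centroid outside
county_b = [(2, 2), (6, 2), (6, 6), (2, 6)]     # entirely inside the MSA

for name, ring in (("county_a", county_a), ("county_b", county_b)):
    print(name, "centroid in MSA:", point_in_polygon(polygon_centroid(ring), msa))
```

Note that county_a intersects the MSA but still counts as "outside" under the centroid rule, which is why the choice of spatial selection method changes the county count.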
Part II-2B (11 points): Examine the attribute table of election results in order to fill in the eleven blanks in the following table:
| North Carolina | Number of Counties | McCain Votes | Obama Votes | Difference (Obama-McCain) | Total Votes | McCain Percent | Obama Percent |
| In MSAs | 35 | 1,418,277 | 1,555,474 | 137,197 | 2,973,751 | 47.7% | 52.3% |
| Outside MSAs | 65 | 683,560 | 560,864 | -122,696 | 1,244,424 | 54.9% | 45.1% |
| All counties | 100 | 2,101,837 | 2,116,338 | 14,501 | 4,218,175 | 49.8% | 50.2% |
You can compute these numbers by sorting and summarizing the tables in ArcMap or via queries in MS-Access.
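As a cross-check, the derived columns of the table can be recomputed from the McCain and Obama vote counts alone. The sketch below uses the counts reported in the table; the helper name `summarize` is made up for this illustration.

```python
def summarize(mccain, obama):
    """Derive the difference, total, and two-party percentages for one row."""
    total = mccain + obama
    return {
        "difference": obama - mccain,
        "total": total,
        "mccain_pct": round(100.0 * mccain / total, 1),
        "obama_pct": round(100.0 * obama / total, 1),
    }

# (McCain votes, Obama votes) as reported in the table
rows = {
    "In MSAs": (1_418_277, 1_555_474),
    "Outside MSAs": (683_560, 560_864),
}
# The statewide row is the column-wise sum of the other two
rows["All counties"] = tuple(sum(col) for col in zip(*rows.values()))

for label, (mccain, obama) in rows.items():
    print(label, summarize(mccain, obama))
```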
Part II-2C (4 points): Briefly interpret your results. Is McCain a lot stronger in those counties outside the MSAs?
Obama is indeed stronger in those 35 counties within the North Carolina MSAs - he won them with a 52.3% to 47.7% split. Because the counties within the MSAs accounted for about 70% of the state's votes (2,973,751 out of 4,218,175), that narrow split was just enough to make up for the poor showing (54.9% vs. 45.1%) outside the MSAs. We did not use the City layer in the test but it is available in the test08data folder. You might want to test other hypotheses - e.g., that the further a county is from the big cities, the bigger the percentage difference for McCain. You might also examine the demographics of the counties and see if ethnicity, income, family size, etc. are correlated with election outcome. Note several complications due to the geography of the state: (1) many small counties in the northeast favored Obama and the red/blue split appears to be as much an East/West phenomenon as an MSA/rural split, (2) the MSA/rural split is likely compounded by ethnic splits since exit polls show that 90+% of African-Americans voted for Obama and that minority population is not spread evenly across the state, and (3) the Outer Banks of North Carolina along the coast raise questions about the distance metric to use for measuring proximity. Perhaps it would be better to use road miles to the city rather than straight-line distance.
Next, consider the election04_county table (in the MS-Access 11.520_election08.mdb database). This table shows the number of votes that George W. Bush and John Kerry received in each US county in the 2004 presidential election. Compute the total number of votes for Bush + Kerry in each county and then compute the difference in the total number of votes (for Republican and Democratic candidates) for the 2008 and 2004 elections. Call this difference delta_votes = (2008 total) minus (2004 total). Use the FIPS code to join this table to your county map. Now, let's examine whether the changes in turnout between the 2004 and 2008 elections across the North Carolina counties favored McCain or Obama.
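The join-and-subtract logic for delta_votes can be sketched with two small lookup tables keyed by FIPS code. The FIPS codes and vote counts below are invented placeholders, not values from the election tables, and the dictionary layout is an assumption made for this illustration.

```python
# fips -> (Republican votes, Democratic votes); all numbers are invented
votes_2004 = {"99001": (5_000, 4_000), "99003": (1_200, 900)}
votes_2008 = {"99001": (5_500, 5_200), "99003": (1_100, 950), "99005": (300, 200)}

def delta_votes(v2004, v2008):
    """(2008 two-party total) minus (2004 two-party total), per matching FIPS."""
    deltas = {}
    for fips, (rep08, dem08) in v2008.items():
        if fips not in v2004:
            continue  # like an inner join: unmatched counties are dropped
        rep04, dem04 = v2004[fips]
        deltas[fips] = (rep08 + dem08) - (rep04 + dem04)
    return deltas

deltas = delta_votes(votes_2004, votes_2008)
largest = max(deltas, key=deltas.get)
print(deltas, "largest increase:", largest)
```

Counties that fail to match on FIPS silently drop out of an inner join, so it is worth comparing the joined row count against the number of counties you expect.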
Part II-3A (4 points): What is the total number of 2004 votes for either Bush or Kerry across all North Carolina counties? __3487015 ___
Part II-3B (2 points each): What county in North Carolina had the largest increase in votes cast (for Republicans or Democrats) in the 2008 election compared with the 2004 election? That is, what North Carolina county had the largest delta_votes? County FIPS = __37183____? County Name = ___Wake_______? delta_votes = ___83972_____?
Part II-3C (3 points each): What is the sum of delta_votes for those North Carolina counties outside the MSAs? __179776________? What is the sum of delta_votes for those counties in the MSAs? __551384____________?
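A quick consistency check ties these answers together: the two delta_votes sums should add up to the difference between the statewide 2008 and 2004 two-party totals. The sketch below uses the figures reported in this answer key (the 2008 total comes from the Part II-2B table); the variable names are made up.

```python
total_2008 = 4_218_175      # All counties, Total Votes (Part II-2B)
total_2004 = 3_487_015      # Bush + Kerry across North Carolina (Part II-3A)
delta_outside = 179_776     # sum of delta_votes outside the MSAs
delta_inside = 551_384      # sum of delta_votes inside the MSAs

statewide_delta = total_2008 - total_2004
print("Statewide delta_votes:", statewide_delta)
print("Sum of the two parts: ", delta_outside + delta_inside)
print("Share of the increase inside MSAs: "
      f"{100.0 * delta_inside / statewide_delta:.1f}%")
```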
Although it is easy to connect to the MS-Access database in order to add the election04_county table to ArcMap, it is then hard to add columns to that table and the system does not let you export the table to a dBase (DBF) file. When preparing the test, I did the delta_votes calculations on the MS-Access side so I never noticed this limitation. One workaround in ArcMap is to join the table to a shapefile and then export the shapefile - it will let you do that. However, I did not intend to force you through these hoops on the test and that is why I sent the email about providing the exported DBF file for you.
Instead of simply determining which counties are inside or outside of MSAs, we decide to measure distance from MSAs. Let's use ArcMap's Spatial Analyst for this purpose. To save you time, we have already rasterized North Carolina into 10 kilometer grid cells (using the same North Carolina State Plane, NAD-83, projection mentioned above). We have saved this coverage for your use under the name: nc_grid. (The cell values in the grid are the FIPS codes of the County containing the center of the grid cell. This encoding makes it easier to see County boundaries when shading the grid cells but you will not need to utilize these grid cell values for the test.)
Use the Spatial-Analyst/Distance/Straight-line function to compute a new raster layer whose cell value is the straight-line distance (of the center of the grid cell) to the nearest MSA boundary within North Carolina. (Note: The grid cell distance computation will measure distance to all MSAs that are not 'masked off' by whatever mask you have set. So, if you set the mask to be nc_grid, then the distance operation will only consider MSAs within the North Carolina grid.) BEWARE of the usual spatial analyst cautions - you will not be able to do this raster-cell distance computation unless the 'spatial analyst' extension is turned on in Tools/Extensions and the 'spatial analyst' toolbar is turned on in View/Toolbars, and the Data Frame is set to a projected coordinate system, and you have set all the usual properties in the Spatial-Analyst/Options dialogue box. (So much for simple spatial analysis!) [The nc_grid layer is already in the desired projected coordinates with the appropriate cell size and the like - just be sure your Data Frame is using the same projection.]
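Conceptually, the straight-line distance operation assigns each cell the distance from its center to the nearest source cell. The sketch below is a brute-force illustration of that idea on a tiny invented grid; it is not how Spatial Analyst is implemented, and the grid layout and function name are assumptions.

```python
import math

def straight_line_distance(source, cell_size):
    """Distance from each cell center to the nearest source cell center.

    source: 2D list of booleans (True where an MSA covers the cell).
    cell_size: cell width in ground units (e.g., 10 for 10 km cells).
    """
    rows, cols = len(source), len(source[0])
    sources = [(r, c) for r in range(rows) for c in range(cols) if source[r][c]]
    dist = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nearest = min(math.hypot(r - sr, c - sc) for sr, sc in sources)
            dist[r][c] = nearest * cell_size
    return dist

# Hypothetical 3x4 grid with a single MSA cell in the upper-left corner
grid = [
    [True, False, False, False],
    [False, False, False, False],
    [False, False, False, False],
]
dist = straight_line_distance(grid, cell_size=10)
# Locate the cell farthest from any source cell
farthest = max(
    ((r, c) for r in range(len(dist)) for c in range(len(dist[0]))),
    key=lambda rc: dist[rc[0]][rc[1]],
)
print("farthest cell:", farthest, "distance:", round(dist[farthest[0]][farthest[1]], 3))
```

Which cells count as sources depends on the mask, which is why restricting the analysis to nc_grid versus all MSAs changes the farthest cell.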
We don't have time on this test to do much raster exploration of the voting data. It will be enough to do the following:
Part II-4A (6 points): Determine that grid cell within North Carolina that is farthest from any MSA (within North Carolina). What is the distance from that grid cell to the nearest MSA? ___143.178 km______ What county is underneath the center of that grid cell? ____Cherokee_______ What is delta_votes for that county? ___1187__________
Part II-4B (6 points): Shade the grid cells using a red-to-blue color scale with red for those cells farthest from MSAs. Turn in a PDF file of your North Carolina map while taking care to set the transparency so that shaded grid cells are visible underneath the MSAs. Be sure to highlight that grid cell from Part II-4A that is farthest from an MSA. (Hint: In case you aren't familiar with setting the transparency level for a layer, you can set it from the Display tab of the 'layer properties' window.)
The next map shows the distance-to-MSA result for all MSAs whether or not they are within North Carolina. The cell that was furthest in the previous map is close to an MSA in the neighboring state (Tennessee), and the cell furthest from any MSA is now along the Atlantic coast in the Outer Banks:
Notice that, for this last map, we chose to use 10km grid cells that had been constructed using the original North America Albers Equal Area Conic projection (rather than North Carolina State Plane) but the Data Frame uses North Carolina State Plane. As a result the grid cells are rotated clockwise (do you see why?).
That's all for the test, but feel free to keep the election data and explore some of the election patterns further when you have time. Try using 'zonal statistics' to get the average distance of each rural county from MSAs, then plot a scattergram of distance versus percent of vote for McCain. Did Obama increase the 2008 turnout (compared to 2004) by a greater percentage in those counties that he won? That favored Kerry in 2004? That are furthest from MSAs? What happened in other swing states? What about counties with high/low percentages of minorities? Enjoy!