11.520: A Workshop on Geographic Information Systems |
11.188: Urban Planning and Social Science Laboratory |
Due Monday, Oct. 1st, 2007
(1) Merging, editing, and saving tables: To do this lab, we'll have to learn a few more ArcMap techniques for merging and saving tables. There are hundreds of census variables and any Cambridge blockgroup map is likely to include only a small subset of them. To use other census data, we'll have to load these data into ArcMap and "join" them with the attribute table of the Cambridge blockgroup theme using a common geographic reference variable (such as the state-county-tract-blockgroup identifier, Stcntrbg). We may also need to compute new fields that normalize or otherwise combine multiple columns of the original data. But the official class datasets are "read-only" and ArcMap requires that you have write access to attribute tables in order to create and calculate new fields. So we'll also need to learn how to save local "writeable" copies of (extracts of) our class datasets.
(2) Spatial Joins: Suppose we'd like to compare the sales prices of Cambridge homes with the socio-economic characteristics of their neighborhoods. For example, we might want to know whether the high-priced sales tended to be in neighborhoods with highly educated adults. By drawing a pin-map of high-priced 1989 sales locations on top of a Cambridge block group map, we may be able to see a pattern. The block group map can be thematically shaded by educational attainment levels. With this map, we can 'see' whether high/low-priced sales cluster in neighborhoods with, for example, more (or less) educational attainment. But the pattern may be misleading or hard to interpret and we may want to quantify the relationship to measure the degree of association and to tag the sales data with the some of the characteristics of their neighborhoods. Much of the benefit of using GIS software depends on its tools for cross-referencing and reinterpreting data based on spatial location. We'll defer until later labs the more general and advanced tools for tagging data based on spatial proximity and we'll focus in this lab on simpler spatial joins and buffer creation.
(3) Setting coordinate system: So far, the coordinate system of the data has already been preset for your use. However, in typical settings, you may need to specify or change the coordinate systems associated with GIS datasets. In this exercise, you will learn how to apply a suitable coordinate system to your datasets.
Now we will experiment with querying the data. Select "Select by Attributes" under the "Selection" menu. On the "Select by Attribute" window, make sure choose Cambbgrp for the "layer" window. In this case, we don't need to worry about that because we have only one layer. Otherwise, we have to pick up the correct layer from the drop-down list. In the method box, keep the default one "create a new selection".
Now you can type your query manually into the box in the bottom left corner of the window, or you can use the tools at the top of the window.
Let's find the block groups with median household income in excess of $50,000 per year. In the "Fields" list, double-click on "Med_hh_inc". Then single-click on the ">" button. Next, you can type in 50000 or scroll down "Values" list until you can double-click on 50000. The query entry window should now show "Med_hh_inc"> 50000
The window finally looks like the Fig. 1. Click "Apply" button to run the query and 'select' these block groups. Notice that all block groups where the median household income is greater than $50,000 are now highlighted in the data display area. If you open the attribute table, you can find out the associated records are also highlighted. You may need to scroll up or down in the attributes window to see the highlighted records. If you click the Selected button, which is at the bottom of the attributes window, only the selected records will be visible. Next to the Selected button, you can find "9 out of 94 are selected".
Now close the query window. Make sure the selected records are still highlighted. Select Statistics from the Selection menu. You should see a selection statistics window. Choose Med_hh_inc in the field drop down list (Fig. 2). You should see a window listing some common statistical measures describing the 9 selected records of the Med_hh_inc field. It also generates a frequency distribution diagram.
Now close the selection statistics window and return to ArcMap window. Swap the selected and unselected block groups in the follow way: 1. Open the attribute table; 2. Click Options; 3. On the pop-up menu, select Switch selection.
Select Statistics from the selection menu again to summarize these 85 lower income block groups. Write the mean and standard deviation of these 85 block groups on your lab assignment. (There is more than one way to access the selection statistics. With the attribute table open, try right-clicking on the column heading of the field for which you want statistics...)
Note from the statistics for these 85 block groups that at least one block group had a median household income of zero. Let's look at these block groups using the select by attributes function. We find that there are with zero income. Two of these block groups are near Tech Square and one of them is near Harvard Square (where students in dorms count as people but not households). Let's exclude these block groups from consideration and recompute the statistics for the 82 block groups with a median income of $50,000 or less. Specify the selection criterion as "MED_HH_INC" > 0 and "MED_HH_INC"<=50000. Then use the Selection > Statistics menu to compute the mean and standard deviation for those block groups with median household income greater than zero and less than or equal to $50,000. Record these two numbers on your lab assignment. Since we consider the census data to be unreliable for the three block groups with '0' income, we may wish to exclude them from consideration entirely. One way to do this is to set the layer properties for Cambbgrp. Double click on the layer's name to bring up the layer property window. Under the tab "definition query", type in "MED_HH_INC" > 0. Click OK. Notice that the Cambbgrp layer no longer shades the three block groups with zero income. They have been excluded from the layer and they are no longer shaded, but are 'donut holes' in the Cambridge map. The data are still there on disk but ArcMap is ignoring the three block groups that don't meet the selection criteria specified in the layer definition criterion. For the remainder of the exercise we will exclude the zero-population block groups from our analysis.
(Universe: Persons 25 years and over) |
|
Column | Description |
---|---|
EDUTOTAL | Total Persons 25 years and over |
EDU1 | Less than 9th grade |
EDU2 | 9th to 12th grade, no diploma |
EDU3 | High school grad (includes equivalency) |
EDU4 | Some college, no degree |
EDU5 | Associate's degree |
EDU6 | Bachelor's degree |
EDU7 | Graduate or professional degree |
Suppose we wanted to generate a thematic map of the percentage of people (aged 25 and older) who have less than a high school education. The EDU variables in the Attribute table of Cambbgrp table provides the required information. They indicate the 1990 census counts of persons (aged 25 years and over) with various degrees of education as in Table 1.
From the listing, we see that the desired percentage equals
100 * (EDU1 + EDU2) / EDUTOTALThe tables we are using are read-only so we can't edit the tables in order to add a new column that computed this percentage. To overcome this problem, we'd like to create (and edit) a local copy of the table. But there are several dozen columns in the table and we are concerned only with the geographic identifiers and the EDU fields.
Open the layer properties window, go to the Fields Tab. Turn the 'visible' off for all but the three needed EDU fields and the Stcntrbg blockgroup identifier. To turn off a field, first select the field, then uncheck the visible box. When you click 'OK' the table will show only those four fields. Open the attribute table again to confirm this behavior.
Now you need to export the table. Click on the Option Button, which is on the right bottom of the attribute table window. On the pop-up menu, pick up Export. Save the table to your working directory. If you have not registered your working directory with ArcCatalog, you need to use the connect to folder button to register it first (Your working directory would be I:\Private\11.520). Save the table as a dBase-formatted table. Call the table education.dbf. The *.dbf suffix will be a handy reminder of the data format for the table and the dBase format is quite portable and easily read by Excel and many other packages.
When asked whether you want to add the table to the current map, click Yes. Now this newly saved-to-disk education.dbf table will be included in your map document file. However, you may not be able to see it in the data frame if the bottom tab is set to Display instead of Source. Change the view of the data frame from Display to Source; and the education.dbf table will be listed. If you clicked the wrong button when asked to add the table to the map file, it won't be listed, but you can now add it in the same way as you add any other map layer or attribute table.
Since you now have write access to your own education.dbf table, you will now be able to add a new column for computing your low-education percentages. Right click the table name in the data frame. On the pop-up menu, select Open Attribute Table. On the new window, click on options, and on the pop-up menu, select Add field. Call the new field p_lowed and change the data type to float. Click OK, and you will see a new field appear in the window. Next, right click on the new column name and on the pop-up menu, select Calculate Values. Click "yes" when a warning message comes up (about not being able to undo the editing). Then set
p_lowed = (this top line should appear by default above the box with the cursor and you should not retype it)
100 * ([EDU1] + [EDU2]) / [EDUTOTAL]
Sort the table (in descending order) by p_lowed. Right click on the column name and select Sort Descending. The highest p_lowed value is 55.5%. What is the second highest p_lowed percentage? How many block groups have more than 50% of their adults lacking a high school diploma? Write the values on your lab assignment.
Apply the join and go back to the ArcMap window. Open the attribute table of Cambbgrp. We now find the columns of the education tables appended to the right side of the attributes of Cambbgrp table. Now, we are ready to shade a thematic map of the newly computed p_lowed field. Open the layer definition window. Go to Symbology the. Classify your p_lowed field using 5 quantiles categories and eyeball the results. To save time, we won't ask you to print out this map. Just pick a readable graduated color scheme and take a little time to examine the thematic map: Do the patterns you see match your impression of the socio-economic patterns in Cambridge?
Please note that the table 'joins' are one-directional. If you join the tables in the reverse order, the Attributes of Cambbgrp table will disappear and you won't be able to map the selections from the expanded education table. Finally, note that these Joins are temporary and do not affect the tables that are stored on disk.
Add the sales89 coverage as a layer in your ArcMap window. Open the layer properties window and go to the definition query tab and create query definition to include in your layer only those sales with a realprice less than $150,000. You should find that 29 of the 222 sales meet this criteria. Now select areas of the city (block groups) where these sales occurred. Go to menu Seletion > Select by location. Specify the criterion as in Figure 3.
Click the Apply button. Every block group that contains one or more of the low-priced sales will be highlighted. You've just done a basic "point-in-polygon" query to find a set of polygon features (block groups) which contain a set of point features (the low-priced sales).
Analyzing the results:
These point-in-polygon queries are useful for quick exploration of the data but the summary statistics are only that -- a summary of the patterns that result. As we might suspect, the general trend suggests that low-priced housing tends to be in lower-income, lower-education neighborhoods. But there is quite a bit of variability and the tools we've used so far don't let us move the Attributes of Cambbgr data from the census table over to the appropriate sales89 rows. Doing that would permit us to examine the patterns more closely. In later labs, we'll look at other tools (in the ArcGIS Toolbox) that let us tag the sales89 table with the census data for the blockgroup that contains the sale.
Suppose we want to check whether the lower-priced housing tends to be closer to the major roads. Let's use ArcMap's buffering tools to create a buffer around Mass Ave and see whether the sales in the buffer are relatively high or low priced.
Add cambtigr coverage to your ArcMap project. Choose to add the 'arc' features from this coverage. The cambtigr coverage is stored in M:\data. This is the Cambridge street coverage that we used in an earlier lab. Let's add a second copy of this layer to our "Lab Exercise 3" data frame. (We're doing this so we can limit one copy of the road theme to a single road -- Mass. Ave.) Right click on the layer name camtigr and select 'copy'. Right click on the data frame name "Lab Exercise 3" and select 'paste layer'. Open the layer property window of one of the TIGER layers. Under the tab general, change the layer name to Mass Ave. Go to the Definition Query tab, and use the query builder to limit it to only those road segments with FNAME = 'Massachusetts'. (There is one street segment on the Mass Ave bridge that has FNAME = 'Masssachusetts Ave'! Don't bother including that link.) Beware of two issues when making this selection: (1) the field name shows up as 'fname' in the 'definition query' window, but is listed as 'fename' if you use the 'identify' cursor to click on the street links display in the map window. That's because column names are allowed to have aliases - take a look at the 'fields' tab in the layer properties window and you'll see both names! (2) Also, note that some Mass Ave street segments are not selected. That's because the FNAME for those segments are listed as 'State Hwy 2A' rather than 'Massachusetts'. This multiple-name issue could be a problem. In this case, the 55 selected Mass Ave street segments are sufficient to create a buffer that will enclose the others so we won't need to do extra work to identify those Mass Ave segments that list the route number instead of the street name. ArcGIS has some more elaborate database table schemas to handle such naming and route numbering issues but we won't get into that level of complexity in this exercise.
Next we will use the 'Buffer' tool in the ArcToolbox to create a buffer. We want to buffer the 55 features of our Mass_Ave theme by 0.5 miles. (Make sure you have already set the Data Frame Properties so that maps units are in meters and display units are in miles.)
Open the ArcToolbox window from the Window menu. Then choose the Buffer option from the 'Proximity' listing under 'Analysis Tools'. Choose Mass Ave for the input feature and enter a path and shapefile name for the output feature that will store it in your I: drive locker: for example. I:/[Your 11.520 working directory]/Buffer_of_Mass_Ave.shp. In the 'Distance' portion of the buffering window, set the linear unit to be 0.5 and be sure the units are set to miles. Accept the defaults for the 'side type' and 'end type' (regarding how the shape of the buffer is computed) and change the 'dissolve' setting to 'all' so that overlapping buffers around each street segment are merged. When you've adjusted all these settings, click "Finish" and wait for ArcMap to do the computations. You should end up with a curved sausage-shaped area something like the one in Fig. 6.
Since this newly created buffer is a shapefile, we can use it just like any other theme. In particular, we can do an exercise similar to "point-in-polygon" example we just did in order to determine which 1989 housing sales are located within the new buffer. Highlight the sales89 theme (with all the sales not just the low-priced ones) and choose Selection > Select by Location. Select all the sales89 cases that intersect the Buffer of Mass Ave layer. (We used "completely contain" before and "intersect" this time. Do you understand why?) You should find that 140 of the 222 sales are located within the half-mile buffer. What are the mean and standard deviation of sale prices for the sales within the buffer the buffer? Write the values on your lab assignment.
The sales prices in and out of the buffer aren't all that different (compared with their standard deviation). Also, some parts of the buffer falls outside Cambridge and we don't know about home sales prices in those areas. Before reaching any conclusions, we would want to do further analysis. But, this quick tour of spatial selection is enough for today. In subsequent labs, we'll examine lots more of the spatial analysis capabilities of ArcMap.
In this part, you will simply apply a suitable coordinate system to a Middlesex county city boundaries layer for a map you will create. Suppose you would like to create a map layout shown below. As you can see, there is a small map in the left bottom corner. It shows cities in Middlesex county, Massachusetts and red area is Cambridge.
To create this map, you have to create a new dataframe as you did in lab2.
Rename the new dataframe to legend map and copy Cambbgrp
layer and paste into the legend map dataframe. Add a new layer
middle_county to the legend map dataframe
from M:\data\lab3_county\.
(The map layer shows all the towns within Middlesex County in Massachusetts).
When you add the shapefile, you will get a warning message something like
this:
This message notifies you that the spatial reference information is missing from the county layer and may cause problems for further use. In some cases, you may not get an error message but it does not mean that it is O.K. Let's check it out. Open the middle_county layer's layer properties window and select the source tab. It shows that the coordinate systems of this layer is "Undefined". Let's fix this. Close the layer properties window and remove county layer from the legend map dataframe. You cannot change the coordinate system of a layer while it is open in ArcMap. Now launch the ArcCatalog (Start > All Programs > ArcGIS > ArcCatalog).
1) Copy and paste the middle_county layer from M:\data\lab3_county\ into yourI drive (I:\Private\11.520). When you copy and paste the layer, please use ArcCatalog. A GIS layer consists of several files, such as .shp, .shx, .dbf, .sbn, .sbx, and you may miss necessary files when you use windows explore to copy the layer. Click right mouse button while put the cursor on the name of the layer. Drop-down menu will show up. Click properties from the list. Then Shapefile Properties window will show up. (Note that, in this graphic, the shapefile is called 'county' rather than 'middle_county'.)
2) Choose the "XY Coordinate System" tab. It will bring up the spatial reference properties window
3) Select appropriate spatial reference system from the list. In this case NAD1983 Stateplane Massachusetts Mainland FIPS 2001.prj is the right one. Select the spatial reference system and click ADD.
4) Now the spatial reference for the county layer has been set. Click OK. Now go to the ArcMap and add the county layer to the legend map dataframe. This time no warning message will show up.
Knowing how to examine and reset the projection associated with data frames and layers is often handy when integrating data layers that come form different different agencies and sources.
Created by Thomas H. Grayson
and Joe Ferreira.
Modified Sept. 16 2003 by Jeeseong Chung.
Modified Sept. 25 2004 by Jinhua
Zhao.
Modified Sept. 25 2005 by Xiongjiu Liao.
Modified Sept. 15 2006 by Mi Diao.
Last Modified Sept. 29 2007 by Yang Chen.
Back to the 11.520
Home
Page.
Back to the CRL
Home
Page.
Please send comments about this page to the 11.520 class staff <11.520staff@mit.edu>.