Massachusetts Institute of Technology
Department of Urban Studies and Planning


11.520: A Workshop on Geographic Information Systems
11.188: Urban Planning and Social Science Laboratory

Lab Exercise 4: Database Aggregation and Chart Creation in ArcMap

Due October 4, 2010


In this lab exercise, we will examine the population density of Massachusetts towns and we will explore using ArcMap's data manipulation and charting tools. The motivating example introduces one of the complications that we listed when discussing GIS data models. When we obtain maps and data from different sources, there may not be a one-to-one match between the spatial objects in the map and the rows in the data table. For example. there are 351 towns in Massachusetts, but the MassGIS map of town boundaries has 631 rows! Why so many? Because some towns are split into two or more parts by rivers. Other towns include islands with boundaries that don't connect with the other parts of the town. But the data source for town-by-town population was the total population for each town. So, the usual way to create a population density map won't work well. If we try to draw a thematic map of population density by dividing the listed population for each polygon by the area of the polygon, the Boston harbor islands will show up as the densest part of the state - with an average of 170+ people per square meter of land area! Try it!

In order to develop an appropriate thematic map of population density, we must aggregate the area across all parts of each town before we compute density as town-population divided by town-area. We will use the 'Summarize Table' function to create a new table from the Attributes-of-Town table. We will also use the Chart functions to create business graphics of Town density. Producing the charts will involve another complication because of the multiple rows per town in the Attributes-of-Town table. Handling this density-map problem will give us a better understanding of ArcMap and, moreover, a better understanding of the one-to-many problems that often complicate good data analysis when we mix and match data from different sources.

As we did in previous labs, we suggest you attach Drive M: before you start the exercise for easier access to the class data locker. Here is a link to the 'in-lab' notes: lab4_inlab_notes.html

I. Fixing the One City-Multiple Polygon Problem

First let's generate a quick-n-dirty (but wrong) population density map. Add the MassGIS shapefile of Massachusetts city and town boundaries from M:\data\matown00.shp and produce a thematic map of the 1990 population (pop90) divided by 'area'. When zoomed in on Boston, the thematic map will look something like this:

Fig. 1. Population density map without considering one-to-many relationship

 

The results look strange and we notice that the density of some small areas appears to be extremely high. On looking more closely, we find that small islands in Boston Harbor have a small area but are associated with all of Boston's population, thereby creating the impression that these islands were extraordinarily densely populated. In fact, many of these islands have no population at all. The basic problem is that the population data counted people in each entire city (or town) whereas the map sometimes had multiple polygons for a single city. In the case of Boston, for example, East Boston and Charlestown are separated from the rest of Boston by Boston Harbor and the Charles River and each of the harbor islands is represented by a its own polygon - and a separate row in the Attributes of matown00 table. Hence, we have a 'one-to-many' database problem whereby each city or town may be associated with more than one spatial feature. We can find a way to resolve this problem using ArcMap's 'Summarize Table' function. In this part of the lab, you will solve this problem, with some modest variations in the basic procedure. In particular, we will strive to obtain a better estimate of population density by excluding small islands of 100,000 square meters (10 hectares) or less.

Start the exercise in the usual manner by identifying a place (such as C:\usertemp\[your-user-name]) to store local files, and copying into that location some of the shapefiles that we may need for the lab. You may also want to reset your 'scratch space' directory to point to a local directory (such as C:\usertemp\scratch). Use ArcCatalog to copy the MassGIS shapefile matown00.shp of Mass town boundaries from the class data locker M:\data\ into a truly local drive such as C:\usertemp\[your-user-name]. Open ArcMap and add this local copy into your ArcMap data frame. In order to make it easier to compare different density calculations, we can include the shapefile two times in the data frame and then symbolize each layer differently. To accomplish this double-listing, make a copy of the matown00 layer by copying and pasting it into the same dataframe. Now you have two matown00 layers in your dataframe.

Let's use one of the matown00 layers to focus on the larger polygons and the other to handle the small islands that we assume are uninhabited. To do this, rename one of the layers to Towns and modify its layer properties so that the layer's definition (in the 'definition query' tab of the layer properties window is defined by this query:.

( "Area"> 100000 )

Now modify the other copy of the matown00 layer so that it is defined by this query:

( "Area"<= 100000 )

Rename this theme to Small Islands. Zoom into Boston and the harbor and make sure that both the Small Islands layer and the Towns layer are visible. The map now looks no different than if we had only a single layer, but we can now thematicaly shade the polygons of the Towns layer without shading the small polygons in the Small Islands layer. Do you understand why we will want to do this?

Next we want to create a new table that aggregates the area by town for the Towns layer. The new table will contain the name of the town, the total area for that town (exclusive of the islands), and the town's population for 1980 and 1990. Open the attribute table for the Towns layer first. Since we want to do the aggregation by town, right click on the column name "Town_Name". On the pop-up menu, select Summarize. The "Summarize" window appears.

On the "summarize" window, select the Town field in the table as the field to summarize. (The "TOWN-NAME" field shown in the attribute table is an alias of the field "TOWN"). Make sure that none of the towns are selected; if any are, then your summary table will contain records for only those selected towns! Then, change the name of the output file from the default to town_density.dbf and adjust the path name to be sure it is a writeable local directory such as C:\usertemp\[your-user-name]. We must also specify what to do with the attribute fields (i.e., the columns) that we want to compute by aggregating data across the several rows that may apply to a single town. In our case, we want to select the Sum for the AREA field, Minimum for POP1980, and POP1990 fields, (Why don't we use sum for POP1980, and POP1990? Can we use Maximum or average for POP1980, and POP1990? )

When you click 'OK,' ArcMap will ask you whether you want to add the new table to the map document. Select Yes. town_density.dbf will be added to your document. We now have the raw data we want -- the town areas and the populations -- but we have not yet computed the densities. Let's do that now. Using the techniques you learned in Lab 3 for editing data tables, add these fields to your table:

AreaMi2 Town's area in square miles
PopDen80 Population density per mile in 1980
PopDen90 Population density per mile in 1990

To convert square meters (the units of area used in Towns layer) to square miles, divide by 2589988.

When you're finished with the calculations, join the town_density.dbf table to the Towns layer, as described in the section entitled "Merging Data Tables" in Lab 3. Now create a thematic map of the population density in 1990. Focus your map on eastern Massachusetts just beyond Interstate 95 ( known as Route 128), Boston's first ring road. (You can add the major roads shapefile majmhda1.shp that you used in lab #3 to locate the ring road. BUT, use a local copy of the file instead of the one in the class data locker or you will slow down ArcMap just as happened in Lab #2) Use the Small Islands theme to show the islands but keep them colored white. Export a 'layout' version of your map (with appropriate title, scale bar, north arrow, source, legend, and annotations) to a PDF file and save it in your Athena (network) locker space for subsequent uploading into Stellar.  

If you have extra time, you can consider making a similar map of the 1980 population density and comparing it with the 1990 density map and developing a 'layout' with three maps that shows 1980 and 1990 population density plus the population density change between 1980 and 1990. BUT, finish the rest of the lab exercise before you try on these extra, optional steps!

II. Creating Graphs in ArcMap

ArcMap's charting tools are somewhat meager compared with, say, a spreadsheet like Excel. For example, you can't easily plot a histogram of the population density that you just computed since it is cumbersome (using ArcMap's table manipulation tools) to group the data into intervals.  Nevertheless, the charting tool is still handy for exploring geographical patterns. To illustrate this, let's chart our newly computed population density and examine the relationship between population density and town size to see if bigger towns are more dense.

Go to Tools > Graphs > Create. This opens the graph wizard. On the first wizard window, select "Bar" as the graph type (either "Horizontal Bar" or "Vertical Bar"). And select "towns" as the layer. Select PopDen90 as the value field to graph. Check the 'show legend' option and label X-axis ("Y-axis" if you chose "Horizontal Bar") with town name. Click "Next," and make sure to check the option "show all features/records on graph". Type in the title Population density in 1990. Click finish, and a graph will appear.

Notice that this graph is not very readable because it contains too may towns all 351. Although we request showing legend and labels, they can't all show up. We can make it readable by selecting fewer towns. Highlight the map window, make sure the Towns layer is selected and use the point-and-drag selection tool Selection Tool to highlight several towns on the map. Notice that the chart window is *not* updated automatically. If you have closed the chart window already, go to Tools > Graphs > Population density in 1990 to open the chart. Right click on the window bar of the chart to open the chart property window. In the "Appearance" tab, check "show only selected features/records on the graph". Click OK. You'll notice that the chart window now shows a bar chart of the population density only for the selected towns.

On closer inspection, you'll notice that the chart is misleading. If you zoom in near Boston and select towns near Boston Harbor, this time, the chart will be updated automatically. Maximize the chart window so you can see more detail, and you'll notice that Boston shows up more than once in the bar chart. This is because the Attributes of Towns table has more than one row for Boston -- one for each harbor island and for other isolated parts of Boston such as East Boston and Charlestown. Each piece of Boston is estimated to have the same density (now that we have finished Part I to use the total area of each town in our density calculation) but we would like to have Boston (and the other towns) show up only once on the chart. We can accomplish this by using the table town_density.dbf that we created in Part I in order to compute a more meaningful population density. At the moment, this table is joined into the Attributes of Towns table (so we were able to create a thematic plot of population density in Part I). We still want that thematic map, but we also want to work with the grouped table to get a meaningful chart. To do this, we can 'relate' the grouped table town_density.dbf to Attributes of Towns. Make a new copy of the Towns layer and rename it Town_chart. Open its layer property window and go to the tab joins and relates. Since it is a copy of Towns layer, it keeps all joins that Towns layer has. Remove the existing join from Town_chart first and then add a 'relate' ' in the same manner that we we previously 'joined' the town_density.dbf to the layer. After setting up the 'relate' click Apply. [Do you understand why we need a non-joined copy of town_density.dbf in order to get the chart that we want with only one row per town?]

Using 'relate' instead of 'join' helps us get a meaningful chart without duplicate towns. However, it makes it harder to highlight towns on the map by selecting rows in the town_density.dbf table. If town_density.dbf were joined into Attributes of Town_chart, then selecting a row in Attributes of Town_chart will select the corresponding row in town_density.dbf. However, the behavior is different for related tables. In ArcMap (unlike an earlier version called ArcView), selecting rows in one table does not automatically select corresponding rows of a related table. To see the corresponding rows in a related table, you should click option in the bottom of the table and select related tables > [name of relation]. Then the corresponding rows in the town_density table will be shown.

Fig. 2. View related tables

Linking in this manner is useful here because we can now select towns on the map, which will in turn select rows in both tables. We can then use town_density.dbf to make a meaningful chart, since this table has only one row per town, unlike Attributes of Towns. Make sure the town_chart layer is active. Select several towns around Boston , including the city of Boston. Then, open the attribute table of town_chart layer. Select options. On the pop-up menu, go to related tables > town_density. The town_density table appears. Compare the number of the selected records in these two tables. [Do you understand the difference and the reasons why we have set up the map, table, and charts in this manner?]

Next, we can edit the properties of chart we created earlier so that the data source is town_density.dbf. Open the graph property window, under the Series tab, choose town_density as the table containing the data. Choose PopDen90 as the field to graph. Make sure the option "show only selected features/records on the graph" is checked. Click OK and go back to the chart window. You'll see the updated bar graph of population density -- with only one bar per town (Note, that the images below are only examples and the numbers in your images will be different depending upon which towns (or municipalities) are in your selected set.)

query_win

Fig. 3. Create a Chart in ArcMap

 

Notice that whenever you select a new set of the towns in towns_chart layer, you need to update the selection set of the related table, town_density.dbf, by opening the town_density.dbf from the attribute table of towns_chart layer. (Go to Options > Related tables > Towns_density.) Update the graph in the graph property window as well. Now test this by selecting a few towns from the map in the view window.

Create a layout that includes a map indicating the dozen or so towns you've highlighted on the map, a table listing those towns, and the chart you created. You will need to add chart and table frames to a basic map layout. Be sure to include Boston so we know that you've properly aggregated the parts of the town. Create a PDF file from the layout and save it in your athena locker (for subsequent uploading to Stellar).

III. Connecting ArcMap and MS-Access

We have now finished the lab exercises that generate meaningful population density maps and charts despite the one-to-many issues involved in linking population data to the detailed Massachusetts town map. However, before concluding the exercise, let's get some practice connecting ArcMap to MS-Access so that we are ready (next week) to make more use of MS-Access data management capabilities. We will export into an empty, new ms-access database the attribute table from the cambridge blockgroup shapefile that we used in earlier labs.

Start by opening up an empty MS-Access database. Use Start/Programs/Microsoft-Office to open MS-Access. When the application starts you are required (in the 'getting started' panel on the right) to create a new file or select an existing ms-access database (i.e., a file on disk with the an 'MDB' or 'ACCDB' suffix after the period at the end of the filename - if you have set up Windows to show the filename suffixes. Which suffix you see depends on whether you are using recent or older versions of MS-Access) Select the 'blank database' icon under 'new blank database' (or, in older version of MS-Access, select 'create-a-new-file' and then 'blank database'). In the new 'blank database' panel that opens on the right (or the pop-up 'file new database' window on the old versions), navigate to your personal network locker (or your temporary space in the local C:\usertemp\...) and, in a sub-directory where you store files for this class, name the database 'lab4_newdb.mdb'. This name will be the file on disk in which will be stored all the tables, queries, etc. that are associated with your new database. A window named 'lab4_newdb.mdb' opens up inside the ms-access window.

Now go to the ArcMap application that you used for the previous parts of this lab (or a new ArcMap application - you can run ArcMap more than once if your machine is fast and has lots of memory). Add the cambbgrp.shp shapefile that we have used in previous labs. Right-click this layer and choose 'open attribute table.' Now left-click the 'options' button and choose 'export'. In the 'export data' popup window, click the folder icon next to the 'output table' area and, in the 'saving data' dialog box, set the 'save as type' to be 'File and personal geodatabase tables' and then navigate to the directory in which you create your lab4_newdb.mdb database. However, do not yet click that filename - if you do, you will get an error message that the file is already in use. Switch to your ms-access application and close it (File/exit). The ms-access database is empty but still needs to be closed before ArcMap can interact with it. Now go back to the 'saving data' dialog box (in the ArcMap application), doubleclick on lab4_newdb.mdb and choose a name to be used in ms-access for the attribute table that you wish to export into ms-access. Call it cambbgrp_attributes. Double-click on this table to open it up in ms-access. Do you see any difference from the same table when opened up in ArcMap? (Hint, notice the first few columns.) We do not ask you to do anything further with ms-access this week, but we will do more with it when we analyze census data next week to map the percentage of workers in the Boston metro area who drive alone to work each day. If you have time this week, practice generating a few queries in ms-access. For example, develop a query to calculate the percentage of residents aged 25 or older in each block group who have less than a high-school education (i.e., the same query that you developed in ArcMap for earlier exercises). You might also export the attribute table from matown00.shp and compute the population density as we demonstrated in lecture. .

If you have extra time, you might also wish to experiment with other chart types. In particular, experiment with plotting population density vs. town size (area) and population change (the 1990/1980 population ratio). You may also want to experiment with the use of MS-Access and MS-Excel to calculate and plot Massachusetts densities. For example, you might import a copy of the matown00.dbf part of the shapefile into MS-Access (or save the attribute table directly into a new MS-Access database) and then write a query that groups the rows by town and computes population density (in the same manner that was illustrated in the previous lecture). For your convenience, we have saved the tables and queries that compute correct Mass town densities in an MS-Access database called lec5_mass.mdb. This database is available here: http://mit.edu/11.520/www/lectures/lec5_mass.mdb. However, don't start off by looking at the saved queries in lec5_mass.mdb database that 'group by' town to sum the area, compute the density, and join the summarized table back to the attribute table. Try to create the required queries on your own by using the graphical query design tools and look at the answer in the saved queries only when you need some tips.

Finally, you may want to experiment with adding MS-Access tables into ArcMap. In an ArcMap session with the matown00 shapefile added to a data frame, click the 'Add' button, navigate to the local directory where you have stored lec5_mass.mdb and click on that filename and then the ma_density table that includes the correctly computed town densities. Join this table to matown00 and confirm that you can generate a thematic map of density just like you did in Part I above.

Extra note
Another feature in ArcGIS (and ArcMap) that is different from the earlier ArcVIEW is that the links are bidirectional rather than unidirectional, which means that you can choose rows in town_density table and see the corresponding rows and polygons of Towns layer (instead of working only in the other direction). Let's test it. First, open the town_density table and select a row in Boston. Click the option button in the bottom of the table, then related tables > [relate name]. Three tables will pop up: the Attribute table of Towns, the Attribute table of Small_islands, and the Attribute table of Town_chart. That's because those three layers are basically one coverage with different names.

IV. Assignment

Please submit your two PDF files on Stellar: the population density map you made in Part I and the layout with the map, table, and chart you created in Part II. There is no need to hand in a hard copy just submit your files to appropriate Stellar site (for 11.520 or 11.188). The assignment is due on Monday, Oct 4th, 2010.

Created September 2001 and modified 2001-2008 by Joseph Ferreira, Jr. and Thomas H. Grayson, Jeeseong Chung. Jinhua Zhao, Mi Diao. Yang Chen, and Yi Zhu.
Last modified 25 September 2010 by Joe Ferreira

Back to the 11.520 Home Page.
Back to the CRON Home Page.