Massachusetts Institute of Technology
Department of Urban Studies and Planning


11.188: Urban Planning and Social Science Laboratory
11.205: Intro to Spatial Analysis (1st half-semester)
11.520: Workshop on GIS (2nd half-semester)

Lab Exercise 6: Raster Spatial Analysis


Start: Monday, March 30, 2020, 2:35 pm (EDT) -- No Due Date (Optional due to Covid-19 interruption)


Administrative


Settling into Online Learning - planning 2nd half of semester



Overview

We covered the basics of Raster Analysis methods in the abbreviated lecture on the Wednesday before MIT shut down: Raster Lecture

The purpose of this lab exercise is to practice some of these spatial analysis methods using raster models of geospatial phenomena. Thus far, we have represented spatial phenomena as discrete features modeled in GIS as points, lines, or polygons--i.e., so-called 'vector' models of geospatial features. Sometimes it is useful to think of spatial phenomena as 'fields' such as temperature, wind velocity, or elevation. The spatial variation of these 'fields' can be modeled in various ways including contour lines and raster grid cells. In this lab exercise, we will focus on raster models and examine ArcGIS's 'Spatial Analyst' extension.

We will use raster models to create a housing value 'surface' for Cambridge. A housing value 'surface' for Cambridge will show the high- and low-value neighborhoods much like an elevation map shows height. To create the 'surface' we will explore ArcGIS's tools for converting vector data sets into raster data sets--in particular, we will 'rasterize' the 1989 housing sales data for Cambridge and the 1990 Census data for Cambridge block groups.

The block group census data and the sales data contain relevant information about housing values, but the block group data may be too coarse and the sales data may be too sparse. One way to generate a smoother housing value surface is to interpolate the housing value at any particular location based on some combination of values observed for proximate housing sales or block groups. To experiment with such methods, we will use a so-called 'raster' data model and some of the ArcGIS Spatial Analyst's capabilities.

The computations needed to do such interpolations involve lots of proximity-dependent calculations that are much easier using a so-called 'raster' data model instead of the vector model that we have been using. Thus far, we have represented spatial features--such as Cambridge block group polygons--by the sequence of boundary points that need to be connected to enclose the border of each spatial object--for example, the contiguous collection of city blocks that make up each Census block group. A raster model would overlay a grid (of fixed cell size) over all of Cambridge and then assign a numeric value (such as the block group median housing value) to each grid cell depending upon, say, which block group contained the center of the grid cell. Depending upon the grid cell size that is chosen, such a raster model can be convenient but coarse-grained with jagged boundaries, or fine-grained but overwhelming in the number of cells that must be encoded.

In this exercise, we only have time for a few of the many types of spatial analyses that are possible using raster data sets. Remember that our immediate goal is to use the cambbgrp and sales89 data to generate a housing-value 'surface' for the city of Cambridge. We'll do this by 'rasterizing' the block group and sales data and then taking advantage of the regular grid structure in the raster model so that we can easily do the computations that let us smooth out and interpolate the housing values.

The in-lab discussion notes are here: Lab #6 notes

Before starting the exercise, review the introduction to raster analysis in ArcGIS Help.  Open ArcGIS Desktop Help from within ArcMap or as a separate application in the ArcGIS folder.  Search for 'What is spatial analyst' to find:

ArcGIS spatial analysis Help


I. Setting Up Your Work Environment

Launch ArcGIS and add the five data layers listed below (after copying them to a local drive using the method described in earlier lab exercises):

  • Q:\data\cambbgrp.shp
Census 1990 block group polygons for Cambridge
  • Q:\data\cambbgrp_point.shp
Census 1990 block group centroids for Cambridge
  • Q:\data\cambtigr.shp
U.S. Census 1990 TIGER file for Cambridge
  • Q:\data\camborder polygon.shp
Cambridge polygon
  • Q:\data\sales89.shp
Cambridge Housing Sales Data

Set the display units to meters. In this exercise you will work in meters rather than miles.

Since your access to the class data locker may be limited, we have bundled all these shapefiles into a zipped file that also contains a startup ArcMap document saved in two formats (for ArcMap versions 10.6 and 10.4). This zipfile is called lab6_raster.zip in the class Dropbox. The ArcMap document will open in any version of ArcMap that is 10.4 or later (including version 10.7.1, which we used earlier in the semester, and 10.6.1, which is installed in some of the VMware virtual machines that you may be using).

II. Spatial Analyst Setup

ArcGIS's raster manipulation tools are bundled in the Spatial Analyst extension. It's a big bundle so let's open ArcGIS's help system first to find out more about the tools. If you haven't already, open the ArcGIS help page by clicking Help > ArcGIS Desktop help from the menu bar. Click the search tab and search for "What is spatial analyst". During the exercise, you'll find these online help pages helpful in clarifying the choices and reasoning behind a number of the steps that we will explore. 

The Spatial Analyst module is an ArcGIS extension that must be activated before we can use it. In some cases, you might need to install it before activating it; fortunately, it is already installed on the lab computers.

To activate the Spatial Analyst extension:

     
Fig. 1. Add Extension
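If you prefer to script your work, the same activation can be done in Python. Here is a minimal arcpy sketch (not required for this exercise) that checks out a Spatial Analyst license before any raster tools are used; the later sketches in this exercise assume this checkout has already happened:

    import arcpy

    # Request a Spatial Analyst license; raster tools will fail without one
    if arcpy.CheckExtension("Spatial") == "Available":
        arcpy.CheckOutExtension("Spatial")
    else:
        raise RuntimeError("Spatial Analyst license is not available")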

Setting Analysis Properties:

Before building and using raster data sets, it is helpful to be specific about the grid cell size, coordinate system, extent, etc. ArcGIS will generally select usable defaults but, if we do not pay attention to the choices, we will discover later on that the grid cells for two different rasters have different sizes, do not line up, use different coordinate systems, or have other problems. Let's begin by specifying a grid cell size of 100 meters and an analysis extent covering all of Cambridge.
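For those following along in Python, here is a hedged arcpy sketch of the same environment settings; the C:\TEMP\lab6 folder is an assumption for illustration:

    import arcpy

    arcpy.env.workspace = r"C:\TEMP\lab6"   # hypothetical local working folder

    # All subsequent raster output will use 100-meter cells and an analysis
    # extent matching the Cambridge boundary layer
    arcpy.env.cellSize = 100
    arcpy.env.extent = arcpy.Describe(r"C:\TEMP\lab6\camborder polygon.shp").extent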

Now that we've set the analysis properties, we are ready to create the new raster layer by cutting up Cambridge into 100-meter raster grid cells. We can use the Cambridge boundary shapefile for this purpose. Convert camborder polygon to a grid layer using these steps and parameter settings:
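In scripted form, the vector-to-raster conversion looks roughly like the following arcpy sketch (the output name matches the CAMBORDERGD layer discussed below; paths are assumptions):

    import arcpy

    border = r"C:\TEMP\lab6\camborder polygon.shp"  # hypothetical local copy
    arcpy.env.cellSize = 100
    arcpy.env.extent = arcpy.Describe(border).extent

    # Rasterize on the COUNTY field: each cell whose center falls inside the
    # polygon gets the value 25017 (the county FIPS code)
    arcpy.conversion.PolygonToRaster(border, "COUNTY",
                                     r"C:\TEMP\lab6\cambordergd",
                                     "CELL_CENTER")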

If successful, the CAMBORDERGD layer will be added to the data frame window. Turn it on and notice that the shading covers all the grid cells whose center point falls inside the spatial extent of the camborder layer. The cell value associated with the grid cells (in the COUNTY field) is 25017--the FIPS code for the county. Open the attribute table for the CAMBORDERGD layer. Since we did not join any other feature attributes to the grid, this is the only useful column in the attribute table for CAMBORDERGD. Note that there is only one row in the attribute table -- attribute tables for raster layers contain one row for each unique grid cell value. Since all our grid cells have the same COUNTY value, there is only one row in this case.

At this point, we don't need the old camborder polygon coverage any longer. We used it to set the spatial extent for our grid work, but that setting is retained as part of the new raster layer. To reduce clutter, you can remove the camborder polygon layer from your Data Frame.

III. Interpolating Housing Values Using SALES89

This part of the lab will demonstrate some raster techniques for interpolating values for grid cells (even if the grid cell does not contain any sales in 1989). This is the first of two methods we will explore to estimate housing values in Cambridge. Keep in mind that there is no perfect way to determine the value of real estate in different parts of the city.

A city assessor's database of all properties in the city would generally be considered a good estimate of housing values because the data set is complete and maintained by an agency which has strong motivation to keep it accurate. This database does have drawbacks, though. It is updated at most every three years, people lobby for the lowest assessment possible for their property, and its values often lag behind market values by several years.

Recent sales are another way to get at the question. On the one hand, recent sale prices are believable because each price should reflect an informed negotiation between a buyer and a seller that reveals the 'market value' of the property (if you are a believer in the economic market-clearing model). However, the accuracy of such data sets is susceptible to short-lived boom or bust trends; not all sales are 'arms-length' sales that reflect market value; and, since individual houses (and lots) might be bigger or smaller than those typical of their neighborhood, individual sale prices may or may not be representative of housing prices in their neighborhood.

Finally, the census presents us with yet another estimate of housing value--the median housing values aggregated to the block group level. This data set is also vulnerable to criticism from many angles. The numbers are self-reported and only a sample of the population is asked to report. The benefit of census data is that they are widely available and they cover the entire country.

We will use sales89 and cambbgrp to explore some of these ideas. Let's begin with sales89. The sale price is a good indication of housing value at the time and place of the sale. The realprice field has already adjusted the salesprice for inflation to account for the timing of the sale. How can we use the sales89 data to estimate housing values for locations that did not have a sale? One way is to estimate the housing value at any particular location as some type of average of nearby sales. Try the following:
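As a rough scripted equivalent of the dialog steps, an arcpy sketch of the same inverse-distance-weighted interpolation might look like this (the REALPRICE field name and all paths are assumptions):

    import arcpy
    from arcpy.sa import Idw, RadiusVariable

    arcpy.CheckOutExtension("Spatial")
    arcpy.env.cellSize = 100

    # IDW surface from the 12 nearest sales with the default power of 2
    surface = Idw(r"C:\TEMP\lab6\sales89.shp", "REALPRICE",
                  power=2, search_radius=RadiusVariable(12))
    surface.save(r"C:\TEMP\lab6\sales89_pw2")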

Fig. 4. Interpolation without mask

The interpolated surface is shown thematically by shading each cell dark or light depending upon whether that cell is estimated to have a lower housing value (lighter shades) or higher housing value (darker shades). Based on the parameters we set, the cell value is an inverse-distance weighted average of the 12 closest sales. Since the power factor was set to the default (2), the weights are proportional to the square of the inverse-distance. This interpolation heuristic seems reasonable, but the resulting map looks more like modern art than Cambridge. The surface extends far beyond the Cambridge borders (all the way to the rectangular bounding box that covers Cambridge). We can prevent the interpolation from computing values outside of the Cambridge boundary by 'masking' off those cells that fall outside of Cambridge. Do this by adding a mask to the Analysis Properties as follows:
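Equivalently, in an arcpy sketch, the mask is just another environment setting (using the cambordergd grid created earlier; paths remain assumptions):

    import arcpy
    from arcpy.sa import Idw, RadiusVariable

    arcpy.CheckOutExtension("Spatial")

    # Cells outside the mask raster get NoData in all subsequent output
    arcpy.env.mask = r"C:\TEMP\lab6\cambordergd"

    masked = Idw(r"C:\TEMP\lab6\sales89.shp", "REALPRICE",
                 power=2, search_radius=RadiusVariable(12))
    masked.save(r"C:\TEMP\lab6\sales89_pw2_2")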


With this analysis mask set, interpolate the Realprice values in sales89 once again and save it as sales89_pw2_2. This sales89_pw2_2 layer should look like this:
(When using Quantile, 9 classes)

Fig. 5. Interpolation with mask

Note that all the values inside Cambridge are the same as before, but the cells outside Cambridge are 'masked off'. [Note: When you 'interpolate to raster' using inverse distance weighting (IDW) in ArcMap, the thematic map display does a further smoothing of the computed values for each cell, so the map in Fig. 4 looks more like a contour plot than a display of the discrete values for each grid cell -- zoom in to see what I mean. But this is for display purposes only; the grid cell values saved in the attribute table are a single value for each grid cell as calculated via the IDW averaging.]

To get some idea of how the interpolation method affects the result, redo the interpolation (using the same mask) with the power set to 1 instead of 2. Label this surface 'sales89_pw1'. Use the identify tool to explore the differences between the values for the point data set, sales89, and the two raster grids that you interpolated. You will notice that the grid cells have slightly different values than the realprice in sales89, even if only one sale falls within a grid cell. This is because the interpolation process treats the city as a continuous value surface, with the sale points being sample data that give insight into local housing values. The estimate assigned to any particular grid cell is a weighted average (with distant sales counting less) of the 12 closest sales (including any within the grid cell). In principle, this might be a better estimate of typical values for that cell than an estimate based only on the few sales that happened to occur within the cell. [Note: You need to use the identify tool since the grid cell values are floating-point numbers, not integers, and ArcMap will not display the attribute table for raster grids with floating-point values.]

Examining attribute values for raster grid cells is further complicated because the displayed map is based on additional smoothing of the values (via cubic convolution, bilinear interpolation, etc.; you can see the choices on the 'display' tab of the layer properties window). You can force ArcMap to shade individual cell values by choosing 'unique values' for the 'show' option on the symbology tab of the layer properties window. Explore the various options on the symbology and display tabs to get a feel for how you can examine and display raster grid cell values compared with the now familiar way of handling vector layers.

[Note: On some CRON machines, the video driver will not shade unique values properly -- every cell will be black even though the legend looks okay. In this case, resort to the 'classified' option instead of 'unique' for symbology, set the number of classes to, say, 20, and apply these choices. Then you can use the 'identify' button to get cell values by clicking in the high-priced zones.]

On your lab assignment sheet, write down the original and interpolated values for the grid cell in the upper left (Northwest part of Cambridge) that contains the most expensive Realprice value in the original sales89 data set. Do you understand why the interpolated value using the power=1 model is considerably lower than the interpolated value using the power=2 model? There was only one sale in this cell and it is the most expensive 1989 sale in Cambridge. Averaging it with its 11 closest neighbors (all costing less) will yield a smaller number. Weighting cases by the square of the inverse-distance-from-cell (power=2) gives less weight to the neighbors and more to the expensive local sale compared with the case where the inverse distance weights are not squared (power=1).
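A small, purely hypothetical numeric example may help. Suppose the cell contains one expensive sale very close to the cell center and its nearest neighbors are two cheaper sales farther away. The plain-Python snippet below computes the IDW estimate for p=1 and p=2 (all prices and distances are made up for illustration):

    # (price, distance-to-cell in meters); hypothetical values for illustration
    sales = [(450000, 10), (150000, 200), (160000, 300)]

    def idw_estimate(samples, p):
        """Inverse-distance-weighted average with power p."""
        weights = [1.0 / d ** p for _, d in samples]
        total = sum(w * price for (price, _), w in zip(samples, weights))
        return total / sum(weights)

    print(idw_estimate(sales, p=1))  # ~427,000: distant cheap sales pull it down
    print(idw_estimate(sales, p=2))  # ~449,000: the nearby expensive sale dominates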

Finally, create a third interpolated surface, this time with the interpolation based on all sales within 1000 meters and power=2 (rather than the 12 closest neighbors). To do this, you have to set the Search radius type: Fixed and Distance: 1000 in the Inverse Distance Weighted dialog box. Call this layer 'sales89_1000m' and use the identify tool to find the interpolated value for the upper-left cell with the highest-priced sale. (Confirm (!) that the display units are set to meters in View > Data Frame Properties > General before interpolating the surface. The distance units of the view determine what units are used for the distance that you enter in the dialog box.) What is this interpolated value for the cell containing the most expensive sale and why is this estimate even higher than the power=2 estimate?
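In scripted form, the only change from the earlier sketch is the search radius (again, paths and the field name are assumptions):

    import arcpy
    from arcpy.sa import Idw, RadiusFixed

    arcpy.CheckOutExtension("Spatial")

    # Average all sales within 1000 meters rather than the 12 nearest
    fixed = Idw(r"C:\TEMP\lab6\sales89.shp", "REALPRICE",
                power=2, search_radius=RadiusFixed(1000))
    fixed.save(r"C:\TEMP\lab6\sales89_1000m")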

As indicated above, the weights are proportional to the inverse of the distance (between the data point and the prediction location) raised to the power p -- that is, w = 1/d^p. If p = 0, there is no decrease with distance, and the prediction will be the mean of all the data values in the search neighborhood. If p is very high, only the immediately surrounding points will influence the prediction (from the ArcGIS Desktop Help).

Note: None of these interpolation methods is 'correct'.  Each is plausible based on a heuristic algorithm that estimates the housing value at any particular point to be one or another function of 'nearby' sales prices. The general method of interpolating unobserved values based on location is called 'kriging' (named after South African statistician and Mining Engineer Danie G. Krige) and the field of spatial statistics studies how best to do the interpolation (depending upon explicit underlying models of spatial variation). Recent versions of ArcGIS offer an optional 'GeoStatistical Analyst' extension that includes several tools for kriging and exploratory spatial data analysis. (Check out the ArcGIS help files which not only explain the geostatistical tools in ArcGIS but also provide references.)

IV. Interpolating Housing Values Using CAMBBGRP

Another strategy for interpolating a housing value surface would be to use the median housing value field, MED_HVALUE, from the census data available in cambbgrp. There are several ways in which we could use the block group data to interpolate a housing value surface. One approach would be exactly analogous to the sales89 method. We could assume that the block group median was an appropriate value for some point in the 'center' of each block group. Then we could interpolate the surface as we did above if we assume that there was one house sale, priced at the median for the block group, at each block group's center point. A second approach would be to treat each block group median as an average value that was appropriate across the entire block group. We could then rasterize the block groups into grid cells and smooth the cell estimates by adjusting them up or down based on the average housing value of neighboring cells.

Let's try the first approach. This approach requires blockgroup centroids, but we have already shown how to create them in earlier lectures and labs. The Cambridge block group centroids have been saved (along with a few of the columns from the cambbgrp shapefile) in the shapefile cambbgrp_point. Make sure that layer has been added to your Data Frame and then do the following:
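As a rough arcpy sketch, the centroid-based interpolation mirrors the sales89 workflow, substituting the block-group centroids and the MED_HVALUE field (paths are assumptions; the output name matches the hvalue_points layer discussed below):

    import arcpy
    from arcpy.sa import Idw, RadiusVariable

    arcpy.CheckOutExtension("Spatial")
    arcpy.env.mask = r"C:\TEMP\lab6\cambordergd"

    # Treat each block-group median as if it were a single 'sale' at the centroid
    hvalue_points = Idw(r"C:\TEMP\lab6\cambbgrp_point.shp", "MED_HVALUE",
                        power=2, search_radius=RadiusVariable(12))
    hvalue_points.save(r"C:\TEMP\lab6\hvalue_points")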

Fig. 6. Interpolation with centroids of census block group polygons

Next, let's use the second approach (that is, using the census blockgroup polygon data) to interpolate the housing value surface from the census block group data.
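A minimal arcpy sketch of this rasterization step, under the same path assumptions (the output name hvalue_grid is hypothetical):

    import arcpy

    arcpy.env.cellSize = 100

    # Each 100-meter cell takes the MED_HVALUE of the block group
    # containing its center point
    arcpy.conversion.PolygonToRaster(r"C:\TEMP\lab6\cambbgrp.shp", "MED_HVALUE",
                                     r"C:\TEMP\lab6\hvalue_grid",
                                     "CELL_CENTER")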

As you can see from the images below, except for the jagged edges, the newly created grid layer looks just like a vector-based thematic map of median housing value (both maps use Quantile classification with 9 classes). Do you understand why this is the case?

Fig. 7. Vector-based thematic map vs. Raster-based thematic map

Examine its attribute table. Among the original 94 block groups, there were 63 different housing values (including 0). The attribute table for numeric raster grids does *not* have a row for every grid cell. Rather, it has a row for each unique grid cell value. In this case, there are 63 rows -- one for each unique value of MED_HVALUE in the original cambbgrp coverage. The attribute table for grid layers contains one row for each unique value (as long as the cell values are integers and not floating-point numbers!) and a count column indicates how many cells had that value. Grid layers such as hvalue_points have floating-point values for their cells and, hence, no attribute table is available. (You could reclassify the cells into integer value ranges if you wished to generate a histogram or chart the data.)

Finally, let's smooth this new grid layer using the Spatial Analyst Tools > Neighborhood > Focal Statistics option. Let's recalculate each cell value to be the average of all the neighboring cells -- in this case we'll use the 9 cells (a 3x3 matrix) in and around each cell. To do this, choose the following settings (they are the defaults):
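The equivalent arcpy sketch uses the FocalStatistics tool with a 3x3 rectangular neighborhood (the input name hvalue_grid is the hypothetical grid from the previous step):

    import arcpy
    from arcpy.sa import FocalStatistics, NbrRectangle

    arcpy.CheckOutExtension("Spatial")

    # Replace each cell with the mean of the 3x3 block of cells around it,
    # ignoring NoData cells in the average
    smoothed = FocalStatistics(r"C:\TEMP\lab6\hvalue_grid",
                               NbrRectangle(3, 3, "CELL"), "MEAN", "DATA")
    smoothed.save(r"C:\TEMP\lab6\hvalue_poly")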

Click 'OK' and the hvalue_poly layer will be added to your data frame. Change the classify method to "Quantile". You should get something like, although not exactly the same as, this:
(When using Quantile, 8 classes)

Fig. 8. Smoothing by neighborhood statistics function

One might expect that selecting rows in the attribute table (for hvalue_poly) would highlight the corresponding cells on the map. However, the attribute table is not accessible, since the grid cell values are floating-point numbers and ArcMap makes the attribute table available for grid layers only if the values are integers. [You could use the int() function (in ArcToolbox) to create a new grid layer whose values are obtained by truncating the hvalue_poly values to integers.] Instead, use the 'identify' tool to click in the high-value parts of Cambridge in order to identify the highest-valued grid cells. Find the cell containing the location of the highest-priced sales89 home in the northwest part of Cambridge. What is the interpolated value of that cell using the two methods based on MED_HVALUE?

Many other variations on these interpolations are possible. For example, we know that MED_HVALUE is zero for several block groups--presumably those around Harvard Square and MIT where campus, commercial, and industrial activities result in no owner-occupied housing units reporting a housing value in the census data. Perhaps we should exclude these cells from our interpolations -- not only to keep the 'zero' value cells from being displayed, but also to keep them from being included in the neighborhood statistics averages. Copy and paste the cambbgrp layer into the same Data Frame and use the query tools in the Layer Properties > Definition Query tab to exclude all block groups with MED_HVALUE = 0 (that is, include all block groups with MED_HVALUE > 0 -- so another way to do this would be to select by attributes MED_HVALUE > 0). Now recompute the polygon-based interpolation (Hint: the major difference between the first and second approaches to interpolating housing values is that the former is centroid-based and the latter is polygon-based) and call this grid layer 'hvalue_non0'. Select the same color scheme as before. In the data window, turn off all layers except the original camborder layer (displayed in a non-grayscale color like blue) and the new hvalue_non0 layer that you just computed. The resulting view window should look something like the following (when using Quantile, 9 classes).
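One hedged way to script this whole exclude-and-recompute sequence in arcpy (the where clause plays the role of the definition query; intermediate names are hypothetical):

    import arcpy
    from arcpy.sa import FocalStatistics, NbrRectangle

    arcpy.CheckOutExtension("Spatial")
    arcpy.env.cellSize = 100
    arcpy.env.mask = r"C:\TEMP\lab6\cambordergd"

    # A layer view of cambbgrp that excludes zero-valued block groups
    arcpy.management.MakeFeatureLayer(r"C:\TEMP\lab6\cambbgrp.shp",
                                      "cambbgrp_non0", '"MED_HVALUE" > 0')

    # Rasterize, then smooth with the same 3x3 mean as before
    arcpy.conversion.PolygonToRaster("cambbgrp_non0", "MED_HVALUE",
                                     r"C:\TEMP\lab6\hv_non0_raw", "CELL_CENTER")
    non0 = FocalStatistics(r"C:\TEMP\lab6\hv_non0_raw",
                           NbrRectangle(3, 3, "CELL"), "MEAN", "DATA")
    non0.save(r"C:\TEMP\lab6\hvalue_non0")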

Fig. 9. hvalue_non0

Notice the no-data cambordergd cells sticking out from under the new surface and notice that the interpolated values don't fall off close to the no-data cells as rapidly as they did before (e.g., near Harvard Square). You'll also notice that the low-value categories begin above $100,000 rather than at 0 the way they did before. This surface is about as good an interpolation as we are going to get using the block group data.

Comment briefly on some of the characteristics of this interpolated surface of MED_HVALUE compared with the ones derived from the sales89 data.  Are the hot-spots more concentrated or diffuse? Does one or another approach lead to a broader range of spatial variability?

V. Combining Grid Layers Using the Map Calculator

Finally, let us consider combining the interpolated housing value surfaces computed using the sales89 and MED_HVALUE methods. ArcGIS provides a 'Raster calculator' option that allows you to create a new grid layer based on a user-specified combination of the values of two or more grid cell layers. Let's compute the simple arithmetic average of the sales89_pw2_2 grid layer and the hvalue_non0 layer. Select Spatial Analyst > Map Algebra > Raster Calculator and enter this formula:

("hvalue_non0" + "sales89_pw2_2") / 2 and click Evaluate.

The result is a new grid which is the average of the two estimates and looks something like this:

(When using Quantile, 9 classes)
Fig. 10. Raster Calculation

The map calculator is a powerful and flexible tool. For example, if you felt the sales data was more important than the census data, you could assign it a higher weight with a formula such as:

("hvalue_non0" * 0.7 + "sales89_pw2-2" * 1.3) / 2

The possibilities are endless--and many of them won't be too meaningful! Think about the reasons why one or another interpolation method might be misleading, inaccurate, or particularly appropriate. For example, you might want to compare the mean and standard deviation of the interpolated cell values for each method and make some normalization adjustments before combining the two estimates using a simple average. For the lab assignment, turn in a properly annotated PDF of your ArcMap layout for this map. In addition, enter on the lab assignment answer sheet your final interpolated value (using the first map-calculator formula) for the cell containing the 20 Coolidge Ave sale on March 15, 1989 (in the southwest part of Cambridge). This house was the 5th most expensive 'realprice' in the dataset.

VI. Combining Grid Layers Using Weighted Sum in ModelBuilder

Next, we can 'automate' some of these raster analyses by creating a simple ModelBuilder model of one or another of the above steps. We can do an identical analysis, or we can refine things somewhat using additional capabilities of the "Weighted Sum" tool.

Setup

To create a new ModelBuilder model, first make sure that you know where ArcGIS is going to save your model. New toolboxes are saved in the Current workspace; the default location on MIT's network is H:\WinData\Application Data\ESRI\ArcToolbox. This should allow you to access the same model from different workstations, but if you want to share (or back up!) a model, it is good to know where it is located. In general, if you are creating models specific to a particular project, consider putting them in a directory near the project data. Therefore, as indicated in the previous section, we have changed the Current workspace to 'C:\TEMP\[your working folder]'. You can't directly "open" or "save" ModelBuilder models; however, you can move them around in ArcCatalog.

To create a new ModelBuilder model, first create a "Toolbox" to contain it. In the Catalog window, right-click "My Toolboxes", then click New > Toolbox and name the new toolbox '11.188 Lab 7'.

Open the main ArcGIS toolbox (red tools icon). Then right-click ArcToolbox to get the contextual menu shown, and select Add Toolbox to add the new toolbox '11.188 Lab 7'. You can also simply drag and drop your toolbox.

New Toolbox

Once you have a container located in a writable folder, it is convenient to create one or more toolsets to organize the tools within the toolbox; you can then create a new Model within it.

New Model

This will bring up an empty Model diagram window:

Empty Model

 

You should then be able to drag and drop your two grids (called sales89_pw2_2 and havl_non0 in the figure below) from the map document's Table of Contents into the Model diagram window. You can also select layers to add from disk using the standard ArcGIS "yellow plus" icon.

Once you have added your input data, you need to select the geoprocessing operator you want to use. In this case, it is "Weighted Sum" (under Spatial Analyst Tools > Overlay in the toolbox). Remember that if you cannot find a tool, there is an index and a search available. Drag and drop the Weighted Sum operator onto the ModelBuilder window.

After dragging and dropping the 'weighted sum' operator, the model window will look something like this:

 

Next, use the "connect" tool (third from the right) to link from source data to the operator (you will need to select "Input Raster" to define the connection). When your model is correctly connected, the operator will turn yellow to indicate that it has all required inputs. Once you make the connection (see below), double click on the Weighted Sum operator. This brings up a dialog box in which you can specify the weights to use for each input grid cell value. Click the triangular 'run' button to run the entire model.

Try to create a weighted overlay in which hvalue_non0 is weighted at 70% influence and the sales89_pw2_2 layer interpolated from the sales89 points has a 30% influence. When the census-derived data is 0, you want the weighted overlay tool to fill in with sales89 estimates. Getting the weights to be handled correctly for grid cells that are missing in one of the layers can be tricky and may require adding additional steps to the model. Also, using map algebra to combine raster layers can get tricky when you need to rescale grid values or convert them to integer values before doing the map algebra that you want. (ArcMap provides additional Math functions to help in this regard. See, for example, the Int() function to convert floating-point grid cell values to integers.) For this exercise, you do not need to turn in any results from your use of ModelBuilder. However, we will make further use of ModelBuilder in Homework Set #3.
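As a hedged arcpy sketch of the same idea (not required for the assignment), the Con/IsNull pattern below is one way to fill NoData census cells with the sales estimate before applying the 70/30 weighted sum; all paths and output names are assumptions:

    import arcpy
    from arcpy.sa import Raster, WeightedSum, WSTable, Con, IsNull

    arcpy.CheckOutExtension("Spatial")

    hval = Raster(r"C:\TEMP\lab6\hvalue_non0")
    sales = Raster(r"C:\TEMP\lab6\sales89_pw2_2")

    # Where the census surface is NoData, substitute the sales-based estimate
    hval_filled = Con(IsNull(hval), sales, hval)

    # 70% census influence, 30% sales influence
    combo = WeightedSum(WSTable([[hval_filled, "VALUE", 0.7],
                                 [sales, "VALUE", 0.3]]))
    combo.save(r"C:\TEMP\lab6\hvalue_ws")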

-----

We have only scratched the surface of all the raster-based interpolation and analysis tools that are available. And, we have shortchanged the discussion of what we mean by 'housing value' -- e.g., the sales and census data include land value. If you have extra time, review the help files regarding the Spatial Analyst extension and try computing, and then rasterizing and smoothing, the density surface of senior citizens (and/or poor senior citizens) across Cambridge and the neighboring towns. (We will work with a density surface of poor senior citizens in Homework #3). Another suggestion is to use model builder to cut out non-residential areas and reallocate population to the residential portions of the block groups -- then compute population density.


Lab Assignment

Please use the assignment page to complete your assignment and upload it to Stellar.



Created by Raj Singh and Joseph Ferreira.
Modified for 1999-2015 by Joseph Ferreira, Thomas H. Grayson, Jeeseong Chung, Jinhua Zhao, Xiongjiu Liao, Diao Mi, Michael Flaxman, Yang Chen, Jingsi Xu, Eric Schultheis, and Hongmou Zhang.
Last Modified: March 30, 2020 by Joe Ferreira.

Back to the 11.188 Home Page. Back to the CRON Home Page.