A housing value 'surface' for Boston would show the high and low value neighborhoods much like an elevation map shows height. The computations needed to do such interpolations involve many proximity-dependent calculations that are much easier using a so-called 'raster' data model instead of the vector model that we have been using. Thus far, we have represented spatial features—such as Boston parcel or block group polygons—by the sequence of boundary points that need to be connected to enclose the border of each spatial object—for example, a parcel, a city block, or a Census block group. A raster model would overlay a grid (of fixed cell size) over all of Boston and then assign a numeric value (such as the block group ID or median housing value) to each grid cell depending upon, say, which block group or parcel contained the center of the grid cell. Depending upon the grid cell size that is chosen, such a raster model can be convenient but coarse-grained with jagged boundaries, or fine-grained but overwhelming in the number of cells that must be encoded.
In this exercise, we only have time for a few of the many types of spatial analyses and spatial smoothing algorithms that are possible using rasterized datasets. Remember that our immediate goal is to use the Boston sales data to generate a housing-value 'surface' for Boston. (Actually, to keep the computations manageable for a short exercise, we'll compute the surface only for East Boston.) We'll do this by 'rasterizing' the sales data and then recomputing the value of each grid cell based not only on the housing value of any sales that fall within each grid cell but also on the value of nearby homes that sold. As usual, accomplishing this by mixing and matching the specific data and tools that we have in Oracle, ArcGIS, and MS-Access is somewhat more complex and frustrating than we might expect. The point of the lab exercise is not to 'memorize' all the tricks and traps involved, but to understand the complexities and ways around them so that you can 'reverse engineer' a new problem that might be doable if decomposed into similar data processing, model building, and visualization steps.
Start with a brand new ArcMap project and add several Boston themes as indicated below. Load ma_towns00 first so that the data frame is set to Mass State Plane NAD83 (meters).
Order the themes, top to bottom, as follows:
To get started with a useful visualization of East Boston, we suggest the following setup: use the 'Definition Query' tab in the 'Layer Properties' window for the 'bos_planning_districts' layer so that only the East Boston boundary is visible (a sample definition query is sketched below). Adjust the colors and transparency of the layer symbols so that the ocean is blue with no border, and the town and Planning District boundaries around East Boston are visible along with the block and building outlines for East Boston. Create a thematic map of median housing value by shading the boston_bg90 theme based on the 1990 medhhval values. Be sure to exclude block groups with medhhval = 0 by setting '0' to be the null value.
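Since a Definition Query is just the where-clause of a SQL query against the layer's attribute table, something like the following one-liner should work. (The field name here is an assumption—check the layer's attribute table; the rasterized East Boston layer later in this exercise stores the district name in a field called NAME with the value 'East Boston'.)

   NAME = 'East Boston'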
ArcGIS's raster manipulation tools are bundled with its Spatial Analyst extension. This section reviews how to set up Spatial Analyst in case you have forgotten how, or in case you learned about raster analysis using another GIS package. It's a big bundle, so let's open ArcGIS's help system first to find out more about the tools. Open the ArcGIS help page from within ArcMap by clicking Help > ArcGIS Desktop Help from the menu bar. Click the index tab and type "Spatial Analyst." During the exercise, you'll find these online help pages helpful in clarifying the choices and reasoning behind a number of the steps that we will explore. The Spatial Analyst module is an extension, so it must be loaded into ArcGIS separately. If the 'Spatial Analyst' toolbar does not appear in your ArcMap toolbar area, do the following: [Note that the menu lists in the graphics may differ slightly depending upon the machine and user account setup.]
Although you just activated the "Spatial Analyst" extension, you may still have to add the "Spatial Analyst" toolbar to the menu manually in order to use the extension. To add the "Spatial Analyst" toolbar, go to View > Toolbars from the menu bar and click "Spatial Analyst".
Once the Spatial Analyst toolbar is loaded, a new main-menu heading called Spatial Analyst will be available. ArcGIS will remember your settings and, unless you explicitly turn off the 'Spatial Analyst' toolbar, it will now appear whenever you launch ArcMap.
Setting Analysis Properties:
Before building and using raster datasets, we should set the grid cell sizes, the spatial extent of our grid, and the 'no data' regions that we wish to 'mask' off. Let's begin by specifying a grid cell size of 50 meters and an analysis extent covering all of East Boston. To do this, click 'Spatial Analyst > Options'. When the "Options" window pops up, set the analysis extent and the cell size accordingly.
If successful, the ebosgd layer will be added to the data frame window. Turn it on and notice that the shading covers all the grid cells whose center point falls inside the East Boston boundary. Note that the attribute table for ebosgd contains only one row and indicates that 5237 grid cells have the same value of 'East Boston' for the attribute field called NAME. (Attribute tables for raster layers contain one row for each unique grid cell value—hence, there is only one row in this case. Our 50-meter grid cells generate 112 rows and 98 columns for the bounding box that includes East Boston, for a total of 10,976 grid cells. Restricting the grid cells to those whose centroid falls within the East Boston boundary reduces the count to 5,237 cells.)
A city assessor's database of all properties in the city would generally be considered a good estimate of housing values because the data set is complete and maintained by an agency that has strong motivation to keep it accurate. This database does have drawbacks, though. It is updated sporadically, people lobby for the lowest assessment possible for their property, and its values often lag behind market values by many years.
Recent sales are another way to get at the question. On the one hand, their numbers are believable because the price results from an informed negotiation between a buyer and a seller that reveals the 'true' value of the property (if you are a believer in the economic market-clearing model). However, some sales may not be 'arms length' (i.e., not representative of market value because they are foreclosure sales, sales to family members, etc.). In addition, such data sets are susceptible to short-lived boom or bust trends and, since individual houses (and lots) might be bigger or smaller than those typical of their neighborhood, individual sale prices may or may not be representative of housing prices in their neighborhood.
Alternatively, the census presents us with yet another opportunity for estimating housing value—the median housing values aggregated to the block group level. However, this data set is also vulnerable to criticism from many angles. The numbers are self-reported, only a sample of the population is asked to report, and the data are aggregated up to the block group level. The benefit of census data is that they are cheaply available and they cover the entire country.
3.1 Pulling East Boston Data from the ASSESS_DATA.SALES Table
To begin exploring some of these ideas, we will use an Oracle table called SALES containing information about the sale price of residential Boston parcels that sold between 1986 and 1995. This table is owned by the user ASSESS_DATA. You need to specify the fully-qualified table name, ASSESS_DATA.SALES, when querying from this table, because you do not own this table and you already have a synonym called SALES that refers to the table PARCELS_DATA.SALES. From your Oracle window, take a look at the ASSESS_DATA.SALES table:
set pause 'more...'
set pause on
set pagesize 30
set linesize 130
select * from assess_data.sales;

   SALE_ID PARCEL_ID  SALEDATE  STATECLASS      PRICE SELLER               BUYER                SA BASE_ID C
---------- ---------- --------- ---------- ---------- -------------------- -------------------- -- ------- -
         1 0100003000 01-JUL-87        105     135000 DALY PAUL ET AL      PIAZZA DAVID         W1 0100003 N
         2 0100003000 17-FEB-89        105      85000 PIAZZA DAVID M       ONE HUNDRED REALTY T D3 0100003 N
         3 0100004000 09-JUN-86        105     125000 CAPONE ROBERT ET AL  EMERSON JEFFREY      W1 0100004 N
         4 0100004000 25-OCT-90        105     115000 EMERSON JEFFREY F    DILLON KELLEY A      D3 0100004 N
         5 0100005000 01-MAR-90        105      90000 DIGIROLAMO JOSPH G   DIGIROLAMO JOHN F ET A1 0100005 N
         6 0100012000 03-OCT-90        105     115000 STRAW BRUCE E JR ET  SCIACCA ANTONIO ET A D3 0100012 N
         7 0100013000 28-MAR-88        105     150000 CARCO MARIE          MELIA GERARD ET AL   W1 0100013 N
         8 0100014000 15-JUL-94        105     114000 TRIPI GIACOMO        TRIPPE VIRGINIA      W1 0100014 N
         9 0100019000 01-NOV-89        105     125500 TRICOMI JOSEPHINE    BROWN STEPHEN P      D3 0100019 N
        10 0100022000 30-AUG-91        105     117000 NIGRO ANTONIO J ET A CAHILL STEPHEN F ET  W2 0100022 N
        11 0100024000 20-NOV-87        105      60000 IANNACCONE MARY      SHEA RAFFAELA        A1 0100024 N
        12 0100028000 12-DEC-88        104      95000 FRANCIS MARK S ET AL MELE ANTHONY         D2 0100028 N
        13 0100029000 01-OCT-91        105      80000 BONO ANTHONY ET AL   STEWART ROBERT ET AL D3 0100029 N
        14 0100039000 13-APR-94        105      47800 DAPOLITO ANTHONY M E DAMATO FILIPPO       A1 0100039 N
        15 0100043000 09-APR-90        105     165000 DAVOLIO MICHAEL ET A SCIORTINO SALVATORE  X1 0100043 N
        16 0100043000 14-FEB-94        105      85000 SCIORTINO SALVATORE  MASSACHUSETTS HOUSIN E1 0100043 N
        17 0100043000 23-NOV-94        105      49500 MASSACHUSETTS HOUSIN MORAN DESERRAE J     L1 0100043 N
        18 0100046000 28-FEB-90        104      47500 MATTHEWS ALICE       BICKFORD WILLIAM J E D3 0100046 N
        19 0100048000 30-OCT-86        105     159000 CARBONE CHRIS ET AL  ROGERS ROBERT ET AL  W1 0100048 N
        20 0100048000 22-APR-91        105      35000 GARDEN MANAGEMENT CO TWO 06 PRINCETON RLT L1 0100048 N
        21 0100048000 27-FEB-95        105      40000 TWO 06 PRINCETON RLT WARSHAW GEORGE          0100048 N
more...
... [ 68134 rows total ]

The first two characters of the parcel_id are the Ward number. Let's see how many East Boston (Ward 01) sales are included in the ASSESS_DATA.SALES table. (Beware: the following query works in Oracle, but MS-Access does not support the count(distinct fieldname) syntax from standard SQL.)
To begin, let's construct a SQL query that pulls into ArcGIS the IDs and (average) sale price of all Ward 01 parcels. First, count the Ward 01 sales:

select count(*), count(distinct parcel_id), count(distinct base_id)
  from assess_data.sales
 where substr(parcel_id, 1, 2) = '01';

  COUNT(*) COUNT(DISTINCTPARCEL_ID) COUNT(DISTINCTBASE_ID)
---------- ------------------------ ----------------------
      2836                     1836                   1354

Hmm, there are 2836 sales involving 1836 distinct parcel numbers and 1354 unique base_id values. Condos within the same building/development share the same base_id (and parcel footprint) but have different parcel numbers (the last 3 digits differ). Unfortunately, split parcels can also retain the same base_id—but they have different footprints as well as differing last three digits. For the purposes of this exercise, we will ignore complications involving split parcels but will address complications associated with condo sales and multiple sales of individual parcels.
create view holdsales1 as
select base_id, avg(price) saleprice, count(*) sales
from assess_data.sales
where substr(base_id,1,2) = '01'
group by base_id;
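As a quick sanity check, you might look at how many sales records pile up on a single ground parcel. Here is a sketch of one such query against the holdsales1 view we just built (the threshold of 20 is arbitrary):

select base_id, sales
  from holdsales1
 where sales > 20
 order by sales desc;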
There are 1354 rows in the holdsales1 view—including one base_id that showed up 84 times in the SALES table! (It is a condominium complex.) Let's look at the year in which each sale occurred:
select to_char(saledate,'YYYY') year, count(*)
  from assess_data.sales
 where substr(parcel_id,1,2) = '01'
 group by to_char(saledate,'YYYY')
 order by to_char(saledate,'YYYY');

YEAR   COUNT(*)
---- ----------
1979          2
1980          1
1984          2
1985          2
1986        367
1987        364
1988        241
1989        271
1990        216
1991        167
1992        278
1993        327
1994        363
1995        235

Hmm, most sales are indeed during the 10-year period 1986 through 1995, but there are a few earlier sales that we should probably omit from our analyses. Refine your view (call it holdsales2) to include only those East Boston sales that occurred during 1986-1995; one possible formulation is sketched below. Connect to Oracle and pull this holdsales2 table into ArcGIS using the techniques in Lab 4.
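Here is a minimal sketch of one way to write holdsales2, assuming you keep the same structure as holdsales1 and simply restrict the sale dates (your formulation may differ):

create view holdsales2 as
select base_id, avg(price) saleprice, count(*) sales
  from assess_data.sales
 where substr(base_id,1,2) = '01'
   and to_char(saledate,'YYYY') between '1986' and '1995'
 group by base_id;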
3.2 Mapping East Boston Parcel Sales
There are about 100,000 ground parcels in Boston but, for this exercise, we will focus only on the 6500+ East Boston parcels that we have used in earlier labs. At this point, be sure that you have added the ebos_parcels05.shp shapefile to your ArcMap view. Note that the attribute table contains both a parcel_id and a base_id column. We could use base_id to join the holdsales2 table to 'ebos_parcels05.shp' and then do a thematic map based on sales price. (Try it later if you have time.) But many of the parcels are so small, and the sales so numerous, that the resulting thematic map of housing prices won't be very readable. In addition, the unsold parcels won't be shaded at all, and parcels with multiple sales will only be shaded based on the value of the last sale.
Instead, we would like to construct an appropriate average of resales and nearby sales as an indication of housing value in the area surrounding the sale. One way to do such an interpolation is to compute a housing value for any particular location that is the average of, say, the nearest dozen sales. When computing the average, we might want to weight distant sales lower than close-by sales. By rasterizing the space within East Boston we can identify a finite and manageable number of places to consider (each grid cell) and then we can use the capacity of the GIS to know which sales are 'nearby.' (We don't have to rasterize the Ward. We could consider every parcel and set its value to be a weighted average of its sales—if any—and its neighbors. But that requires many more geometry calculations—e.g., to compute an adjacency matrix—so analysts often settle for a quicker raster analysis as illustrated next.) Let us handle the condo and multiple-sale issue on the Oracle side and the neighborhood-average smoothing on the ArcMap side.
Let us create a new view called holdsales3 that averages the sales price of all sales associated with a single ground parcel (base_id) during the 10-year period. A little inspection of assess_data.sales suggests that this may not be as simple as we would hope, and we need several steps. Many 'base_id' fields are blank or incomplete, and the 'condotag' column seems to either contain 'N' for non-condo or be blank. Also, we are not yet certain that the parcel_id in the sales table (for sales from 1985 through 1995) will match pid_long or base_id in the fiscal year 2005 assessing table for Boston. Let's do some checking on the Oracle side. The sales data are in assess_data.sales, the attribute table from ebos_parcels05.shp is in an Oracle table called ebos_parcels05, and the FY05 assessing data is in bos05eb.
Run a few SQL queries such as the following to get a sense of how well the files match up.
select count(*)
  from assess_data.sales s, bos05eb b
 where substr(b.pid,1,2) = '01'
   and s.parcel_id = b.pid
   and cm_ID is not null;
-- 892 rows
select count(*)
from assess_data.sales s
where substr(s.parcel_id,1,2) = '01';
-- 2836 rows
select count(*)
  from bos05eb b
 where substr(b.pid,1,2) = '01';
-- 7235 rows
Groan! Only 892 of the 2836 parcel IDs in the sales table match any of the FY2005 parcel numbers. Something is wrong! Fortunately, we have saved a 1996 parcel map for Boston. That year is just at the end of the sales period for which we have data and happens to be before Boston did a renumbering of its parcels! The 1996 parcel shapefile is saved as parcel1996.shp (in NAD83 meters) in the class data locker K:\data\parcel1996.shp, and the assessing data for 1996 in a form equivalent to the schema of bos05eb is saved in Oracle as a table called parcel96. For your convenience, we have saved key attribute fields for all 1996 Boston ground parcels with non-zero parcel_id into an Oracle table called parcel96shp. We have also saved the East Boston portion of parcel1996.shp into a smaller shapefile called ebos_parcels96.shp. This shapefile has also been added to the eboston05 subfolder of the class data locker: K:\data\eboston05.
Let us check how well the data from these 1996 files match up with the sales data.
select count(*)
  from assess_data.sales s, parcel96 b
 where substr(b.parcel_id,1,2) = '01'
   and s.parcel_id = b.parcel_id;
-- 2737 rows
All but 99 East Boston sales records match. However, this match is for the assessing records - not the shapefile containing the ground parcels. Let us see if the field that looks like a complete ground parcel ID in the attribute table of ebos_parcels96.shp matches parcel_id in assess_data.sales:
select count(*)
  from assess_data.sales s, parcel96shp p
 where substr(p.parcel_i_1,1,2) = '01'
   and substr(p.parcel_i_1,1,7) = substr(s.parcel_id,1,7);
-- 3109 rows - TOO MANY
Since condos have multiple records in both the sales table and the shapefile, we have extra matches. (Do you see why?) Let's try this query instead:
select count(*)
  from (select distinct substr(parcel_id,1,7) parcelsale
          from assess_data.sales) s,
       (select distinct substr(parcel_i_1,1,7) parcelshp
          from parcel96shp
         where substr(parcel_i_1,1,2) = '01') p
 where p.parcelshp = s.parcelsale;
-- 1347 rows

Hmm, now the number looks low! But maybe not:
select count(distinct substr(parcel_id,1,7))
  from assess_data.sales
 where substr(parcel_id,1,2) = '01';
-- 1354 rows
Finally!! We see that there are 1354 unique East Boston ground parcels involved in residential sales during the 10 year period and all but 7 of the parcel IDs matched IDs in the shapefile. That looks pretty good.
Let's remind ourselves where we are at this point. We have two shapefiles of East Boston parcels, from 2005 and from 1996: ebos_parcels05.shp and ebos_parcels96.shp. We also have the three Oracle tables assess_data.sales, assess_data.parcel96, and assess_data.parcel96shp containing, for all of Boston, the 1985-1995 residential sales information, the 1996 assessing office information, and selected attributes from ebos_parcels96.shp. Shortly, we will join the three Oracle tables together to pull off average sales prices for East Boston residential parcels and associate the prices with parcel IDs that can be located and mapped using the 1996 parcel shapefile. Whew! We then want to use these sales prices to interpolate a land value 'surface' for all the residential parts of East Boston, not just the ones that sold during our 10-year period.
We could use ebos_parcels96.shp for this purpose, but it will be easier to do the interpolation if we join the sales data to a 'point' coverage rather than to a polygon coverage. (ArcGIS can 'rasterize' the parcel layer but doesn't have a simple interpolation option that will first compute a new price for each parcel polygon that is a (weighted) average of nearby parcels that have sold. But ArcGIS does have an interpolation option that will do such a nearest-neighbor weighted average computation for a point layer.) Since we'll only have time today to show one of the simpler interpolation methods, we'll do that. You can create a point (centroid) shapefile for East Boston parcels by adding X and Y fields to your copy of the ebos_parcels96 attribute table and then using the Field/calculate-geometry choice from the attribute table options menu. However, there are a few 'gotchas' to consider before trying to do these calculations. First, it is more reliable to add fields to an attribute table when the table is not already joined (or related) to another table, and you may have joined the shapefile to the East Boston assessing table (parcel96). Second, the data frame is set to Mass State Plane NAD83 (in meters), but we have not yet checked the on-disk coordinate system for ebos_parcels96. The fine print in ArcGIS help warns us that centroid calculations using 'calculate geometry' can only be done on layers that are being displayed in their native coordinate system.
If you want to be sure to avoid these two difficulties, add a fresh local copy of ebos_parcels96.shp into a new data frame with no other layers. Let us rename this to ebos_parcels96_xy.shp since we will change it by adding X/Y fields and we want to remember that it will be different from the original. Check that the data frame is in Mass State Plane meter coordinates—it should be, since that is the coordinate system used by ebos_parcels96.shp on disk. (We could also calculate centroids using NAD83-feet if that were the coordinate system we used for the on-disk version of the shapefile.) Now we can compute X/Y values for the parcel centroids in the native coordinates. Add two fields called CentX and CentY to the ebos_parcels96_xy attribute table and set the data type to double with precision=12 and scale=3. Now, right-click each attribute column and use the 'calculate geometry' option to compute the X/Y centroid values.
Next, we want to create a point shapefile using the centroids. (Thus far, we have only added the centroid columns to the table, but ebos_parcels96_xy.shp is still a shapefile of polygons.) Edit the properties of ebos_parcels96_xy.shp so that only the fields parcel, pid_long, wpd, CentX, and CentY are visible, and open this attribute table. (It is okay to keep all the fields, but these are the only ones that we really need.) Export these columns into a new DBF table on a writeable local drive with the name ebos_xydata.dbf. Finally, add this table to ArcMap and use the Tools/Add-XY-Data option to convert it to a new shapefile that the system will call 'ebos_xydata events'. When creating this shapefile of points, you may want to edit the coordinate system to state that it is Mass State Plane (mainland) NAD83 (in meters). Finally, let us change the name of the shapefile to ebos_centroids.shp (to get rid of ' events' in the name!). One way to do this is to export the shapefile to a new one called ebos_centroids.shp. Check that ArcMap knows the coordinate system for this shapefile, and remember that it is saved on a local disk and needs to be copied to your locker before you log out!
Now, activate your original data frame and add in ebos_centroids.shp. Do the centroids show up inside the corresponding parcels in the ebos_parcels96.shp? What about ebos_parcels05.shp? This new shapefile of centroids is saved on disk in Mass State Plane NAD83-meters whereas the original polygons for ebos_parcels05.shp are in NAD83-feet. However, ArcMap can handle any conversions within a data frame as long as it knows the coordinate system of the shapefile as it is saved on disk.
After all this, we finally have a set of centroids that can be joined both to the sales data in Oracle and to the appropriate parcels associated with the centroids we have mapped. So we are almost ready to interpolate housing market values in East Boston. In Oracle, create a new view, holdsales3, that joins two of our Oracle tables and averages all the sales that have occurred on the same property. Here is a SQL query to get you started:
create view holdsales3 as
select substr(parcel_id,1,7)||'000' parcel_id, base_id,
       avg(price) avg_price, count(*) scount
  from assess_data.sales s,
       (select distinct substr(parcel_i_1,1,7) parcelshp
          from parcel96shp
         where substr(parcel_i_1,1,2) = '01') p
 where p.parcelshp = substr(parcel_id,1,7)
   and to_char(saledate,'YYYY') > 1985
 group by substr(parcel_id,1,7)||'000', base_id;
There should be 1347 rows in this view. Do you understand why we build parcel_id in this way? This list is not quite what we need since some of the sales are not 'arms-length' (for example, they may be a sale to a family member or a foreclosure that is not likely to represent market value). But let us use these sales to do our raster grid interpolation and come back to the more subtle questions later.
3.3 Interpolating Housing Values from Sales Price Data
We are now ready to do the interpolation via the averaging approach that we discussed above.
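For reference, the inverse-distance weighted (IDW) estimate that we will ask Spatial Analyst to compute for each grid cell is the standard formula

   estimated value = [ sum over the k nearest sales of (price_i / d_i^p) ] / [ sum over the k nearest sales of (1 / d_i^p) ]

where d_i is the distance from the cell center to sale i, k is the number of neighboring sale points considered (12 below), and p is the 'power' parameter (default 2). Closer sales therefore dominate the average, and the estimate for a cell right on top of a sale point approaches that sale's price.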
The grid layer that is created fills the entire bounding box for East Boston. The interpolated surface is shown thematically by shading each cell dark or light depending upon whether that cell is estimated to have a lower housing value (darker shades) or higher housing value (lighter shades). Based on the parameters we set, the cell value is an inverse-distance weighted average of the 12 closest sales. Since the power factor was set to the default (2), the weights are inversely proportional to the square of the distance (per the formula above). This interpolation heuristic seems reasonable, but the surface extends far beyond the East Boston borders (all the way to the rectangular bounding box that covers East Boston). This problem is easily fixed by setting the 'mask' to be used when interpolating. We can prevent the interpolation from computing values outside of the East Boston boundary by 'masking' off those cells that fall outside. Do this by adding a mask to the Analysis Properties:
With this analysis mask set, interpolate the Saleprice values in ebos_centroids.shp once again. All the values inside East Boston are the same as before, but the cells outside are 'masked off' and set to be transparent. Delete the unmasked surface, and rename the masked surface to be 'Surface-east-boston-sales'.
Does the housing value 'surface' look reasonable? Are you sure that the interpolation did not include those centroids with no sales? Does it look like it is distorted by unusually high or low sales prices in some parts of town—perhaps because a very large house sold for a large amount, or because foreclosures resulted in fire sales at unusually low prices? You can easily exclude certain parcels from the selected set and redo the interpolation.
Closer inspection of the Oracle data dictionaries associated with the Boston assessing data indicates that the SALES table contains a stateclass field that indicates the parcel's land use type as a three-digit code. A few lookup tables are also of interest. We have already seen the 'stclass' codes—these are the land use codes that we used from the mass_landuse table in earlier labs. The STCLASS table in Oracle has these state land use codes and relates each stateclass code to a two-character categoryid and to another abbreviated 'landuse' code. Also, a SALECODE table explains the codes that indicate which sales were 'arms length' and which were not a good indication of market value. (The tables STCLASS and SALECODE are also owned by the user ASSESS_DATA. As such, it is a good idea to fully qualify the table name with the owner name (e.g., ASSESS_DATA.SALECODE) as shown below.)
A quick look at the breakdown of sales codes indicates that roughly half of the East Boston sales were non-market-value sales—the largest single group being sales to/from a financial institution, probably due to foreclosure—and that only the SX, W1, W2, and X1 sales are likely to be representative of fair market value.
select s.salecode, substr(descript,1,65) description, count(*)
from assess_data.sales s, assess_data.salecode c
where substr(base_id, 1, 2) = '01'
and s.salecode = c.salecode
group by s.salecode, substr(descript, 1, 65)
having count(*) > 10
order by salecode, substr(descript, 1, 65);
SA DESCRIPTION COUNT(*)
-- ----------------------------------------------------------------- ----------
A1 Sale among members of the same family or relatives 219
B1 Intra-corporation sale; sale between corporate affiliates 17
D2 Property significantly improved after assessment but before sale 27
D3 Sale of improved property at an atypical price; "speculative" sal 191
E1 Sale to or from federal, state, or local government 21
L1 Sale to or from a financial institution not reflecting market val 594
MW Multi-parcel sale, not personally verified 24
MX Multi-parcel sale, personally verified 13
N1 Sale included an assumed mortgage 12
N3 Multi-parcel sale conveyed in one deed, not fair market value 109
N9 Sale price of unit reflects a tenant cash discount 27
O1 Substantial remodelling/rehabilitation after the date of sale 24
SX Auction sale; verified as arm's length 112
W1 Price within 10% of most current assessment 738
W2 Consistent with inspection and market, sale not verified by staff 55
X1 Field inspected by staff, sale personally verified by staff 372
With this in mind, redo your earlier query to create a new view that restricts the selection to fair-market sales of condos and 1-4 family units. You can do this by adding these clauses to the appropriate parts of your SQL statement (one possible full formulation is sketched below):

   and (landuse in ('R1', 'R2', 'R3', 'R4') or s.stateclass = 102)
   and (salecode in ('SX', 'W1', 'W2', 'X1'))
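For concreteness, here is one possible way to assemble the new view—call it holdsales4—as a sketch only. The join to the STCLASS lookup table assumes that its key column is called stateclass and that the abbreviated land use column is called landuse, per the description above; check the actual column names with a describe command before running it:

create view holdsales4 as
select substr(s.parcel_id,1,7)||'000' parcel_id, s.base_id,
       avg(s.price) avg_price, count(*) scount
  from assess_data.sales s,
       assess_data.stclass c,
       (select distinct substr(parcel_i_1,1,7) parcelshp
          from parcel96shp
         where substr(parcel_i_1,1,2) = '01') p
 where p.parcelshp = substr(s.parcel_id,1,7)
   and to_char(s.saledate,'YYYY') > 1985
   and s.stateclass = c.stateclass
   and (c.landuse in ('R1', 'R2', 'R3', 'R4') or s.stateclass = 102)
   and (s.salecode in ('SX', 'W1', 'W2', 'X1'))
 group by substr(s.parcel_id,1,7)||'000', s.base_id;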
Run this query, pull the new table across to ArcMap and redo the interpolation.
Let's bring this view over to ArcGIS as holdsales4 and do another interpolation. [The row count should be around 900 for the number of sales that meet the above conditions (and around 800 unique base_id values).] Do you have faith in these new results? Do you like the way we have handled multiple sales? Should we adjust prices based on housing inflation estimates during the 10-year period? (A sketch of one such adjustment appears below.) If you were doing this analysis for real, you would want to iron out your methodology on a small part of the City (such as East Boston) before applying the analysis to the entire city. In such a case, you might want to adjust the grid cell size and, perhaps, adjust the mask so that it excluded parts of the city that weren't really residential neighborhoods (such as the airport in Ward 1 or Franklin Park in the south of Ward 12). Picking appropriate cell sizes and masks can be time consuming and is as much an art as a science. For example, you could use as a mask a rasterized version of the parcel map. In that case, the streets would be masked off and any grid cell whose center was not in a block would be masked off. There are also many more sophisticated statistical methods for interpolating and estimating value surfaces. (Take a look at the 'interpolation' choice under 'spatial analyst tools' in ArcToolbox.)
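If you wanted to experiment with an inflation adjustment, one quick sketch would be to deflate each sale price to a common year before averaging. Note that the cpi table and its factor column below are hypothetical—no such lookup table exists in the class schema, so you would have to build one yourself from published housing-price or CPI indices (with year stored as a 4-character string to match to_char(saledate,'YYYY')):

select s.base_id, avg(s.price * c.factor) adj_price, count(*) scount
  from assess_data.sales s, cpi c
 where to_char(s.saledate,'YYYY') = c.year
   and substr(s.parcel_id,1,2) = '01'
 group by s.base_id;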
We don't have time to refine the analysis further (for example, to examine multiple-sale issues, split parcels, and price inflation during the 1986-1995 period), or to compare alternative interpolation models. But this exercise should be enough to give you a sense of when and why it is often important to have more data manipulation capabilities than ArcGIS provides, and how the ArcGIS/Oracle link matters as you begin to work with larger datasets. Also think about this interpolation approach versus using a thematic map of block group data or a nearest-neighbor smoothed grid cell map of either block group data or individual parcel assessments.
Analyzing 'real world' datasets—especially disaggregated administrative datasets such as assessment and sales data—often requires working through the complexities and subtleties involved in this exercise. What should a planner know in order to get past the simple (and wrong) first or second round of interpolation? How can you tell if someone you hire knows how to do disaggregated spatial analyses that are meaningful? How can you institutionalize the data processing and modeling so that planners can routinely tap these administrative datasets for planning purposes with confidence and reliability? (Development of the 'salecodes' used to tag market-value and non-market-value sales is one example of how data quality can be distinguished. These codes became standardized as a result of Mass. Dept. of Revenue regulations regarding reporting requirements of Town Assessors, to assist the State in comparing assessments to recent sales as part of the State process of certifying local reassessments every three years. However, given Boston's parcel numbering schemes and limited cross-reference tables, the handling of condo sales and parcel subdivisions remains complex and prone to analysis errors.)
3.4 Using Model Builder to Manage Steps in a Spatial Analysis
Next, we want to gain some experience with the ArcGIS Model Builder to diagram and save as a 'model' the steps we just carried out to develop the interpolated housing value surface for Boston's Ward 1 (East Boston).
To create a new ModelBuilder model, first make sure that you know where ArcGIS is going to save your model. The default location can be found by going under the main menu bar to Tools > Options and, in the resulting dialog box, looking at the "Geoprocessing" tab. The third item there allows you to specify the location of your 'My Toolboxes' folder. The default location on MIT's network is H:\WinData\Application Data\ESRI\ArcToolbox. This should allow you to access the same model from different workstations, but if you want to share (or back up!) a model, it is good to know where it is located. You can't directly "open" or "save" ModelBuilder models; however, you can move them around in ArcCatalog. In general, if you are creating models specific to a particular project, consider putting them in a directory near the project data.
Review the ArcGIS help files regarding Model Builder and reread the 'Model Builder' notes in Lab #8 of the introductory GIS class (11.520), http://mit.edu/11.520/www/labs/lab8/lab8.html. Then, create a new 'model' that results in a diagram with the steps shown below: (The example in the diagram shows Boston Ward 12 instead of East Boston, but the steps are the same.)
There are several ways to build the flow chart, but you will probably find it most convenient to start building the model by dragging and dropping the appropriate operations from the Toolbox menu into the Model Builder flow chart window and then specifying the appropriate input data and parameters. Don't try to connect to Oracle from within Model Builder. Add the view from Oracle (e.g., holdsales4) into your ArcMap session in the usual way and start the Model Builder diagram with the 'join' step that joins holdsales4 to the parcel centroid shapefile. Also, note that Model Builder uses the spatial analyst tools in ArcToolbox, which are a bit different from those used above from the 'Spatial Analyst' menu. In particular, you will need to set the desired raster analysis settings for ArcToolbox by right-clicking 'ArcToolbox,' choosing 'raster analysis settings,' and (once again) setting your grid cell size and mask. Note that, for the ArcToolbox settings, you can set the extents to match those of another layer only if that layer is a raster grid (not a shapefile).
When using ArcToolbox, be aware that, on some WinAthena machines, you will get error messages indicating 'script' errors when trying to set parameters and environment variables for items in the flow chart. Just click 'yes' to continue, and eventually you should get to the correct dialog box. ArcGIS has some trouble on WinAthena machines with calling Java scripts—for example, to open the help files to the appropriate page. You can always open the ArcGIS help files separately from the Start/Programs/ArcGIS/ArcGIS-Desktop-Help... menu.
(3) Lab Assignment
Most of the questions suggested above are simply to stimulate your thinking. Only a few items need to be turned in for this lab assignment. They are based on the revised SQL query that you used in Oracle to create holdsales4 for the sales that were considered arms-length. (You are welcome to redo the interpolation for the foreclosures - there were many in East Boston during this period!). Turn in the following:
Turn in this assignment electronically via Stellar. This lab assignment is due Tuesday, March 16, 2010 at the start of class.
Created by Joseph Ferreira.
Modified March 2001-2004 by [thg] and [jinhua].
Last modified: 2 March 2010 [jf].