Massachusetts Institute of Technology - Department of Urban Studies and Planning

11.188: Urban Planning and Social Science Laboratory
11.205: Intro to Spatial Analysis (1st half-semester)

Spatial Analysis (Vector Models)

March 10, 2021



Administrative Notes:

  • Lab and Homework Schedule
    • Hwk #1 due Friday
    • Lab #3 is due next Monday (March 15)
    • Hwk #2 is now online Part 1 due Friday, March 19 (Part 2 due March 26)
  • Today's topic spatial analysis using vector models
    • Overlay, intersect, and other 'spatial joins'


Major Topic: Spatial Analysis (using Vector-based Data Models)

  • Taking advantage of GIS to handle spatial operations
    • Do vector-based spatial analysis today with lab next Monday
    • Do raster-based spatial analysis next Wednesday will lab the following week
    • Then a test during lab and a guest lecture
      • First half of class (11.205) ends here
    • Second half includes web mapping, field data collection, and mapping APIs using R
    • That's all we have time for in lectures and lab exercises (some terrain analysis and 3D modeling at end)
    • Much of second-half (11.520) is the (small) individual project
      • For project: start thinking about topic of interest, spatial component, and possible data source 
  • Spatial 'Join' issues:
    • Combine data from different sources based on location & proximity
    • Present results so that they can by visualized and/or encoded and saved for further use

 

  • Examples of spatial 'joins' - we have already seen some of the tools for these examples
    Which Cambridge house sales fall within which census blockgroups
    • Point-in-polygon operation
    • Because we want to know about the neighborhood surrounding lower-price homes...
    • Can we add census data to the sales attribute table? (not just 'select' the census block group containing the sale)
    point-in-polygon example
    Which blockgroups are in which Mass Town?
    • Polygon-in-polygon operation
    • Tricky because polygon boundaries may not line up (sliver & gap issues)
    • How do we create a cross-reference table (for blockgroups and towns)
    • Which overlap criterion: intersect, completely within, centroid within,...
    sliver-problem-example
    Which roads are in the flood plain?
    • Line-in-polygon operation
    • Should you cut up the road segments or list line segments with part (>50%) inside?
    • What happens to attributes of split road segments (keep names, divide length,...)
    fresh-pond-mall-example
    Which shopping centers are close to major highways
    • How do we represent shopping centers (point or polygon)
    • How do we represent highways? (line, polygon, or just the exits)
    • Where is this exit? (the perimeter of the Wrenthan outlet mall is > 1 mile)
    wrentham-outlet-mall-example

  • There are many interesting Proximity questions
    • ...and, we now have the background to utilize much of the QGIS Toolbox on our own


General spatial operations

  • Equals – are the geometries the same?
  • Disjoint – do the geometries share at least a common point?
  • Dissolve – Eliminate boundaries with same attribute value on each side.
  • Intersect – Create all the combined pieces when features are overlaid
  • Touch – do the geometries intersect at their boundaries?


GIS Marketing Example:

(1) Site Selection for Low Cost Grocery Store Chain

    • Conceptual Model
      • Brainstorm criteria for "good" locations
    • Case Study Example
      • Factors used in actual Commercial Shopping Center Site Selection
      • Powerpoint slides used by commercial firm to market site selection tools
        by Edens & Avant and RPM consulting  
    • Think about these marketing slides?
      • Is the methodology or analytic scope overstated?
      • What considerations are omitted, shortchanged, badly measured?
      • From whose point of view is the siting service helpful or hurtful?
(2) 5 minutes general discussion of questions



<<< back to overlay and spatial proximity tools >>

Buffering Vector Data

  • Buffering creates new polygon features by computing which areas are within a certain specified distance of the selected object.
  • Buffering may define a new spatial object whose boundary enclosed all the area within the buffer
    • the geometric computations can be complex
    • beware of islands, slivers, etc.
    • involves choices when buffers of selected features overlap (dissolve...)
  • Buffer Input: Points, lines or polygons.
  • Buffer Output: Polygon(s).

 

Buffering Raster Data

  • Classify grid cells according to whether they lie inside or outside the buffer.
  • In addition to distance, you can also use any function of grid cell values (e.g. travel speed) as a buffer criteria.
  • The result is a new table of cell values for each grid cell - e.g., a raster map distinguishing all grid cells with, say, less than 30 minutes travel time from a particular point (e.g., a city center).
  • More on raster modeling and analysis after next Monday's vector analysis lab.

 

Point in Polygon

  • Whether a point lies inside or outside a polygon.
  • In the case of many points and many polygons, the task is to assign points to polygons
  • Common operation:
    • Many mailing lists and GPS readings are 'geocoded' into point files
    • You need point-in-polygon operation to tag the attributes of the point map layer with characteristics of the places they are 'in'
    • Sometimes, point-in-polygon is used to handle difficult polygon-on-polygon operations such as the blockgroup-in-town issue in HW#2
      • Variations along boundaries create slivers and make polygon-on-polygon overlay unsatisfactory
      • Compute centroids of blockgroups first, or extend one boundary with a small buffer and then intersect, ...
      • Illustrate point-in-polygon operations using blockgroup centroids to see which ones fall inside which towns
    • ArcGIS operation
      • Find the right tool in ArcToolbox (use the 'search' tab if necessary)
      • Use 'spatial join' operation in order to add polygon attribute information to the point attribute table
      • We have seen this join-by-spatial-location operation when doing join-by-attribute operations
      • This operation - and many others - can be accessed from ArcToolbox (under the 'analysis tools' menu)

Line in Polygon

  • Used to determine whether a line lies inside or outside a polygon.
  • In the case of many lines and many polygons, the task is to assign line segments to polygons.
  • ArcGIS operation:
    • Intersect lines and polygons in order to tag attributes of line segments with polygon information
    • Since polygons aren't likely to cross all lines precisely at the ends of the line segments, this operation can be complex since the operation must split lines at the polygon boundaries and assign the appropriate attributes to each new segment.

Polygon Overlay

  • Many possible ways of intersecting polygons with one another
    • Intersection: Polygon 1 + polygon 2 = polygon 3 (with common areas)
    • Union: Polygon 1 + polygon 2 = polygon 3 (with both areas)
  • Many issues about how to interpret the results
    • Handling slivers due to geometric errors or differences in degree of generalization of boundaries
    • Handling attributes
      • Some attributes apply to each fragment without change (e.g., in flood plain, in city...)
      • Some attributes need to be 'allocated' to each fragment (e.g., population in each part of blockgroup)
    • In next Monday's lab exercise we will have examples (e.g., estimate number of people living within a specified distance of MIT's biology building)
      • We will do an 'area allocation' to find the fraction of area in a block group that is inside the buffer and use that fraction to estimate the population living inside the buffer


overlay diagrams

 

More spatial data processing

  • Dissolve features based on an attribute.
  • Merge layers together
  • Clip one layer based on another (including 'erase' as well as 'cutout')

 


Spatial Analysis Example - Locating a Shopping Center

  • Criteria for 'good' location: 
    • Within two miles of a major highway, but no less than one quarter mile away from a major highway.
    • At least one mile away from a water body (e.g., rivers, lakes).
    • Not in the flood zone.
    • Close to a major residential area (at least 2500 households within 5 miles).
    • On a large enough piece of vacant land (at least 10 acres)
  • Classic 'site suitability' analysis
    • Decompose task into many ArcGIS analysis steps

Data Required

  • Highway layer (linear feature),
  • Vacant land layer (polygon feature),
  • Rivers layer (linear or polygon feature),
  • Lakes layer (polygon feature),
  • Land use layer (polygon feature),
  • Flood zone layer (polygon feature),
  • Census layer (blockgroup level, polygon feature).

 

Spatial Analysis Process

  • Select vacant land parcels larger than 10 acres, result: vacantlg10.
  • Buffer highway layer with 2 mile and .25 mile buffers, result: hw2 and hwquarter.
  • Buffer Rivers and Lakes layers with 1 mile buffers, result: river1 and lake1.
  • Intersect vacantlg10 with hw2 = vacanthw1, useful land.
  • Union hwquarter with river1, lake1 and flood (need to do two layers at a time), result: notusable, (land not meeting the criteria)
  • Remove notusable from vacanthw1, result: usable.
  • Intersect usable with census blockgroup layer (touch the boundary of), result: usableblock.
  • Calculate and reselect usableblock having the total number households larger than 2500, result: bigusableblock.

 

Additional criterion

  • Add socio-economic criteria: “The median household income of the proximate residential area should be less than $30,000”?
  • Refine 'major residential area' criterion
    • Require enough people within a buffer area around site
    • Require proximity to dense residential area

 


QGIS Example using 'DIFFERENCE' Operator:

Cut out the industrial area from two block groups near Sherman and Walden St. in Cambridge (near Fresh Pond).  The folder, cambridge_lecture7qgis.zip, in the class data locker contains all the files needed for this example.  To illustrate the process using small datasets, we extracted as separate shapefiles two block groups (from cambbgrp) and all the 'industrial' land in Cambridge (from camb_area_lu_1999).

Cambridge block groups: STCNTRBG = 250173546003 and 250173546001
with Industrial land use (LU21_CODE=16) overlaid across Sherman St.

2 Cambridge block groups and industrial land
Open 'Processing / Toolbox' and choose 'Difference' option within 'vector overlay'
opening toolbox panel
Cut industrial land out of the two block groups using the 'difference' tool
(Input layer is camb_bg_fresh_pond to get parts of it that are *not* within industrial zones.)
difference tool settings
Log tab of 'difference' tool reports output of running python algorithm.
log of output from difference tool
End up with the industrial area cut out from the two block groups
NOTE: right-side block group now consists of two polygons.
result of difference operation
BEWARE: all attribute values are retained in attribute table - even if no longer correct (e.g., area, perimeter, and landacre). You can recalculate area and perimeter, but the change in landacre is unknown. (Do you understand this?)
no change in attribute tables
Use 'Field Calculator' to recalculate last column ('tmp') or add new field so it shows the new area of each block group.
recalculate area

new values of tmp field

Note that the new area of each block groups is now saved in the 'tmp' field.



Example used Color Orthophotos from MassGIS



Homework #2 (now online)

  • Part 1 (due March 19): SQL queries in QGIS to slice, dice and aggregate census data

    • Aggregate census tract data to town level  and compute counts and percentage of senior citizens who are poor in each town
  • Part 2 (due March 26): Find good location for senior center in Cambridge

    • Analogous to siting the grocery store in the 'GIS marketing slides' example

    • This will take some time! Spread out the work!

  • start early, focus on spatial operations for suitability analysis
  • save often, take care when naming intermediate files, save several *.qgs project files early and often - with different version numbers as you move along.
  • expect to redo overlays a few times before you like result (so you need to remember the saved pieces and combine them into new projects)


<< This section illustrates the ERASE operation using ArcGIS software >>

ArcGIS Example using 'ERASE' Operator:

Cut out the industrial area from two block groups near Sherman and Walden St. in Cambridge (near Fresh Pond).  The folder, cambridge_lecture7.zip, in the class data locker contains all the files needed for this example.

Cambridge block groups: STCNTRBG = 250173546003 and 250173546001
with Industrial land use (LU21_CODE=16) overlaid across Sherman St.

cambridge-industry-overlay-example
Choose the ERASE extract tool in ArcToolbox
cambridge-industry-intersect
Use the industrial land use boundaries to erase the industrial parts of the two block groups
camb-erase-window
End up with the industrial area cut out from the two block groups
cambridge-after-erase
BEWARE: all attribute values are retained in attribute table - even if no longer correct (e.g., area, perimeter, and landacre). You can recalculate area and perimeter, but the change in landacre is unknown. (Do you understand this?)
cambridge-add-newarea
cambridge-area-calc

cambridge-newarea-calc2

Note that the new area of the block groups is reduced by the erased area.




<< Remainder is only relevant for those using ArcGIS software >>


ArcGIS Example of 'polygon-in-polygon' operation

(to find which blockgroups are in which town for your use in Homework #2)

  • Here are the steps we used to produce the Town/Blockgroup cross reference table (so the homework would be easier and you would not have to figure out all these steps):
    • A 'Spatial Join' involving polygon-to-polygon operations will have problems
      • It results in many 'sliver' issues along the edges of the blockgroups and town boundaries
    • Instead, compute centroid of blockgroups and then do a point-in-polygon operation to find all the blockgroups in Cambridge and the surrounding towns
      • Recent versions of ArcGIS can do this easily by:
        1. Using the 'centroid within' choice when doing a 'spatial join' polygon-on-polygon operation, (see the parameters for the 'spatial join' operator in ArcToolbox), or
        2. Using the 'calculate geometry' options in the 'field caculator' to add fields with the XY centroid values
      • We will do it the 'long way' to demonstrate the use of Python code and visual basic scripts (VB Script) to grab data from the in-memory ArcMap geometry and save the values in attribute columns of shapefile layers

  • How can we compute (and save) centroid XY values ?
    • Centroids are interior points 'close' to the 'middle' of a polygon
      • What should be the centroid of a C-shaped polygon?
        • Beware, ArcMap centroids are *not* guaranteed to be interior to the polygon
      • How can you tell if a point is 'inside' of a polygon?
    • Centroids are NOT part of 'shapefile' data model - which only stores boundaries
      • BUT, they are already computed and available in the runtime ArcMap data model
    • Use ArcGIS help to find a way: look up 'Centroid - calculating coordinates' in ArcGIS help
      • First, simplify the task by selecting the blockgroups in the 5-town area around Cambridge
        • Select blockgroups in and around the 5 towns and save to a new shapefile (blkgrp_start.shp)
        • add empty fields (called centX and centY) to the attribute table of blkgrp_start (as double precision numbers)
      • Strategy: using Python code (for ArcGIS version 10+) to pull centroid X,Y data from the ArcMap data model into the XY fields of the attribute table
        • You may read about Python coding in the online ArcGIS help files.
        • Right click on your new X field and select Calculate Values...
        • Click the 'Python' button in the 'Parser' section at the top and then set centX to be the python expression for the 'X' portion of the 'centroid' parameter of the feature layer: !shape.centroid.X!
        • There is no need to click the Show Codeblock box.
        • Click 'OK' to calculate the values and then repeat for centY
    • Use DataManagement/Add-X-Y-data option in ArcToolbox to create a pin map of the centroids (search for 'XY' and find the 'make XY event layer' tool to turn the centroid columns into a mappable point layer)
      • Once you can see the pin map, you still need to turn it into a shapefile in order for ArcGIS to be able to use it in a 'join by location' operation. Otherwise, ArcMap will complain that it does not know the coordinate system for the newly created cambbgrp_layer of block group centroids.
      • Export the pin map into a point shapefile 'blkgrp_c_shape' of the XY centroids and add it back into your table of contents
    • Finally, Use the point-in-polygon spatial join operation (join-by-location) to attach town attribute information to the point attribute table 'blkgrp_c_shape'.
      • Right-click on the 'blkgrp_c shape' layer and choose the 'join' option from the 'joins & relates' menu item,
      • Instead of joining attributes based on relating two tables, we want to 'join data from another layer based on spatial location'
      • The layer we want to join to the selected layer (blkgrp_c shape) is the town layer (which we called cambridge5 when we saved the 5-town boundaries above),
      • This operation will add the columns in the cambridge5 attribute table to the columns in the 'blkgrp_c Events' attribute table where the rows are matched based on the town that contains each block group centroid. The results are stored in a new shapefile with the same block group centroid points and the expanded attribute table.
      • You won't be able to do the spatial join unless each of the layers you are joining have a known spatial reference system - because the spatial join won't be meaningfull unless both layers can have all their coordinate expressed (or transformed) into the same coordinate system.
      • This expanded attribute table has the blk_key in one column and, now, the town name and code in another column. So, we turned off all the columns except for blk_key and Town and and selected only those blockgroups in the 5 towns. Exporting this table produced the block-group-to-town (blkgrp2t.dbf) lookup table that we have included in the 11.520 data locker for your use in HW#2.


Last modified 9 March 2021 [jf]
Back to the CRON Home Page. Back to the 11.188 Home Page.