Massachusetts Institute of Technology - Department of Urban Studies and Planning

11.188: Urban Planning and Social Science Laboratory

11.520: A Workshop on Geographic Information Systems

Georeferencing, Data Creation & Network Analysis

 

April 14, 2021, Joseph Ferreira
(based in part on '06 Lecture by Michael Flaxman)

 


Administrative

  • Project Proposal: upload to Stellar if not yet there
  • Homework #3: Raster Analysis due today
  • Lab #6 (Web scraping and tracking APIs): API usage and mapping field-collected data
    • Part 1 due Wednesday, April 21; Part 2 due Monday, April 26

Today

  • Creating GIS Data from non-GIS sources
    • Already discussed: creating a point shapefile from X,Y coordinates in a table (e.g., points of interest and tracking waypoints)
    • Discuss other ways today:
      • From imagery, CAD files, digitizing
      • From mailing addresses (geocoding, georeferencing, or address matching)
    • Illustrate polygon creation in QGIS
  • Accessing database servers in the cloud (PostgreSQL + PostGIS)
  • Network Modeling and Analysis


 

Creating GIS Data from Non-GIS Sources

Common sources

    Raw Imagery

    CAD Files

    Digitizing

    Addresses


Different sources, different methods

Raw Imagery

  • Georeference (if necessary)
  • Classify (by Color/Spectral Characteristics) or
  • Digitize (aka Trace)


    Georeferencing Imagery

    Note: JPEG and TIFF images can be read directly into ArcGIS, but by default they won't have an appropriate coordinate system and won't overlay anything else. (JPEG 2000 and GeoTIFF are standard formats, not always supported, that can store coordinate system metadata along with the image.)

    So we need some reference data with a coordinate system we trust. (Warning: Google Earth and similar sources can be *very* imprecise internationally; see the error in the image below.)

    [Figure: Georef Step 2]


    So now we have a valid coordinate system, but our image is clearly far from being correctly registered.

    The solution? The Georeferencing toolbar (View->Toolbars->Georeferencing), which lets you link corresponding locations by clicking on the screen:
    • If you identify one corresponding location in each data layer, the software computes a simple shift.
    • If you identify more than one location, the software fits a more complex transformation (linear, e.g. affine, or fancier higher-order polynomials).
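    To make "linear, or fancier" concrete, here is a minimal sketch (not ArcGIS's actual code) of fitting a georeferencing transformation from links. It assumes each link pairs an image location (x, y) with a trusted map location; one link gives a pure shift, and three or more give a least-squares first-order (affine) fit. The function names are hypothetical, and the two-link case is deliberately left out rather than guessed at.

```python
def _solve3(m, v):
    """Solve the 3x3 system m @ x = v by Gaussian elimination with partial pivoting."""
    a = [row[:] + [val] for row, val in zip(m, v)]  # augmented copy; m is left untouched
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("control points are collinear or otherwise degenerate")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back substitution
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def fit_transform(src, dst):
    """Return (a, b, c, d, e, f) with map_x = a*x + b*y + c and map_y = d*x + e*y + f."""
    if len(src) != len(dst) or not src:
        raise ValueError("need matching, non-empty lists of control points")
    if len(src) == 1:  # one link: simple shift
        (x, y), (mx, my) = src[0], dst[0]
        return (1.0, 0.0, mx - x, 0.0, 1.0, my - y)
    if len(src) == 2:
        raise ValueError("two links need a similarity fit, not covered in this sketch")
    # Least-squares normal equations (A^T A) coef = A^T target, with rows of A = [x, y, 1]
    ata = [[0.0] * 3 for _ in range(3)]
    atx = [0.0] * 3
    aty = [0.0] * 3
    for (x, y), (mx, my) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atx[i] += row[i] * mx
            aty[i] += row[i] * my
    a, b, c = _solve3(ata, atx)
    d, e, f = _solve3(ata, aty)
    return (a, b, c, d, e, f)


def apply_transform(t, x, y):
    """Map an image location (x, y) to map coordinates with transform t."""
    a, b, c, d, e, f = t
    return (a * x + b * y + c, d * x + e * y + f)
```

    With four links taken from a known transform, the fit recovers it up to rounding; with real, imperfect clicks, the leftover differences are the residuals that georeferencing tools summarize as RMS error.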

    CAD Files

    Most common CAD formats (DXF, DWG, DGN) can simply be "opened" directly in GIS.

    For example, from a real-world project, here are CAD data for a regional plan created by Fonatur, the Mexican national tourism/development agency.

    [Figure: CAD in GIS]

    Important limits:

    Attributes don't come along, only layer names, *so you are well advised to know the layer naming/numbering convention* (and if there is none, that is a big, messy problem).

    Objects must be "exploded" in CAD before export.

    Solids must be converted into boundary representations.

    Common problems / solutions

    Drawn "to scale", but often without explicit projection information and not north-aligned

    Solution 1: the layer properties dialog lets you specify a transformation

    [Figure: CAD Manual Transform]

    Solution 2: "world files" (*.wld) are simple text files documenting transforms

    [Figure: CAD World Files]

    Why bother with "world files"? Scalability: one world file can be copied and applied to many CAD documents drawn against the same base.
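    A minimal sketch of what a CAD world file does, assuming the common ArcGIS layout of one link per line, written "cad_x,cad_y real_x,real_y". One link shifts the drawing; two links also rotate and uniformly scale it, which covers the "to scale but not north-aligned" case above. The function names are hypothetical; check your GIS's documentation for the exact format it accepts.

```python
def parse_wld(text):
    """Return a list of ((cad_x, cad_y), (real_x, real_y)) links from world-file text."""
    links = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 2:
            raise ValueError(f"unexpected world-file line: {line!r}")
        cad = tuple(float(v) for v in parts[0].split(","))
        real = tuple(float(v) for v in parts[1].split(","))
        if len(cad) != 2 or len(real) != 2:
            raise ValueError(f"expected x,y pairs in line: {line!r}")
        links.append((cad, real))
    return links


def make_transform(links):
    """Build a function mapping CAD (x, y) to real-world (x, y).

    Points are treated as complex numbers z = x + iy, so a similarity
    transform is w = s*z + t: s carries scale and rotation, t the shift.
    """
    if len(links) == 1:
        (cx, cy), (rx, ry) = links[0]
        s = 1 + 0j
        t = complex(rx, ry) - complex(cx, cy)
    elif len(links) == 2:
        z1, w1 = (complex(*p) for p in links[0])
        z2, w2 = (complex(*p) for p in links[1])
        if z1 == z2:
            raise ValueError("the two CAD points must differ")
        s = (w2 - w1) / (z2 - z1)
        t = w1 - s * z1
    else:
        raise ValueError("expected one or two links")

    def transform(x, y):
        w = s * complex(x, y) + t
        return (w.real, w.imag)

    return transform
```

    Treating points as complex numbers keeps the two-link case short: the ratio of the real-world vector to the CAD vector between the two links gives scale and rotation together, so the same file can be reused for every drawing on that base.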



Digitizing - Creating new (georeferenced) Geometry

Vector Data Model - Requires boundaries with X,Y coordinates

  •  We demonstrate using QGIS to create a new shapefile of polygons via 'heads-up' digitizing on top of one of the orthophotos that we accessed last week from the MassGIS WMS server. (You could just as easily digitize new polygons on top of any other shapefile we have used.)
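    Whatever tool does the clicking, a digitized polygon ends up as a closed ring of X,Y vertices. A small sketch, assuming the clicks are already in a projected coordinate system measured in meters (the values below are hypothetical): close the ring, sanity-check its area with the shoelace formula, and write it as well-known text (WKT), a format that PostGIS and QGIS can both read.

```python
def close_ring(vertices):
    """Return the vertices with the first one repeated at the end, if it isn't already."""
    ring = [tuple(v) for v in vertices]
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    return ring


def shoelace_area(ring):
    """Planar area of a closed ring: |sum(x1*y2 - x2*y1)| / 2 over consecutive vertices."""
    s = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(s) / 2.0


def to_wkt(vertices):
    """Write digitized vertices as a WKT POLYGON with a single (outer) ring."""
    ring = close_ring(vertices)
    if len(ring) < 4:
        raise ValueError("a polygon needs at least three distinct vertices")
    coords = ", ".join(f"{x:.15g} {y:.15g}" for x, y in ring)
    return f"POLYGON (({coords}))"
```

    For example, four clicks at the corners of a 40 m by 25 m lot give a 1000 square-meter polygon, written as `POLYGON ((0 0, 40 0, 40 25, 0 25, 0 0))`.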


Deeper integration of geoprocessing services within web pages:

  • Example of back-end support for large, dynamic databases
    • Postgres + PostGIS for SQL-compliant, multi-user, relational database management with spatial extensions
    • Illustrate use of Postgres from QGIS by redoing the example from the census lecture
      • Thematic map of median personal earnings from 2000 census
      • Geography: us_ma_e25blkgrpsct_2000 block group boundaries
      • Census 2000 data:
        • MAGEO table of geographic identifiers and summary levels
        • MA00007 table including LOGRECNO and P85 personal earnings data
      • Examine tables on PostGIS server (on VM at Medai lab)
        • Note X/Y boundary geometry (of polygons) is included in 'geom' column
        • Spatial Reference System (SRS) information for EPSG codes is in a separate table
        • Relationship between SRS and census block group boundaries is stored in a view
      • Join mageo and ma00007 with an SQL query in PostGIS, keeping summary level 150 (block groups)
create table c2000inc as
(select g."LOGRECNO", e."P085001", e."P085002", e."P085003", g."STATE"||g."COUNTY"||g."TRACT"||g."BLKGRP" AS blkkey
   from census2k.mageo g, census2k.ma00007 e
  where g."LOGRECNO" = e."LOGRECNO" and g."SUMLEV" = '150');  -- SUMLEV 150 = block group records
      • Pull result into QGIS, join to boundary layer and thematically map

  • These tools are useful for 'publishing' project work

    • Enable interactive manipulation of maps within browser, PDF document, etc. without 'running' GIS software
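    To try the join logic of the query above without the class PostGIS server, here is a sketch that uses Python's built-in sqlite3 module as a stand-in. The table and column names follow the query, and an attached database named census2k plays the role of the schema, but the rows are made-up placeholders and SQLite is not PostGIS (no geometry here). The parentheses around the SELECT are dropped for SQLite.

```python
import sqlite3


def build_demo():
    """Create in-memory stand-ins for census2k.mageo and census2k.ma00007, then run the join."""
    con = sqlite3.connect(":memory:")
    con.execute("ATTACH DATABASE ':memory:' AS census2k")  # mimics the census2k schema
    con.execute('CREATE TABLE census2k.mageo ("LOGRECNO" TEXT, "SUMLEV" TEXT, '
                '"STATE" TEXT, "COUNTY" TEXT, "TRACT" TEXT, "BLKGRP" TEXT)')
    con.execute('CREATE TABLE census2k.ma00007 ("LOGRECNO" TEXT, '
                '"P085001" INTEGER, "P085002" INTEGER, "P085003" INTEGER)')
    # Hypothetical rows: one state-level (040) record and one block-group (150) record
    con.executemany("INSERT INTO census2k.mageo VALUES (?, ?, ?, ?, ?, ?)", [
        ("0000001", "040", "25", "", "", ""),
        ("0000002", "150", "25", "017", "353200", "1"),
    ])
    con.executemany("INSERT INTO census2k.ma00007 VALUES (?, ?, ?, ?)", [
        ("0000001", 35000, 40000, 30000),
        ("0000002", 41000, 46000, 36000),
    ])
    # Same join and filter as the PostGIS query; blkkey = STATE || COUNTY || TRACT || BLKGRP
    con.execute('''
        CREATE TABLE c2000inc AS
        SELECT g."LOGRECNO", e."P085001", e."P085002", e."P085003",
               g."STATE" || g."COUNTY" || g."TRACT" || g."BLKGRP" AS blkkey
        FROM census2k.mageo g, census2k.ma00007 e
        WHERE g."LOGRECNO" = e."LOGRECNO" AND g."SUMLEV" = '150'
    ''')
    return con
```

    Only the summary-level 150 record survives, with a 12-character block group key ready to match against the boundary layer.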



Network Modeling and Analysis

    To generate points from addresses: 'Geocoding'

What is Geocoding?

    • Geocoding is a process of creating map features from addresses, place names, or similar textual information based on attributes associated with a referenced geographic database, typically a street network that has address ranges associated with each street segment or 'link' running from one intersection to the next.
    • Geocoding typically uses interpolation to estimate where an address lies along a street segment.
      • (If the addresses along one side of a block range from 1 to 199, then street number 66 is about one-third of the way along that side of the block.)
    • Data required:
      • Reasonably clean, consistent list of legal addresses (i.e. not too many typos, addresses really exist, etc.)
      • Address range attributes on a linear street network
        • Most commonly from Census
        • More current/cleaner data available from private vendors
    • Geocoding is one of many geoprocessing services
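    Address interpolation, as described above, can be sketched as follows. Assumptions: the street segment is a list of (x, y) vertices running from its "from" node to its "to" node, and the address range is spread linearly along its length. Real geocoders also handle odd/even sides, offsets from the centerline, and messy ranges; this sketch does not, and the function name is hypothetical.

```python
import math


def interpolate_address(number, from_addr, to_addr, line):
    """Estimate the (x, y) of a house number along a street segment's centerline.

    line runs from the 'from' node (address from_addr) to the 'to' node (address to_addr).
    """
    if from_addr == to_addr:
        frac = 0.5  # degenerate range: put the point mid-segment
    else:
        frac = (number - from_addr) / (to_addr - from_addr)
    if not 0.0 <= frac <= 1.0:
        raise ValueError("house number outside the segment's address range")
    seg_lengths = [math.dist(p, q) for p, q in zip(line, line[1:])]
    target = frac * sum(seg_lengths)  # distance to travel from the 'from' node
    for (x1, y1), (x2, y2), length in zip(line, line[1:], seg_lengths):
        if target <= length and length > 0:
            t = target / length
            return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
        target -= length
    return line[-1]  # floating-point leftovers land on the 'to' node
```

    On a straight 198 m block numbered 1 to 199, number 66 lands 65 m from the start, about one-third of the way along, matching the rule of thumb above.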

Geocoding Process

    • Converting textual addresses and names to X,Y locations
    • via address matching - develop point map from mailing list
    • Look up place names in a 'gazetteer' to find lat/lon, ZIP code, place boundary, voting district, etc.
    • General 'service' to translate among geographic identifiers

 

    Example: using US Census Bureau TIGER/Line files (as the source for geocoding)

    • Geocoding Strategy using TIGER
      • Encode road network as street centerlines
      • Attach address information to each street segment
      • Use 'in reverse' to match street address to street segment to get approximate X,Y location
    • TIGER: Topologically Integrated Geographic Encoding and Referencing system
      • http://www.census.gov/geo/www/tiger/
      • US Census Bureau TIGER/Line files 2000, technical documentation
        • at Census: http://www.census.gov/geo/www/tiger/rd_2ktiger/tgrrd2k.pdf
        • in class locker: http://mit.edu/www/data/census2k/tiger_tgrrd2k.pdf
      • Illustrative Example

    [Figure: Street centerline road segments]
    [Figure: Attaching address ranges to road segments]
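    The "use in reverse" step can be sketched as a lookup over segments that carry left and right address ranges, in the spirit of TIGER/Line's FRADDL, TOADDL, FRADDR and TOADDR fields. The Segment class and sample segments are hypothetical simplifications (real TIGER fields are text and may be blank). Matching on odd/even parity picks the side of the street, and the returned fraction can then be interpolated along the segment.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One street segment with simplified left/right address ranges (hypothetical)."""
    seg_id: str
    name: str
    fraddl: int  # first address on the left side
    toaddl: int  # last address on the left side
    fraddr: int  # first address on the right side
    toaddr: int  # last address on the right side


def _in_range(number, a, b):
    """True if number lies between a and b and has the same odd/even parity as the range."""
    lo, hi = min(a, b), max(a, b)
    return lo <= number <= hi and number % 2 == lo % 2


def match_address(number, street, segments):
    """Return (seg_id, side, fraction along the segment), or None if nothing matches."""
    key = street.strip().upper()
    for seg in segments:
        if seg.name.upper() != key:
            continue
        for side, frm, to in (("L", seg.fraddl, seg.toaddl), ("R", seg.fraddr, seg.toaddr)):
            if _in_range(number, frm, to):
                frac = 0.5 if frm == to else (number - frm) / (to - frm)
                return seg.seg_id, side, frac
    return None
```

    For example, with a segment numbered 1 to 99 on the left and 2 to 98 on the right, 66 Main St matches the right side about two-thirds of the way along.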


How do GIS systems model Networks?

A network is a system of linear features connected at nodes
E.g., nodes could be where three or more street segments intersect.
The linear feature connecting any given pair of nodes is called an arc, or network link.
Each arc on a network is represented as an ordered pair of nodes, from node i to node j, denoted by (i, j), and thus has direction.
A network representation that is good for transportation modeling may differ from a geographically accurate representation of the physical road (e.g., street centerline, handling exit ramps, 3D overpasses, etc.)
Combining two network models can be hard: how do you handle lanes, exit ramps, and overpasses?
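A sketch of the ordered-pair representation, assuming each street link is given as (i, j, two_way): a two-way street becomes two directed arcs, a one-way street just one. Pairing each arc that enters a node with each arc that leaves it enumerates the turn movements at that node. The function and node names are hypothetical.

```python
def build_arcs(streets):
    """Turn street links (i, j, two_way) into a set of directed arcs (i, j)."""
    arcs = set()
    for i, j, two_way in streets:
        arcs.add((i, j))
        if two_way:
            arcs.add((j, i))  # the reverse direction is a separate arc
    return arcs


def turns_at(node, arcs):
    """All turn movements at node: (incoming arc, outgoing arc) pairs, U-turns included."""
    incoming = sorted(a for a in arcs if a[1] == node)
    outgoing = sorted(a for a in arcs if a[0] == node)
    return [(a, b) for a in incoming for b in outgoing]
```

A four-way intersection of two-way streets yields 8 arcs and 4 x 4 = 16 turn movements; making one leg one-way outbound drops that to 3 x 4 = 12.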

    Other basic elements of a network:

A shortest path is the shortest (or least-'cost') path from a source node (origin) to a destination node. In practice, pathfinding seeks the shortest or most efficient way to visit a sequence of locations.
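Shortest (least-cost) paths are usually found with Dijkstra's algorithm when arc costs are non-negative. A minimal sketch, assuming arcs are stored as a dict mapping (i, j) to a cost such as length or walking time; the function name and sample network are hypothetical.

```python
import heapq


def shortest_path(arcs, source, target):
    """Dijkstra's algorithm on directed arcs {(i, j): cost} with non-negative costs.

    Returns (total_cost, [source, ..., target]), or (inf, []) if target is unreachable.
    """
    adj = {}
    for (i, j), cost in arcs.items():
        adj.setdefault(i, []).append((j, cost))
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry; a cheaper route was already settled
        done.add(node)
        if node == target:
            break
        for nxt, cost in adj.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if target not in done:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]
```

With arcs A-B (4), A-C (1), C-B (2), B-D (1) and C-D (5), the least-cost route from A to D is A, C, B, D at cost 4, even though A, B, D uses fewer arcs.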

A tour is a closed path; that is, the first node and the final node on the path are the same node on the network.

A stop is a location visited in a path or a tour.

Events or locations may be viewed as collection points (e.g., 'origins' or 'destinations') where certain resources are supplied or consumed.

A turn on a network is the transition from one arc to another arc at a node (there are 16 ways in which two intersecting (one-lane) roads can allow vehicle flow among the 4 links that 'connect' to the one node: 4 approaching arcs times 4 departing arcs, counting U-turns).

'Location-allocation' models often use a network representation of connected places to determine the optimal locations for a given number of facilities (e.g., stores, restaurants, banks, factories, warehouses, libraries, hospitals, post offices, and schools) based on some criteria for assigning people to the 'nearest' facility.

  • For Lab #6, you use the googleway R package to have Google's routing service compute the walking distance from 'home' to each restaurant (restaurant locations come from Yelp points of interest).
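One common location-allocation formulation is the p-median problem: choose p facility sites so that the total (demand-weighted) distance from each person to their nearest chosen site is as small as possible. The brute-force sketch below assumes the network distances are already computed (for example, with a shortest-path routine or a routing service like the one used in Lab #6). It tries every combination of sites, so it only suits small examples; the function name and sample data are hypothetical.

```python
from itertools import combinations


def p_median(distance, demand, candidates, p):
    """Brute-force p-median: pick p sites minimizing total demand-weighted distance.

    distance[(person, site)] is a network distance; demand[person] is a weight.
    Each person is assigned to the nearest chosen site.
    """
    best_sites, best_cost = None, float("inf")
    for sites in combinations(candidates, p):
        cost = sum(w * min(distance[(person, s)] for s in sites)
                   for person, w in demand.items())
        if cost < best_cost:
            best_sites, best_cost = sites, cost
    return best_sites, best_cost
```

With homes at positions 0, 1 and 10 on a line and candidate sites at 0, 5 and 10, one facility goes at 0 (total distance 11), while two facilities go at 0 and 10 (total distance 1). Real location-allocation tools use heuristics rather than exhaustive search.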



A machine at the GIS Lab in Rotch Library (with online access) has a seamless street map of the US that does a good job of geocoding any US address.

    (using ArcGIS from ESRI)


Geocoding

A geocoding service (in ArcGIS, an address locator) is a configuration file that specifies the georeferenced feature layer, its relevant attributes, and the rules and tolerances used in matching.
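Matching rules and tolerances are what let a locator accept 'Massachusets Ave' as a match for MASSACHUSETTS AVE. A rough sketch of that idea, loosely mirroring a locator's minimum match score: it scores candidate street names with Python's difflib on a 0 to 100 scale and rejects anything below a threshold. This is not how ArcGIS scores candidates; the function name and the threshold of 80 are assumptions.

```python
from difflib import SequenceMatcher


def best_street_match(query, street_names, min_score=80):
    """Return (street_name, score) for the best-scoring name if score >= min_score, else None.

    The score is difflib's similarity ratio rescaled to 0-100, a stand-in
    for a real locator's scoring rules.
    """
    q = " ".join(query.upper().split())  # normalize case and repeated spaces
    best = None
    for name in street_names:
        score = 100 * SequenceMatcher(None, q, name.upper()).ratio()
        if best is None or score > best[1]:
            best = (name, score)
    if best is None or best[1] < min_score:
        return None
    return best
```

Raising min_score trades fewer false matches for more unmatched addresses, the same tension you tune in a real locator.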

[Figure: Geocoding setup]

[Figure: Address Locator]

The output of geocoding is a point layer, stored in ArcGIS as either a shapefile or a geodatabase feature class.


[Figure: Geocoding results]

 


 


Created by Joseph Ferreira and Michael Flaxman, 2005-2006
Last modified 14 April 2021, Joe Ferreira