Massachusetts Institute of Technology
Department of Urban Studies and Planning


11.520: A Workshop on Geographic Information Systems

11.188: Urban Planning and Social Science Laboratory

Lecture 3: GIS Data Manipulation and Querying

September 23, 2009

(Based mainly on notes by Zhong-Ren Peng, Mike Flaxman and Joe Ferreira)

Administrative notes:

  • Lab Exercise #2 due next Monday, Sept. 28 (at the start of the class)
  • Homework set #1 posted online today, due in 2 weeks (Wed., Oct. 7th, via Stellar)
    • Examine relationships among eastern Mass Shopping Centers, major roads, and residential locations
    • Read it over, check out the datasets, try the methods, spread out the work
    • Waiting until the end will be frustrating and stressful!

Today's Outline:

  • Database management basics (more next time)
    • Data types (numeric, text, date, ...)
    • 'Flat-file' tables (*.dbf, *.csv, spreadsheet table, ...)
    • Relational algebra
  • Data management tools (start here and backfill concepts in next lecture)

Basic Data Manipulation and Query Tools

  • Today's focus
    • manipulating and querying textual tables in ArcGIS
    • associating textual data tables with mappable features
  • Next week:
    • broader focus on relational database management
    • Use of MS-Access

  • Methods of Selecting Features (within basic "vector" GIS model)
    • Exploit link between map and tabular views
    • Simple Attribute Queries
    • Spatial Selection Queries
  • Manipulating and Extending Tabular Data
    • Adding Attribute Columns
    • Calculating new attribute values
    • Basic statistics on selected features
    • Summary statistics
  • Using Selected Sets
    • Exporting Data Subsets (replicating data subset)
    • Using “Definition Queries” as filters

Specific ArcGIS Notes:

Demo using HW1 data (1990 census tract data for Eastern Mass)

  • Thematic map of median housing value (H061A001) by census tract
    • msa5_tr90.shp has census tract boundaries but not census data
    • msa5_tr90_data.dbf has selected census tract level data
    • Must join tables before creating thematic map
  • Improve visual quality of map by adjusting symbology and adding town boundaries and water layer
  • Exercise ArcMap tabular data query and summarization tools
    • Select-by-attribute queries (census tracts in Middlesex County)
    • Definition Queries to restrict consideration of certain features
    • Statistics (histogram, mean, ...) tools for selected sets
    • Select-by-location queries
  • ArcGIS preferences and settings:
    • 'Environment' settings within ArcToolbox (right-click)
    • Local vs. network locations
    • Setting a local 'scratch workspace' in General Settings
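
The table join that the demo needs before it can draw the thematic map can be sketched in plain Python. This is only an illustration of the idea, not what ArcMap does internally: the key field name TRACT_ID and every value below are made up for the example, and the real files may use a different key.

```python
# Hypothetical key field (TRACT_ID) and invented values, for illustration only.
tract_shapes = [  # stands in for the attribute table of msa5_tr90.shp
    {"TRACT_ID": "25017000100", "geometry": "polygon A"},
    {"TRACT_ID": "25017000200", "geometry": "polygon B"},
    {"TRACT_ID": "25025000100", "geometry": "polygon C"},
]
tract_data = [  # stands in for msa5_tr90_data.dbf
    {"TRACT_ID": "25017000100", "H061A001": 185000},
    {"TRACT_ID": "25025000100", "H061A001": 162500},
]


def join_tables(features, data_rows, key):
    """Attach data columns to each feature whose key matches (a left join)."""
    lookup = {row[key]: row for row in data_rows}
    joined = []
    for feature in features:
        merged = dict(feature)
        # Features with no matching data row keep only their own columns.
        merged.update(lookup.get(feature[key], {}))
        joined.append(merged)
    return joined


joined_tracts = join_tables(tract_shapes, tract_data, "TRACT_ID")
for row in joined_tracts:
    print(row["TRACT_ID"], row.get("H061A001"))
```

Every boundary feature survives the join; a tract with no matching data row simply has no value to map, which is worth checking for before classifying the thematic map.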

 

Simple Attribute Queries in ArcGIS (using examples from Cambridge landuse shapefile)

            Basic format

                        <Attribute> <Operator> <Value>

                        Attributes delimited with square brackets, i.e.: [Landuse]

 

                        String (text) values typically single quote delimited

                                    'Commercial' not "Commercial"

 

                        Exact syntax depends on back-end database (aargh!)

                                    Some require double quoted strings

 

            Example

                        [Landuse] = 'Commercial'

                        ArcGIS Interface Dialog peculiarity:

                                    double click to load attribute names or values

                                    single click for operators
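
The <Attribute> <Operator> <Value> pattern can be mimicked outside ArcGIS to see what a simple query does to a table. The sketch below is a rough analogy in Python, not ArcGIS syntax; the parcel rows and the helper name select_by_attribute are invented for the example.

```python
import operator

# Map query operators onto Python comparison functions.
OPERATORS = {
    "=": operator.eq,
    "<>": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def select_by_attribute(rows, attribute, op, value):
    """Return the rows for which <attribute> <op> <value> is true."""
    compare = OPERATORS[op]
    return [row for row in rows if compare(row[attribute], value)]


# Invented sample rows standing in for a land use attribute table.
parcels = [
    {"ID": 1, "Landuse": "Commercial"},
    {"ID": 2, "Landuse": "Residential"},
    {"ID": 3, "Landuse": "Industrial"},
    {"ID": 4, "Landuse": "Commercial"},
]

commercial = select_by_attribute(parcels, "Landuse", "=", "Commercial")
print([row["ID"] for row in commercial])
```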

 

Compound Attribute Queries (the confusing syntax of ArcGIS)

                        Remember: must repeat the attribute name

                                    [Landuse] = 'Commercial' or [Landuse] = 'Industrial'

                                    *not* [Landuse] = 'Commercial' or 'Industrial' (missing required repetition of attribute name)

                                    *not* [Landuse] = Commercial (missing required single quotes around text)

 

                        Can build up based on current selection set

                                    Two-pass query:  [Landuse] = 'Commercial'

                                    then, add to selection: [Landuse] = 'Industrial'
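
The compound query and the two-pass approach select the same features, which a small Python sketch can confirm. The rows are invented, and Python sets stand in for the ArcGIS selection set. The last expression shows the same mistake of not repeating the attribute name: Python does not reject it, but treats the bare string 'Industrial' as true and silently selects every row.

```python
# Invented sample rows standing in for a land use attribute table.
parcels = [
    {"ID": 1, "Landuse": "Commercial"},
    {"ID": 2, "Landuse": "Residential"},
    {"ID": 3, "Landuse": "Industrial"},
    {"ID": 4, "Landuse": "Commercial"},
]

# One pass: the attribute name is repeated on both sides of "or".
one_pass = {
    row["ID"]
    for row in parcels
    if row["Landuse"] == "Commercial" or row["Landuse"] == "Industrial"
}

# Two passes: create a new selection, then add to the current selection.
selection = {row["ID"] for row in parcels if row["Landuse"] == "Commercial"}
selection |= {row["ID"] for row in parcels if row["Landuse"] == "Industrial"}

# The mistake: a non-empty string is always "true", so every row is selected.
wrong = {
    row["ID"]
    for row in parcels
    if row["Landuse"] == "Commercial" or "Industrial"
}

print(sorted(one_pass), sorted(selection), sorted(wrong))
```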

 

Fun with Selections

            By default, processing operations act on the selected features only

                        For example: to buffer commercial land uses, first select commercial, then buffer

            Subsetting based on selection

                        Simple, important, poorly documented workflow

                        Create a selection, then "export data" to new file

                        (Sorry, no cut and paste!)

                       

            In the attribute table, calculations are done only on selected features

                        Useful for calculating new attributes, often for reclassification/aggregation

                        Example: ranking store location suitability based on zoning layer

                                    Logic:  best = commercial or mixed use

                                                moderate = industrial

                                                worst = residential

                                    Strategy:

                                                Create new rating attribute [rating] in zoning table

                                                Select best features, calculate rating attribute = 'best'

                                                Select moderate features, calculate rating attribute = 'moderate'

                                                etc.

                        Advantages/Disadvantages:

                                    Permanent change to database, ranking result obvious

                                    Method *not* obvious after the fact (requires external documentation)

                                    Single, transferable data set

                                    Requires "write" permission on the database
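
The select-then-calculate strategy above can be sketched in Python as follows. This is an analogy to the attribute-table workflow, not ArcGIS code; the zoning rows, the field name Zone, and the helper calculate_on_selection are assumptions made for the example.

```python
# Invented zoning rows; "Zone" is a hypothetical field name.
zoning = [
    {"ID": 1, "Zone": "Commercial"},
    {"ID": 2, "Zone": "Mixed Use"},
    {"ID": 3, "Zone": "Industrial"},
    {"ID": 4, "Zone": "Residential"},
]

# Step 1: add the new attribute column, empty for every row.
for row in zoning:
    row["rating"] = None


def calculate_on_selection(rows, is_selected, field, value):
    """Write value into field for selected rows only; return how many changed."""
    changed = 0
    for row in rows:
        if is_selected(row):
            row[field] = value
            changed += 1
    return changed


# Steps 2 onward: one selection and one calculation per rating class.
calculate_on_selection(
    zoning, lambda r: r["Zone"] in ("Commercial", "Mixed Use"), "rating", "best"
)
calculate_on_selection(zoning, lambda r: r["Zone"] == "Industrial", "rating", "moderate")
calculate_on_selection(zoning, lambda r: r["Zone"] == "Residential", "rating", "worst")

print([(row["Zone"], row["rating"]) for row in zoning])
```

As with the interactive workflow, the table ends up holding only the result. Nothing in the data records which selections produced each rating, which is why the method needs external documentation.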

 

            Spatial Selections

                        (This capability sets GIS apart from purely textual databases)

                        Can select features based on their spatial relationship with other selected features (e.g., 'inside of' or 'contains part of')

                        You will need to do this for your homework
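
To make the 'inside of' relationship concrete, here is a minimal sketch of a point-in-polygon test using the standard ray-casting method. It is not how ArcGIS implements select-by-location, and the square "town" and the store coordinates are invented. Points that fall exactly on the boundary are not handled specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray to the right crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Invented data: one selected town polygon and three store points.
town = [(0, 0), (10, 0), (10, 10), (0, 10)]
stores = {"A": (2, 3), "B": (12, 3), "C": (9, 9)}

selected = [
    name for name, (x, y) in stores.items() if point_in_polygon(x, y, town)
]
print(selected)
```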

 

 

Intro to Geoprocessing

  • Relationships between Data Models & Spatial Questions
    • Data Models Vary in Degree of
      • Geometric refinement
        • What’s the MMU (minimum mapping unit)?
        • How are contiguous features segmented?
      • Attribute Refinement
        • How many classes of land use are recorded? (urban/suburban or 27 types?)
        • Are the aspects you need directly coded at all? (traffic congestion, historic building quality?)
      • Temporal Refinement
        • How up to date are your data?
        • Are all layers in temporal sync?
        • Is your question about current conditions, or really about future conditions?
    • Common cases when your data model doesn’t match your question
      • Disaggregate using attributes
        • Example: Classify shopping centers into five classes based on square feet
      • Aggregate using attributes
        • Example: Reclassify 27 land use types into built/unbuilt
      • Disaggregate spatially
        • Example: shopping centers near major highways or not
      • Aggregate spatially
        • Example: treat roads as unified linear object based on road type (regardless of digitization segments or name)
    • Spatial aggregation and disaggregation require more than simple selection: they require creating new geometries based on combinations of existing geometries
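
The two attribute-based cases above (splitting shopping centers into size classes and collapsing land use types into built/unbuilt) can be sketched in Python. The break points, class names, and list of "built" types are hypothetical placeholders, not values from the course data.

```python
from bisect import bisect_right

# Hypothetical break points (square feet) separating five size classes.
SIZE_BREAKS = [100_000, 250_000, 500_000, 1_000_000]
SIZE_CLASSES = ["very small", "small", "medium", "large", "very large"]


def size_class(square_feet):
    """Disaggregate: assign a shopping center to one of five size classes."""
    return SIZE_CLASSES[bisect_right(SIZE_BREAKS, square_feet)]


# Hypothetical subset of the land use types that count as "built".
BUILT_TYPES = {"Commercial", "Industrial", "Residential"}


def built_or_unbuilt(landuse):
    """Aggregate: collapse many land use types into two categories."""
    return "built" if landuse in BUILT_TYPES else "unbuilt"


print(size_class(300_000), built_or_unbuilt("Forest"))
```

A value that falls exactly on a break point goes into the higher class here; whichever convention is used should be stated alongside the map legend.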


  • Some Useful and Common Geoprocessing Operations
    • Spatial data subsetting using “Clip”
      • Selects those features within a polygonal geometry, breaking partially included features as needed
    • Buffering
      • Creates new geometry representing an area within a given distance from selected features
      • By default creates one buffered object per feature. Often useful to “join” output geometries
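
For point features, a buffer is simply a circle, so asking whether a location lies in a buffer reduces to a distance test. The sketch below uses that simplification to show why merging the output geometries is often useful. It reads “join” as merging overlapping buffers into one area, which is an interpretation rather than something the notes spell out, and the coordinates are invented.

```python
import math


def within_buffer(point, feature, distance):
    """True if point lies inside the circular buffer around one point feature."""
    return math.dist(point, feature) <= distance


def buffers_containing(point, features, distance):
    """One buffer per feature: count how many separate buffers cover the point."""
    return sum(within_buffer(point, f, distance) for f in features)


def in_merged_buffer(point, features, distance):
    """Merged buffers: the point is covered once if any single buffer covers it."""
    return any(within_buffer(point, f, distance) for f in features)


# Invented data: two stores 3 units apart, each buffered by 2 units.
stores = [(0.0, 0.0), (3.0, 0.0)]
home = (1.5, 0.0)

print(buffers_containing(home, stores, 2.0))
print(in_merged_buffer(home, stores, 2.0))
```

With separate buffers, the location between the two stores is covered twice, so any count or area total based on the buffers would double-count the overlap; the merged version covers it once.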

 

 

Example: Site Selection for Low Cost Grocery Store Chain

  • Conceptual Model
    • Brainstorm Criteria for "good" locations
  • Case Study Example
    • Factors used in actual Commercial Shopping Center Site Selection
    • PowerPoint slides used by commercial firm to market site selection tools, by Edens & Avant and RPM consulting
  • Think about these marketing slides:
    • Is the methodology or analytic scope overstated?
    • What considerations are omitted, shortchanged, badly measured?
    • From whose point of view is the siting service helpful or hurtful?


Created by Zhong-Ren Peng, Mike Flaxman, and Joe Ferreira 2003-2008

Last modified 23 September 2009 by Joe Ferreira
