11.522: UIS Research Seminar (Fall 2008) - Discussion notes

Tuesday, October 7, 2008, 5:10 - 7:00 PM

Putting New Spatial Data Techniques in the Hands of Planners

Discussion Leader: Clio Andris

Urban planners often lack access to the computational tools best suited to answering their questions and solving their problems. Frequent urban planning problems include where to relocate a facility, where to target crime, which neighborhoods to clean up, how to improve accessibility, how to distribute new infrastructure fairly, and where environmental injustice may occur (Stillwell and Clarke 2004). Traditional Geographic Information Systems (GIS) methods for solving these problems include buffer analysis, overlay functions, raster algebra, suitability modeling, linear logit models, cluster analysis, and network analysis (NRC 2006, Wrigley 1985, Getis and Ord 1992). On a less pragmatic level, urban studies scholars can also benefit from new tools for learning about space and place. Avenues of inquiry may include comparing the economies of regions with different industrial backgrounds (such as coal mining, insurance, agriculture, tourism, or software), analyzing environments that produce an uncharacteristically small or large middle class, or quantitatively finding indicators for cities that have undergone lucrative gentrification. Finding trends, patterns, and cause-and-effect cases through this type of analysis can augment the field of urban studies and, in turn, help planners better understand the progressive nature of the built environment and its inhabitants.

Quantitative methods not traditionally found in GIS can be employed to support inquiry in urban planning and urban studies (Longley et al 2001). These methods, rooted in the fields of data mining and pattern recognition, have given rise to the field of spatial data mining. They include supervised and unsupervised classification methods, anomaly detection, association rules, and clustering methods (Tan et al 2006, Duda et al 2000, Fayyad et al 1996). When presented with sufficient data, these methods can be used to find the 10 most predictive indicators of future neighborhood deterioration, identify similar income bracket patterns among 1000 cities, allocate funds between two depreciating neighborhoods, and index major world cities in terms of progressive transportation.
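As a rough illustration of what "finding the most predictive indicators" could look like, the sketch below ranks a few candidate indicators by the absolute value of their Pearson correlation with a deterioration score. The indicator names, the values, and the use of simple correlation as the ranking criterion are all assumptions made for illustration. The supervised methods in the cited texts are considerably richer.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical data: one value per neighborhood for each candidate indicator.
indicators = {
    "vacancy_rate": [2, 4, 6, 8, 10],
    "tree_cover": [30, 10, 25, 15, 20],
    "transit_stops": [5, 3, 4, 1, 2],
}
# Hypothetical outcome: a deterioration score for the same five neighborhoods.
deterioration = [1, 2, 3, 4, 5]

# Rank indicators from most to least strongly correlated with the outcome.
ranked = sorted(
    indicators,
    key=lambda name: abs(pearson(indicators[name], deterioration)),
    reverse=True,
)
top_two = ranked[:2]
print(ranked)
```

With real data the list would hold many more indicators and the cut-off would be 10 rather than 2. Correlation only captures linear, one-variable-at-a-time relationships, which is one reason the classification methods named above are usually preferred.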

The connection between data mining and geoinformatics has been strengthening in recent years with the work of Miller and Han (2000), Guo (2006), Keim (2002), Skupin and Hagelman (2005), MacEachren (1995), and others, who have developed software and algorithms, organized conference sessions, and supervised theses on spatial data mining topics. This pioneering research has put new tools in the hands of many geographers. With careful research into method appropriateness, suitable applications, and some creativity, it should diffuse to practical issues in planning and urban studies.

The body of literature referenced to answer these questions is rooted in many fields.

Methodological: information theory, database management, data mining, pattern recognition, probability, statistics, visualization, software development, spatial analysis, raster and vector GIS operations.

Applied: urban identity, community ideology, environmental injustice, urban sociology and economics, accessibility, planning and design.

Representation

How are we currently representing multivariate spatial data? How are we incorporating multiple fields into one view? Are current GIS and cartography techniques conducive to showing more than one feature of a spatial entity at a time, or do the most widely-used methods cater to low or single-dimensional data?

Segmentation and Characterizing Space

Sometimes, multi-dimensional data is classified so that spaces (closed areas, points, line segments, etc.) exhibiting the same characteristics can be found. Why is it easier to understand space when it is grouped or segmented? Do we usually expect things closer to each other to be members of the same group? What kind of information do we lose during this process? What kind of information do we gain?
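To make the segmentation question concrete, here is a minimal sketch of one common grouping method, k-means clustering, applied to a handful of hypothetical census tracts. The tract values, the two attributes, and the choice of k-means itself are assumptions for illustration, not a method prescribed by the readings.

```python
from math import dist

def k_means(points, k, iterations=20):
    """Group points into k clusters; returns (labels, centroids)."""
    # Deterministic start: use the first k points as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

# Hypothetical tracts: (median income in $1000s, percent renter-occupied).
tracts = [(32, 70), (85, 20), (35, 65), (90, 15), (30, 75), (88, 25)]
labels, centroids = k_means(tracts, k=2)

# Distance from each tract to its group centroid: the detail that is
# discarded when a tract is described only by its group.
lost_detail = [dist(p, centroids[lab]) for p, lab in zip(tracts, labels)]
print(labels, centroids)
```

Note that this sketch groups tracts by attributes alone and ignores where they are on the map, so nearby tracts land in the same group only if their attributes happen to be similar. Whether location should enter the grouping is exactly the question raised above.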

The Place of Data Mining in Academia

The term 'data mining' incorporates four major method groups: association, classification, anomaly detection and clustering. These tasks are often performed in business (credit cards, frequent shopper cards, targeted mailings, targeted advertisements, pharmaceuticals, product purchase trends) for profit rather than for "scientific" purposes. In fact, data mining classes at MIT are offered through the Sloan School, although many of the same processes are used in Computer Science and Mathematics. Does 'data mining' carry a non-academic connotation? For scholars, does this connotation 'cheapen' the field?

Curse of Dimensionality

The 'curse of dimensionality' occurs when so much data has been collected about a set of objects that each object can be identified uniquely by its 'signature'. This "problem" renders classification tasks nearly useless (every object ends up in its own class) unless the data is reduced to the important fields. This process is called dimensionality reduction and includes binning (by entropy, Gini split, log scales, etc.), Principal Component Analysis (PCA) and other methods. Are these methods well known? How do we make them easy for the spatial data community to utilize?
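The 'signature' effect described above can be seen in a small experiment. The sketch below generates hypothetical objects with 16 yes/no fields and measures what share of them has a signature that no other object shares, as more fields are taken into account. The record count, the field count, and the random data are assumptions for illustration only.

```python
import random
from collections import Counter

def unique_fraction(records, n_fields):
    """Share of records whose first n_fields values match no other record."""
    signatures = [tuple(r[:n_fields]) for r in records]
    counts = Counter(signatures)
    return sum(1 for s in signatures if counts[s] == 1) / len(records)

# Hypothetical data: 200 objects, each described by 16 binary fields.
rng = random.Random(0)
records = [[rng.randint(0, 1) for _ in range(16)] for _ in range(200)]

# Share of uniquely identifiable objects when using 1, 2, ..., 16 fields.
fractions = [unique_fraction(records, d) for d in range(1, 17)]
for d, fraction in zip(range(1, 17), fractions):
    print(d, round(fraction, 2))
```

The share of unique signatures can only grow as fields are added, because two objects that already differ on the first few fields still differ once more fields are considered. Read in the other direction, keeping only a few fields is the crudest form of dimensionality reduction: it puts objects back into shared groups, at the cost of discarding whatever the dropped fields recorded.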

Bibliography (focus on ### in ./ 11.522/proj08/clio_papers)

  1. Duda, R., Hart, P. and Stork, D. (2000) Pattern Classification. New Jersey: John Wiley and Sons.
  2. Fayyad, U., Piatetsky-Shapiro, G., Smyth, P. (1996) From Data Mining to Knowledge Discovery in Databases. American Association for Artificial Intelligence.
  3. Getis, A. and Ord, J. (1992) "The Analysis of Spatial Association by use of Distance Statistics." Geographical Analysis, 24. Pp. 189-206.
  4. ### - Guo, D, et al. (2006) "A Visualization System for Space-Time and Multivariate Patterns (VIS-STAMP)." IEEE Transactions on Visualization and Computer Graphics November-December 2006. Pp. 1461-1474.
  5. ### - Keim, D. (2002) "Information Visualization and Visual Data Mining." IEEE Transactions on Visualization and Computer Graphics January-March 2002. Pp. 100-107.
  6. Longley, P., Goodchild, M., Maguire, D., Rhind, D. (2001) Geographic Information Systems and Science. West Sussex: John Wiley and Sons.
  7. MacEachren, A. (1995) How Maps Work: Representation, Visualization and Design. New York: Guilford Press.
  8. Miller, H. and Han, J. (2000) "Discovering geographic knowledge in data rich environments: a report on a specialist meeting." SIGKDD Explorations January 2000. Pp. 105-107.
  9. National Research Council. (2006) Beyond Mapping. Washington, D.C.: The National Academies Press.
  10. Skupin, A. and Hagelman, R. (2005) "Visualizing demographic trajectories with self organizing maps." GeoInformatica, 9(2). Pp. 159-179.
  11. Stillwell, J. and Clarke, G. (2004) Applied GIS and Spatial Analysis. New Jersey: John Wiley and Sons.
  12. ### - Tan, P., Steinbach, M. and Kumar, V. (2006) Introduction to Data Mining. Boston: Pearson Education Inc.
  13. Wrigley, N. (1985) Categorical Data Analysis for Geographers and Environmental Scientists. New York: Longman Group Limited.
