11.521 | Spatial
Database Management and Advanced Geographic Information Systems |
11.523 | Fundamentals
of Spatial Database Management |
Needed Info:
- Clark Broida's database of Boston Zoning variances (stored in the ZONING table on the Oracle server)
- Assorted Boston maps inlcuding Massachusetts Town boundaries, Boston MSA Census Tract, Boston Neighborhoods
We've already 'geocoded' the addresses of the parcels in the zoning variance database in order to get approximate latitude/longitude locations. The geocoding results are saved in an Oracle table called lonlat4. However, the longitude values needed to be multiplied by -1 to properly represent longitude values in the Western Hemisphere.
We can use the following Oracle query to create a table that makes this change:
CREATE TABLE t_lonlat5 AS SELECT casenumber, -1.0 * longitude longitude, latitude, streetname, quality FROM lonlat4;We could have avoided creating a new table by using the following Oracle query to create a 'view' that makes this change:CREATE VIEW v_lonlat5 AS
SELECT casenumber, -1.0 * longitude longitude, latitude,
streetname, quality
FROM lonlat4;However, the default Microsoft and ArcGIS ODBC drivers for Oracle tend to change handle data types badly when reading from Oracle views. For example, floating point numbers may be truncated integers in ArcGIS! (see note below.) Just to be sure, use SQL*Plus to connect to Oracle and create your own table, t_lonlat5, to convert the longitudes. By prefixing the table name with 't_' you can remind yourself that this is a temporary table to be deleted later when you have finished the exercise.
<<The use of a decimal place and '0' when multiplying by -1.0 forces Oracle to treat the second field as a floating point number when computed 'on the fly' by the view. Beware, however, that some ODBC drivers have problems guessing the right data types when bringing into ArcMap tables computed on-the-fly via views in Oracle. For example, the "Oracle in OraHome92" ODBC driver works fine but "Microsoft ODBC for Oracle" and ArcMap's "Microsoft's OLE DB Provider for Oracle" both convert the [-1 * longitude] expression to an integer! They make the wrong data type assumption for Views that generate negative floating point numbers! If you have the wrong driver, you can create a table in oracle instead of a view and then pull that table across - since the use of a table will result in the driver having less discretion about data types, the negative-valued longitude will not be rounded to an integer! Since many WinAthena machines do *not* have the 'Oracle in OraHome92' driver, we suggest you create the temporary tables.>>
The SQL connection from ArcGIS to Oracle can be made using the same type of OpenDatabaseConnect or Object-Linking-and-Embedding connection as we used for MS-Access. Choose File / Add-Data / Database Connection / Add OLE DB Connection in ArcCatalog. In the Data Link Properties, choose Microsoft OLE DB Provider for Oracle. Then fill in "crl" as the data source name and your (Oracle) user name and password. If you check the "Allow saving password" option, then you will not have to retypes them after you save the ArcMap document and reopen it later at some future time. While this is convenient, it does mean that anyone with a copy of your ArcMap document will be able to open the connection to Oracle using your account. Saving the name/password will also keep a copy in your ArcGIS configuration file. Since you are working on a WinAthena PC this configuration file will be saved in your private locker rather than on the local driver part of C:\Documents and Settings\ . Note that we are using the CRL database services that are running in Oracle on our database server. Oracle uses the CRL entry in the TNSnames configuration file to find the actual machine name, port, etc. that corresponds to the server we call 'CRL.'. (We emailed this configuration file to you at the start of the semester in case you wanted to set up SQL*Plus and an ODBC driver on your own personal machine.)
Doing all this will generate a new 'OLE DB Connection.odc' instance in the 'Add Data' window of ArcMap [which in turn uses the Microsoft OLE DB Provider for Oracle that is already installed in the Administrative-Tools/Data-Sources (ODBC) folder in the WinAthena Control Panel]. Double-click on this instance to see the Oracle tables that you can access. At some point you may want to rename this OLE databse connection to something like 'OLE-CRL-Oracle.odc' so you can remember that it connects specifically to the CRL Oracle database on bulfinch.
To map the zoning variances, you'll need to convert the latitude/longitude values in the t_lonlat5 table into 'points' on your map. Since all the Boston area maps we've been using are in Massachusetts State Plane coordinates (not latitude/longitude), we'll have to worry about map projections. There is more than one strategy for mixing and matching layers with different projections. Which way works best depends upon whether your datasets have their spatial reference system identified, what variations you want to combine, and the spatial reference system with which you want to view the data onscreen. We'll explain one approach here:
(a) Before you add any data layers to an empty Data Frame in ArcMap, right click on the data frame name, and choose property. In the Coordinate System tab, choose "Predefined / Projected Coordinate Systems / State Plane / Nad 1983 / Massachusetts Mainland FIPS 2001". This step identifies the spatial reference system (Mass State Plane...) in which you wish to view the data {Note that there are several NAD 1983 choices. NAD 1983 with no explanation uses meters and is the one we suggest using. NAD 1983 HARN stands for 'high accuracy reference network' and is a regional update of NAD 1983 coordinates using GPS observations. Both NAD 1983 and NAD 1983 HARN should work well but, since most state datasets are based on NAD 1983, we suggest using that one.)
(b) Now, add in the data from your t_lonlat5 table through the connection to CRL Oracle that you just created in ArcCatalog or via the ArcMap database connection.
(c) Right click on the t_lonlat5 table, and choose Display X,Y data. In the pop-up window, select the proper fields for X field and Y field then press OK. This 'Display X-Y Data' window should also indicate that the coordinate system for these X-Y values is not known. Choose 'Edit' and select 'geogrpahic coordinates' using the World WGS84 datum. This specification lets ArcMap know that it must convert the X-Y values from lon-lat decimal degress to Mass State Plane coordinates (using the appropriate datum). After you click 'okay', ArcMap will generate and display a pinmap (in Mass State Plane coordinates) of the X-Y locations stored in the lon-lat columns of t_lonlat5. [If the pinmap is not showing, ArcMap may be mixed up because this XY pinmap is the only layer in the data frame and ArcMap is confused about coordinates. In this case, remove your XY layer, add in a shapefile such as the ma_towns00 shapefile mentioned in the next section, and then do 'Display XY data' again.]
(d) Next, add the Massachusetts Town Boundary layer. One place to find this is the ma_towns00 shapefile in the data sub-directory of the class locker: //afs/athena.mit.edu/course/11/11.521/data. Since this shapefile includes a ma_towns00.prj file indicating the spatial reference system (Mass State Plane, Mainland, NAD83 meters), ArcMap won't complain about not knowing the coordinate system and the Town map should display correctly. (If the *.prj file were missing, the display would still work since we have already set the Data Frame to be the same as the projected coordinate saved on disk in the ma_towns00 shapefile.)
Now (finally!) you have a 'pin map' of the projected zoning variances showing up within the town boundaries of Boston where they belong. The data type conversions, coordinate issues, and geocoding (which we did for you!) do complicate pulling data into GIS. However, if you understand the data conversion issues and check each step along the way, there are substantial amounts of mapable data available on the net.
Next, let's examine and map various subsets of the zoning variance cases -- for example, all approved variances. We're not quite ready to do this since the t_lonlat5 table has none of the zoning variance characteristics -- we'll have to join it to the ZONING table using 'casenumber'. We can do this in Oracle or we can move the entire zoning table into ArcGIS and do all the queries (and joins) there. Since the ZONING table isn't that big, bringing the whole table into ArcGIS is manageable. However, for other situations -- such as the landuse owner categorization exercise that we did earlier, we might want to run the queries in Oracle, store the results in a SQL 'view' and pull the view over to ArcGIS.
(e) Now let's add a few other map themes to ArcMap to give more meaning to the zoning variance map. Add the msa5_tr90 and bostnbrd coverages (showing Eastern Mass 1990 census tracts and Boston neighborhood boundaries). The data files are located in the class locker //afs/athena.mit.edu/course/11/11.521/data.
(f) Add in the Zoning table from the Oracle connection you created. After joining the zoning data to your projected-and-mapped t_lonlat5 data, generate a pin-map showing the location of all variances (that were geo-referenced) which proposed to convert vacant land to housing. Your map should be zoomed in to just fit Boston onscreen and should show city boundaries and census tracts as well as the zoning variance locations.
(g) Now change the symbols used to draw the variances (involving vacant land converted to housing) so that a different symbol is used for those which were APPROVED (code 1 or 2). You will also want to exclude the missing value cases so the two symbols apply only to approved/not-approved variances. (There are several ways to distinguish and symbolize the approved/not-approved variances. For example, you can copy/paste a second zoning pin-map layer. Adding a column to the zoning table is also possible but more complicated since the zoning data is pulled over from Oracle.)
(h) Next let's examine whether the approval rate of vacant-land-to-housing variances appears to be different across parts of Boston that have high/medium/low housing value -- as measured by the 1990 census data. Selected 1990 Census tract data are stored in a table in the class data locker called msa5tr.dbf in //afs/athena.mit.edu/course/11/11.521/data. Add this table to your ArcGIS project. You will have to join (or link) this data table to the msa5_tr90 theme in order to map the census data. The column names in the msa5tr.dbf table are the official Census Bureau names. The technical documentation for 1990 STF3 census data is available at: http://mit.edu/11.520/data/census90/census90stf3td.pdf and contains the data dictionary for these variables. But we'll save you some time: the H061A001 variable contains the median value of owner-occupied housing units. Join the msa5tr.dbf table to msa5_tr90 and use H061A001 this variable to generate a thematic map of the housing values across the Boston census tracts. The STCNTYTR column contains the STate+CouNTY+TRact code for each census tract and will allow you to join the census tract data table with the census tract shapefile. (For further information about US Census data and mapping, see the lecture notes from the Fall GIS class, 11.520. For example, http://mit.edu/11.520/www/lectures/lec6_census.html). Now zoom in a bit on a part of town of particular interest to you and fiddle with the map so that the variances (with approved/not-approved symbols) are readable on top of your thematic map.
(i) Finally, let's ask some questions about the zoning variances that fall within high/medium/low value tracts. How many vacant-to-housing variances are in above-average value tracts (in the 1990 census)? What fraction are approved? From the map, we can see which variances fall within the high/medium/low value tracts. But we can't easily compute the counts and fraction-approved because the zoning variances aren't 'tagged' with the number of the census tract that contains them. Here's an opportunity to use some of the spatial data processing capabilities of our GIS. We can do a spatial join -- in this case a so-called 'point-in-polygon' operation -- to find out which zoning variances (points) are contained within which census tracts (polygons). The spatial join will add the appropriate census tract row from the msa5_tr90 attribute table to the attribute table row for each (mapped) zoning variance. Since we have already joined msa5tr.dbf to msa_tr90, the spatial join tells us the 1990 median housing value for the census tract containing each zoning variance.
We still have a few steps to go to answer the questions. Use ArcGIS's query tools to compute the number of vacant-to-housing variances in above-average income (1990) tracts and the fraction of them that were approved. Then calculate the same two numbers for the tracts that have a below-average income and compare them with those of the above-average income tracts.
In addition to computing these numbers, turn in a printout of a table showing the casenumber, board decision, 1980 nsa number, and 1990 census tract number for all vacant-to-housing variances that are within ROSLINDALE (i.e., NSA subnghbrd 49, 50 and 51).
Instead of overlaying our selected zoning variances on top of our ArcGIS map, we might want to display our selected zoning variances in, say, Google Earth. The ArcGIS Toolbox provides conversion tools that allow you to write out an ArcMap layer or ArcMap document in a format that can be read by Google Earth, Google Maps, and other map viewing applications that adhere to the 'KML' protocol for representing mapable 3D data. (KML stands for Keyhole Markup Language. Keyhole was the company that originally developed what became Google Earth. KML is a variation of a more general geographic markup language (GML) developed by the Open GeoSpatial Consortium and KML has become a de facto standard for many map mashups. KMZ is the compiled version of KML.)
Before we try to save a KMZ file containing our zoning variances, we need to be careful about coordinate systems and in-memory vs. written-to-disk shapefiles. Your KMZ file needs to use lat/lon coordinates, not the Mass State Plane coordinates. Let's try to overlay in Google Earth all the zoning variances that proposed converting vacant land to housing. Just to be sure we have the desired data and coordinates, let us save the results from part (4g) above as a layer file. (Right-click the layer and choose 'save as a layer file') Call this layer file myzoning.lyr. Now open up a new empty Data Frame (via Insert/Data Frame from the ArcMap menus), right-click the data frame and use the properties tab to set the coordinate system to geographic coordinates using the GCS_WGS_1984 datum. (Actually, it would be better to use the GCS north american datum 1983 entry since we know that MassGIS also used the NAD83 datum for its Mass State Plane projection, but WGS84 is a safe bet for any lat/lon data and we have used that in most of the labs). Now, add the myzoning.lyr layer that you just saved into this data frame.
We are now ready to export the layer in the KMZ format that Google Earth can read for the map mashup. In ArcMap, open the ArcToolbox pane, click the 'Search' tab, and search for 'kml'. Double click on the 'layer to KML' choice, set the layer name to be 'myzoning' and the output to be myzoning.kmz within a writeable directory. Set the 'layer output scale' to be '1' so the layer is always turned on. (Before you click on 'layer to KML' you can click the 'locate' button to see where this tool fits within the ArcToolbox hierarchy.) Finally, open Google Earth and click File/Open to navigate to whereever you saved myzoning.kmz and open the file. If you saved the KMZ file to a web-readable directory such as the 'www' sub-directory of your Athena locker, then you could also open myzoning.kmz in Google Earth via 'Add/Network-link' using this URL: http://mit.edu/your-athena-id/www/myzoning.kmz
Finally, save and turn in a screen shot of the Google Earth screen once you have zoomed in to the Boston area and can see the pin-map of your variances.
Be sure to put your name and Athena username within each separate file that you turn in. Email you lab assignment to jf@mit.edu by the start of class on Tuesday, March 3, 2009.(a) The cutoff 1990 median housing value that you used to distinguish above-average value and below-average value census tracts. Also, turn in the two numbers (zoning variance count and fraction approved) from both high income and low income tracts, plus the Roslindale table from Part (i), and
(b) A PDF copy of your ArcMap layout showing your two maps of Boston from Parts (g) and (h) with the location of the vacant-to-housing variances highlighted in various ways.
(c) A JPG or PNG version of the screen shot showing your Google Earth map mashup of vacant-to-housing zoning variances from Part 5.
For simple queries, it's easier to use ArcGIS's SQL tools to run the queries using the SQL connection -- or to use MS-Access or even to query *.dbf versions of the ZONING database and lookup tables (rather than do the queries in Oracle and bring them over with the browser and addins). But, suppose you wanted to use the institutional/landuse categories of ownership that you are developing in the homework set. It takes a while to develop these as you accumulated the SELECT and UPDATE SQL statements that built the cross-reference table. Hence, you'd like to leave all those queries and results on a network database server that could protect private data while letting you reach it quickly from any Athena or PC 'client' when you had a chance to work on it and map the results.
For all those variances whose board decision is known, what is the approval rate (code 1 or 2) broken down by your institutional/landuse categories from Lab #3? What fraction are in high income neighborhoods and tracts (as defined earlier) by each institutional/landuse category? What about for the vacant-to-housing subset? Map the locations of the variances in the various categories. Do you see any patterns worth pursuing? Are they closer to major roads, downtown, more/less dense residential neighborhoods,...? Is it the ownership type or landuse type (e.g., single family, two-family, triple-deckers, or apartments) that account for different approval rates (to the extent that you find differences)?
Home | Syllabus | Lectures | Labs | CRON | MIT
Created by Joe Ferreira, modified
2002-2010 by thg, jinhua, jf