11.521 |
Spatial
Database Management and Advanced Geographic Information
Systems
|
11.523 |
Fundamentals
of Spatial Database Management
|
What SQL query answers the following questions:
- Does every Assessing record in BOS05EB have a corresponding 'ground' parcel in ebos_parcels05.shp?
- Does every ground parcel have an Assessing record in BOS05EB?
- Is every property type code in BOS05EB explained in the MASS_LANDUSE lookup table?
- Which land uses in MASS_LANDUSE are not found within East Boston?
In lab#2, when you ran q_eb05new to join your expanded t_noncondo table with MASS_LANDUSE, you may have noticed that q_eb05new produced a table with one fewer record than t_noncondo. Which record dropped out?
SELECT bos05eb.PTYPE FROM bos05eb WHERE bos05eb.ptype not in (select stclass_txt from mass_landuse);Or, equivalently:
SELECT bos05eb.PTYPE FROM bos05eb WHERE NOT EXISTS(SELECT NULL from mass_landuse m WHERE bos05eb.ptype = m.stclass_txt);Let's figure out how to do the same query in MS-Access - by pasting the SQL into MS-Access ...
Geocoding location
Creating pin-map of locations (data conversion, coordinate system, pin-map from data table)
Symbolization
Spatial join (point-in-polygon operation)
Exporting data tables back to Postgres or MS-Access from ArcGIS
- Handy in Lab #4 to put zoning casenumbers, after tagging with census data, back into MS-Access and Postgres
- Needed if we want to have the 'spatial join' relationship between census tract and zoning case available in MS-Access or Postgres
- Hard to do easily from menu choices
- Option A: export to DBF or TEXT file; import into MS-Access, then export to Postgres (or Oracle, SQLServer, ...)
- Max of 8 characters in *.dbf filename
- Max of 10-characters in dbf column names
- Use 3rd party tools to import DBF and CSV (text) files into Postgres, Oracle, etc. or go through MS-Access
- pgAdmin4 provides a good interface for importing CSV files into Postgres tables
- Option B: export table (not shapefile) into Personal GeoDatabase
- functions like File GeoDatabase but is a Windows-only MS-Access database (*.mdb) with a few extra spatial tables
- Then, open personal geodatabase (using MS-Access) and export to Oracle
- Lower-case all table and column names (so you match the default expectation of Postgres)
- Beware of single-user complications and MS-Access formats
- ArcMap uses the older *.mdb format for MS-Access databases instead of *.accdb
- When you try to export a table, ArcMap recognizes MS-Access databases (*.mdb) if the file type is set to 'File and Personal Geodatabase tables'
- But, it then complains about not finding a table (GDB_ObjectClasses) because the MS-Access database does not have the expected 'geodatabase' management tables
- You need to first use ArcCatalog to create a new personal geodatabase which you can use instead of a default file geodatabase in C:\temp and then save in any file locker and use on any Windows machine.
- If you try to make use of a file or personal geodatabase, or any MS-Access database, while it is 'locked' by ArcCatalog or another ArcMap session, ArcMap will complain saying, "Failed to connect to database. An underlying database error occurred. System resource exceeded."
Miscellaneous other Lab #4 Notes
SELECT expr1, expr2, expr3, ... FROM object1, object2, ... WHERE conditions GROUP BY expr4, expr5, expr6, ... HAVING conditions ORDER BY expr7, expr8, expr9, ...
Note that the order of the clauses matters! The clauses, if included, must appear in the order shown! Oracle will report an error if you make a mistake, but the error message (e.g., "ORA-00933: SQL command not properly ended") may not be very informative.
Compute the percentage of East Boston parcels that are triple-deckers (LU=R3)
Method 1: 'Brute Force' but reliable and easy to understand
- Numerator: all the triple deckers
create view v_num as (select count(ptype) as R3_count from bos05eb where ptype = '105');
- Denominator:
create view v_denom as (select count(ptype) as all_ptype from bos05eb);
- Ratio:
select 100 * r3_count / all_ptype from v_num, v_denom;
- NOTE: These queries each return a single number - that is, a table with one row and one column. We could also add a 'group by' clause to the SQL query to compute percentages for each planning district (or other categorization of parcels). Then we would get a final table with two columns (ptype and percent-housing) and rows = number of planning districts.
select 100 * n.r3_count / d.all_ptype from (select count(ptype) R3_count from bos05eb where ptype = '105') n, (select count(ptype) all_ptype from bos05eb) d;
select count(LU) as R3_count from bos05eb where LU = 'R3' -- 2224 cases select count(LU) as R3_count from bos05eb where ptype = '105'; -- 2199 cases select pid, cm_id, st_name, unit_num, LU, ptype from bos05eb where LU = 'R3' and ptype <> '105'; -- all have ptype='109' => multiple buildings on one lot -- but triple-deckers are not 3 separate buildings -- so, better to use ptype='105' for triple deckers -- BUT, you can use either one for the exercises!
The syntax looks like this:
For example, let's sort the parcels by land use in ascending order and the square footage in descending order:SELECT ... ORDER BY expr1 [ [ ASC | DESC ] , expr2 [ ASC | DESC ] ... ]
Notice that the query can mix the ASC and DESC keywords in a single ORDER BY clause.SELECT parcelid, landuse, sqft FROM parcels ORDER BY landuse ASC, sqft DESC; PARCELID LAN SQFT ---------- --- ---------- 15 34800 10 10900 7 A 14000 14 A 10000 16 A 9600 18 A 9500 11 C 210000 2 C 100000 19 C 40000 8 C 24800 1 C 20000 4 CL 6 CM 2100 20 E 50000 5 E 25000 9 R1 1800 17 R1 1500 13 R2 3 R3 5500 12 R3 5300 20 rows selected.
We might (incorrectly) write a query like this:
Notice that it returned a row with LANDUSE = 'R3'. That's not what we wanted! The problem is that the 'AND' and 'OR' operators, when mixed, are not processed in the sequence written, but rather follow an order of operations much as in algebra (exponentation before everything, then multiplication and division before addition and subtraction). In Boolean logic, 'AND' is like multiplication and 'OR' is like addition, and Oracle orders their processing accordingly. Hence, the query above is actually equivalent to this one:SELECT parcelid, landuse, sqft FROM parcels WHERE sqft = 5300 OR sqft = 10000 AND landuse = 'A' ORDER BY landuse ASC, sqft DESC; PARCELID LAN SQFT ---------- --- ---------- 14 A 10000 12 R3 5300
Since the order of operations can surprise you at inconvenient times, you should always use parentheses to force the correct order whenever you mix 'AND' and 'OR' in a WHERE clause. Here is the correct way to write the query:SELECT parcelid, landuse, sqft FROM parcels WHERE sqft = 5300 OR ( sqft = 10000 AND landuse = 'A') ORDER BY landuse ASC, sqft DESC; PARCELID LAN SQFT ---------- --- ---------- 14 A 10000 12 R3 5300
SELECT parcelid, landuse, sqft FROM parcels WHERE ( SQFT = 10000 OR SQFT = 5300) AND LANDUSE = 'A' ORDER BY landuse ASC, sqft DESC; PARCELID LAN SQFT ---------- --- ---------- 14 A 10000
Normally, a query may return duplicate rows. For example, if we query the FIRES table to list the parcels that had fires, we'll find that the parcels that had more than one fire (parcels 2 and 3) show up multiple times:
SELECT parcelid FROM fires ORDER BY parcelid; PARCELID ---------- 2 2 3 3 7 20 6 rows selected.If we don't want to see the duplicates, we can add the keyword DISTINCT right after SELECT:
SELECT DISTINCT parcelid FROM fires ORDER BY parcelid; PARCELID ---------- 2 3 7 20 4 rows selected.Now parcels 2 and 3 show up only once. You only use the DISTINCT once, right after SELECT, to apply to the entire row. You do not not apply it to each column. Hence, this query is valid:
SELECT DISTINCT p.onum, o.oname FROM parcels p, owners o WHERE p.onum = o.ownernum ORDER BY p.onum; ONUM ONAME ---------- ----------------------- 9 PATRICK KING 10 METHUINON TRUST 11 FERNANDO MARTINEZ 18 JOHN MCCORMACK 29 FRANK O'BRIEN 32 GERALD RAPPAPORT 38 BAY STATE, INC. 55 THOMAS KELLIHER 89 JOSEPH NOONAN 100 MGH, INC. 200 VANDELAY INDUSTRIES 11 rows selected.However, the following incorrect query, with two DISTINCT keywords, generates the cryptic error message "ORA-00936: missing expression":
SELECT DISTINCT p.onum, DISTINCT o.oname FROM parcels p, owners o WHERE p.onum = o.ownernum ORDER BY p.onum;Note also that you can (in Postgres, Oracle, SQL-server, etc. but *not* in MS-Access) use DISTINCT within the group functions (e.g., COUNT, AVG, STDDEV) to consider only the distinct values within a group: (do you see why the counts would not all be the same if you replace 'wpb' with 'sqft'?)
SELECT COUNT(wpb) wpb_ct, COUNT(DISTINCT wpb) wpb_ct_distinct, COUNT(*) row_ct FROM parcels; WPB_CT WPB_CT_DISTINCT ROW_CT ---------- --------------- ---------- 20 20 20 1 row selected.Finally, note that COUNT(DISTINCT *) does not work. This result is not surprising if you think about it. Why do you think this is the case?
This query returns 200 rows!SELECT p.parcelid, p.onum, o.ownernum, o.oname FROM parcels p, owners o;
There are only 20 rows in PARCEL and 10 rows in OWNERS, so what's going on? The problem here is that this query has no WHERE clause that specifies how the PARCEL table relates to the OWNERS table. Without this information, Oracle does not know how to match a row in PARCEL to a corresponding row in OWNERS. What does it do instead? It matches every row in PARCEL to every row in OWNERS. Hence, we end up with:
(20 rows in PARCEL) matched to (10 rows in OWNERS) = 20 x 10 = 200 rows returnedThis kind of unconstrained join is called a Cartesian product. This result is desireable only under rare circumstances. If you have queries that are returning a suspiciously large number of rows, you have probably unwittingly requested a Cartesian product. Note that for tables of even modest size the number of rows returned by a Cartesian product can be explosive. If you generate a Cartesian product of one table with 1,000 rows with another table with 2,000 rows, your query will return 1,000 x 2,000 = 2,000,000 rows! That's right--two million rows! Hence, you should very careful to avoid unintentional Cartesian products.
To fix this query, we need to specify how the owner numbers stored in the PARCEL table should be matched to owner numbers in the OWNERS table. In PARCEL, the owner numbers are stored in ONUM, while in OWNERS they are stored in OWNERNUM. Here is the repaired query with the appropriate join condition in the WHERE clause:
SELECT p.parcelid, p.onum, o.ownernum, o.oname FROM parcels p, owners o WHERE p.onum = o.ownernum;
This query returns 20 rows, which is definitely an improvement.
Note that an unintended Cartesian product can easily be hidden in a query that joins three or more tables or in a query with joins plus a GROUP BY expression (because it will not be obvious that too many rows were collapsed into the rows with the unique 'group by' values).
NULL is a special value that means "missing" or "unknown." It can be placed in a column of any type, unless the column has been declared NOT NULL, in which case NULL values are forbidden. A NULL is not the same as zero or any other value.
Special rules apply when NULL values are involved:
For more information on NULL values and how Postgrestreats them, use Google or another search engine with keywords such as 'postgres SQL NULL in group by'.
Take a look at this query:
SELECT p.parcelid, f.fdate FROM parcels p, fires f WHERE p.parcelid = f.parcelid ORDER BY p.parcelid, f.fdate;
The query above returned 6 rows, but there are 20 parcels. Fourteen parcels seem to be missing. Where did they go? The answer is that the SQL query only listed the parcels that occur in both tables, PARCELS and FIRES.
How do we get around this problem if we want to see all 20 rows in the parcel table, whether they match a record in FIRES or not? The answer is an outer join. The standard join is also known as an "inner join," meaning that the default behavior of not matching NULL values occurs. In an outer join, we explicitly tell our SQL engine that we want it to display NULL values that would otherwise be excluded to each column in the WHERE clause where the additional NULLs should appear.The syntax is the same as MS-Access uses to specify joins:
Note the difference between 'left' and 'right' outer joins. This query will run but return a different result:SELECT p.parcelid, f.fdate FROM parcels p left outer join fires f on p.parcelid = f.parcelid ORDER BY p.parcelid, f.fdate;
SELECT p.parcelid, f.fdate FROM parcels p right outer join fires f on p.parcelid = f.parcelid ORDER BY p.parcelid, f.fdate;
SELECT p.parcelid, p.add1, p.add2, f.fdate FROM parcels p, fires f WHERE p.parcelid = f.parcelid ORDER BY p.parcelid, f.fdate;Redo with inner join explicitly required in the 'from' clause:
SELECT p.parcelid, p.add1, p.add2, f.fdate FROM parcels p inner join fires f on p.parcelid = f.parcelid ORDER BY p.parcelid, f.fdate;Repeat the same query, except list all the parcels, whether they had a fire or not (this requires an outer join):
List the count of fires for the parcels that had a fire (this requires an inner join with grouping):SELECT p.parcelid, p.add1, p.add2, f.fdate FROM parcels p left outer join fires f on p.parcelid = f.parcelid ORDER BY p.parcelid, f.fdate;
SELECT p.parcelid, COUNT(fdate) fire_count FROM parcels p, fires f WHERE p.parcelid = f.parcelid GROUP BY P.PARCELID ORDER BY COUNT(fdate) DESC, p.parcelid;Note that the query above lists the parcels in descending order of the count of fires. When specifying a group function or other expression in the ORDER BY clause, you must use the full expression, even if you defined a column alias for it in the SELECT list. In this case, we must use COUNT(FDATE) rather than FIRE_COUNT in the ORDER BY clause. The DESC keyword after COUNT(FDATE) indicates that we want the fire counts shown in descending order, rather than the default ascending order. You need to apply the DESC keyword to every expression in the ORDER BY clause that you want in descending order.
List the count of fires for all parcels, whether it experienced a fire or not (this requires an outer join with grouping):
SELECT p.parcelid, COUNT(fdate) fire_count FROM parcels p left outer join fires f on p.parcelid = f.parcelid GROUP BY p.parcelid ORDER BY COUNT(fdate) DESC, p.parcelid;Note that the query above uses an outer join on the FIRES table -- indicated by the outer join symbol (+) -- to include the parcels that are not listed in the FIRES table in the count. The outer join symbol indicates where a NULL should replace a real value if none is available in the table. Note that, if joining PARCELS and FIRES required matching two columns instead of just one, then we would need to use the outer join symbol on the FIRES side of the join for both columns.
The query below runs but returns the wrong result--some parcels that had no fires show up with one fire in the count. Why?
SELECT p.parcelid, COUNT(*) bogus_fire_count FROM parcels p left outer join fires f on p.parcelid = f.parcelid GROUP BY p.parcelid ORDER BY COUNT(*) DESC, p.parcelid;
Views are the most useful of these alternate object types for our purposes. A view is a query that is stored in the database, then treated like a table. Unlike a table you create using CREATE TABLE ... AS SELECT, which creates a one-time snapshot of the data returned by the query, a view will reflect the current state of the tables in the underlying query. Hence, if the tables in the database are changing over time, the same query on a view may return different results at different times. Creating a view is similar to the CREATE TABLE ... AS SELECT statement; instead, use CREATE VIEW ... AS SELECT. For example:
CREATE VIEW parcel_owners AS SELECT p.parcelid, o.oname, P.ADD1, P.ADD2, P.ZIP FROM parcels p, owners o WHERE p.onum = o.ownernum(+); CREATE VIEW owner_sqft AS SELECT o.oname, SUM(SQFT) TOTAL_SQFT FROM parcels p, owners o WHERE p.onum = o.ownernum GROUP BY o.oname;
Once the view is created, it will appear in the 'views' folder within the relevant schema within your database. (The default schema is 'public'.) Postgres maintains a number of system tables that list information about users, schemas, databases, etc. (See: postgres docs on system information functions).
In MS-Access, pasting a 'create' statement in the SQL query window will
not work. Use the Create/Table tabs first to define the table structure
and then you can use 'insert' SQL statements to add rows.
If you use pgAdmin4 to access a Postgres database, clicking on a table
in the main window will show, in the SQL pane, the SQL commands needed
to create the table (and specify data types, permissions, etc.).
This can be handy if you wish to create a similar table without a lot of
typing.
Information about tables, columns, data types, and the like are maintained in system tables. For example, this query lists all columns defined within all tables of the current database that are in schema='public':
This query lists all tables with the 'boston96' schema (of the eboston database):select * from information_schema.columns
where table_schema='public';
select * from information_schema.tables
where table_schema='boston96';
However, you need administrative privileges in order to be able to query these system tables within the 'information_schema' schema.
INSERT INTO table1 (col1, col2, ...) VALUES (value1, value2, ...)
or
INSERT INTO table1 (col1, col2, ...) SELECT ...
INSERT INTO fires (parcelid, fdate, ignfactor, estloss) VALUES (12, '17-JAN-96', 2, 35000)
DELETE FROM table1 WHERE conditions;Example: Delete fires with losses less than $50000 from the database
DELETE FROM fires WHERE estloss < 50000;
UPDATE table1 SET col1 = value1, col2 = value2, ... WHERE conditions;
or
UPDATE table1 SET col1 = (SELECT ...) WHERE conditions;
UPDATE tax SET bldval = 200000 WHERE parcelid = 11;Note: You can also update a table using a subquery. This typically involves one or more correlated subqueries. However, correlated subqueries are beyond this overview and not standardized in the SQL syntax used by the major vendors.
Normally, we are not particularly concerned about transaction control since we are focused on analyzing and cross-referencing existing databases to which we have read-only access and we are generating new data only in temporary tables or personal lookup tables that do not need to be shared. However, we will need to concern ourselves with COMMIT and ROLLBACK for exercises involving the use of data manipulation language terms such as UPDATE. If we run an UPDATE query that fails due to a SQL error, then any pending changes to the database tables that have occurred since the last COMMIT will be rolled back.
CREATE TABLE table1 ... ;
CREATE VIEW view1 ... ; CREATE INDEX index1 ON table1 (col1, ... );
DROP TABLE table1; DROP VIEW view1; DROP INDEX index1;
ALTER TABLE table1 ADD COLUMN columname integer; ALTER VIEW IF EXISTS view1 OWNER TO new_owner; ALTER INDEX IF EXISTS index1 RENAME TO index2;
GRANT SELECT ON table1 TO user1; GRANT SELECT, INSERT, UPDATE (col1) ON table1 TO user1 WITH GRANT OPTION;
REVOKE ALL ON table1 FROM user1;
Written February, 2001-2006 (Tom
Grayson, Joe Ferreira), Last Modified 28 February 2018 (jf)