Massachusetts Institute of Technology - Department of Urban Studies and Planning


11.521
Spatial Database Management and Advanced Geographic Information Systems
11.523
Fundamentals of Spatial Database Management
11.524
Advanced Geographic Information System Project 

Handling One-to-Many Relations + Joins, Grouping, Aggregation, and Subqueries

Eric Huntley - 15 February 2018


Today's Outline:


Thinking Relationally and Handling One-to-Many Relations

Review use of SQL with Toy Parcel database:

SQL queries and table joins
Sample Parcel Database Schema
  • See the sample parcel database =====>
  • SQL Tips - more on SQL help pages
    • Use text editor to save/print SQL queries
    • Textedit, TextWrangler, Notepad++,...
    • Illustrate pgAdmin3

  • Basic joins & 'group by': owners with number of parcels
SELECT OWNERS.ONAME, COUNT(*) PARCEL_COUNT 
  FROM PARCELS, OWNERS
 WHERE PARCELS.ONUM = OWNERS.OWNERNUM
 GROUP BY OWNERS.ONAME
 ORDER BY OWNERS.ONAME;
...OR...
SELECT OWNERS.ONAME, COUNT(*) AS PARCEL_COUNT
FROM PARCELS inner join OWNERS on PARCELS.ONUM = OWNERS.OWNERNUM
GROUP BY OWNERS.ONAME
ORDER BY OWNERS.ONAME;
parcel database schema

 

The basic SELECT statement to query one or more tables:

   SELECT [DISTINCT] column_name1[, column_name2, ...]
      FROM table_name1[, table_name2, ...]
     WHERE search_condition1
      [AND search_condition2 ...]
      [OR search_condition3...]
    [GROUP BY column_names]
   [HAVING column_names]
    [ORDER BY column_names];

Handling one-to-many Relations: e.g., parcels & fires or parcels & owners


GROUP BY, and Aggregation Functions:

These examples draw on the sample PARCELS database that we have used previously.
1. List all the fires, including the date of the fire: 2. List the count of fires by parcel:
  SELECT PARCELID, FDATE
    FROM FIRES
ORDER BY PARCELID, FDATE;
 SELECT PARCELID, COUNT(FDATE) AS FIRE_COUNT 
   FROM FIRES 
GROUP BY PARCELID 
ORDER BY PARCELID;
Groups are shown in color, but this query does not collapse groups into a single row.
Groups and summary functions have been calculated; notice that no FDATE values are shown. (Why? What summary of FDATE would make sense?)
PARCELID FDATE
2 02-AUG-88
2 02-APR-89
3 26-JUL-89
3 26-JUL-90
7 01-AUG-87
20 02-JUL-89
PARCELID FIRE_COUNT
2 2
3 2
7 1
20 1

 

The Different Roles of the WHERE Clause and the HAVING Clause

The WHERE clause restricts the rows that are processed before any grouping occurs.

The HAVING clause is used with GROUP BY to limit the groups returned after grouping has occurred.

3. List all the fires that occurred on or after 1 August 1988: 4. List the count of fires that occurred on or after 1 August 1988 by parcel: 5. List the count of fires that occurred on or after 1 August 1988 by parcel for parcels that had more than one fire:
 SELECT PARCELID, FDATE
   FROM FIRES
  WHERE FDATE >=
          TO_DATE('01-AUG-1988',
                  'DD-MON-YYYY')
ORDER BY PARCELID, FDATE;
 SELECT PARCELID,
        COUNT(FDATE) AS FIRE_COUNT
   FROM FIRES
  WHERE FDATE >=
          TO_DATE('01-AUG-1988',
                  'DD-MON-YYYY')
GROUP BY PARCELID
ORDER BY PARCELID;
 SELECT PARCELID,
        COUNT(FDATE) AS FIRE_COUNT
   FROM FIRES
  WHERE FDATE >=
          TO_DATE('01-AUG-1988',
                  'DD-MON-YYYY')
GROUP BY PARCELID
  HAVING COUNT(FDATE) > 1 
ORDER BY PARCELID;
Groups are shown in color, but this query does not actually perform grouping. Note that the fire at parcel 7 on 1 August 1987 has been excluded by the WHERE clause.
This query shows the result of grouping, but no HAVING clause is applied. Groups that satisfy the HAVING clause of Query 5 are shown in bold.
Final result, after groups that fail the HAVING clause have been eliminated.
PARCELID FDATE
2 02-AUG-88
2 02-APR-89
3 26-JUL-89
3 26-JUL-90
20 02-JUL-89

 

PARCELID FIRE_COUNT
2 2
3 2
20 1
PARCELID FIRE_COUNT
2 2
3 2

Rules for GROUP BY Queries

Think for a minute why this query does not make sense. For each distinct value of PARCELID, there may be multiple values of FDATE. For example, the parcel with PARCELID = 2 had fires on both 2 Aug. 1988 and 2 Apr. 1989. When we group by PARCELID alone, the results of the query will have at most one row for each value of PARCELID. If we include FDATE in the SELECT list, which FDATE should Oracle pick for PARCELID = 2? The answer is undefined, and that is why the query is invalid. Oracle is unable to pick a single value of an unaggregated item to represent a group.

To fix this query, we must ensure that the SELECT list and the GROUP BY clause contain the same expressions (excluding expressions that use group functions such as COUNT(FDATE)).We have two choices. First, we can remove FDATE from the SELECT list (and the ORDER BY clause):

    SELECT PARCELID, COUNT(FDATE) AS FIRE_COUNT 
      FROM FIRES
     GROUP BY PARCELID
     ORDER BY PARCELID;

Second, we can add FDATE to the GROUP BY clause:

   SELECT PARCELID, FDATE, COUNT(FDATE) AS FIRE_COUNT
      FROM FIRES
     GROUP BY PARCELID, FDATE
     ORDER BY PARCELID, FDATE
Be careful when picking this second option! Adding a column to the GROUP BY clause may change the meaning of your groups, as in this example. Notice that all the FIRE_COUNT values for this last query are 1.That's because by adding FDATE to the GROUP BY clause we have effectively made each group a single row from the FIRES table--not very interesting!

Suppose we want to look at losses for fires by ignition factor (IGNFACTOR), but want to ignore the case where IGNFACTOR is 2. We can write this query two ways.

N.B.: To find rows with NULLs in them, you must use the syntax "expr IS NULL", as above. If you use "expr = NULL", the query will run but not return any rows. The other comparison operators (e.g., =, <>, !=, >, <, >=, <=, LIKE) will never match NULL. For example, the query below executes but returns no rows:

GROUP BY Examples

Find the parcels that experienced more than one fire:

  SELECT PARCELID, COUNT(FDATE) AS FIRE_COUNT
    FROM FIRES
GROUP BY PARCELID
  HAVING COUNT(FDATE) > 1
ORDER BY PARCELID;

List the fires with a loss of at least $40,000:

  SELECT PARCELID, FDATE, ESTLOSS
    FROM FIRES
   WHERE ESTLOSS >= 40000
ORDER BY PARCELID, FDATE, ESTLOSS;

List the count of fires by parcel, counting only fires with a loss of at least $40,000:

  SELECT PARCELID, COUNT(FDATE) AS FIRE_COUNT    
   FROM FIRES 
  WHERE ESTLOSS >= 40000
GROUP BY PARCELID 
ORDER BY PARCELID;

Find the parcels that experienced more than one fire with a loss of at least $40,000:

  SELECT PARCELID, COUNT(FDATE) AS FIRE_COUNT 
    FROM FIRES 
   WHERE ESTLOSS >= 40000 
GROUP BY PARCELID 
  HAVING COUNT(FDATE) > 1 
ORDER BY PARCELID; 

Advanced Queries: Self-Join:

Example: Did any parcel have a fire with little damage (< 10000) and a fire with lots of damage (>40000)

The following query does not work, because it is not possible for value for a single column in a single row to contain two values at the same time:

SELECT F1.PARCELID, FDATE
FROM FIRES
WHERE ESTLOSS < 10000
AND ESTLOSS >= 40000;

If you have trouble imagining the self-join, pretend that we actually created two copies of MATCH, M1 and M2:

CREATE TABLE F1 AS
    SELECT * FROM FIRES;
CREATE TABLE F2 AS
    SELECT * FROM FIRES

Example: Find the time that passed between a fire on a parcel and all fires occurring within 300 days later on the same parcel

Note that a number of days can be added to a date.


Advanced Queries: Subqueries

A subquery can be nested within a query

Example: Find the parcel with the highest estimated loss from a fire

SELECT *
  FROM FIRES
 WHERE ESTLOSS =
         (SELECT MAX(ESTLOSS)
            FROM FIRES);

Alternatively, include the subquery as an inline "table" in the FROM clause:

SELECT F.* 
  FROM FIRES F,
       (SELECT MAX(ESTLOSS) MAXLOSS 
           FROM FIRES) M
 WHERE F.ESTLOSS = M.MAXLOSS;

Example: Find the parcels that have not had a fire

SELECT *
  FROM PARCELS
 WHERE PARCELID NOT IN
         (SELECT PARCELID
            FROM FIRES);

or, more efficiently,

SELECT *
  FROM PARCELS P
 WHERE NOT EXISTS
         (SELECT NULL
            FROM FIRES F
           WHERE P.PARCELID = F.PARCELID);

Example: Find the parcels that have not obtained a permit:

SELECT *
  FROM PARCELS
 WHERE (PID, WPB) NOT IN
         (SELECT PID, WPB
            FROM PERMITS);

or, more efficiently,

SELECT *
  FROM PARCELS P
 WHERE NOT EXISTS
         (SELECT NULL
            FROM PERMITS T
           WHERE P.PID = T.PID and P.WPB = T.WPB);


Lab #2 Discussion: 
    Make sure everyone has found and understands the pre-constructed q_condosum query saved in MS-Access


Home
| Syllabus | Lectures | Labs | CRON| MIT

Developed by Joseph Ferreira and Tom Grayson.
Last modified: Rounaq Basu February 12, 2018