by J. H. Saltzer, 27 April 2003, minor update 11 February 2004
Overview: False alarms can be a nuisance, but attempts to suppress them usually cause legitimate events to be missed. This trade-off arises in many engineering situations, all of which can be characterized as an approximate mapping from real events that are hard to identify automatically to some proxy that is easier to work with.
There is a helpful representation for this trade-off in the form of a two-by-two matrix. The first step in using this representation is to clearly distinguish the classification of the real events from the classification that the proxy provides.
Examples: Some proxies and the corresponding trade-offs:
Smoke detectors: nuisance alarms vs missed fires. (Burglar alarms and automobile theft alarms are essentially the same.)
Search queries: precision vs recall.
Spam filters: spam that slips through vs wanted messages blocked. Feature: there is typically an active agent trying to sneak unwanted things past the filter by pushing from the outside.
Porn filters: like spam, except that there may be an active agent trying to sneak unwanted things past the filter by pulling from the inside.
Political filters: resemble both porn and spam filters; there may be active agents trying to defeat them both inside and outside.
News filters: no active agents trying to sneak things past, but the user's real interests may evolve with the news.
TTL cache invalidation: expiration is a proxy for invalidity; an expired entry may still be valid and an unexpired entry may already be invalid.
The law: abuses not outlawed vs acceptable activities criminalized.
IFF: unrecognized foe vs unrecognized friend.
Trust: misplaced trust vs unwarranted suspicion.
Model: There is some set of real things we wish to partition into two categories, which we can call In and Out, but there is no direct way of doing the partitioning. On the other hand, there is a proxy for those things that is relatively easy to partition. The problem is that the proxy is only approximate. (N.B. It is tempting to think that proxy and approximate share a Latin root. They do not: approximate comes from proximare, to come near, while proxy is a contraction of procuracy, from procurare, to take care of. The resemblance is still a useful reminder that a proxy is approximate.)
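A consequence of the approximate nature of proxies: There are four categorization outcomes, two of which are desirable and two of which are undesirable. The desirable outcomes are a real In that the proxy places In and a real Out that the proxy places Out. The undesirable outcomes are a real In that the proxy places Out and a real Out that the proxy places In.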
The first step in using this model is to recognize and clarify the distinction between the real and the proxy. That is, identify the real categories, then identify the proxy categories.
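The model can be made concrete with a small sketch that sorts observations into the four outcomes. The function name `tally_outcomes`, the outcome labels, and the sample data are all assumptions made for illustration; they do not come from any particular system.

```python
def tally_outcomes(pairs):
    """Count the four outcomes from (real_in, proxy_in) boolean pairs."""
    counts = {
        "real In, proxy In": 0,     # desirable
        "real Out, proxy Out": 0,   # desirable
        "real In, proxy Out": 0,    # undesirable
        "real Out, proxy In": 0,    # undesirable
    }
    for real_in, proxy_in in pairs:
        real = "In" if real_in else "Out"
        proxy = "In" if proxy_in else "Out"
        counts[f"real {real}, proxy {proxy}"] += 1
    return counts

# Hypothetical observations: (was there really a fire?, did the detector signal?)
observations = (
    [(False, False)] * 95   # no fire, detector quiet
    + [(False, True)] * 3   # no fire, detector signals
    + [(True, True)] * 1    # fire, detector signals
    + [(True, False)] * 1   # fire, detector quiet
)

outcome_counts = tally_outcomes(observations)
for outcome, count in outcome_counts.items():
    print(f"{outcome}: {count}")
```

Keeping the real category and the proxy category as two separate values in each pair mirrors the first step described above: the two classifications are never merged until the outcome is named.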
The trade-off: You may be able to reduce the frequency of one of the undesirable outcomes by adjusting some parameter of the proxy. But that adjustment will probably increase the frequency of the other undesirable outcome.
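To illustrate the trade-off, here is a minimal sketch in which the adjustable parameter is a threshold on a smoke level. The readings, the thresholds, and the function name `undesirable_counts` are invented for the example; real detectors may work quite differently.

```python
# Hypothetical sensor readings: (smoke level, was there really a fire?)
readings = [
    (0.05, False), (0.10, False), (0.20, False), (0.35, False),
    (0.30, True), (0.50, True), (0.70, True), (0.90, True),
]

def undesirable_counts(threshold, data):
    """Return (false_alarms, missed_fires) when the proxy signals at level >= threshold."""
    false_alarms = sum(1 for level, fire in data if level >= threshold and not fire)
    missed_fires = sum(1 for level, fire in data if level < threshold and fire)
    return false_alarms, missed_fires

# Sweep the parameter from sensitive (low threshold) to relaxed (high threshold).
for threshold in (0.15, 0.25, 0.40, 0.60):
    false_alarms, missed_fires = undesirable_counts(threshold, readings)
    print(f"threshold={threshold:.2f}  false alarms={false_alarms}  missed fires={missed_fires}")
```

With this made-up data, raising the threshold drives the false alarms down from 2 to 0 while the missed fires rise from 0 to 2. No threshold removes both, because the fire at level 0.30 sits below the non-fire reading at 0.35.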
Analysis: A proxy is not very interesting unless most of the time it produces desirable outcomes. The problem is that there are always some residual undesirable outcomes. One can often reduce the rate or probability of one of the undesirable outcomes, but the trade-off is that the rate or probability of the other undesirable outcome will increase. (This is an example of the engineering folklore principle that "there is a limited amount of goodness in the universe.") To decide how to set this trade-off, you need to know the cost of the two undesirable outcomes and the benefit of the two desirable outcomes. You may also need to know or estimate the probability that a new candidate for categorization will be In or Out, to estimate the frequencies of the two undesirable outcomes.
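One way to combine these quantities is an expected value per candidate. The sketch below assumes the costs and benefits can be put on a single numeric scale, which is often the hard part in practice. The function name `expected_value`, its parameter names, and every number in the example are assumptions chosen for illustration.

```python
def expected_value(p_in, miss_rate, false_alarm_rate,
                   cost_miss, cost_false_alarm,
                   benefit_detect=0.0, benefit_quiet=0.0):
    """Expected net value of categorizing one new candidate.

    p_in             probability that the candidate is really In
    miss_rate        P(proxy says Out | really In)
    false_alarm_rate P(proxy says In | really Out)
    """
    value_when_in = (1 - miss_rate) * benefit_detect - miss_rate * cost_miss
    value_when_out = (1 - false_alarm_rate) * benefit_quiet - false_alarm_rate * cost_false_alarm
    return p_in * value_when_in + (1 - p_in) * value_when_out

# Two hypothetical settings of the same proxy. Fires are rare (p_in = 0.001),
# a missed fire is assumed to cost 100,000 units and a false alarm 50 units.
sensitive = expected_value(p_in=0.001, miss_rate=0.01, false_alarm_rate=0.10,
                           cost_miss=100_000, cost_false_alarm=50)
relaxed = expected_value(p_in=0.001, miss_rate=0.10, false_alarm_rate=0.01,
                         cost_miss=100_000, cost_false_alarm=50)

print("sensitive setting:", sensitive)
print("relaxed setting:  ", relaxed)
```

Under these assumed numbers the sensitive setting has the better (less negative) expected value, even though it produces ten times as many false alarms. A different prior or a different cost ratio can reverse the ranking, which is why all of these inputs matter.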
Improvement: Reducing both undesirable outcomes simultaneously usually requires discovering a better proxy.
Relation to hints: One way to look at a proxy is that the categorizations it generates are only hints. Hints need to be verified. A better proxy then is one that either (1) reduces at least one of the undesirable outcomes without increasing the other one or (2) makes it easier to verify.
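As a rough sketch of treating a proxy's answer as a hint, the code below checks a hinted file location before trusting it and falls back to an authoritative lookup when the check fails. The names `lookup_with_hint`, `verify`, and `authoritative`, along with the file and server names, are hypothetical.

```python
def lookup_with_hint(key, hints, authoritative, verify):
    """Use the hint when it verifies; otherwise consult the authoritative source."""
    hint = hints.get(key)
    if hint is not None and verify(key, hint):
        return hint, "hint"
    value = authoritative(key)
    hints[key] = value  # refresh the hint for next time
    return value, "authoritative"

# Hypothetical ground truth: which server really holds each file.
actual_location = {"report.txt": "server-b", "notes.txt": "server-a"}

# Hints may be stale: report.txt has moved since its hint was recorded.
hints = {"report.txt": "server-a", "notes.txt": "server-a"}

def verify(name, server):
    """Stand-in for a cheap check, such as asking one server whether it has the file."""
    return actual_location.get(name) == server

def authoritative(name):
    """Stand-in for a slow but reliable search."""
    return actual_location[name]

notes_result = lookup_with_hint("notes.txt", hints, authoritative, verify)
report_result = lookup_with_hint("report.txt", hints, authoritative, verify)
print(notes_result)
print(report_result)
```

The design only pays off when verification is much cheaper than the authoritative lookup, which is the sense in which a proxy that is easier to verify is a better proxy.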
Representations: One can conveniently (and insightfully) represent a false-alarm/missed-event trade-off with either a Venn diagram or a 2 x 2 matrix.
Examples:
Start by asking the two questions for every example: (1) What are the real categories? (2) What are the proxy categories?
1. Smoke detector
real categories: fire/no fire
proxy categories: smoke detector signals/is quiet
                 | no fire                          | fire
-----------------|----------------------------------|----------------------
detector quiet   | everyone happy                   | missed fire, disaster
detector signals | nuisance alarm, wasted fire call | disaster averted
2. Document retrieval
real categories: wanted documents/unwanted documents
proxy categories: query matches/query misses
                | wanted                             | unwanted
----------------|------------------------------------|----------------------------------------------
matches query   | happy user                         | precision failure, must plod through junk
missed by query | recall failure, missed opportunity | happy user
3. Emergency locator beacon
real categories: ship safe/ship in distress
proxy categories: beacon signal/no signal
                      | ship OK       | ship sinking
----------------------|---------------|-------------
beacon quiet          | OK            | ship lost
beacon calls for help | wasted search | ship saved
4. Military "Identify-friend-or-foe" (IFF) system
real categories: friend or foe
proxy categories: IFF reports friend/foe
                | friend                 | foe
----------------|------------------------|------------
IFF says friend | OK                     | battle lost
IFF says FOE    | friendly fire incident | battle won
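For the document retrieval example (example 2), the two undesirable cells correspond to the usual precision and recall measures. The sketch below computes both from a set of wanted documents and a set of query matches. The function name `precision_recall` and the document identifiers are made up, and returning None for an empty set is just one possible convention.

```python
def precision_recall(wanted, matched):
    """Return (precision, recall); a value is None when its denominator is zero."""
    wanted, matched = set(wanted), set(matched)
    hits = wanted & matched  # wanted documents that the query matched
    precision = len(hits) / len(matched) if matched else None
    recall = len(hits) / len(wanted) if wanted else None
    return precision, recall

# Hypothetical collection: four wanted documents, three query matches.
wanted_docs = {"d1", "d2", "d3", "d4"}
query_matches = {"d1", "d2", "d5"}

precision, recall = precision_recall(wanted_docs, query_matches)
print(f"precision = {precision:.2f}")  # fraction of matches that were wanted
print(f"recall    = {recall:.2f}")     # fraction of wanted documents that matched
```

Here d5 is junk the user must plod through (a precision failure), while d3 and d4 are missed opportunities (recall failures).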
Today's example:
RISKS-LIST: Risks-Forum Digest Sunday 20 April 2003 Volume 22 : Issue 70
-----------------
Date: Fri, 18 Apr 2003 12:51:15 -0400
From: griffith@dweeb.org (Jim Griffith)
Subject: Turtle triggers search and rescue effort
The U.S. Coast Guard launched a massive search and rescue effort earlier this week after picking up an emergency distress beacon signal. They finally pinpointed the cause - a turtle had become tangled in a rope tied to a discarded beacon. The original owner was located, and he said he'd lost it some time ago.
http://www.cnn.com/2003/WORLD/americas/04/18/bermuda.turtle.search.ap/index.html
Saltzer@mit.edu