
Elimination with a Motive (Conor McBride)

Edward: Elimination rules play an important role in computations over datatypes in proof assistants like Coq. In his paper "Elimination with a Motive", Conor McBride argued that "we should exploit a hypothesis not in terms of its immediate consequences, but in terms of the leverage it exerts on an arbitrary goal: we should give elimination a motive." In other words, proofs in a refinement setting (backwards reasoning) should use their goals to guide elimination.
The purpose of this document is to port Conor's pedagogical examples to Coq notation, as well as give a tutorial about John Major equality (also known as heterogeneous equality). This is basically an annotated version of the first part of the paper--I reused most of the text, adding comments here and there as necessary.

Motivation

Conjunction and Disjunction

Conor: Undergraduates learning natural deduction are typically taught the following elimination rules for conjunction and disjunction.

Lemma and_project_l : forall {A B : Prop}, A /\ B -> A.
  tauto. Qed.
Lemma and_project_r : forall {A B : Prop}, A /\ B -> B.
  tauto. Qed.
Lemma or_elim : forall {A B C : Prop}, A \/ B -> (A -> C) -> (B -> C) -> C.
  destruct 1; tauto. Qed.

Students usually grasp the and_project rules easily, but find or_elim frightening. Edward: We can make this clear by attempting to do proofs in the forward style (here False is a stand-in for some goal which the student doesn't yet know they are proving):

Goal forall (A B : Prop), A /\ B -> False.
  intros. pose proof (and_project_l H).
Abort.

Goal forall (A B : Prop), A \/ B -> False.
  intros. try pose proof (or_elim H).
  (* Error: Cannot infer the implicit parameter C of or_elim. *)
Abort.

Coq has the same difficulty as our hypothetical student: the trouble is that C appears from nowhere, and students struggle to dream up C's which will eventually lead to their goal.
When they learn proof by refinement (backwards reasoning), or_elim finally makes sense: given A \/ B, we can instantiate C with our goal, splitting it into cases.

Goal forall (A B : Prop), A \/ B -> B \/ A.
  intros. apply (or_elim H); auto.
Qed.

The need to prove the goal is why we are eliminating A \/ B: the goal is the motive for the elimination. We can choose the appropriate motive precisely because 'C appears from nowhere': or_elim is parametric in its motive.
Elimination rules whose conclusion is a parameter--the motive variable--allow us to exploit a hypothesis whatever the goal, just as 'left rules' in sequent calculus analyse hypotheses regardless of what stands right of the turnstile. (Edward: To refresh your sequent calculus, check out http://logitext.mit.edu/logitext.fcgi/tutorial and scroll down to "Summary. Here are all the inference rules for first-order logic.")
In this light, the 'simplicity' of the and-project rules is less attractive: we may only exploit A /\ B when we want to know A or we want to know B. I join the many advocates of the 'Gentzenized' alternative (in reference to so-called 'Gentzen systems', also known as sequent calculi), exploiting A /\ B, whatever our motive C.

Lemma and_elim : forall {A B C : Prop}, A /\ B -> (A -> B -> C) -> C. tauto. Qed.
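
Edward: Used in the refinement style, and_elim lets us exploit a conjunction whatever the goal, just as or_elim did. For instance (this example is mine):

Goal forall (A B : Prop), A /\ B -> B /\ A.
  intros A B H. apply (and_elim H). intros a b. split; assumption.
Qed.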

Edward: As an aside, on the other side of the Curry-Howard correspondence, the projection rules are fst/snd, whereas the Gentzenized rule is uncurry.
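
Edward: In Coq terms (the definitions fst' and uncurry' below are mine, for illustration):

Definition fst' {A B : Prop} (p : A /\ B) : A :=
  match p with conj a _ => a end.
Definition uncurry' {A B C : Prop} (p : A /\ B) (k : A -> B -> C) : C :=
  match p with conj a b => k a b end.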

Structural Induction and Recursion

'Mathematical Induction' is another common example of elimination with a motive.

Lemma nat_induction : forall (P : nat -> Prop),
                        P 0 ->
                        (forall n, P n -> P (S n)) ->
                        forall n, P n. exact nat_ind. Qed.

Here, P stands for a family of propositions indexed by a number. Not even the most ardent 'forwardist' is bold enough to suggest that we should search our collection of established facts for a pair related by P in order to add forall n : nat, P n to that collection. nat_induction needs a motive not only because, like or_elim, it splits the proof into cases, but also because the abstract n being eliminated is instantiated in each case. The point of induction is not just to decompose a hypothesis but to simplify the goal: where constructor symbols appear, computation can happen.
If we allow P to stand for a nat-indexed family of types and supply the appropriate computational behavior, induction becomes dependently typed primitive recursion, supporting functions on n whose return type depends on n. The explicit indexing of P by numbers makes a strong connection to pattern matching and structural recursion. We can expose these patterns even in simply typed examples by making a definition:

Definition Plus := fun (x y : nat) => nat.

Goal forall x y : nat, Plus x y.
  induction x.
  (* forall y : nat, Plus 0 y *)
  Focus 2.
  (*
  x : nat
  IHx : forall y : nat, Plus x y
  ============================
   forall y : nat, Plus (S x) y
  *)

Abort.

The return type of the goal reads like the left-hand side of a functional program 'under construction'. Induction splits our programming problem in two: we can read off the instantiated patterns and, in the S case, the legitimate class of recursive calls.
An elimination rule with an indexed motive variable P justifies a kind of pattern analysis, 'matching' P's arguments in the conclusion (the goal patterns) against P's arguments in the premises (the subgoal patterns): P's arguments in inductive hypotheses (the recursion patterns) allow the corresponding recursive calls. To equip an elimination rule with a computational behavior is to give its associated pattern matching and structural recursion an operational semantics.

(* Illustrated, we have:

 forall (P : nat -> Prop),
                     P    0 ->    (* subgoal patterns *)
   (forall n, P n -> P (S n)) ->  (*      --"--       *)
             (* \__ recursion patterns *)
           forall n, P    n.      (* goal patterns *)

Or, more concretely:

                                    (forall y, Plus    0  y) -> (* subgoal patterns *)
  (forall x, (forall y, Plus x y) -> forall y, Plus (S x) y) -> (*      --"--       *)
             (* \__ recursion patterns *)
                                   forall x y, Plus    x  y     (* goal patterns *) *)
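
Edward: To make this concrete, here is one way to finish the 'program under construction', writing the structural recursion explicitly with the standard recursor nat_rect (this sketch is mine, not from the paper):

Definition plus' : forall x y : nat, Plus x y :=
  nat_rect (fun x => forall y, Plus x y)   (* the motive *)
           (fun y => y)                    (* subgoal pattern: Plus 0 y *)
           (fun x rec y => S (rec y)).     (* subgoal pattern: Plus (S x) y;
                                              rec embodies the recursion pattern *)

Eval compute in plus' 2 3.  (* = 5 *)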


Relation Induction

Inductively defined relations may also be presented with an elimination rule corresponding to induction on derivations. For example, <= may be defined as follows:

Inductive le : nat -> nat -> Prop :=
  | le_refl : forall n, le n n
  | le_succ : forall m n, le m n -> le m (S n).
Hint Constructors le.

Definition le_induction :
  forall (P : nat -> nat -> Prop),
    (forall n, P n n) ->
    (forall m n, le m n -> P m n -> P m (S n)) ->
    forall {m n}, le m n -> P m n.
  exact le_ind.
Qed.

Relation induction is easy to apply if the eliminated hypothesis, le m n, ranges over the entire domain of the relation: we can choose P by naive 'undergraduate' textual matching.

Goal forall m n, le m n -> le m n.
  induction 1; auto.
Qed.

However, if the hypothesis is instantiated, we still need a P indexed over the whole domain, so we employ a well-known goal transformation which I learned from James McKinna--use a general le m n, but constrain m and n with equations.

Goal forall (P : nat -> nat -> Prop) {m n}, le m n -> P m n.
  intro P. apply (le_induction (fun m n => P m n)).
Abort.

Goal forall {m}, le m 0 -> m = 0.
  intro. apply (le_induction _). (* not obvious *)
  trivial.
  (* In fact, Coq gets it wrong, picking (fun m n => m = n).
  This is not provable.

  m : nat
  ============================
   forall m0 n : nat, le m0 n -> m0 = n -> m0 = S n *)

Abort.

Goal forall {m}, le m 0 -> m = 0.
  intro. induction 1. trivial.
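  (* Again the step case is the same unprovable statement as above:
     induction 1 can do no better than the motive (fun m n => m = n). *)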
Abort.

We should generalize and add an equation.

Goal forall {m n}, le m n -> n = 0 -> m = 0.
  apply (le_induction (fun m n => n = 0 -> m = 0)); try inversion 3; trivial.
  (* Note: there is a typo in the paper: m <= n should not be included in P *)
Qed.

More generally, an elimination rule for an inductive relation R : forall (x : X) ..., Prop typically requires some P : forall (x : X) ..., Prop as its motive.

(* 
Goal forall (y : Y) ..., R (t y) ... -> P y ... .
  assert (forall (x : X) (y : Y) ..., R x ... -> x = t y -> ... -> P y ...).
  apply (induction (fun x ... => forall (y : Y) ..., x = t y -> ... -> P y ...)).
*)


Plugging this P and the proof of R (t y) ... into the rule delivers a proof of P y ... subject to the trivial equations t y = t y .... This technique gives a slightly clumsier P than the one we chose for m <= 0, which only constrains one argument and so needs only one equation. It is not hard to see how to remove the unnecessary equation.
Note that our chosen P x ... resembles the goal, but with some equations inserted and R (t y) ... missing. P is not indexed over the proof of R x ..., so elimination tells us nothing about it: we can safely omit it from the motive.

Induction for Dependent Datatypes

The datatype analogues of inductively defined relations are dependent families of datatypes, such as vectors--lists of a given length--defined as follows:

Inductive Vect (A : Type) : nat -> Type :=
  | vnil : Vect A 0
  | vcons : forall {n}, A -> Vect A n -> Vect A (S n).

Lemma Vect_elim : forall {A : Type} (P : forall {n : nat}, Vect A n -> Type),
                         P (vnil A) ->
                         (forall {n} (x : A) {xs}, @P n xs -> @P (S n) (vcons A x xs)) ->
                         forall {n} (xs : Vect A n), @P n xs.
  exact Vect_rect.
Qed.

(* NB: Conor wants A to be inferred for constructors but explicit for the type;
Coq doesn't support this. *)


Proof terms for relations are interesting only for what they say about the indices--their structure is unimportant. The terms in dependent datatypes are the actual data. Correspondingly, the motive P of Vect_elim is indexed not only over the length n, but also over the vector itself: we care whether a vector is vnil or vcons. On the other hand, P is not indexed over the element type A, which is parametric to the entire inductive definition.
'Constraint by equation' also works for instantiated datatypes. For example:

Definition VTail {A : Type} {m : nat} (xxs : Vect A (S m)) := Vect A m.

Require Import JMeq.
Infix "==" := JMeq (at level 70, no associativity).

Goal forall {A : Type} {m : nat} (xxs : Vect A (S m)), VTail xxs.
  intro A.
  assert (forall {n : nat} (xs : Vect A n) {m : nat} (xxs : Vect A (S m)),
            n = S m -> xs == xxs (* !! *) -> VTail xxs) as X.
  apply (Vect_elim (fun {n : nat} (xs : Vect A n) =>
                      forall {m : nat} (xxs: Vect A (S m)),
                        n = S m -> xs == xxs (* !! *) -> VTail xxs)).

Solving the equations refutes the base case premises...

  (*
  A : Type
  ============================
   forall (m : nat) (xxs : Vect A (S m)),
   0 = S m -> vnil A == xxs -> VTail xxs
  *)

  inversion 1.

...and reduces the step case to:

  (*
  A : Type
  ============================
   forall (n : nat) (x : A) (xs : Vect A n),
   (forall (m : nat) (xxs : Vect A (S m)), n = S m -> xs == xxs -> VTail xxs) ->
   forall (m : nat) (xxs : Vect A (S m)),
   S n = S m -> vcons A x xs == xxs -> VTail xxs
  *)

  inversion 2; subst. intro eq; rewrite <- eq.

The return type again shows the one pattern possible, and xs is the tail we seek.

  (*
  ============================
   VTail (vcons A x xs)
  *)

  exact xs.

  (* ungeneralize the hypothesis... *)
  intros; apply (X (S m) xxs); auto.
Qed.

Unlike with <=, our chosen P did quantify over the eliminated xxs--it is matched to the xs in the goal patterns. Omitted this time is A, parametric to the definition, so kept parametric in the elimination. We must be sensitive to these distinctions in order to deliver appropriate behaviour, whatever the elimination rule.

Equational Constraints and Dependent Types

By now, the eagle-eyed will have noticed that I write batched equations like x = t y ... without worrying about type safety. Indeed, in the above example, I wrote xs == xxs where xs : Vect A n and xxs : Vect A (S m). The conventional Martin-Löf definition of = forbids such heterogeneous equations, relating elements of different types. You can thus deduce that I am using an unconventional definition.
I define == as follows:

Print JMeq.

(*
Inductive JMeq (A : Type) (x : A) : forall B : Type, B -> Prop :=
    JMeq_refl : x == x
*)


Check JMeq_ind.

(*
JMeq_ind
     : forall (A : Type) (x : A) (P : A -> Prop),
       P x -> forall y : A, x == y -> P y
*)


This == can compare an element of any type to x, even if that type is not A. Correspondingly, we may form heterogeneous sequences of equations s == t .... However, the introduction and elimination rules follow the conventional homogeneous definition: we shall only treat something as an equal of x if its type really is A. I call this 'John Major' equality, because it widens aspirations to equality without affecting the practical outcome.
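
Edward: A quick demonstration (mine, not from the paper). A heterogeneous equation is well-formed at any pair of types, but to recover a homogeneous = the types must already agree; the library provides the axiom JMeq_eq for that step.

Check (fun (n : nat) (b : bool) => n == b).  (* well-formed, although nat and bool differ *)

Goal forall x y : nat, x == y -> x = y.
  intros x y H. apply JMeq_eq. exact H.
Qed.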
If s ... and t ... are sequences of terms in the same telescope (Edward: a telescope is a string of abstractions, i.e. a generalized sequence of bindings), then the leftmost equation s1 == t1 is homogeneous and thus vulnerable to elimination.
'John Major' equality is equivalent to extending Martin-Löf equality with Altenkirch and Streicher's 'uniqueness of identity proofs' axiom, often referred to as 'axiom K'. It is clear that the new equality subsumes the old. On the other hand, we can render a heterogeneous equation a == b as a homogeneous equation between pairs (A, a) = (B, b) in the type exists (T : Type), T. Clearly (A, a) equals itself. The elimination rule follows if a = a' is a consequence of (A, a) = (A, a'), and this is a well-known variant of axiom K. The details of this construction can be found in Conor McBride's thesis.
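
Edward: One direction of the pairing construction is easy to sketch in Coq, with sigT playing the role of exists (T : Type), T (the names packed and JMeq_packed are mine):

Definition packed {A B : Type} (a : A) (b : B) : Prop :=
  existT (fun T : Type => T) A a = existT (fun T : Type => T) B b.

Lemma JMeq_packed : forall {A B : Type} (a : A) (b : B), a == b -> packed a b.
  destruct 1. reflexivity.
Qed.

The converse--extracting a = a' from (A, a) = (A, a')--is exactly where the K-like axiom comes in, so we do not attempt it here.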

The rest

Edward: In the rest of the paper, Conor McBride describes a tactic, 'BasicElim', which automatically performs the procedure alluded to above.
Fortunately for the average Coq user, a variant of this tactic has been implemented, called 'dependent induction'. We can see how it solves both of our previous problems.

Require Import Coq.Program.Equality.

Goal forall {m}, le m 0 -> m = 0.
  intros m H. dependent induction H; trivial.
Qed.

Lemma vtail : forall {A : Type} {m : nat} (xxs : Vect A (S m)), VTail xxs.
  intros A m xxs. dependent induction xxs; trivial.
Qed.
Print Assumptions vtail.
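
The printout shows (approximately):

(*
Axioms:
JMeq_eq : forall (A : Type) (x y : A), x == y -> x = y
*)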

As you can see, dependent induction uses JMeq internally.
There is more discussion about JMeq in Adam Chlipala's book CPDT:
http://adam.chlipala.net/cpdt/html/Equality.html
JMeq (and axiom K) are very convenient assumptions to make, although in the light of new developments such as Homotopy Type Theory, there may be interesting mathematics to be done in type theories whose models do not validate axiom K. So while some of the more recent, practically oriented dependently typed languages (e.g. Idris) offer heterogeneous equality as the default, there are also many reasons why Coq developments try to avoid using it. Another downside of 'dependent induction' is that it is not very fast, which can cause problems for large developments.
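
Edward: For comparison, the first example above can also be proved without any axioms: plain inversion carries out the equational-constraint idea using only homogeneous equality. A sketch:

Goal forall m, le m 0 -> m = 0.
  intros m H. inversion H; subst; reflexivity.
Qed.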