

7.3.1 Classification of stationary points

Let us first recall the definitions of local extrema at stationary points:

Definition 7.3.1. Suppose that $ D : {\bf R}^n \rightarrow {\bf R}$ is a scalar field on $ {\bf R}^n$ . Let $ {\bf u}$ be a stationary point of $ D$ , that is $ \nabla D({\bf u}) = {\bf0}$ . Then

  1. $ {\bf u}$ is a local maximum if there exists a neighborhood $ U$ of $ {\bf u}$ such that for all $ {\bf v} \in U$ , $ D({\bf v}) \leq D({\bf u})$ .
  2. $ {\bf u}$ is a local minimum if there exists a neighborhood $ U$ of $ {\bf u}$ such that for all $ {\bf v} \in U$ , $ D({\bf v}) \geq D({\bf u})$ .

To illustrate the application of these definitions, let us begin with a simple one-parameter function $ D(u)$ (which may arise, for example, in the point-curve distance problem). Suppose $ u_0$ is a stationary point of $ D(u)$ , i.e. $ \dot{D}(u_0) = 0$ . Then a Taylor series expansion tells us that

$\displaystyle D(u_0+h) - D(u_0) = h \dot{D}(u_0) + \frac{1}{2}h^2 \ddot{D}(u_0 + ht) = \frac{1}{2}h^2 \ddot{D}(u_0 + ht)\;,$     (7.18)

where $ t \in [0,1] $ is some specific value depending on $ h$ . From Definition 7.3.1, we see that $ u_0$ is a local maximum if the right hand side of (7.18) is nonpositive for all $ h$ close to 0, and a local minimum if the right hand side is nonnegative for all $ h$ close to 0. Now, if $ \ddot{D}(u_0) \neq 0$ , then by continuity there is some $ \delta > 0$ such that for $ \vert h\vert < \delta$ , $ \ddot{D}(u_0 + ht)$ has the same sign as $ \ddot{D}(u_0)$ . Therefore, if $ \ddot{D}(u_0) < 0$ , then $ h^2 \ddot{D}(u_0 + ht) < 0$ for $ 0 < \vert h\vert < \delta$ , and hence $ u_0$ is a local maximum. Similarly, if $ \ddot{D}(u_0) > 0$ , then $ u_0$ is a local minimum. These results are familiar from calculus [166].
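
As a computational aside, this sign test is easy to automate when $ D$ is available only as a black-box function. The sketch below estimates $ \ddot{D}(u_0)$ by a central difference; the helper name classify_1d, the step h, and the tolerance tol are illustrative choices, not part of the text.

```python
def classify_1d(D, u0, h=1e-5, tol=1e-6):
    """Classify a stationary point u0 of D by the sign of an estimated
    second derivative (step h and tolerance tol are ad hoc choices)."""
    d2 = (D(u0 + h) - 2.0 * D(u0) + D(u0 - h)) / h**2  # ~ D''(u0)
    if d2 > tol:
        return "local minimum"
    if d2 < -tol:
        return "local maximum"
    return "degenerate: examine higher derivatives"

# D(u) = (u - 1)^2 has a stationary point at u0 = 1 with D'' = 2 > 0.
print(classify_1d(lambda u: (u - 1.0) ** 2, 1.0))  # -> local minimum
```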

Unfortunately, the problem becomes more complicated if $ \ddot{D}(u_0) = 0$ . In this case we cannot say anything about the sign of $ \ddot{D}(u_0 + ht)$ and must therefore expand the Taylor series to higher derivatives. For example, if the second derivative is zero but the third derivative is nonzero, then we have neither a maximum nor a minimum but a point of inflection. Consider the function $ y=x^3$ ; in any neighborhood of the stationary point $ x=0$ , the function takes on both positive and negative values, and thus $ x=0$ is neither a maximum nor a minimum. If the third derivative is also zero, we have to look at the fourth derivative, and so on. Fortunately, these exceptions are rare, but a robust point classification algorithm should be able to handle such cases. In general, if $ D^{(n)}(u_0)$ is the first derivative that does not vanish at $ u_0$ , then the function $ D(u)$ has [152]

  1. a local maximum at $ u_0$ if $ n$ is even and $ D^{(n)}(u_0) < 0$ ;

  2. a local minimum at $ u_0$ if $ n$ is even and $ D^{(n)}(u_0) > 0$ ;

  3. a point of inflection at $ u_0$ if $ n$ is odd.
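
This trichotomy is straightforward to apply once the derivative values at $ u_0$ are known. The following sketch assumes they are supplied as a list (a hypothetical interface; the tolerance is likewise an assumption) and reports the classification given by the first nonvanishing entry:

```python
def classify_by_derivatives(derivs, tol=1e-12):
    """Classify a stationary point u0 from the derivative values
    [D'(u0), D''(u0), D'''(u0), ...].  The first entry should be ~0,
    since u0 is assumed stationary."""
    for n, d in enumerate(derivs, start=1):
        if abs(d) > tol:
            if n % 2 == 1:  # first nonvanishing derivative of odd order
                return "point of inflection"
            return "local minimum" if d > 0 else "local maximum"
    return "inconclusive: all supplied derivatives vanish"

# y = x^3 at x = 0: D' = D'' = 0 and D''' = 6, so n = 3 is odd.
print(classify_by_derivatives([0.0, 0.0, 6.0]))  # -> point of inflection
```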

A similar analysis can be performed for functions of more than one variable. Consider a function $ D(u,v)$ of two variables, which might arise, for example, in a curve-curve distance problem. A Taylor expansion around the stationary point $ (u_0,v_0)$ to second order leads to

$\displaystyle D(u_0+h,v_0+k) - D(u_0,v_0) = \frac{h^2}{2}D_{uu} (u_0+ht, v_0+kt) + hk D_{uv} (u_0+ht, v_0+kt) + \frac{k^2}{2}D_{vv} (u_0+ht, v_0+kt)\;,$     (7.19)

where $ t \in [0,1] $ . Instead of one second order term, this time we have three to deal with, which makes classification more difficult. However, by completing the square in the second order terms, we can formulate conditions for minima, maxima, and saddle points. To see this, let $ a \equiv \frac{D_{uu}}{2}$ , $ b \equiv \frac{D_{uv}}{2}$ , and $ c \equiv \frac{D_{vv}}{2}$ , where the second partials are evaluated at $ (u_0+ht, v_0+kt)$ . Then, assuming $ a \neq 0$ , we have

$\displaystyle D(u_0+h,v_0+k) - D(u_0, v_0) = a h^2 + 2b h k + c k^2 = a\left(h+\frac{b}{a} k\right)^2 + \left(c-\frac{b^2}{a}\right) k^2\;.$     (7.20)
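
The algebra in (7.20) can be spot-checked numerically; the loop below, with arbitrary sample ranges, verifies the identity on random coefficients with $ a \neq 0$ (a sanity check, not a proof):

```python
import random

# Verify a*h^2 + 2*b*h*k + c*k^2 == a*(h + (b/a)*k)^2 + (c - b^2/a)*k^2
# on randomly sampled values (requires a != 0).
for _ in range(1000):
    a = random.uniform(0.5, 2.0)
    b, c = random.uniform(-2.0, 2.0), random.uniform(-2.0, 2.0)
    h, k = random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)
    lhs = a * h**2 + 2.0 * b * h * k + c * k**2
    rhs = a * (h + (b / a) * k) ** 2 + (c - b**2 / a) * k**2
    assert abs(lhs - rhs) < 1e-10
print("identity (7.20) holds on all sampled values")
```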

Now $ h$ and $ k$ appear only within squared expressions. This new form enables us to ignore the signs of $ h$ and $ k$ and instead concentrate on the signs of the coefficients $ a$ and $ (c-\frac{b^2}{a})$ . To see why this helps, notice that within any arbitrarily small neighborhood of $ (u_0,v_0)$ , the ratio $ \frac{h}{k}$ takes on all possible values between $ -\infty$ and $ \infty$ , so the first term on the right hand side of (7.20) can become arbitrarily small or arbitrarily large compared to the second term. The sign of each term, however, is determined solely by the signs of $ a$ and $ (c-\frac{b^2}{a})$ . If both coefficients are nonzero at $ (u_0,v_0)$ , a continuity argument shows that the expressions $ D_{uu}(u_0+ht, v_0+kt)$ and $ D_{vv}(u_0+ht, v_0+kt) - \frac{D_{uv}^2(u_0+ht, v_0+kt)}{D_{uu}(u_0+ht, v_0+kt)}$ do not change sign as long as $ (u_0+ht, v_0+kt)$ is sufficiently close to $ (u_0,v_0)$ . Therefore, in order to determine the signs of these coefficients within a small neighborhood of $ (u_0,v_0)$ , we can simply evaluate them at $ (u_0,v_0)$ . If the coefficients are nonzero there, then their signs enable us to classify the stationary point. For example, if both coefficients are positive at the stationary point, then the right hand side of (7.20) is positive within some neighborhood of the stationary point, and hence $ (u_0,v_0)$ can be classified as a minimum. If the first coefficient is negative and the second is positive, then because either term may be made zero in different parts of any neighborhood, the right hand side of (7.20) may be positive or negative arbitrarily close to the stationary point, and thus $ (u_0,v_0)$ is neither a maximum nor a minimum. The other possibilities lead to the following theorem, which is proven more fully in [166]:

Theorem 7.3.1.

  1. If $ D_{uu} < 0$ and $ D_{uv}^2 < D_{uu} D_{vv}$ at the stationary point $ (u_0,v_0)$ , then $ (u_0,v_0)$ is a local maximum.

  2. If $ D_{uu} > 0$ and $ D_{uv}^2 < D_{uu} D_{vv}$ , then $ (u_0,v_0)$ is a local minimum.

  3. If $ D_{uv}^2 > D_{uu} D_{vv}$ , then $ (u_0,v_0)$ is a saddle point (neither a maximum nor a minimum).

  4. If none of the above conditions apply, then it is necessary to examine higher-order derivatives.

Notice that the third condition above applies even if $ D_{uu} = 0$ . In this case, the second order term reduces to $ 2bhk + c k^2$ . As long as $ b \neq 0$ , specific choices of $ h$ and $ k$ within any arbitrarily small neighborhood of the stationary point can make this term either positive or negative, and hence $ (u_0,v_0)$ is neither a maximum nor a minimum.
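
Theorem 7.3.1 translates directly into a classification routine once the second partials at the stationary point are known. In the sketch below, the function name and the round-off tolerance are illustrative assumptions:

```python
def classify_2d(Duu, Duv, Dvv, tol=1e-12):
    """Apply Theorem 7.3.1 given the second partials of D evaluated at
    the stationary point (u0, v0)."""
    disc = Duu * Dvv - Duv**2      # > 0 exactly when Duv^2 < Duu*Dvv
    if disc > tol:                 # here Duu cannot vanish
        return "local maximum" if Duu < 0 else "local minimum"
    if disc < -tol:
        return "saddle point"
    return "degenerate: examine higher-order derivatives"

# D(u, v) = u^2 - v^2 at (0, 0): Duu = 2, Duv = 0, Dvv = -2.
print(classify_2d(2.0, 0.0, -2.0))  # -> saddle point
```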

It is possible to complete the square for functions of more than two variables; however, in order to develop general conditions for minima and maxima of such functions, we will employ the powerful theory of quadratic forms [410].

Definition 7.3.2. A quadratic form is an expression of the form $ {\bf u}^{\rm T} A {\bf u}$ , where $ {\bf u} = (u_1, u_2, \ldots, u_n)^{\rm T}$ is an $ n$ -dimensional column vector of unknowns and $ A$ is an $ n \times n$ matrix. A quadratic form $ {\bf u}^{\rm T} A {\bf u}$ is said to be

  1. positive definite if $ {\bf u}^{\rm T} A {\bf u} > 0$ for all $ {\bf u} \neq {\bf0}$ .

  2. positive semidefinite if $ {\bf u}^{\rm T} A {\bf u} \geq 0$ for all $ {\bf u}$ .

  3. negative definite (semidefinite) if $ -{\bf u}^{\rm T} A {\bf u}$ is positive definite (semidefinite).

  4. indefinite otherwise (i.e. the form takes on both positive and negative values).
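
These definitions can be made concrete by evaluating a form at sample vectors. The matrices below are arbitrary illustrations; note that sampling can expose indefiniteness but can never prove definiteness, which is why the determinant tests that follow are needed:

```python
import numpy as np

def quad_form(A, u):
    """Evaluate the quadratic form u^T A u."""
    return float(u @ A @ u)

A_pos = np.array([[2.0, 0.0], [0.0, 3.0]])   # positive definite
A_ind = np.array([[1.0, 0.0], [0.0, -1.0]])  # indefinite

for u in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    print(quad_form(A_pos, u), quad_form(A_ind, u))
# A_pos yields 2.0 and 3.0 (always positive); A_ind yields 1.0 and -1.0,
# taking both signs, so A_ind is indefinite.
```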

To see how these quadratic forms are used, let us take a Taylor expansion of a function $ D({\bf u})$ of $ n$ variables around the stationary point $ {\bf u} = {\bf u}_0$ :

$\displaystyle D({\bf u}_0 + {\bf h})-D({\bf u}_0) = \frac{1}{2} {\bf h}^{\rm T} H {\bf h} + O (\vert{\bf h}\vert^3)\;.$     (7.21)

Here $ H $ is the Hessian matrix of $ D$ ; the element in the $ i$ th row and $ j$ th column is given by
$\displaystyle H_{ij} = \frac{\partial^2 D}{\partial u_i \partial u_j} ({\bf u}_0)\;.$     (7.22)

Now, if the quadratic form $ {\bf h}^{\rm T} H {\bf h}$ is positive definite, then the quadratic term dominates the $ O(\vert{\bf h}\vert^3)$ remainder for sufficiently small $ \vert{\bf h}\vert$ , so within some neighborhood of the stationary point $ {\bf u}_0$ the right hand side of (7.21) is nonnegative, and therefore $ {\bf u}_0$ is a local minimum. Similarly, if the quadratic form is negative definite, then $ {\bf u}_0$ is a local maximum.

At this point, we can use a familiar theorem of linear algebra whose proof is given in [410]:

Theorem 7.3.2. Let $ {\bf u}^{\rm T} A {\bf u}$ be a quadratic form, let $ A$ be a real symmetric matrix, and for $ i = 1, 2, \ldots, n$ , let $ d_i$ be the determinant of the upper left $ i \times i$ submatrix of $ A$ (which we will denote $ A_i$ ). Then the quadratic form is positive definite if and only if $ d_i > 0$ for all $ i$ .

If the $ >$ condition in the above theorem is relaxed to $ \geq$ , the corresponding criterion for positive semidefiniteness requires all principal minors, not just the upper left ones, to be nonnegative; checking $ d_i \geq 0$ for the upper left submatrices alone is not sufficient (consider the diagonal matrix with entries 0 and $ -1$ ).

Theorem 7.3.2 immediately gives us a necessary and sufficient condition for negative definiteness as well. For, if we let $ B \equiv -A$ , the quadratic form $ {\bf u}^{\rm T} A {\bf u}$ is negative definite if and only if $ {\bf u}^{\rm T} B {\bf u}$ is positive definite, which holds if and only if $ \det B_i > 0$ for all $ i$ . But $ \det B_i = (-1)^i \det A_i$ , and so we have the following corollary:

Corollary 7.3.1. Let $ A$ and $ d_i$ be as before. Then the quadratic form $ {\bf u}^{\rm T} A {\bf u}$ is negative definite if and only if $ (-1)^i d_i > 0$ for all $ i$ .
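
Theorem 7.3.2 and Corollary 7.3.1 amount to computing the leading determinants $ d_i$ and checking their signs. A minimal sketch (the helper names are ours):

```python
import numpy as np

def leading_minors(A):
    """Determinants d_1, ..., d_n of the upper left i x i submatrices A_i."""
    return [np.linalg.det(A[:i, :i]) for i in range(1, A.shape[0] + 1)]

def is_positive_definite(A, tol=1e-12):
    """Theorem 7.3.2: d_i > 0 for all i."""
    return all(d > tol for d in leading_minors(A))

def is_negative_definite(A, tol=1e-12):
    """Corollary 7.3.1: (-1)^i d_i > 0 for all i."""
    return all((-1) ** i * d > tol
               for i, d in enumerate(leading_minors(A), start=1))

A = np.array([[2.0, 1.0], [1.0, 2.0]])        # d_1 = 2, d_2 = 3
print(is_positive_definite(A), is_negative_definite(-A))  # True True
```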

Simply applying these conditions gives us the following theorem:

Theorem 7.3.3. Let $ D$ be as before, and let $ H $ be its Hessian matrix, evaluated at the stationary point $ {\bf u}_0$ . Let $ d_i$ be the determinant of the upper left $ i \times i$ submatrix $ H_i$ of $ H $ .

  1. If $ d_i > 0$ for all $ i$ , then $ {\bf u}_0$ is a local minimum.

  2. If $ (-1)^i d_i > 0$ for all $ i$ , then $ {\bf u}_0$ is a local maximum.

If, on the other hand, the form $ {\bf h}^{\rm T} H {\bf h}$ is indefinite, we can conclude that $ {\bf u}_0$ is neither a minimum nor a maximum (in two dimensions, such points are usually called saddle points), as the following theorem shows:

Theorem 7.3.4. Let $ H, D, {\bf u}_0,$ and $ d_i$ be as before. Suppose that the $ d_i$ do not satisfy the conditions for definiteness or semidefiniteness. Then $ {\bf u}_0$ is neither a local minimum nor a local maximum.

Proof: Because $ {\bf h}^{\rm T} H {\bf h}$ is indefinite, there exist vectors $ {\bf h}_1$ and $ {\bf h}_2$ such that $ {\bf h}_1^{\rm T} H {\bf h}_1 < 0$ and $ {\bf h}_2^{\rm T} H {\bf h}_2 > 0$ . Pick an arbitrary neighborhood $ U$ of $ {\bf u}_0$ . Then for any $ \varepsilon > 0$ smaller than some positive number $ \delta$ , we have $ {\bf u}_0 + \varepsilon {\bf h}_1 \in U$ , $ {\bf u}_0 + \varepsilon {\bf h}_2 \in U$ , $ (\varepsilon {\bf h}_1)^{\rm T} H (\varepsilon {\bf h}_1) < 0$ , and $ (\varepsilon {\bf h}_2)^{\rm T} H (\varepsilon {\bf h}_2) > 0$ .

Using (7.21), we obtain

$\displaystyle D({\bf u}_0+\varepsilon {\bf h}_1) - D({\bf u}_0) = \frac{1}{2} \varepsilon^2 {\bf h}_1^{\rm T} H {\bf h}_1 + O(\varepsilon^3 \vert{\bf h}_1\vert^3)\;,$     (7.23)
$\displaystyle D({\bf u}_0+\varepsilon {\bf h}_2) - D({\bf u}_0) = \frac{1}{2} \varepsilon^2 {\bf h}_2^{\rm T} H {\bf h}_2 + O(\varepsilon^3 \vert{\bf h}_2\vert^3)\;.$     (7.24)

Clearly, for sufficiently small $ \varepsilon$ , the right hand side of (7.23) will remain negative, while the right hand side of (7.24) will remain positive. Thus we have, for sufficiently small $ \varepsilon$ ,
$\displaystyle D({\bf u}_0+\varepsilon {\bf h}_1) < D({\bf u}_0) < D({\bf u}_0+\varepsilon {\bf h}_2)\;.$     (7.25)

Since $ U$ was an arbitrary neighborhood of $ {\bf u}_0$ , there is no neighborhood of $ {\bf u}_0$ in which $ D({\bf u}_0) \geq D({\bf u})$ for all $ {\bf u} \in U$ or in which $ D({\bf u}_0) \leq D({\bf u})$ for all $ {\bf u} \in U$ . Therefore $ {\bf u}_0$ is neither a local minimum nor a local maximum.

Unfortunately, trouble can occur if the Hessian is only semidefinite and not definite. In this case we need to look at higher order derivatives or use some other method to classify the point precisely. Although this is not too difficult for one-parameter functions, it becomes progressively more involved as the number of variables increases (see, for example, [323,129]).
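
Putting the pieces together, a sketch of a complete classifier follows. It applies Theorem 7.3.3 through the leading minors and, for Theorem 7.3.4, detects indefiniteness through the eigenvalues of the symmetric Hessian (an equivalent test not used in the text); the semidefinite case is left unresolved, as discussed above:

```python
import numpy as np

def classify_stationary_point(H, tol=1e-12):
    """Classify a stationary point from its Hessian H (assumed symmetric)."""
    n = H.shape[0]
    d = [np.linalg.det(H[:i, :i]) for i in range(1, n + 1)]
    if all(di > tol for di in d):                          # Theorem 7.3.3, case 1
        return "local minimum"
    if all((-1) ** i * di > tol for i, di in enumerate(d, start=1)):
        return "local maximum"                             # Theorem 7.3.3, case 2
    eig = np.linalg.eigvalsh(H)
    if eig.min() < -tol and eig.max() > tol:               # indefinite form
        return "neither minimum nor maximum"               # Theorem 7.3.4
    return "semidefinite: examine higher-order derivatives"

H = np.array([[2.0, 0.0, 0.0],
              [0.0, -1.0, 0.0],
              [0.0, 0.0, 3.0]])
print(classify_stationary_point(H))  # -> neither minimum nor maximum
```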

