

7.3.1 Classification of stationary points

Let us first recall the definitions of local extrema at stationary points:

Definition 7.3.1. Suppose that $ D : {\bf R}^n \rightarrow {\bf R}$ is a scalar field on $ {\bf R}^n$ . Let $ {\bf u}$ be a stationary point of $ D$ , that is $ \nabla D({\bf u}) = {\bf0}$ . Then

  1. $ {\bf u}$ is a local maximum if there exists a neighborhood $ U$ of $ {\bf u}$ such that for all $ {\bf v} \in U$ , $ D({\bf v}) \leq D({\bf u})$ .
  2. $ {\bf u}$ is a local minimum if there exists a neighborhood $ U$ of $ {\bf u}$ such that for all $ {\bf v} \in U$ , $ D({\bf v}) \geq D({\bf u})$ .

To illustrate the application of these definitions, let us begin with a simple one-parameter function $ D(u)$ (which may arise, for example, in the point-curve distance problem). Suppose $ u_0$ is a stationary point of $ D(u)$ , i.e. $ \dot{D}(u_0) = 0$ . Then a Taylor series expansion tells us that

$\displaystyle D(u_0+h) - D(u_0) = h \dot{D}(u_0) + \frac{1}{2}h^2 \ddot{D}(u_0 + ht) = \frac{1}{2}h^2 \ddot{D}(u_0 + ht)\;,$     (7.18)

where $ t \in [0,1] $ is some specific value depending on $ h$ . From Definition 7.3.1, we see that $ u_0$ is a local maximum if the right hand side of (7.18) is nonpositive for all $ h$ close to 0, and a local minimum if the right hand side is nonnegative for all $ h$ close to 0. Now, if $ \ddot{D}(u_0) \neq 0$ , then by continuity there is some $ \delta > 0$ such that for $ \vert h\vert < \delta$ , $ \ddot{D}(u_0 + ht)$ has the same sign as $ \ddot{D}(u_0)$ . Therefore, if $ \ddot{D}(u_0) < 0$ , then $ h^2 \ddot{D}(u_0 + ht) < 0$ for $ 0 < \vert h\vert < \delta$ , and hence $ u_0$ is a local maximum. Similarly, if $ \ddot{D}(u_0) > 0$ , then $ u_0$ is a local minimum. These results are familiar from calculus [166].
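
As a computational aside, this sign test is easy to automate when $ D$ is available only as a black-box function. The sketch below estimates $ \ddot{D}(u_0)$ by a central difference; the helper name classify_1d, the step h, and the tolerance tol are illustrative choices, not part of the text.

```python
def classify_1d(D, u0, h=1e-5, tol=1e-6):
    """Classify a stationary point u0 of D by the sign of an estimated
    second derivative (step h and tolerance tol are ad hoc choices)."""
    d2 = (D(u0 + h) - 2.0 * D(u0) + D(u0 - h)) / h**2  # ~ D''(u0)
    if d2 > tol:
        return "local minimum"
    if d2 < -tol:
        return "local maximum"
    return "degenerate: examine higher derivatives"

# D(u) = (u - 1)^2 has a stationary point at u0 = 1 with D'' = 2 > 0.
print(classify_1d(lambda u: (u - 1.0) ** 2, 1.0))  # -> local minimum
```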

Unfortunately, the problem becomes more complicated if $ \ddot{D}(u_0) = 0$ . In this case we cannot say anything about the sign of $ \ddot{D}(u_0 + ht)$ and must therefore expand the Taylor series to higher derivatives. For example, if the second derivative is zero but the third derivative is nonzero, then we have neither a maximum nor a minimum but a point of inflection. Consider the function $ y=x^3$ ; in any neighborhood of the stationary point $ x=0$ , the function takes on both positive and negative values, and thus $ x=0$ is neither a maximum nor a minimum. If the third derivative is also zero, we have to look at the fourth derivative, and so on. Fortunately, these exceptions are rare, but a robust point classification algorithm should be able to handle such cases. In general, if $ D^{(n)}(u_0)$ is the first derivative that does not vanish at $ u_0$ , then the function $ D(u)$ has [152]

  1. a local maximum at $ u_0$ if $ n$ is even and $ D^{(n)}(u_0) < 0$ ;

  2. a local minimum at $ u_0$ if $ n$ is even and $ D^{(n)}(u_0) > 0$ ;

  3. a point of inflection at $ u_0$ if $ n$ is odd.
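
This trichotomy is straightforward to apply once the derivative values at $ u_0$ are known. The following sketch assumes they are supplied as a list (a hypothetical interface; the tolerance is likewise an assumption) and reports the classification given by the first nonvanishing entry:

```python
def classify_by_derivatives(derivs, tol=1e-12):
    """Classify a stationary point u0 from the derivative values
    [D'(u0), D''(u0), D'''(u0), ...].  The first entry should be ~0,
    since u0 is assumed stationary."""
    for n, d in enumerate(derivs, start=1):
        if abs(d) > tol:
            if n % 2 == 1:  # first nonvanishing derivative of odd order
                return "point of inflection"
            return "local minimum" if d > 0 else "local maximum"
    return "inconclusive: all supplied derivatives vanish"

# y = x^3 at x = 0: D' = D'' = 0 and D''' = 6, so n = 3 is odd.
print(classify_by_derivatives([0.0, 0.0, 6.0]))  # -> point of inflection
```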

A similar analysis can be performed for functions of more than one variable. Consider a function $ D(u,v)$ of two variables, which might arise, for example, in a curve-curve distance problem. A Taylor expansion around the stationary point $ (u_0,v_0)$ to second order leads to

$\displaystyle D(u_0+h,v_0+k) - D(u_0,v_0) = \frac{h^2}{2}D_{uu} (u_0+ht, v_0+kt) + hk D_{uv} (u_0+ht, v_0+kt) + \frac{k^2}{2}D_{vv} (u_0+ht, v_0+kt)\;,$     (7.19)

where $ t \in [0,1] $ . Instead of one second order term, this time we have three to deal with, which makes classification more difficult. However, by completing the square in the second order terms, we can formulate conditions for minima, maxima, and saddle points. To see this, let $ a \equiv \frac{D_{uu}}{2}$ , $ b \equiv \frac{D_{uv}}{2}$ , and $ c \equiv \frac{D_{vv}}{2}$ , where the second partials are evaluated at $ (u_0+ht, v_0+kt)$ . Then, assuming $ a \neq 0$ , we have

$\displaystyle D(u_0+h,v_0+k) - D(u_0, v_0) = a h^2 + 2b h k + c k^2 = a\left(h+\frac{b}{a} k\right)^2 + \left(c-\frac{b^2}{a}\right) k^2\;.$     (7.20)
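
The algebra in (7.20) can be spot-checked numerically; the loop below, with arbitrary sample ranges, verifies the identity on random coefficients with $ a \neq 0$ (a sanity check, not a proof):

```python
import random

# Verify a*h^2 + 2*b*h*k + c*k^2 == a*(h + (b/a)*k)^2 + (c - b^2/a)*k^2
# on randomly sampled values (requires a != 0).
for _ in range(1000):
    a = random.uniform(0.5, 2.0)
    b, c = random.uniform(-2.0, 2.0), random.uniform(-2.0, 2.0)
    h, k = random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)
    lhs = a * h**2 + 2.0 * b * h * k + c * k**2
    rhs = a * (h + (b / a) * k) ** 2 + (c - b**2 / a) * k**2
    assert abs(lhs - rhs) < 1e-10
print("identity (7.20) holds on all sampled values")
```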

Now $ h$ and $ k$ appear only within squared expressions. This new form enables us to ignore the signs of $ h$ and $ k$ and instead concentrate on the signs of the coefficients $ a$ and $ (c-\frac{b^2}{a})$ . To see why this helps, notice that within any arbitrarily small neighborhood of $ (u_0,v_0)$ , the ratio $ \frac{h}{k}$ takes on all possible values between $ -\infty$ and $ \infty$ , so the first term on the right hand side of (7.20) can become arbitrarily small or arbitrarily large compared to the second term. The sign of each term, however, is determined solely by the signs of $ a$ and $ (c-\frac{b^2}{a})$ . If both coefficients are nonzero at $ (u_0,v_0)$ , a continuity argument shows that the expressions $ D_{uu}(u_0+ht, v_0+kt)$ and $ D_{vv}(u_0+ht, v_0+kt) - \frac{D_{uv}^2(u_0+ht, v_0+kt)}{D_{uu}(u_0+ht, v_0+kt)}$ do not change sign as long as $ (u_0+ht, v_0+kt)$ is sufficiently close to $ (u_0,v_0)$ . Therefore, in order to determine the signs of these coefficients within a small neighborhood of $ (u_0,v_0)$ , we can simply evaluate them at $ (u_0,v_0)$ . If the coefficients are nonzero there, then their signs enable us to classify the stationary point. For example, if both coefficients are positive at the stationary point, then the right hand side of (7.20) is positive within some neighborhood of the stationary point, and hence $ (u_0,v_0)$ can be classified as a minimum. If the first coefficient is negative and the second is positive, then because either term may be made zero in different parts of any neighborhood, the right hand side of (7.20) may be positive or negative arbitrarily close to the stationary point, and thus $ (u_0,v_0)$ is neither a maximum nor a minimum. The other possibilities lead to the following theorem, which is proven more fully in [166]:

Theorem 7.3.1.

  1. If $ D_{uu} < 0$ and $ D_{uv}^2 < D_{uu} D_{vv}$ at the stationary point $ (u_0,v_0)$ , then $ (u_0,v_0)$ is a local maximum.

  2. If $ D_{uu} > 0$ and $ D_{uv}^2 < D_{uu} D_{vv}$ , then $ (u_0,v_0)$ is a local minimum.

  3. If $ D_{uv}^2 > D_{uu} D_{vv}$ , then $ (u_0,v_0)$ is a saddle point (neither a maximum nor a minimum).

  4. If none of the above conditions apply, then it is necessary to examine higher-order derivatives.

Notice that the third condition above applies even if $ D_{uu} = 0$ . In this case, the second order term reduces to $ 2bhk + c k^2$ . As long as $ b \neq 0$ , specific choices of $ h$ and $ k$ within any arbitrarily small neighborhood of the stationary point can make this term either positive or negative, and hence $ (u_0,v_0)$ is neither a maximum nor a minimum.
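
Theorem 7.3.1 translates directly into a classification routine once the second partials at the stationary point are known. In the sketch below, the function name and the round-off tolerance are illustrative assumptions:

```python
def classify_2d(Duu, Duv, Dvv, tol=1e-12):
    """Apply Theorem 7.3.1 given the second partials of D evaluated at
    the stationary point (u0, v0)."""
    disc = Duu * Dvv - Duv**2      # > 0 exactly when Duv^2 < Duu*Dvv
    if disc > tol:                 # here Duu cannot vanish
        return "local maximum" if Duu < 0 else "local minimum"
    if disc < -tol:
        return "saddle point"
    return "degenerate: examine higher-order derivatives"

# D(u, v) = u^2 - v^2 at (0, 0): Duu = 2, Duv = 0, Dvv = -2.
print(classify_2d(2.0, 0.0, -2.0))  # -> saddle point
```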

It is possible to complete the square for functions of more than two variables; however, in order to develop general conditions for minima and maxima of such functions, we will employ the powerful theory of quadratic forms [410].

Definition 7.3.2. A quadratic form is an expression of the form $ {\bf u}^{\rm T} A {\bf u}$ , where $ {\bf u} = (u_1, u_2, \ldots, u_n)^{\rm T}$ is an $ n$ -dimensional column vector of unknowns and $ A$ is an $ n \times n$ matrix. A quadratic form $ {\bf u}^{\rm T} A {\bf u}$ is said to be

  1. positive definite if $ {\bf u}^{\rm T} A {\bf u} > 0$ for all $ {\bf u} \neq {\bf0}$ .

  2. positive semidefinite if $ {\bf u}^{\rm T} A {\bf u} \geq 0$ for all $ {\bf u}$ .

  3. negative definite (semidefinite) if $ -{\bf u}^{\rm T} A {\bf u}$ is positive definite (semidefinite).

  4. indefinite otherwise (i.e. the form takes on both positive and negative values).
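
These definitions can be made concrete by evaluating a form at sample vectors. The matrices below are arbitrary illustrations; note that sampling can expose indefiniteness but can never prove definiteness, which is why the determinant tests that follow are needed:

```python
import numpy as np

def quad_form(A, u):
    """Evaluate the quadratic form u^T A u."""
    return float(u @ A @ u)

A_pos = np.array([[2.0, 0.0], [0.0, 3.0]])   # positive definite
A_ind = np.array([[1.0, 0.0], [0.0, -1.0]])  # indefinite

for u in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    print(quad_form(A_pos, u), quad_form(A_ind, u))
# A_pos yields 2.0 and 3.0 (always positive); A_ind yields 1.0 and -1.0,
# taking both signs, so A_ind is indefinite.
```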

To see how these quadratic forms are used, let us take a Taylor expansion of a function $ D({\bf u})$ of $ n$ variables around the stationary point $ {\bf u} = {\bf u}_0$ :

$\displaystyle D({\bf u}_0 + {\bf h})-D({\bf u}_0) = \frac{1}{2} {\bf h}^{\rm T} H {\bf h} + O (\vert{\bf h}\vert^3)\;.$     (7.21)

Here $ H $ is the Hessian matrix of $ D$ ; the element in the $ i$ th row and $ j$ th column is given by
$\displaystyle H_{ij} = \frac{\partial^2 D}{\partial u_i \partial u_j} ({\bf u}_0)\;.$     (7.22)

Now, if the quadratic form $ {\bf h}^{\rm T} H {\bf h}$ is positive definite, then the quadratic term dominates the $ O(\vert{\bf h}\vert^3)$ remainder for sufficiently small $ \vert{\bf h}\vert$ , so within some neighborhood of the stationary point $ {\bf u}_0$ the right hand side of (7.21) is nonnegative, and therefore $ {\bf u}_0$ is a local minimum. Similarly, if the quadratic form is negative definite, then $ {\bf u}_0$ is a local maximum.

At this point, we can use a familiar theorem of linear algebra whose proof is given in [410]:

Theorem 7.3.2. Let $ {\bf u}^{\rm T} A {\bf u}$ be a quadratic form, let $ A$ be a real symmetric matrix, and for $ i = 1, 2, \ldots, n$ , let $ d_i$ be the determinant of the upper left $ i \times i$ submatrix of $ A$ (which we will denote $ A_i$ ). Then the quadratic form is positive definite if and only if $ d_i > 0$ for all $ i$ .

If the $ >$ condition in the above theorem is relaxed to $ \geq$ , the corresponding criterion for positive semidefiniteness requires all principal minors, not just the upper left ones, to be nonnegative; checking $ d_i \geq 0$ for the upper left submatrices alone is not sufficient (consider the diagonal matrix with entries 0 and $ -1$ ).

Theorem 7.3.2 immediately gives us a necessary and sufficient condition for negative definiteness as well. For, if we let $ B \equiv -A$ , the quadratic form $ {\bf u}^{\rm T} A {\bf u}$ is negative definite if and only if $ {\bf u}^{\rm T} B {\bf u}$ is positive definite, which holds if and only if $ \det B_i > 0$ for all $ i$ . But $ \det B_i = (-1)^i \det A_i$ , and so we have the following corollary:

Corollary 7.3.1. Let $ A$ and $ d_i$ be as before. Then the quadratic form $ {\bf u}^{\rm T} A {\bf u}$ is negative definite if and only if $ (-1)^i d_i > 0$ for all $ i$ .
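
Theorem 7.3.2 and Corollary 7.3.1 amount to computing the leading determinants $ d_i$ and checking their signs. A minimal sketch (the helper names are ours):

```python
import numpy as np

def leading_minors(A):
    """Determinants d_1, ..., d_n of the upper left i x i submatrices A_i."""
    return [np.linalg.det(A[:i, :i]) for i in range(1, A.shape[0] + 1)]

def is_positive_definite(A, tol=1e-12):
    """Theorem 7.3.2: d_i > 0 for all i."""
    return all(d > tol for d in leading_minors(A))

def is_negative_definite(A, tol=1e-12):
    """Corollary 7.3.1: (-1)^i d_i > 0 for all i."""
    return all((-1) ** i * d > tol
               for i, d in enumerate(leading_minors(A), start=1))

A = np.array([[2.0, 1.0], [1.0, 2.0]])        # d_1 = 2, d_2 = 3
print(is_positive_definite(A), is_negative_definite(-A))  # True True
```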

Simply applying these conditions gives us the following theorem:

Theorem 7.3.3. Let $ D$ be as before, and let $ H $ be its Hessian matrix, evaluated at the stationary point $ {\bf u}_0$ . Let $ d_i$ be the determinant of the upper left $ i \times i$ submatrix $ H_i$ of $ H $ .

  1. If $ d_i > 0$ for all $ i$ , then $ {\bf u}_0$ is a local minimum.

  2. If $ (-1)^i d_i > 0$ for all $ i$ , then $ {\bf u}_0$ is a local maximum.

If, on the other hand, the form $ {\bf h}^{\rm T} H {\bf h}$ is indefinite, we can conclude that $ {\bf u}_0$ is neither a minimum nor a maximum (in two dimensions, such points are usually called saddle points), as the following theorem shows:

Theorem 7.3.4. Let $ H, D, {\bf u}_0,$ and $ d_i$ be as before. Suppose that the $ d_i$ do not satisfy the conditions for definiteness or semidefiniteness. Then $ {\bf u}_0$ is neither a local minimum nor a local maximum.

Proof: Because $ {\bf h}^{\rm T} H {\bf h}$ is indefinite, there exist vectors $ {\bf h}_1$ and $ {\bf h}_2$ such that $ {\bf h}_1^{\rm T} H {\bf h}_1 < 0$ and $ {\bf h}_2^{\rm T} H {\bf h}_2 > 0$ . Pick an arbitrary neighborhood $ U$ of $ {\bf u}_0$ . Then for any $ \varepsilon > 0$ smaller than some positive number $ \delta$ , we have $ {\bf u}_0 + \varepsilon {\bf h}_1 \in U$ , $ {\bf u}_0 + \varepsilon {\bf h}_2 \in U$ , $ (\varepsilon {\bf h}_1)^{\rm T} H (\varepsilon {\bf h}_1) < 0$ , and $ (\varepsilon {\bf h}_2)^{\rm T} H (\varepsilon {\bf h}_2) > 0$ .

Using (7.21), we obtain

$\displaystyle D({\bf u}_0+\varepsilon {\bf h}_1) - D({\bf u}_0) = \frac{1}{2} \varepsilon^2 {\bf h}_1^{\rm T} H {\bf h}_1 + O(\varepsilon^3 \vert{\bf h}_1\vert^3)\;,$     (7.23)
$\displaystyle D({\bf u}_0+\varepsilon {\bf h}_2) - D({\bf u}_0) = \frac{1}{2} \varepsilon^2 {\bf h}_2^{\rm T} H {\bf h}_2 + O(\varepsilon^3 \vert{\bf h}_2\vert^3)\;.$     (7.24)

Clearly, for sufficiently small $ \varepsilon$ , the right hand side of (7.23) will remain negative, while the right hand side of (7.24) will remain positive. Thus we have, for sufficiently small $ \varepsilon$ ,
$\displaystyle D({\bf u}_0+\varepsilon {\bf h}_1) < D({\bf u}_0) < D({\bf u}_0+\varepsilon {\bf h}_2)\;.$     (7.25)

Since $ U$ was an arbitrary neighborhood of $ {\bf u}_0$ , there is no neighborhood of $ {\bf u}_0$ in which $ D({\bf u}_0) \geq D({\bf u})$ for all $ {\bf u} \in U$ or in which $ D({\bf u}_0) \leq D({\bf u})$ for all $ {\bf u} \in U$ . Therefore $ {\bf u}_0$ is neither a local minimum nor a local maximum.

Unfortunately, trouble can occur if the Hessian is only semidefinite and not definite. In this case we need to look at higher order derivatives or use some other method to classify the point precisely. Although this is not too difficult for one-parameter functions, it becomes progressively more involved as the number of variables increases (see, for example, [323,129]).
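
Putting the pieces together, a sketch of a complete classifier follows. It applies Theorem 7.3.3 through the leading minors and, for Theorem 7.3.4, detects indefiniteness through the eigenvalues of the symmetric Hessian (an equivalent test not used in the text); the semidefinite case is left unresolved, as discussed above:

```python
import numpy as np

def classify_stationary_point(H, tol=1e-12):
    """Classify a stationary point from its Hessian H (assumed symmetric)."""
    n = H.shape[0]
    d = [np.linalg.det(H[:i, :i]) for i in range(1, n + 1)]
    if all(di > tol for di in d):                          # Theorem 7.3.3, case 1
        return "local minimum"
    if all((-1) ** i * di > tol for i, di in enumerate(d, start=1)):
        return "local maximum"                             # Theorem 7.3.3, case 2
    eig = np.linalg.eigvalsh(H)
    if eig.min() < -tol and eig.max() > tol:               # indefinite form
        return "neither minimum nor maximum"               # Theorem 7.3.4
    return "semidefinite: examine higher-order derivatives"

H = np.array([[2.0, 0.0, 0.0],
              [0.0, -1.0, 0.0],
              [0.0, 0.0, 3.0]])
print(classify_stationary_point(H))  # -> neither minimum nor maximum
```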

