We are trying to find a choice of the parameters P that minimizes SSE, i.e. one that makes the gradient F(P) = 0, where
Fk(P) = d(SSE)/dPk = Sum { 2*(Wi*(Yi - f(Xi;P))) * (-Wi*fk(Xi;P)) }
and the sum is over i (the index of the data points Xi,Yi with weights Wi). fk(Xi;P) = df/dPk evaluated at X = Xi using parameters P. "SSE" is the weighted sum of squares of the errors; see the Course_Notes on linear least squares.
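As a concrete sketch of these formulas, here is a minimal Python fragment. The two-parameter model f(x;P) = P[0]*exp(P[1]*x) and all function names are assumptions made up for this illustration, not anything prescribed by the notes:

```python
import math

# Illustrative two-parameter model (an assumption for this sketch):
#   f(x; P) = P[0] * exp(P[1] * x)
def f(x, P):
    return P[0] * math.exp(P[1] * x)

def df_dP(x, P):
    # First derivatives fk = df/dPk evaluated at X = x.
    e = math.exp(P[1] * x)
    return [e, P[0] * x * e]

def sse(X, Y, W, P):
    # Weighted sum of squared errors: Sum over i of [Wi*(Yi - f(Xi;P))]^2.
    return sum((w * (y - f(x, P))) ** 2 for x, y, w in zip(X, Y, W))

def grad_sse(X, Y, W, P):
    # Fk(P) = Sum over i of 2*(Wi*(Yi - f(Xi;P))) * (-Wi*fk(Xi;P)).
    g = [0.0] * len(P)
    for x, y, w in zip(X, Y, W):
        r = y - f(x, P)
        fk = df_dP(x, P)
        for k in range(len(P)):
            g[k] += 2.0 * (w * r) * (-w * fk[k])
    return g
```

A quick finite-difference check of grad_sse against sse is a cheap way to catch sign errors in hand-coded derivatives.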
To solve the nonlinear system of Nparameters equations Fk(P) = 0 in the Nparameters unknowns Pk, we can use Newton's method or a similar method. (Marquardt's method, which smoothly switches from a simple downhill search far from the minimum to Newton's method close to it, is actually the most popular choice these days.) For any of these methods, we need to calculate the Jacobian of F(P), which is also called the Hessian of SSE(P):
Jkn = dFk/dPn = d2(SSE)/(dPk dPn)
Doing the algebra,
Jkn(P) = Sum { 2*Wi^2 * [ fk(Xi;P)*fn(Xi;P) - (Yi - f(Xi;P))*fkn(Xi;P) ] }
where fkn = d2f/(dPk dPn) and the sum is over the data points i. (It turns out that the second term inside the square brackets can usually be neglected, so often people do not bother to compute the second derivatives of f. But strictly speaking it should be there.) Note that J is a symmetric matrix for nonlinear least-squares problems.
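Putting the gradient and Hessian formulas together, a bare-bones Newton iteration for an assumed two-parameter model f(x;P) = P[0]*exp(P[1]*x) might look like the sketch below (function names are hypothetical; the full J, including the second-derivative term, is kept). It is a sketch under those assumptions, not production code, and it presumes a decent initial guess:

```python
import math

# Assumed illustrative model: f(x;P) = P[0]*exp(P[1]*x).
def f(x, P):
    return P[0] * math.exp(P[1] * x)

def derivs(x, P):
    # fk = df/dPk and fkn = d2f/(dPk dPn) for the assumed model.
    e = math.exp(P[1] * x)
    fk = [e, P[0] * x * e]
    fkn = [[0.0, x * e],
           [x * e, P[0] * x * x * e]]
    return fk, fkn

def newton_fit(X, Y, W, P0, iters=30):
    # Newton's method on F(P) = 0, with J built from both terms.
    P = list(P0)
    for _ in range(iters):
        F = [0.0, 0.0]
        J = [[0.0, 0.0], [0.0, 0.0]]
        for x, y, w in zip(X, Y, W):
            r = y - f(x, P)
            fk, fkn = derivs(x, P)
            for k in range(2):
                F[k] += -2.0 * w * w * r * fk[k]
                for n in range(2):
                    J[k][n] += 2.0 * w * w * (fk[k] * fk[n] - r * fkn[k][n])
        # Newton step: solve J * dP = -F (2x2 system, Cramer's rule).
        det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
        dP0 = (-F[0] * J[1][1] + F[1] * J[0][1]) / det
        dP1 = (-F[1] * J[0][0] + F[0] * J[1][0]) / det
        P[0] += dP0
        P[1] += dP1
        if abs(dP0) + abs(dP1) < 1e-12:
            break
    return P
```

Near the minimum this converges very rapidly; far from it the raw Newton step can diverge, which is one reason Marquardt-type methods are preferred in practice.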
Tough Non-Linear Least-Squares Problems
Non-linear least-squares problems are usually much more difficult to
solve than linear least-squares problems, so if there is any way to convert
your fitting form to be linear in the parameters P, you should try that.
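For example, if the fitting form happens to be Y = A*exp(B*X) (an assumed example, not from the notes), taking logarithms gives ln(Y) = ln(A) + B*X, which is linear in the parameters ln(A) and B. A sketch, with a hypothetical function name:

```python
import math

def linearized_exp_fit(X, Y):
    # Assumed form Y = A*exp(B*X): taking logs gives
    # ln(Y) = ln(A) + B*X, linear in ln(A) and B.
    # Ordinary (unweighted) linear regression of ln(Y) on X:
    L = [math.log(y) for y in Y]
    n = len(X)
    xbar = sum(X) / n
    lbar = sum(L) / n
    B = (sum((x - xbar) * (l - lbar) for x, l in zip(X, L))
         / sum((x - xbar) ** 2 for x in X))
    A = math.exp(lbar - B * xbar)
    return A, B
```

Note that taking logs changes the effective weighting of the errors, so the result is not identical to the true nonlinear fit; but it is often adequate, and at worst it provides a good initial guess for the nonlinear problem.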
If you have trouble converging a non-linear least-squares problem, remember
that you are minimizing a function, so you can hold some of the parameters
fixed and you will still improve your solution. So fix one or more of the
nonlinear parameters, solve the lower-dimensionality problem, and then
use those calculated parameters as an improved initial guess for the full
problem. Often one can find a reasonable initial guess at P by graphing
various trial solutions (e.g. graphing Y vs. X with different choices of
P, and graphing SSE vs P). There are specialized least-squares fitting
programs available that use Marquardt's method and other algorithms which
are more robust than Newton's method.
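The two tricks above (freezing a nonlinear parameter, and scanning SSE versus P) can be combined. For an assumed illustrative model f(x;P) = a*exp(b*x), holding b fixed leaves a problem that is linear in a, so setting d(SSE)/da = 0 gives a closed-form best a for each b; scanning b over a grid and keeping the (a, b) pair with the smallest SSE gives a reasonable initial guess. A sketch (the model and function name are assumptions for this example):

```python
import math

def f(x, a, b):
    # Assumed illustrative model: f = a*exp(b*x).
    return a * math.exp(b * x)

def initial_guess(X, Y, W, b_grid):
    # Hold the nonlinear parameter b fixed; the problem is then linear in a,
    # and d(SSE)/da = 0 gives a = Sum Wi^2*Yi*e^(b*Xi) / Sum Wi^2*e^(2b*Xi).
    # Scan b over a grid and keep the (a, b) with the smallest SSE.
    best = None
    for b in b_grid:
        num = sum(w * w * y * math.exp(b * x) for x, y, w in zip(X, Y, W))
        den = sum(w * w * math.exp(2.0 * b * x) for x, w in zip(X, W))
        a = num / den
        s = sum((w * (y - f(x, a, b))) ** 2 for x, y, w in zip(X, Y, W))
        if best is None or s < best[0]:
            best = (s, a, b)
    return best[1], best[2]
```

This amounts to graphing SSE versus the one remaining nonlinear parameter and reading off the minimum.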
In all methods, when you are very close to the solution
you are essentially solving a linear least-squares problem (since the variation
in P will be very small). Even linear least-squares problems frequently
fail because the matrix is nearly singular; physically this is because
either the fitting form is intrinsically singular or near-singular, or
(more commonly) because the data are insufficient to determine all the
parameters. Non-linear least-squares can fail for these reasons, or for
a variety of other reasons due to the nonlinearity. For more discussion,
see Numerical Recipes in C.