Non-linear Least-Squares

We want the choice of parameters P that minimizes SSE. At a minimum the gradient of SSE vanishes, i.e. F(P) = 0, where

Fk(P) = d(SSE)/dPk = Sum { 2 Wi*(Yi - f(Xi;P)) * (-Wi*fk(Xi;P)) }

and the sum is over i (the index of the data points (Xi, Yi) with weights Wi). Here fk(Xi;P) = df/dPk evaluated at X = Xi using parameters P, and "SSE" is the weighted sum of squares of the errors, SSE = Sum { (Wi*(Yi - f(Xi;P)))^2 }; see the Course_Notes on linear least squares.
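
As a minimal sketch of the formula for Fk(P), the Python (NumPy) code below evaluates SSE and its gradient for a hypothetical two-parameter form f(X;P) = P0*exp(-P1*X). The model, the data values, and all function names are assumptions made for illustration; they do not come from these notes. Comparing the analytic gradient against a finite-difference estimate is a cheap way to catch sign or index mistakes in fk.

```python
import numpy as np

def f(x, p):
    # Hypothetical fitting form (an assumption): f(X;P) = P0 * exp(-P1 * X)
    return p[0] * np.exp(-p[1] * x)

def fk(x, p):
    # Row k holds fk(Xi;P) = df/dPk at every data point; shape (2, N)
    e = np.exp(-p[1] * x)
    return np.array([e, -p[0] * x * e])

def sse(p, x, y, w):
    # Weighted sum of squared errors: Sum { (Wi*(Yi - f(Xi;P)))^2 }
    return np.sum((w * (y - f(x, p))) ** 2)

def grad_sse(p, x, y, w):
    # Fk(P) = Sum { 2 Wi*(Yi - f(Xi;P)) * (-Wi*fk(Xi;P)) }
    return -2.0 * fk(x, p) @ (w**2 * (y - f(x, p)))

def grad_fd(p, x, y, w, h=1e-6):
    # Central finite differences, used only as an independent check
    g = np.zeros(len(p))
    for k in range(len(p)):
        dp = np.zeros(len(p))
        dp[k] = h
        g[k] = (sse(p + dp, x, y, w) - sse(p - dp, x, y, w)) / (2 * h)
    return g

# Made-up data for illustration only
x = np.linspace(0.0, 4.0, 9)
y = 2.0 * np.exp(-0.7 * x) + 0.05 * np.cos(3.0 * x)
w = np.ones_like(x)
p = np.array([1.5, 0.5])

F = grad_sse(p, x, y, w)        # analytic gradient
F_check = grad_fd(p, x, y, w)   # numerical estimate of the same quantity
```

If F and F_check disagree by more than the finite-difference error, the derivative code is the first place to look.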

To solve the nonlinear system of Nparameters equations Fk(P) = 0 in the Nparameters unknowns Pk, we can use Newton's method or one of several similar methods. The most popular of these nowadays is Marquardt's method, which smoothly switches from a simple downhill search to Newton's method as it approaches the minimum. For any of these methods, we need to calculate the Jacobian of F(P), which is also called the Hessian of SSE(P):

Jkn = dFk/dPn = d^2(SSE)/dPk dPn

Doing the algebra,

Jkn(P) = Sum { 2 Wi^2 [ fk(Xi;P)*fn(Xi;P) - (Yi - f(Xi;P))*fkn(Xi;P) ] }

where fkn = d^2f/dPk dPn and the sum is over the data points i. (The second term inside the square brackets can usually be neglected: near a good fit the residuals Yi - f(Xi;P) are small and tend to cancel in the sum. So people often do not bother to compute the second derivative of f, but strictly speaking it should be there.) Note that J is a symmetric matrix (Jkn = Jnk) for nonlinear least-squares problems.
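
The Newton iteration built from F and J can be sketched as follows, again for the hypothetical form f(X;P) = P0*exp(-P1*X). The model, the synthetic data, the starting guess, and the names (newton_fit, use_second_term) are assumptions for illustration. This is a bare Newton loop with no safeguards, so it only behaves well from a reasonably good initial guess; the flag shows the effect of dropping the second-derivative term.

```python
import numpy as np

def model(x, p):
    # Hypothetical fitting form (an assumption): f(X;P) = P0 * exp(-P1 * X)
    return p[0] * np.exp(-p[1] * x)

def first_derivs(x, p):
    # fk(Xi;P) for k = 0, 1; shape (2, N)
    e = np.exp(-p[1] * x)
    return np.array([e, -p[0] * x * e])

def second_derivs(x, p):
    # fkn(Xi;P) for k, n = 0, 1; shape (2, 2, N)
    e = np.exp(-p[1] * x)
    zero = np.zeros_like(x)
    return np.array([[zero, -x * e],
                     [-x * e, p[0] * x**2 * e]])

def newton_fit(x, y, w, p0, use_second_term=True, tol=1e-12, max_iter=50):
    p = np.asarray(p0, dtype=float)
    w2 = w**2
    for _ in range(max_iter):
        r = y - model(x, p)                  # residuals Yi - f(Xi;P)
        fk = first_derivs(x, p)
        F = -2.0 * fk @ (w2 * r)             # gradient of SSE
        J = 2.0 * (fk * w2) @ fk.T           # first term of Jkn
        if use_second_term:
            J -= 2.0 * second_derivs(x, p) @ (w2 * r)   # second term of Jkn
        step = np.linalg.solve(J, -F)        # Newton step: J * step = -F
        p = p + step
        if np.linalg.norm(step) < tol * (1.0 + np.linalg.norm(p)):
            break
    return p

# Synthetic, noise-free data generated from P = (2.0, 0.7)
x = np.linspace(0.0, 4.0, 9)
y = 2.0 * np.exp(-0.7 * x)
w = np.ones_like(x)

p_full = newton_fit(x, y, w, [1.9, 0.65], use_second_term=True)
p_approx = newton_fit(x, y, w, [1.9, 0.65], use_second_term=False)
```

With data this clean, both variants should land on essentially the same parameters; they differ mainly in how many iterations they take and in how they behave far from the minimum.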

Tough Non-Linear Least-Squares Problems
Non-linear least-squares problems are usually much more difficult to solve than linear least-squares problems, so if there is any way to convert your fitting form to be linear in the parameters P, you should try that. If you have trouble converging a non-linear least-squares problem, remember that you are minimizing a function, so you can hold some of the parameters fixed and still improve your solution. Fix one or more of the nonlinear parameters, solve the lower-dimensionality problem, and then use the calculated parameters as an improved initial guess for the full problem. Often one can find a reasonable initial guess at P by graphing various trial solutions (e.g. graphing Y vs. X with different choices of P, and graphing SSE vs. P). There are also specialized least-squares fitting programs that use Marquardt's method and other algorithms that are more robust than Newton's method.
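
One way to apply the "hold some parameters fixed" advice is sketched below for the hypothetical form f(X;P) = P0*exp(-P1*X), which is linear in P0 once P1 is fixed. For each trial P1 the best P0 follows from a one-parameter linear least-squares formula, and the resulting SSE can be tabulated (or graphed) against P1 to pick an initial guess. The model, data, grid, and function names are all assumptions for illustration.

```python
import numpy as np

def best_amplitude(x, y, w, b):
    # With P1 = b held fixed, the form is linear in P0, so setting
    # d(SSE)/dP0 = 0 gives P0 = Sum{Wi^2 Yi e_i} / Sum{Wi^2 e_i^2}
    e = np.exp(-b * x)
    return np.sum(w**2 * y * e) / np.sum(w**2 * e**2)

def profile_sse(x, y, w, b):
    # SSE at the best P0 for this fixed P1
    a = best_amplitude(x, y, w, b)
    return a, np.sum((w * (y - a * np.exp(-b * x))) ** 2)

# Synthetic, noise-free data generated from P = (2.0, 0.7)
x = np.linspace(0.0, 4.0, 9)
y = 2.0 * np.exp(-0.7 * x)
w = np.ones_like(x)

# Coarse scan over the nonlinear parameter P1
b_grid = np.linspace(0.1, 2.0, 20)
results = [profile_sse(x, y, w, b) for b in b_grid]
best = int(np.argmin([s for _, s in results]))

b_guess = b_grid[best]       # initial guess for P1
a_guess = results[best][0]   # matching initial guess for P0
```

The pair (a_guess, b_guess) is only as fine as the grid; it is meant as a starting point for the full problem, not as the final fit.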
    In all methods, when you are very close to the solution you are essentially solving a linear least-squares problem (since the remaining variation in P is very small). Even linear least-squares problems frequently fail because the matrix is nearly singular; physically this is because either the fitting form is intrinsically singular or near-singular, or (more commonly) because the data are insufficient to determine all the parameters. Non-linear least-squares problems can fail for these reasons, or for a variety of other reasons due to the nonlinearity. For more discussion, see Numerical Recipes in C.
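
A rough way to check for the near-singular case is to look at the condition number of the matrix before solving. The sketch below uses a hypothetical straight-line form f(X;P) = P0 + P1*X (so fk = [1, X] and the second-derivative term is zero) and compares data spread over a wide range of X with data bunched in a narrow range, where the slope and intercept are barely distinguishable. The model, the X values, and the function name are assumptions for illustration.

```python
import numpy as np

def normal_matrix(x, w):
    # Jkn = Sum { 2 Wi^2 fk*fn } for the straight-line form, where fk = [1, X]
    fk = np.array([np.ones_like(x), x])
    return 2.0 * (fk * w**2) @ fk.T

w = np.ones(9)
x_wide = np.linspace(0.0, 4.0, 9)       # data that separates slope from intercept
x_narrow = np.linspace(1.00, 1.01, 9)   # data too bunched to determine both

cond_wide = np.linalg.cond(normal_matrix(x_wide, w))
cond_narrow = np.linalg.cond(normal_matrix(x_narrow, w))
print(cond_wide, cond_narrow)
```

The bunched data should give a condition number several orders of magnitude larger than the spread-out data, which is the numerical signature of data that are insufficient to determine all the parameters.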