We are trying to find a choice of the parameters P that minimizes SSE, i.e. one that makes the gradient F(P) = 0, where
Fk(P) = d(SSE)/dPk = Sum { 2*(Wi*(Yi - f(Xi;P))) * (-Wi*fk(Xi;P)) }
and the sum is over i (the index of the data points Xi,Yi with weights Wi). fk(Xi;P) = df/dPk evaluated at X = Xi using parameters P. "SSE" is the weighted sum of squares of the errors; see the Course_Notes on linear least squares.
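As a concrete sketch of these formulas, here is a minimal Python fragment. The two-parameter model f(x;P) = P[0]*exp(P[1]*x) and all function names are assumptions made up for this illustration, not anything prescribed by the notes:

```python
import math

# Illustrative two-parameter model (an assumption for this sketch):
#   f(x; P) = P[0] * exp(P[1] * x)
def f(x, P):
    return P[0] * math.exp(P[1] * x)

def df_dP(x, P):
    # First derivatives fk = df/dPk evaluated at X = x.
    e = math.exp(P[1] * x)
    return [e, P[0] * x * e]

def sse(X, Y, W, P):
    # Weighted sum of squared errors: Sum over i of [Wi*(Yi - f(Xi;P))]^2.
    return sum((w * (y - f(x, P))) ** 2 for x, y, w in zip(X, Y, W))

def grad_sse(X, Y, W, P):
    # Fk(P) = Sum over i of 2*(Wi*(Yi - f(Xi;P))) * (-Wi*fk(Xi;P)).
    g = [0.0] * len(P)
    for x, y, w in zip(X, Y, W):
        r = y - f(x, P)
        fk = df_dP(x, P)
        for k in range(len(P)):
            g[k] += 2.0 * (w * r) * (-w * fk[k])
    return g
```

A quick finite-difference check of grad_sse against sse is a cheap way to catch sign errors in hand-coded derivatives.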
To solve the nonlinear system of Nparameters equations Fk(P) = 0 in the Nparameters unknowns Pk, we can use Newton's method or a similar method. (Marquardt's method, which smoothly switches from a simple downhill search far from the minimum to Newton's method close to it, is actually the most popular choice these days.) For any of these methods, we need to calculate the Jacobian of F(P), which is also called the Hessian of SSE(P):
Jkn = dFk/dPn = d2(SSE)/(dPk dPn)
Doing the algebra,
Jkn(P) = Sum { 2*Wi^2 * [ fk(Xi;P)*fn(Xi;P) - (Yi - f(Xi;P))*fkn(Xi;P) ] }
where fkn = d2f/(dPk dPn) and the sum is over the data points i. (It turns out that the second term inside the square brackets can usually be neglected, so often people do not bother to compute the second derivatives of f. But strictly speaking it should be there.) Note that J is a symmetric matrix for nonlinear least-squares problems.
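Putting the gradient and Hessian formulas together, a bare-bones Newton iteration for an assumed two-parameter model f(x;P) = P[0]*exp(P[1]*x) might look like the sketch below (function names are hypothetical; the full J, including the second-derivative term, is kept). It is a sketch under those assumptions, not production code, and it presumes a decent initial guess:

```python
import math

# Assumed illustrative model: f(x;P) = P[0]*exp(P[1]*x).
def f(x, P):
    return P[0] * math.exp(P[1] * x)

def derivs(x, P):
    # fk = df/dPk and fkn = d2f/(dPk dPn) for the assumed model.
    e = math.exp(P[1] * x)
    fk = [e, P[0] * x * e]
    fkn = [[0.0, x * e],
           [x * e, P[0] * x * x * e]]
    return fk, fkn

def newton_fit(X, Y, W, P0, iters=30):
    # Newton's method on F(P) = 0, with J built from both terms.
    P = list(P0)
    for _ in range(iters):
        F = [0.0, 0.0]
        J = [[0.0, 0.0], [0.0, 0.0]]
        for x, y, w in zip(X, Y, W):
            r = y - f(x, P)
            fk, fkn = derivs(x, P)
            for k in range(2):
                F[k] += -2.0 * w * w * r * fk[k]
                for n in range(2):
                    J[k][n] += 2.0 * w * w * (fk[k] * fk[n] - r * fkn[k][n])
        # Newton step: solve J * dP = -F (2x2 system, Cramer's rule).
        det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
        dP0 = (-F[0] * J[1][1] + F[1] * J[0][1]) / det
        dP1 = (-F[1] * J[0][0] + F[0] * J[1][0]) / det
        P[0] += dP0
        P[1] += dP1
        if abs(dP0) + abs(dP1) < 1e-12:
            break
    return P
```

Near the minimum this converges very rapidly; far from it the raw Newton step can diverge, which is one reason Marquardt-type methods are preferred in practice.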
Tough Non-Linear Least-Squares Problems
Non-linear least-squares problems are usually much more difficult to
solve than linear least-squares problems, so if there is any way to convert
your fitting form to be linear in the parameters P, you should try that.
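For example, if the fitting form happens to be Y = A*exp(B*X) (an assumed example, not from the notes), taking logarithms gives ln(Y) = ln(A) + B*X, which is linear in the parameters ln(A) and B. A sketch, with a hypothetical function name:

```python
import math

def linearized_exp_fit(X, Y):
    # Assumed form Y = A*exp(B*X): taking logs gives
    # ln(Y) = ln(A) + B*X, linear in ln(A) and B.
    # Ordinary (unweighted) linear regression of ln(Y) on X:
    L = [math.log(y) for y in Y]
    n = len(X)
    xbar = sum(X) / n
    lbar = sum(L) / n
    B = (sum((x - xbar) * (l - lbar) for x, l in zip(X, L))
         / sum((x - xbar) ** 2 for x in X))
    A = math.exp(lbar - B * xbar)
    return A, B
```

Note that taking logs changes the effective weighting of the errors, so the result is not identical to the true nonlinear fit; but it is often adequate, and at worst it provides a good initial guess for the nonlinear problem.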
If you have trouble converging a non-linear least-squares problem, remember
that you are minimizing a function, so you can hold some of the parameters
fixed and you will still improve your solution. So fix one or more of the
nonlinear parameters, solve the lower-dimensionality problem, and then
use those calculated parameters as an improved initial guess for the full
problem. Often one can find a reasonable initial guess at P by graphing
various trial solutions (e.g. graphing Y vs. X with different choices of
P, and graphing SSE vs P). There are specialized least-squares fitting
programs available that use Marquardt's method and other algorithms which
are more robust than Newton's method.
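The two tricks above (freezing a nonlinear parameter, and scanning SSE versus P) can be combined. For an assumed illustrative model f(x;P) = a*exp(b*x), holding b fixed leaves a problem that is linear in a, so setting d(SSE)/da = 0 gives a closed-form best a for each b; scanning b over a grid and keeping the (a, b) pair with the smallest SSE gives a reasonable initial guess. A sketch (the model and function name are assumptions for this example):

```python
import math

def f(x, a, b):
    # Assumed illustrative model: f = a*exp(b*x).
    return a * math.exp(b * x)

def initial_guess(X, Y, W, b_grid):
    # Hold the nonlinear parameter b fixed; the problem is then linear in a,
    # and d(SSE)/da = 0 gives a = Sum Wi^2*Yi*e^(b*Xi) / Sum Wi^2*e^(2b*Xi).
    # Scan b over a grid and keep the (a, b) with the smallest SSE.
    best = None
    for b in b_grid:
        num = sum(w * w * y * math.exp(b * x) for x, y, w in zip(X, Y, W))
        den = sum(w * w * math.exp(2.0 * b * x) for x, w in zip(X, W))
        a = num / den
        s = sum((w * (y - f(x, a, b))) ** 2 for x, y, w in zip(X, Y, W))
        if best is None or s < best[0]:
            best = (s, a, b)
    return best[1], best[2]
```

This amounts to graphing SSE versus the one remaining nonlinear parameter and reading off the minimum.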
In all methods, when you are very close to the solution
you are essentially solving a linear least-squares problem (since the variation
in P will be very small). Even linear least-squares problems frequently
fail because the matrix is nearly singular; physically this is because
either the fitting form is intrinsically singular or near-singular, or
(more commonly) because the data are insufficient to determine all the
parameters. Non-linear least-squares can fail for these reasons, or for
a variety of other reasons due to the nonlinearity. For more discussion,
see Numerical Recipes in C.