Xi is a vector that contains all the input variables (usually they correspond to experimental variables that the experimentalist controls) that correspond to the measurement yi. For example, in a chemical experiment Xi would typically include the temperature and the initial concentrations of the reagents.
The vector C contains all the parameters (i.e. numbers) in the theoretical model which are known imprecisely, i.e. the numbers in the model that we feel comfortable varying to try to improve the agreement between the data and the model. For example, a theoretical model for a rocket’s trajectory would include a lot of numbers: the gravitational constant, the mass of the rocket, the density and viscosity of air, the wind speed, etc. We probably do not want to vary the gravitational constant – that number is known very precisely from other experiments – but we might feel free to adjust the number we use for the wind speed a little in order to improve the fit between data on the observed trajectory and our computer model. So the wind speed would be one of the adjustable numbers in C, but the gravitational constant would not be.
Most of Chapter 9 is focused on algorithms to compute values for C that give the best possible fit. There are several conceptual issues which are much more important than the algorithms; we will discuss these first.
Because all of the numbers have uncertainties, we don’t expect an exact fit. First we need to develop an estimate of what sorts of uncertainties (also called “error bars”) we have in our data (and in our model). Then we want to compare the discrepancies between the data and the model. If the discrepancies are about the size we expected from the error bars, we say “the model is consistent with the data”. Remember that just because some data are consistent with a model, it doesn’t mean that the model is correct! For example, there are a lot of data consistent with the model that the earth is flat.
On the other hand, if the discrepancies are much larger than the error bars, we may be able to say “the model is inconsistent with the data”. In this case, after some careful consideration and double checks to make sure the data is right, we may be able to conclude “the model is wrong”.
For a practical example, see NMM Figure 9.4. The assumption is that the model ‘f’ for this case is a straight line. There is one point (near x=4.5) that lies quite far from any reasonable straight line through the rest of the data. The crucial question is: how large is the uncertainty in that data point? If the uncertainty could be as large as 0.5 unit in y, we should be happy with the fit, and proclaim “the data is consistent with the model”. However, if all the data points are known to six significant figures we have a serious problem. After a few double checks, we would be justified in saying “the data are inconsistent with a straight line model”. (This may seem crazy, but there are a lot of physical phenomena which lead to wiggles and sharp peaks and dips in data traces, so it is definitely possible that all the data in Fig. 9.4 are correct.) We cannot determine whether or not data are consistent with a model unless we have an estimate of the uncertainties.
The last three paragraphs are critically important for your career as a scientist/engineer, and as an informed citizen. If you don’t understand them, you MUST stop and go back, or talk with one of the course instructors.
si^2 = (1/(N–1)) Σ (yik – <yi>)^2

where yik is the value of yi measured the kth time we did the experiment, the sum runs over k=1:N, and <yi> is the average of all the N repeated measurements of yi (all N measurements made with the same fixed Xi).
Many sources of randomness lead to the familiar “bell curve” Gaussian distribution (yes, the same Carl Friedrich Gauss) in the measured values. The remarkable “central limit theorem” shows that even when the underlying source of the randomness is non-Gaussian, the average of many measurements still tends toward a Gaussian distribution. (For details, see any statistics textbook.) As a consequence, we will only concern ourselves with data sets which have a Gaussian (also called “normal”) distribution, where each measurement has about a 2/3 chance of lying within one standard deviation of the mean value, and is unlikely to lie more than a few standard deviations away from the mean.
The more repeat measurements we take, the more confident we are that the average value of these measurements is close to the true mean value of the complete distribution (the average we would obtain if we repeated the measurement a very large number of times). The uncertainty in the mean value <yi> after N repeats is
smean,i = si / sqrt(N)
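The statistics above are easy to compute. Here is a small sketch in Python (the course codes in MATLAB, but the arithmetic is identical; the five repeat measurements are made-up values for illustration):

```python
import math
import statistics

# Hypothetical repeat measurements y_ik of the same quantity y_i,
# all taken at the same fixed experimental conditions X_i.
repeats = [9.8, 10.1, 9.9, 10.3, 9.9]

N = len(repeats)
y_mean = statistics.mean(repeats)       # <y_i>, the average of the repeats
s_i = statistics.stdev(repeats)         # sample standard deviation (N-1 in denominator)
s_mean = s_i / math.sqrt(N)             # uncertainty in the mean, smean,i

print(y_mean, s_i, s_mean)
```

Note that `statistics.stdev` uses the N–1 (sample) denominator, which is the appropriate choice when <yi> itself is estimated from the same data.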
This uncertainty in the mean smean,i is the number that we will want to compare with the deviation between the model and the average of the repeated measurements:
| <yi> – f(Xi, C) |
If random error is the main reason why the model and the data have discrepancies, then we expect that about 2/3 of the time the deviation will be less than smean,i, and about 99.7% of the time the deviation will be less than 3 smean,i. (Note however that one sometimes finds “outliers” in data sets, irreproducible data values which are completely weird. Normally we discard these.)
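The 2/3 and 99.7% figures come straight from the Gaussian distribution, and you can check them yourself with the standard library (this is just an illustration of the probabilities, not part of any fitting algorithm):

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability that a Gaussian measurement lands within 1 or 3
# standard deviations of the mean.
within_1sigma = nd.cdf(1) - nd.cdf(-1)
within_3sigma = nd.cdf(3) - nd.cdf(-3)

print(within_1sigma, within_3sigma)
```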
If the deviations are larger than expected, it could be because of systematic errors in the experiment or approximations made in the model – or the model could just be wrong. If the deviations are much smaller than expected from the data reproducibility, we will be surprised; overfitting is a likely explanation.
N.B. Each time we make a repeat measurement yik, we expect it will differ by roughly si from <yi> (it could be more or less, but something in this ballpark) – this fluctuation comes from a real source of randomness in our experiment, and doesn’t reduce with the number of repeats. But after a few measurements, our average value <yi> will start to ‘settle down’ and stop fluctuating so much. As the number of repeats gets very large we expect that <yi> will become nearly constant near the true mean of the distribution (and smean will become very small).
To minimize a function, you look for places where the derivatives are equal to zero. In our case we want
0 = ∂r/∂cn = –2 Σ [ ( <yi> – f(Xi,C) ) / smean,i^2 ] (∂f/∂cn) for all n=1:Nparams
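Where this condition comes from: assuming r is the usual weighted sum of squared residuals (consistent with the factor of –2 and the smean,i^2 in the denominator above), the derivation is one application of the chain rule:

```latex
r(C) = \sum_i \left( \frac{\langle y_i \rangle - f(X_i, C)}{s_{\mathrm{mean},i}} \right)^2
\qquad\Longrightarrow\qquad
\frac{\partial r}{\partial c_n}
  = -2 \sum_i \frac{\langle y_i \rangle - f(X_i, C)}{s_{\mathrm{mean},i}^{2}}
    \,\frac{\partial f}{\partial c_n} = 0 .
```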
For the specific case where our fitting function is a straight line, both the slope and intercept are adjustable, and all the data points have the same uncertainty, this system of equations is the same as Eq. 9.6 in the NMM text. Equations 9.10 through 9.15 show how to write these equations in a matrix form which can be easily solved using backslash or the Matsolve program you wrote.
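For the straight-line, equal-uncertainty case the normal equations can even be solved in closed form. A minimal Python sketch (the course uses MATLAB; the data values here are invented, lying close to y = 2x + 1):

```python
# Hypothetical (x, y) data for a straight-line fit y = c1 + c2*x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.1, 6.9]

N = len(xs)
Sx = sum(xs)
Sy = sum(ys)
Sxx = sum(x * x for x in xs)
Sxy = sum(x * y for x, y in zip(xs, ys))

# Closed-form solution of the 2x2 normal equations (equal uncertainties).
c2 = (N * Sxy - Sx * Sy) / (N * Sxx - Sx * Sx)  # slope
c1 = (Sy - c2 * Sx) / N                         # intercept

print(c1, c2)
```

In MATLAB you would normally skip the closed form and let backslash solve the same system.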
f(Xm,C) = Σn fn(Xm) cn
which is the same as saying
Ytheoretical = F(X,C) = A*C
where Amn = fn(Xm) (see NMM equations 9.25, 9.26). Note that with this form,
∂f/∂cn = fn(Xm) = Amn
See if you can convince yourself that our general equation
0 = –2 Σ [ ( <ym> – f(Xm,C) ) / smean,m^2 ] (∂f/∂cn) for all n=1:Nparams
reduces to
0 = B’*Z – (B’*B)*C
i.e.
(B’*B) * C = (B’*Z) (in the familiar form matrix*unknown = vector)
where Bmn = Amn/smean,m and zm = <ym>/smean,m for any function F of this form. Again, if you assume that all of the data points have equal uncertainties, you get Eq. 9.26 as in the text.
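To see the B and Z machinery in action, here is a hedged Python sketch of the weighted straight-line fit, using basis functions f1(x) = 1 and f2(x) = x. The data and uncertainties are invented for illustration; in MATLAB you would build B and Z the same way and solve with backslash instead of the explicit 2x2 solve below.

```python
# Hypothetical averaged data <y_m> with per-point uncertainties smean,m,
# to be fit with f(x, C) = c1*1 + c2*x.
xs  = [0.0, 1.0, 2.0, 3.0]
ys  = [1.0, 3.1, 4.9, 7.2]   # <y_m>
sig = [0.1, 0.1, 0.2, 0.3]   # smean,m

M = len(xs)

# B_mn = A_mn / smean,m  and  z_m = <y_m> / smean,m
B = [[1.0 / s, x / s] for x, s in zip(xs, sig)]
Z = [y / s for y, s in zip(ys, sig)]

# Form the 2x2 system (B'*B) C = B'*Z.
BtB = [[sum(B[m][i] * B[m][j] for m in range(M)) for j in range(2)]
       for i in range(2)]
BtZ = [sum(B[m][i] * Z[m] for m in range(M)) for i in range(2)]

# Solve the 2x2 system by Cramer's rule (backslash handles the general case).
det = BtB[0][0] * BtB[1][1] - BtB[0][1] * BtB[1][0]
c1 = (BtZ[0] * BtB[1][1] - BtB[0][1] * BtZ[1]) / det
c2 = (BtB[0][0] * BtZ[1] - BtB[1][0] * BtZ[0]) / det

print(c1, c2)
```

Notice how the small-uncertainty points (the first two) pull the fit harder than the last one: dividing each row by smean,m is exactly how the weighting enters.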