In this problem, you will use least-squares fitting to examine the original 1662 data demonstrating Boyle's law: the volume $v$ of a gas is inversely proportional to the pressure $p$ (for constant mass and temperature).
In that experiment, described here (UVa), Robert Boyle measured the volume of air trapped in the sealed end of a glass tube as mercury was poured into the other end. The total pressure $p$ on the trapped air is proportional to the height of the poured mercury plus the height of mercury balancing atmospheric pressure (as measured by a Torricelli barometer). Boyle found that the volume of the gas varied inversely with the pressure $p$. The experiment is depicted here.
(a) Open Boyle's 1662 manuscript, find the chapter on the "new experiment", and locate the experimental data, which is labelled "A Table of the Condensation of Air". Extract the data from the first column "A" and enter it into a Julia array `v` below — this is the height of the air column, proportional to its volume. (There is a second column "A" that contains the same data divided by 4.) Extract the data from the column "D" and enter it into a Julia array `p` below — this is the height of the mercury column (including the amount corresponding to atmospheric pressure), and is proportional to the total pressure.
(b) Perform a least-squares fit to the model $v = \alpha/p$, i.e. find the $\alpha$ that minimizes the sum of the squared errors $\sum_k (v_k - \alpha/p_k)^2$ for Boyle's data points $(p_k, v_k)$. (Note: in Julia, you can make a vector of the inverse pressures with `1 ./ p`.) Plot the data and the fit ($v$ vs. $p$) using the provided code below.
Note: if you find yourself using calculus here or in subsequent parts, you are re-inventing the wheel — we already derived how to minimize the squared error in class, so please rewrite it as a matrix least-squares problem and use the 18.06 results.
(c) Perform a least-squares fit to a more complicated model, $v = \alpha/p + v_0$, for fit parameters $\alpha$ and $v_0$. (That is, suppose that Boyle was slightly wrong, and that there is a minimum volume $v_0$ even for $p \to \infty$, similar to the van der Waals equation.) What $\alpha$ and $v_0$ do you obtain?
(d) If the gas in Boyle's experiment exactly obeyed Boyle's law, i.e. if the theoretically correct $v_0$ is 0 in part (c), would you expect to get $v_0 = 0$ from a least-squares fit of experimental data? Why or why not? (No detailed math please, just a sentence or two of explanation.)
(e) Alternatively, suppose we don't know the power law in our model: suppose $v = \alpha p^n$ for an unknown power $n$ and an unknown coefficient $\alpha$. This model depends nonlinearly on $n$, so at first glance it may seem that we cannot do a linear least-squares fit using 18.06 techniques. However, show that $\log v$ depends linearly on two unknown fit parameters in this model, and hence do a least-squares fit of the $\log v_k$ to the log of the model to estimate the parameters $\alpha$ and $n$ from Boyle's data.
# part (a):
v = [48,46,44,42,40,38,36,34,32,30,28,26,24,23,22,21,20,19,18,17,16,15,14,13,12] # data from Boyle table, column A
p = [29+2/16,30+9/16,31+15/16,33+8/16,35+5/16,37,39+4/16,41+10/16,44+3/16,47+1/16,50+5/16,54+5/16,58+13/16,61+5/16,64+1/16,67+1/16,70+11/16,74+2/16,77+14/16,82+12/16,87+14/16,93+1/16,100+7/16,107+13/16,117+9/16] # data from Boyle table, column D
25-element Vector{Float64}: 29.125 30.5625 31.9375 33.5 35.3125 37.0 39.25 41.625 44.1875 47.0625 50.3125 54.3125 58.8125 61.3125 64.0625 67.0625 70.6875 74.125 77.875 82.75 87.875 93.0625 100.4375 107.8125 117.5625
(b) Note that the model $v = \alpha/p$ can be written as

$$\begin{pmatrix} 1/p_1 \\ 1/p_2 \\ \vdots \\ 1/p_k \end{pmatrix} \alpha = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_k \end{pmatrix}$$

By writing $A = \begin{pmatrix} 1/p_1 \\ 1/p_2 \\ \vdots \\ 1/p_k \end{pmatrix}$ and $b = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_k \end{pmatrix}$, the least-squares solution is $\alpha = (A^T A)^{-1} A^T b$. In Julia, $A$ is simply `1 ./ p`, and the least-squares solution can be obtained using `A \ b`.
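As a quick optional sanity check, note that the normal equations reduce to a scalar equation here, since $A^TA$ and $A^Tb$ are just numbers when $A$ has a single column; a minimal sketch, reusing the `v` and `p` arrays from part (a):

# optional check: the scalar normal equation AᵀAα = Aᵀv gives the same α as A \ v
A = 1 ./ p                # the single column of A, as a Julia vector
α_normal = (A'A) \ (A'v)  # = (AᵀA)⁻¹Aᵀv, here just a scalar division
α_normal ≈ A \ v          # should be true (up to roundoff)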
# part (b)
α = (1 ./ p) \ v # least-square fit
1407.8383939127086
We obtain $\alpha \approx 1407.8384$.
# part (b) plot
using Plots # you might have to install it first: import Pkg; Pkg.add("Plots")
plot(p, v, seriestype = :scatter, label="data", title = "Problem 1(b): Boyle's Law fit", fmt=:png)
plot!(p, α ./ p, label="fit $(round(α,sigdigits=3))/p")
xlabel!("pressure (mercury height, in)")
ylabel!("gas volume (height, in)")
(c) For the model $v = \alpha/p + v_0$, we can rewrite it as

$$\begin{pmatrix} 1 & 1/p_1 \\ 1 & 1/p_2 \\ \vdots & \vdots \\ 1 & 1/p_k \end{pmatrix} \begin{pmatrix} v_0 \\ \alpha \end{pmatrix} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_k \end{pmatrix}$$

Again, we can solve for the least-squares solution using `A \ b`:
x = [ones(25) 1 ./ p] \ v # least-square fit
2-element Vector{Float64}: 0.03915184309358892 1406.0919291526475
We obtain $\alpha \approx 1406.0919$ and $v_0 \approx 0.0392$.
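Since model (c) has an extra degree of freedom, its squared error can only be ≤ that of model (b). As an optional check (outputs not shown here), one could compare the two residuals:

# optional: compare the sum of squared errors of the two fits
sum(abs2, v .- α ./ p)              # model (b): v ≈ α/p
sum(abs2, v .- x[1] .- x[2] ./ p)   # model (c): v ≈ v0 + α/p (can only be ≤ the line above)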
(d) No, we would not expect to get $v_0 = 0$ exactly, because all practical experimental data includes errors (hopefully small!) that lead to errors in the fit coefficients.
More explicitly, from $x = A \backslash v$, it is clear that errors in $v$ (or $A$) will generally cause errors in $x$: for an error $\delta v$ in $v$ there is a corresponding error $\delta x = A \backslash \delta v$ in the fit coefficients $x$ (since the least-squares solution is linear in the data), so as $\delta v$ becomes smaller, $\delta x$ becomes smaller at the same rate. Looking at the plot in part (b), we can see by eye that the errors $\delta v$ are rather small (the $\alpha/p$ fit almost goes through the data points), so it is not surprising that in part (c) we get a $v_0$ that is quite small (≈ 300× smaller than the measured $v$ values), and which by inspection seems smaller than the experimental errors.
In a statistics class, we would then take the next step of computing the average magnitude of the error $\delta x$ (e.g. the "covariance", the average of $\delta x \, \delta x^T$) from the average size of the errors $\delta v$ in the inputs (the average of $\delta v \, \delta v^T$). We could even rigorously compute the probability that $v_0 = 0$ is consistent with this data. But that goes beyond the scope of 18.06.
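Although the statistics is beyond our scope, the relation $\delta x = A \backslash \delta v$ itself is easy to see numerically, because the least-squares solution is linear in the right-hand side. Here is a small illustrative experiment (the 0.01 noise scale is an arbitrary, made-up choice):

# illustrate δx = A \ δv: the fit coefficients depend linearly on the data
A = [ones(25) 1 ./ p]
δv = 0.01 * randn(25)              # hypothetical small measurement errors
(A \ (v + δv)) - (A \ v) ≈ A \ δv  # true: the resulting error in x is exactly A \ δv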
(e) For the model $v = \alpha p^n$, we can take the log of both sides to get

$$\log v = \log(\alpha p^n) = \log \alpha + \log(p^n) = \log \alpha + n \log p$$

This shows that $\log v$ depends linearly on the two unknown fit parameters $\log \alpha$ and $n$. We can rewrite the above equation as

$$\begin{pmatrix} 1 & \log p_1 \\ 1 & \log p_2 \\ \vdots & \vdots \\ 1 & \log p_k \end{pmatrix} \begin{pmatrix} \log \alpha \\ n \end{pmatrix} = \begin{pmatrix} \log v_1 \\ \log v_2 \\ \vdots \\ \log v_k \end{pmatrix}$$

Again, we can solve for the least-squares solution using `A \ b`:
x = [ones(25) log.(p)] \ log.(v)
2-element Vector{Float64}: 7.255502686548169 -1.0013839808494183
We obtain $\log \alpha \approx 7.2555$ and $n \approx -1.0014$. Exponentiating the first parameter gives $\alpha = e^{\log \alpha} \approx 1415.8746$, with $n \approx -1.0014$.
Similar to part (d), we can see that $n$ is not exactly $-1$ due to experimental errors.
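For completeness, the $\alpha$ quoted above comes from exponentiating the first fit parameter; as a one-line check:

logα, n = x  # unpack the two fit parameters from part (e)
exp(logα)    # ≈ 1415.8746 = α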
Let $q$ be some unit vector (i.e. $\|q\| = 1$). Define the "reflector" matrix $F = I - 2qq^T$.
(a) Show that $F$ is unitary.
(b) Why is this a "reflector"? Suppose that we are in 2 dimensions (i.e. $q \in \mathbb{R}^2$). Draw a picture showing $q$ pointing in some arbitrary direction, and sketch $x$ and $Fx$ for three cases: when $x$ is parallel to $q$, perpendicular to $q$ (and nonzero), or in some other direction (neither parallel nor perpendicular to $q$).
(a) Note that

$$F^T = (I - 2qq^T)^T = I^T - 2(qq^T)^T = I - 2(q^T)^T q^T = I - 2qq^T = F$$

Therefore,

$$F^T F = (I - 2qq^T)(I - 2qq^T) = I - 2qq^T - 2qq^T + (2qq^T)(2qq^T) = I - 4qq^T + 4qq^T qq^T$$

Now, since $q$ is a unit vector, we have $q^T q = \|q\|^2 = 1$. So

$$F^T F = I - 4qq^T + 4q(q^T q)q^T = I - 4qq^T + 4qq^T = I$$

Therefore, $F$ is unitary.
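We can also spot-check this numerically for a random unit vector $q$ (a throwaway check; the dimension 5 is an arbitrary choice):

using LinearAlgebra
q = normalize(randn(5))  # a random unit vector, ‖q‖ = 1
F = I - 2q*q'            # the reflector F = I − 2qqᵀ
F'F ≈ I                  # true (up to roundoff): F is unitary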
(b) $F$ is a "reflector" because it reflects any point across the hyperplane orthogonal to $q$.
To see this, recall that $\frac{qq^T}{q^T q} = qq^T$ is the projection matrix onto the line spanned by the unit vector $q$, and so $P = I - qq^T$ is the projection matrix onto the orthogonal complement of that line, i.e. the hyperplane orthogonal to $q$.
In other words, for any point $x$, we have $x = Px + e$, where $e = x - Px$ is the vector from the point $Px$ to the point $x$. Now consider $Px - e$, which can be viewed geometrically as the reflection of $x$ across the hyperplane. We have

$$Px - e = Px - (x - Px) = 2Px - x = 2(I - qq^T)x - x = (I - 2qq^T)x = Fx$$

This shows that $F$ is a "reflector" that reflects any point across the hyperplane orthogonal to $q$. In fact, it is called a Householder reflector.
An example is as follows:
using LaTeXStrings
import PyPlot as plt
q = [3/5, 4/5] # an arbitrary unit vector
F = [1 0; 0 1] - 2*q*q' # the reflector matrix
x1 = [6, 8] # parallel to q
x2 = [4, -3] # perpendicular to q
x3 = [1, 5] # some other direction
Fx1 = F*x1
Fx2 = F*x2
Fx3 = F*x3
# the hyperplane orthogonal to q
plt.plot([-2*x2[1],2*x2[1]], [-2*x2[2],2*x2[2]],"k--",zorder=-10)
plt.text(-2*x2[1], -2*x2[2]-1.3, L"\mathrm{plane} \perp q", rotation=rad2deg(atan(x2[2], x2[1])))
# the original points
plt.arrow(0, 0, x1[1], x1[2], width=0.1, color="y")
plt.text(x1[1]+0.3, x1[2], L"x_1", color="y")
plt.arrow(0, 0, x2[1], x2[2], width=0.1, color="g")
plt.text(x2[1]+0.3, x2[2], L"x_2 = Fx_2", color="g")
plt.arrow(0, 0, x3[1], x3[2], width=0.1, color="b")
plt.text(x3[1]+0.3, x3[2], L"x_3", color="b")
# q
plt.arrow(0, 0, q[1], q[2], width = 0.15, color ="r")
plt.text(q[1]+0.5, q[2], L"q", color="r")
# the resulting points
plt.arrow(0, 0, Fx1[1], Fx1[2], color="y", head_width=0.4, linestyle=":", fill=false)
plt.text(Fx1[1]+0.5, Fx1[2], L"Fx_1", color="y")
plt.arrow(0, 0, Fx3[1], Fx3[2], color="b", head_width=0.4, linestyle=":", fill=false)
plt.text(Fx3[1]-0.4, Fx3[2]-0.8, L"Fx_3", color="b")
plt.axis("equal")
From Strang, section 4.3, problem 27 (distance between lines):
The points $p(\alpha) = \begin{pmatrix} \alpha \\ \alpha \\ \alpha \end{pmatrix}$ and $q(\beta) = \begin{pmatrix} \beta \\ 3\beta \\ -1 \end{pmatrix}$, as a function of the scalars $\alpha, \beta$, form two lines in space that never meet.
(a) Choose $\alpha$ and $\beta$ to minimize the squared distance $\|p - q\|^2$.
(b) The line connecting the closest points $p(\hat\alpha)$ and $q(\hat\beta)$ is perpendicular to ________________.
(a) We have $p(\alpha) = \alpha \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$ and $q(\beta) = \beta \begin{pmatrix} 1 \\ 3 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ -1 \end{pmatrix}$, so the vector difference is

$$p(\alpha) - q(\beta) = \alpha \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} - \beta \begin{pmatrix} 1 \\ 3 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ 1 & -3 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = Ax - b$$

where $A = \begin{pmatrix} 1 & -1 \\ 1 & -3 \\ 1 & 0 \end{pmatrix}$ and $b = \begin{pmatrix} 0 \\ 0 \\ -1 \end{pmatrix}$. So, finding the $\alpha, \beta$ minimizing the length of the vector $p(\alpha) - q(\beta)$ is the same as solving the least-squares problem of minimizing $\|Ax - b\|^2$ over $x = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}$. The normal equations are $A^T A \hat{x} = A^T b$. Plugging in $A$ and $b$ as above, this is the matrix equation

$$\begin{pmatrix} 3 & -4 \\ -4 & 10 \end{pmatrix} \begin{pmatrix} \hat\alpha \\ \hat\beta \end{pmatrix} = \begin{pmatrix} -1 \\ 0 \end{pmatrix}$$

which has the solution $\hat{x} = \begin{pmatrix} \hat\alpha \\ \hat\beta \end{pmatrix} = \begin{pmatrix} -5/7 \\ -2/7 \end{pmatrix}$.
(b) The line connecting the closest points $p(\hat\alpha)$ and $q(\hat\beta)$ is perpendicular to the column space $C(A)$, which means that it is perpendicular to both lines.
To see this, note that $Ax - b$ is exactly $p - q$, and from class we showed that the least-squares solution occurs when $A\hat{x}$ is the orthogonal projection of $b$ onto $C(A)$, or equivalently when $A\hat{x} - b$ is orthogonal to $C(A)$. That means, here, that the line connecting $p(\hat\alpha)$ to $q(\hat\beta)$ (parallel to $p(\hat\alpha) - q(\hat\beta)$) is orthogonal to $C(A)$, i.e. it is orthogonal to $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} -1 \\ -3 \\ 0 \end{pmatrix}$.
Let's double-check the answer in Julia:
[ 1 -1; 1 -3; 1 0 ] \ [0,0,-1]
2-element Vector{Float64}: -0.7142857142857146 -0.2857142857142858
Yup, this matches our answer from above:
[-5/7, -2/7]
2-element Vector{Float64}: -0.7142857142857143 -0.2857142857142857
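We can also verify the claim in part (b) directly in Julia: the residual $A\hat{x} - b = p(\hat\alpha) - q(\hat\beta)$ should be orthogonal to both columns of $A$, i.e. both dot products should be ≈ 0 up to roundoff:

A = [1 -1; 1 -3; 1 0]
b = [0, 0, -1]
A' * (A*(A\b) - b)  # ≈ [0, 0]: the connecting line is ⊥ to both lines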
(From Strang, section 4.4, problem 18.)
(a) Find orthonormal vectors $q_1, q_2, q_3$ by Gram–Schmidt from the $a_1, a_2, a_3$ given by

$$a_1 = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 0 \end{pmatrix}, \quad a_2 = \begin{pmatrix} 0 \\ 1 \\ -1 \\ 0 \end{pmatrix}, \quad a_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ -1 \end{pmatrix},$$

which are a basis for the vectors perpendicular to $d = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}$, i.e. a basis for the nullspace of ______________.
(b) If you form the $4 \times 3$ matrix $Q = \begin{pmatrix} q_1 & q_2 & q_3 \end{pmatrix}$, then what is $Q^T d$? (No arithmetic is required — think about it!)
(a) To simplify our calculation, we will first find an orthogonal basis $\{v_1, v_2, v_3\}$ using Gram–Schmidt without the normalization.
For the first vector, we can simply take $v_1 = a_1 = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 0 \end{pmatrix}$.
To find $v_2$, we need to subtract the projection of $a_2$ onto $v_1 = a_1$: $v_2 = a_2 - v_1 \dfrac{v_1^T a_2}{v_1^T v_1} = \begin{pmatrix} 1/2 \\ 1/2 \\ -1 \\ 0 \end{pmatrix}$.
We do the same for $a_3$, and compute $v_3 = a_3 - v_1 \dfrac{v_1^T a_3}{v_1^T v_1} - v_2 \dfrac{v_2^T a_3}{v_2^T v_2} = \begin{pmatrix} 1/3 \\ 1/3 \\ 1/3 \\ -1 \end{pmatrix}$.
Once we have found $v_1, v_2, v_3$, we can obtain the orthonormal vectors $q_1, q_2, q_3$ by normalizing each of them:

$$q_1 = \frac{v_1}{\|v_1\|} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \\ 0 \\ 0 \end{pmatrix}, \quad q_2 = \frac{v_2}{\|v_2\|} = \frac{1}{\sqrt{6}} \begin{pmatrix} 1 \\ 1 \\ -2 \\ 0 \end{pmatrix}, \quad q_3 = \frac{v_3}{\|v_3\|} = \frac{1}{2\sqrt{3}} \begin{pmatrix} 1 \\ 1 \\ 1 \\ -3 \end{pmatrix}$$
By inspection, we can see that they are each orthogonal to one another and also to $d$, i.e. a basis for the nullspace of the $1 \times 4$ matrix $\begin{pmatrix} 1 & 1 & 1 & 1 \end{pmatrix} = d^T$.
(b) The matrix $Q^T d$ will be the column vector recording the dot products of $q_1, q_2, q_3$ with $d$, and you were told that $d$ is orthogonal to $a_1, a_2, a_3$, hence it must be orthogonal to $q_1, q_2, q_3$ as well (since they are in $\operatorname{span}\{a_1, a_2, a_3\}$)! So, without any calculation, you should be able to see that $Q^T d = 0$.
As a check, let's do the calculation in Julia:
q1 = 1/sqrt(2)*[1,-1,0,0]
q2 = 1/sqrt(6)*[1,1,-2,0]
q3 = 1/(2*sqrt(3))*[1,1,1,-3]
Q = [q1 q2 q3]
Q' * [1,1,1,1]
3-element Vector{Float64}: 0.0 0.0 -1.1102230246251565e-16
This is zero (up to roundoff errors), as expected.
We can also check that $Q^T Q = I$ (up to roundoff errors):
Q'Q
3×3 Matrix{Float64}:
  1.0          2.45143e-17  -8.8981e-18
  2.45143e-17  1.0          -1.10972e-17
 -8.8981e-18  -1.10972e-17   1.0
using LinearAlgebra # for I
Q'Q ≈ I
true
We could also use Julia's `qr` function to find the $Q$. `qr(A)` computes a "QR factorization" object that stores both the $Q$ and the $R$ in the $A = QR$ factorization. We can extract just the $Q$ factor with `qr(A).Q`, but it turns out that this is the "full" QR factorization, where $Q$ is a square matrix consisting of an orthonormal basis for $C(A)$ followed by an orthonormal basis for $C(A)^\perp = N(A^T)$. To get just the "thin" QR factor consisting of the basis for $C(A)$, the `qr` documentation tells us to do `Matrix(qr(A).Q)`:
A = [[1,-1,0,0] [0,1,-1,0] [0,0,1,-1]] # [a1 a2 a3]
4×3 Matrix{Int64}:
  1   0   0
 -1   1   0
  0  -1   1
  0   0  -1
Matrix(qr(A).Q)
4×3 Matrix{Float64}:
 -0.707107  -0.408248  -0.288675
  0.707107  -0.408248  -0.288675
  0.0        0.816497  -0.288675
  0.0        0.0        0.866025
Interestingly, this is not quite the same as what we got from Gram–Schmidt by hand calculation:
Q
4×3 Matrix{Float64}:
  0.707107   0.408248   0.288675
 -0.707107   0.408248   0.288675
  0.0       -0.816497   0.288675
  0.0        0.0       -0.866025
The difference is only in the signs of the columns, which is okay: the QR factorization is only unique up to ± signs (for real matrices), since you can always flip the sign of a $q$ without changing orthonormality.
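In fact, here all three columns happen to be flipped, so the two matrices are exact negatives of one another; a quick check:

Matrix(qr(A).Q) ≈ -Q  # true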
In fact, Julia's `qr` function uses a totally different algorithm than Gram–Schmidt, called a "Householder QR" algorithm.
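For the curious, here is a bare-bones sketch of the Householder QR idea, building on the reflectors $F = I - 2uu^T$ from the earlier problem. This is only an illustration under simplifying assumptions (no pivoting, no handling of zero columns), not the actual LAPACK routine that Julia calls:

using LinearAlgebra

function householder_qr(A)
    m, n = size(A)
    R = float(copy(A))
    Q = Matrix{Float64}(I, m, m)  # accumulates the product of the reflectors
    for j = 1:n
        x = R[j:m, j]
        u = copy(x)
        u[1] += (x[1] ≥ 0 ? 1 : -1) * norm(x)    # choose F to reflect x onto ∓‖x‖e₁
        u /= norm(u)                             # unit vector defining F = I − 2uuᵀ
        R[j:m, j:n] .-= 2u * (u' * R[j:m, j:n])  # apply F to the lower-right block of R
        Q[:, j:m] .-= (Q[:, j:m] * (2u)) * u'    # accumulate Q ← Q·F
    end
    return Q, R
end

Qh, Rh = householder_qr(A)
Qh*Rh ≈ A && Qh'Qh ≈ I  # true: a valid (full) QR factorization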