10.10 Introduction to Chemical Engineering

NMM Chapter 8: Solving Systems of Equations

Overview

Most of Chapter 8 is devoted to solving a single type of problem:

A*x = b

where A is a known mxn matrix, b is a known column vector with m entries, and x is the unknown column vector (with n entries). This is a compact way of writing down m linear equations in n unknowns. This is the most important type of equation in Chemical Engineering (and in most other engineering disciplines), so it is discussed in a lot of detail (and you will want to know most of the details).

In section 8.1 the necessary conditions on A and b for a solution to exist (and to be unique) are explained. In section 8.2 Gauss’s algorithm for solving this type of problem is detailed. Almost all important numerical methods are based on solving this problem, usually using variants of Gauss’s original algorithm, and solving Ax=b consumes a very large fraction of the world’s supercomputer cycles.

There are several ways one can run into technical problems.  For example, the computed solution x_computed may not be as accurate (i.e. not as close to the true solution x_true) as you want it to be, and it may not be possible to do better.  Sometimes the CPU time required to obtain a solution may be unacceptable, or (for problems with a large number of unknowns) the algorithm may require too much memory. All of these issues are discussed in section 8.3.

Modern matrix algorithms are oriented towards first factorizing the matrix into simpler forms, for example finding a lower triangular matrix L and an upper triangular matrix U such that L*U = A (L and U are called the factors of A). As discussed in Section 8.4, it is usually much easier to solve L*U*x = b numerically than it is to solve A*x = b.

Finally, in Section 8.5 a method for solving systems of nonlinear equations (including transcendental equations) by iteratively solving linear systems which approximate the nonlinear system is presented. (This is the famous algorithm developed by Sir Isaac Newton.)  In 10.10 we will postpone discussing this until later in the semester.


Getting the System of Equations Into Standard Form

One of the most important steps in solving a problem numerically is to get the equations into the standard form expected by the solver. In the case of root-finding/bisection, the standard form was f(x) = 0, so we would rewrite problems like

    A exp(-Ea/RT) = h(T-To) + σT^4

in the standard form

    f(T) = A exp(-Ea/RT) - h(T-To) - σT^4 = 0

and we supplied the solver (bisect.m) with a function that returned the value of f(T). For systems of linear equations, the standard form is A*x = b, and the solvers will expect you to supply A and b as the inputs. So, for example, if you wanted to use the computer to solve the system

    2z = 3 - 3y
    14y - 20z + 7 = 0

we need to rewrite it, first as

    2z + 3y = 3
    -20z + 14y = -7

and then rewrite it again as A*x = b where

    A = [  2   3 ]    x = [ z ]    b = [  3 ]
        [-20  14 ]        [ y ]        [ -7 ]

We would then feed A and b as the inputs to our numerical equation solver, and the solver would return as output a vector

    x_computed = [ z_computed ]
                 [ y_computed ]

where the numbers z_computed and y_computed are close to the z_true and y_true that exactly solve your system of equations.

Note that you could have written your equations in the opposite order without changing the answer. So you can freely permute the rows of A (as long as you simultaneously permute the numbers in b) without changing anything. You can also permute the columns of A (leaving b alone), and you will still get the correct numbers z_computed and y_computed back from the solver – only this time the first number in x_computed will be y_computed, and the second one will be z_computed. Do you see why?
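As a quick illustration, here is a minimal Matlab sketch using the little system above and Matlab's built-in solver (discussed further below); swapping the columns of A just swaps the order of the unknowns returned:

>> A = [2 3; -20 14];  b = [3; -7];
>> x1 = A\b              % x1 = [z; y]
>> x2 = A(:,[2 1])\b     % columns of A swapped, so x2 = [y; z]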

For a more interesting and complicated example of going from a word problem to the correct standard form, see NMM Example 8.2.


Necessary Requirements for Ax=b to Have a Solution (and for it to be unique)

In Section 8.1.2, the requirements that A and b must satisfy for one (and only one) x to exist that satisfies Ax=b are laid out. These are derived from the subspace concepts of Linear Algebra discussed in Chapter 7. You are all familiar with the idea that you must have as many equations as unknowns; this is a requirement for a unique solution. In addition, all of the equations must be linearly independent (equations you make just by adding, subtracting and scaling the equations you already have do not count). The final requirement is that the equations must be consistent; for example, you could not solve this inconsistent set of equations:

    x + y = 0
    x + y = 1

The equations can only be consistent if the vector b lies in the column space of A (see Chapter 7). This means that you could append b as the (n+1)th column of A, and by doing so you would not increase the dimension of the column space of A. This can be tested easily by the Matlab command

>> rank([A b]) == rank(A)

which returns zero (= FALSE) if the system of equations is inconsistent.
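For instance, applying this test to the inconsistent example above (a quick sketch):

>> A = [1 1; 1 1];  b = [0; 1];
>> rank([A b]) == rank(A)    % returns 0 (FALSE): b is not in the column space of A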

For the important case where A is an NxN square matrix (number of equations = number of unknowns), and where all the columns of A are linearly independent (i.e. rank(A) is N), the columns of A span the full space R^N, and you will get a consistent set of equations with any b. In this case you can be sure that the solution x will exist and be unique.


An Algorithm for Solving Any System of Linear Equations Ax=b: Gaussian Elimination

The general algorithm for solving any system of equations is pretty complicated, so to start with we’ll consider only special types of A. First, if we are lucky and A is diagonal, then all the unknowns are uncoupled, and the solution is just x_i = b_i/A_(i,i). See NMM 8.2.1.
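In Matlab this one-line solve might look like the following (a minimal sketch; the numbers are made up for illustration):

>> A = diag([2 5 10]);  b = [4; 10; 30];
>> x = b ./ diag(A)      % element-by-element division: x = [2; 2; 3]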

Solving Triangular Systems of Equations (NMM 8.2.2)

It is also pretty easy to handle the case where A is a triangular square matrix (i.e. all the entries on one side of the diagonal are zero). The idea is that one row (either the first or the last) of A is all zeroes except for the diagonal, so we can solve for that unknown easily, just as we did for the diagonal matrix. Once we know the value of that unknown, we can plug it in to the other rows (i.e. the other equations) and so reduce the dimension of our problem. In vector form, we write it this way:

Ax=b is equivalent to:

x_1*A_1 + x_2*A_2 + x_3*A_3 + ... + x_N*A_N = b

where A_i means the ith column of A. This is also the same as

x_1*A_(m,1) + x_2*A_(m,2) + x_3*A_(m,3) + ... + x_N*A_(m,N) = b_m   for m=1,2,3,...,N

Note that all the unknowns (the numbers x_i) are on the left hand side.

Suppose A is upper triangular, i.e. all A_(m,j) = 0 if m > j. In particular, when m = N, only A_(N,N) is non-zero. So for m = N there is only one nonzero term on the left hand side, and we can solve easily for x_N:

x_N = b_N / A_(N,N)

Now that we know the numerical value of x_N, the last term in the sum is known, so we can move it to the right hand side:

x_1*A_1 + x_2*A_2 + x_3*A_3 + ... + x_(N-1)*A_(N-1) = b - x_N*A_N = b^(N-1)

We’ll call the modified vector on the right hand side (still a list of numbers, no unknowns) b^(N-1); note that its Nth entry is zero. Then we are back to almost the same situation as before:

x_1*A_(m,1) + x_2*A_(m,2) + x_3*A_(m,3) + ... + x_(N-1)*A_(m,N-1) = b^(N-1)_m   for m=1,2,3,...,N-1

(m = N is not interesting anymore, since all of the terms on both sides are zero.) We notice that for m = N-1, only the last term on the left hand side is nonzero. So we can solve easily for x_(N-1):

x_(N-1) = b^(N-1)_(N-1) / A_(N-1,N-1)

Now we know the value of x_(N-1), so we can move its term to the right hand side as well, and define

b^(N-2) = b^(N-1) - x_(N-1)*A_(N-1)

If we keep iterating in this way, we will gradually peel away at the terms on the left hand side until we know all the values in x.

A concise general algorithm implementing this procedure is given in section 8.2.2, for either upper or lower triangular matrices.
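For concreteness, here is what such a back-substitution routine might look like in Matlab (a minimal sketch; the function name backsub is mine, and it assumes U is square, nonsingular, and upper triangular):

function x = backsub(U, b)
    % Solve U*x = b where U is an N-by-N upper triangular matrix
    N = length(b);
    x = zeros(N,1);
    for m = N:-1:1
        % subtract the already-known terms, then divide by the diagonal
        x(m) = (b(m) - U(m,m+1:N)*x(m+1:N)) / U(m,m);
    end
end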
 

Square Matrices and Gaussian Elimination

The strategy for solving systems of equations where A is a square matrix is to first change them into triangular form, and then to solve the resulting triangular system using the procedure explained above. So what we are trying to do is to make some of the entries in A zero, i.e. to “eliminate” them, hence the name “Gaussian Elimination”. The basic idea is to multiply one of the equations by a scaling factor, so that when you add it to one of the other equations, one of the unknowns does not appear in the resulting equation. Many of you have seen this step before; possibly you called it “row reduction”.

For example, if you had these two equations:

   2z +  3y = 3
-20z +14y = -7

and you wanted to replace the second equation with an equation that did not involve z, you could multiply all the terms in the first equation by 10 and add the result:

20z + 30y = 30

to the second equation. Now your system of equations is:

  2z + 3y = 3
        44y = 23

or in matrix form:

    A = [ 2   3 ]    x = [ z ]    b = [  3 ]
        [ 0  44 ]        [ y ]        [ 23 ]

Now A is upper triangular!

Gauss proposed a systematic procedure for doing this so that the right unknowns will be eliminated from the right equations, so you can be sure that you will have a triangular matrix. The details are given in Section 8.2.3. This is important, you should read this carefully, and try it out by hand on a few small matrices to be sure you understand it.
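To give the flavor of the procedure, here is a bare-bones Matlab sketch of the elimination step (my own simplified version, without pivoting; NMM 8.2.3 gives the systematic algorithm):

function [A, b] = forwardelim(A, b)
    % Reduce A*x = b to upper triangular form (no pivoting!)
    N = length(b);
    for j = 1:N-1                     % loop over pivot columns
        for m = j+1:N                 % eliminate entries below the pivot
            s = A(m,j) / A(j,j);      % scaling factor (fails if the pivot is 0)
            A(m,j:N) = A(m,j:N) - s*A(j,j:N);
            b(m) = b(m) - s*b(j);
        end
    end
end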

Sometimes you have bad luck, and one of the numbers you need to divide by in these algorithms accidentally equals zero. More commonly, you will run into situations where you are called upon to divide by a very small number; this will not cause a disaster, but will exacerbate numerical round-off errors. To avoid this, methods “with pivoting” automatically permute the equations whenever necessary to avoid dividing by a small number. For details of how this is implemented, see 8.2.4. In practice, most people use the version called “partial pivoting” in the text.

In Matlab, to solve Ax=b, you just type

>> x=A\b

This simple command does the whole procedure (reduction to triangular form with partial pivoting, followed by solution of the triangular system of equations); see NMM 8.2.5.
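For example, with the little system worked by hand above (a quick check; the numerical values are rounded):

>> A = [2 3; -20 14];  b = [3; -7];
>> x = A\b     % x(2) = y = 23/44 ≈ 0.5227, matching the elimination by hand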


Accuracy of the Solution: Poorly-Conditioned Equations

The numbers we give as the inputs A and b usually have some uncertainty in them (e.g. they might have come from an experimental measurement), and they certainly will contain some round-off errors (because the computer only represents a finite number of significant digits). We want to know how these (presumably small) errors in the inputs propagate to errors in the output x_computed; hopefully the errors in x_computed will also be small. In NMM Section 8.3.2, equations are presented that relate the error in x_computed to the errors in A and b. Note that these equations are only useful when the vectors x and b are both ‘well-scaled’, i.e. when all the entries in the vector are of comparable magnitudes; you can usually achieve this by adjusting the units of the components of x and b appropriately. If one of the entries is much smaller than the others, then its variation will not affect the norm of the vector much, and the equations in 8.3.2 will only tell you limits on the component(s) that dominate the norms.

The equations in 8.3.2 show that a property of A, called the “condition number”, sets an upper bound on how much the errors in A and b could get amplified in computing x. In Matlab, cond(A) returns the condition number. If the condition number is large, we could be in big trouble. A horrendous case is shown in Example 8.9, where the condition number is 10^14 and x is really not determined at all. More typically the condition numbers are maybe 10^3, and b might be known to 5 significant figures, with the result that x_computed will be accurate to about 2 significant figures. (As a rule of thumb, for well-scaled x and b, see NMM 8.3.5, you lose about log10(cond(A)) significant figures in the arithmetic needed to compute x from A and b.)

The key point is that A being poorly-conditioned (i.e. having a high condition number) is an intrinsic property of the system of equations we are using; it has nothing to do with the method we use to solve them (i.e. there is no way to get around it). If you get a super high condition number, it often means that you made a mistake in writing down the equations, e.g. you accidentally repeated an equation, or one of your equations is a linear combination of the others. If your equations come from experiments, and the condition number is large, it may indicate the experiment is poorly designed.
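You can see this with a nearly-singular matrix (a made-up illustration; the perturbation 1e-12 is arbitrary):

>> A = [1 1; 1 1+1e-12];
>> cond(A)     % roughly 4e12: the rows are almost linearly dependent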

An example of poor conditioning from a poorly designed experiment

Your boss gives you a very nice balance (reports weights to 5 significant figures), and asks you to tell him how many milligrams of a chemical are in a glass jar; he guesses it is something like 10 mg. You decide to make the measurement by first measuring the mass of the jar+chemical (data1), and then dumping the chemical out and measuring the mass of the jar (data2). By hefting it, you guess the jar will weigh about 10 grams; call its mass Mjar (in grams), and call the unknown amount of chemical mg_chemical (in milligrams). The balance returns weights in grams. So you write these mass conservation equations:

1*Mjar + 0.001*mg_chemical = data1
1*Mjar +     0*mg_chemical = data2

you make the measurements, and find data1 = 9.0110 g and data2 = 9.0000 g, i.e.

    A = [ 1  0.001 ]    x = [ Mjar        ]    b = [ 9.0110 ]
        [ 1  0     ]        [ mg_chemical ]        [ 9.0000 ]

Note that x and b are both well-scaled (i.e. their entries have comparable magnitudes). Your boss wants to know the second component of x. You solve the system of equations by

>> x=A\b

and find x(2) = mg_chemical = 11.000. Now you want to know how many significant figures you should tell your boss. You know that the numbers in A are perfectly accurate (since you believe mass is conserved, and there are exactly 1000 mg in a gram). However, the numbers in b are only known to 5 significant figures (that’s how good your balance is). Now you can compute the approximate number of significant figures in x:

>> sigfigs = 5 - log10(cond(A))    % cond(A) is about 2e3, so sigfigs is about 1.7

or you could use the more detailed equations given in the textbook to get essentially the same result. What you find is that you can only give your boss 2 significant figures, even though he gave you a 5-significant-figure instrument. The poor experimental design in this example is using such a heavy container: one would do much better by pouring the chemical into a very light container, with a weight comparable to or lighter than the weight of the chemical. The way we are running the present experiment forces us to subtract two large, nearly equal numbers, a process which always kills off significant figures.
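To see the effect of a better design, suppose the container weighed only about 10 mg, and we wrote the equations in milligrams throughout (a hedged sketch; this matrix is mine, for illustration):

>> A = [1 1; 1 0];                 % mg_jar + mg_chemical = data1;  mg_jar = data2
>> sigfigs = 5 - log10(cond(A))    % cond(A) is about 2.6, so we keep about 4.6 figures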
 

If A*x_computed ≈ b, does that guarantee that x_computed is close to x_true?

The surprising answer, in NMM 8.3.4, is “NO!”. If cond(A) is large, all bets are off – in that case there are many different x’s which approximately solve the equation Ax=b. In some cases this is acceptable: you may not really care which x you use, as long as it approximately solves the equation. But sometimes, like the experimental example above, you really care what x_true is; then poor conditioning is a disaster.

NMM 8.3.6 discusses how Matlab computes cond(A); if you are not curious you should skip this section.


LU Factorization: Efficiently Solving Similar Systems of Equations

The Gaussian elimination program works by converting a general square system of equations Ax=b into the easy-to-solve triangular problem Ux=y, where U is upper-triangular. In this procedure, the computer performs several operations on A which convert it to U, and very similar operations on b to convert it to y. The operations in this algorithm depend on the values in A, but they do not depend at all on the values in b; in fact, we obtain exactly the same U for any b. So, if we were solving a bunch of similar problems:

    Ax = b
    Aw = c
    Av = d

and if A were a big matrix (say 1000x1000) we could save a lot of time by only computing U once, and by storing the operations that we are supposed to perform on the right hand side.

What Gaussian elimination is really doing is factoring A into two triangular matrices:

A = L*U

where L is a lower triangular matrix. It is not hard to modify the algorithm so that you save L (as well as U). Once you have L and U, it is easy to solve the systems of equations:

Solve Ax=b by first solving

    Ly=b for y

then solve

    Ux=y for x.

For Aw=c, we just solve

    Lq=c for q

And then solve

    Uw = q for w.

At each stage you only need to solve a triangular system of equations, which is many times easier than solving a square system of equations. The details of the factorization procedure are given in NMM 8.4; you don’t need to know all the details, but you do need to get the general idea.
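For completeness, here is the forward-substitution counterpart to the back-substitution sketch above, for the lower triangular solve Ly = b (again, the function name and details are mine):

function y = forwardsub(L, b)
    % Solve L*y = b where L is an N-by-N lower triangular matrix
    N = length(b);
    y = zeros(N,1);
    for m = 1:N
        % subtract the already-known terms, then divide by the diagonal
        y(m) = (b(m) - L(m,1:m-1)*y(1:m-1)) / L(m,m);
    end
end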

In reality, we will want to use partial pivoting to avoid dividing by zero or small numbers. So we will be permuting the rows of A and of b, and then finding the LU factors for the permuted A. In matrix language, we want to find P,L, and U given an A, where P is the correct permutation matrix (to avoid division by small numbers), and L and U are lower and upper triangular matrices:

    P*A = L*U

Note that this L and U are different from what you would get without doing the row permutation, but they are the right ones for solving

    P*A*x = P*b

Matlab provides a convenient LU factorization routine with partial pivoting:

>> [L,U,P] = lu(A)

You could use this to efficiently solve 2 similar systems of equations sequentially:

>> y = L\(P*b);
>> x = U\y;
>> q = L\(P*c);
>> w = U\q;

Matlab is smart enough to recognize that L and U are already triangular, so it will solve these equations very efficiently.

If you knew both b and c from the start, and you didn’t plan to ever use L and U again, you could do this a different way: concatenate b and c into a single matrix, and solve both problems simultaneously. In Matlab you would do it this way:

>> B = [b c];
>> X = A \ B;
>> x=X(:,1);
>> w=X(:,2);

Either way works fine, for about the same CPU time, but the first method has the advantage that you store L, U, and P for future use. Matlab is so smart that you actually do not need to save P (it will do the permutations automatically); for details see the discussion of lu.m towards the end of NMM 8.4.1, and the summary of what the backslash operator does in 8.4.3.
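For instance, the two-output form of lu folds the permutation into its first output (a quick sketch of this standard Matlab behavior):

>> [L,U] = lu(A);    % here L is a row-permuted lower triangular matrix
>> x = U\(L\b);      % solves Ax=b with no explicit P needed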

NMM 8.5 covers Nonlinear Systems of Equations, which is really a very different topic; we will come back to this later in the semester.


