Matrix/Linear Algebra Refresher

Terms defined:
------------------------------
matrix, matrix size, matrix element, vector, column vector, row vector,
square matrix, diagonal, main diagonal, off diagonal, upper triangle,
lower triangle, diagonal matrix, identity matrix, Toeplitz matrix,
triangular matrix, trace, ones matrix, zeros matrix, transpose,
symmetric matrix, matrix addition/subtraction, matrix multiplication,
inner product, dot product, outer product, scalar, matrix inverse,
determinant, singular, ill-conditioned, simultaneous equations,
overdetermined, row/column space, null space, singular value
decomposition (SVD/PCA)
------------------------------

A matrix is a table of numbers. The numbers can be anything (positive,
negative, zero, integer, fractional, pi, real, imaginary, complex, etc).
A matrix is characterized by its size, ie, the number of rows and
columns, written "n-by-m", where the first number is the number of rows
and the second is the number of columns. Eg, "3-by-4" or "3x4" means a
matrix with 3 rows and 4 columns. Brackets are usually drawn around the
matrix:

  M = [a11 a12 a13]
      [a21 a22 a23]

Each number in the matrix is called an element. The first subscript is
the row, the second is the column.

A vector is a special matrix in which the number of columns or rows
is 1. A column vector is a vector in which the number of columns is 1.
A row vector is a vector in which the number of rows is 1. A matrix can
be thought of as a collection of row vectors or of column vectors.

A matrix is square if nrows = ncols.

Diagonal: the elements where row = col + constant.
  Main diagonal: constant = 0. Off diagonal: constant != 0.
Triangular matrix: all elements either above or below the main
  diagonal are zero. Upper triangular: all elements below the main
  diagonal are zero. Lower triangular: all elements above the main
  diagonal are zero.
Diagonal matrix: all elements are 0 except the main diagonal.
Identity matrix: diagonal matrix with all 1s on the main diagonal and
  zeros everywhere else.
Toeplitz matrix: elements within a diagonal have the same value
  throughout the matrix.
Trace: sum of the main-diagonal elements of a square matrix.
Ones matrix: all elements are 1. Can be any size.
Zeros matrix: all elements are 0. Can be any size.

Transpose: reflect around the main diagonal, ie, swap rows and columns.
Written with "T", "t", or "'". Changes an n-by-m matrix to m-by-n.
Symmetric matrix: M' = M (symmetric about the main diagonal).

Add/Subtract: just add/subtract the matching elements. For this reason,
the sizes must be the same.

Multiplication: the number of columns in the first matrix must match
the number of rows in the second.

Multiplication, inner product (dot product). For vectors, multiply
matching elements and sum:

  [1 2] * [3 4]'      = 1*3 + 2*4        = [11] (a 1x1 matrix)
  [1 2 3] * [7 9 11]' = 1*7 + 2*9 + 3*11 = [58] (a 1x1 matrix)

For matrices, reduce to a series of vector multiplications:

  [1 2 3]   [ 7  8]
  [4 5 6] * [ 9 10] = [ 58  64]  (a 2x2 matrix)
            [11 12]   [139 154]

The final matrix size will equal the number of rows of the first matrix
by the number of columns of the second matrix.

Multiplication, outer product:

  [1] * [3 4] = [1*3 1*4] = [3 4]  (a 2x2 matrix)
  [2]           [2*3 2*4]   [6 8]

Scalar: a single number without a size designation. Different than a
1x1 matrix. No brackets. Multiplication of a matrix by a scalar:
multiply all elements of the matrix by the scalar value.

  .5 * [1 2] = [0.5 1.0]
       [3 4]   [1.5 2.0]

You could not do this if 0.5 were a 1x1 matrix ([0.5]) because the
dimensions would not match.

Put it all together (verified in the sketch below):

  alpha = .5;          % scalar
  x = [1 -1]';
  y = [3 4]';
  M = [2 5; 3 6];

  x*y' + alpha*M = [ 4.0  6.5]
                   [-1.5 -1.0]
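A minimal sketch of the operations above, runnable in Matlab or Octave
(the values are the ones from the running example):

  % Basic matrix operations (Matlab/Octave)
  alpha = 0.5;        % a scalar
  x = [1 -1]';        % 2x1 column vector
  y = [3  4]';        % 2x1 column vector
  M = [2 5; 3 6];     % 2x2 matrix

  x' * y              % inner product: 1*3 + (-1)*4 = -1 (a 1x1)
  x * y'              % outer product: a 2x2 matrix
  alpha * M           % scalar times matrix: scales every element
  x*y' + alpha*M      % = [4.0 6.5; -1.5 -1.0], as computed by hand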
Inverse: M*inv(M) = I. M must be square (and there are other conditions
on M too). Hard to do (not something you can eyeball).

  M = [1 2]    inv(M) = [-2.0  1.0]
      [3 4]             [ 1.5 -0.5]

Determinant of a 2x2:

  M = [a11 a12]
      [a21 a22]

  delta = a11*a22 - a21*a12   (for the M above: 1*4 - 3*2 = -2)

Inverse of a 1x1: inv([a11]) = [1/a11]
Inverse of a 2x2:

  inv(M) = (1/delta) * [ a22 -a12]
                       [-a21  a11]

But what if:

  M = [1.0 2.0]
      [0.5 1.0]

  delta = 1*1 - 0.5*2 = 0

Cannot compute 1/delta, so cannot compute the inverse! The matrix is
"singular".

Note: you'd never actually do these computations by hand. You'd use
Matlab (or some other software package).

A dumb way to compute the sum of a list of numbers (see the sketch at
the end of this section). List: 1, 2, 3.

  x = [1 2 3]';
  a = [1 1 1]';
  sum = x'*a = 1*1 + 2*1 + 3*1 = 6

A dumb way to compute the length of the list:

  N = a'*a = 1*1 + 1*1 + 1*1 = 3

A dumb way to compute the average:

  mu = x'*a * inv(a'*a) = sum/N

Sum of the squares of a list. List: 1, 2, 3.

  x = [1 2 3]';
  SS = x'*x;

A dumb way to compute the variance:

  e = x - mu*a;
  var = e'*e/(a'*a - 1) = SSE/(N-1)

Selective average. List: 1, 2, 3, 4, 5, but we only want the average of
1, 3, and 5.

  x = [1 2 3 4 5]';
  a = [1 0 1 0 1]';
  sum = x'*a = 1*1 + 2*0 + 3*1 + 4*0 + 5*1 = 9
  N = a'*a = 3
  average = x'*a * inv(a'*a) = 3

Correlation of two lists of numbers:

  x = 1, 2, 3
  y = 7, 9, 11
  sum(x_i * y_i) = 58

  x = [1 2 3]';
  y = [7 9 11]';
  x'*y

Correlation coefficient:

  rho = sum(x_i * y_i)/sqrt(sum(x_i^2) * sum(y_i^2))
      = x'*y/sqrt((x'*x)*(y'*y));

Algebra: y = x*b. Solve for b in terms of y and x:

  b = y/x = inv(x)*y

Simultaneous equations:

  y1 = x11*b1 + x12*b2
  y2 = x21*b1 + x22*b2

Solve for b1 and b2 in terms of y1, y2, and the xij: two equations, two
unknowns. Use one equation to solve for b1 in terms of the xij, y, and
b2, then substitute into the other equation to solve for b2.

Linear algebra is a way to represent simultaneous equations:

  y = [y1 y2]'
  b = [b1 b2]'
  X = [x11 x12]
      [x21 x22]

  y = X*b. Solve for b in terms of y and X: b = inv(X)*y

Fitting a line (slope and offset): y = b + m*x. If we have two measures
of y (y1 and y2) that correspond to two values of x (x1 and x2), then

  y1 = b + m*x1
  y2 = b + m*x2

Two equations, two unknowns (m and b):

  y = [y1 y2]'
  X = [1 x1]
      [1 x2]
  beta = [b m]'

  y = X*beta
  beta = inv(X)*y

Will this always work? Hint: determinant. Will this always work well?

Overdetermined: more equations than unknowns.

  y1 = b + m*x1
  y2 = b + m*x2
  y3 = b + m*x3

Three equations, two unknowns (m and b):

  y = [y1 y2 y3]'
  X = [1 x1]
      [1 x2]
      [1 x3]
  beta = [b m]'

  y = X*beta

Cannot compute inv(X) because X is not square. Instead use

  betahat = inv(X'*X)*X'*y = (X+)*y   (pseudo inverse)

Will this always work? Hint: determinant. Will this always work well?
This is the "best fit". Ill-conditioned vs singular: if the determinant
of X'*X is exactly zero the matrix is singular and no inverse exists;
if it is merely close to zero the matrix is ill-conditioned, and the
inverse, though computable, amplifies noise. (A line-fit sketch appears
at the end of this section.)

Fitting a plane (offset and slopes): z = b + m1*x + m2*y. Three
equations, three unknowns:

  z1 = b + m1*x1 + m2*y1
  z2 = b + m1*x2 + m2*y2
  z3 = b + m1*x3 + m2*y3

  z = [z1 z2 z3]'
  X = [1 x1 y1]
      [1 x2 y2]
      [1 x3 y3]
  beta = [b m1 m2]'

  z = X*beta
  beta = inv(X)*z

Geometric interpretation of vectors -- spaces:

Row space of a matrix: all the vectors that can be represented by a
weighted sum of the row vectors of the matrix.
Column space of a matrix: all the vectors that can be represented by a
weighted sum of the column vectors of the matrix.
Row null space of a matrix: the vectors orthogonal to every row of the
matrix (ie, the v with M*v = 0); these CANNOT be represented by a
weighted sum of the row vectors.
Column null space of a matrix: the vectors orthogonal to every column
of the matrix (ie, the v with M'*v = 0); these CANNOT be represented by
a weighted sum of the column vectors.

Singular Value Decomposition (SVD/PCA):

  M = U*S*V'

S is a diagonal matrix of the singular values.
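To check the "dumb" sum/average bookkeeping above, here is a sketch you
can paste into Matlab or Octave. The names s, v, and avg are mine (the
text's "sum" and "var" would shadow Matlab built-ins of the same name):

  % Sums, averages, and variance done with vector products
  x  = [1 2 3]';
  a  = [1 1 1]';
  s  = x'*a;               % sum of the list = 6
  N  = a'*a;               % length of the list = 3
  mu = x'*a * inv(a'*a);   % average = s/N = 2
  SS = x'*x;               % sum of squares = 14
  e  = x - mu*a;           % residuals around the mean
  v  = e'*e/(a'*a - 1);    % variance = SSE/(N-1) = 1

  % Selective average: only average the 1st, 3rd, and 5th elements
  x2  = [1 2 3 4 5]';
  a2  = [1 0 1 0 1]';
  avg = x2'*a2 * inv(a2'*a2)   % = 9/3 = 3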
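For the overdetermined line fit, a sketch with made-up data (these x
and y values are illustrative only, not from the text). Matlab's
backslash operator and pinv solve the same least-squares problem as the
inv(X'*X)*X' formula:

  % Overdetermined line fit, y = b + m*x
  x = [1 2 3]';              % abscissa values (made up)
  y = [2.1 3.9 6.2]';        % noisy measurements (made up)
  X = [ones(3,1) x];         % 3x2 design matrix [1 x1; 1 x2; 1 x3]
  betahat  = inv(X'*X)*X'*y; % pseudo-inverse formula, gives [b m]'
  betahat2 = pinv(X)*y;      % same answer via the built-in pinv
  betahat3 = X\y;            % same answer; the usual Matlab idiom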
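Finally, the SVD itself: svd is built in, and singularity shows up as a
zero singular value:

  % Singular value decomposition
  M = [2 5; 3 6];
  [U, S, V] = svd(M);      % M = U*S*V'; diag(S) holds the singular values
  U*S*V'                   % reconstructs M (up to round-off)
  svd([1.0 2.0; 0.5 1.0])  % the singular matrix above: one value is 0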