{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 18.06 pset 6\n", "\n", "Due Wednesday October 17 at 10:55am." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem 1 (10 points)\n", "\n", "Recall that, if $x \\in \\mathbb{R}^n$, then $\\nabla_x f(x)$ (for a scalar-valued function $f$) is a column vector\n", "$$\n", "\\nabla_x f = \\begin{pmatrix} \\frac{\\partial f}{\\partial x_1} \\\\ \\frac{\\partial f}{\\partial x_2} \\\\ \\vdots \\\\ \\frac{\\partial f}{\\partial x_n} \\end{pmatrix}\n", "$$\n", "(This is the \"uphill\" direction in which $f$ changes most rapidly.)\n", "\n", "**(a)** If $f(x) = \\frac{x^T A x}{x^T x}$ for some $n \\times n$ matrix $A$ (not necessarily symmetric!) and $x \\ne 0$, write $\\nabla_x f$ as a matrix expression (not individual components) involving $A$ and $x$.\n", "\n", "**(b)** For the $f(x)$ from (a), $f(\\alpha x)$ has what relationship to $f(x)$ for any real $\\alpha \\ne 0$? It follows that $\\nabla_x f$ must be *orthogonal* to what vector? Check that this is true of your answer from (a)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem 2 (5 points)\n", "\n", "If $f(A)$ is a scalar function of an $m\\times n$ *matrix* $A = \\begin{pmatrix} a_{11} & a_{12} & \\cdots \\\\ a_{21} & a_{22} & \\cdots \\\\ \\vdots & \\vdots & \\ddots \\end{pmatrix}$, then it is useful define the gradient with respect to the *matrix* as another $m\\times n$ matrix:\n", "$$\n", "\\nabla_A f = \\begin{pmatrix} \\frac{\\partial f}{\\partial a_{11}} & \\frac{\\partial f}{\\partial a_{12}} & \\cdots \\\\ \\frac{\\partial f}{\\partial a_{21}} & \\frac{\\partial f}{\\partial a_{22}} & \\cdots \\\\ \\vdots & \\vdots & \\ddots \\end{pmatrix}\n", "$$\n", "Given this definition, give a matrix expression (not in terms of individual components) for $\\nabla_A f$ with $f(A) = x^T A y$ where $x\\in \\mathbb{R}^m$ and $y\\in \\mathbb{R}^n$ are constant vectors.\n", "\n", "(This kind of derivative shows up frequently in machine learning, where $A$ is a \"weight\" matrix in a neural network.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem 3 (10 points)\n", "\n", "Suppose that we minimize the length of a vector along a line:\n", "$$\n", "\\min_{\\alpha \\in \\mathbb{R}} \\Vert u + \\alpha v \\Vert\n", "$$\n", "for some nonzero vectors $u, v \\in \\mathbb{R}^n$, finding the minimizer $\\hat{\\alpha}$.\n", "\n", "**(a)** If we write this in the form of a \"standard\" least-square problem $\\min_x \\Vert b - Ax \\Vert$, what are $A$, $b$, and $x$ in terms of the above?\n", "\n", "**(b)** Solve the normal equations to find an explicit solution $\\hat{\\alpha}$.\n", "\n", "**(c)** At this minimum, $u + \\hat{\\alpha} v$ is orthogonal to what vector?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem 4 (15 points)\n", "\n", "Suppose that we have $m$ data points $\\{ (a_1, b_1), (a_2, b_2), \\ldots, (a_m, b_m) \\}$ that we want to perform a least-square fit to a function of the following form:\n", "\n", "$$\n", "f(a) = x_1 + x_2 a + x_3 a^2 + x_4 (a-1)^2\n", "$$\n", "\n", "That is, we want to minimize $\\sum_{i=1}^m [b_i - f(a_i)]^2$ over all possible $x \\in \\mathbb{R}^4$.\n", "\n", "**(a)** Formulate this in matrix form as in class: we are minimizing $\\Vert b - Ax \\Vert$ for what matrix $A$ and vector $b$?\n", "\n", "**(b)** Give the rank of $A$ and $A^T A$ and a basis for $N(A) = N(A^T A)$ (assuming that our data has at least 4 distinct $a_i$ values). 
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Problem 3 (10 points)\n", "\n", "Suppose that we minimize the length of a vector along a line:\n", "$$\n", "\\min_{\\alpha \\in \\mathbb{R}} \\Vert u + \\alpha v \\Vert\n", "$$\n", "for some nonzero vectors $u, v \\in \\mathbb{R}^n$, finding the minimizer $\\hat{\\alpha}$.\n", "\n", "**(a)** If we write this in the form of a \"standard\" least-squares problem $\\min_x \\Vert b - Ax \\Vert$, what are $A$, $b$, and $x$ in terms of the above?\n", "\n", "**(b)** Solve the normal equations to find an explicit solution $\\hat{\\alpha}$.\n", "\n", "**(c)** At this minimum, $u + \\hat{\\alpha} v$ is orthogonal to what vector?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem 4 (15 points)\n", "\n", "Suppose that we have $m$ data points $\\{ (a_1, b_1), (a_2, b_2), \\ldots, (a_m, b_m) \\}$ that we want to fit, in the least-squares sense, to a function of the following form:\n", "\n", "$$\n", "f(a) = x_1 + x_2 a + x_3 a^2 + x_4 (a-1)^2\n", "$$\n", "\n", "That is, we want to minimize $\\sum_{i=1}^m [b_i - f(a_i)]^2$ over all possible $x \\in \\mathbb{R}^4$.\n", "\n", "**(a)** Formulate this in matrix form as in class: we are minimizing $\\Vert b - Ax \\Vert$ for what matrix $A$ and vector $b$?\n", "\n", "**(b)** Give the rank of $A$ and $A^T A$ and a basis for $N(A) = N(A^T A)$ (assuming that our data has at least 4 distinct $a_i$ values). What does this tell you about the solutions to the normal equations $A^T A \\hat{x} = A^T b$ for the fit coefficients $\\hat{x}$?\n", "\n", "**(c)** Modify the following Julia code to create your matrix $A$ from the given data vectors $a$ and $b$ (see also the polynomial fitting examples in the lecture notes) and plot your least-squares fit.\n", "\n", "**(d)** If the least-squares solution is not unique, Julia's `x̂ = A \\\\ b` finds the `x̂` with **minimum length**, i.e. it minimizes $\\Vert \\hat{x} \\Vert$ over all possible solutions to $A^T A \\hat{x} = A^T b$. In this problem, that means that Julia's `x̂` must be **orthogonal to what vector(s)**? (Hint: see problem 3.) Check that this is true of the result that Julia gives you below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = [0.6, -0.1, 0.2, 0.3, 0.4, 0.35, 0.01, 0.5, 0.67, 0.88]\n", "b = [1.07943, 1.12779, 0.884219, 0.845884, 0.899928, 0.871585, 0.95691, 1.0084, 1.23807, 1.67931]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A = [ ??? ]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x̂ = A \\\\ b # equivalent to solving the normal equations" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "using PyPlot\n", "plot(a, b, \"ro\")\n", "â = linspace(-0.1, 1, 100)   # 100 points at which to plot the fit\n", "plot(â, x̂[1] .+ x̂[2] .* â .+ x̂[3] .* â.^2 .+ x̂[4] .* (â .- 1).^2, \"b-\")\n", "xlabel(L\"a\")\n", "ylabel(L\"b\")\n", "legend([\"data\", \"fit\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem 5 (10 points)\n", "\n", "(From Strang, section 4.2, problem 10.)\n", "\n", "Project $a_1 = \\begin{pmatrix} 1 \\\\ 0 \\end{pmatrix}$ onto the line spanned by $a_2 = \\begin{pmatrix} 1 \\\\ 2 \\end{pmatrix}$. Then project the result back onto the line spanned by $a_1$. Multiply these projection matrices $P_1 P_2$: is this a projection?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem 6 (10 points)\n", "\n", "(From Strang, section 4.2, problem 19.)\n", "\n", "To find the projection matrix onto the plane $x-y-2z=0$, choose two vectors in that plane (the null space of what matrix?) and make them the columns of $A$ so that the plane is $C(A)$. Then compute (by hand) the projection of the point $\\begin{pmatrix} 0 \\\\ 6 \\\\ 12 \\end{pmatrix}$ onto this plane, and check your result in Julia." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A = [...]\n", "\n", "# project [0,6,12] onto C(A):\n", "..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem 7 (10 points)\n", "\n", "(From Strang, section 4.2, problem 30.)\n", "\n", "**(a)** Find the projection matrix $P_C$ onto the column space $C(A)$ (after looking closely at the matrix!) for $A = \\begin{pmatrix} 3 & 6 & 6 \\\\ 4 & 8 & 8 \\end{pmatrix}$.\n", "\n", "**(b)** Find the 3 by 3 projection matrix $P_R$ onto the row space of $A$. (You can google the formula for the inverse of a 2 by 2 matrix to try to shorten your algebra… though the fact that $A$ is rank-deficient may give you some trouble… but there is an even simpler way to do it if you realize that the row space is `_____`-dimensional.) Multiply $B = P_C A P_R$. Your answer $B$ may be a little surprising at first — can you explain it?" ] },
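{ "cell_type": "markdown", "metadata": {}, "source": [ "In problems 5–7, you can sanity-check any claimed projection matrix numerically: an orthogonal projection $P$ satisfies $P^2 = P$ and $P^T = P$, and the residual $b - Pb$ must be orthogonal to the subspace you projected onto. The sketch below is a minimal example of these checks, using the rank-1 projection $P = aa^T/a^Ta$ from class with an arbitrary vector $a$ (not one from this pset); the same checks apply to the matrices you construct in these problems." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = [1, 2, 2]                # an arbitrary nonzero vector\n", "b = [3.0, -1.0, 4.0]         # an arbitrary point to project\n", "P = a*a' / dot(a, a)         # projection onto the line through a\n", "norm(P*P - P), norm(P' - P), dot(a, b - P*b)   # all three should be ≈ 0" ] },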
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Problem 8 (10 points)\n", "\n", "Given two $m \\times n$ matrices $A$ and $B$ and two right-hand sides $b, c \\in \\mathbb{R}^m$, suppose that we want to minimize:\n", "$$\n", "f(x) = \\Vert b - Ax \\Vert^2 + \\Vert c - Bx \\Vert^2\n", "$$\n", "over $x \\in \\mathbb{R}^n$. That is, we are minimizing the *sum* of two least-squares fitting problems.\n", "\n", "**(a)** $\\Vert b \\Vert^2 + \\Vert c \\Vert^2 = \\Vert w \\Vert^2$ for a vector $w \\in \\mathbb{R}^{2m}$. Give such a $w$ in terms of $b$ and $c$.\n", "\n", "**(b)** Write down a matrix equation $C \\hat{x} = d$ whose solution $\\hat{x}$ gives the minimum of $f(x)$. Give explicit formulas for $C$ and $d$ in terms of $A, B, b, c$. Hint: use your answer from (a) to convert this into a \"normal\" least-squares problem first." ] } ], "metadata": { "kernelspec": { "display_name": "Julia 0.6.2", "language": "julia", "name": "julia-0.6" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "0.6.2" } }, "nbformat": 4, "nbformat_minor": 2 }