
12 Jan 2024 ~ 3 min read

SVD decomposition


This post is self-contained and describes a simple proof of the existence of the singular value decomposition (SVD). The norm we consider is the $L^2$ norm: $\Vert x \Vert^2 = x_1^2 + \ldots + x_n^2$. For matrices, $\Vert A \Vert = \max_{\Vert x \Vert = 1} \Vert A x \Vert$.
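
As a quick numerical illustration (a minimal NumPy sketch, not part of the original post), one can compare $\Vert A \Vert$, computed as the largest singular value, with the maximum of $\Vert Ax \Vert$ over randomly sampled unit vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# Spectral norm ||A|| = max_{||x||=1} ||Ax||; NumPy computes it as the largest singular value.
spectral_norm = np.linalg.norm(A, 2)

# Approximate the maximum over the unit sphere by sampling random unit vectors x.
xs = rng.standard_normal((10_000, 3))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
sampled_max = np.linalg.norm(xs @ A.T, axis=1).max()

print(spectral_norm, sampled_max)  # sampled_max is <= spectral_norm and close to it
```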

We start with a lemma:

$\textbf{Lemma}$. If $A \in \mathbb{R}^{m \times n}$, then there exists a unit vector $z$ such that $A^{\top} A z = \sigma^{2} z$, where $\sigma = \|A\|$.

$\textbf{Proof}$. On the sphere $\{u \in \mathbb{R}^n : \| u \| = 1\}$, the continuous function $u \mapsto \| Au \|^2$ reaches a maximum at some unit vector $u_1$. By definition of $\|A\|$, this maximum value is $\sigma^2$, where $\sigma = \|A\|$. Now consider the differentiable function $F : v \in \mathbb{R}^n \setminus \{0\} \mapsto \| Av \|^2 - \sigma^2 \|v\|^2$, defined on the open set $\mathbb{R}^n \setminus \{0\}$. One has
$$F(v) = \|v\|^2 \left( \left\lVert A \tfrac{v}{\|v\|} \right\rVert^2 - \sigma^2 \right) \leq 0,$$
while $F(u_1) = \|A u_1\|^2 - \sigma^2 = 0$, so $F$ attains its maximum at $u_1$. Therefore $u_1$ is a critical point and $\frac{\partial F}{\partial x}$ vanishes there. Since

$$\frac{\partial F}{\partial x} = \frac{\partial}{\partial x}\left(x^{\top} A^{\top} A x - \sigma^2\, x^{\top} x\right) = 2\, A^{\top} A x - 2\, \sigma^2 x,$$

one has $A^{\top} A u_1 = \sigma^2 u_1$.
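
The lemma can be checked numerically: the largest eigenvalue of the symmetric matrix $A^{\top} A$ is $\sigma^2 = \|A\|^2$, and its unit eigenvector plays the role of $z$. A minimal NumPy sketch (with `np.linalg.eigh` standing in for the variational argument above):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))

# Eigen-decomposition of the symmetric matrix A^T A (eigenvalues in ascending order).
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
sigma2, z = eigvals[-1], eigvecs[:, -1]   # largest eigenvalue and a unit eigenvector

print(np.allclose(A.T @ A @ z, sigma2 * z))               # A^T A z = sigma^2 z
print(np.isclose(np.sqrt(sigma2), np.linalg.norm(A, 2)))  # sigma = ||A||
```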

$\textbf{Theorem (Singular Value Decomposition)}$. Let $A$ be a real $m \times n$ matrix. Then there exist orthogonal matrices $U \in \mathbb{R}^{m \times m}$ with columns $u_{1}, \ldots, u_{m}$, and $V \in \mathbb{R}^{n \times n}$ with columns $v_{1}, \ldots, v_{n}$, satisfying

$$U^{\top} A V = \Sigma = \textrm{diag}\left(\sigma_{1}, \ldots, \sigma_{p}\right) \in \mathbb{R}^{m \times n}, \quad p = \min\{m, n\},$$

where $\sigma_{1} \geq \sigma_{2} \geq \ldots \geq \sigma_{p} \geq 0$.
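
For comparison (an illustrative sketch, not part of the proof), NumPy's `np.linalg.svd` produces exactly such a factorization; note that it returns $V^{\top}$ rather than $V$:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3
A = rng.standard_normal((m, n))

# Full SVD: U is m x m, Vt is n x n, s holds the p = min(m, n) singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((m, n))
Sigma[:min(m, n), :min(m, n)] = np.diag(s)

print(np.allclose(U.T @ A @ Vt.T, Sigma))          # U^T A V = Sigma
print(np.all(s >= 0) and np.all(np.diff(s) <= 0))  # sigma_1 >= ... >= sigma_p >= 0
```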

$\textbf{Proof}$. Assume $A \neq 0$ (otherwise the result is immediate). By the lemma, there exists a unit vector $v_1$ such that $A^{\top} A v_1 = \sigma^2 v_1$, where $\sigma = \|A\| > 0$. Define $u_1 := \frac{A v_1}{\sigma}$ (observe that it is also a unit vector, since $\|A v_1\|^2 = v_1^{\top} A^{\top} A v_1 = \sigma^2$). Together $u_1, v_1$ satisfy

$$\begin{aligned} A v_1 &= \sigma u_1 \\ A^{\top} u_1 &= \sigma v_1 \end{aligned}$$

This implies that $A(v_1^{\perp}) \subset u_1^{\perp}$ (indeed, $v \perp v_1 \Rightarrow v^{\top} v_1 = 0 \Rightarrow (Av)^{\top} u_1 = v^{\top} A^{\top} u_1 = v^{\top} \sigma v_1 = 0$). Find $(n-1)$ orthonormal vectors $v_2, \ldots, v_n$ completing $v_1$ to create an orthogonal matrix $V$ whose columns are the $v_i$, and $(m-1)$ orthonormal vectors $u_2, \ldots, u_m$ completing $u_1$ to create an orthogonal matrix $U$.
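
Continuing the numerical sketch from the lemma (an illustrative check, not part of the argument), one can verify both relations and the fact that $A$ maps $v_1^{\perp}$ into $u_1^{\perp}$:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))

# v1: unit eigenvector of A^T A for its largest eigenvalue sigma^2 (the lemma).
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
sigma, v1 = np.sqrt(eigvals[-1]), eigvecs[:, -1]
u1 = A @ v1 / sigma                       # also a unit vector

print(np.allclose(A @ v1, sigma * u1))    # A v1   = sigma u1
print(np.allclose(A.T @ u1, sigma * v1))  # A^T u1 = sigma v1

# A maps v1-perp into u1-perp: remove the v1 component of a random vector, then test orthogonality.
v = rng.standard_normal(4)
v -= (v @ v1) * v1                        # now v is orthogonal to v1
print(np.isclose((A @ v) @ u1, 0.0))      # A v is orthogonal to u1
```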

Now,

$$U^{\top} A V = U^{\top} \left[\, A v_1 \mid A v_2 \mid \cdots \mid A v_n \,\right] = U^{\top} \left[\, \sigma u_1 \mid A v_2 \mid \cdots \mid A v_n \,\right]$$

Thus

$$U^{\top} A V = \left(\begin{array}{c|ccc} \sigma & 0 & \cdots & 0 \\ \hline 0 & & & \\ \vdots & & B & \\ 0 & & & \end{array}\right)$$

The zero entries in the first column are obtained by forming the products of the rows $u_2^{\top}, \ldots, u_m^{\top}$ with the column $\sigma u_1$, and the first row is obtained by forming the product

$$u_1^{\top} \left[\, \sigma u_1 \mid A v_2 \mid \cdots \mid A v_n \,\right]$$

remembering that all the $A v_2, \ldots, A v_n$ are orthogonal to $u_1$.

Applying the same argument to the smaller matrix $B \in \mathbb{R}^{(m-1) \times (n-1)}$ and proceeding by induction yields the full diagonal form; since $\|B\| \leq \sigma$, the singular values come out in decreasing order. This concludes the proof.
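
To make the reduction step concrete, here is a small numerical sketch (using a QR factorization, an assumed choice, to complete $v_1$ and $u_1$ to orthonormal bases) that exhibits the block structure above:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 6, 4
A = rng.standard_normal((m, n))

# v1 and u1 as in the proof.
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
sigma, v1 = np.sqrt(eigvals[-1]), eigvecs[:, -1]
u1 = A @ v1 / sigma

# Complete v1 (resp. u1) to an orthonormal basis: QR of a matrix whose first column is v1 (resp. u1).
# (The first column of Q equals v1 up to sign, depending on the QR sign convention.)
V, _ = np.linalg.qr(np.column_stack([v1, rng.standard_normal((n, n - 1))]))
U, _ = np.linalg.qr(np.column_stack([u1, rng.standard_normal((m, m - 1))]))

M = U.T @ A @ V
print(np.isclose(abs(M[0, 0]), sigma))                     # sigma in the top-left corner
print(np.allclose(M[0, 1:], 0), np.allclose(M[1:, 0], 0))  # zero first row and first column
B = M[1:, 1:]   # the proof then recurses on this smaller block
```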



I'm Sylvain. I'm a data scientist based near Paris. You can see some of my work on GitHub.