
12 Jan 2024 ~ 3 min read

SVD decomposition


This post is self-contained and describes a simple proof of the existence of the singular value decomposition (SVD). The norm we consider is the $L^2$ norm: $\Vert x \Vert^2 = x_1^2 + \ldots + x_n^2$. For matrices, $\Vert A \Vert = \max_{\Vert x \Vert = 1} \Vert A x \Vert$.
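
As a quick numerical illustration (a minimal NumPy sketch, not part of the original post), one can compare $\Vert A \Vert$, computed as the largest singular value, with the maximum of $\Vert Ax \Vert$ over randomly sampled unit vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# Spectral norm ||A|| = max_{||x||=1} ||Ax||; NumPy computes it as the largest singular value.
spectral_norm = np.linalg.norm(A, 2)

# Approximate the maximum over the unit sphere by sampling random unit vectors x.
xs = rng.standard_normal((10_000, 3))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
sampled_max = np.linalg.norm(xs @ A.T, axis=1).max()

print(spectral_norm, sampled_max)  # sampled_max is <= spectral_norm and close to it
```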

We start with a lemma:

$\textbf{Lemma}$. If $A \in \mathbb{R}^{m \times n}$, then there exists a unit vector $z$ such that $A^{\top} A z = \sigma^{2} z$, where $\sigma = \|A\|$.

$\textbf{Proof}$. On the sphere $\{u \in \mathbb{R}^n : \| u \| = 1\}$, the continuous function $u \mapsto \| Au \|^2$ reaches a maximum at some unit vector $u_1$. By definition of $\|A\|$, this maximum value is $\sigma^2$, where $\sigma = \|A\|$. Now consider the differentiable function $F : v \in \mathbb{R}^n \setminus \{0\} \mapsto \| Av \|^2 - \sigma^2 \|v\|^2$, defined on the open set $\mathbb{R}^n \setminus \{0\}$. One has
$$F(v) = \|v\|^2 \left( \left\lVert A \tfrac{v}{\|v\|} \right\rVert^2 - \sigma^2 \right) \leq 0,$$
while $F(u_1) = \|A u_1\|^2 - \sigma^2 = 0$, so $F$ attains its maximum at $u_1$. Therefore $u_1$ is a critical point and $\frac{\partial F}{\partial x}$ vanishes there. Since

$$\frac{\partial F}{\partial x} = \frac{\partial}{\partial x}\left(x^{\top} A^{\top} A x - \sigma^2\, x^{\top} x\right) = 2\, A^{\top} A x - 2\, \sigma^2 x,$$

one has $A^{\top} A u_1 = \sigma^2 u_1$.
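
The lemma can be checked numerically: the largest eigenvalue of the symmetric matrix $A^{\top} A$ is $\sigma^2 = \|A\|^2$, and its unit eigenvector plays the role of $z$. A minimal NumPy sketch (with `np.linalg.eigh` standing in for the variational argument above):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))

# Eigen-decomposition of the symmetric matrix A^T A (eigenvalues in ascending order).
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
sigma2, z = eigvals[-1], eigvecs[:, -1]   # largest eigenvalue and a unit eigenvector

print(np.allclose(A.T @ A @ z, sigma2 * z))               # A^T A z = sigma^2 z
print(np.isclose(np.sqrt(sigma2), np.linalg.norm(A, 2)))  # sigma = ||A||
```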

$\textbf{Theorem (Singular Value Decomposition)}$. Let $A$ be a real $m \times n$ matrix. Then there exist orthogonal matrices $U \in \mathbb{R}^{m \times m}$ with columns $u_{1}, \ldots, u_{m}$, and $V \in \mathbb{R}^{n \times n}$ with columns $v_{1}, \ldots, v_{n}$, satisfying

$$U^{\top} A V = \Sigma = \textrm{diag}\left(\sigma_{1}, \ldots, \sigma_{p}\right) \in \mathbb{R}^{m \times n}, \quad p = \min\{m, n\},$$

where $\sigma_{1} \geq \sigma_{2} \geq \ldots \geq \sigma_{p} \geq 0$.
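
For comparison (an illustrative sketch, not part of the proof), NumPy's `np.linalg.svd` produces exactly such a factorization; note that it returns $V^{\top}$ rather than $V$:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3
A = rng.standard_normal((m, n))

# Full SVD: U is m x m, Vt is n x n, s holds the p = min(m, n) singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((m, n))
Sigma[:min(m, n), :min(m, n)] = np.diag(s)

print(np.allclose(U.T @ A @ Vt.T, Sigma))          # U^T A V = Sigma
print(np.all(s >= 0) and np.all(np.diff(s) <= 0))  # sigma_1 >= ... >= sigma_p >= 0
```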

$\textbf{Proof}$. Assume $A \neq 0$ (otherwise the result is immediate). By the lemma, there exists a unit vector $v_1$ such that $A^{\top} A v_1 = \sigma^2 v_1$, where $\sigma = \|A\| > 0$. Define $u_1 := \frac{A v_1}{\sigma}$ (observe that it is also a unit vector, since $\|A v_1\|^2 = v_1^{\top} A^{\top} A v_1 = \sigma^2$). Together $u_1, v_1$ satisfy

$$\begin{aligned} A v_1 &= \sigma u_1 \\ A^{\top} u_1 &= \sigma v_1 \end{aligned}$$

This implies that $A(v_1^{\perp}) \subset u_1^{\perp}$ (indeed, $v \perp v_1 \Rightarrow v^{\top} v_1 = 0 \Rightarrow (Av)^{\top} u_1 = v^{\top} A^{\top} u_1 = v^{\top} \sigma v_1 = 0$). Find $(n-1)$ orthonormal vectors $v_2, \ldots, v_n$ completing $v_1$ to create an orthogonal matrix $V$ whose columns are the $v_i$, and $(m-1)$ orthonormal vectors $u_2, \ldots, u_m$ completing $u_1$ to create an orthogonal matrix $U$.
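
Continuing the numerical sketch from the lemma (an illustrative check, not part of the argument), one can verify both relations and the fact that $A$ maps $v_1^{\perp}$ into $u_1^{\perp}$:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))

# v1: unit eigenvector of A^T A for its largest eigenvalue sigma^2 (the lemma).
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
sigma, v1 = np.sqrt(eigvals[-1]), eigvecs[:, -1]
u1 = A @ v1 / sigma                       # also a unit vector

print(np.allclose(A @ v1, sigma * u1))    # A v1   = sigma u1
print(np.allclose(A.T @ u1, sigma * v1))  # A^T u1 = sigma v1

# A maps v1-perp into u1-perp: remove the v1 component of a random vector, then test orthogonality.
v = rng.standard_normal(4)
v -= (v @ v1) * v1                        # now v is orthogonal to v1
print(np.isclose((A @ v) @ u1, 0.0))      # A v is orthogonal to u1
```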

Now,

$$U^{\top} A V = U^{\top} \left[\, A v_1 \mid A v_2 \mid \cdots \mid A v_n \,\right] = U^{\top} \left[\, \sigma u_1 \mid A v_2 \mid \cdots \mid A v_n \,\right]$$

Thus

$$U^{\top} A V = \left(\begin{array}{c|ccc} \sigma & 0 & \cdots & 0 \\ \hline 0 & & & \\ \vdots & & B & \\ 0 & & & \end{array}\right)$$

The zero entries in the first column are obtained by forming the products of the rows $u_2^{\top}, \ldots, u_m^{\top}$ with the column $\sigma u_1$, and the first row is obtained by forming the product

$$u_1^{\top} \left[\, \sigma u_1 \mid A v_2 \mid \cdots \mid A v_n \,\right]$$

remembering that all the $A v_2, \ldots, A v_n$ are orthogonal to $u_1$.

Applying the same argument to the smaller matrix $B \in \mathbb{R}^{(m-1) \times (n-1)}$ and proceeding by induction yields the full diagonal form; since $\|B\| \leq \sigma$, the singular values come out in decreasing order. This concludes the proof.
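
To make the reduction step concrete, here is a small numerical sketch (using a QR factorization, an assumed choice, to complete $v_1$ and $u_1$ to orthonormal bases) that exhibits the block structure above:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 6, 4
A = rng.standard_normal((m, n))

# v1 and u1 as in the proof.
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
sigma, v1 = np.sqrt(eigvals[-1]), eigvecs[:, -1]
u1 = A @ v1 / sigma

# Complete v1 (resp. u1) to an orthonormal basis: QR of a matrix whose first column is v1 (resp. u1).
# (The first column of Q equals v1 up to sign, depending on the QR sign convention.)
V, _ = np.linalg.qr(np.column_stack([v1, rng.standard_normal((n, n - 1))]))
U, _ = np.linalg.qr(np.column_stack([u1, rng.standard_normal((m, m - 1))]))

M = U.T @ A @ V
print(np.isclose(abs(M[0, 0]), sigma))                     # sigma in the top-left corner
print(np.allclose(M[0, 1:], 0), np.allclose(M[1:, 0], 0))  # zero first row and first column
B = M[1:, 1:]   # the proof then recurses on this smaller block
```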



I'm Sylvain. I'm a data scientist based near Paris. You can see some of my work on GitHub.