43.05.08 · numerical-analysis / 05-svd-low-rank

Total least squares and the generalised SVD

shipped · 3 tiers · Lean: none

Anchor (Master): Golub-Van Loan 2013 *Matrix Computations* 4e (Johns Hopkins) §6.3, §8.7.3-8.7.4; Van Huffel-Vandewalle 1991 *The Total Least Squares Problem* (SIAM) Ch. 2-4; Paige-Saunders 1981 *Towards a generalized singular value decomposition* (SIAM J. Numer. Anal.); Golub-Van Loan 1980 *An analysis of the total least squares problem* (SIAM J. Numer. Anal.)

Intuition Beginner

Ordinary least squares assumes your inputs are perfect and only your outputs are noisy. You fit a line, and you measure each miss as a vertical drop from the data point straight down to the line. That vertical drop treats the horizontal coordinate as exact. But in most real experiments both coordinates are measured, and both carry error. A line fitted as if only the vertical coordinate were noisy will be biased, because it never asks the line to move sideways to meet a point that was mismeasured horizontally.

Total least squares fixes this. Instead of the vertical drop, it measures each miss as the shortest distance from the data point to the line — the perpendicular distance. A point can now be off because its input was wrong, its output was wrong, or both. The fitted line is the one that makes the total of these squared perpendicular distances as small as possible. Because the error can live in any direction, this is called the errors-in-variables fit.

The remarkable part is that the recipe for the best perpendicular fit is still a singular value decomposition — the same rotate-stretch-rotate factorisation used everywhere in this chapter. You stack the inputs and outputs side by side into one wide table, factor it, and read the answer off the smallest stretch direction. The smallest stretch direction is the thin axis of the data cloud, and the best line is the one lying along the cloud, perpendicular to that thin axis.

Visual Beginner

The table contrasts the two fits on the single question that separates them: which direction the error is allowed to point.

| Fit | Error measured as | Inputs assumed | Best for |
|---|---|---|---|
| Ordinary least squares | vertical drop to the line | exact (noise-free) | only the output is noisy |
| Total least squares | perpendicular distance to the line | noisy, like the outputs | both input and output noisy |

The difference is a right angle. Ordinary least squares drops straight down from each point; total least squares drops along the shortest path, which meets the line at a right angle. When the data points are far from the line, or when the inputs are very noisy, the two fitted lines can tilt noticeably apart.

The geometry is the whole lesson. Ordinary least squares answers "how far down to the line," total least squares answers "how far to the line in any direction." The second is the honest question whenever the horizontal coordinate is itself a measurement.

Worked example Beginner

Fit a line through the origin, $y = m x$, to two data points $(1, 1)$ and $(2, 1)$, where both coordinates are measured and noisy.

First the ordinary fit, which trusts the $x$ values. It minimises the squared vertical gaps. The slope that does this is the ratio of the cross term to the input energy: $m = (1 \cdot 1 + 2 \cdot 1) / (1 \cdot 1 + 2 \cdot 2) = 3/5 = 0.6$ . So the ordinary line is $y = 0.6 x$ .

Now the total fit, which trusts neither coordinate. Stack the data into a wide table with the inputs in the first column and the outputs in the second:

$$M = \begin{pmatrix} 1 & 1 \\ 2 & 1 \end{pmatrix}.$$

The total fit asks for the thin direction of this cloud — the direction in which the two columns vary least together. The stretch factors of $M$ are about $2.62$ and about $0.38$ , and the thin direction (the smallest-stretch direction) turns out to point roughly along $(0.53, - 0.85)$ in the input-output plane. Reading the slope off that thin direction gives $m \approx 0.62$ , a touch steeper than the ordinary $0.6$ .

What this tells us: when you stop trusting the $x$ values, the best line tilts. The total fit found a slightly steeper line because it let the points move sideways as well as up and down to meet it. With only two points the shift is small, but with noisy inputs across many points the bias the ordinary fit carries can be large, and the total fit removes it.
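The numbers in this worked example can be reproduced with a few lines of NumPy. This is a minimal sketch rather than part of the original unit; the variable names (`m_ols`, `m_tls`, `v_thin`) are illustrative choices.

```python
import numpy as np

# Rows of M are the data points (x, y) from the worked example.
M = np.array([[1.0, 1.0],
              [2.0, 1.0]])
x, y = M[:, 0], M[:, 1]

# Ordinary least squares through the origin: slope = (x . y) / (x . x).
m_ols = (x @ y) / (x @ x)

# Total least squares: the thin direction is the last right singular vector.
_, stretches, Vt = np.linalg.svd(M)
v_thin = Vt[-1]                   # unit normal to the fitted line (sign is arbitrary)
m_tls = -v_thin[0] / v_thin[1]    # line v0*x + v1*y = 0  =>  y = -(v0 / v1) * x

print("stretch factors:", np.round(stretches, 3))   # about 2.618 and 0.382
print("OLS slope:", round(m_ols, 3))                # 0.6
print("TLS slope:", round(m_tls, 3))                # about 0.618
```

The slope is read as a ratio of the two entries of `v_thin`, so the arbitrary sign of the singular vector cancels.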

Check your understanding Beginner

Exercise (easy, multiple choice).

In total least squares, the error of each data point is measured as

A. the vertical distance down to the fitted line
B. the horizontal distance across to the fitted line
C. the perpendicular (shortest) distance to the fitted line
D. the distance from the origin to the point

Hint

Total least squares lets the error point in any direction, so it uses the shortest route from the point to the line.

Answer

C. the perpendicular (shortest) distance to the fitted line. Feedback-correct: total least squares minimises the sum of squared perpendicular distances, allowing error in both coordinates. Feedback-wrong: the vertical distance (A) is ordinary least squares, which trusts the inputs; the horizontal distance (B) would trust only the outputs; the origin distance (D) is unrelated to the fit.

Exercise (easy, multiple choice).

To compute the total-least-squares line, you stack the inputs and outputs into one wide table and read the answer off

A. the largest stretch direction of the table
B. the smallest stretch direction of the table
C. the average of all stretch directions
D. the sum of the table's entries

Hint

The best line lies along the data cloud, so the leftover direction perpendicular to it is the thinnest one.

Answer

B. the smallest stretch direction of the table. Feedback-correct: the smallest-stretch (smallest singular value) direction is the thin axis of the data cloud, and the fit lies perpendicular to it. Feedback-wrong: the largest stretch direction (A) points along the cloud's main spread, not across it; an average (C) or a sum (D) does not pick out a direction.

Formal definition Intermediate+

Let $A \in R^{m \times n}$ with $m > n$ and $b \in R^{m}$, and consider the overdetermined system $A x \approx b$. Ordinary least squares, treated in 43.04.01, perturbs only $b$: it solves $\min_{x, r} ∥ r ∥_{2}$ subject to $A x = b + r$, charging error to the right-hand side alone. The total least squares (TLS) problem charges error to both $A$ and $b$. It seeks the smallest perturbation of the augmented data $[A ∣ b]$ that makes the perturbed system consistent:

$$\min_{E,\, r} ∥ [E ∣ r] ∥_{F} \quad \text{subject to} \quad (A + E) x = b + r \ \text{ for some } x \in R^{n},$$

where $∥ \cdot ∥_{F}$ is the Frobenius norm. The minimising $x$, when it exists and is unique, is the TLS solution. The constraint says the perturbed augmented matrix $[A + E ∣ b + r]$ has the vector $\begin{pmatrix} x \\ -1 \end{pmatrix}$ in its kernel, hence is rank-deficient. So TLS is the problem of finding the nearest rank-deficient augmented matrix in Frobenius norm, then reading the solution off its kernel.

The SVD solution. Write the SVD (developed in 01.01.12) of the $m \times (n + 1)$ augmented matrix as

$$[A ∣ b] = \tilde{U} \tilde{Σ} \tilde{V}^{T}, \qquad \tilde{Σ} = \operatorname{diag} (σ_{1}, \dots, σ_{n + 1}), \qquad σ_{1} \geq \dots \geq σ_{n + 1} \geq 0.$$

By the Eckart-Young theorem 01.01.12, the nearest rank-$n$ matrix in Frobenius norm is obtained by zeroing $σ_{n + 1}$, and the kernel of that nearest matrix is spanned by $\tilde{v}_{n + 1}$, the last right singular vector. Partition $\tilde{v}_{n + 1} = \begin{pmatrix} z \\ γ \end{pmatrix}$ with $z \in R^{n}$ and $γ \in R$ its final entry. When $γ \neq 0$, the TLS solution is

$$x_{TLS} = - \frac{1}{γ} z .$$

Genericity and the degenerate case. The construction requires $γ \neq 0$, and uniqueness requires $σ_{n} > σ_{n + 1}$ (a simple smallest singular value of the augmented matrix). Equivalently, the genericity condition is $σ_{\min} (A) > σ_{n + 1} ([A ∣ b])$: the smallest singular value of $A$ alone strictly exceeds the smallest singular value of the augmented matrix. When $σ_{n} = σ_{n + 1}$ the minimiser is non-unique, and when $γ = 0$ (the last right singular vector is orthogonal to the $b$-coordinate) no finite TLS solution exists: the nearest rank-deficient augmented matrix decouples $b$ from $\operatorname{range} (A + E)$.
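The SVD recipe and its two failure modes can be sketched in NumPy as follows. This is an illustrative sketch, not a reference implementation: the function name `tls_solve`, the tolerance `tol`, and the synthetic data are assumptions introduced here.

```python
import numpy as np

def tls_solve(A, b, tol=1e-12):
    """Total least squares solution of A x ~ b from the SVD of [A | b].

    Hypothetical helper: returns (x_tls, sigma_{n+1}) and raises ValueError
    in the two degenerate cases described in the text.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1)
    n = A.shape[1]
    C = np.column_stack([A, b])                 # augmented matrix [A | b]
    _, s, Vt = np.linalg.svd(C, full_matrices=False)
    v = Vt[-1]                                  # last right singular vector
    z, gamma = v[:n], v[n]                      # partition into (z, gamma)
    if s[n - 1] - s[n] <= tol * s[0]:
        raise ValueError("sigma_n equals sigma_{n+1}: minimiser is not unique")
    if abs(gamma) <= tol:
        raise ValueError("gamma is zero: no finite TLS solution")
    return -z / gamma, s[n]

# Synthetic errors-in-variables data (assumed noise level 0.05 in A and b).
rng = np.random.default_rng(0)
A_true = rng.standard_normal((50, 2))
x_true = np.array([2.0, -1.0])
A_obs = A_true + 0.05 * rng.standard_normal(A_true.shape)
b_obs = A_true @ x_true + 0.05 * rng.standard_normal(50)

x_tls, sigma = tls_solve(A_obs, b_obs)
print("x_tls =", x_tls, " sigma_{n+1} =", sigma)
```

The sign of `v` returned by the SVD is arbitrary, but it cancels in the ratio `-z / gamma`, so the solution does not depend on it.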

The closed form. When the genericity condition holds, the TLS solution admits the closed form

$$x_{TLS} = (A^{T} A - σ_{n + 1}^{2} I_{n})^{- 1} A^{T} b .$$

This is the ordinary normal-equations solution with the square of the smallest augmented singular value subtracted from the diagonal of $A^{T} A$. It is a deregularisation: where Tikhonov regularisation adds a positive multiple of $I$ to stabilise, TLS subtracts $σ_{n + 1}^{2} I$, which is why TLS amplifies noise more than ordinary least squares and why $A^{T} A - σ_{n + 1}^{2} I$ must stay positive definite for the solution to exist.
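A quick numerical check that the closed form agrees with the singular-vector formula, as a sketch on synthetic data (the noise level, seed, and variable names are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 40, 3
A_true = rng.standard_normal((m, n))
x_true = np.array([1.0, -2.0, 0.5])
A_obs = A_true + 0.05 * rng.standard_normal((m, n))    # noisy inputs
b_obs = A_true @ x_true + 0.05 * rng.standard_normal(m)  # noisy outputs

# SVD route: x = -z / gamma from the last right singular vector of [A | b].
_, s, Vt = np.linalg.svd(np.column_stack([A_obs, b_obs]), full_matrices=False)
sigma = s[-1]
x_svd = -Vt[-1, :n] / Vt[-1, n]

# Closed form: normal equations with sigma_{n+1}^2 subtracted from the diagonal.
shifted = A_obs.T @ A_obs - sigma**2 * np.eye(n)
x_closed = np.linalg.solve(shifted, A_obs.T @ b_obs)

# Ordinary least squares, for comparison (no shift).
x_ls = np.linalg.solve(A_obs.T @ A_obs, A_obs.T @ b_obs)

print("SVD route   :", x_svd)
print("closed form :", x_closed)
print("ordinary LS :", x_ls)
print("smallest eigenvalue of shifted matrix:", np.linalg.eigvalsh(shifted).min())
```

In practice the SVD route is the one to compute with; the closed form is shown only to confirm the algebra, since forming $A^{T} A$ squares the conditioning.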

The generalised SVD. The generalised singular value decomposition (GSVD) factors a matrix pair rather than a single matrix. For $A \in R^{m_{A} \times n}$ and $B \in R^{m_{B} \times n}$ sharing $n$ columns, with the stacked matrix $\begin{pmatrix} A \\ B \end{pmatrix}$ of full column rank, there exist orthogonal $U \in R^{m_{A} \times m_{A}}$, $V \in R^{m_{B} \times m_{B}}$ and a nonsingular $X \in R^{n \times n}$ with

$$A = U Σ_{A} X^{- 1}, \qquad B = V Σ_{B} X^{- 1},$$

where $Σ_{A} = diag (α_{1}, \dots, α_{n})$ and $Σ_{B} = diag (β_{1}, \dots, β_{n})$ (rectangular diagonal) carry non-negative entries normalised so that $α_{i}^{2} + β_{i}^{2} = 1$ . The ratios $γ_{i} = α_{i} / β_{i}$ are the generalised singular values of the pair $(A, B)$ ; they are exactly the square roots of the eigenvalues of the pencil $A^{T} A - λ B^{T} B$ , linking the GSVD to the generalised eigenvalue problem cross-referenced in 43.06.10. When $B = I$ , the generalised singular values reduce to the ordinary singular values of $A$ .

Counterexamples to common slips

TLS is not ordinary least squares with $b$ perturbed too. Ordinary least squares perturbs $b$ but holds $A$ fixed; TLS perturbs the whole augmented matrix and minimises the joint Frobenius norm of $[E ∣ r]$ . The two coincide only when $A$ has error-free columns, i.e. when the right model is mixed LS-TLS, not full TLS.
The TLS solution need not exist. If the last right singular vector of $[A ∣ b]$ has a zero in its $b$ -coordinate ( $γ = 0$ ), the formula $x = - z / γ$ is undefined and the problem has no finite solution. This is a genuine feature of the perpendicular-distance geometry, not a numerical artefact.
The closed form subtracts $σ_{n + 1}^{2} I$ ; it does not add it. Writing $A^{T} A + σ_{n + 1}^{2} I$ would be Tikhonov-type regularisation, the opposite stabilising operation. TLS deregularises, so it is more, not less, sensitive than ordinary least squares.
The GSVD's $X$ is not orthogonal in general. Only $U$ and $V$ are orthogonal; $X$ is merely nonsingular. Demanding $X$ orthogonal would force $A$ and $B$ to be simultaneously diagonalisable by the same orthogonal change of basis, which fails for a generic pair.

Key theorem with proof Intermediate+

Theorem (the TLS solution from the smallest singular triple). Let $A \in R^{m \times n}$, $b \in R^{m}$, $m > n$, and let $[A ∣ b] = \tilde{U} \tilde{Σ} \tilde{V}^{T}$ be a singular value decomposition with $σ_{1} \geq \dots \geq σ_{n + 1}$. Suppose the genericity condition $σ_{n} > σ_{n + 1}$ holds and the last right singular vector $\tilde{v}_{n + 1} = \begin{pmatrix} z \\ γ \end{pmatrix}$ has $γ \neq 0$. Then the total-least-squares problem has the unique solution $x_{TLS} = - z / γ$, the minimal perturbation has Frobenius norm $∥ [E ∣ r] ∥_{F} = σ_{n + 1}$, and $x_{TLS}$ satisfies the closed form $x_{TLS} = (A^{T} A - σ_{n + 1}^{2} I)^{- 1} A^{T} b$. ^{[Golub, G. H. & Van Loan, C. F. — An analysis of the total least squares problem]}

Proof. The constraint $(A + E) x = b + r$ says the vector $\begin{pmatrix} x \\ -1 \end{pmatrix}$ lies in the kernel of $C := [A + E ∣ b + r] = [A ∣ b] + [E ∣ r]$. A matrix has a nonzero kernel vector exactly when it is rank-deficient, so the TLS problem is: minimise $∥ [E ∣ r] ∥_{F}$ over all $[E ∣ r]$ for which $[A ∣ b] + [E ∣ r]$ has rank at most $n$ (one less than its $n + 1$ columns), subject to the kernel containing a vector with last entry $- 1$.

By the Eckart-Young theorem 01.01.12, among all rank- $n$ matrices the one nearest $[A ∣ b]$ in Frobenius norm is

$$C_{⋆} = \tilde{U} \tilde{Σ}_{⋆} \tilde{V}^{T}, \qquad \tilde{Σ}_{⋆} = \operatorname{diag} (σ_{1}, \dots, σ_{n}, 0),$$

obtained by setting $σ_{n + 1} \mapsto 0$ , and the minimal distance is $∥ [A ∣ b] - C_{⋆} ∥_{F} = σ_{n + 1}$ . The genericity condition $σ_{n} > σ_{n + 1}$ makes this nearest rank- $n$ matrix unique. Its kernel is one-dimensional, spanned by the right singular vector $\tilde{v}_{n + 1}$ associated with the zeroed singular value, since $C_{⋆} \tilde{v}_{n + 1} = \tilde{U} \tilde{Σ}_{⋆} \tilde{V}^{T} \tilde{v}_{n + 1} = \tilde{U} \tilde{Σ}_{⋆} e_{n + 1} = 0$ .

A kernel vector usable as a solution must have its last coordinate normalisable to $- 1$, which requires $γ \neq 0$. Under that hypothesis, scale $\tilde{v}_{n + 1} = \begin{pmatrix} z \\ γ \end{pmatrix}$ by $- 1 / γ$ to obtain the kernel vector $\begin{pmatrix} - z / γ \\ - 1 \end{pmatrix} = \begin{pmatrix} x_{TLS} \\ - 1 \end{pmatrix}$. The first $n$ coordinates give $x_{TLS} = - z / γ$, and the constraint is met with $[E ∣ r] = C_{⋆} - [A ∣ b]$ of Frobenius norm $σ_{n + 1}$.

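For the closed form, the eigen-relation $[A ∣ b]^{T} [A ∣ b] \tilde{v}_{n + 1} = σ_{n + 1}^{2} \tilde{v}_{n + 1}$ written in block form with $\tilde{v}_{n + 1} = \begin{pmatrix} z \\ γ \end{pmatrix}$ reads

$$\begin{pmatrix} A^{T} A & A^{T} b \\ b^{T} A & b^{T} b \end{pmatrix} \begin{pmatrix} z \\ γ \end{pmatrix} = σ_{n + 1}^{2} \begin{pmatrix} z \\ γ \end{pmatrix} .$$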

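The top block gives $A^{T} A z + γ A^{T} b = σ_{n + 1}^{2} z$, that is $(A^{T} A - σ_{n + 1}^{2} I) z = - γ A^{T} b$. Dividing by $- γ$ and using $x_{TLS} = - z / γ$ yields $(A^{T} A - σ_{n + 1}^{2} I) x_{TLS} = A^{T} b$. The hypotheses $σ_{n} > σ_{n + 1}$ and $γ \neq 0$ together force $σ_{n + 1} < σ_{\min} (A)$: interlacing gives $σ_{n + 1} \leq σ_{\min} (A)$, and equality would supply a unit vector $w$ with $∥ A w ∥_{2} = σ_{n + 1}$, placing $\begin{pmatrix} w \\ 0 \end{pmatrix}$ in the one-dimensional smallest right singular subspace and contradicting $γ \neq 0$. So the singular values of $A$ all exceed $σ_{n + 1}$, hence $A^{T} A - σ_{n + 1}^{2} I$ is positive definite and invertible, giving the stated closed form. $□$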

Bridge. This theorem builds toward the unified view of fitting as low-rank approximation, and it appears again in 43.06.10, where the same pencil $A^{T} A - λ B^{T} B$ reappears as the generalised eigenproblem the GSVD diagonalises. The foundational reason TLS reduces to an SVD is the Eckart-Young theorem of 01.01.12: minimising a Frobenius perturbation subject to a rank drop is exactly the optimal low-rank approximation problem, and the constraint that the kernel carry a vector with last entry $- 1$ is what converts the abstract kernel direction into an affine solution $x$ . Putting these together, ordinary least squares of 43.04.01 and total least squares are two faces of one optimisation — column-space projection versus nearest-rank-deficient augmentation — and the closed forms differ only by the term $- σ_{n + 1}^{2} I$ , which is the central insight that TLS is a deregularised normal-equations solve. The TLS route generalises the single-matrix SVD to the augmented matrix, and the GSVD generalises it again to a matrix pair, so the bridge is the same low-rank-approximation principle widening from one matrix to two.

Exercises Intermediate+

Exercise 7 (hard, symbolic).

Show that when $B = I_{n}$ , the generalised singular values $γ_{i} = α_{i} / β_{i}$ of the pair $(A, I)$ equal the ordinary singular values of $A$ .

Hint

With $B = I$ , the relation $B = V Σ_{B} X^{- 1}$ forces $V Σ_{B} = X$ ; substitute into $A = U Σ_{A} X^{- 1}$ and use the normalisation.

Answer

From $I = V Σ_{B} X^{- 1}$ we get $X = V Σ_{B}$, so $X^{- 1} = Σ_{B}^{- 1} V^{T}$ (with $Σ_{B}$ diagonal positive). Then $A = U Σ_{A} X^{- 1} = U Σ_{A} Σ_{B}^{- 1} V^{T} = U \operatorname{diag} (α_{i} / β_{i}) V^{T}$. This is an ordinary SVD of $A$ with orthogonal $U, V$ and diagonal entries $α_{i} / β_{i} = γ_{i}$, so the generalised singular values are the ordinary singular values $σ_{i} (A)$. The normalisation $α_{i}^{2} + β_{i}^{2} = 1$ fixes $β_{i} = 1 / \sqrt{1 + σ_{i}^{2}}$ and $α_{i} = σ_{i} / \sqrt{1 + σ_{i}^{2}}$. Rubric: full credit for solving for $X$, substituting, and identifying the ordinary SVD.

Exercise 8 (hard, symbolic).

Explain quantitatively why the TLS solution is more sensitive to perturbations than the ordinary least-squares solution, using the two closed forms $x_{LS} = (A^{T} A)^{- 1} A^{T} b$ and $x_{TLS} = (A^{T} A - σ_{n + 1}^{2} I)^{- 1} A^{T} b$ .

Hint

Compare the smallest eigenvalues of the two matrices being inverted; the condition number is the ratio of largest to smallest eigenvalue.

Answer

Both solutions invert a shifted Gram matrix. The eigenvalues of $A^{T} A$ are $σ_{i} (A)^{2}$, so $x_{LS}$ inverts a matrix with smallest eigenvalue $σ_{n} (A)^{2}$ and condition number $σ_{1} (A)^{2} / σ_{n} (A)^{2} = κ_{2} (A)^{2}$. The TLS matrix $A^{T} A - σ_{n + 1}^{2} I$ has eigenvalues $σ_{i} (A)^{2} - σ_{n + 1}^{2}$, so its smallest eigenvalue $σ_{n} (A)^{2} - σ_{n + 1}^{2}$ is strictly smaller and its condition number $(σ_{1} (A)^{2} - σ_{n + 1}^{2}) / (σ_{n} (A)^{2} - σ_{n + 1}^{2})$ is strictly larger whenever $σ_{n + 1} > 0$ and $σ_{1} (A) > σ_{n} (A)$. The subtraction pushes the smallest eigenvalue toward zero, inflating the condition number and the noise amplification. As $σ_{n} (A) \to σ_{n + 1}$ (the genericity boundary) the TLS matrix becomes singular and the solution diverges. Rubric: full credit for the eigenvalue comparison, the smaller-smallest-eigenvalue observation, and the divergence at the genericity boundary.
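The eigenvalue comparison in this answer can be checked numerically. The sketch below uses an assumed synthetic problem (the column scaling, noise level, and seed are illustrative choices) and compares the two condition numbers computed from singular values with `numpy.linalg.cond` applied to the matrices themselves.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 40, 3
# Scale the last column down so that sigma_n(A) is modest (assumed setup).
A = rng.standard_normal((m, n)) * np.array([1.0, 1.0, 0.3])
b = A @ np.array([1.0, -1.0, 2.0]) + 0.1 * rng.standard_normal(m)

sA = np.linalg.svd(A, compute_uv=False)                           # sigma_i(A)
sigma = np.linalg.svd(np.column_stack([A, b]), compute_uv=False)[-1]  # sigma_{n+1}

cond_ls = sA[0]**2 / sA[-1]**2
cond_tls = (sA[0]**2 - sigma**2) / (sA[-1]**2 - sigma**2)

gram = A.T @ A
print("sigma_n(A) =", sA[-1], " sigma_{n+1}([A|b]) =", sigma)
print("cond(A^T A)             =", cond_ls, np.linalg.cond(gram))
print("cond(A^T A - sigma^2 I) =", cond_tls,
      np.linalg.cond(gram - sigma**2 * np.eye(n)))
```

The closer `sigma` sits to `sA[-1]`, the larger the gap between the two condition numbers becomes.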

Advanced results Master

Theorem (orthogonal-distance optimality of TLS). Let the rows of $[A ∣ b]$ be the data points $p_{i} \in R^{n + 1}$. The TLS fit minimises the sum of squared orthogonal (perpendicular) distances from the points $p_{i}$ to the hyperplane through the origin with unit normal $\tilde{v}_{n + 1}$, whereas ordinary least squares minimises the sum of squared distances measured only along the $b$-coordinate axis. Concretely, the orthogonal distance from $p_{i}$ to the hyperplane $\{p : \tilde{v}_{n + 1}^{T} p = 0\}$ is $∣ \tilde{v}_{n + 1}^{T} p_{i} ∣$, and

$$\sum_{i = 1}^{m} ∣ \tilde{v}_{n + 1}^{T} p_{i} ∣^{2} = ∥ [A ∣ b] \tilde{v}_{n + 1} ∥_{2}^{2} = σ_{n + 1}^{2},$$

minimised over unit vectors by the smallest right singular vector. ^{[Van Huffel, S. & Vandewalle, J. — The Total Least Squares Problem: Computational Aspects and Analysis]} The vector $\tilde{v}_{n + 1}$ is the unit normal along which the data cloud is thinnest. For a point with inputs $a \in R^{n}$ and output $β$, the fitted relation $\tilde{v}_{n + 1}^{T} \begin{pmatrix} a \\ β \end{pmatrix} = 0$ rearranges, after extracting the $b$-coordinate, into $β = a^{T} x_{TLS}$, which is the row form of $b = A x_{TLS}$. This is the precise sense in which TLS is orthogonal distance regression: it is principal-component fitting of a hyperplane to the augmented data, the dual of ordinary least squares' axis-aligned residual.
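The identity in this theorem is easy to test numerically. The following sketch (synthetic data, seed, and the 1000 random trial normals are all assumptions for illustration) computes the sum of squared orthogonal distances for the TLS normal and confirms that no randomly drawn unit normal does better.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 60, 2
A = rng.standard_normal((m, n))
b = A @ np.array([0.5, -1.5]) + 0.2 * rng.standard_normal(m)

P = np.column_stack([A, b])          # rows are the data points p_i in R^{n+1}
_, s, Vt = np.linalg.svd(P, full_matrices=False)
normal = Vt[-1]                      # unit normal of the TLS hyperplane

# Sum of squared orthogonal distances to the hyperplane {p : normal^T p = 0}.
dist_sq_tls = np.sum((P @ normal) ** 2)

# Compare with many random unit normals.
trials = rng.standard_normal((1000, n + 1))
trials /= np.linalg.norm(trials, axis=1, keepdims=True)
dist_sq_trials = np.sum((P @ trials.T) ** 2, axis=0)

print("TLS sum of squared distances:", dist_sq_tls)
print("sigma_{n+1}^2               :", s[-1] ** 2)
print("best random normal          :", dist_sq_trials.min())
```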

Theorem (the GSVD via the CS decomposition; Paige-Saunders form). Let $A \in R^{m_{A} \times n}$, $B \in R^{m_{B} \times n}$ with $\begin{pmatrix} A \\ B \end{pmatrix}$ of column rank $n$. Form a thin QR (or any orthonormal basis) $\begin{pmatrix} A \\ B \end{pmatrix} = QR$ with $Q = \begin{pmatrix} Q_{A} \\ Q_{B} \end{pmatrix}$ having orthonormal columns and $R \in R^{n \times n}$ nonsingular. The CS decomposition of the column-orthonormal $Q$ supplies orthogonal $U, V$ and an orthogonal $W$ with

$$Q_{A} = U C W^{T}, \qquad Q_{B} = V S W^{T}, \qquad C = \operatorname{diag} (α_{i}), \quad S = \operatorname{diag} (β_{i}), \qquad C^{T} C + S^{T} S = I .$$

Setting $X^{- 1} = W^{T} R$ recovers $A = Q_{A} R = U C (W^{T} R) = U Σ_{A} X^{- 1}$ and likewise $B = V Σ_{B} X^{- 1}$ , with $Σ_{A} = C$ , $Σ_{B} = S$ . ^{[Paige, C. C. & Saunders, M. A. — Towards a generalized singular value decomposition]} The CS decomposition guarantees $α_{i}^{2} + β_{i}^{2} = 1$ , so the generalised singular values $γ_{i} = α_{i} / β_{i}$ are well-defined whenever $β_{i} > 0$ , with $γ_{i} = + \infty$ encoding the directions in $ker B$ . The Paige-Saunders construction needs no rank assumption on $A$ or $B$ individually — only on the stack — which is why it is the numerically stable route, computing the GSVD without ever forming the cross-products $A^{T} A$ or $B^{T} B$ that would square the conditioning as in 43.04.01.
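The construction can be sketched in NumPy, which has no built-in GSVD routine. The helper below is a hypothetical illustration, not a production algorithm: it assumes $m_{A} \geq n$, obtains the CS decomposition from a plain SVD of $Q_{A}$, and returns thin factors ($U$ of size $m_{A} \times n$, $V$ of size $m_{B} \times n$) rather than the square orthogonal matrices of the theorem.

```python
import numpy as np

def gsvd_sketch(A, B):
    """GSVD of (A, B) via thin QR of the stack and an SVD of the top block.

    Hypothetical helper. Assumes m_A >= n and [A; B] of full column rank.
    Returns U, V, alpha, beta, X with A = U diag(alpha) X^{-1} and
    B = V diag(beta) X^{-1}; columns of V with beta_i == 0 are left as zeros.
    """
    mA = A.shape[0]
    Q, R = np.linalg.qr(np.vstack([A, B]))          # thin QR of the stack
    QA, QB = Q[:mA], Q[mA:]
    U, alpha, Wt = np.linalg.svd(QA, full_matrices=False)   # QA = U C W^T
    alpha = np.clip(alpha, 0.0, 1.0)
    W = Wt.T
    VS = QB @ W                                     # columns are beta_i * v_i
    beta = np.linalg.norm(VS, axis=0)
    V = np.divide(VS, beta, out=np.zeros_like(VS), where=beta > 0)
    X = np.linalg.solve(R, W)                       # X = R^{-1} W, so X^{-1} = W^T R
    return U, V, alpha, beta, X

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))
B = rng.standard_normal((5, 3))
U, V, alpha, beta, X = gsvd_sketch(A, B)

Xinv = np.linalg.inv(X)
print("alpha^2 + beta^2:", alpha**2 + beta**2)
print("A reconstructed:", np.allclose(U @ np.diag(alpha) @ Xinv, A))
print("B reconstructed:", np.allclose(V @ np.diag(beta) @ Xinv, B))

# Cross-check against the pencil, formed here only for verification.
pencil = np.sort(np.linalg.eigvals(np.linalg.solve(B.T @ B, A.T @ A)).real)
print("gamma_i^2 matches pencil eigenvalues:",
      np.allclose(np.sort((alpha / beta) ** 2), pencil))
```

The cross-products $A^{T} A$ and $B^{T} B$ appear only in the final verification line; the factorisation itself never forms them.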

Theorem (GSVD diagonalises the symmetric-definite pencil). With the GSVD $A = U Σ_{A} X^{- 1}$ , $B = V Σ_{B} X^{- 1}$ and $B$ of full column rank, the columns of $X$ are generalised eigenvectors of the pencil $(A^{T} A, B^{T} B)$ :

$$A^{T} A x_{i} = γ_{i}^{2} B^{T} B x_{i}, \qquad γ_{i}^{2} = α_{i}^{2} / β_{i}^{2},$$

so $X^{T} (A^{T} A) X = Σ_{A}^{T} Σ_{A}$ and $X^{T} (B^{T} B) X = Σ_{B}^{T} Σ_{B}$ are simultaneously diagonal. ^{[Van Loan, C. F. — Generalizing the singular value decomposition]} This is the constructive, floating-point-stable form of the simultaneous diagonalisation of two quadratic forms whose existence theory the corpus carries in 01.01.19, and it is exactly the symmetric-definite specialisation of the generalised eigenproblem and QZ algorithm of 43.06.10. The GSVD thereby solves constrained and weighted least squares — minimising $∥ A x - c ∥_{2}$ subject to $∥ B x ∥_{2}$ fixed, or the Tikhonov problem $min_{x} ∥ A x - c ∥_{2}^{2} + μ^{2} ∥ B x ∥_{2}^{2}$ — by decoupling the two quadratic forms into the GSVD coordinates, where the regularised solution reads off termwise as $x = \sum_{i} \frac{α _{i} ( u _{i}^{T} c )}{α _{i}^{2} + μ ^{2} β _{i}^{2}} x_{i}$ .
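The termwise Tikhonov formula can be compared against a direct solve of the regularised normal equations. This is a sketch under assumptions: the GSVD factors are built by the QR-plus-SVD route with $m_{A} \geq n$, and the data, seed, and the value of `mu` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
mA, mB, n = 8, 5, 4
A = rng.standard_normal((mA, n))
B = rng.standard_normal((mB, n))
c = rng.standard_normal(mA)
mu = 0.7                                    # regularisation parameter (assumed)

# GSVD factors via thin QR of the stack and an SVD of the top block.
Q, R = np.linalg.qr(np.vstack([A, B]))
U, alpha, Wt = np.linalg.svd(Q[:mA], full_matrices=False)
beta = np.linalg.norm(Q[mA:] @ Wt.T, axis=0)
X = np.linalg.solve(R, Wt.T)                # columns x_i of X

# Termwise solution: x = sum_i alpha_i (u_i^T c) / (alpha_i^2 + mu^2 beta_i^2) x_i.
coeff = alpha * (U.T @ c) / (alpha**2 + mu**2 * beta**2)
x_gsvd = X @ coeff

# Direct solve of (A^T A + mu^2 B^T B) x = A^T c, for comparison only.
x_direct = np.linalg.solve(A.T @ A + mu**2 * (B.T @ B), A.T @ c)

print("GSVD termwise:", x_gsvd)
print("direct solve :", x_direct)
```

Once the factors are in hand, changing `mu` only changes the scalar filter `coeff`, which is the practical appeal of the GSVD coordinates for parameter sweeps.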

Synthesis. The total-least-squares problem and the generalised SVD are one widening of the singular value decomposition, and the foundational reason both reduce to it is the Eckart-Young low-rank principle of 01.01.12: TLS is the nearest-rank-deficient augmentation of a single matrix, while the GSVD is the simultaneous orthogonal reduction of a pair. The central insight is that perpendicular-distance fitting is principal-component analysis of the augmented data — the residual that ordinary least squares of 43.04.01 measures along one axis becomes, in TLS, the thinnest direction of the whole cloud, and this is exactly the smallest singular triple of $[A ∣ b]$ . Putting these together, the deregularising closed form $x = (A^{T} A - σ_{n + 1}^{2} I)^{- 1} A^{T} b$ is dual to Tikhonov regularisation's stabilising $+ μ^{2} I$ , and the GSVD unifies both by diagonalising the pencil $(A^{T} A, B^{T} B)$ — the same pencil whose general (non-definite) case the QZ algorithm of 43.06.10 resolves and whose existence theory 01.01.19 supplies. The bridge is the CS decomposition: it computes the GSVD from an orthonormal basis of the stacked range without forming a single cross-product, which generalises the conditioning lesson of 43.04.01 from one matrix to a pair, so the whole subject is the SVD applied first to an augmented matrix and then to a pencil, with orthogonal transformations protecting the conditioning at every step.

Full proof set Master

Proposition (genericity controls existence and uniqueness of the TLS solution). Let $[A ∣ b]$ have singular values $σ_{1} \geq \dots \geq σ_{n + 1}$ and write $σ_{n}^{'} := σ_{\min} (A)$. The TLS problem has a unique solution if and only if $σ_{n} > σ_{n + 1}$ and the last right singular vector has $γ \neq 0$; a sufficient condition for the latter is the strict interlacing $σ_{n}^{'} > σ_{n + 1}$.

Proof. Uniqueness of the nearest rank-$n$ matrix in Frobenius norm holds exactly when the singular value being zeroed is strictly separated from the one above it, $σ_{n} > σ_{n + 1}$ (Eckart-Young uniqueness, 01.01.12); otherwise the minimising rank-$n$ matrix lies in a positive-dimensional family obtained by rotating within the $σ_{n} = σ_{n + 1}$ singular subspace, and the kernel direction is not unique. Given $σ_{n} > σ_{n + 1}$, the kernel of the nearest rank-$n$ matrix is the line $R \tilde{v}_{n + 1}$, and a finite TLS solution exists iff this line is transverse to the hyperplane $\{γ = 0\}$, i.e. iff $γ \neq 0$. For the sufficient condition, the singular values of $A$ interlace those of $[A ∣ b]$ (a column-bordered matrix), giving $σ_{i + 1} ([A ∣ b]) \leq σ_{i} (A) \leq σ_{i} ([A ∣ b])$; in particular $σ_{n + 1} \leq σ_{n}^{'} = σ_{\min} (A) \leq σ_{n}$. If the left interlacing is strict, $σ_{n + 1} < σ_{n}^{'}$, then $σ_{n} \geq σ_{n}^{'} > σ_{n + 1}$, so the smallest singular value is simple. Moreover $σ_{n + 1}$ is not a singular value of $A$, so $\tilde{v}_{n + 1}$ cannot have $γ = 0$: a vector $\begin{pmatrix} z \\ 0 \end{pmatrix}$ in the smallest right singular subspace of $[A ∣ b]$ would satisfy $∥ [A ∣ b] \begin{pmatrix} z \\ 0 \end{pmatrix} ∥_{2} = ∥ A z ∥_{2} \geq σ_{n}^{'} ∥ z ∥_{2} > σ_{n + 1} ∥ z ∥_{2}$, contradicting that $\begin{pmatrix} z \\ 0 \end{pmatrix}$ achieves the minimum $σ_{n + 1}$. Hence $γ \neq 0$ and the solution exists uniquely. $□$
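A two-point example makes the degenerate case concrete. The data below are chosen here for illustration (they are not from the source): with points $(0.5, 0)$ and $(0, 1)$ the thinnest direction of the cloud is the input axis, so the best perpendicular-distance line through the origin is vertical and has no finite slope. Nudging the first output restores strict interlacing and a finite solution.

```python
import numpy as np

def last_singular_data(A, b):
    """Return sigma_min(A), sigma_{n+1}([A|b]) and the partition (z, gamma)."""
    n = A.shape[1]
    _, s, Vt = np.linalg.svd(np.column_stack([A, b]), full_matrices=False)
    sigma_min_A = np.linalg.svd(A, compute_uv=False)[-1]
    return sigma_min_A, s[-1], Vt[-1, :n], Vt[-1, n]

A = np.array([[0.5], [0.0]])

# Degenerate case: b is orthogonal to range(A) and sigma_min(A) = sigma_{n+1}.
sA0, s0, z0, gamma0 = last_singular_data(A, np.array([0.0, 1.0]))
print("degenerate:", "sigma_min(A) =", sA0, "sigma_{n+1} =", s0, "gamma =", gamma0)

# Generic case: strict interlacing sigma_min(A) > sigma_{n+1}, so gamma != 0.
sA1, s1, z1, gamma1 = last_singular_data(A, np.array([0.3, 1.0]))
print("generic   :", "sigma_min(A) =", sA1, "sigma_{n+1} =", s1, "gamma =", gamma1)
print("finite TLS slope:", (-z1 / gamma1)[0])
```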

Proposition (the minimal perturbation is explicit and attains $σ_{n + 1}$). Under the genericity condition, the optimal correction is the rank-one outer product $[E ∣ r] = - σ_{n + 1} \tilde{u}_{n + 1} \tilde{v}_{n + 1}^{T}$, with $∥ [E ∣ r] ∥_{F} = σ_{n + 1}$, and $[A + E ∣ b + r] \begin{pmatrix} x_{TLS} \\ - 1 \end{pmatrix} = 0$.

Proof. The nearest rank-$n$ matrix is $C_{⋆} = \sum_{i = 1}^{n} σ_{i} \tilde{u}_{i} \tilde{v}_{i}^{T} = [A ∣ b] - σ_{n + 1} \tilde{u}_{n + 1} \tilde{v}_{n + 1}^{T}$, so the correction is $[E ∣ r] = C_{⋆} - [A ∣ b] = - σ_{n + 1} \tilde{u}_{n + 1} \tilde{v}_{n + 1}^{T}$, a rank-one matrix. Its Frobenius norm is $σ_{n + 1} ∥ \tilde{u}_{n + 1} ∥_{2} ∥ \tilde{v}_{n + 1} ∥_{2} = σ_{n + 1}$ since the singular vectors are unit. The kernel relation $C_{⋆} \tilde{v}_{n + 1} = 0$ scales, on dividing $\tilde{v}_{n + 1} = \begin{pmatrix} z \\ γ \end{pmatrix}$ by $- γ$, to $C_{⋆} \begin{pmatrix} x_{TLS} \\ - 1 \end{pmatrix} = 0$, i.e. $[A + E ∣ b + r] \begin{pmatrix} x_{TLS} \\ - 1 \end{pmatrix} = 0$, which is the consistency constraint $(A + E) x_{TLS} = b + r$. No smaller correction works, because any $[E ∣ r]$ with $∥ [E ∣ r] ∥_{F} < σ_{n + 1}$ leaves $[A ∣ b] + [E ∣ r]$ of full column rank $n + 1$ by the Eckart-Young distance-to-singularity bound, hence with kernel reduced to the zero vector, hence inconsistent for every $x$. $□$
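The explicit correction can be built and checked directly. In this sketch the data, seed, and noise level are assumptions; the code forms $[E ∣ r]$ as the rank-one outer product and verifies its norm, its rank, and the consistency of the corrected system.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 30, 2
A = rng.standard_normal((m, n))
b = A @ np.array([1.0, 3.0]) + 0.1 * rng.standard_normal(m)

Ut, s, Vt = np.linalg.svd(np.column_stack([A, b]), full_matrices=False)
u, v, sigma = Ut[:, -1], Vt[-1], s[-1]      # smallest singular triple
x_tls = -v[:n] / v[n]

# Optimal correction [E | r] = -sigma_{n+1} * u_{n+1} v_{n+1}^T.
correction = -sigma * np.outer(u, v)
E, r = correction[:, :n], correction[:, n]

print("Frobenius norm of [E | r]:", np.linalg.norm(correction), " sigma:", sigma)
print("rank of [E | r]:", np.linalg.matrix_rank(correction))
print("consistency residual:", np.linalg.norm((A + E) @ x_tls - (b + r)))
```

The signs of `u` and `v` flip together in any valid SVD, so the outer product, and hence the correction, is unaffected by the sign convention.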

Proposition (the GSVD generalised singular values are the pencil eigenvalues). For a pair $(A, B)$ with $\begin{pmatrix} A \\ B \end{pmatrix}$ of full column rank and $B$ of full column rank, the scalars $γ_{i}^{2} = α_{i}^{2} / β_{i}^{2}$ are exactly the eigenvalues of the symmetric-definite pencil $A^{T} A - λ B^{T} B$, with the GSVD columns $x_{i}$ of $X$ as generalised eigenvectors.

Proof. From $A = U Σ_{A} X^{- 1}$ , $A^{T} A = X^{- T} Σ_{A}^{T} Σ_{A} X^{- 1} = X^{- T} diag (α_{i}^{2}) X^{- 1}$ , and similarly $B^{T} B = X^{- T} diag (β_{i}^{2}) X^{- 1}$ . Multiplying both by $X$ on the right and $X^{T}$ on the left gives $X^{T} A^{T} A X = diag (α_{i}^{2})$ and $X^{T} B^{T} B X = diag (β_{i}^{2})$ , simultaneously diagonal. For the $i$ -th column $x_{i}$ of $X$ , the relation $A^{T} A x_{i} = γ_{i}^{2} B^{T} B x_{i}$ follows by comparing the $i$ -th columns of $A^{T} A X = X^{- T} diag (α_{i}^{2})$ and $B^{T} B X = X^{- T} diag (β_{i}^{2})$ : each equals $X^{- T} e_{i}$ times $α_{i}^{2}$ resp. $β_{i}^{2}$ , so $A^{T} A x_{i} = α_{i}^{2} X^{- T} e_{i} = (α_{i}^{2} / β_{i}^{2}) β_{i}^{2} X^{- T} e_{i} = γ_{i}^{2} B^{T} B x_{i}$ . Full column rank of $B$ makes $B^{T} B$ positive definite, so the pencil is regular and its $n$ finite eigenvalues are precisely the $γ_{i}^{2}$ . $□$

Connections Master

The entire low-rank machinery this unit rests on — the existence and uniqueness of the SVD, the Eckart-Young best-rank- $k$ approximation theorem that turns TLS into a smallest-singular-triple computation, and the dyadic expansion $A = \sum σ_{i} u_{i} v_{i}^{T}$ — is proved in 01.01.12; this unit specialises that theory to the augmented matrix $[A ∣ b]$ and to a matrix pair, neither of which the foundational SVD unit treats.
The ordinary least-squares problem that TLS reframes (the normal equations, the three solver algorithms, and the conditioning analysis $κ_{2} (A^{T} A) = κ_{2} (A)^{2}$) is the subject of 43.04.01, and the deregularising closed form $x_{TLS} = (A^{T} A - σ_{n + 1}^{2} I)^{- 1} A^{T} b$ is exactly that unit's normal-equations solve with the square of the smallest augmented singular value subtracted from the diagonal, making explicit why TLS amplifies noise more than ordinary least squares.
The GSVD diagonalises the symmetric-definite pencil $(A^{T} A, B^{T} B)$ , whose existence theory as simultaneous diagonalisation of two quadratic forms is 01.01.19 and whose general, possibly non-definite, numerical resolution by the QZ algorithm is 43.06.10; this unit supplies the SVD-based constructive form that sits between the abstract existence statement and the full pencil algorithm.

Historical & philosophical context Master

The errors-in-variables idea predates its modern numerical form: orthogonal-distance line fitting was studied by Adcock in 1878 and by Pearson in his 1901 paper on lines and planes of closest fit to systems of points, where the principal-axis construction that underlies total least squares first appears in statistics. The numerical-linear-algebra treatment is due to Golub and Van Loan, whose 1980 paper An analysis of the total least squares problem (SIAM Journal on Numerical Analysis 17, 883-893) gave the SVD solution from the smallest singular triple, the closed form, and the genericity and non-existence conditions in the form used here ^{[Golub, G. H. & Van Loan, C. F. — An analysis of the total least squares problem]}. The name total least squares is theirs.

The generalised singular value decomposition was introduced by Van Loan in 1976 as a simultaneous diagonalisation of a matrix pair, and recast by Paige and Saunders in 1981 into the CS-decomposition form that requires no rank assumption on the individual blocks and computes without cross-products ^{[Paige, C. C. & Saunders, M. A. — Towards a generalized singular value decomposition]}. Van Huffel and Vandewalle's 1991 monograph collected the computational theory of total least squares, including its multiple-right-hand-side and degenerate variants, and made the link between the TLS problem and the GSVD of the pair explicit.

Bibliography Master

@book{golub-vanloan-2013-tls,
  author    = {Golub, Gene H. and Van Loan, Charles F.},
  title     = {Matrix Computations},
  edition   = {4th},
  publisher = {Johns Hopkins University Press},
  year      = {2013},
  address   = {Baltimore},
  note      = {Sec. 6.3 (total least squares), Sec. 8.7.3-8.7.4 (the generalized SVD)}
}

@article{golub-vanloan-1980,
  author  = {Golub, Gene H. and Van Loan, Charles F.},
  title   = {An analysis of the total least squares problem},
  journal = {SIAM Journal on Numerical Analysis},
  volume  = {17},
  number  = {6},
  pages   = {883--893},
  year    = {1980}
}

@book{vanhuffel-vandewalle-1991,
  author    = {Van Huffel, Sabine and Vandewalle, Joos},
  title     = {The Total Least Squares Problem: Computational Aspects and Analysis},
  series    = {Frontiers in Applied Mathematics},
  volume    = {9},
  publisher = {Society for Industrial and Applied Mathematics},
  year      = {1991},
  address   = {Philadelphia}
}

@article{paige-saunders-1981,
  author  = {Paige, Christopher C. and Saunders, Michael A.},
  title   = {Towards a generalized singular value decomposition},
  journal = {SIAM Journal on Numerical Analysis},
  volume  = {18},
  number  = {3},
  pages   = {398--405},
  year    = {1981}
}

@article{vanloan-1976,
  author  = {Van Loan, Charles F.},
  title   = {Generalizing the singular value decomposition},
  journal = {SIAM Journal on Numerical Analysis},
  volume  = {13},
  number  = {1},
  pages   = {76--83},
  year    = {1976}
}

@article{pearson-1901,
  author  = {Pearson, Karl},
  title   = {On lines and planes of closest fit to systems of points in space},
  journal = {Philosophical Magazine, Series 6},
  volume  = {2},
  number  = {11},
  pages   = {559--572},
  year    = {1901}
}

Prerequisites

43.04.01
01.01.12

Tier anchors

beginner: Golub-Van Loan 2013 *Matrix Computations* 4e (Johns Hopkins) §6.3 (total least squares, informal errors-in-variables setup); Strang 2016 *Introduction to Linear Algebra* 5e (Wellesley-Cambridge) Ch. 7 (the SVD and best low-rank fit)
intermediate: Golub-Van Loan 2013 *Matrix Computations* 4e (Johns Hopkins) §6.3 (total least squares via the SVD of [A b]) and §8.7.3-8.7.4 (the generalised SVD); Van Huffel-Vandewalle 1991 *The Total Least Squares Problem* (SIAM) Ch. 2-3
master: Golub-Van Loan 2013 *Matrix Computations* 4e (Johns Hopkins) §6.3, §8.7.3-8.7.4; Van Huffel-Vandewalle 1991 *The Total Least Squares Problem* (SIAM) Ch. 2-4; Paige-Saunders 1981 *Towards a generalized singular value decomposition* (SIAM J. Numer. Anal.); Golub-Van Loan 1980 *An analysis of the total least squares problem* (SIAM J. Numer. Anal.)

References

Golub, G. H. & Van Loan, C. F. — Matrix Computations (4th ed.) · Johns Hopkins University Press, 2013. §6.3 (the total least squares problem, its SVD solution from the smallest singular triple of [A b], existence and genericity conditions), §8.7.3-8.7.4 (the generalised singular value decomposition of a matrix pair and its CS-decomposition derivation).
Golub, G. H. & Van Loan, C. F. — An analysis of the total least squares problem · SIAM Journal on Numerical Analysis 17 (1980), 883-893. The closed-form SVD solution of the errors-in-variables problem, the secular-equation characterisation, and the non-genericity / non-uniqueness conditions sigma_n(A) = sigma_{n+1}([A b]).
Van Huffel, S. & Vandewalle, J. — The Total Least Squares Problem: Computational Aspects and Analysis · SIAM Frontiers in Applied Mathematics 9, 1991. Ch. 2-3 (the basic TLS algorithm, the multiple-right-hand-side and degenerate cases), Ch. 4 (the algebraic connections to least squares and the relation to the GSVD).
Paige, C. C. & Saunders, M. A. — Towards a generalized singular value decomposition · SIAM Journal on Numerical Analysis 18 (1981), 398-405. The Paige-Saunders form of the GSVD for a matrix pair, derived from the CS decomposition of an orthonormal basis for the stacked range, valid without rank assumptions on the individual blocks.
Van Loan, C. F. — Generalizing the singular value decomposition · SIAM Journal on Numerical Analysis 13 (1976), 76-83. The original definition of the GSVD of a matrix pair (A, B) by simultaneous diagonalisation A = U Sigma_A X^{-1}, B = V Sigma_B X^{-1}, and its application to constrained and weighted least squares.

Estimated time

beginner: 20m
intermediate: 45m
master: 90m