43.05.08 · numerical-analysis / 05-svd-low-rank

Total least squares and the generalised SVD

shipped3 tiersLean: none

Anchor (Master): Golub-Van Loan 2013 *Matrix Computations* 4e (Johns Hopkins) §6.3, §8.7.3-8.7.4; Van Huffel-Vandewalle 1991 *The Total Least Squares Problem* (SIAM) Ch. 2-4; Paige-Saunders 1981 *Towards a generalized singular value decomposition* (SIAM J. Numer. Anal.); Golub-Van Loan 1980 *An analysis of the total least squares problem* (SIAM J. Numer. Anal.)

Intuition Beginner

Ordinary least squares assumes your inputs are perfect and only your outputs are noisy. You fit a line, and you measure each miss as a vertical drop from the data point straight down to the line. That vertical drop treats the horizontal coordinate as exact. But in most real experiments both coordinates are measured, and both carry error. A line fitted as if only the vertical coordinate were noisy will be biased, because it never asks the line to move sideways to meet a point that was mismeasured horizontally.

Total least squares fixes this. Instead of the vertical drop, it measures each miss as the shortest distance from the data point to the line — the perpendicular distance. A point can now be off because its input was wrong, its output was wrong, or both. The fitted line is the one that makes the total of these squared perpendicular distances as small as possible. Because the error can live in any direction, this is called the errors-in-variables fit.

The remarkable part is that the recipe for the best perpendicular fit is still a singular value decomposition — the same rotate-stretch-rotate factorisation used everywhere in this chapter. You stack the inputs and outputs side by side into one wide table, factor it, and read the answer off the smallest stretch direction. The smallest stretch direction is the thin axis of the data cloud, and the best line is the one lying along the cloud, perpendicular to that thin axis.

Visual Beginner

The table contrasts the two fits on the single question that separates them: which direction the error is allowed to point.

Fit Error measured as Inputs assumed Best for
Ordinary least squares vertical drop to the line exact (noise-free) only the output is noisy
Total least squares perpendicular distance to the line noisy, like the outputs both input and output noisy

The difference is a right angle. Ordinary least squares drops straight down from each point; total least squares drops along the shortest path, which meets the line at a right angle. When the data points are far from the line, or when the inputs are very noisy, the two fitted lines can tilt noticeably apart.

The geometry is the whole lesson. Ordinary least squares answers "how far down to the line," total least squares answers "how far to the line in any direction." The second is the honest question whenever the horizontal coordinate is itself a measurement.

Worked example Beginner

Fit a line through the origin, , to two data points and , where both coordinates are measured and noisy.

First the ordinary fit, which trusts the values. It minimises the squared vertical gaps. The slope that does this is the ratio of the cross term to the input energy: . So the ordinary line is .

Now the total fit, which trusts neither coordinate. Stack the data into a wide table with the inputs in the first column and the outputs in the second:

The total fit asks for the thin direction of this cloud — the direction in which the two columns vary least together. The stretch factors of are about and about , and the thin direction (the smallest-stretch direction) turns out to point roughly along in the input-output plane. Reading the slope off that thin direction gives , a touch steeper than the ordinary .

What this tells us: when you stop trusting the values, the best line tilts. The total fit found a slightly steeper line because it let the points move sideways as well as up and down to meet it. With only two points the shift is small, but with noisy inputs across many points the bias the ordinary fit carries can be large, and the total fit removes it.

Check your understanding Beginner

Formal definition Intermediate+

Let with and , and consider the overdetermined system . Ordinary least squares, treated in 43.04.01, perturbs only : it solves subject to , charging error to the right-hand side alone. The total least squares (TLS) problem charges error to both and . It seeks the smallest perturbation of the augmented data that makes the perturbed system consistent:

where is the Frobenius norm. The minimising , when it exists and is unique, is the TLS solution. The constraint says the perturbed augmented matrix has the vector in its kernel, hence is rank-deficient. So TLS is the problem of finding the nearest rank-deficient augmented matrix in Frobenius norm, then reading the solution off its kernel.

The SVD solution. Write the SVD (developed in 01.01.12) of the augmented matrix as

By the Eckart-Young theorem 01.01.12, the nearest rank- matrix in Frobenius norm is obtained by zeroing , and the kernel of that nearest matrix is spanned by , the last right singular vector. Partition with and its final entry. When , the TLS solution is

Genericity and the degenerate case. The construction requires , and uniqueness requires (a simple smallest singular value of the augmented matrix). Equivalently, the genericity condition is : the smallest singular value of alone strictly exceeds the smallest singular value of the augmented matrix. When the minimiser is non-unique, and when (the last right singular vector is orthogonal to the -coordinate) no finite TLS solution exists — the nearest rank-deficient augmented matrix decouples from .

The closed form. When the genericity condition holds, the TLS solution admits the closed form

This is the ordinary normal-equations solution with the smallest augmented singular value subtracted from the diagonal of — a deregularisation: where Tikhonov regularisation adds a positive multiple of to stabilise, TLS subtracts , which is why TLS amplifies noise more than ordinary least squares and why must stay positive definite for the solution to exist.

The generalised SVD. The generalised singular value decomposition (GSVD) factors a matrix pair rather than a single matrix. For and sharing columns, with the stacked matrix of full column rank, there exist orthogonal , and a nonsingular with

where and (rectangular diagonal) carry non-negative entries normalised so that . The ratios are the generalised singular values of the pair ; they are exactly the square roots of the eigenvalues of the pencil , linking the GSVD to the generalised eigenvalue problem cross-referenced in 43.06.10. When , the generalised singular values reduce to the ordinary singular values of .

Counterexamples to common slips

  • TLS is not ordinary least squares with perturbed too. Ordinary least squares perturbs but holds fixed; TLS perturbs the whole augmented matrix and minimises the joint Frobenius norm of . The two coincide only when has error-free columns, i.e. when the right model is mixed LS-TLS, not full TLS.
  • The TLS solution need not exist. If the last right singular vector of has a zero in its -coordinate (), the formula is undefined and the problem has no finite solution. This is a genuine feature of the perpendicular-distance geometry, not a numerical artefact.
  • The closed form subtracts ; it does not add it. Writing would be Tikhonov-type regularisation, the opposite stabilising operation. TLS deregularises, so it is more, not less, sensitive than ordinary least squares.
  • The GSVD's is not orthogonal in general. Only and are orthogonal; is merely nonsingular. Demanding orthogonal would force and to be simultaneously diagonalisable by the same orthogonal change of basis, which fails for a generic pair.

Key theorem with proof Intermediate+

Theorem (the TLS solution from the smallest singular triple). Let , , , and let be a singular value decomposition with . Suppose the genericity condition holds and the last right singular vector has . Then the total-least-squares problem has the unique solution , the minimal perturbation has Frobenius norm , and satisfies the closed form . [Golub, G. H. & Van Loan, C. F. — An analysis of the total least squares problem]

Proof. The constraint says the vector lies in the kernel of . A matrix has a nonzero kernel vector exactly when it is rank-deficient, so the TLS problem is: minimise over all for which has rank at most (one less than its columns), subject to the kernel containing a vector with last entry .

By the Eckart-Young theorem 01.01.12, among all rank- matrices the one nearest in Frobenius norm is

obtained by setting , and the minimal distance is . The genericity condition makes this nearest rank- matrix unique. Its kernel is one-dimensional, spanned by the right singular vector associated with the zeroed singular value, since .

A kernel vector usable as a solution must have its last coordinate normalisable to , which requires . Under that hypothesis, scale by to obtain the kernel vector . The first coordinates give , and the constraint is met with of Frobenius norm .

For the closed form, the eigen-relation written in block form with reads

The top block gives , that is . Dividing by and using yields . The genericity condition forces , so the singular values of all exceed , hence is positive definite and invertible, giving the stated closed form.

Bridge. This theorem builds toward the unified view of fitting as low-rank approximation, and it appears again in 43.06.10, where the same pencil reappears as the generalised eigenproblem the GSVD diagonalises. The foundational reason TLS reduces to an SVD is the Eckart-Young theorem of 01.01.12: minimising a Frobenius perturbation subject to a rank drop is exactly the optimal low-rank approximation problem, and the constraint that the kernel carry a vector with last entry is what converts the abstract kernel direction into an affine solution . Putting these together, ordinary least squares of 43.04.01 and total least squares are two faces of one optimisation — column-space projection versus nearest-rank-deficient augmentation — and the closed forms differ only by the term , which is the central insight that TLS is a deregularised normal-equations solve. The TLS route generalises the single-matrix SVD to the augmented matrix, and the GSVD generalises it again to a matrix pair, so the bridge is the same low-rank-approximation principle widening from one matrix to two.

Exercises Intermediate+

Advanced results Master

Theorem (orthogonal-distance optimality of TLS). Let the rows of be the data points . The TLS fit minimises the sum of squared orthogonal (perpendicular) distances from the points to the hyperplane through the origin with unit normal , whereas ordinary least squares minimises the sum of squared distances measured only along the -coordinate axis. Concretely, the orthogonal distance from to the hyperplane is , and

minimised over unit vectors by the smallest right singular vector. [Van Huffel, S. & Vandewalle, J. — The Total Least Squares Problem: Computational Aspects and Analysis] The vector is the unit normal that the data cloud is thinnest along; the fitted relation rearranges, after extracting the -coordinate, into . This is the precise sense in which TLS is orthogonal distance regression: it is principal-component fitting of a hyperplane to the augmented data, the dual of ordinary least squares' axis-aligned residual.

Theorem (the GSVD via the CS decomposition; Paige-Saunders form). Let , with of column rank . Form a thin QR (or any orthonormal basis) with having orthonormal columns and nonsingular. The CS decomposition of the column-orthonormal supplies orthogonal and an orthogonal with

Setting recovers and likewise , with , . [Paige, C. C. & Saunders, M. A. — Towards a generalized singular value decomposition] The CS decomposition guarantees , so the generalised singular values are well-defined whenever , with encoding the directions in . The Paige-Saunders construction needs no rank assumption on or individually — only on the stack — which is why it is the numerically stable route, computing the GSVD without ever forming the cross-products or that would square the conditioning as in 43.04.01.

Theorem (GSVD diagonalises the symmetric-definite pencil). With the GSVD , and of full column rank, the columns of are generalised eigenvectors of the pencil :

so and are simultaneously diagonal. [Van Loan, C. F. — Generalizing the singular value decomposition] This is the constructive, floating-point-stable form of the simultaneous diagonalisation of two quadratic forms whose existence theory the corpus carries in 01.01.19, and it is exactly the symmetric-definite specialisation of the generalised eigenproblem and QZ algorithm of 43.06.10. The GSVD thereby solves constrained and weighted least squares — minimising subject to fixed, or the Tikhonov problem — by decoupling the two quadratic forms into the GSVD coordinates, where the regularised solution reads off termwise as .

Synthesis. The total-least-squares problem and the generalised SVD are one widening of the singular value decomposition, and the foundational reason both reduce to it is the Eckart-Young low-rank principle of 01.01.12: TLS is the nearest-rank-deficient augmentation of a single matrix, while the GSVD is the simultaneous orthogonal reduction of a pair. The central insight is that perpendicular-distance fitting is principal-component analysis of the augmented data — the residual that ordinary least squares of 43.04.01 measures along one axis becomes, in TLS, the thinnest direction of the whole cloud, and this is exactly the smallest singular triple of . Putting these together, the deregularising closed form is dual to Tikhonov regularisation's stabilising , and the GSVD unifies both by diagonalising the pencil — the same pencil whose general (non-definite) case the QZ algorithm of 43.06.10 resolves and whose existence theory 01.01.19 supplies. The bridge is the CS decomposition: it computes the GSVD from an orthonormal basis of the stacked range without forming a single cross-product, which generalises the conditioning lesson of 43.04.01 from one matrix to a pair, so the whole subject is the SVD applied first to an augmented matrix and then to a pencil, with orthogonal transformations protecting the conditioning at every step.

Full proof set Master

Proposition (genericity controls existence and uniqueness of the TLS solution). Let have singular values and write . The TLS problem has a unique solution if and only if and the last right singular vector has ; a sufficient condition for the latter is the strict interlacing .

Proof. Uniqueness of the nearest rank- matrix in Frobenius norm holds exactly when the singular value being zeroed is strictly separated from the one above it, (Eckart-Young uniqueness, 01.01.12); otherwise the minimising rank- matrix lies in a positive-dimensional family obtained by rotating within the singular subspace, and the kernel direction is not unique. Given , the kernel of the nearest rank- matrix is the line , and a finite TLS solution exists iff this line is transverse to the hyperplane , i.e. iff . For the sufficient condition, the singular values of interlace those of (a column-bordered matrix), giving ; in particular . If this interlacing is strict, , then is not a singular value of , so cannot have : a vector in the smallest right singular subspace of would satisfy , contradicting that achieves the minimum . Hence and the solution exists uniquely.

Proposition (the minimal perturbation is explicit and attains ). Under the genericity condition, the optimal correction is the rank-one outer product , with , and .

Proof. The nearest rank- matrix is , so the correction is , a rank-one matrix. Its Frobenius norm is since the singular vectors are unit. The kernel relation scales, on dividing by , to , i.e. , which is the consistency constraint . No smaller correction works, because any with leaves of full column rank by the Eckart-Young distance-to-singularity bound, hence with kernel reduced to the zero vector, hence inconsistent for every .

Proposition (the GSVD generalised singular values are the pencil eigenvalues). For a pair with of full column rank and of full column rank, the scalars are exactly the eigenvalues of the symmetric-definite pencil , with the GSVD columns of as generalised eigenvectors.

Proof. From , , and similarly . Multiplying both by on the right and on the left gives and , simultaneously diagonal. For the -th column of , the relation follows by comparing the -th columns of and : each equals times resp. , so . Full column rank of makes positive definite, so the pencil is regular and its finite eigenvalues are precisely the .

Connections Master

  • The entire low-rank machinery this unit rests on — the existence and uniqueness of the SVD, the Eckart-Young best-rank- approximation theorem that turns TLS into a smallest-singular-triple computation, and the dyadic expansion — is proved in 01.01.12; this unit specialises that theory to the augmented matrix and to a matrix pair, neither of which the foundational SVD unit treats.

  • The ordinary least-squares problem that TLS reframes — the normal equations, the three solver algorithms, and the conditioning analysis — is the subject of 43.04.01, and the deregularising closed form is exactly that unit's normal-equations solve with the smallest augmented singular value subtracted from the diagonal, making explicit why TLS amplifies noise more than ordinary least squares.

  • The GSVD diagonalises the symmetric-definite pencil , whose existence theory as simultaneous diagonalisation of two quadratic forms is 01.01.19 and whose general, possibly non-definite, numerical resolution by the QZ algorithm is 43.06.10; this unit supplies the SVD-based constructive form that sits between the abstract existence statement and the full pencil algorithm.

Historical & philosophical context Master

The errors-in-variables idea predates its modern numerical form: orthogonal-distance line fitting was studied by Adcock in 1878 and by Pearson in his 1901 paper on lines and planes of closest fit to systems of points, where the principal-axis construction that underlies total least squares first appears in statistics. The numerical-linear-algebra treatment is due to Golub and Van Loan, whose 1980 paper An analysis of the total least squares problem (SIAM Journal on Numerical Analysis 17, 883-893) gave the SVD solution from the smallest singular triple, the closed form, and the genericity and non-existence conditions in the form used here [Golub, G. H. & Van Loan, C. F. — An analysis of the total least squares problem]. The name total least squares is theirs.

The generalised singular value decomposition was introduced by Van Loan in 1976 as a simultaneous diagonalisation of a matrix pair, and recast by Paige and Saunders in 1981 into the CS-decomposition form that requires no rank assumption on the individual blocks and computes without cross-products [Paige, C. C. & Saunders, M. A. — Towards a generalized singular value decomposition]. Van Huffel and Vandewalle's 1991 monograph collected the computational theory of total least squares, including its multiple-right-hand-side and degenerate variants, and made the link between the TLS problem and the GSVD of the pair explicit.

Bibliography Master

@book{golub-vanloan-2013-tls,
  author    = {Golub, Gene H. and Van Loan, Charles F.},
  title     = {Matrix Computations},
  edition   = {4th},
  publisher = {Johns Hopkins University Press},
  year      = {2013},
  address   = {Baltimore},
  note      = {Sec. 6.3 (total least squares), Sec. 8.7.3-8.7.4 (the generalized SVD)}
}

@article{golub-vanloan-1980,
  author  = {Golub, Gene H. and Van Loan, Charles F.},
  title   = {An analysis of the total least squares problem},
  journal = {SIAM Journal on Numerical Analysis},
  volume  = {17},
  number  = {6},
  pages   = {883--893},
  year    = {1980}
}

@book{vanhuffel-vandewalle-1991,
  author    = {Van Huffel, Sabine and Vandewalle, Joos},
  title     = {The Total Least Squares Problem: Computational Aspects and Analysis},
  series    = {Frontiers in Applied Mathematics},
  volume    = {9},
  publisher = {Society for Industrial and Applied Mathematics},
  year      = {1991},
  address   = {Philadelphia}
}

@article{paige-saunders-1981,
  author  = {Paige, Christopher C. and Saunders, Michael A.},
  title   = {Towards a generalized singular value decomposition},
  journal = {SIAM Journal on Numerical Analysis},
  volume  = {18},
  number  = {3},
  pages   = {398--405},
  year    = {1981}
}

@article{vanloan-1976,
  author  = {Van Loan, Charles F.},
  title   = {Generalizing the singular value decomposition},
  journal = {SIAM Journal on Numerical Analysis},
  volume  = {13},
  number  = {1},
  pages   = {76--83},
  year    = {1976}
}

@article{pearson-1901,
  author  = {Pearson, Karl},
  title   = {On lines and planes of closest fit to systems of points in space},
  journal = {Philosophical Magazine, Series 6},
  volume  = {2},
  number  = {11},
  pages   = {559--572},
  year    = {1901}
}