01.01.09 · foundations / linear-algebra

Gram-Schmidt orthonormalisation and finite-dim inner-product space

shipped3 tiersLean: none

Anchor (Master): Shilov — *Linear Algebra* Ch. 8; Halmos — *Finite-Dimensional Vector Spaces* §I.6 and §II.13; Axler — *Linear Algebra Done Right* Ch. 6; Golub-Van Loan — *Matrix Computations* (4th ed.) Ch. 5; Trefethen-Bau — *Numerical Linear Algebra* Lectures 7–10; Björck — *Numerical Methods for Least Squares Problems* (SIAM, 1996) Ch. 2

Intuition Beginner

An inner product on a vector space is a way of measuring two things at once: the length of a vector, and the angle between two vectors. The basic example is the dot product on the plane, where . The length of comes from dotting it with itself; the angle between two vectors comes from the same dot product divided by the lengths. Any space with this kind of two-vector measurement is called an inner-product space.

A set of vectors is orthonormal when every vector has length one and every two distinct vectors are perpendicular. Orthonormal bases are the most pleasant bases. Coordinates are computed by a single dot product, not by solving a system. Distance is computed by the sum of squared coordinates. Every familiar geometry fact carries through without modification.

The Gram-Schmidt procedure is the recipe for turning any linearly independent set of vectors into an orthonormal one with the same span. Start with the first vector and scale it to length one. For each next vector, remove the part that already points along the previous orthonormal vectors, then scale what is left to length one. The output is an orthonormal set, and the partial spans match the partial spans of the original list.

Visual Beginner

The figure shows the three stages of Gram-Schmidt for a pair of vectors in the plane. On the left, the input pair and . In the middle, has been decomposed into its component along (the projection) and the perpendicular complement . On the right, both vectors have been rescaled to length one, producing an orthonormal pair and with a right angle between them.

The geometry is the whole story. Subtracting the projection along an already-chosen direction kills the overlap and leaves only the perpendicular component. Repeating direction by direction produces an orthonormal frame, and the order in which vectors were chosen determines which subspaces the frame spans at each stage.

Worked example Beginner

Apply Gram-Schmidt to the pair and in the plane with the ordinary dot product.

Step 1. Scale to length one. The length of is . Divide by to get .

Step 2. Compute the dot product of with . We get . The component of along is .

Step 3. Subtract the component along from . The perpendicular part is .

Step 4. Scale to length one. The length of is . Divide by to get .

The output is the orthonormal pair . Check: each vector has length one, and their dot product is . The two of them span the same plane as , since each is in the span of the first inputs and vice versa.

What this tells us: Gram-Schmidt converts a slanted pair of vectors into the standard horizontal-and-vertical frame whenever the first input already points along the horizontal axis. The procedure does not pick the simplest frame in general; it picks the frame that respects the order of the inputs.

Check your understanding Beginner

Formal definition Intermediate+

Let denote either or and let be a finite-dimensional vector space over in the sense of 01.01.03. An inner product on is a map $$ \langle \cdot, \cdot \rangle : V \times V \to F $$ satisfying three axioms.

(i) Linearity in the first slot. For all and , $$ \langle \alpha u + \beta v, w \rangle = \alpha \langle u, w \rangle + \beta \langle v, w \rangle. $$

(ii) Conjugate symmetry. For all , $$ \langle v, w \rangle = \overline{\langle w, v \rangle}, $$ where the bar denotes complex conjugation (which is the identity in the real case). Combined with (i), this forces conjugate-linearity in the second slot: .

(iii) Positive definiteness. For all , is a non-negative real number, with iff .

The pair is a finite-dimensional inner-product space. The norm induced by the inner product is $$ |v| = \sqrt{\langle v, v \rangle}, $$ a non-negative real number by axiom (iii). Two vectors are orthogonal, written , when . A subset is orthogonal if every pair of distinct vectors in is orthogonal; is orthonormal if it is orthogonal and every vector in has norm one. An orthonormal set is automatically linearly independent: if , then inner-producting with gives for every .

The standard examples. (a) with the standard inner product . (b) (real polynomials of degree at most ) with . (c) with the Frobenius inner product , where is the conjugate transpose.

Counterexamples to common slips

  • The bilinear form on is symmetric but not positive definite: the vector has . This is a non-degenerate indefinite form (the Lorentz form), not an inner product. The positive-definiteness axiom is what distinguishes an inner product from a general symmetric bilinear form, in the sense of 01.01.15.
  • In the complex case, the form (without conjugation) is symmetric and bilinear but not positive definite: the vector has even though . Conjugation in the second slot is what makes a non-negative real.
  • The bilinear pairing on the larger space of continuous functions is positive definite, but the same pairing on the larger space of (equivalence classes of) Lebesgue-integrable functions is positive definite only modulo functions vanishing almost everywhere. The completion of is the Hilbert space , which is the natural infinite-dimensional setting in which the Legendre polynomials become a complete orthonormal basis.

Key theorem with proof Intermediate+

Theorem (Gram-Schmidt orthonormalisation; Shilov §8.3). Let be a finite-dimensional inner-product space over or , and let be a linearly independent sequence in . Define inductively, for , $$ u_k = v_k - \sum_{j < k} \langle v_k, e_j \rangle , e_j, \qquad e_k = \frac{u_k}{|u_k|}. $$ Then is an orthonormal sequence, and $$ \mathrm{span}{e_1, \ldots, e_k} = \mathrm{span}{v_1, \ldots, v_k} $$ for every . In particular, every finite-dimensional inner-product space admits an orthonormal basis.

Proof. We argue by induction on .

Base case . Linear independence of forces , so and is well-defined. Then by the homogeneity , and since and are non-zero scalar multiples of one another.

Inductive step. Suppose is orthonormal and . Define by the formula. Two facts to check.

First, . If , then , contradicting linear independence of . So and is well-defined and has norm one.

Second, is orthogonal to each with . Compute $$ \langle u_k, e_i \rangle = \langle v_k, e_i \rangle - \sum_{j < k} \langle v_k, e_j \rangle , \langle e_j, e_i \rangle = \langle v_k, e_i \rangle - \langle v_k, e_i \rangle = 0, $$ where the middle equality uses from the inductive orthonormality of . Dividing by preserves the vanishing, so .

Third, span preservation. By construction, is a linear combination of and , which is a linear combination of and by the inductive span equality. So , giving . Conversely, , and the earlier for are in by the inductive span equality. So , and the two spans coincide.

The induction terminates at with an orthonormal sequence whose span matches that of the input. When the input is a basis of , the output is an orthonormal basis. Since every finite-dimensional vector space has a basis (Steinitz exchange, 01.01.04), every finite-dimensional inner-product space has an orthonormal basis.

Bridge. Gram-Schmidt builds toward every theorem in finite-dimensional inner-product geometry. The foundational reason it works is exactly the iterated projection picture: subtracting the projection onto an already-chosen orthonormal partial frame kills the overlap and leaves the perpendicular component, which is then normalised to length one. This is exactly the construction that appears again in 01.01.12 (singular value decomposition), where the spectral theorem on produces an orthonormal eigenbasis whose role is identical to the orthonormal basis produced here. The central insight is that finite-dimensional inner-product spaces are flexible enough to always admit an orthonormal basis, and Gram-Schmidt is the constructive proof. Putting these together, the existence of an orthonormal basis identifies with as an inner-product space, generalises to the Iwasawa decomposition of the general linear group, and is dual to the polar decomposition of operators. The bridge is the QR factorisation , which writes Gram-Schmidt at the matrix level and supplies the algorithmic content for least-squares regression and numerical conditioning of linear systems.

Exercises Intermediate+

Advanced results Master

Theorem (orthogonal projection; Shilov §8.5). Let be a subspace of a finite-dimensional inner-product space and an orthonormal basis of . The map $$ P_W : V \to W, \qquad P_W(v) = \sum_{i=1}^k \langle v, e_i \rangle , e_i $$ is the unique linear map satisfying , , and for all . The orthogonal complement is a subspace, and as inner-product spaces, with the splitting realised by . The map minimises over — the closest point in to is , and the minimum distance squared is .

The projection-and-complement decomposition is the geometric content of Gram-Schmidt: each in the recursion is precisely the projection of onto the orthogonal complement of , and the recursion successively builds the projector onto the running partial span.

Theorem (Bessel's inequality and Parseval's identity). Let be a finite-dimensional inner-product space with orthonormal basis . For every , $$ \sum_{i=1}^n |\langle v, e_i \rangle|^2 = |v|^2, $$ the Parseval identity. For every orthonormal subset with , $$ \sum_{i=1}^k |\langle v, e_i \rangle|^2 \le |v|^2, $$ the Bessel inequality, with equality iff . Each inner product is the -th Fourier coefficient of in the orthonormal basis.

Parseval gives the unitary identification as inner-product spaces via the orthonormal-basis isomorphism , the finite-dimensional analogue of the unitary identification underlying classical Fourier analysis.

Theorem (QR decomposition; Golub-Van Loan Ch. 5). Let with and full column rank. There exist a matrix with orthonormal columns ($Q^ Q = I_nR \in F^{n \times n}A = Q RAQ\langle v_k, e_j \ranglej < k|u_k|R$.*

The QR decomposition is the algorithmic embodiment of Gram-Schmidt and is the foundation of every linear least-squares method: has unique solution , computed by one back-substitution. Numerical-analysis variants (modified Gram-Schmidt, Householder reflectors, Givens rotations) produce the same in exact arithmetic but with very different numerical behaviour in finite precision.

Theorem (modified Gram-Schmidt and numerical stability; Björck 1967). Classical Gram-Schmidt loses orthogonality in finite-precision arithmetic in proportion to the condition number : after one pass through the algorithm in double precision, the computed may have $|\hat Q^ \hat Q - I| \sim \kappa(A) \cdot \epsilon_{\mathrm{machine}}Ak\langle v_k, e_j \rangleu_k^{(0)} = v_ku_k^{(j+1)} = u_k^{(j)} - \langle u_k^{(j)}, e_{j+1} \rangle e_{j+1}|\hat Q^* \hat Q - I| \sim \sqrt{\kappa(A)} \cdot \epsilon_{\mathrm{machine}}A$. Reorthogonalisation (one additional pass) restores full orthogonality at the cost of a factor of two in work.*

The modified procedure is mathematically identical to the classical one in exact arithmetic and produces the same and , but the order of operations matters in floating point. Modern numerical-linear-algebra codes default to Householder QR (which is backwards-stable independent of conditioning) for general problems and to modified Gram-Schmidt for problems where the orthonormal vectors must be produced incrementally (Krylov-subspace methods, Arnoldi iteration).

Theorem (Householder reflectors as QR; Householder 1958). For every non-zero vector there exists a unitary matrix (the Householder reflector associated to ) of the form $H = I - 2 (w w^)/(w^* w)wxH_x x = \pm |x| e_1e_1AH_n \cdots H_1H_n \cdots H_1 AQ = H_1^* \cdots H_n^\epsilon_{\mathrm{machine}}\kappa(A)$.

Householder QR is the algorithm of choice for dense least-squares problems. Givens rotations (Givens 1958) are the analogous tool for sparse problems, where the zeroing is done one entry at a time by a rotation acting on a chosen pair of rows; the workload is per zero rather than , advantageous when most entries are already zero and only a few sub-diagonals need annihilating.

Theorem (Cholesky factorisation as right-looking Gram-Schmidt). Let be positive definite (Hermitian with for all ). There exists a unique lower triangular matrix with positive diagonal entries such that $A = L L^A(\langle v_i, v_j \rangle){v_1, \ldots, v_n}L{v_i}{e_i}$.*

Cholesky is "Gram-Schmidt without the vectors": it operates on the Gram matrix directly without ever materialising the underlying vectors . In numerical analysis, the Cholesky factorisation is the preferred route to solving positive-definite linear systems (twice as fast as on a general matrix) and is the workhorse of Gaussian-elimination-based estimation in statistics, optimisation, and finite-element methods.

Theorem (separable Hilbert space and the orthonormal basis). Let be a separable Hilbert space (an inner-product space whose induced metric is complete and that admits a countable dense subset). Then admits a countable orthonormal basis in the sense that every has a Fourier expansion , convergent in norm, with (Parseval). The Gram-Schmidt procedure applied to any countable linearly independent dense set produces such an orthonormal basis.

The infinite-dimensional case is the natural setting for classical Fourier analysis and quantum mechanics. The classical-orthogonal-polynomial families — Legendre on with constant weight, Hermite on with weight, Laguerre on with weight, Chebyshev on with weight — are all Gram-Schmidt outputs on the monomial basis under the corresponding weighted inner products, and they form complete orthonormal bases of the relevant spaces.

Synthesis. Gram-Schmidt is the foundational reason every finite-dimensional inner-product space admits an orthonormal basis, and the QR decomposition is exactly the matrix-level packaging of that procedure. The central insight is that orthonormality is constructively reachable from any linearly independent input by iterating projection-and-rescale, and the order in which inputs are processed determines the span filtration of the output. Putting these together, the QR decomposition, the Cholesky factorisation, the Householder and Givens algorithms, and the modified Gram-Schmidt of Björck are different presentations of one structural fact: the matrix group admits the Iwasawa decomposition with the unitary group, the diagonal-positive subgroup, and the strictly-upper-triangular unipotent subgroup, and Gram-Schmidt is the constructive proof of this decomposition. This is exactly the bridge that appears again in 01.01.12 (singular value decomposition), where the spectral theorem on produces the orthonormal singular-vector frame that plays the role of the Gram-Schmidt output, and the resulting generalises to arbitrary rectangular matrices. The bridge is that QR identifies with the homogeneous space on which the unitary group acts simply transitively from the left; SVD does the same for as a double coset .

The duality between the geometric statement (orthonormal basis) and the algebraic statement (QR factorisation) is a recurring pattern. The geometric statement says every finite-dimensional inner-product space is unitarily isomorphic to with the standard inner product. The algebraic statement says every full-rank rectangular matrix factors as orthonormal-times-triangular. Both are content-free in the language of abstract algebra, both have constructive proofs via Gram-Schmidt, and both have numerically realised algorithms in computational linear algebra. The numerical and abstract presentations diverge in finite precision: classical Gram-Schmidt is unstable, the modified version is better, and Householder is best, even though all three compute the same factorisation in exact arithmetic. The numerical-analysis side of this story — Björck's backward-error analysis, the loss-of-orthogonality bound , the rounding-error theory of Wilkinson — is the foundational reason QR-based least squares replaced normal-equation least squares in scientific computing in the 1960s.

Full proof set Master

Proposition (Cauchy-Schwarz with full equality clause). In an inner-product space over or , $$ |\langle v, w \rangle| \le |v| , |w| $$ for all , with equality iff is linearly dependent.

Proof. If , both sides vanish and is dependent. Assume . Set and compute $$ 0 \le |v - \lambda w|^2 = \langle v - \lambda w, v - \lambda w \rangle = \langle v, v \rangle - \lambda \langle w, v \rangle - \overline{\lambda} \langle v, w \rangle + |\lambda|^2 \langle w, w \rangle. $$ Substituting and analogously for the third term gives $$ 0 \le \langle v, v \rangle - 2 \frac{|\langle v, w \rangle|^2}{\langle w, w \rangle} + \frac{|\langle v, w \rangle|^2}{\langle w, w \rangle} = \langle v, v \rangle - \frac{|\langle v, w \rangle|^2}{\langle w, w \rangle}. $$ Multiply by to obtain . Taking square roots gives the inequality. Equality holds iff , iff is a scalar multiple of , iff is linearly dependent.

Proposition (triangle inequality from Cauchy-Schwarz). For all in an inner-product space, .

Proof. Compute $$ |v + w|^2 = \langle v + w, v + w \rangle = |v|^2 + \langle v, w \rangle + \langle w, v \rangle + |w|^2 = |v|^2 + 2 , \mathrm{Re}\langle v, w \rangle + |w|^2. $$ The real part bound (Cauchy-Schwarz) gives $$ |v + w|^2 \le |v|^2 + 2 |v| |w| + |w|^2 = (|v| + |w|)^2. $$ Taking non-negative square roots gives the triangle inequality.

Proposition (Gram-Schmidt theorem; full statement). Let be a finite-dimensional inner-product space and linearly independent. The recursion , produces an orthonormal with for each .

Proof. Given in the Intermediate-tier key theorem. The induction on uses linear independence at each step to ensure , computes for by direct expansion using orthonormality of the previous , and verifies the span equality by writing each side as a linear combination of the other's generators.

Proposition (QR existence and uniqueness). For with and full column rank, there exist unique with $Q^ Q = I_nR \in F^{n \times n}A = Q R$.*

Proof. Existence. Apply Gram-Schmidt to the columns of . By the previous proposition, the orthonormal output satisfies . Stack the as columns of and define by on the diagonal, for , and for . Then column-by-column, by orthonormality, and by linear independence.

Uniqueness. Suppose are two such factorisations. Then . The right side is upper triangular with positive diagonal (product of two upper triangulars with positive diagonal). The left side has orthonormal columns: (using , which holds because is the orthogonal projector onto and has columns in that subspace). So is an unitary matrix that is upper triangular with positive real diagonal. Lemma: such a matrix is the identity. Proof of lemma: write . Unitarity gives for each row . Upper-triangularity gives for , so and . Column-unitarity gives , and the same upper-triangular constraint forces , so . Combined: each , hence (positive real diagonal). Then , so for , and the matrix is the identity. From we get and then .

Proposition (existence of orthonormal basis; corollary). Every finite-dimensional inner-product space admits an orthonormal basis.

Proof. Pick any basis of (existence by 01.01.04). Apply Gram-Schmidt. The output is orthonormal and spans since its span equals . A spanning orthonormal set in an -dimensional space is a basis (orthonormal implies linearly independent, vectors in an -space).

Connections Master

  • Vector space 01.01.03. A finite-dimensional inner-product space is a finite-dimensional vector space carrying the extra structure of a positive-definite symmetric (or conjugate-symmetric) bilinear pairing. Every inner-product-space construction in this unit reduces to vector-space operations together with one extra map , and the orthonormal basis produced by Gram-Schmidt is in particular a basis in the sense of 01.01.04.

  • Subspace, basis, dimension 01.01.04. Gram-Schmidt is the constructive proof that every finite-dimensional inner-product space admits an orthonormal basis. The existence of some basis is the Steinitz exchange theorem from 01.01.04; Gram-Schmidt upgrades the existence statement to a constructive recipe in the presence of an inner product, and the span-preservation property at each step refines the bare basis statement to a flag-of-subspaces statement.

  • Bilinear and quadratic forms 01.01.15. An inner product is a positive-definite symmetric (or conjugate-symmetric) bilinear form, and the norm is the square root of the associated quadratic form. The classification of bilinear forms over by signature (Sylvester's law of inertia) singles out the positive-definite case as the unique signature class, and Gram-Schmidt is the diagonalisation procedure for that class.

  • Singular value decomposition 01.01.12. The SVD generalises the QR decomposition from full-column-rank matrices to arbitrary rectangular matrices and replaces the upper-triangular factor by the diagonal . The orthonormal singular-vector frames and play the same structural role as the Gram-Schmidt output, and the existence proof of SVD uses the spectral theorem on , which itself depends on the existence of an orthonormal eigenbasis — produced by Gram-Schmidt applied to the eigenspace decomposition.

  • Eigenvalue, eigenvector, characteristic polynomial 01.01.08. The finite-dimensional spectral theorem says every self-adjoint operator on a finite-dimensional inner-product space admits an orthonormal eigenbasis. The standard proof picks a normalised eigenvector, splits the space as (orthogonal-complement decomposition supplied by Gram-Schmidt applied to a basis containing ), and iterates. Gram-Schmidt is the structural input.

Historical & philosophical context Master

The orthogonalisation procedure now associated with Gram and Schmidt has roots in the least-squares method of Legendre 1805 (Nouvelles méthodes pour la détermination des orbites des comètes) and Gauss 1809 (Theoria motus corporum coelestium). Pierre-Simon Laplace's 1820 Théorie analytique des probabilités used what is recognisably a Gram-Schmidt-like recursion to orthogonalise observation equations in the analysis of geodetic data, though the procedure was not isolated as an independent object. Jørgen Pedersen Gram's 1883 Journal für die reine und angewandte Mathematik paper "Ueber die Entwickelung reeller Functionen in Reihen mittelst der Methode der kleinsten Quadrate" (94, 41–73) [Gram 1883] gave the procedure in the context of orthogonal polynomial expansions and the moment problem on the real line, with the explicit formula as the iterated-projection recursion. Gram's motivation was the least-squares fitting of polynomial approximations to empirical data; the orthogonalisation step was the device that decoupled the normal equations.

Erhard Schmidt's 1907 Mathematische Annalen paper "Zur Theorie der linearen und nichtlinearen Integralgleichungen, I. Teil" (63, 433–476) [Schmidt 1907] reframed the procedure in the abstract Hilbert-space setting that David Hilbert was then developing for the spectral theory of integral equations. Schmidt's paper introduced the orthonormal basis of as a foundational object, applied Gram-Schmidt to produce orthonormal bases out of arbitrary linearly independent sequences in Hilbert space, and used the resulting Fourier-coefficient expansion to diagonalise compact self-adjoint integral operators (the Hilbert-Schmidt theorem). The textbook-standard formulation of the procedure dates from this paper. The attribution "Gram-Schmidt" is conventional; both Gram and Schmidt acknowledged Laplace as a precursor, and the German-language tradition often called the procedure simply "Orthogonalisierungsverfahren."

The numerical-analysis story is twentieth-century. James H. Wilkinson's 1965 The Algebraic Eigenvalue Problem identified the loss-of-orthogonality phenomenon in floating-point Gram-Schmidt as a serious obstacle to scientific computation. Alston Householder's 1958 Journal of the ACM paper "Unitary triangularization of a nonsymmetric matrix" (5, 339–342) [Householder 1958] introduced the reflector-based QR algorithm whose backward-stability bounds are independent of the conditioning of . Wallace Givens's 1958 Journal of the SIAM paper "Computation of plane unitary rotations transforming a general matrix to triangular form" (6, 26–50) [Givens 1958] introduced the rotation-based QR for sparse matrices. Åke Björck's 1967 BIT Numerical Mathematics paper "Solving linear least squares problems by Gram-Schmidt orthogonalization" (7, 1–21) [Björck 1967] presented the modified Gram-Schmidt procedure together with its backward-error analysis, establishing the loss-of-orthogonality bound that justifies its use in Krylov-subspace methods. Golub-Van Loan's Matrix Computations (first edition 1983; fourth edition 2013) [GolubVanLoan2013] became the canonical synthesis. The QR algorithm for eigenvalue computation (Francis 1961, Kublanovskaya 1961) — distinct from the QR decomposition but sharing the orthogonal-triangular factorisation as its inner step — completed the place of orthonormalisation as the foundational primitive of numerical linear algebra.

Bibliography Master

@article{Gram1883,
  author  = {Gram, J. P.},
  title   = {Ueber die Entwickelung reeller Functionen in Reihen mittelst der Methode der kleinsten Quadrate},
  journal = {Journal f\"ur die reine und angewandte Mathematik},
  volume  = {94},
  year    = {1883},
  pages   = {41--73}
}

@article{Schmidt1907,
  author  = {Schmidt, Erhard},
  title   = {Zur Theorie der linearen und nichtlinearen Integralgleichungen, I. Teil},
  journal = {Mathematische Annalen},
  volume  = {63},
  year    = {1907},
  pages   = {433--476}
}

@article{Householder1958,
  author  = {Householder, A. S.},
  title   = {Unitary triangularization of a nonsymmetric matrix},
  journal = {Journal of the ACM},
  volume  = {5},
  year    = {1958},
  pages   = {339--342}
}

@article{Givens1958,
  author  = {Givens, W.},
  title   = {Computation of plane unitary rotations transforming a general matrix to triangular form},
  journal = {Journal of the Society for Industrial and Applied Mathematics},
  volume  = {6},
  year    = {1958},
  pages   = {26--50}
}

@article{Bjorck1967,
  author  = {Bj\"orck, {\AA}ke},
  title   = {Solving linear least squares problems by Gram-Schmidt orthogonalization},
  journal = {BIT Numerical Mathematics},
  volume  = {7},
  year    = {1967},
  pages   = {1--21}
}

@book{GolubVanLoan2013,
  author    = {Golub, Gene H. and Van Loan, Charles F.},
  title     = {Matrix Computations},
  edition   = {4},
  publisher = {Johns Hopkins University Press},
  year      = {2013}
}

@book{HalmosFDVS,
  author    = {Halmos, Paul R.},
  title     = {Finite-Dimensional Vector Spaces},
  edition   = {2},
  publisher = {Van Nostrand},
  year      = {1958}
}

@book{ShilovLinearAlgebra,
  author    = {Shilov, Georgi E.},
  title     = {Linear Algebra},
  publisher = {Dover Publications},
  year      = {1977},
  note      = {Reprint of the 1971 Prentice-Hall edition; original Russian edition 1969}
}

@book{Axler2015,
  author    = {Axler, Sheldon},
  title     = {Linear Algebra Done Right},
  edition   = {3},
  publisher = {Springer},
  year      = {2015}
}

@book{TrefethenBau1997,
  author    = {Trefethen, Lloyd N. and Bau, David},
  title     = {Numerical Linear Algebra},
  publisher = {SIAM},
  year      = {1997}
}

@book{Bjorck1996,
  author    = {Bj\"orck, {\AA}ke},
  title     = {Numerical Methods for Least Squares Problems},
  publisher = {SIAM},
  year      = {1996}
}

@book{Wilkinson1965,
  author    = {Wilkinson, James H.},
  title     = {The Algebraic Eigenvalue Problem},
  publisher = {Clarendon Press},
  year      = {1965}
}

@article{Laplace1820,
  author  = {Laplace, Pierre-Simon},
  title   = {Th\'eorie analytique des probabilit\'es},
  journal = {3rd edition},
  publisher = {Courcier},
  year    = {1820}
}