43.06.03 · numerical-analysis / 06-eigenvalue-algorithms

The QR algorithm for eigenvalues, with shifts

shipped3 tiersLean: none

Anchor (Master): Trefethen-Bau *Numerical Linear Algebra* Lectures 28–29; Golub-Van Loan *Matrix Computations* (4th ed.) §7.3 (the unsymmetric QR algorithm, the Francis double-shift), §8.3 (the symmetric tridiagonal QR with the Wilkinson shift); Watkins *The Matrix Eigenvalue Problem: GR and Krylov Subspace Methods* (SIAM, 2007) Ch. 4 (the implicit GR/QR framework and bulge-chasing); Parlett *The Symmetric Eigenvalue Problem* (SIAM Classics, 1998) §8.10–§8.14 (the QL/QR algorithm with shifts)

Intuition Beginner

Suppose you want every eigenvalue of a square matrix at once, not just the biggest one. There is a startlingly simple recipe that delivers them all. Split the matrix into two pieces, a rotation piece and a staircase piece, then multiply those two pieces back together in the opposite order. You get a new matrix. Repeat.

The splitting is the QR factorisation you already know: a matrix becomes an orthonormal rotation times an upper-triangular staircase. The twist is that after splitting, you swap the order and multiply them the other way. This produces a fresh matrix with the same eigenvalues as the one you started with. Do this over and over and something remarkable happens: the matrix slowly settles into triangular shape, and once it is triangular the eigenvalues are sitting right there along the diagonal.

Why does swapping the factors help? Each round is secretly running power iteration on every direction at once, so the matrix sorts its own eigenvalues onto the diagonal from largest to smallest. Plain swapping is slow, though. To speed it up you subtract a clever guess, a shift, before each split and add it back afterward. A good shift makes the bottom corner of the matrix lock onto one eigenvalue almost instantly, after which you peel that eigenvalue off and shrink the problem. This is the workhorse that real software uses.

Visual Beginner

The picture shows one full round of the recipe and then the long-run trend. On the left, a matrix is split into a rotation and a triangular staircase . In the middle, the two are multiplied back in swapped order, times , to make the next matrix. On the right, after many rounds, the entries below the diagonal have faded toward zero, leaving the eigenvalues lined up on the diagonal.

   One QR step                       Long-run behaviour
                                     (entries below diagonal fade)

   A  =  Q  ·  R                      x x x x x        L1  *  *  *  *
                                      x x x x x         . L2  *  *  *
   next  =  R  ·  Q                   x x x x x   ===>   .  . L3  *  *
                                      x x x x x          .  .  . L4 *
   (same eigenvalues as A)           x x x x x           .  .  .  . L5

Two things are worth seeing. First, swapping the factors never changes the eigenvalues, because is just a balanced reshuffle of that keeps the spectrum fixed. Second, the diagonal entries sort themselves so that the largest eigenvalue settles into the top-left and the smallest into the bottom-right; the below-diagonal entries shrink fastest where two neighbouring eigenvalues are far apart in size.

Worked example Beginner

Take the symmetric matrix

whose eigenvalues are and . We run one unshifted QR step by hand.

Step 1. Factor with a rotation and upper-triangular. The first column has length . The rotation that aligns this column with the first axis is

so that . You can check the first column: times returns .

Step 2. Swap and multiply: form . Computing the product gives

Step 3. Read off the progress. The top-left entry moved from toward , the bottom-right from toward , and the off-diagonal entry shrank from to .

What this tells us: a single factor-and-swap pushed the diagonal toward the true eigenvalues and and shrank the off-diagonal entry. A few more rounds drive that off-diagonal entry to nearly zero, at which point the diagonal reads the eigenvalues directly.

Check your understanding Beginner

Formal definition Intermediate+

Let with . The QR factorisation of an invertible matrix is the decomposition with unitary and upper-triangular with positive real diagonal, unique under that normalisation, in the sense of 01.01.09.

Definition (unshifted QR iteration). Set . For , compute the QR factorisation and form

Because , each iterate is a unitary similarity of the previous one, so all share the spectrum of . Writing and , the iteration accumulates .

Definition (shifted QR iteration). Given a sequence of shifts , set and for each factor the shifted matrix and undo the shift after swapping:

Again , so the spectrum is preserved while the shift steers the convergence. The Rayleigh-quotient shift takes , the trailing diagonal entry. The Wilkinson shift takes to be the eigenvalue of the trailing submatrix

that is closer to the corner entry (with a fixed tie-breaking rule), which keeps the iteration real and convergent even when the corner entry alone would stall.

Definition (deflation). When a subdiagonal entry falls below a tolerance proportional to , it is set to zero, splitting the matrix into two smaller blocks A_k = \begin{psmallmatrix} B & * \\ 0 & C \end{psmallmatrix} whose eigenvalues are computed independently; the algorithm continues on the smaller trailing block.

Notation: denote the accumulated unitary and triangular factors; is the entry of the -th iterate; a matrix is upper-Hessenberg when its entries vanish below the first subdiagonal, in the sense of 43.06.02; is the unit roundoff. Throughout, the iteration is understood to run on the Hessenberg (or symmetric tridiagonal) reduction of , not on the dense .

Counterexamples to common slips

  • The QR algorithm is not Gram-Schmidt applied once: forming a single time and reading the diagonal of gives the pivots of a factorisation, not the eigenvalues. The eigenvalues appear only in the limit of the iterated factor-and-swap, and on the diagonal of the converging , never on the diagonal of any single .
  • An orthogonal matrix with eigenvalues on the unit circle, for example a rotation by an irrational angle, has for every pair, so the unshifted iteration does not converge to triangular form without shifts; the equal-modulus eigenvalues stall exactly as power iteration stalls.
  • A real matrix with a genuinely complex-conjugate eigenvalue pair cannot converge to a real triangular matrix; the real QR algorithm converges instead to real quasi-triangular (real Schur) form, with a block on the diagonal carrying the conjugate pair. Insisting on a fully triangular real limit is the error the Francis double-shift is designed to avoid.

Key theorem with proof Intermediate+

Theorem (QR iteration is simultaneous power iteration; convergence to Schur form; Trefethen-Bau Lecture 28 [source pending]; Golub-Van Loan §7.3 [source pending]). Let and run the unshifted QR iteration , . Then for every ,

the QR factorisation of the matrix power . Consequently the orthonormal columns of are the orthonormalised iterates of simultaneous power iteration applied to . If is diagonalisable with eigenvalues of distinct moduli and the eigenvector and inverse-eigenvector matrices admit LU factorisations without pivoting, then converges to upper-triangular (Schur) form, with for at the rate , and .

Proof. The identity is proved by induction. For , . Assume . Using (the accumulated-similarity identity) and ,

Multiply on the left by and substitute :

since . As is unitary and upper-triangular, this is the QR factorisation of , completing the induction.

For convergence, write the eigendecomposition with ordered by descending modulus. Then . Factor and (the assumed pivot-free LU), so

The matrix has entry ; since is unit-lower-triangular and for , every below-diagonal entry decays like , so with . Hence . The QR factorisation of is therefore, up to a unitary diagonal phase, , and the corresponding , which is upper-triangular because and are. The off-diagonal entry for inherits the decay rate of , and the diagonal entries converge to the in order.

Bridge. This theorem builds toward the shifted algorithm of the Advanced results, where the same simultaneous-power-iteration mechanism is accelerated: subtracting a shift replaces each ratio by , and choosing near the smallest eigenvalue drives the corner subdiagonal to zero superlinearly. The convergence rate that governed power iteration in 43.06.01 appears again here, now controlling every off-diagonal entry at once, which is exactly why the QR algorithm computes the whole spectrum at the cost of finding one eigenvalue. The foundational reason the factor-and-swap works is the power identity : the accumulated is the orthonormalised basis of the Krylov-like image , so the QR iteration is power iteration on a whole flag of subspaces simultaneously. This is the central insight that unifies the single-vector methods with the production eigensolver, and putting these together, the Hessenberg reduction of 43.06.02 is what makes each sweep affordable, so the bridge is the recognition that one cheap finite reduction plus one cheap-per-step iteration computes the Schur form that the static spectral theory of 01.01.08 only asserts to exist.

Exercises Intermediate+

Advanced results Master

Theorem (cubic local convergence of the Wilkinson-shifted symmetric tridiagonal QR; Wilkinson Ch. 8 [source pending]; Parlett §8.13 [source pending]). Let be unreduced symmetric tridiagonal, and run the QR iteration with the Wilkinson shift equal to the eigenvalue of the trailing block nearest the corner entry . Then the trailing subdiagonal entry converges to zero cubically: there is a constant with

once the iteration enters a neighbourhood of a simple eigenvalue, and the iteration is globally convergent.

The mechanism is the connection to Rayleigh-quotient iteration of 43.06.01: a shifted QR step performs one step of inverse iteration with shift on the trailing coordinate direction, and the Wilkinson shift is a second-order-accurate estimate of the corner eigenvalue, the symmetric analogue of the Rayleigh-quotient shift. The corner subdiagonal entry measures the sine of the angle between the last coordinate direction and the converging invariant subspace; the second-order accuracy of the shift contributes one factor of that sine, and the inverse-iteration contraction contributes the square, so the angle, and with it the subdiagonal entry, contracts cubically. The plain Rayleigh-quotient shift also gives cubic local convergence but can fail to converge globally — it stalls on matrices such as \begin{psmallmatrix} 0 & 1 \\ 1 & 0 \end{psmallmatrix} whose corner entry is equidistant from both eigenvalues — whereas the Wilkinson shift, by taking a true eigenvalue of the trailing block, resolves the tie and converges in every case.

Theorem (the Francis implicit double-shift; Francis 1961 [source pending]; Watkins Ch. 4 [source pending]). Let be unreduced upper-Hessenberg, and let be a complex-conjugate shift pair (the eigenvalues of the trailing block when they are complex). The real matrix is real, and the double QR step , $H' = Q^ H QMe_1H3 \times 3n-2H'$ equals the explicit double-shifted iterate.*

The implicit double-shift is what makes the unsymmetric QR algorithm practical in real arithmetic: it applies a complex-conjugate shift pair using only real operations, never forming the complex matrix , and it costs the same per sweep as a single real shift. Its correctness rests on the implicit-Q theorem of 43.06.02: an unreduced Hessenberg form is determined up to a diagonal unitary by its first column, so reproducing the correct first transformation and then restoring the band by bulge-chasing yields exactly the matrix the explicit double-step would produce. The bulge — a small triangle of nonzeros protruding below the subdiagonal after the first reflector — is pushed down and off the bottom of the matrix one row at a time, each reflector reintroducing the bulge one position lower until it falls off, leaving a clean Hessenberg matrix.

Theorem (backward stability of the QR algorithm). The QR algorithm, run on the Hessenberg reduction in floating-point arithmetic, is backward stable: the computed eigenvalues are the exact eigenvalues of a matrix with . Each QR sweep is a composition of orthogonal transformations, each applied in floating point with a backward error bounded by a low-degree polynomial in times ; orthogonal transformations have unit condition number, so the errors accumulate additively rather than multiplicatively across sweeps, and the computed eigenvalues are accurate to the conditioning of the eigenproblem, the Bauer-Fike bound of 43.06.04 for non-normal and the unit-conditioning of 43.01.02 for the Hermitian case.

Synthesis. The QR algorithm is the simultaneous, orthogonal, and shifted descendant of the single-vector methods, and the power identity is the foundational reason it computes the entire Schur form at once: the accumulated unitary factor is the orthonormalised basis of the iterated subspaces , so the unshifted iteration is simultaneous power iteration on a full flag, sorting the eigenvalues onto the diagonal at the spectral-gap rates . This is exactly the rate that governed the single-vector power iteration of 43.06.01, now acting on every coordinate, and the shifted iteration generalises it by replacing each ratio with ; the central insight is that a shifted QR step is one step of inverse iteration on the trailing direction, so a second-order-accurate shift — the Rayleigh-quotient shift, sharpened to the Wilkinson shift for global convergence — buys cubic local convergence by compounding the shift accuracy with the inverse-iteration contraction. Putting these together, the Hessenberg reduction of 43.06.02 supplies a band that every sweep preserves and the implicit-Q theorem makes rigid enough to drive the iteration implicitly, so the Francis double-shift applies a complex-conjugate shift pair in real arithmetic by bulge-chasing; the bridge is the recognition that the static existence of the Schur form in 01.01.13 becomes a constructive, backward-stable, cubically-convergent algorithm precisely by iterating an orthogonal factor-and-swap whose convergence is read from the same spectral gaps that drive every method in this chapter.

Full proof set Master

Proposition (the power-iteration identity). For the unshifted QR iteration, with unitary and upper-triangular.

Proof. Induct on . For , . Assume . The accumulated similarity holds because telescopes. From ,

Then , with upper-triangular as a product of upper-triangular factors and unitary as a product of unitaries.

Proposition (convergence rate of the unshifted iteration). Under the distinct-modulus and pivot-free-LU hypotheses of the Key theorem, the below-diagonal entries of satisfy for , so tends to upper-triangular Schur form.

Proof. With , , and , the proof of the Key theorem gives with for and zero on and above the diagonal, so where . The unitary factor of the QR factorisation of is , where is the unitary factor of , and because and the QR factorisation depends continuously and to first order linearly on the matrix near the identity. Then , and is upper-triangular, so , with the correction in entry scaling like . The below-diagonal entries, zero in the limit, decay at that rate.

Proposition (shifted step preserves spectrum and Hessenberg form). The shifted update with is a unitary similarity of ; if is upper-Hessenberg then so is .

Proof. , so , a unitary similarity preserving the spectrum. For the band structure, is upper-Hessenberg (subtracting touches only the diagonal). Its QR factor is upper-Hessenberg, being upper-Hessenberg times upper-triangular; and is upper-triangular times upper-Hessenberg, hence upper-Hessenberg, by the index argument of Exercise 7. Adding affects only the diagonal, so is upper-Hessenberg.

Proposition (the implicit-Q reduction is well-defined). Let be unreduced upper-Hessenberg and let be the first column of a real polynomial . The Householder reflector mapping to a multiple of , followed by the bulge-chasing reflectors restoring Hessenberg form, produces $H' = (Q_0 \cdots Q_{n-2})^ H (Q_0 \cdots Q_{n-2})$ equal to the explicit double-shifted iterate, up to a unitary diagonal.*

Proof. Let be the accumulated orthogonal matrix of the implicit sweep. By construction is a multiple of , since and each subsequent bulge-chasing reflector fixes the first coordinate. The explicit sweep factors and forms ; its first column is a multiple of as well, because . Both and are unreduced upper-Hessenberg (the bulge-chase restores the band; the explicit is Hessenberg because is a polynomial in and the QR sweep preserves the band). Their first columns of the transforming matrices agree up to scale: . By the implicit-Q (uniqueness) theorem of 43.06.02, two unitary matrices reducing to unreduced Hessenberg form whose first columns agree up to a unimodular scalar differ by a diagonal unitary, so for a diagonal unitary , and equals the explicit iterate up to that diagonal similarity.

Connections Master

The QR algorithm is the production endpoint of the eigenvalue chapter and rests directly on the Hessenberg/tridiagonal reduction of 43.06.02: the iteration is never run on the dense but on its band reduction, where the band-invariance proposition guarantees every sweep returns another Hessenberg (or tridiagonal) matrix, holding the per-sweep cost at — or in the symmetric tridiagonal case — and the implicit-Q theorem proved there is exactly what licenses the bulge-chasing implicit double-shift.

The convergence engine of the QR algorithm is the power iteration of 43.06.01: the unshifted iteration is simultaneous orthogonal power iteration via the identity , the off-diagonal decay rates are the same spectral-gap ratios that govern single-vector power iteration, and the Wilkinson shift that produces cubic local convergence is the Rayleigh-quotient shift of that unit transplanted into the orthogonal-similarity setting, with a shifted QR step acting as one step of inverse iteration on the trailing coordinate.

The QR factorisation at the heart of every sweep is the Gram-Schmidt / Householder factorisation built and analysed in 01.01.09, reused here inside an iteration rather than once; the unit upper-triangular limit the iteration converges to is a constructive realisation of the Schur form whose static existence is the spectral and triangularisation theory of 01.01.13 and 01.01.08.

The accuracy of the computed eigenvalues is governed by the eigenvalue-conditioning theory: for non-normal matrices the relevant bound is the Bauer-Fike theorem of 43.06.04, with the factor measuring how far from orthogonal the eigenvector basis is, while the backward-error framework of 43.01.02 turns the backward stability of the orthogonal sweeps into forward-error guarantees.

The generalised eigenproblem is solved by the QZ algorithm of 43.06.10, the direct descendant of the QR algorithm: rather than reducing (which can be unstable when is ill-conditioned), QZ reduces the pair simultaneously to generalised Schur (Hessenberg-triangular then quasi-triangular) form by two-sided orthogonal equivalences, applying the same implicit-shift bulge-chasing machinery to a matrix pencil; the present unit is the standard eigenproblem special case .

Historical & philosophical context Master

The QR algorithm was discovered independently around 1961 by John G. F. Francis in England and Vera N. Kublanovskaya in the Soviet Union. Francis, working at the National Research Development Corporation, published the QR transformation in two parts in The Computer Journal in 1961–62, and his second paper already contained the two decisive practical refinements: the use of shifts to accelerate convergence and the implicit double-shift bulge-chasing implementation on an upper-Hessenberg matrix that allows a complex-conjugate shift pair to be applied in real arithmetic [Francis 1961]. Kublanovskaya arrived at the unshifted iteration from the related LR transformation of Rutishauser, whose 1958 algorithm factored and formed using the (unstable) LU factorisation; replacing LU by the orthogonal QR factorisation was the change that made the method numerically sound [Rutishauser 1958].

Wilkinson, in The Algebraic Eigenvalue Problem (1965), supplied the convergence theory and the shift analysis, including the shift now bearing his name — the eigenvalue of the trailing block nearest the corner entry — which he showed restores global convergence where the plain corner-entry shift can stall, while retaining cubic local convergence for symmetric tridiagonal matrices [Wilkinson 1965]. The symmetric specialisation and its divide-and-conquer successors were developed by Parlett and others, and the modern understanding of the implicit shift as a nested subspace iteration, unifying the QR algorithm with the broader GR family, is due to Watkins [Watkins 2007]. The QR algorithm was named one of the ten algorithms of the twentieth century by Computing in Science and Engineering in 2000, and it remains the method underlying the eigenvalue routines of LAPACK and every major numerical library.

Bibliography Master

@article{Francis1961,
  author  = {Francis, John G. F.},
  title   = {The {QR} Transformation, Parts {I} and {II}},
  journal = {The Computer Journal},
  volume  = {4},
  number  = {3--4},
  year    = {1961--1962},
  pages   = {265--271, 332--345}
}

@article{Kublanovskaya1962,
  author  = {Kublanovskaya, Vera N.},
  title   = {On Some Algorithms for the Solution of the Complete Eigenvalue Problem},
  journal = {USSR Computational Mathematics and Mathematical Physics},
  volume  = {1},
  number  = {3},
  year    = {1962},
  pages   = {637--657}
}

@article{Rutishauser1958,
  author  = {Rutishauser, Heinz},
  title   = {Solution of Eigenvalue Problems with the {LR}-Transformation},
  journal = {National Bureau of Standards Applied Mathematics Series},
  volume  = {49},
  year    = {1958},
  pages   = {47--81}
}

@book{Wilkinson1965,
  author    = {Wilkinson, James H.},
  title     = {The Algebraic Eigenvalue Problem},
  publisher = {Oxford University Press},
  address   = {Oxford},
  year      = {1965}
}

@book{Parlett1998,
  author    = {Parlett, Beresford N.},
  title     = {The Symmetric Eigenvalue Problem},
  publisher = {SIAM},
  series    = {Classics in Applied Mathematics},
  year      = {1998}
}

@book{Watkins2007,
  author    = {Watkins, David S.},
  title     = {The Matrix Eigenvalue Problem: {GR} and {K}rylov Subspace Methods},
  publisher = {SIAM},
  address   = {Philadelphia},
  year      = {2007}
}

@book{TrefethenBau1997,
  author    = {Trefethen, Lloyd N. and Bau, David},
  title     = {Numerical Linear Algebra},
  publisher = {SIAM},
  address   = {Philadelphia},
  year      = {1997}
}

@book{GolubVanLoan2013,
  author    = {Golub, Gene H. and Van Loan, Charles F.},
  title     = {Matrix Computations},
  edition   = {4th},
  publisher = {Johns Hopkins University Press},
  address   = {Baltimore},
  year      = {2013}
}