43.06.03 · numerical-analysis / 06-eigenvalue-algorithms

The QR algorithm for eigenvalues, with shifts

shipped3 tiersLean: none

Anchor (Master): Trefethen-Bau *Numerical Linear Algebra* Lectures 28–29; Golub-Van Loan *Matrix Computations* (4th ed.) §7.3 (the unsymmetric QR algorithm, the Francis double-shift), §8.3 (the symmetric tridiagonal QR with the Wilkinson shift); Watkins *The Matrix Eigenvalue Problem: GR and Krylov Subspace Methods* (SIAM, 2007) Ch. 4 (the implicit GR/QR framework and bulge-chasing); Parlett *The Symmetric Eigenvalue Problem* (SIAM Classics, 1998) §8.10–§8.14 (the QL/QR algorithm with shifts)

Intuition Beginner

Suppose you want every eigenvalue of a square matrix at once, not just the biggest one. There is a startlingly simple recipe that delivers them all. Split the matrix into two pieces, a rotation piece and a staircase piece, then multiply those two pieces back together in the opposite order. You get a new matrix. Repeat.

The splitting is the QR factorisation you already know: a matrix becomes an orthonormal rotation times an upper-triangular staircase. The twist is that after splitting, you swap the order and multiply them the other way. This produces a fresh matrix with the same eigenvalues as the one you started with. Do this over and over and something remarkable happens: the matrix slowly settles into triangular shape, and once it is triangular the eigenvalues are sitting right there along the diagonal.

Why does swapping the factors help? Each round is secretly running power iteration on every direction at once, so the matrix sorts its own eigenvalues onto the diagonal from largest to smallest. Plain swapping is slow, though. To speed it up you subtract a clever guess, a shift, before each split and add it back afterward. A good shift makes the bottom corner of the matrix lock onto one eigenvalue almost instantly, after which you peel that eigenvalue off and shrink the problem. This is the workhorse that real software uses.

Visual Beginner

The picture shows one full round of the recipe and then the long-run trend. On the left, a matrix is split into a rotation $Q$ and a triangular staircase $R$ . In the middle, the two are multiplied back in swapped order, $R$ times $Q$ , to make the next matrix. On the right, after many rounds, the entries below the diagonal have faded toward zero, leaving the eigenvalues lined up on the diagonal.

   One QR step                       Long-run behaviour
                                     (entries below diagonal fade)

   A  =  Q  ·  R                      x x x x x        L1  *  *  *  *
                                      x x x x x         . L2  *  *  *
   next  =  R  ·  Q                   x x x x x   ===>   .  . L3  *  *
                                      x x x x x          .  .  . L4 *
   (same eigenvalues as A)           x x x x x           .  .  .  . L5

Two things are worth seeing. First, swapping the factors never changes the eigenvalues, because $R Q$ is just a balanced reshuffle of $QR$ that keeps the spectrum fixed. Second, the diagonal entries sort themselves so that the largest eigenvalue settles into the top-left and the smallest into the bottom-right; the below-diagonal entries shrink fastest where two neighbouring eigenvalues are far apart in size.

Worked example Beginner

Take the symmetric matrix

A = (2112),

whose eigenvalues are $3$ and $1$ . We run one unshifted QR step by hand.

Step 1. Factor $A = QR$ with $Q$ a rotation and $R$ upper-triangular. The first column $(2, 1)$ has length $2^{2} + 1^{2} = 5 \approx 2.236$ . The rotation that aligns this column with the first axis is

Q = \frac{1}{5} (21 - 1 2), R = \frac{1}{5} (5043),

so that $QR = A$ . You can check the first column: $Q$ times $(5, 0) / 5$ returns $(2, 1)$ .

Step 2. Swap and multiply: form $A_{1} = R Q$ . Computing the product gives

A_{1} = R Q = \frac{1}{5} (5043) (21 - 1 2) = (2.8 0.6 0.6 1.2) .

Step 3. Read off the progress. The top-left entry moved from $2$ toward $3$ , the bottom-right from $2$ toward $1$ , and the off-diagonal entry shrank from $1$ to $0.6$ .

What this tells us: a single factor-and-swap pushed the diagonal toward the true eigenvalues $3$ and $1$ and shrank the off-diagonal entry. A few more rounds drive that off-diagonal entry to nearly zero, at which point the diagonal reads the eigenvalues directly.

Check your understanding Beginner

Formal definition Intermediate+

Let $A \in M_{n} (F)$ with $F \in {R, C}$ . The QR factorisation of an invertible matrix $B$ is the decomposition $B = QR$ with $Q$ unitary and $R$ upper-triangular with positive real diagonal, unique under that normalisation, in the sense of 01.01.09.

Definition (unshifted QR iteration). Set $A_{0} = A$ . For $k = 0, 1, 2, \dots$ , compute the QR factorisation $A_{k} = Q_{k} R_{k}$ and form

A_{k + 1} = R_{k} Q_{k} .

Because $A_{k + 1} = R_{k} Q_{k} = Q_{k}^{*} (Q_{k} R_{k}) Q_{k} = Q_{k}^{*} A_{k} Q_{k}$ , each iterate is a unitary similarity of the previous one, so all $A_{k}$ share the spectrum of $A$ . Writing $\hat{Q}_{k} = Q_{0} Q_{1} \dots Q_{k - 1}$ and $\hat{R}_{k} = R_{k - 1} \dots R_{1} R_{0}$ , the iteration accumulates $A_{k} = \hat{Q}_{k}^{*} A \hat{Q}_{k}$ .

Definition (shifted QR iteration). Given a sequence of shifts $μ_{k} \in F$ , set $A_{0} = A$ and for each $k$ factor the shifted matrix and undo the shift after swapping:

A_{k} - μ_{k} I = Q_{k} R_{k}, A_{k + 1} = R_{k} Q_{k} + μ_{k} I .

Again $A_{k + 1} = Q_{k}^{*} A_{k} Q_{k}$ , so the spectrum is preserved while the shift steers the convergence. The Rayleigh-quotient shift takes $μ_{k} = (A_{k})_{nn}$ , the trailing diagonal entry. The Wilkinson shift takes $μ_{k}$ to be the eigenvalue of the trailing $2 \times 2$ submatrix

((A_{k})_{n - 1, n - 1} (A_{k})_{n, n - 1} (A_{k})_{n - 1, n} (A_{k})_{n, n})

that is closer to the corner entry $(A_{k})_{nn}$ (with a fixed tie-breaking rule), which keeps the iteration real and convergent even when the corner entry alone would stall.

Definition (deflation). When a subdiagonal entry $(A_{k})_{j + 1, j}$ falls below a tolerance proportional to $ϵ_{mach} (∣ (A_{k})_{j j} ∣ + ∣ (A_{k})_{j + 1, j + 1} ∣)$ , it is set to zero, splitting the matrix into two smaller blocks $A_k = \begin{psmallmatrix} B & * \\ 0 & C \end{psmallmatrix}$ whose eigenvalues are computed independently; the algorithm continues on the smaller trailing block.

Notation: $\hat{Q}_{k}, \hat{R}_{k}$ denote the accumulated unitary and triangular factors; $(A_{k})_{ij}$ is the $(i, j)$ entry of the $k$ -th iterate; a matrix is upper-Hessenberg when its entries vanish below the first subdiagonal, in the sense of 43.06.02; $ϵ_{mach}$ is the unit roundoff. Throughout, the iteration is understood to run on the Hessenberg (or symmetric tridiagonal) reduction of $A$ , not on the dense $A$ .

Counterexamples to common slips

The QR algorithm is not Gram-Schmidt applied once: forming $A = QR$ a single time and reading the diagonal of $R$ gives the pivots of a factorisation, not the eigenvalues. The eigenvalues appear only in the limit of the iterated factor-and-swap, and on the diagonal of the converging $A_{k}$ , never on the diagonal of any single $R_{k}$ .
An orthogonal matrix $A$ with eigenvalues on the unit circle, for example a rotation by an irrational angle, has $∣ λ_{i + 1} / λ_{i} ∣ = 1$ for every pair, so the unshifted iteration does not converge to triangular form without shifts; the equal-modulus eigenvalues stall exactly as power iteration stalls.
A real matrix with a genuinely complex-conjugate eigenvalue pair cannot converge to a real triangular matrix; the real QR algorithm converges instead to real quasi-triangular (real Schur) form, with a $2 \times 2$ block on the diagonal carrying the conjugate pair. Insisting on a fully triangular real limit is the error the Francis double-shift is designed to avoid.

Key theorem with proof Intermediate+

Theorem (QR iteration is simultaneous power iteration; convergence to Schur form; Trefethen-Bau Lecture 28 ^{[source pending]}; Golub-Van Loan §7.3 ^{[source pending]}). Let $A \in M_{n} (F)$ and run the unshifted QR iteration $A_{k} = Q_{k} R_{k}$ , $A_{k + 1} = R_{k} Q_{k}$ . Then for every $k$ ,

A^{k} = (Q_{0} Q_{1} \dots Q_{k - 1}) (R_{k - 1} \dots R_{1} R_{0}) = \hat{Q}_{k} \hat{R}_{k},

the QR factorisation of the matrix power $A^{k}$ . Consequently the orthonormal columns of $\hat{Q}_{k}$ are the orthonormalised iterates of simultaneous power iteration applied to $A$ . If $A$ is diagonalisable with eigenvalues of distinct moduli $∣ λ_{1} ∣ > ∣ λ_{2} ∣ > \dots > ∣ λ_{n} ∣$ and the eigenvector and inverse-eigenvector matrices admit LU factorisations without pivoting, then $A_{k}$ converges to upper-triangular (Schur) form, with $(A_{k})_{ij} \to 0$ for $i > j$ at the rate $∣ λ_{i} / λ_{j} ∣^{k}$ , and $(A_{k})_{ii} \to λ_{i}$ .

Proof. The identity $A^{k} = \hat{Q}_{k} \hat{R}_{k}$ is proved by induction. For $k = 1$ , $A = A_{0} = Q_{0} R_{0} = \hat{Q}_{1} \hat{R}_{1}$ . Assume $A^{k} = \hat{Q}_{k} \hat{R}_{k}$ . Using $A_{k} = \hat{Q}_{k}^{*} A \hat{Q}_{k}$ (the accumulated-similarity identity) and $A_{k} = Q_{k} R_{k}$ ,

A \hat{Q}_{k} = \hat{Q}_{k} A_{k} = \hat{Q}_{k} Q_{k} R_{k} = \hat{Q}_{k + 1} R_{k} .

Multiply $A^{k} = \hat{Q}_{k} \hat{R}_{k}$ on the left by $A$ and substitute $A \hat{Q}_{k} = \hat{Q}_{k + 1} R_{k}$ :

A^{k + 1} = A \hat{Q}_{k} \hat{R}_{k} = \hat{Q}_{k + 1} R_{k} \hat{R}_{k} = \hat{Q}_{k + 1} \hat{R}_{k + 1},

since $\hat{R}_{k + 1} = R_{k} \hat{R}_{k}$ . As $\hat{Q}_{k + 1}$ is unitary and $\hat{R}_{k + 1}$ upper-triangular, this is the QR factorisation of $A^{k + 1}$ , completing the induction.

For convergence, write the eigendecomposition $A = X Λ X^{- 1}$ with $Λ = diag (λ_{1}, \dots, λ_{n})$ ordered by descending modulus. Then $A^{k} = X Λ^{k} X^{- 1}$ . Factor $X = Q_{X} R_{X}$ and $X^{- 1} = LU$ (the assumed pivot-free LU), so

A^{k} = Q_{X} R_{X} Λ^{k} LU = Q_{X} R_{X} (Λ^{k} L Λ^{- k}) Λ^{k} U .

The matrix $Λ^{k} L Λ^{- k}$ has $(i, j)$ entry $L_{ij} (λ_{i} / λ_{j})^{k}$ ; since $L$ is unit-lower-triangular and $∣ λ_{i} ∣ < ∣ λ_{j} ∣$ for $i > j$ , every below-diagonal entry decays like $∣ λ_{i} / λ_{j} ∣^{k} \to 0$ , so $Λ^{k} L Λ^{- k} = I + E_{k}$ with $∥ E_{k} ∥ = O (max_{i > j} ∣ λ_{i} / λ_{j} ∣^{k})$ . Hence $A^{k} = Q_{X} R_{X} (I + E_{k}) Λ^{k} U$ . The QR factorisation of $A^{k}$ is therefore, up to a unitary diagonal phase, $\hat{Q}_{k} = Q_{X} \cdot (unitary part of I + E_{k}) \to Q_{X}$ , and the corresponding $A_{k} = \hat{Q}_{k}^{*} A \hat{Q}_{k} \to Q_{X}^{*} A Q_{X} = Q_{X}^{*} X Λ X^{- 1} Q_{X} = R_{X} Λ R_{X}^{- 1}$ , which is upper-triangular because $R_{X}$ and $Λ$ are. The off-diagonal entry $(A_{k})_{ij}$ for $i > j$ inherits the decay rate $∣ λ_{i} / λ_{j} ∣^{k}$ of $E_{k}$ , and the diagonal entries converge to the $λ_{i}$ in order. $□$

Bridge. This theorem builds toward the shifted algorithm of the Advanced results, where the same simultaneous-power-iteration mechanism is accelerated: subtracting a shift $μ$ replaces each ratio $∣ λ_{i} / λ_{j} ∣$ by $∣ (λ_{i} - μ) / (λ_{j} - μ) ∣$ , and choosing $μ$ near the smallest eigenvalue drives the corner subdiagonal to zero superlinearly. The convergence rate $∣ λ_{2} / λ_{1} ∣^{k}$ that governed power iteration in 43.06.01 appears again here, now controlling every off-diagonal entry at once, which is exactly why the QR algorithm computes the whole spectrum at the cost of finding one eigenvalue. The foundational reason the factor-and-swap works is the power identity $A^{k} = \hat{Q}_{k} \hat{R}_{k}$ : the accumulated $\hat{Q}_{k}$ is the orthonormalised basis of the Krylov-like image $A^{k}$ , so the QR iteration is power iteration on a whole flag of subspaces simultaneously. This is the central insight that unifies the single-vector methods with the production eigensolver, and putting these together, the Hessenberg reduction of 43.06.02 is what makes each sweep affordable, so the bridge is the recognition that one cheap finite reduction plus one cheap-per-step iteration computes the Schur form that the static spectral theory of 01.01.08 only asserts to exist.

Exercises Intermediate+

Exercise 3 (medium, symbolic).

Show that each iterate of the unshifted QR algorithm is unitarily similar to $A$ , and conclude that all $A_{k}$ have the same eigenvalues.

Hint

From $A_{k} = Q_{k} R_{k}$ , solve for $R_{k}$ and substitute into $A_{k + 1} = R_{k} Q_{k}$ .

Answer

From $A_{k} = Q_{k} R_{k}$ with $Q_{k}$ unitary, $R_{k} = Q_{k}^{*} A_{k}$ . Substituting into the update, $A_{k + 1} = R_{k} Q_{k} = Q_{k}^{*} A_{k} Q_{k}$ . Since $Q_{k}$ is unitary, $Q_{k}^{*} = Q_{k}^{- 1}$ , so $A_{k + 1} = Q_{k}^{- 1} A_{k} Q_{k}$ is a similarity of $A_{k}$ . By induction $A_{k} = (\hat{Q}_{k})^{*} A \hat{Q}_{k}$ with $\hat{Q}_{k} = Q_{0} \dots Q_{k - 1}$ unitary, so $A_{k}$ is unitarily similar to $A$ . Similar matrices share the characteristic polynomial $det (A_{k} - λ I) = det (\hat{Q}_{k}^{*} (A - λ I) \hat{Q}_{k}) = det (A - λ I)$ , hence the same eigenvalues with the same multiplicities. Rubric: full credit for the substitution, the similarity, and the spectral-invariance conclusion.

Exercise 4 (medium, symbolic).

Show that the shifted update $A_{k + 1} = R_{k} Q_{k} + μ_{k} I$ , where $A_{k} - μ_{k} I = Q_{k} R_{k}$ , is a unitary similarity of $A_{k}$ , so the shift does not change the spectrum.

Hint

Express $R_{k} Q_{k}$ as $Q_{k}^{*} (A_{k} - μ_{k} I) Q_{k}$ and then add back $μ_{k} I$ .

Answer

From $A_{k} - μ_{k} I = Q_{k} R_{k}$ , $R_{k} = Q_{k}^{*} (A_{k} - μ_{k} I)$ , so $R_{k} Q_{k} = Q_{k}^{*} (A_{k} - μ_{k} I) Q_{k} = Q_{k}^{*} A_{k} Q_{k} - μ_{k} I$ , using $Q_{k}^{*} Q_{k} = I$ . Adding back the shift, $A_{k + 1} = R_{k} Q_{k} + μ_{k} I = Q_{k}^{*} A_{k} Q_{k} - μ_{k} I + μ_{k} I = Q_{k}^{*} A_{k} Q_{k}$ . This is a unitary similarity, so $A_{k + 1}$ has the eigenvalues of $A_{k}$ , hence of $A$ . The shift accelerates convergence without altering the spectrum because it is subtracted before factoring and added back after swapping. Rubric: full credit for showing $A_{k + 1} = Q_{k}^{*} A_{k} Q_{k}$ and the spectral conclusion.

Exercise 7 (hard, symbolic).

Show that one unshifted QR step preserves upper-Hessenberg structure: if $H$ is upper-Hessenberg and invertible with $H = QR$ , then $\hat{H} = R Q$ is upper-Hessenberg. (This is why the iteration stays cheap.)

Hint

$Q = H R^{- 1}$ with $R^{- 1}$ upper-triangular; an upper-Hessenberg times an upper-triangular matrix is upper-Hessenberg, and so is upper-triangular times upper-Hessenberg.

Answer

From $H = QR$ with $R$ invertible upper-triangular, $Q = H R^{- 1}$ . For the product of upper-Hessenberg $H$ and upper-triangular $T = R^{- 1}$ , entry $(i, j)$ is $\sum_{m} h_{im} t_{mj}$ ; here $t_{mj} = 0$ for $m > j$ and $h_{im} = 0$ for $i > m + 1$ , so a nonzero term needs $i - 1 \leq m \leq j$ , impossible when $i > j + 1$ . Hence $Q = H R^{- 1}$ is upper-Hessenberg. Now $\hat{H} = R Q$ is upper-triangular times upper-Hessenberg: entry $(i, j)$ is $\sum_{m} r_{im} q_{mj}$ with $r_{im} = 0$ for $m < i$ and $q_{mj} = 0$ for $m > j + 1$ , so a nonzero term needs $i \leq m \leq j + 1$ , again impossible when $i > j + 1$ . Therefore $\hat{H}$ is upper-Hessenberg, and the band structure persists across the whole iteration, keeping each sweep $O (n^{2})$ . Rubric: full credit for both products and the index bookkeeping.

Exercise 8 (hard, symbolic).

Show that a shifted QR step with shift $μ$ is equivalent to one step of inverse iteration along the last coordinate. Specifically, prove that the last column of $\hat{H} = R Q + μ I$ minus $μ$ times the last basis vector relates to $(H - μ I)^{-*}$ acting on $e_{n}$ , explaining why a shift near an eigenvalue drives the trailing subdiagonal entry to zero quickly.

Hint

From $H - μ I = QR$ , the last row of $Q^{*}$ is proportional to $e_{n}^{*} (H - μ I)^{- 1} R$ ; track how the last column of $Q$ aligns with the eigenvector of the eigenvalue nearest $μ$ .

Answer

Write $H - μ I = QR$ , so $Q = (H - μ I) R^{- 1}$ and $Q^{*} = R^{-*} (H - μ I)^{*}$ . The first column of $Q$ is proportional to $(H - μ I)$ applied to $e_{1}$ (forward power iteration on $H - μ I$ ), while the last row of $Q^{*}$ , equivalently the conjugate of the last column of $Q$ , satisfies $e_{n}^{*} Q^{*} = e_{n}^{*} R^{-*} (H - μ I)^{*}$ ; since $R^{-*}$ is lower-triangular, $e_{n}^{*} R^{-*} = r_{nn}^{- 1} e_{n}^{*}$ , giving $e_{n}^{*} Q^{*} = r_{nn}^{- 1} e_{n}^{*} (H - μ I)^{*}$ . Thus the last column $q_{n}$ of $Q$ obeys $q_{n} = \overline{r_{nn}^{- 1}} (H - μ I)^{- 1}^{*}^{- 1} \dots$ ; more cleanly, the QR sweep moves $e_{n}$ under $(H - μ I)^{-*}$ , which is inverse iteration with shift $μ$ on the trailing direction. By the inverse-iteration theorem of 43.06.01, that direction converges to the left eigenvector of the eigenvalue $λ_{J}$ nearest $μ$ at rate $∣ λ_{J} - μ ∣/ δ$ , so the trailing subdiagonal entry $(\hat{H})_{n, n - 1}$ , which measures the misalignment of the last coordinate with an invariant subspace, decays at that rate; taking $μ$ very close to $λ_{J}$ makes the rate tiny and gives the superlinear deflation of the corner. Rubric: full credit for identifying the last-column action as inverse iteration with shift $μ$ and connecting the rate to $∣ λ_{J} - μ ∣/ δ$ .

Advanced results Master

Theorem (cubic local convergence of the Wilkinson-shifted symmetric tridiagonal QR; Wilkinson Ch. 8 ^{[source pending]}; Parlett §8.13 ^{[source pending]}). Let $T \in M_{n} (R)$ be unreduced symmetric tridiagonal, and run the QR iteration with the Wilkinson shift $μ_{k}$ equal to the eigenvalue of the trailing $2 \times 2$ block nearest the corner entry $(T_{k})_{nn}$ . Then the trailing subdiagonal entry $t_{n, n - 1}^{(k)}$ converges to zero cubically: there is a constant $c$ with

t_{n, n - 1}^{(k + 1)} \leq c t_{n, n - 1}^{(k)}^{3}

once the iteration enters a neighbourhood of a simple eigenvalue, and the iteration is globally convergent.

The mechanism is the connection to Rayleigh-quotient iteration of 43.06.01: a shifted QR step performs one step of inverse iteration with shift $μ_{k}$ on the trailing coordinate direction, and the Wilkinson shift is a second-order-accurate estimate of the corner eigenvalue, the symmetric analogue of the Rayleigh-quotient shift. The corner subdiagonal entry $t_{n, n - 1}^{(k)}$ measures the sine of the angle between the last coordinate direction and the converging invariant subspace; the second-order accuracy of the shift contributes one factor of that sine, and the inverse-iteration contraction contributes the square, so the angle, and with it the subdiagonal entry, contracts cubically. The plain Rayleigh-quotient shift $μ_{k} = (T_{k})_{nn}$ also gives cubic local convergence but can fail to converge globally — it stalls on matrices such as $\begin{psmallmatrix} 0 & 1 \\ 1 & 0 \end{psmallmatrix}$ whose corner entry is equidistant from both eigenvalues — whereas the Wilkinson shift, by taking a true eigenvalue of the $2 \times 2$ trailing block, resolves the tie and converges in every case.

Theorem (the Francis implicit double-shift; Francis 1961 ^{[source pending]}; Watkins Ch. 4 ^{[source pending]}). Let $H \in M_{n} (R)$ be unreduced upper-Hessenberg, and let $μ, \overset{μ}{ˉ}$ be a complex-conjugate shift pair (the eigenvalues of the trailing $2 \times 2$ block when they are complex). The real matrix $M = (H - μ I) (H - \overset{μ}{ˉ} I) = H^{2} - 2 Re (μ) H + ∣ μ ∣^{2} I$ is real, and the double QR step $M = QR$ , $H' = Q^ H Q $c anb ec a r r i e d o u t im pl i c i tl y : i t s u f f i ces t oco m p u t e t h e f i r s t co l u mn o f$ M $, b u i l d t h eH o u se h o l d er r e f l ec t or ma pp in g i tt o am u l t i pl eo f$ e_1 $, a ppl y t h er es u l t in g tw o - s hi f t s - a t - o n ce p er t u r ba t i o n t o$ H $, an d t h e n r es t or eH esse nb er g f or mb y c ha s in g t h e in t r o d u ce d$ 3 \times 3 $b u l g e d o w n t h es u b d ia g o na l w i t h$ n-2 $f u r t h er H o u se h o l d er r e f l ec t or s . T h er es u l t in g$ H'$ equals the explicit double-shifted iterate.*

The implicit double-shift is what makes the unsymmetric QR algorithm practical in real arithmetic: it applies a complex-conjugate shift pair using only real operations, never forming the complex matrix $H - μ I$ , and it costs the same $O (n^{2})$ per sweep as a single real shift. Its correctness rests on the implicit-Q theorem of 43.06.02: an unreduced Hessenberg form is determined up to a diagonal unitary by its first column, so reproducing the correct first transformation and then restoring the band by bulge-chasing yields exactly the matrix the explicit double-step would produce. The bulge — a small triangle of nonzeros protruding below the subdiagonal after the first reflector — is pushed down and off the bottom of the matrix one row at a time, each reflector reintroducing the bulge one position lower until it falls off, leaving a clean Hessenberg matrix.

Theorem (backward stability of the QR algorithm). The QR algorithm, run on the Hessenberg reduction in floating-point arithmetic, is backward stable: the computed eigenvalues are the exact eigenvalues of a matrix $A + E$ with $∥ E ∥ = O (ϵ_{mach} ∥ A ∥)$ . Each QR sweep is a composition of orthogonal transformations, each applied in floating point with a backward error bounded by a low-degree polynomial in $n$ times $ϵ_{mach} ∥ A ∥$ ; orthogonal transformations have unit condition number, so the errors accumulate additively rather than multiplicatively across sweeps, and the computed eigenvalues are accurate to the conditioning of the eigenproblem, the Bauer-Fike bound of 43.06.04 for non-normal $A$ and the unit-conditioning of 43.01.02 for the Hermitian case.

Synthesis. The QR algorithm is the simultaneous, orthogonal, and shifted descendant of the single-vector methods, and the power identity $A^{k} = \hat{Q}_{k} \hat{R}_{k}$ is the foundational reason it computes the entire Schur form at once: the accumulated unitary factor is the orthonormalised basis of the iterated subspaces $A^{k}$ , so the unshifted iteration is simultaneous power iteration on a full flag, sorting the eigenvalues onto the diagonal at the spectral-gap rates $∣ λ_{i} / λ_{j} ∣^{k}$ . This is exactly the rate that governed the single-vector power iteration of 43.06.01, now acting on every coordinate, and the shifted iteration generalises it by replacing each ratio with $∣ (λ_{i} - μ) / (λ_{j} - μ) ∣$ ; the central insight is that a shifted QR step is one step of inverse iteration on the trailing direction, so a second-order-accurate shift — the Rayleigh-quotient shift, sharpened to the Wilkinson shift for global convergence — buys cubic local convergence by compounding the shift accuracy with the inverse-iteration contraction. Putting these together, the Hessenberg reduction of 43.06.02 supplies a band that every sweep preserves and the implicit-Q theorem makes rigid enough to drive the iteration implicitly, so the Francis double-shift applies a complex-conjugate shift pair in real arithmetic by bulge-chasing; the bridge is the recognition that the static existence of the Schur form in 01.01.13 becomes a constructive, backward-stable, cubically-convergent algorithm precisely by iterating an orthogonal factor-and-swap whose convergence is read from the same spectral gaps that drive every method in this chapter.

Full proof set Master

Proposition (the power-iteration identity). For the unshifted QR iteration, $A^{k} = \hat{Q}_{k} \hat{R}_{k}$ with $\hat{Q}_{k} = Q_{0} \dots Q_{k - 1}$ unitary and $\hat{R}_{k} = R_{k - 1} \dots R_{0}$ upper-triangular.

Proof. Induct on $k$ . For $k = 1$ , $A = Q_{0} R_{0}$ . Assume $A^{k} = \hat{Q}_{k} \hat{R}_{k}$ . The accumulated similarity $A_{k} = \hat{Q}_{k}^{*} A \hat{Q}_{k}$ holds because $A_{j + 1} = Q_{j}^{*} A_{j} Q_{j}$ telescopes. From $A_{k} = Q_{k} R_{k}$ ,

A \hat{Q}_{k} = \hat{Q}_{k} A_{k} = \hat{Q}_{k} Q_{k} R_{k} = \hat{Q}_{k + 1} R_{k} .

Then $A^{k + 1} = A (A^{k}) = A \hat{Q}_{k} \hat{R}_{k} = \hat{Q}_{k + 1} R_{k} \hat{R}_{k} = \hat{Q}_{k + 1} \hat{R}_{k + 1}$ , with $\hat{R}_{k + 1} = R_{k} \hat{R}_{k}$ upper-triangular as a product of upper-triangular factors and $\hat{Q}_{k + 1}$ unitary as a product of unitaries. $□$

Proposition (convergence rate of the unshifted iteration). Under the distinct-modulus and pivot-free-LU hypotheses of the Key theorem, the below-diagonal entries of $A_{k}$ satisfy $(A_{k})_{ij} = O (∣ λ_{i} / λ_{j} ∣^{k})$ for $i > j$ , so $A_{k}$ tends to upper-triangular Schur form.

Proof. With $A = X Λ X^{- 1}$ , $X = Q_{X} R_{X}$ , and $X^{- 1} = LU$ , the proof of the Key theorem gives $A^{k} = Q_{X} R_{X} (I + E_{k}) Λ^{k} U$ with $(E_{k})_{ij} = L_{ij} (λ_{i} / λ_{j})^{k}$ for $i > j$ and zero on and above the diagonal, so $∥ E_{k} ∥ = O (ρ^{k})$ where $ρ = max_{i > j} ∣ λ_{i} / λ_{j} ∣ < 1$ . The unitary factor of the QR factorisation of $A^{k}$ is $\hat{Q}_{k} = Q_{X} W_{k}$ , where $W_{k}$ is the unitary factor of $R_{X} (I + E_{k}) R_{X}^{- 1}$ , and $W_{k} = I + O (ρ^{k})$ because $I + E_{k} = I + O (ρ^{k})$ and the QR factorisation depends continuously and to first order linearly on the matrix near the identity. Then $A_{k} = \hat{Q}_{k}^{*} A \hat{Q}_{k} = W_{k}^{*} (Q_{X}^{*} A Q_{X}) W_{k}$ , and $Q_{X}^{*} A Q_{X} = R_{X} Λ R_{X}^{- 1}$ is upper-triangular, so $A_{k} = R_{X} Λ R_{X}^{- 1} + O (ρ^{k})$ , with the $O (ρ^{k})$ correction in entry $(i, j)$ scaling like $∣ λ_{i} / λ_{j} ∣^{k}$ . The below-diagonal entries, zero in the limit, decay at that rate. $□$

Proposition (shifted step preserves spectrum and Hessenberg form). The shifted update $A_{k + 1} = R_{k} Q_{k} + μ_{k} I$ with $A_{k} - μ_{k} I = Q_{k} R_{k}$ is a unitary similarity of $A_{k}$ ; if $A_{k}$ is upper-Hessenberg then so is $A_{k + 1}$ .

Proof. $R_{k} Q_{k} = Q_{k}^{*} (A_{k} - μ_{k} I) Q_{k} = Q_{k}^{*} A_{k} Q_{k} - μ_{k} I$ , so $A_{k + 1} = Q_{k}^{*} A_{k} Q_{k}$ , a unitary similarity preserving the spectrum. For the band structure, $A_{k} - μ_{k} I$ is upper-Hessenberg (subtracting $μ_{k} I$ touches only the diagonal). Its QR factor $Q_{k} = (A_{k} - μ_{k} I) R_{k}^{- 1}$ is upper-Hessenberg, being upper-Hessenberg times upper-triangular; and $R_{k} Q_{k}$ is upper-triangular times upper-Hessenberg, hence upper-Hessenberg, by the index argument of Exercise 7. Adding $μ_{k} I$ affects only the diagonal, so $A_{k + 1}$ is upper-Hessenberg. $□$

Proposition (the implicit-Q reduction is well-defined). Let $H$ be unreduced upper-Hessenberg and let $p (H) e_{1}$ be the first column of a real polynomial $p (H) = (H - μ I) (H - \overset{μ}{ˉ} I)$ . The Householder reflector $Q_{0}$ mapping $p (H) e_{1}$ to a multiple of $e_{1}$ , followed by the bulge-chasing reflectors $Q_{1}, \dots, Q_{n - 2}$ restoring Hessenberg form, produces $H' = (Q_0 \cdots Q_{n-2})^ H (Q_0 \cdots Q_{n-2})$ equal to the explicit double-shifted iterate, up to a unitary diagonal.*

Proof. Let $Z = Q_{0} \dots Q_{n - 2}$ be the accumulated orthogonal matrix of the implicit sweep. By construction $Z e_{1}$ is a multiple of $p (H) e_{1}$ , since $Q_{0} e_{1} \propto p (H) e_{1}$ and each subsequent bulge-chasing reflector fixes the first coordinate. The explicit sweep factors $p (H) = QR$ and forms $H_{exp}^{'} = Q^{*} H Q$ ; its first column $Q e_{1}$ is a multiple of $p (H) e_{1}$ as well, because $Q e_{1} = p (H) R^{- 1} e_{1} \propto p (H) e_{1}$ . Both $Z^{*} H Z$ and $Q^{*} H Q$ are unreduced upper-Hessenberg (the bulge-chase restores the band; the explicit $Q^{*} H Q$ is Hessenberg because $p (H)$ is a polynomial in $H$ and the QR sweep preserves the band). Their first columns of the transforming matrices agree up to scale: $Z e_{1} \propto Q e_{1}$ . By the implicit-Q (uniqueness) theorem of 43.06.02, two unitary matrices reducing $H$ to unreduced Hessenberg form whose first columns agree up to a unimodular scalar differ by a diagonal unitary, so $Z = Q D$ for a diagonal unitary $D$ , and $H^{'} = Z^{*} H Z = D^{*} (Q^{*} H Q) D$ equals the explicit iterate up to that diagonal similarity. $□$

Connections Master

The QR algorithm is the production endpoint of the eigenvalue chapter and rests directly on the Hessenberg/tridiagonal reduction of 43.06.02: the iteration is never run on the dense $A$ but on its band reduction, where the band-invariance proposition guarantees every sweep returns another Hessenberg (or tridiagonal) matrix, holding the per-sweep cost at $O (n^{2})$ — or $O (n)$ in the symmetric tridiagonal case — and the implicit-Q theorem proved there is exactly what licenses the bulge-chasing implicit double-shift.

The convergence engine of the QR algorithm is the power iteration of 43.06.01: the unshifted iteration is simultaneous orthogonal power iteration via the identity $A^{k} = \hat{Q}_{k} \hat{R}_{k}$ , the off-diagonal decay rates $∣ λ_{i} / λ_{j} ∣^{k}$ are the same spectral-gap ratios that govern single-vector power iteration, and the Wilkinson shift that produces cubic local convergence is the Rayleigh-quotient shift of that unit transplanted into the orthogonal-similarity setting, with a shifted QR step acting as one step of inverse iteration on the trailing coordinate.

The QR factorisation $A_{k} = Q_{k} R_{k}$ at the heart of every sweep is the Gram-Schmidt / Householder factorisation built and analysed in 01.01.09, reused here inside an iteration rather than once; the unit upper-triangular limit the iteration converges to is a constructive realisation of the Schur form whose static existence is the spectral and triangularisation theory of 01.01.13 and 01.01.08.

The accuracy of the computed eigenvalues is governed by the eigenvalue-conditioning theory: for non-normal matrices the relevant bound is the Bauer-Fike theorem of 43.06.04, with the factor $κ (V)$ measuring how far from orthogonal the eigenvector basis is, while the backward-error framework of 43.01.02 turns the $O (ϵ_{mach} ∥ A ∥)$ backward stability of the orthogonal sweeps into forward-error guarantees.

The generalised eigenproblem $A x = λ B x$ is solved by the QZ algorithm of 43.06.10, the direct descendant of the QR algorithm: rather than reducing $B^{- 1} A$ (which can be unstable when $B$ is ill-conditioned), QZ reduces the pair $(A, B)$ simultaneously to generalised Schur (Hessenberg-triangular then quasi-triangular) form by two-sided orthogonal equivalences, applying the same implicit-shift bulge-chasing machinery to a matrix pencil; the present unit is the standard eigenproblem special case $B = I$ .

Historical & philosophical context Master

The QR algorithm was discovered independently around 1961 by John G. F. Francis in England and Vera N. Kublanovskaya in the Soviet Union. Francis, working at the National Research Development Corporation, published the QR transformation in two parts in The Computer Journal in 1961–62, and his second paper already contained the two decisive practical refinements: the use of shifts to accelerate convergence and the implicit double-shift bulge-chasing implementation on an upper-Hessenberg matrix that allows a complex-conjugate shift pair to be applied in real arithmetic ^{[Francis 1961]}. Kublanovskaya arrived at the unshifted iteration from the related LR transformation of Rutishauser, whose 1958 algorithm factored $A = L R$ and formed $R L$ using the (unstable) LU factorisation; replacing LU by the orthogonal QR factorisation was the change that made the method numerically sound ^{[Rutishauser 1958]}.

Wilkinson, in The Algebraic Eigenvalue Problem (1965), supplied the convergence theory and the shift analysis, including the shift now bearing his name — the eigenvalue of the trailing $2 \times 2$ block nearest the corner entry — which he showed restores global convergence where the plain corner-entry shift can stall, while retaining cubic local convergence for symmetric tridiagonal matrices ^{[Wilkinson 1965]}. The symmetric specialisation and its divide-and-conquer successors were developed by Parlett and others, and the modern understanding of the implicit shift as a nested subspace iteration, unifying the QR algorithm with the broader GR family, is due to Watkins ^{[Watkins 2007]}. The QR algorithm was named one of the ten algorithms of the twentieth century by Computing in Science and Engineering in 2000, and it remains the method underlying the eigenvalue routines of LAPACK and every major numerical library.

Bibliography Master

@article{Francis1961,
  author  = {Francis, John G. F.},
  title   = {The {QR} Transformation, Parts {I} and {II}},
  journal = {The Computer Journal},
  volume  = {4},
  number  = {3--4},
  year    = {1961--1962},
  pages   = {265--271, 332--345}
}

@article{Kublanovskaya1962,
  author  = {Kublanovskaya, Vera N.},
  title   = {On Some Algorithms for the Solution of the Complete Eigenvalue Problem},
  journal = {USSR Computational Mathematics and Mathematical Physics},
  volume  = {1},
  number  = {3},
  year    = {1962},
  pages   = {637--657}
}

@article{Rutishauser1958,
  author  = {Rutishauser, Heinz},
  title   = {Solution of Eigenvalue Problems with the {LR}-Transformation},
  journal = {National Bureau of Standards Applied Mathematics Series},
  volume  = {49},
  year    = {1958},
  pages   = {47--81}
}

@book{Wilkinson1965,
  author    = {Wilkinson, James H.},
  title     = {The Algebraic Eigenvalue Problem},
  publisher = {Oxford University Press},
  address   = {Oxford},
  year      = {1965}
}

@book{Parlett1998,
  author    = {Parlett, Beresford N.},
  title     = {The Symmetric Eigenvalue Problem},
  publisher = {SIAM},
  series    = {Classics in Applied Mathematics},
  year      = {1998}
}

@book{Watkins2007,
  author    = {Watkins, David S.},
  title     = {The Matrix Eigenvalue Problem: {GR} and {K}rylov Subspace Methods},
  publisher = {SIAM},
  address   = {Philadelphia},
  year      = {2007}
}

@book{TrefethenBau1997,
  author    = {Trefethen, Lloyd N. and Bau, David},
  title     = {Numerical Linear Algebra},
  publisher = {SIAM},
  address   = {Philadelphia},
  year      = {1997}
}

@book{GolubVanLoan2013,
  author    = {Golub, Gene H. and Van Loan, Charles F.},
  title     = {Matrix Computations},
  edition   = {4th},
  publisher = {Johns Hopkins University Press},
  address   = {Baltimore},
  year      = {2013}
}

Prerequisites

43.06.02
43.06.01
01.01.09

Tier anchors

beginner: Factor a matrix into a rotation part and a triangular part, multiply them back in the other order, and repeat until the diagonal reveals the eigenvalues — Strang *Introduction to Linear Algebra* Ch. 6 (eigenvalues and similar matrices); 3Blue1Brown *Essence of Linear Algebra* Ch. 14 (eigenvectors and eigenvalues)
intermediate: Trefethen-Bau *Numerical Linear Algebra* Lectures 28–29 (the QR algorithm without and with shifts) and Lecture 25 (orthogonal projectors, the QR factorisation); Golub-Van Loan *Matrix Computations* (4th ed.) §7.3 (the practical QR algorithm) and §8.3 (the symmetric QR algorithm)
master: Trefethen-Bau *Numerical Linear Algebra* Lectures 28–29; Golub-Van Loan *Matrix Computations* (4th ed.) §7.3 (the unsymmetric QR algorithm, the Francis double-shift), §8.3 (the symmetric tridiagonal QR with the Wilkinson shift); Watkins *The Matrix Eigenvalue Problem: GR and Krylov Subspace Methods* (SIAM, 2007) Ch. 4 (the implicit GR/QR framework and bulge-chasing); Parlett *The Symmetric Eigenvalue Problem* (SIAM Classics, 1998) §8.10–§8.14 (the QL/QR algorithm with shifts)

References

images/Shilov-Linear-Algebra__4cbdee00cc.jpg · Shilov *Linear Algebra* — Fast Track archive cover; Ch. 6 linear operators and the triangular (Schur) form of an operator under an orthonormal change of basis, the static target the QR iteration computes by repeated factor-and-swap
Trefethen, L. N. & Bau, D. — Numerical Linear Algebra (SIAM, 1997) · Lectures 28–29 — the unshifted QR iteration $A_k = Q_k R_k$, $A_{k+1} = R_k Q_k$ and its equivalence to simultaneous (orthogonal) power iteration; the Rayleigh-quotient and Wilkinson shifts; deflation
Golub, G. H. & Van Loan, C. F. — Matrix Computations (4th ed., Johns Hopkins, 2013) · §7.3 and §8.3 — the practical implicitly-shifted (bulge-chasing) QR algorithm on a Hessenberg matrix, the Francis double-shift for real matrices, and the symmetric tridiagonal QR with the Wilkinson shift
Francis, J. G. F. — The QR Transformation, Parts I and II (The Computer Journal, 1961–62) · Parts I–II — the QR transformation, the use of shifts, and the implicit double-shift bulge-chasing implementation on an upper-Hessenberg matrix
Wilkinson, J. H. — The Algebraic Eigenvalue Problem (Oxford, 1965) · Ch. 8 — the QR algorithm, the choice of shift (the Wilkinson shift as the eigenvalue of the trailing $2\times 2$ block nearest the corner entry), and the global and local convergence analysis
Watkins, D. S. — The Matrix Eigenvalue Problem: GR and Krylov Subspace Methods (SIAM, 2007) · Ch. 4 — the implicit-Q/GR framework, bulge-chasing as a nested subspace iteration, and the connection between the shifted QR step and simultaneous inverse iteration

Estimated time

beginner: 18m
intermediate: 46m
master: 92m