43.06.01 · numerical-analysis / 06-eigenvalue-algorithms

Power iteration, inverse iteration, and Rayleigh quotient iteration

shipped · 3 tiers · Lean: none

Anchor (Master): Trefethen-Bau *Numerical Linear Algebra* Lectures 27–28; Golub-Van Loan *Matrix Computations* (4th ed.) §8.2; Parlett *The Symmetric Eigenvalue Problem* Ch. 4 (power methods) and §4.6–§4.7 (Rayleigh quotient iteration, cubic convergence); Wilkinson *The Algebraic Eigenvalue Problem* Ch. 9 (inverse iteration)

Intuition Beginner

A matrix has a set of special directions, the eigenvectors, and each one comes with a number, its eigenvalue, that says how much the matrix stretches that direction. The largest eigenvalue belongs to the direction the matrix stretches the most. A natural question is how to find that dominant direction and its stretch factor without solving for every eigenvalue at once.

Power iteration answers with a one-line idea: pick any starting vector, multiply it by the matrix, and repeat. Each multiplication amplifies the dominant direction more than any other, because that direction is stretched the most. After enough rounds the vector points almost exactly along the dominant eigenvector, and you rescale it each step so it does not grow without bound. The leftover directions fade away at a rate set by how much the largest stretch factor beats the second largest.

Inverse iteration flips the trick to target any eigenvalue you like. You pick a guess near the eigenvalue you want, called a shift, and apply the inverse of the matrix minus that guess. This new matrix stretches most along the eigenvector whose eigenvalue sits closest to your guess, so the same repeated-multiplication idea now homes in on that one. Rayleigh quotient iteration takes the next step: after each round it uses the current vector to form a sharper guess, and for symmetric matrices this feedback makes the accuracy improve at a breathtaking pace.

Visual Beginner

The picture shows a starting vector being multiplied by a matrix again and again. At first the vector points in some arbitrary direction. After one multiplication it tilts toward the dominant eigendirection; after two it is closer still; after a handful of steps it sits almost exactly along that direction. The off-direction part shrinks by a constant factor every step, so the convergence looks like a vector swinging steadily into place.

Two facts are visible. The vector turns toward the dominant direction and the angle to it shrinks by the same ratio every step, the ratio of the second stretch factor to the first. If those two factors were close together the curve would drop slowly, and if the first dwarfs the second the curve would plunge fast.

Worked example Beginner

Take the symmetric matrix

$$ A = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}, $$

whose eigenvalues are $4$ (along the direction $(1, 1)$ ) and $2$ (along $(1, - 1)$ ). Start power iteration from the vector $x_{0} = (1, 0)$ , which is not aligned with either eigendirection.

Step 1. Multiply: $A x_{0} = (3, 1)$ . Rescale by dividing by the larger entry $3$ to keep numbers tidy: $x_{1} = (1, 0.333)$ .

Step 2. Multiply: $A x_{1} = (3 + 0.333, 1 + 1) = (3.333, 2)$ . Rescale by $3.333$ : $x_{2} = (1, 0.6)$ .

Step 3. Multiply: $A x_{2} = (3 + 0.6, 1 + 1.8) = (3.6, 2.8)$ . Rescale by $3.6$ : $x_{3} = (1, 0.778)$ .

Step 4. Multiply: $A x_{3} = (3 + 0.778, 1 + 2.333) = (3.778, 3.333)$ . Rescale by $3.778$ : $x_{4} = (1, 0.882)$ .

The second coordinate is climbing toward $1$ , so the iterate is approaching the dominant eigendirection $(1, 1)$ . The gap to $1$ runs $0.667, 0.4, 0.222, 0.118$ , shrinking by a factor that approaches one half each step and matching the ratio $2/4 = 0.5$ of the two eigenvalues.

Step 5. Estimate the eigenvalue. Take the Rayleigh quotient of $x_{4} = (1, 0.882)$ : compute $A x_{4} = (3.882, 3.647)$ , then the ratio $(A x_{4}) \cdot x_{4} / (x_{4} \cdot x_{4}) = (3.882 + 3.216) / (1 + 0.778) = 7.098/1.778 = 3.99$ .

What this tells us: a few rounds of plain matrix multiplication drove an arbitrary vector almost onto the dominant eigenvector, and the Rayleigh quotient turned that vector into the estimate $3.99$ for the dominant eigenvalue $4$ . No characteristic polynomial was solved; the matrix did the work by stretching.
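The arithmetic above can be replayed in a few lines. The following is a minimal sketch, assuming NumPy is available; it rescales by the largest entry exactly as the worked example does, and the variable names (`iterates`, `rayleigh`) are illustrative choices, not part of the text.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
x = np.array([1.0, 0.0])          # start vector x_0

iterates = []
for _ in range(4):
    y = A @ x                     # multiply by the matrix
    x = y / np.max(np.abs(y))     # rescale by the largest entry
    iterates.append(x.copy())

# Rayleigh quotient of x_4 as the eigenvalue estimate
rayleigh = (A @ x) @ x / (x @ x)

for k, v in enumerate(iterates, start=1):
    print(f"x_{k} = ({v[0]:.3f}, {v[1]:.3f})")
print(f"Rayleigh quotient: {rayleigh:.3f}")
```

The printed second coordinates are 0.333, 0.600, 0.778, 0.882, and the Rayleigh quotient prints as 3.992, in agreement with the hand computation.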

Check your understanding Beginner

Exercise (easy, multiple choice).

Power iteration converges fastest when the largest and second-largest eigenvalues (in size) satisfy which relationship?

A. They are equal. B. The largest is much bigger than the second. C. The second is negative. D. Both are zero.

Hint

The off-direction part shrinks by the ratio of the second size to the first each step. A small ratio means fast shrinking.

Answer

B. The largest is much bigger than the second. The error shrinks by the ratio of the second-largest size to the largest each step, so a large gap makes that ratio small and convergence fast. Feedback-correct: a dominant eigenvalue that dwarfs the rest is the ideal case. Feedback-wrong: if the two are equal the ratio is $1$ and the method stalls; the sign of the second eigenvalue does not by itself set the speed; both being zero leaves nothing to find.

Formal definition Intermediate+

Let $A \in M_{n} (F)$ with $F \in \{\mathbb{R}, \mathbb{C}\}$ , and suppose $A$ is diagonalisable with eigenvalues $λ_{1}, \dots, λ_{n}$ and a basis of eigenvectors $u_{1}, \dots, u_{n}$ , where $A u_{i} = λ_{i} u_{i}$ and $∥ u_{i} ∥ = 1$ in the Euclidean norm of 01.01.08. Order the eigenvalues by modulus, $∣ λ_{1} ∣ \geq ∣ λ_{2} ∣ \geq \dots \geq ∣ λ_{n} ∣$ . The eigenvalue $λ_{1}$ is dominant when $∣ λ_{1} ∣ > ∣ λ_{2} ∣$ ; in that strict case the dominant eigenspace is one-dimensional and $u_{1}$ is determined up to a scalar.

Definition (power iteration). Given a start vector $x_{0}$ with $∥ x_{0} ∥ = 1$ , power iteration generates the sequence

y_{k} = A x_{k - 1}, x_{k} = \frac{y _{k}}{∥ y _{k} ∥}, k = 1, 2, \dots,

together with the eigenvalue estimate by the Rayleigh quotient $ρ_{k} = R (x_{k}) = ⟨ A x_{k}, x_{k} ⟩ / ⟨ x_{k}, x_{k} ⟩$ , the optimal scalar estimate of the eigenvalue associated with the direction $x_{k}$ , in the sense of 01.01.14. Each step is one matrix-vector product followed by a normalisation.

Definition (inverse iteration with a shift). Fix a shift $μ \in F$ that is not an eigenvalue, so $A - μ I$ is invertible. Inverse iteration generates

(A - μ I) y_{k} = x_{k - 1}, x_{k} = \frac{y _{k}}{∥ y _{k} ∥},

that is, power iteration applied to the matrix $(A - μ I)^{- 1}$ . The vector $y_{k}$ is computed by solving a linear system with the fixed matrix $A - μ I$ — a single factorisation reused across iterations — rather than by forming the inverse explicitly.
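As an illustration of the definition, here is a minimal sketch of inverse iteration for a real symmetric matrix, assuming NumPy. The helper name `inverse_iteration` and its parameters are hypothetical. For brevity the sketch calls `np.linalg.solve` at every step, which refactors the matrix each time; a version that follows the definition more closely would factor $A - μ I$ once and reuse the factors.

```python
import numpy as np

def inverse_iteration(A, mu, x0, steps=20):
    """Sketch of inverse iteration with a fixed shift mu (real symmetric A)."""
    n = A.shape[0]
    M = A - mu * np.eye(n)            # fixed shifted matrix A - mu*I
    x = x0 / np.linalg.norm(x0)
    for _ in range(steps):
        y = np.linalg.solve(M, x)     # solve (A - mu I) y = x, no explicit inverse
        x = y / np.linalg.norm(y)     # normalise
    rho = x @ (A @ x)                 # Rayleigh quotient of the unit vector x
    return rho, x

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])            # eigenvalues 4 and 2
rho, x = inverse_iteration(A, mu=1.5, x0=np.array([1.0, 0.0]))
print(rho, x)
```

With the shift $μ = 1.5$ the nearest eigenvalue is $2$ , so the iterate lines up with the direction $(1, - 1)$ rather than with the dominant direction $(1, 1)$ .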

Definition (Rayleigh quotient iteration). Replace the fixed shift by the current Rayleigh quotient at every step. Starting from $x_{0}$ with $∥ x_{0} ∥ = 1$ and $μ_{0} = R (x_{0})$ ,

(A - μ_{k - 1} I) y_{k} = x_{k - 1}, x_{k} = \frac{y _{k}}{∥ y _{k} ∥}, μ_{k} = R (x_{k}) .

The shift $μ_{k - 1}$ now varies, so each step solves a system with a different matrix, but the adaptively improving shift produces convergence far faster than a fixed shift.
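A minimal sketch of Rayleigh quotient iteration for a real symmetric matrix follows, assuming NumPy. The function name, the `steps` parameter, and the $3 \times 3$ test matrix are illustrative assumptions. Because the shift approaches an eigenvalue, the shifted matrix can become exactly singular in floating point; the sketch simply stops in that case.

```python
import numpy as np

def rayleigh_quotient_iteration(A, x0, steps=5):
    """Sketch of Rayleigh quotient iteration for a real symmetric A."""
    n = A.shape[0]
    x = x0 / np.linalg.norm(x0)
    mu = x @ (A @ x)                        # mu_0 = R(x_0)
    shifts = [mu]
    for _ in range(steps):
        try:
            # a different matrix A - mu_{k-1} I at every step
            y = np.linalg.solve(A - mu * np.eye(n), x)
        except np.linalg.LinAlgError:
            break                           # shift equals an eigenvalue to working precision
        x = y / np.linalg.norm(y)
        mu = x @ (A @ x)                    # updated shift mu_k = R(x_k)
        shifts.append(mu)
    return mu, x, shifts

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 4.0]])
mu, x, shifts = rayleigh_quotient_iteration(A, np.ones(3))
residual = np.linalg.norm(A @ x - mu * x)
print(shifts, residual)
```

The list `shifts` records the successive Rayleigh quotients, so the speed-up over a fixed shift can be inspected directly by comparing it with the eigenvalues returned by `np.linalg.eigvalsh(A)`.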

Notation: $∠ (v, w)$ is the acute angle between the lines spanned by nonzero $v$ and $w$ , with $sin ∠ (v, w)$ the standard measure of misalignment; $σ (A)$ is the spectrum; $R (v) = ⟨ A v, v ⟩ / ⟨ v, v ⟩$ is the Rayleigh quotient. Throughout, "convergence of $x_{k}$ to $u$ " means $sin ∠ (x_{k}, u) \to 0$ , since an eigenvector is determined only up to a scalar.

Counterexamples to common slips

Power iteration needs a strictly dominant eigenvalue. For $A = diag (1, - 1)$ the two eigenvalues have equal modulus, and a generic start vector oscillates between two directions without settling; the ratio $∣ λ_{2} / λ_{1} ∣ = 1$ gives no contraction.
The method fails if the start vector has zero component along $u_{1}$ . For $A = diag (2, 1)$ and $x_{0} = (0, 1)$ , every iterate stays $(0, 1)$ and converges to the subdominant eigenvector. In exact arithmetic this is a measure-zero accident; in floating point, rounding usually reintroduces a $u_{1}$ -component and the method recovers.
A complex dominant pair $λ, \bar{λ}$ of a real matrix has equal modulus, so real power iteration does not converge to a single vector; the dominant invariant subspace is two-dimensional and requires a block method.
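The three failure modes can be observed directly. The sketch below assumes NumPy; the helper `power_iteration` is a hypothetical name, and the rotation matrix used for the complex-pair case (eigenvalues $\pm i$ ) is an example chosen here, not one from the text.

```python
import numpy as np

def power_iteration(A, x0, steps):
    """Plain power iteration with 2-norm normalisation."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(steps):
        y = A @ x
        x = y / np.linalg.norm(y)
    return x

# 1. Equal moduli: diag(1, -1) flips the sign of the second entry forever.
A1 = np.diag([1.0, -1.0])
x0 = np.array([1.0, 2.0])
odd = power_iteration(A1, x0, 1)     # direction (1, -2)
even = power_iteration(A1, x0, 2)    # back to direction (1, 2)

# 2. Zero component along u_1: the iterate never leaves (0, 1).
#    (For this diagonal matrix the arithmetic is exact, so rounding
#    does not reintroduce a u_1-component.)
A2 = np.diag([2.0, 1.0])
stuck = power_iteration(A2, np.array([0.0, 1.0]), 50)

# 3. Complex dominant pair: a quarter-turn rotation cycles with period 4.
A3 = np.array([[0.0, -1.0],
               [1.0,  0.0]])
quarter = power_iteration(A3, np.array([1.0, 0.0]), 1)
full = power_iteration(A3, np.array([1.0, 0.0]), 4)

print(odd, even, stuck, quarter, full)
```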

Key theorem with proof Intermediate+

Theorem (linear convergence of power iteration; Trefethen-Bau Lecture 27 ^{[source pending]}; Golub-Van Loan §8.2 ^{[source pending]}). Let $A \in M_{n} (F)$ be diagonalisable with eigenpairs $(λ_{i}, u_{i})$ , $∥ u_{i} ∥ = 1$ , and a strictly dominant eigenvalue, $∣ λ_{1} ∣ > ∣ λ_{2} ∣ \geq \dots \geq ∣ λ_{n} ∣$ . Write the start vector as $x_{0} = \sum_{i = 1}^{n} c_{i} u_{i}$ and assume $c_{1} \neq 0$ . Then the power-iteration sequence $x_{k} = A^{k} x_{0} /∥ A^{k} x_{0} ∥$ satisfies

$$ \sin ∠ (x_{k}, u_{1}) = O (∣ λ_{2} / λ_{1} ∣^{k}) \quad \text{as } k \to \infty, $$

and the Rayleigh quotient $ρ_{k} = R (x_{k})$ converges to $λ_{1}$ . If in addition $A$ is self-adjoint, the eigenvalue error sharpens to $ρ_{k} - λ_{1} = O (∣ λ_{2} / λ_{1} ∣^{2 k})$ .

Proof. Expand in the eigenbasis: $A^{k} x_{0} = \sum_{i = 1}^{n} c_{i} λ_{i}^{k} u_{i}$ . Factor out the dominant term,

$$ A^{k} x_{0} = c_{1} λ_{1}^{k} \left( u_{1} + \sum_{i = 2}^{n} \frac{c_{i}}{c_{1}} \left( \frac{λ_{i}}{λ_{1}} \right)^{k} u_{i} \right) = c_{1} λ_{1}^{k} (u_{1} + r_{k}), $$

where the remainder $r_{k} = \sum_{i \geq 2} (c_{i} / c_{1}) (λ_{i} / λ_{1})^{k} u_{i}$ has norm bounded by

$$ ∥ r_{k} ∥ \leq \sum_{i \geq 2} ∣ c_{i} / c_{1} ∣ \, ∣ λ_{i} / λ_{1} ∣^{k} \leq \left( \sum_{i \geq 2} ∣ c_{i} / c_{1} ∣ \right) ∣ λ_{2} / λ_{1} ∣^{k} = C \, ∣ λ_{2} / λ_{1} ∣^{k}, $$

because $∣ λ_{i} / λ_{1} ∣ \leq ∣ λ_{2} / λ_{1} ∣$ for every $i \geq 2$ . Normalising kills the scalar $c_{1} λ_{1}^{k}$ , so $x_{k}$ is a unit vector in the direction of $u_{1} + r_{k}$ . The sine of the angle to $u_{1}$ is the norm of the component of $x_{k}$ orthogonal to $u_{1}$ , which is at most $∥ r_{k} ∥/∥ u_{1} + r_{k} ∥$ ; since $∥ u_{1} + r_{k} ∥ \to 1$ , this is $O (∥ r_{k} ∥) = O (∣ λ_{2} / λ_{1} ∣^{k})$ .

For the eigenvalue estimate, the Rayleigh quotient is continuous and $R (u_{1}) = λ_{1}$ , so $ρ_{k} = R (x_{k}) \to λ_{1}$ as $x_{k} \to u_{1}$ in direction. When $A = A^{*}$ , the eigenbasis can be taken orthonormal, so $r_{k} ⊥ u_{1}$ exactly and $R (x_{k}) - λ_{1}$ is a quadratic form in the small quantity $r_{k}$ : writing $x_{k} \propto u_{1} + r_{k}$ , the cross terms vanish because $(A - λ_{1} I) u_{1} = 0$ and $A$ is self-adjoint, and one computes $R (x_{k}) - λ_{1} = ⟨(A - λ_{1} I) r_{k}, r_{k} ⟩ / (1 + ∥ r_{k} ∥^{2})$ , which is $O (∥ r_{k} ∥^{2}) = O (∣ λ_{2} / λ_{1} ∣^{2 k})$ . This is the same second-order stationarity that makes the Rayleigh quotient the optimal eigenvalue estimate at an approximate eigenvector. $□$
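Both rates in the theorem can be checked numerically. The sketch below assumes NumPy and uses the symmetric matrix with rows $(3, 1)$ and $(1, 3)$ , whose eigenvalues are $4$ and $2$ , so $∣ λ_{2} / λ_{1} ∣ = 0.5$ . The names `sines` and `eig_errors` are illustrative.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
u1 = np.array([1.0, 1.0]) / np.sqrt(2.0)   # dominant unit eigenvector
x = np.array([1.0, 0.0])

sines, eig_errors = [], []
for k in range(1, 13):
    y = A @ x
    x = y / np.linalg.norm(y)
    # sine of the angle = norm of the part of x orthogonal to u1
    sines.append(np.linalg.norm(x - (x @ u1) * u1))
    # error of the Rayleigh quotient (x is a unit vector)
    eig_errors.append(abs(x @ (A @ x) - 4.0))

# Ratios of successive errors estimate the contraction factors.
sine_ratios = [b / a for a, b in zip(sines, sines[1:])]
eig_ratios = [b / a for a, b in zip(eig_errors, eig_errors[1:])]
print(sine_ratios[-1], eig_ratios[-1])
```

The successive ratios of `sines` settle near $0.5$ and those of `eig_errors` near $0.25 = 0.5^{2}$ , the linear rate for the eigenvector and the squared rate for the eigenvalue in the self-adjoint case.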

Bridge. This convergence analysis builds toward the inverse-iteration and Rayleigh-quotient-iteration theorems of the Advanced results, where the very same eigenbasis expansion is applied to $(A - μ I)^{- 1}$ , whose eigenvalues $(λ_{i} - μ)^{- 1}$ are dominated by the one nearest the shift; the ratio $∣ λ_{2} / λ_{1} ∣$ that governs the rate here appears again in 43.06.03 as the convergence engine of the QR algorithm, which is simultaneous power iteration on an orthonormal basis. The foundational reason the method works is the spectral expansion $A^{k} x_{0} = \sum_{i} c_{i} λ_{i}^{k} u_{i}$ : raising $A$ to a power raises each eigenvalue to that power, and the largest in modulus eventually swamps the rest. This is exactly the dynamical content of the dominant eigenvalue, and the quadratic sharpening in the self-adjoint case generalises the stationarity of the Rayleigh quotient established in 01.01.14. Putting these together, plain matrix multiplication is a convergence process whose speed is read off the spectral gap, and the bridge is the observation that a faster method must reshape the spectrum so that the targeted eigenvalue dominates more sharply — precisely what shifting and inverting accomplish.

Exercises Intermediate+

Exercise 4 (medium, symbolic).

Show that inverse iteration with shift $μ$ is power iteration on $B = (A - μ I)^{- 1}$ , and that the eigenvalues of $B$ are $(λ_{i} - μ)^{- 1}$ with the same eigenvectors as $A$ . Deduce which eigenvector becomes dominant for $B$ .

Hint

If $A u_{i} = λ_{i} u_{i}$ , apply $(A - μ I)$ and then its inverse to $u_{i}$ .

Answer

From $A u_{i} = λ_{i} u_{i}$ , $(A - μ I) u_{i} = (λ_{i} - μ) u_{i}$ , so applying $(A - μ I)^{- 1}$ gives $B u_{i} = (λ_{i} - μ)^{- 1} u_{i}$ . Thus $B$ has the same eigenvectors $u_{i}$ with eigenvalues $(λ_{i} - μ)^{- 1}$ . The step $(A - μ I) y_{k} = x_{k - 1}$ is exactly $y_{k} = B x_{k - 1}$ , so the iteration is power iteration on $B$ . The dominant eigenvalue of $B$ is the largest of $∣ (λ_{i} - μ)^{- 1} ∣$ , attained when $∣ λ_{i} - μ ∣$ is smallest — the eigenvalue $λ_{i}$ nearest the shift $μ$ . So inverse iteration converges to the eigenvector of the eigenvalue closest to $μ$ . Rubric: full credit for the eigenvalue computation and the nearest-to- $μ$ conclusion.

Exercise 7 (hard, short-answer).

Prove that for a self-adjoint $A$ with simple eigenvalue $λ$ and unit eigenvector $u$ , if $x = (cos θ) u + (sin θ) z$ with $z ⊥ u$ , $∥ z ∥ = 1$ , then $R (x) - λ = O (sin^{2} θ)$ .

Hint

Expand $R (x) = ⟨ A x, x ⟩$ for the unit vector $x$ , using $A u = λ u$ and self-adjointness to drop the cross terms.

Answer

Since $∥ x ∥ = 1$ , $R (x) = ⟨ A x, x ⟩$ . Expand, using self-adjointness to combine the two cross terms via $⟨ A z, u ⟩ = ⟨ z, A u ⟩$ : $$ \langle A x, x \rangle = \cos^2\theta \, \langle A u, u \rangle + 2\cos\theta\sin\theta \, \mathrm{Re}\langle A u, z\rangle + \sin^2\theta \, \langle A z, z\rangle. $$ Now $⟨ A u, u ⟩ = λ$ and $⟨ A u, z ⟩ = λ ⟨ u, z ⟩ = 0$ because $z ⊥ u$ , so the cross term vanishes; this uses $A u = λ u$ . Hence $R (x) = λ cos^{2} θ + ⟨ A z, z ⟩ sin^{2} θ = λ + (⟨ A z, z ⟩ - λ) sin^{2} θ$ , using $cos^{2} θ = 1 - sin^{2} θ$ . The factor $⟨ A z, z ⟩ - λ$ is bounded by $∥ A ∥ + ∣ λ ∣$ , so $R (x) - λ = O (sin^{2} θ)$ . The vanishing of the cross term, a consequence of self-adjointness together with $A u = λ u$ , is exactly why the Rayleigh-quotient error is second order. Rubric: full credit for the expansion, the vanishing cross term, and the $O (sin^{2} θ)$ conclusion.
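The identity $R (x) = λ + (⟨ A z, z ⟩ - λ) sin^{2} θ$ derived in this answer can be sanity-checked numerically. The sketch assumes NumPy; the $3 \times 3$ symmetric matrix and the particular choice of $z$ are illustrative assumptions, not taken from the exercise.

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 4.0]])
eigvals, V = np.linalg.eigh(A)            # ascending eigenvalues, orthonormal columns
lam, u = eigvals[0], V[:, 0]              # the eigenpair (lambda, u)
z = (V[:, 1] + V[:, 2]) / np.sqrt(2.0)    # a unit vector orthogonal to u

def rayleigh(v):
    return v @ (A @ v) / (v @ v)

coeff = z @ (A @ z) - lam                 # the factor <Az, z> - lambda
thetas = [0.1, 0.01, 0.001]
gaps = []
for theta in thetas:
    x = np.cos(theta) * u + np.sin(theta) * z
    # difference between R(x) - lambda and the predicted coeff * sin^2(theta)
    gaps.append(rayleigh(x) - lam - coeff * np.sin(theta) ** 2)
print(gaps)
```

Each entry of `gaps` should sit at rounding-error level, confirming that the Rayleigh-quotient error scales with $sin^{2} θ$ and carries no first-order term.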

Exercise 8 (hard, symbolic).

Deflation: let $A$ be self-adjoint with dominant eigenpair $(λ_{1}, u_{1})$ , $∥ u_{1} ∥ = 1$ , already found. Show that $A^{'} = A - λ_{1} u_{1} u_{1}^{*}$ has the same eigenpairs as $A$ except that the eigenvalue at $u_{1}$ is moved to $0$ , so power iteration on $A^{'}$ finds $λ_{2}$ .

Hint

Apply $A^{'}$ to $u_{1}$ and to any eigenvector $u_{j}$ orthogonal to $u_{1}$ , using the orthonormality of the eigenbasis.

Answer

The self-adjoint $A$ has an orthonormal eigenbasis $u_{1}, \dots, u_{n}$ with $A u_{i} = λ_{i} u_{i}$ and $⟨ u_{i}, u_{j} ⟩ = δ_{ij}$ . Apply $A^{'} = A - λ_{1} u_{1} u_{1}^{*}$ . On $u_{1}$ : $A^{'} u_{1} = A u_{1} - λ_{1} u_{1} (u_{1}^{*} u_{1}) = λ_{1} u_{1} - λ_{1} u_{1} = 0$ , so $u_{1}$ now has eigenvalue $0$ . On $u_{j}$ with $j \geq 2$ : since $u_{1}^{*} u_{j} = ⟨ u_{j}, u_{1} ⟩ = 0$ , $A^{'} u_{j} = A u_{j} - λ_{1} u_{1} (u_{1}^{*} u_{j}) = λ_{j} u_{j} - 0 = λ_{j} u_{j}$ , so every other eigenpair is unchanged. Thus the spectrum of $A^{'}$ is $\{0, λ_{2}, \dots, λ_{n}\}$ with the same eigenvectors. If $∣ λ_{2} ∣ > ∣ λ_{3} ∣$ and $∣ λ_{2} ∣ > 0$ , the dominant eigenvalue of $A^{'}$ is $λ_{2}$ , so power iteration on $A^{'}$ converges to $u_{2}$ and recovers $λ_{2}$ . Rubric: full credit for both cases of the action of $A^{'}$ and the conclusion that $λ_{2}$ is now dominant.

Advanced results Master

Theorem (inverse iteration converges to the eigenvalue nearest the shift; Wilkinson Ch. 9 ^{[source pending]}; Trefethen-Bau Lecture 27 ^{[source pending]}). Let $A \in M_{n} (F)$ be diagonalisable with eigenpairs $(λ_{i}, u_{i})$ , and let $μ \notin σ (A)$ be a shift such that a unique eigenvalue $λ_{J}$ minimises $∣ λ_{i} - μ ∣$ , with the runner-up distance $δ = \min_{i \neq J} ∣ λ_{i} - μ ∣ > ∣ λ_{J} - μ ∣$ . For any start vector with nonzero $u_{J}$ -component, inverse iteration $x_{k} = (A - μ I)^{- 1} x_{k - 1} /∥ \dots ∥$ satisfies

sin ∠ (x_{k}, u_{J}) = O ((\frac{∣ λ _{J} - μ ∣}{δ})^{k}) .

The mechanism is the eigenvalue transformation $λ_{i} \mapsto (λ_{i} - μ)^{- 1}$ , which makes the eigenvalue nearest $μ$ the dominant eigenvalue of $(A - μ I)^{- 1}$ ; the convergence theorem of the Intermediate tier then applies verbatim to $B = (A - μ I)^{- 1}$ . The closer the shift is to $λ_{J}$ , the smaller the ratio $∣ λ_{J} - μ ∣/ δ$ and the faster the convergence — the practical reason inverse iteration is the standard tool for refining an eigenvector once an approximate eigenvalue is known. The system $(A - μ I) y_{k} = x_{k - 1}$ becomes severely ill-conditioned as $μ \to λ_{J}$ , yet Wilkinson's analysis shows the computed $y_{k}$ points accurately in the direction of $u_{J}$ : the large error introduced by the near-singular solve lies almost entirely along $u_{J}$ itself, so after normalisation it does no harm.

Theorem (cubic convergence of Rayleigh quotient iteration for self-adjoint $A$ ; Parlett §4.6 ^{[source pending]}). Let $A = A^{*}$ have a simple eigenvalue $λ$ with unit eigenvector $u$ . There is a neighbourhood of $u$ such that, started from any $x_{0}$ in it with $\sin ∠ (x_{0}, u)$ small, Rayleigh quotient iteration produces iterates with

sin ∠ (x_{k + 1}, u) = O (sin^{3} ∠ (x_{k}, u)) .

For a general (non-normal) diagonalisable $A$ the convergence is quadratic, $sin ∠ (x_{k + 1}, u) = O (sin^{2} ∠ (x_{k}, u))$ .

Cubic convergence means the number of correct digits roughly triples per step; in practice Rayleigh quotient iteration on a symmetric matrix reaches machine precision in three or four iterations once it enters the basin of attraction. The cube — rather than the square — is bought by the second-order accuracy of the Rayleigh quotient: when $A = A^{*}$ , the shift error $∣ μ_{k} - λ ∣ = O (sin^{2} ∠ (x_{k}, u))$ is one power smaller than for a generic shift, and inverse iteration with that shift contracts the eigenvector error by an extra factor of the shift error, compounding the second-order shift accuracy with the first-order inverse-iteration contraction to give the third power.

Theorem (deflation and the full symmetric spectrum). Let $A = A^{*}$ with orthonormal eigenbasis $u_{1}, \ldots, u_{n}$ and eigenvalues ordered by modulus. Having computed $(λ_{1}, u_{1})$ , the deflated operator $A^{(1)} = A - λ_{1} u_{1} u_{1}^{*}$ has spectrum $\{0\} \cup \{λ_{2}, \ldots, λ_{n}\}$ with the same eigenvectors; iterating the deflation recovers the entire spectrum one eigenpair at a time. The deflation is the Hermitian rank-one correction that annihilates the found eigenvalue while preserving orthogonal eigendirections, and it is the conceptual seed of the block and subspace methods that compute several eigenpairs at once. Numerically, explicit deflation accumulates rounding error across eigenpairs, which is why the production algorithm of the section is instead the QR algorithm of 43.06.03, computing the full spectrum simultaneously and stably.
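Iterated deflation can be sketched as a short loop: find the dominant eigenpair by power iteration, subtract the rank-one term, and repeat. The code assumes NumPy and a real symmetric matrix with eigenvalues of distinct modulus; the helper name `dominant_eigenpair`, the fixed step count, and the test matrix are illustrative assumptions. It is meant to show the mechanism, not to serve as a stable eigensolver.

```python
import numpy as np

def dominant_eigenpair(A, x0, steps=300):
    """Power iteration returning (Rayleigh quotient, unit vector)."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(steps):
        y = A @ x
        x = y / np.linalg.norm(y)
    return x @ (A @ x), x

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 4.0]])
x0 = np.array([1.0, 0.0, 0.0])   # has a nonzero component along every eigenvector of this A

found = []
A_defl = A.copy()
for _ in range(A.shape[0]):
    lam, u = dominant_eigenpair(A_defl, x0)
    found.append(lam)
    # rank-one deflation: move the found eigenvalue to 0
    A_defl = A_defl - lam * np.outer(u, u)

print(found)
print(np.linalg.eigvalsh(A)[::-1])   # reference values, largest first
```

Each pass inherits the rounding error of the eigenvectors found before it, which is the accumulation the theorem warns about; on a well-separated $3 \times 3$ example the effect is negligible.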

Theorem (shifted power iteration and spectral relocation). For any scalar $s$ , the eigenvalues of $A - s I$ are $λ_{i} - s$ with unchanged eigenvectors. Choosing $s$ to maximise the ratio $∣ λ_{1} - s ∣ / \max_{i \geq 2} ∣ λ_{i} - s ∣$ accelerates power iteration without changing the target eigenvector; for a self-adjoint $A$ with spectrum in $[α, β]$ , smallest eigenvalue $α$ , dominant $λ_{1} = β$ , and $λ_{2}$ the largest of the remaining eigenvalues, the optimal real shift is $s = (α + λ_{2}) /2$ , centring the unwanted spectrum about zero. This spectral-relocation principle is the bridge from the single-vector methods of this unit to polynomial acceleration: replacing $A$ by a polynomial $p (A)$ chosen to amplify the wanted eigenvalue and damp the rest is the idea behind Chebyshev acceleration and, ultimately, the Krylov methods of chapter 43.07.
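The effect of the shift on the convergence factor can be tabulated without running the iteration at all, since the factor depends only on the shifted spectrum. The sketch below assumes NumPy and a made-up spectrum $\{10, 9, 1\}$ (so $β = λ_{1} = 10$ , $λ_{2} = 9$ , $α = 1$ ); the function name `rate` is hypothetical.

```python
import numpy as np

eigs = np.array([10.0, 9.0, 1.0])     # lambda_1 = beta, lambda_2, alpha (assumed values)

def rate(s):
    """Convergence factor of power iteration on A - s*I for the target lambda_1."""
    shifted = np.abs(eigs - s)
    return np.max(shifted[1:]) / shifted[0]

shifts = np.linspace(0.0, 8.0, 801)   # grid of candidate real shifts
rates = np.array([rate(s) for s in shifts])
best = shifts[np.argmin(rates)]

print(rate(0.0))                      # unshifted factor 9/10
print(best, rate(best))               # grid minimiser and its factor
```

On this spectrum the grid minimiser lands at $s = (α + λ_{2}) /2 = 5$ , where the factor drops from $0.9$ to $0.8$ : the unwanted eigenvalues $9$ and $1$ are moved to $\pm 4$ while the target moves to $5$ .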

Synthesis. The four algorithms of this unit are one idea viewed through successively sharper lenses, and the spectral expansion $A^{k} x_{0} = \sum_{i} c_{i} λ_{i}^{k} u_{i}$ is the foundational reason they work: power iteration amplifies the eigenvalue of largest modulus, inverse iteration reshapes the spectrum by $λ_{i} \mapsto (λ_{i} - μ)^{- 1}$ so that the eigenvalue nearest a chosen shift becomes dominant, and Rayleigh quotient iteration closes the loop by feeding the current Rayleigh-quotient estimate back as the shift. This is exactly where the variational theory of 01.01.14 pays off: the Rayleigh quotient is the optimal eigenvalue estimate at an approximate eigenvector, stationary there, so for a self-adjoint operator its error is second order in the eigenvector error, and that second-order shift accuracy compounds with the first-order contraction of inverse iteration to produce cubic convergence. The convergence rate $∣ λ_{2} / λ_{1} ∣$ that governs power iteration generalises to the gap ratio of any reshaped spectrum, and the same ratio appears again as the engine of the QR algorithm, which is simultaneous orthonormal power iteration.

Putting these together, the single-vector eigenvalue algorithms occupy the place between the static spectral theory — eigenvalues as roots, as critical values of the Rayleigh quotient, as min-max over subspaces — and the production eigensolvers: the central insight is that a good eigenvalue algorithm is a dynamical system whose fixed points are the eigenvectors and whose contraction rate is set by how sharply the targeted eigenvalue can be made to dominate the rest. Deflation and shifting are the two levers on that domination, and the bridge is the recognition that the QR algorithm and the Krylov methods are these same levers applied to a whole basis at once.

Full proof set Master

Proposition (power iteration, full statement and rate). Under the hypotheses of the Intermediate theorem ( $A$ diagonalisable, $∣ λ_{1} ∣ > ∣ λ_{2} ∣ \geq \dots$ , start vector $x_{0} = \sum_{i} c_{i} u_{i}$ with $c_{1} \neq 0$ ), the normalised iterates satisfy $sin ∠ (x_{k}, u_{1}) \leq C ∣ λ_{2} / λ_{1} ∣^{k}$ for a constant $C$ depending only on the $c_{i}$ and the eigenbasis conditioning.

Proof. From $A^{k} x_{0} = c_{1} λ_{1}^{k} (u_{1} + r_{k})$ with $r_{k} = \sum_{i \geq 2} (c_{i} / c_{1}) (λ_{i} / λ_{1})^{k} u_{i}$ , the unit iterate $x_{k}$ points in the direction of $u_{1} + r_{k}$ . Write $P_{1}$ for the spectral projection onto $u_{1}$ along the other eigenvectors. The sine of the angle is the distance from $x_{k}$ to the line spanned by $u_{1}$ , and $P_{1} x_{k}$ lies on that line, so

$$ \sin ∠ (x_{k}, u_{1}) \leq \frac{∥ (I - P_{1}) (u_{1} + r_{k}) ∥}{∥ u_{1} + r_{k} ∥} = \frac{∥ (I - P_{1}) r_{k} ∥}{∥ u_{1} + r_{k} ∥}, $$

since $(I - P_{1}) u_{1} = 0$ , with equality in the first step when $P_{1}$ is an orthogonal projection. The numerator is at most $∥ I - P_{1} ∥ ∥ r_{k} ∥$ , and $∥ r_{k} ∥ \leq C_{0} ∣ λ_{2} / λ_{1} ∣^{k}$ from the Intermediate bound. For $k$ large, $∥ u_{1} + r_{k} ∥ \geq 1 - ∥ r_{k} ∥ \geq 1/2$ . Hence $sin ∠ (x_{k}, u_{1}) \leq 2∥ I - P_{1} ∥ C_{0} ∣ λ_{2} / λ_{1} ∣^{k} = C ∣ λ_{2} / λ_{1} ∣^{k}$ . The projection norm $∥ I - P_{1} ∥$ is the eigenbasis conditioning, equal to $1$ when $A$ is normal and larger for non-normal $A$ . $□$

Proposition (inverse iteration rate). With $μ \notin σ (A)$ and a unique nearest eigenvalue $λ_{J}$ , inverse iteration converges to $u_{J}$ at rate $∣ λ_{J} - μ ∣/ δ$ , where $δ = \min_{i \neq J} ∣ λ_{i} - μ ∣$ .

Proof. Set $B = (A - μ I)^{- 1}$ . By the computation $B u_{i} = (λ_{i} - μ)^{- 1} u_{i}$ , the eigenvalues of $B$ are $β_{i} = (λ_{i} - μ)^{- 1}$ with the eigenvectors of $A$ . The largest in modulus is $β_{J}$ because $∣ β_{i} ∣ = 1/∣ λ_{i} - μ ∣$ is largest when $∣ λ_{i} - μ ∣$ is smallest, which the hypothesis pins to $i = J$ . The second-largest is $1/ δ$ . Inverse iteration is power iteration on $B$ , so the previous proposition gives $sin ∠ (x_{k}, u_{J}) \leq C ∣ β_{2} (B) / β_{1} (B) ∣^{k} = C (∣ λ_{J} - μ ∣/ δ)^{k}$ , where $β_{1} (B) = β_{J}$ and $β_{2} (B)$ is the runner-up. The uniqueness of the nearest eigenvalue ensures $B$ has a strictly dominant eigenvalue, the condition the power-iteration proposition requires. $□$

Proposition (cubic convergence of Rayleigh quotient iteration, self-adjoint case). Let $A = A^{*}$ have simple eigenvalue $λ$ with unit eigenvector $u$ and spectral gap $γ = \mathrm{dist}(λ, σ (A) \setminus \{λ\}) > 0$ . For $x_{k}$ with $θ_{k} = ∠ (x_{k}, u)$ small, the Rayleigh quotient iteration satisfies $\sin θ_{k + 1} = O (\sin^{3} θ_{k})$ .

Proof. Write $x_{k} = (cos θ_{k}) u + (sin θ_{k}) z_{k}$ with $z_{k} ⊥ u$ , $∥ z_{k} ∥ = 1$ . By the self-adjoint Rayleigh-quotient expansion (Exercise 7), $μ_{k} = R (x_{k}) = λ + (⟨ A z_{k}, z_{k} ⟩ - λ) sin^{2} θ_{k}$ , so the shift error is

∣ μ_{k} - λ ∣ \leq (∥ A ∥ + ∣ λ ∣) sin^{2} θ_{k} = O (sin^{2} θ_{k}) .

The next iterate is $x_{k + 1} \propto (A - μ_{k} I)^{- 1} x_{k}$ . Decompose along $u$ and $u^{⊥}$ : applying $(A - μ_{k} I)^{- 1}$ scales the $u$ -component $cos θ_{k}$ by $(λ - μ_{k})^{- 1}$ and the $z_{k}$ -component $sin θ_{k}$ by an operator of norm at most $(γ - ∣ μ_{k} - λ ∣)^{- 1} = O (1)$ , because $A$ restricted to the invariant subspace $u^{⊥}$ has spectrum at distance $\geq γ$ from $λ$ and hence $\geq γ - ∣ μ_{k} - λ ∣$ from $μ_{k}$ . Therefore

$$ \tan θ_{k + 1} = \frac{∥ \text{component} ⊥ u ∥}{∥ \text{component along } u ∥} \leq \frac{O (1) \sin θ_{k}}{∣ λ - μ_{k} ∣^{- 1} \cos θ_{k}} = O (∣ μ_{k} - λ ∣) \tan θ_{k} = O (\sin^{2} θ_{k}) \tan θ_{k} . $$

Since $tan θ_{k} = O (sin θ_{k})$ for small angles, $tan θ_{k + 1} = O (sin^{3} θ_{k})$ , and $sin θ_{k + 1} \leq tan θ_{k + 1} = O (sin^{3} θ_{k})$ . Self-adjointness enters twice: to make $u^{⊥}$ an $A$ -invariant orthogonal complement, and to make $μ_{k} - λ$ second order in $θ_{k}$ . For non-normal $A$ the shift error is only first order, $∣ μ_{k} - λ ∣ = O (sin θ_{k})$ , and the same argument yields the quadratic rate $sin θ_{k + 1} = O (sin^{2} θ_{k})$ . $□$

Proposition (deflation preserves the orthogonal spectrum, self-adjoint case). For $A = A^{*}$ with unit eigenvector $u_{1}$ and eigenvalue $λ_{1}$ , the operator $A^{'} = A - λ_{1} u_{1} u_{1}^{*}$ is self-adjoint with $A^{'} u_{1} = 0$ and $A^{'} u_{j} = λ_{j} u_{j}$ for every eigenvector $u_{j} ⊥ u_{1}$ .

Proof. The correction $λ_{1} u_{1} u_{1}^{*}$ is Hermitian ( $λ_{1}$ real, $(u_{1} u_{1}^{*})^{*} = u_{1} u_{1}^{*}$ ), so $A^{'}$ is self-adjoint. The eigenbasis of $A$ is orthonormal, so for $j \geq 2$ , $u_{1}^{*} u_{j} = ⟨ u_{j}, u_{1} ⟩ = 0$ , giving $A^{'} u_{j} = A u_{j} - λ_{1} u_{1} (u_{1}^{*} u_{j}) = λ_{j} u_{j}$ . On $u_{1}$ , $A^{'} u_{1} = λ_{1} u_{1} - λ_{1} u_{1} (u_{1}^{*} u_{1}) = λ_{1} u_{1} - λ_{1} u_{1} = 0$ since $∥ u_{1} ∥ = 1$ . The spectrum is therefore $\{0, λ_{2}, \dots, λ_{n}\}$ with eigenvectors unchanged. $□$

Connections Master

The single-vector eigenvalue algorithms are the dynamical counterpart of the variational eigenvalue theory of 01.01.14: where the Rayleigh quotient there is characterised statically as the optimal eigenvalue estimate and the critical values of $R$ , here it is the adaptive shift that drives Rayleigh quotient iteration, and its second-order stationarity at an eigenvector is precisely what upgrades inverse iteration from linear to cubic convergence in the self-adjoint case.

Power iteration is the convergence engine of the QR algorithm of 43.06.03: the unshifted QR iteration is simultaneous power iteration applied to a full orthonormal basis rather than a single vector, and the Wilkinson shift that gives the QR algorithm its cubic local convergence on symmetric tridiagonals is the Rayleigh-quotient shift of this unit transplanted into the orthogonal-similarity setting.

The inverse-iteration step $(A - μ I) y_{k} = x_{k - 1}$ is a linear solve with a fixed, reused matrix, so its efficiency rests on the direct factorisations of the chapter — Gaussian elimination and the LU factorisation are computed once and applied at every iteration; the preliminary reduction to Hessenberg or tridiagonal form of 43.06.02 is what makes each of those solves cheap, which is why the practical pipeline reduces first and iterates second.

The spectral-relocation idea (replacing $A$ by $A - s I$ or by a polynomial $p (A)$ to reshape which eigenvalue dominates) is the seed of the Krylov-subspace methods, where the iterate lies in $\mathrm{span} \{b, A b, \dots, A^{m - 1} b\}$ and a polynomial in $A$ is chosen to amplify the wanted part of the spectrum; the conjugate gradient and GMRES convergence bounds rest on exactly the polynomial-acceleration principle introduced here in its single-vector form.

Historical & philosophical context Master

The power method entered numerical practice through the 1929 paper of von Mises and Pollaczek-Geiringer, who set out the Potenzmethode as a systematic iterative procedure for the dominant eigenvalue of the matrices arising in elasticity and structural mechanics, where the characteristic polynomial was intractable to solve directly ^{[von Mises 1929]}. The idea of repeatedly applying a linear operator to expose its dominant mode was older in the analytic theory of integral equations, but von Mises gave it the matrix-iteration form that the digital era inherited.

The shifted inverse variant is due to Helmut Wielandt, who in a 1944 Göttingen aerodynamics report introduced gebrochene Iteration — fractional iteration — applying $(A - μ I)^{- 1}$ to target an eigenvalue near a chosen point $μ$ , the technique now called inverse iteration or Wielandt iteration ^{[Wielandt 1944]}. Wilkinson, in The Algebraic Eigenvalue Problem (1965), supplied the rounding-error analysis that resolved the apparent paradox of the method: the linear system $(A - μ I) y = x$ becomes arbitrarily ill-conditioned as the shift approaches an eigenvalue, yet the computed solution is an excellent eigenvector, because the large error is aligned with the very eigenvector being sought ^{[Wilkinson 1965]}. The Rayleigh quotient as a self-correcting shift, and the resulting cubic convergence for symmetric matrices, was analysed in detail by Parlett, whose The Symmetric Eigenvalue Problem established the cubic local rate and its dependence on the second-order accuracy of the Rayleigh quotient ^{[Parlett 1998]}. These single-vector methods were the state of the art for the dominant eigenpair until the QR algorithm of Francis and Kublanovskaya (1961) made the full spectrum computable at comparable cost.

Bibliography Master

@article{vonMises1929,
  author  = {von Mises, Richard and Pollaczek-Geiringer, Hilda},
  title   = {Praktische Verfahren der Gleichungsaufl{\"o}sung},
  journal = {Zeitschrift f{\"u}r Angewandte Mathematik und Mechanik},
  volume  = {9},
  year    = {1929},
  pages   = {58--77, 152--164}
}

@techreport{Wielandt1944,
  author      = {Wielandt, Helmut},
  title       = {Beitr{\"a}ge zur mathematischen Behandlung komplexer Eigenwertprobleme},
  institution = {Aerodynamische Versuchsanstalt G{\"o}ttingen},
  number      = {Bericht B 44/J/37},
  year        = {1944}
}

@book{Wilkinson1965,
  author    = {Wilkinson, James H.},
  title     = {The Algebraic Eigenvalue Problem},
  publisher = {Oxford University Press},
  address   = {Oxford},
  year      = {1965}
}

@book{Parlett1998,
  author    = {Parlett, Beresford N.},
  title     = {The Symmetric Eigenvalue Problem},
  publisher = {SIAM},
  series    = {Classics in Applied Mathematics},
  year      = {1998}
}

@book{TrefethenBau1997,
  author    = {Trefethen, Lloyd N. and Bau, David},
  title     = {Numerical Linear Algebra},
  publisher = {SIAM},
  address   = {Philadelphia},
  year      = {1997}
}

@book{GolubVanLoan2013,
  author    = {Golub, Gene H. and Van Loan, Charles F.},
  title     = {Matrix Computations},
  edition   = {4th},
  publisher = {Johns Hopkins University Press},
  address   = {Baltimore},
  year      = {2013}
}

@article{Francis1961,
  author  = {Francis, John G. F.},
  title   = {The QR Transformation, I},
  journal = {The Computer Journal},
  volume  = {4},
  number  = {3},
  year    = {1961},
  pages   = {265--271}
}

Prerequisites

01.01.08
01.01.14

Tier anchors

beginner: Repeatedly multiplying by a matrix and watching the result line up with the dominant direction — Strang *Introduction to Linear Algebra* Ch. 6 (powers of a matrix); 3Blue1Brown *Essence of Linear Algebra* Ch. 14 (eigenvectors as the directions the matrix stretches most)
intermediate: Trefethen-Bau *Numerical Linear Algebra* Lectures 27–28 (Rayleigh quotient, power iteration, inverse iteration, Rayleigh quotient iteration); Golub-Van Loan *Matrix Computations* §8.2 (the power method and its variants)
master: Trefethen-Bau *Numerical Linear Algebra* Lectures 27–28; Golub-Van Loan *Matrix Computations* (4th ed.) §8.2; Parlett *The Symmetric Eigenvalue Problem* Ch. 4 (power methods) and §4.6–§4.7 (Rayleigh quotient iteration, cubic convergence); Wilkinson *The Algebraic Eigenvalue Problem* Ch. 9 (inverse iteration)

References

Shilov *Linear Algebra* · Ch. 10 self-adjoint operators, the Rayleigh quotient and the extremal characterisation that the eigenvalue algorithms of this unit approximate
Trefethen, L. N. & Bau, D. — Numerical Linear Algebra (SIAM, 1997) · Lectures 27–28 — the Rayleigh quotient, power iteration with rate $|\lambda_2/\lambda_1|$, inverse iteration with a shift, and Rayleigh quotient iteration with cubic convergence for symmetric matrices
Golub, G. H. & Van Loan, C. F. — Matrix Computations (4th ed., Johns Hopkins, 2013) · §8.2 — the power method, inverse iteration, Rayleigh quotient iteration, and deflation
Parlett, B. N. — The Symmetric Eigenvalue Problem (SIAM Classics, 1998) · Ch. 4 and §4.6–§4.7 — the convergence theory of power methods and the cubic convergence of Rayleigh quotient iteration for symmetric matrices
Wilkinson, J. H. — The Algebraic Eigenvalue Problem (Oxford, 1965) · Ch. 9 — inverse iteration with a shift and the accuracy of the computed eigenvector despite the near-singular solve
von Mises, R. & Pollaczek-Geiringer, H. — Praktische Verfahren der Gleichungsauflösung · Zeitschrift für Angewandte Mathematik und Mechanik 9 (1929), 58–77 and 152–164 — the systematic introduction of the power method (Potenzmethode) for the numerical eigenvalue problem
Wielandt, H. — Beiträge zur mathematischen Behandlung komplexer Eigenwertprobleme · Bericht B 44/J/37, Aerodynamische Versuchsanstalt Göttingen (1944) — fractional (inverse) iteration with a shift for the targeted computation of eigenvalues near a chosen point

Estimated time

beginner: 18m
intermediate: 45m
master: 90m