Power iteration, inverse iteration, and Rayleigh quotient iteration
Anchor (Master): Trefethen-Bau *Numerical Linear Algebra* Lectures 27–28; Golub-Van Loan *Matrix Computations* (4th ed.) §8.2; Parlett *The Symmetric Eigenvalue Problem* Ch. 4 (power methods) and §4.6–§4.7 (Rayleigh quotient iteration, cubic convergence); Wilkinson *The Algebraic Eigenvalue Problem* Ch. 9 (inverse iteration)
Intuition Beginner
A matrix has a set of special directions, the eigenvectors, and each one comes with a number, its eigenvalue, that says how much the matrix stretches that direction. The largest eigenvalue belongs to the direction the matrix stretches the most. A natural question is how to find that dominant direction and its stretch factor without solving for every eigenvalue at once.
Power iteration answers with a one-line idea: pick any starting vector, multiply it by the matrix, and repeat. Each multiplication amplifies the dominant direction more than any other, because that direction is stretched the most. After enough rounds the vector points almost exactly along the dominant eigenvector, and you rescale it each step so it does not grow without bound. The leftover directions fade away at a rate set by how much the largest stretch factor beats the second largest.
Inverse iteration flips the trick to target any eigenvalue you like. You pick a guess near the eigenvalue you want, called a shift, and apply the inverse of the matrix minus that guess. This new matrix stretches most along the eigenvector whose eigenvalue sits closest to your guess, so the same repeated-multiplication idea now homes in on that one. Rayleigh quotient iteration takes the next step: after each round it uses the current vector to form a sharper guess, and for symmetric matrices this feedback makes the accuracy improve at a breathtaking pace.
Visual Beginner
The picture shows a starting vector being multiplied by a matrix again and again. At first the vector points in some arbitrary direction. After one multiplication it tilts toward the dominant eigendirection; after two it is closer still; after a handful of steps it sits almost exactly along that direction. The off-direction part shrinks by a constant factor every step, so the convergence looks like a vector swinging steadily into place.
Two facts are visible. The vector turns toward the dominant direction and the angle to it shrinks by the same ratio every step, the ratio of the second stretch factor to the first. If those two factors were close together the curve would drop slowly, and if the first dwarfs the second the curve would plunge fast.
Worked example Beginner
Take the symmetric matrix
whose eigenvalues are (along the direction ) and (along ). Start power iteration from the vector , which is not aligned with either eigendirection.
Step 1. Multiply: . Rescale by dividing by the larger entry to keep numbers tidy: .
Step 2. Multiply: . Rescale by : .
Step 3. Multiply: . Rescale by : .
Step 4. Multiply: . Rescale by : .
The second coordinate is climbing toward , which is the dominant eigendirection . The gap to shrinks roughly by half each step, matching the ratio of the two eigenvalues.
Step 5. Estimate the eigenvalue. Take the Rayleigh quotient of : compute , then the ratio .
What this tells us: a few rounds of plain matrix multiplication drove an arbitrary vector almost onto the dominant eigenvector, and the Rayleigh quotient turned that vector into the estimate for the dominant eigenvalue . No characteristic polynomial was solved; the matrix did the work by stretching.
Check your understanding Beginner
Formal definition Intermediate+
Let with , and suppose is diagonalisable with eigenvalues and a basis of eigenvectors , where and in the Euclidean norm of 01.01.08. Order the eigenvalues by modulus, . The eigenvalue is dominant when ; in that strict case the dominant eigenspace is one-dimensional and is determined up to a scalar.
Definition (power iteration). Given a start vector with , power iteration generates the sequence
together with the eigenvalue estimate by the Rayleigh quotient , the optimal scalar estimate of the eigenvalue associated with the direction , in the sense of 01.01.14. Each step is one matrix-vector product followed by a normalisation.
Definition (inverse iteration with a shift). Fix a shift that is not an eigenvalue, so is invertible. Inverse iteration generates
that is, power iteration applied to the matrix . The vector is computed by solving a linear system with the fixed matrix — a single factorisation reused across iterations — rather than by forming the inverse explicitly.
Definition (Rayleigh quotient iteration). Replace the fixed shift by the current Rayleigh quotient at every step. Starting from with and ,
The shift now varies, so each step solves a system with a different matrix, but the adaptively improving shift produces convergence far faster than a fixed shift.
Notation: is the acute angle between the lines spanned by nonzero and , with the standard measure of misalignment; is the spectrum; is the Rayleigh quotient. Throughout, "convergence of to " means , since an eigenvector is determined only up to a scalar.
Counterexamples to common slips
- Power iteration needs a strictly dominant eigenvalue. For the two eigenvalues have equal modulus, and a generic start vector oscillates between two directions without settling; the ratio gives no contraction.
- The method fails if the start vector has zero component along . For and , every iterate stays and converges to the subdominant eigenvector. In exact arithmetic this is a measure-zero accident; in floating point, rounding usually reintroduces a -component and the method recovers.
- A complex dominant pair of a real matrix has equal modulus, so real power iteration does not converge to a single vector; the dominant invariant subspace is two-dimensional and requires a block method.
Key theorem with proof Intermediate+
Theorem (linear convergence of power iteration; Trefethen-Bau Lecture 27 [source pending]; Golub-Van Loan §8.2 [source pending]). Let be diagonalisable with eigenpairs , , and a strictly dominant eigenvalue, . Write the start vector as and assume . Then the power-iteration sequence satisfies
and the Rayleigh quotient converges to . If in addition is self-adjoint, the eigenvalue error sharpens to .
Proof. Expand in the eigenbasis: . Factor out the dominant term,
where the remainder has norm bounded by
because for every . Normalising kills the scalar , so is a unit vector in the direction of . The sine of the angle to is the norm of the component of orthogonal to , which is at most ; since , this is .
For the eigenvalue estimate, the Rayleigh quotient is continuous and , so as in direction. When , the eigenbasis is orthonormal, so to leading order and is a quadratic form in the small quantity : writing with , one computes , which is . This is the same second-order stationarity that makes the Rayleigh quotient the optimal eigenvalue estimate at an approximate eigenvector.
Bridge. This convergence analysis builds toward the inverse-iteration and Rayleigh-quotient-iteration theorems of the Advanced results, where the very same eigenbasis expansion is applied to , whose eigenvalues are dominated by the one nearest the shift; the ratio that governs the rate here appears again in 43.06.03 as the convergence engine of the QR algorithm, which is simultaneous power iteration on an orthonormal basis. The foundational reason the method works is the spectral expansion : raising to a power raises each eigenvalue to that power, and the largest in modulus eventually swamps the rest. This is exactly the dynamical content of the dominant eigenvalue, and the quadratic sharpening in the self-adjoint case generalises the stationarity of the Rayleigh quotient established in 01.01.14. Putting these together, plain matrix multiplication is a convergence process whose speed is read off the spectral gap, and the bridge is the observation that a faster method must reshape the spectrum so that the targeted eigenvalue dominates more sharply — precisely what shifting and inverting accomplish.
Exercises Intermediate+
Advanced results Master
Theorem (inverse iteration converges to the eigenvalue nearest the shift; Wilkinson Ch. 9 [source pending]; Trefethen-Bau Lecture 27 [source pending]). Let be diagonalisable with eigenpairs , and let be a shift such that a unique eigenvalue minimises , with the runner-up distance . For any start vector with nonzero -component, inverse iteration satisfies
The mechanism is the eigenvalue transformation , which makes the eigenvalue nearest the dominant eigenvalue of ; the convergence theorem of the Intermediate tier then applies verbatim to . The closer the shift is to , the smaller the ratio and the faster the convergence — the practical reason inverse iteration is the standard tool for refining an eigenvector once an approximate eigenvalue is known. The system becomes severely ill-conditioned as , yet Wilkinson's analysis shows the computed points accurately in the direction of : the large error introduced by the near-singular solve lies almost entirely along itself, so after normalisation it does no harm.
Theorem (cubic convergence of Rayleigh quotient iteration for self-adjoint ; Parlett §4.6 [source pending]). Let $A = A^\lambdauux_0\sin\angle(x_0, u)$ small, Rayleigh quotient iteration produces iterates with*
For a general (non-normal) diagonalisable the convergence is quadratic, .
Cubic convergence means the number of correct digits roughly triples per step; in practice Rayleigh quotient iteration on a symmetric matrix reaches machine precision in three or four iterations once it enters the basin of attraction. The cube — rather than the square — is bought by the second-order accuracy of the Rayleigh quotient: when , the shift error is one power smaller than for a generic shift, and inverse iteration with that shift contracts the eigenvector error by an extra factor of the shift error, compounding the second-order shift accuracy with the first-order inverse-iteration contraction to give the third power.
Theorem (deflation and the full symmetric spectrum). Let $A = A^u_1, \ldots, u_n(\lambda_1, u_1)A^{(1)} = A - \lambda_1 u_1 u_1^{0} \cup {\lambda_2, \ldots, \lambda_n}$ with the same eigenvectors; iterating the deflation recovers the entire spectrum one eigenpair at a time. The deflation is the Hermitian rank-one correction that annihilates the found eigenvalue while preserving orthogonal eigendirections, and it is the conceptual seed of the block and subspace methods that compute several eigenpairs at once. Numerically, explicit deflation accumulates rounding error across eigenpairs, which is why the production algorithm of the section is instead the QR algorithm of 43.06.03, computing the full spectrum simultaneously and stably.
Theorem (shifted power iteration and spectral relocation). For any scalar , the eigenvalues of are with unchanged eigenvectors. Choosing to maximise the gap accelerates power iteration without changing the target eigenvector; for a self-adjoint with spectrum in and dominant , the optimal real shift is , centring the unwanted spectrum about zero. This spectral-relocation principle is the bridge from the single-vector methods of this unit to polynomial acceleration: replacing by a polynomial chosen to amplify the wanted eigenvalue and damp the rest is the idea behind Chebyshev acceleration and, ultimately, the Krylov methods of chapter .
Synthesis. The four algorithms of this unit are one idea viewed through successively sharper lenses, and the spectral expansion is the foundational reason they work: power iteration amplifies the eigenvalue of largest modulus, inverse iteration reshapes the spectrum by so that the eigenvalue nearest a chosen shift becomes dominant, and Rayleigh quotient iteration closes the loop by feeding the current Rayleigh-quotient estimate back as the shift. This is exactly where the variational theory of 01.01.14 pays off: the Rayleigh quotient is the optimal eigenvalue estimate at an approximate eigenvector, stationary there, so for a self-adjoint operator its error is second order in the eigenvector error, and that second-order shift accuracy compounds with the first-order contraction of inverse iteration to produce cubic convergence. The convergence rate that governs power iteration generalises to the gap ratio of any reshaped spectrum, and the same ratio appears again as the engine of the QR algorithm, which is simultaneous orthonormal power iteration.
Putting these together, the single-vector eigenvalue algorithms occupy the place between the static spectral theory — eigenvalues as roots, as critical values of the Rayleigh quotient, as min-max over subspaces — and the production eigensolvers: the central insight is that a good eigenvalue algorithm is a dynamical system whose fixed points are the eigenvectors and whose contraction rate is set by how sharply the targeted eigenvalue can be made to dominate the rest. Deflation and shifting are the two levers on that domination, and the bridge is the recognition that the QR algorithm and the Krylov methods are these same levers applied to a whole basis at once.
Full proof set Master
Proposition (power iteration, full statement and rate). Under the hypotheses of the Intermediate theorem — diagonalisable, , start vector with — the normalised iterates satisfy for a constant depending only on the and the eigenbasis conditioning.
Proof. From with , decompose any unit iterate as its -component plus an orthogonal remainder. Writing for the spectral projection onto along the other eigenvectors, is the unit vector in the direction of , so
since . The numerator is at most , and from the Intermediate bound. For large, . Hence . The projection norm is the eigenbasis conditioning, equal to when is normal and larger for non-normal .
Proposition (inverse iteration rate). With and a unique nearest eigenvalue , inverse iteration converges to at rate , where .
Proof. Set . By the computation , the eigenvalues of are with the eigenvectors of . The largest in modulus is because is largest when is smallest, which the hypothesis pins to . The second-largest is . Inverse iteration is power iteration on , so the previous proposition gives , where and is the runner-up. The uniqueness of the nearest eigenvalue ensures has a strictly dominant eigenvalue, the condition the power-iteration proposition requires.
Proposition (cubic convergence of Rayleigh quotient iteration, self-adjoint case). Let $A = A^\lambdau\gamma = \mathrm{dist}(\lambda, \sigma(A) \setminus {\lambda}) > 0x_k\theta_k = \angle(x_k, u)\sin\theta_{k+1} = O(\sin^3\theta_k)$.*
Proof. Write with , . By the self-adjoint Rayleigh-quotient expansion (Exercise 7), , so the shift error is
The next iterate is . Decompose along and : applying scales the -component by and the -component by an operator of norm at most , because restricted to the invariant subspace has spectrum at distance from and hence from . Therefore
Since for small angles, , and . Self-adjointness enters twice: to make an -invariant orthogonal complement, and to make second order in . For non-normal the shift error is only first order, , and the same argument yields the quadratic rate .
Proposition (deflation preserves the orthogonal spectrum, self-adjoint case). For $A = A^u_1\lambda_1A' = A - \lambda_1 u_1 u_1^A' u_1 = 0A' u_j = \lambda_j u_ju_j \perp u_1$.
Proof. The correction is Hermitian ( real, ), so is self-adjoint. The eigenbasis of is orthonormal, so for , , giving . On , since . The spectrum is therefore with eigenvectors unchanged.
Connections Master
The single-vector eigenvalue algorithms are the dynamical counterpart of the variational eigenvalue theory of 01.01.14: where the Rayleigh quotient there is characterised statically as the optimal eigenvalue estimate and the critical values of , here it is the adaptive shift that drives Rayleigh quotient iteration, and its second-order stationarity at an eigenvector is precisely what upgrades inverse iteration from linear to cubic convergence in the self-adjoint case.
Power iteration is the convergence engine of the QR algorithm of 43.06.03: the unshifted QR iteration is simultaneous power iteration applied to a full orthonormal basis rather than a single vector, and the Wilkinson shift that gives the QR algorithm its cubic local convergence on symmetric tridiagonals is the Rayleigh-quotient shift of this unit transplanted into the orthogonal-similarity setting.
The inverse-iteration step is a linear solve with a fixed, reused matrix, so its efficiency rests on the direct factorisations of the chapter — Gaussian elimination and the LU factorisation are computed once and applied at every iteration; the preliminary reduction to Hessenberg or tridiagonal form of 43.06.02 is what makes each of those solves cheap, which is why the practical pipeline reduces first and iterates second.
The spectral-relocation idea — replacing by or by a polynomial to reshape which eigenvalue dominates — is the seed of the Krylov-subspace methods, where the iterate lies in and a polynomial in is chosen to amplify the wanted part of the spectrum; the conjugate gradient and GMRES convergence bounds rest on exactly the polynomial-acceleration principle introduced here in its single-vector form.
Historical & philosophical context Master
The power method entered numerical practice through the 1929 paper of von Mises and Pollaczek-Geiringer, who set out the Potenzmethode as a systematic iterative procedure for the dominant eigenvalue of the matrices arising in elasticity and structural mechanics, where the characteristic polynomial was intractable to solve directly [von Mises 1929]. The idea of repeatedly applying a linear operator to expose its dominant mode was older in the analytic theory of integral equations, but von Mises gave it the matrix-iteration form that the digital era inherited.
The shifted inverse variant is due to Helmut Wielandt, who in a 1944 Göttingen aerodynamics report introduced gebrochene Iteration — fractional iteration — applying to target an eigenvalue near a chosen point , the technique now called inverse iteration or Wielandt iteration [Wielandt 1944]. Wilkinson, in The Algebraic Eigenvalue Problem (1965), supplied the rounding-error analysis that resolved the apparent paradox of the method: the linear system becomes arbitrarily ill-conditioned as the shift approaches an eigenvalue, yet the computed solution is an excellent eigenvector, because the large error is aligned with the very eigenvector being sought [Wilkinson 1965]. The Rayleigh quotient as a self-correcting shift, and the resulting cubic convergence for symmetric matrices, was analysed in detail by Parlett, whose The Symmetric Eigenvalue Problem established the cubic local rate and its dependence on the second-order accuracy of the Rayleigh quotient [Parlett 1998]. These single-vector methods were the state of the art for the dominant eigenpair until the QR algorithm of Francis and Kublanovskaya (1961) made the full spectrum computable at comparable cost.
Bibliography Master
@article{vonMises1929,
author = {von Mises, Richard and Pollaczek-Geiringer, Hilda},
title = {Praktische Verfahren der Gleichungsaufl{\"o}sung},
journal = {Zeitschrift f{\"u}r Angewandte Mathematik und Mechanik},
volume = {9},
year = {1929},
pages = {58--77, 152--164}
}
@techreport{Wielandt1944,
author = {Wielandt, Helmut},
title = {Beitr{\"a}ge zur mathematischen Behandlung komplexer Eigenwertprobleme},
institution = {Aerodynamische Versuchsanstalt G{\"o}ttingen},
number = {Bericht B 44/J/37},
year = {1944}
}
@book{Wilkinson1965,
author = {Wilkinson, James H.},
title = {The Algebraic Eigenvalue Problem},
publisher = {Oxford University Press},
address = {Oxford},
year = {1965}
}
@book{Parlett1998,
author = {Parlett, Beresford N.},
title = {The Symmetric Eigenvalue Problem},
publisher = {SIAM},
series = {Classics in Applied Mathematics},
year = {1998}
}
@book{TrefethenBau1997,
author = {Trefethen, Lloyd N. and Bau, David},
title = {Numerical Linear Algebra},
publisher = {SIAM},
address = {Philadelphia},
year = {1997}
}
@book{GolubVanLoan2013,
author = {Golub, Gene H. and Van Loan, Charles F.},
title = {Matrix Computations},
edition = {4th},
publisher = {Johns Hopkins University Press},
address = {Baltimore},
year = {2013}
}
@article{Francis1961,
author = {Francis, John G. F.},
title = {The QR Transformation, I},
journal = {The Computer Journal},
volume = {4},
number = {3},
year = {1961},
pages = {265--271}
}