43.07.02 · numerical-analysis / 07-iterative-krylov-methods

Arnoldi and Lanczos iterations

shipped3 tiersLean: none

Anchor (Master): Saad 2011 *Numerical Methods for Large Eigenvalue Problems* 2e (SIAM) Ch. 6 (the Arnoldi method, Ritz values, and convergence) and Ch. 4 (the Lanczos algorithm in exact and finite precision); Parlett 1998 *The Symmetric Eigenvalue Problem* (SIAM Classics) Ch. 13 (the Lanczos algorithm, Paige's loss-of-orthogonality theory, and reorthogonalisation); Trefethen-Bau 1997 *Numerical Linear Algebra* (SIAM) Lectures 33-36

Intuition Beginner

Suppose you have a matrix so enormous that you can never write it out — millions of rows — but you are allowed one cheap operation: hand it a vector, and it hands back the matrix times that vector. From this single power you want to learn the matrix's most important features: its largest eigenvalues, or the solution of a system it defines. The trick of this unit is to build, out of repeated uses of that one operation, a tiny matrix that behaves like the giant one on the part of space that matters.

Start with one vector. Multiply by the matrix to get a second; multiply again for a third; and so on. These vectors, the starting one and its successive products, span a growing space that captures more and more of how the matrix acts. That growing space is the Krylov space, named for the engineer who first used it.

There is a problem: the raw products quickly pile up almost on top of one another, all leaning toward the dominant direction, so they make a numerically useless basis. The fix is to tidy them as you go. Each time you form a new product, you strip off the parts already pointing along the earlier directions, leaving only the genuinely new direction, and you rescale it to unit length. This tidying is the Arnoldi iteration, and the leftover bookkeeping — how much of each old direction you stripped off — assembles into a small, almost-triangular matrix.

When the big matrix is symmetric, the tidying simplifies dramatically: almost every stripping coefficient turns out to be zero, so each new vector only needs to be cleaned against the previous two. That short three-term recipe is the Lanczos iteration, and it is one of the most efficient ideas in all of computation.

Visual Beginner

The picture is a single starting vector being fed through the matrix again and again, with each fresh output tidied against everything built so far, so that a neat orthogonal frame grows one column at a time alongside a small bookkeeping matrix.

Read the table top to bottom. Each row adds one new direction. The left column is the raw product the matrix hands back; the middle column is that product after the old directions are stripped away and it is rescaled to length one; the right column notes the shape of the small matrix recording the strippings.

step raw product tidied new direction small matrix so far
1 start vector (just rescale )
2 (strip off ) Hessenberg
3 (strip off ) Hessenberg
4 (strip off ) Hessenberg

The takeaway: from one vector and the ability to multiply by the matrix, you grow an orthogonal frame and a small matrix. The eigenvalues of that small matrix are excellent estimates of the big matrix's extreme eigenvalues, and the symmetric case is cheaper because the frame only needs short-term memory.

Worked example Beginner

Take the symmetric matrix and starting vector $$ A = \begin{pmatrix} 2 & 1 & 0 \ 1 & 2 & 1 \ 0 & 1 & 2 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \ 0 \ 0 \end{pmatrix}. $$ We run two steps of the symmetric (Lanczos) tidying by hand.

Step 1. The first direction is just rescaled. Since already has length one, .

Step 2. Multiply: . The amount of this lying along is the dot product ; call it . Strip it off: . This leftover has length , so and the new direction is .

Step 3. Multiply: . The amount along is . The amount along the previous is exactly the we already recorded. Strip both off: , of length , so .

The small matrix assembled from these numbers is the tridiagonal $$ T = \begin{pmatrix} 2 & 1 \ 1 & 2 \end{pmatrix}, $$ using on the diagonal and on the off-diagonals. Its eigenvalues are and .

What this tells us: after only two steps the little matrix already gives two of the three exact eigenvalues of (the full spectrum is , so and bracket the true extremes closely). Each step needed only one matrix-times-vector and a strip against the previous two directions — the short-memory pattern that makes the symmetric case so cheap.

Check your understanding Beginner

Formal definition Intermediate+

Let with , and let be nonzero. For the -th Krylov subspace of generated by is $$ \mathcal{K}_m(A, b) = \operatorname{span}{b, Ab, A^2 b, \dots, A^{m-1}b} = {, p(A),b : p \text{ a polynomial of degree} < m ,}. $$ The two descriptions agree because . The dimensions are nondecreasing, , and they strictly increase by one at each step until the first index at which ; at that point is an -invariant subspace and the sequence stalls. The least such is the grade of with respect to [Trefethen, L. N. & Bau, D. — Numerical Linear Algebra (SIAM, 1997)].

The Arnoldi iteration. The Arnoldi process orthonormalises the Krylov sequence by modified Gram-Schmidt, in the sense of 01.01.09. Set . For form , orthogonalise it against the existing basis by recording $$ h_{ij} = q_i^* w \quad (1 \le i \le j), \qquad w \leftarrow w - \sum_{i=1}^{j} h_{ij}, q_i, $$ then set and, if , . Collecting the orthonormal vectors as the columns of and the coefficients into an upper-Hessenberg matrix , the construction satisfies the Arnoldi relation $$ A Q_m = Q_{m+1}\tilde H_m = Q_m H_m + h_{m+1,m}, q_{m+1} e_m^, $$ where $H_m = Q_m^ A Q_mm \times m\tilde H_mQ_m\mathcal{K}_m(A,b)H_mA\mathcal{K}mh{m+1,m} = 0\mathcal{K}_mAH_mA$.

The Lanczos iteration. When is Hermitian, is both upper-Hessenberg and Hermitian, hence tridiagonal; write it with real diagonal and positive off-diagonal . The orthogonalisation then collapses to a three-term recurrence $$ \beta_{j+1}, q_{j+1} = A q_j - \alpha_j, q_j - \beta_j, q_{j-1}, \qquad \alpha_j = q_j^* A q_j, \quad \beta_{j+1} = |A q_j - \alpha_j q_j - \beta_j q_{j-1}|, $$ with , . Only the two previous vectors enter, so the Lanczos process needs storage and one matrix-vector product per step, against the storage and growing orthogonalisation cost of Arnoldi.

Ritz values and Ritz vectors. Let be an eigenpair of (or ): . The scalar is a Ritz value and a Ritz vector; together they are the Rayleigh-Ritz approximation of an eigenpair of from the subspace . The Ritz residual measures their quality: $$ |A y_i - \theta_i y_i| = |h_{m+1,m}|,|e_m^* s_i|, $$ so a Ritz pair is accurate precisely when the last entry of its eigenvector is small or the subdiagonal is small.

Counterexamples to common slips

  • The Krylov dimension need not reach . If lies in a proper -invariant subspace — for instance if is a single eigenvector — the sequence stalls early at the grade of , and the Arnoldi process breaks down (a lucky breakdown) having captured an invariant subspace exactly. Breakdown is a feature, not a failure: it signals exact invariance.
  • Arnoldi's is Hessenberg, not triangular. The projection has one nonzero subdiagonal because can have a component along the genuinely new direction but none beyond it; expecting a triangular projection confuses the Krylov structure with a Schur form.
  • The three-term recurrence is exact only for Hermitian . For a non-Hermitian matrix the coefficients with do not vanish, so a short recurrence does not hold; the two-sided (nonsymmetric) Lanczos process recovers a three-term recurrence only by sacrificing orthogonality, building bi-orthogonal bases instead.
  • Ritz values converge to the extremes first, not the interior. The Rayleigh-Ritz approximation captures the outlying eigenvalues of — the largest and smallest — long before the clustered interior ones; reading interior eigenvalues off a small is unreliable without a spectral transformation.

Key theorem with proof Intermediate+

The signature result is the Arnoldi relation itself: that modified Gram-Schmidt on the Krylov sequence produces an orthonormal basis together with a Hessenberg projection, and that this projection becomes tridiagonal exactly when is Hermitian.

Theorem (the Arnoldi factorisation and its Lanczos specialisation). Let , , and run the Arnoldi process. As long as no breakdown has occurred ( for ), the vectors are orthonormal and span , and $$ A Q_m = Q_m H_m + h_{m+1,m}, q_{m+1}, e_m^, \qquad H_m = Q_m^ A Q_m, $$ with unreduced upper-Hessenberg. If $A = A^H_m = T_m\beta_{j+1}q_{j+1} = Aq_j - \alpha_j q_j - \beta_j q_{j-1}$* [Trefethen, L. N. & Bau, D. — Numerical Linear Algebra (SIAM, 1997)].

Proof. Orthonormality and the span claim go together by induction on . For , is a unit vector spanning . Assume are orthonormal and span . The new vector is with , so by construction for each ; thus , and (when ) is a unit vector orthogonal to all earlier ones. For the span, , and since the corrected lies in ; conversely and , so . Hence are orthonormal and span .

The factorisation is the matrix form of the recurrence. The defining identity (rearranged from and ) says column of equals column of . Stacking columns gives . Splitting off the last row of , whose only nonzero entry is in column , yields . Left-multiplying by and using , gives . The matrix is upper-Hessenberg because whenever : indeed , so it is orthogonal to for . It is unreduced before breakdown because each .

When , satisfies , so is Hermitian as well as upper-Hessenberg; a Hermitian upper-Hessenberg matrix has for and, by conjugate symmetry, for , leaving only the diagonal and the two adjacent stripes — tridiagonal. Then for too, so the orthogonalisation sum retains only the and terms, which is the three-term recurrence with and .

Bridge. This factorisation is the foundational reason a single matrix-vector multiply suffices to probe a giant operator: it builds toward the Krylov solvers of the chapter by exposing, at step , an orthonormal basis and a small projected matrix whose eigenvalues are the Ritz approximations of the Advanced results. This is exactly the dense Hessenberg reduction of 43.06.02 read iteratively — there a finite sequence of Householder similarities produces Hessenberg on the whole space, here a growing sequence of Gram-Schmidt steps produces Hessenberg on a Krylov slice — and the tridiagonal Lanczos case generalises the symmetric tridiagonalisation to the Krylov setting with the same conjugate-symmetry collapse. The relation appears again in 43.07.03, where GMRES minimises the residual over by solving a small least-squares problem on , and in 43.07.04, where conjugate gradients is exactly Lanczos for symmetric positive-definite with the tridiagonal factored on the fly. Putting these together, the central insight is that the Arnoldi factorisation compresses the action of on a Krylov subspace into a Hessenberg matrix one tractable size smaller at each step, and the bridge is that every Krylov algorithm — for eigenvalues, for linear systems, symmetric or not — is a way of reading information off this one compression.

Exercises Intermediate+

Advanced results Master

Theorem 1 (exactness at the grade; finite termination). Let be the grade of with respect to — the least with . The Arnoldi process breaks down at step , the subspace is -invariant, and the eigenvalues of are exact eigenvalues of . The Krylov dimension increases strictly by one until the grade and then stalls, because is the first power expressible in lower ones; at that step the orthogonal remainder vanishes, , and the Arnoldi relation closes to . Any eigenpair lifts to , so the Ritz pairs become exact. The grade satisfies , the degree of the minimal polynomial of , with equality for a cyclic (non-derogatory) and a generic ; this is the exact-arithmetic sense in which Krylov methods terminate in at most steps [Saad, Y. — Numerical Methods for Large Eigenvalue Problems (2nd ed.)].

Theorem 2 (Kaniel-Paige-Saad convergence of extremal Ritz values). Let $A = A^\lambda_1 > \lambda_2 \ge \dots \ge \lambda_nu_1, \dots, u_nq_1\cos\phi_1 = |u_1^* q_1| > 0\theta_1^{(m)}m$ steps obeys* $$ 0 \le \lambda_1 - \theta_1^{(m)} \le (\lambda_1 - \lambda_n)\left(\frac{\tan\phi_1}{T_{m-1}(1 + 2\gamma)}\right)^2, \qquad \gamma = \frac{\lambda_1 - \lambda_2}{\lambda_2 - \lambda_n}, $$ where is the degree- Chebyshev polynomial of the first kind. Because grows like outside , the bound decays geometrically with a rate set by the relative gap between the dominant eigenvalue and the rest of the spectrum: a well-separated extreme eigenvalue is captured in a handful of steps, while a clustered one converges slowly. The same Chebyshev mechanism — a polynomial small on the bulk of the spectrum and large at the target — drives the conjugate-gradient error bound, and the optimal Krylov polynomial is the structural object both share [Saad, Y. — Numerical Methods for Large Eigenvalue Problems (2nd ed.)].

Theorem 3 (Paige; loss of orthogonality equals convergence). In finite-precision Lanczos with unit roundoff , the computed vectors satisfy a perturbed recurrence with , and the loss of orthogonality of against an earlier Ritz vector is inversely proportional to that Ritz value's residual: orthogonality is lost in the direction of a Ritz vector precisely as that Ritz value converges. Paige's analysis shows the rounding errors do not corrupt the computed Ritz values arbitrarily; instead they cause the Lanczos vectors to redevelop components along already-converged Ritz directions, producing duplicate (ghost) copies of converged eigenvalues rather than wrong ones. The tridiagonal computed in floating point is the exact of a slightly larger problem, so the eigenvalue information is preserved even as the orthonormal-basis property degrades [Parlett, B. N. — The Symmetric Eigenvalue Problem (SIAM Classics, 1998)].

Theorem 4 (reorthogonalisation strategies). Full reorthogonalisation — orthogonalising each new Lanczos vector against all previous ones — restores orthogonality to at the cost of Arnoldi, defeating the short recurrence. Selective reorthogonalisation orthogonalises only against converged Ritz vectors, of which there are few, restoring semi-orthogonality (inner products ) at cost plus the occasional Ritz solve; partial reorthogonalisation monitors the recurrence for the -estimates of lost orthogonality and reorthogonalises only when a threshold is crossed. Semi-orthogonality is enough: the computed Ritz values remain accurate to as long as inner products stay below , which is the design target of the selective and partial schemes that keep Lanczos both cheap and reliable [Parlett, B. N. — The Symmetric Eigenvalue Problem (SIAM Classics, 1998)].

Synthesis. The Arnoldi and Lanczos iterations are one construction — orthonormalise the Krylov sequence and read off the projected matrix — viewed under one invariant, the compression of onto a growing subspace, and the foundational reason the scheme works is that this compression is a Hessenberg (tridiagonal, when Hermitian) matrix whose eigenvalues converge to the extremes of 's spectrum. This is exactly the iterative counterpart of the dense reduction of 43.06.02: there a finite sequence of Householder similarities bands the full matrix, here a growing sequence of Gram-Schmidt steps bands a Krylov slice, and the central insight is that both routes reach the same band structure because both express in an orthonormal basis adapted to its action. The convergence theory generalises the static variational characterisation of eigenvalues into a dynamical one: the extremal Ritz values are Rayleigh quotients maximised over the nested Krylov subspaces, the Chebyshev bound of Theorem 2 is dual to the conjugate-gradient error bound that reappears in 43.07.04, and the optimal polynomial in is the object every Krylov method optimises.

The finite-precision theory is where the bridge to practice is built: Paige's theorem (Theorem 3) shows that loss of orthogonality is not a corruption of the eigenvalue information but a side effect of convergence, and the reorthogonalisation hierarchy (Theorem 4) trades exactly the storage and cost saved by the short recurrence against the orthogonality it sacrifices. Putting these together, the Arnoldi factorisation is at once the eigenvalue engine of large-scale computation and the inner machinery of the linear solvers GMRES and conjugate gradients; the bridge is that the same projected which yields Ritz values yields, by a small least-squares or tridiagonal solve, the residual-minimising and energy-minimising iterates of the rest of the chapter.

Full proof set Master

Proposition 1 (Krylov dimension and the grade). The dimensions are strictly increasing in until the grade , after which for all , and is -invariant.

Proof. The inclusion is immediate, so is nondecreasing and bounded by , hence eventually constant. Let be the least with , i.e. , so . Applying gives (the last inclusion since ), so , and inductively for all . For invariance, by the same membership . Strict increase before holds because was chosen least.

Proposition 2 (Arnoldi factorisation). Before breakdown, has orthonormal columns spanning , and $AQ_m = Q_mH_m + h_{m+1,m}q_{m+1}e_m^H_m = Q_m^AQ_m$ upper-Hessenberg and unreduced.

Proof. The orthonormality and span are the induction of the Key theorem; the factorisation is the column-stacked recurrence , valid for , with the last row of contributing only . Left-multiplying by and using extracts . Upper-Hessenberg structure is Exercise 3: forces for . Unreducedness is for , which is the no-breakdown hypothesis.

Proposition 3 (tridiagonalisation and the three-term recurrence for Hermitian ). If $A = A^H_m = T_m\beta_{j+1}q_{j+1} = Aq_j - \alpha_j q_j - \beta_j q_{j-1}\alpha_j = q_j^Aq_j \in \mathbb{R}\beta_{j+1} = h_{j+1,j} > 0$.

Proof. , so is Hermitian; combined with upper-Hessenberg ( for ) and conjugate symmetry (, hence for ), only the diagonal and first off-diagonals survive, giving tridiagonal . Its diagonal entries are real (a Hermitian-form diagonal), and the sub/super-diagonals are conjugate, ; a phase rescaling of the makes them real positive . In the orthogonalisation , the coefficients with equal , which vanish by tridiagonality, leaving only (giving ) and (giving , since ). Thus and .

Proposition 4 (Ritz residual identity and the interlacing bound). *For a Ritz pair with , , one has . For Hermitian the Ritz values interlace the spectrum: the eigenvalues of lie in , and the extreme Ritz values are monotone in .*

Proof. The residual identity is Exercise 4: applying to gives , and yields the norm. For the spectral bound, the eigenvalues of are the stationary values of the Rayleigh quotient with ; this quotient is confined to the numerical range of , which for Hermitian is . Hence every Ritz value lies in that interval. Monotonicity follows from : the maximum (minimum) of the Rayleigh quotient over the larger subspace is at least (at most) that over the smaller, so and . By the Cauchy interlacing theorem for the principal-submatrix-like compression, the full eigenvalues of interlace those of .

Proposition 5 (CG-Lanczos correspondence). For Hermitian positive-definite , the conjugate-gradient iterate minimising the -norm of the error is recovered from the Lanczos tridiagonal by solving and setting ; the CG short recurrences are the $LDL^T_m$ updated one column at a time.*

Proof sketch. The CG error-minimising iterate over is the Galerkin solution: , i.e. . Writing and , the condition becomes , i.e. . Since , is SPD and admits an factorisation; performing it incrementally — each new Lanczos column extends by one row and column, updating by one entry — turns the solve into the two coupled two-term recurrences for the CG search direction and iterate, with no need to store . The full statement and the Chebyshev convergence bound are the content of 43.07.04; the correspondence here is that CG is Lanczos with the tridiagonal system solved on the fly.

Connections Master

  • Eigenvalue, eigenvector, and the spectral radius 01.01.08 supplies the spectral objects the Ritz machinery approximates: the Ritz values are the eigenvalues of the small projected matrix , computed by the very theory of 01.01.08, and they converge to the extreme eigenvalues of the large . The Rayleigh-quotient characterisation that makes the extremal Ritz values monotone is the variational face of the eigenvalue theory there, lifted from single vectors to the nested Krylov subspaces; the minimal-polynomial degree that caps the Krylov grade is the same invariant that governs diagonalisability in that unit.

  • Reduction to Hessenberg/tridiagonal form 43.06.02 is the dense analogue this unit iterates: there a finite product of Householder similarities produces upper-Hessenberg (tridiagonal when Hermitian) on the whole of , here a growing Gram-Schmidt sequence produces with the identical band structure on a Krylov subspace. The two reach the same projected form by different routes — direct reflectors versus iterative orthogonalisation — and the band-preservation arguments are mirror images, so the present unit is the large-scale, matrix-free counterpart of that finite reduction; an Arnoldi run to completion () is a Hessenberg reduction.

  • GMRES 43.07.03 and the conjugate gradient method 43.07.04 are the linear-solver engines this unit powers: GMRES minimises over by solving a small least-squares problem on the Arnoldi via Givens rotations, while conjugate gradients is exactly the Lanczos process for SPD with the tridiagonal factored incrementally to give a short three-term iterate recurrence. Both inherit the Krylov-subspace optimality and the Chebyshev convergence mechanism established here, and the stationary-iteration splitting of 43.07.01 returns in both as a preconditioner that reshapes the spectrum to accelerate the Ritz and residual convergence.

Historical & philosophical context Master

The Krylov subspace is named for Aleksei Nikolaevich Krylov, the Russian naval engineer and applied mathematician who in 1931 used the sequence to compute the characteristic polynomial of a matrix arising in ship-vibration analysis, reducing the eigenvalue problem to the powers of acting on a single vector. The orthogonalised form was introduced by Walter Edwin Arnoldi in 1951, who proposed reducing a matrix to Hessenberg form by a Gram-Schmidt process on the Krylov sequence as a means of approximating eigenvalues of large matrices without forming the full reduction [Arnoldi 1951].

The symmetric specialisation is due to Cornelius Lanczos, whose 1950 paper at the U.S. National Bureau of Standards introduced the three-term recurrence — he called it the method of minimized iterations — for tridiagonalising a symmetric matrix and, in a companion development, for solving linear systems [Lanczos 1950]. The method was initially distrusted because, in the finite-precision arithmetic of early computers, the Lanczos vectors rapidly lost orthogonality and produced spurious repeated eigenvalues; for two decades it was regarded as numerically unreliable. Christopher Conway Paige's 1971 doctoral thesis at the University of London rehabilitated it by proving that the loss of orthogonality is not random corruption but coincides exactly with the convergence of a Ritz value, so the computed eigenvalues remain accurate [Paige 1971]. Beresford Parlett and collaborators then developed the selective and partial reorthogonalisation schemes that made the method both efficient and trustworthy, consolidated in The Symmetric Eigenvalue Problem [Parlett, B. N. — The Symmetric Eigenvalue Problem (SIAM Classics, 1998)]. Yousef Saad built the general (non-symmetric) Arnoldi theory of Ritz-value convergence and the modern large-eigenvalue and Krylov-solver framework that situates Arnoldi and Lanczos as the common engine of GMRES and conjugate gradients [Saad, Y. — Numerical Methods for Large Eigenvalue Problems (2nd ed.)].

Bibliography Master

@article{Krylov1931,
  author  = {Krylov, Aleksei N.},
  title   = {O chislennom reshenii uravneniya, kotorym v tekhnicheskikh voprosakh opredelyayutsya chastoty malykh kolebanii material'nykh sistem},
  journal = {Izvestiya Akademii Nauk SSSR, Otdelenie Matematicheskikh i Estestvennykh Nauk},
  volume  = {7},
  number  = {4},
  year    = {1931},
  pages   = {491--539}
}

@article{Arnoldi1951,
  author  = {Arnoldi, Walter E.},
  title   = {The Principle of Minimized Iterations in the Solution of the Matrix Eigenvalue Problem},
  journal = {Quarterly of Applied Mathematics},
  volume  = {9},
  number  = {1},
  year    = {1951},
  pages   = {17--29}
}

@article{Lanczos1950,
  author  = {Lanczos, Cornelius},
  title   = {An Iteration Method for the Solution of the Eigenvalue Problem of Linear Differential and Integral Operators},
  journal = {Journal of Research of the National Bureau of Standards},
  volume  = {45},
  number  = {4},
  year    = {1950},
  pages   = {255--282}
}

@phdthesis{Paige1971,
  author = {Paige, Christopher C.},
  title  = {The Computation of Eigenvalues and Eigenvectors of Very Large Sparse Matrices},
  school = {University of London},
  year   = {1971}
}

@book{Parlett1998,
  author    = {Parlett, Beresford N.},
  title     = {The Symmetric Eigenvalue Problem},
  publisher = {SIAM},
  series    = {Classics in Applied Mathematics},
  year      = {1998}
}

@book{Saad2011,
  author    = {Saad, Yousef},
  title     = {Numerical Methods for Large Eigenvalue Problems},
  edition   = {2},
  publisher = {Society for Industrial and Applied Mathematics},
  year      = {2011}
}

@book{TrefethenBau1997,
  author    = {Trefethen, Lloyd N. and Bau, David},
  title     = {Numerical Linear Algebra},
  publisher = {SIAM},
  address   = {Philadelphia},
  year      = {1997}
}

@book{GolubVanLoan2013,
  author    = {Golub, Gene H. and Van Loan, Charles F.},
  title     = {Matrix Computations},
  edition   = {4},
  publisher = {Johns Hopkins University Press},
  year      = {2013}
}