43.07.02 · numerical-analysis / 07-iterative-krylov-methods

Arnoldi and Lanczos iterations

shipped3 tiersLean: none

Anchor (Master): Saad 2011 *Numerical Methods for Large Eigenvalue Problems* 2e (SIAM) Ch. 6 (the Arnoldi method, Ritz values, and convergence) and Ch. 4 (the Lanczos algorithm in exact and finite precision); Parlett 1998 *The Symmetric Eigenvalue Problem* (SIAM Classics) Ch. 13 (the Lanczos algorithm, Paige's loss-of-orthogonality theory, and reorthogonalisation); Trefethen-Bau 1997 *Numerical Linear Algebra* (SIAM) Lectures 33-36

Intuition Beginner

Suppose you have a matrix so enormous that you can never write it out — millions of rows — but you are allowed one cheap operation: hand it a vector, and it hands back the matrix times that vector. From this single power you want to learn the matrix's most important features: its largest eigenvalues, or the solution of a system it defines. The trick of this unit is to build, out of repeated uses of that one operation, a tiny matrix that behaves like the giant one on the part of space that matters.

Start with one vector. Multiply by the matrix to get a second; multiply again for a third; and so on. These vectors, the starting one and its successive products, span a growing space that captures more and more of how the matrix acts. That growing space is the Krylov space, named for the engineer who first used it.

There is a problem: the raw products quickly pile up almost on top of one another, all leaning toward the dominant direction, so they make a numerically useless basis. The fix is to tidy them as you go. Each time you form a new product, you strip off the parts already pointing along the earlier directions, leaving only the genuinely new direction, and you rescale it to unit length. This tidying is the Arnoldi iteration, and the leftover bookkeeping — how much of each old direction you stripped off — assembles into a small, almost-triangular matrix.

When the big matrix is symmetric, the tidying simplifies dramatically: almost every stripping coefficient turns out to be zero, so each new vector only needs to be cleaned against the previous two. That short three-term recipe is the Lanczos iteration, and it is one of the most efficient ideas in all of computation.

Visual Beginner

The picture is a single starting vector being fed through the matrix again and again, with each fresh output tidied against everything built so far, so that a neat orthogonal frame grows one column at a time alongside a small bookkeeping matrix.

Read the table top to bottom. Each row adds one new direction. The left column is the raw product the matrix hands back; the middle column is that product after the old directions are stripped away and it is rescaled to length one; the right column notes the shape of the small matrix recording the strippings.

step	raw product	tidied new direction	small matrix so far
1	start vector $b$	$q_{1}$ (just rescale $b$ )	$1 \times 1$
2	$A q_{1}$	$q_{2}$ (strip off $q_{1}$ )	$2 \times 1$ Hessenberg
3	$A q_{2}$	$q_{3}$ (strip off $q_{1}, q_{2}$ )	$3 \times 2$ Hessenberg
4	$A q_{3}$	$q_{4}$ (strip off $q_{1}, q_{2}, q_{3}$ )	$4 \times 3$ Hessenberg

The takeaway: from one vector and the ability to multiply by the matrix, you grow an orthogonal frame and a small matrix. The eigenvalues of that small matrix are excellent estimates of the big matrix's extreme eigenvalues, and the symmetric case is cheaper because the frame only needs short-term memory.

Worked example Beginner

Take the symmetric matrix and starting vector $$ A = \begin{pmatrix} 2 & 1 & 0 \ 1 & 2 & 1 \ 0 & 1 & 2 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \ 0 \ 0 \end{pmatrix}. $$ We run two steps of the symmetric (Lanczos) tidying by hand.

Step 1. The first direction is just $b$ rescaled. Since $b$ already has length one, $q_{1} = (1, 0, 0)$ .

Step 2. Multiply: $A q_{1} = (2, 1, 0)$ . The amount of this lying along $q_{1}$ is the dot product $q_{1} \cdot A q_{1} = 2$ ; call it $α_{1} = 2$ . Strip it off: $(2, 1, 0) - 2 (1, 0, 0) = (0, 1, 0)$ . This leftover has length $1$ , so $β_{2} = 1$ and the new direction is $q_{2} = (0, 1, 0)$ .

Step 3. Multiply: $A q_{2} = (1, 2, 1)$ . The amount along $q_{2}$ is $α_{2} = q_{2} \cdot A q_{2} = 2$ . The amount along the previous $q_{1}$ is exactly the $β_{2} = 1$ we already recorded. Strip both off: $(1, 2, 1) - 2 (0, 1, 0) - 1 (1, 0, 0) = (0, 0, 1)$ , of length $1$ , so $q_{3} = (0, 0, 1)$ .

The small matrix assembled from these numbers is the tridiagonal $$ T = \begin{pmatrix} 2 & 1 \ 1 & 2 \end{pmatrix}, $$ using $α_{1}, α_{2}$ on the diagonal and $β_{2}$ on the off-diagonals. Its eigenvalues are $2 \pm 1 = 1$ and $3$ .

What this tells us: after only two steps the little matrix already gives two of the three exact eigenvalues of $A$ (the full spectrum is $2, 2 \pm 2$ , so $1$ and $3$ bracket the true extremes closely). Each step needed only one matrix-times-vector and a strip against the previous two directions — the short-memory pattern that makes the symmetric case so cheap.

Check your understanding Beginner

Formal definition Intermediate+

Let $A \in F^{n \times n}$ with $F \in {R, C}$ , and let $b \in F^{n}$ be nonzero. For $m \geq 1$ the $m$ -th Krylov subspace of $A$ generated by $b$ is $$ \mathcal{K}_m(A, b) = \operatorname{span}{b, Ab, A^2 b, \dots, A^{m-1}b} = {, p(A),b : p \text{ a polynomial of degree} < m ,}. $$ The two descriptions agree because $A^{j} b = (x^{j}) (A) b$ . The dimensions are nondecreasing, $dim K_{m} \leq dim K_{m + 1}$ , and they strictly increase by one at each step until the first index $m$ at which $K_{m} = K_{m + 1}$ ; at that point $K_{m}$ is an $A$ -invariant subspace and the sequence stalls. The least such $m$ is the grade of $b$ with respect to $A$ ^{[Trefethen, L. N. & Bau, D. — Numerical Linear Algebra (SIAM, 1997)]}.

The Arnoldi iteration. The Arnoldi process orthonormalises the Krylov sequence by modified Gram-Schmidt, in the sense of 01.01.09. Set $q_{1} = b /∥ b ∥$ . For $j = 1, 2, \dots$ form $w = A q_{j}$ , orthogonalise it against the existing basis by recording $$ h_{ij} = q_i^* w \quad (1 \le i \le j), \qquad w \leftarrow w - \sum_{i=1}^{j} h_{ij}, q_i, $$ then set $h_{j + 1, j} = ∥ w ∥$ and, if $h_{j + 1, j} \neq = 0$ , $q_{j + 1} = w / h_{j + 1, j}$ . Collecting the orthonormal vectors as the columns of $Q_{m} = [q_{1} \dots q_{m}]$ and the coefficients into an $(m + 1) \times m$ upper-Hessenberg matrix $\tilde{H}_{m} = (h_{ij})$ , the construction satisfies the Arnoldi relation $$ A Q_m = Q_{m+1}\tilde H_m = Q_m H_m + h_{m+1,m}, q_{m+1} e_m^, $$ where $H_m = Q_m^ A Q_m $i s t h e l e a d in g$ m \times m $b l oc k o f$ \tilde H_m $, an u pp er - H esse nb er g ma t r i x a s in [43.06.02] . T h eco l u mn so f$ Q_m $f or man or t h o n or ma l ba s i so f$ \mathcal{K}_m(A,b) $, an d$ H_m $i s t h e ma t r i x o f t h eor t h o g o na l p r o j ec t i o n o f$ A $o n t o$ \mathcal{K}m $e x p r esse d in t ha t ba s i s . * B r e ak d o w n * —$ h{m+1,m} = 0 $— occ u r se x a c tl y w h e n$ \mathcal{K}_m $i s$ A $- in v a r ian t, in w hi c h c a se t h ee i g e n v a l u eso f$ H_m $a r ee x a c t e i g e n v a l u eso f$ A$.

The Lanczos iteration. When $A = A^{*}$ is Hermitian, $H_{m} = Q_{m}^{*} A Q_{m}$ is both upper-Hessenberg and Hermitian, hence tridiagonal; write it $T_{m}$ with real diagonal $α_{j} = q_{j}^{*} A q_{j}$ and positive off-diagonal $β_{j + 1} = h_{j + 1, j}$ . The orthogonalisation then collapses to a three-term recurrence $$ \beta_{j+1}, q_{j+1} = A q_j - \alpha_j, q_j - \beta_j, q_{j-1}, \qquad \alpha_j = q_j^* A q_j, \quad \beta_{j+1} = |A q_j - \alpha_j q_j - \beta_j q_{j-1}|, $$ with $q_{0} = 0$ , $β_{1} = 0$ . Only the two previous vectors $q_{j}, q_{j - 1}$ enter, so the Lanczos process needs $O (n)$ storage and one matrix-vector product per step, against the $O (mn)$ storage and growing orthogonalisation cost of Arnoldi.

Ritz values and Ritz vectors. Let $(θ_{i}, s_{i})$ be an eigenpair of $H_{m}$ (or $T_{m}$ ): $H_{m} s_{i} = θ_{i} s_{i}$ . The scalar $θ_{i}$ is a Ritz value and $y_{i} = Q_{m} s_{i}$ a Ritz vector; together they are the Rayleigh-Ritz approximation of an eigenpair of $A$ from the subspace $K_{m}$ . The Ritz residual measures their quality: $$ |A y_i - \theta_i y_i| = |h_{m+1,m}|,|e_m^* s_i|, $$ so a Ritz pair is accurate precisely when the last entry of its eigenvector $s_{i}$ is small or the subdiagonal $h_{m + 1, m}$ is small.

Counterexamples to common slips

The Krylov dimension need not reach $n$ . If $b$ lies in a proper $A$ -invariant subspace — for instance if $b$ is a single eigenvector — the sequence stalls early at the grade of $b$ , and the Arnoldi process breaks down (a lucky breakdown) having captured an invariant subspace exactly. Breakdown is a feature, not a failure: it signals exact invariance.
Arnoldi's $H_{m}$ is Hessenberg, not triangular. The projection $Q_{m}^{*} A Q_{m}$ has one nonzero subdiagonal because $A q_{j}$ can have a component along the genuinely new direction $q_{j + 1}$ but none beyond it; expecting a triangular projection confuses the Krylov structure with a Schur form.
The three-term recurrence is exact only for Hermitian $A$ . For a non-Hermitian matrix the coefficients $h_{ij}$ with $i < j - 1$ do not vanish, so a short recurrence does not hold; the two-sided (nonsymmetric) Lanczos process recovers a three-term recurrence only by sacrificing orthogonality, building bi-orthogonal bases instead.
Ritz values converge to the extremes first, not the interior. The Rayleigh-Ritz approximation captures the outlying eigenvalues of $A$ — the largest and smallest — long before the clustered interior ones; reading interior eigenvalues off a small $m$ is unreliable without a spectral transformation.

Key theorem with proof Intermediate+

The signature result is the Arnoldi relation itself: that modified Gram-Schmidt on the Krylov sequence produces an orthonormal basis together with a Hessenberg projection, and that this projection becomes tridiagonal exactly when $A$ is Hermitian.

Theorem (the Arnoldi factorisation and its Lanczos specialisation). Let $A \in F^{n \times n}$ , $b \neq = 0$ , and run the Arnoldi process. As long as no breakdown has occurred ( $h_{j + 1, j} \neq = 0$ for $j < m$ ), the vectors $q_{1}, \dots, q_{m}$ are orthonormal and span $K_{m} (A, b)$ , and $$ A Q_m = Q_m H_m + h_{m+1,m}, q_{m+1}, e_m^, \qquad H_m = Q_m^ A Q_m, $$ with $H_{m}$ unreduced upper-Hessenberg. If $A = A^ $t h e n$ H_m = T_m $i s t r i d ia g o na l, an d t h er ec u r r e n cer e d u ces t o t h e t h r ee - t er m f or m$ \beta_{j+1}q_{j+1} = Aq_j - \alpha_j q_j - \beta_j q_{j-1}$* ^{[Trefethen, L. N. & Bau, D. — Numerical Linear Algebra (SIAM, 1997)]}.

Proof. Orthonormality and the span claim go together by induction on $m$ . For $m = 1$ , $q_{1} = b /∥ b ∥$ is a unit vector spanning $K_{1} = span {b}$ . Assume $q_{1}, \dots, q_{m}$ are orthonormal and span $K_{m}$ . The new vector is $w = A q_{m} - \sum_{i = 1}^{m} h_{im} q_{i}$ with $h_{im} = q_{i}^{*} A q_{m}$ , so by construction $q_{i}^{*} w = q_{i}^{*} A q_{m} - h_{im} = 0$ for each $i \leq m$ ; thus $w ⊥ span {q_{1}, \dots, q_{m}}$ , and $q_{m + 1} = w /∥ w ∥$ (when $∥ w ∥ = h_{m + 1, m} \neq = 0$ ) is a unit vector orthogonal to all earlier ones. For the span, $A q_{m} \in A K_{m} \subseteq K_{m + 1}$ , and since $K_{m} \subseteq K_{m + 1}$ the corrected $w$ lies in $K_{m + 1}$ ; conversely $A^{m} b = A (A^{m - 1} b)$ and $A^{m - 1} b \in K_{m} = span {q_{1}, \dots, q_{m}}$ , so $K_{m + 1} = K_{m} + span {q_{m + 1}}$ . Hence $q_{1}, \dots, q_{m + 1}$ are orthonormal and span $K_{m + 1}$ .

The factorisation is the matrix form of the recurrence. The defining identity $A q_{j} = \sum_{i = 1}^{j + 1} h_{ij} q_{i}$ (rearranged from $w = A q_{j} - \sum_{i \leq j} h_{ij} q_{i}$ and $q_{j + 1} = w / h_{j + 1, j}$ ) says column $j$ of $A Q_{m}$ equals column $j$ of $Q_{m + 1} \tilde{H}_{m}$ . Stacking columns $j = 1, \dots, m$ gives $A Q_{m} = Q_{m + 1} \tilde{H}_{m}$ . Splitting off the last row of $\tilde{H}_{m}$ , whose only nonzero entry is $h_{m + 1, m}$ in column $m$ , yields $A Q_{m} = Q_{m} H_{m} + h_{m + 1, m} q_{m + 1} e_{m}^{*}$ . Left-multiplying by $Q_{m}^{*}$ and using $Q_{m}^{*} Q_{m} = I$ , $Q_{m}^{*} q_{m + 1} = 0$ gives $Q_{m}^{*} A Q_{m} = H_{m}$ . The matrix $H_{m}$ is upper-Hessenberg because $h_{ij} = q_{i}^{*} A q_{j} = 0$ whenever $i > j + 1$ : indeed $A q_{j} \in K_{j + 1} = span {q_{1}, \dots, q_{j + 1}}$ , so it is orthogonal to $q_{i}$ for $i > j + 1$ . It is unreduced before breakdown because each $h_{j + 1, j} = ∥ w ∥ > 0$ .

When $A = A^{*}$ , $H_{m} = Q_{m}^{*} A Q_{m}$ satisfies $H_{m}^{*} = Q_{m}^{*} A^{*} Q_{m} = Q_{m}^{*} A Q_{m} = H_{m}$ , so $H_{m}$ is Hermitian as well as upper-Hessenberg; a Hermitian upper-Hessenberg matrix has $h_{ij} = 0$ for $i > j + 1$ and, by conjugate symmetry, $h_{j i} = \overline{h_{ij}} = 0$ for $j > i + 1$ , leaving only the diagonal and the two adjacent stripes — tridiagonal. Then $h_{ij} = q_{i}^{*} A q_{j} = 0$ for $i < j - 1$ too, so the orthogonalisation sum $\sum_{i \leq j} h_{ij} q_{i}$ retains only the $i = j$ and $i = j - 1$ terms, which is the three-term recurrence with $α_{j} = h_{j j}$ and $β_{j} = h_{j, j - 1}$ . $□$

Bridge. This factorisation is the foundational reason a single matrix-vector multiply suffices to probe a giant operator: it builds toward the Krylov solvers of the chapter by exposing, at step $m$ , an orthonormal basis $Q_{m}$ and a small projected matrix $H_{m}$ whose eigenvalues are the Ritz approximations of the Advanced results. This is exactly the dense Hessenberg reduction of 43.06.02 read iteratively — there a finite sequence of Householder similarities produces $Q^{*} A Q$ Hessenberg on the whole space, here a growing sequence of Gram-Schmidt steps produces $Q_{m}^{*} A Q_{m}$ Hessenberg on a Krylov slice — and the tridiagonal Lanczos case generalises the symmetric tridiagonalisation to the Krylov setting with the same conjugate-symmetry collapse. The relation appears again in 43.07.03, where GMRES minimises the residual over $K_{m}$ by solving a small least-squares problem on $\tilde{H}_{m}$ , and in 43.07.04, where conjugate gradients is exactly Lanczos for symmetric positive-definite $A$ with the tridiagonal $T_{m}$ factored on the fly. Putting these together, the central insight is that the Arnoldi factorisation compresses the action of $A$ on a Krylov subspace into a Hessenberg matrix one tractable size smaller at each step, and the bridge is that every Krylov algorithm — for eigenvalues, for linear systems, symmetric or not — is a way of reading information off this one compression.

Exercises Intermediate+

Exercise 4 (medium, symbolic).

Derive the Ritz residual identity $∥ A y - θ y ∥ = ∣ h_{m + 1, m} ∣ ∣ e_{m}^{*} s ∣$ for a Ritz pair $(θ, y)$ with $y = Q_{m} s$ , $H_{m} s = θ s$ , $∥ s ∥ = 1$ .

Hint

Apply the Arnoldi relation $A Q_{m} = Q_{m} H_{m} + h_{m + 1, m} q_{m + 1} e_{m}^{*}$ to $s$ , then use $H_{m} s = θ s$ and the orthonormality of $q_{m + 1}$ .

Answer

Apply $A Q_{m} = Q_{m} H_{m} + h_{m + 1, m} q_{m + 1} e_{m}^{*}$ to $s$ : $A y = A Q_{m} s = Q_{m} H_{m} s + h_{m + 1, m} q_{m + 1} (e_{m}^{*} s) = θ Q_{m} s + h_{m + 1, m} (e_{m}^{*} s) q_{m + 1} = θ y + h_{m + 1, m} (e_{m}^{*} s) q_{m + 1}$ . Hence $A y - θ y = h_{m + 1, m} (e_{m}^{*} s) q_{m + 1}$ . Since $q_{m + 1}$ is a unit vector, $∥ A y - θ y ∥ = ∣ h_{m + 1, m} ∣ ∣ e_{m}^{*} s ∣ ∥ q_{m + 1} ∥ = ∣ h_{m + 1, m} ∣ ∣ e_{m}^{*} s ∣$ . The residual is small precisely when the subdiagonal $h_{m + 1, m}$ is small (near breakdown) or the last component of the eigenvector $s$ is small (the Ritz vector barely uses the newest direction).

Exercise 6 (medium, symbolic).

Show that Arnoldi breakdown at step $m$ (i.e. $h_{m + 1, m} = 0$ ) implies $K_{m} (A, b)$ is $A$ -invariant, and conclude that every eigenvalue of $H_{m}$ is then an exact eigenvalue of $A$ .

Hint

Breakdown means $A q_{m} \in span {q_{1}, \dots, q_{m}}$ . Combine with the Arnoldi relation, which loses its last term.

Answer

If $h_{m + 1, m} = 0$ the Arnoldi relation becomes $A Q_{m} = Q_{m} H_{m}$ exactly, with no remainder term. For any $v \in K_{m}$ write $v = Q_{m} c$ ; then $A v = A Q_{m} c = Q_{m} H_{m} c \in range (Q_{m}) = K_{m}$ , so $K_{m}$ is $A$ -invariant. Now if $H_{m} s = θ s$ with $s \neq = 0$ , set $y = Q_{m} s \neq = 0$ (as $Q_{m}$ has orthonormal columns); then $A y = A Q_{m} s = Q_{m} H_{m} s = θ Q_{m} s = θ y$ , so $(θ, y)$ is an exact eigenpair of $A$ . Thus at breakdown the Ritz values cease to be approximations and become exact eigenvalues — a lucky breakdown.

Exercise 7 (hard, symbolic).

Prove that in exact arithmetic the Lanczos vectors $q_{1}, \dots, q_{m}$ are mutually orthogonal, by induction using only the three-term recurrence $β_{j + 1} q_{j + 1} = A q_{j} - α_{j} q_{j} - β_{j} q_{j - 1}$ and $A = A^{*}$ .

Hint

Assume $q_{1}, \dots, q_{j}$ are orthonormal. Take the inner product of the recurrence with $q_{i}$ for $i \leq j$ , splitting into the cases $i = j$ , $i = j - 1$ , and $i < j - 1$ ; use $α_{j} = q_{j}^{*} A q_{j}$ , $β_{j} = q_{j - 1}^{*} A q_{j}$ , and self-adjointness $q_{i}^{*} A q_{j} = (A q_{i})^{*} q_{j}$ .

Answer

Induct on $j$ , the hypothesis being that $q_{1}, \dots, q_{j}$ are orthonormal. Take $q_{i}^{*}$ of $β_{j + 1} q_{j + 1} = A q_{j} - α_{j} q_{j} - β_{j} q_{j - 1}$ for $i \leq j$ . For $i = j$ : $β_{j + 1} q_{j}^{*} q_{j + 1} = q_{j}^{*} A q_{j} - α_{j} q_{j}^{*} q_{j} - β_{j} q_{j}^{*} q_{j - 1} = α_{j} - α_{j} - 0 = 0$ , using the definition $α_{j} = q_{j}^{*} A q_{j}$ and orthonormality. For $i = j - 1$ : $β_{j + 1} q_{j - 1}^{*} q_{j + 1} = q_{j - 1}^{*} A q_{j} - 0 - β_{j}$ ; and $q_{j - 1}^{*} A q_{j} = (A q_{j - 1})^{*} q_{j}$ by self-adjointness, while the recurrence at step $j - 1$ gives $A q_{j - 1} = β_{j} q_{j} + α_{j - 1} q_{j - 1} + β_{j - 1} q_{j - 2}$ , so $(A q_{j - 1})^{*} q_{j} = β_{j}$ , giving $β_{j + 1} q_{j - 1}^{*} q_{j + 1} = β_{j} - β_{j} = 0$ . For $i < j - 1$ : $β_{j + 1} q_{i}^{*} q_{j + 1} = q_{i}^{*} A q_{j} = (A q_{i})^{*} q_{j}$ , and $A q_{i} = β_{i + 1} q_{i + 1} + α_{i} q_{i} + β_{i} q_{i - 1}$ lies in $span {q_{i - 1}, q_{i}, q_{i + 1}}$ , all of whose indices are $\leq i + 1 < j$ , so $(A q_{i})^{*} q_{j} = 0$ by the inductive orthonormality. As $β_{j + 1} \neq = 0$ before breakdown, $q_{i}^{*} q_{j + 1} = 0$ for every $i \leq j$ , completing the induction. (In finite precision these identities fail to hold exactly — the subject of the loss-of-orthogonality theory below.)

Exercise 8 (hard, symbolic).

For Hermitian $A$ with eigenvalues in $[λ_{m i n}, λ_{m a x}]$ , show that the largest Ritz value $θ_{m a x}^{(m)}$ from an $m$ -step Lanczos run satisfies $θ_{m a x}^{(m)} \leq λ_{m a x}$ and is nondecreasing in $m$ . (This is the monotone convergence of the extreme Ritz value.)

Hint

$θ_{m a x}^{(m)} = max_{0 \neq = x \in K_{m}} \frac{x ^{*} A x}{x ^{*} x}$ is the Rayleigh quotient maximised over the Krylov subspace; compare it to the maximum over all of $F^{n}$ , and use $K_{m} \subseteq K_{m + 1}$ .

Answer

By the Rayleigh-Ritz characterisation, the largest eigenvalue $θ_{m a x}^{(m)}$ of $T_{m} = Q_{m}^{*} A Q_{m}$ equals $max_{0 \neq = s} \frac{s ^{*} T _{m} s}{s ^{*} s} = max_{0 \neq = x \in K_{m}} \frac{x ^{*} A x}{x ^{*} x}$ , the maximum of the Rayleigh quotient of $A$ over the subspace $K_{m}$ (writing $x = Q_{m} s$ and using $Q_{m}^{*} Q_{m} = I$ ). Since $K_{m} \subseteq F^{n}$ , this constrained maximum is at most the global maximum $max_{x \neq = 0} \frac{x ^{*} A x}{x ^{*} x} = λ_{m a x}$ , giving $θ_{m a x}^{(m)} \leq λ_{m a x}$ . Because $K_{m} \subseteq K_{m + 1}$ , the maximum over the larger subspace is at least the maximum over the smaller, so $θ_{m a x}^{(m)} \leq θ_{m a x}^{(m + 1)}$ : the largest Ritz value is nondecreasing in $m$ and bounded above by $λ_{m a x}$ , hence converges. The symmetric argument with the minimum gives $θ_{m i n}^{(m)} \geq λ_{m i n}$ nonincreasing.

Advanced results Master

Theorem 1 (exactness at the grade; finite termination). Let $ν$ be the grade of $b$ with respect to $A$ — the least $m$ with $dim K_{m} = dim K_{m + 1}$ . The Arnoldi process breaks down at step $ν$ , the subspace $K_{ν}$ is $A$ -invariant, and the $ν$ eigenvalues of $H_{ν}$ are exact eigenvalues of $A$ . The Krylov dimension increases strictly by one until the grade and then stalls, because $A^{ν} b$ is the first power expressible in lower ones; at that step the orthogonal remainder $w$ vanishes, $h_{ν + 1, ν} = 0$ , and the Arnoldi relation closes to $A Q_{ν} = Q_{ν} H_{ν}$ . Any eigenpair $H_{ν} s = θ s$ lifts to $A (Q_{ν} s) = θ (Q_{ν} s)$ , so the Ritz pairs become exact. The grade satisfies $ν \leq de g m_{A}$ , the degree of the minimal polynomial of $A$ , with equality for a cyclic (non-derogatory) $A$ and a generic $b$ ; this is the exact-arithmetic sense in which Krylov methods terminate in at most $n$ steps ^{[Saad, Y. — Numerical Methods for Large Eigenvalue Problems (2nd ed.)]}.

Theorem 2 (Kaniel-Paige-Saad convergence of extremal Ritz values). Let $A = A^ $ha v ee i g e n v a l u es$ \lambda_1 > \lambda_2 \ge \dots \ge \lambda_n $w i t h u ni t e i g e n v ec t or s$ u_1, \dots, u_n $, an d r u n L an cz os f r o m$ q_1 $w i t h$ \cos\phi_1 = |u_1^* q_1| > 0 $. T h e l a r g es tR i t z v a l u e$ \theta_1^{(m)} $a f t er$ m$ steps obeys* $$ 0 \le \lambda_1 - \theta_1^{(m)} \le (\lambda_1 - \lambda_n)\left(\frac{\tan\phi_1}{T_{m-1}(1 + 2\gamma)}\right)^2, \qquad \gamma = \frac{\lambda_1 - \lambda_2}{\lambda_2 - \lambda_n}, $$ where $T_{m - 1}$ is the degree- $(m - 1)$ Chebyshev polynomial of the first kind. Because $T_{m - 1}$ grows like $\frac{1}{2} (1 + 2 γ (1 + γ))^{m - 1}$ outside $[- 1, 1]$ , the bound decays geometrically with a rate set by the relative gap $γ$ between the dominant eigenvalue and the rest of the spectrum: a well-separated extreme eigenvalue is captured in a handful of steps, while a clustered one converges slowly. The same Chebyshev mechanism — a polynomial small on the bulk of the spectrum and large at the target — drives the conjugate-gradient error bound, and the optimal Krylov polynomial is the structural object both share ^{[Saad, Y. — Numerical Methods for Large Eigenvalue Problems (2nd ed.)]}.

Theorem 3 (Paige; loss of orthogonality equals convergence). In finite-precision Lanczos with unit roundoff $ϵ_{mach}$ , the computed vectors $\overset{q}{^}_{j}$ satisfy a perturbed recurrence $A \overset{q}{^}_{j} = β_{j + 1} \overset{q}{^}_{j + 1} + α_{j} \overset{q}{^}_{j} + β_{j} \overset{q}{^}_{j - 1} + f_{j}$ with $∥ f_{j} ∥ = O (ϵ_{mach} ∥ A ∥)$ , and the loss of orthogonality of $\overset{q}{^}_{j + 1}$ against an earlier Ritz vector is inversely proportional to that Ritz value's residual: orthogonality is lost in the direction of a Ritz vector precisely as that Ritz value converges. Paige's analysis shows the rounding errors do not corrupt the computed Ritz values arbitrarily; instead they cause the Lanczos vectors to redevelop components along already-converged Ritz directions, producing duplicate (ghost) copies of converged eigenvalues rather than wrong ones. The tridiagonal $T_{m}$ computed in floating point is the exact $T_{m}$ of a slightly larger problem, so the eigenvalue information is preserved even as the orthonormal-basis property degrades ^{[Parlett, B. N. — The Symmetric Eigenvalue Problem (SIAM Classics, 1998)]}.

Theorem 4 (reorthogonalisation strategies). Full reorthogonalisation — orthogonalising each new Lanczos vector against all previous ones — restores orthogonality to $O (ϵ_{mach})$ at the cost $O (m^{2} n)$ of Arnoldi, defeating the short recurrence. Selective reorthogonalisation orthogonalises only against converged Ritz vectors, of which there are few, restoring semi-orthogonality (inner products $O (ϵ_{mach})$ ) at cost $O (mn)$ plus the occasional Ritz solve; partial reorthogonalisation monitors the recurrence for the $ω$ -estimates of lost orthogonality and reorthogonalises only when a threshold is crossed. Semi-orthogonality is enough: the computed Ritz values remain accurate to $O (ϵ_{mach})$ as long as inner products stay below $O (ϵ_{mach})$ , which is the design target of the selective and partial schemes that keep Lanczos both cheap and reliable ^{[Parlett, B. N. — The Symmetric Eigenvalue Problem (SIAM Classics, 1998)]}.

Synthesis. The Arnoldi and Lanczos iterations are one construction — orthonormalise the Krylov sequence $b, A b, A^{2} b, \dots$ and read off the projected matrix $H_{m} = Q_{m}^{*} A Q_{m}$ — viewed under one invariant, the compression of $A$ onto a growing subspace, and the foundational reason the scheme works is that this compression is a Hessenberg (tridiagonal, when Hermitian) matrix whose eigenvalues converge to the extremes of $A$ 's spectrum. This is exactly the iterative counterpart of the dense reduction of 43.06.02: there a finite sequence of Householder similarities bands the full matrix, here a growing sequence of Gram-Schmidt steps bands a Krylov slice, and the central insight is that both routes reach the same band structure because both express $A$ in an orthonormal basis adapted to its action. The convergence theory generalises the static variational characterisation of eigenvalues into a dynamical one: the extremal Ritz values are Rayleigh quotients maximised over the nested Krylov subspaces, the Chebyshev bound of Theorem 2 is dual to the conjugate-gradient error bound that reappears in 43.07.04, and the optimal polynomial in $A$ is the object every Krylov method optimises.

The finite-precision theory is where the bridge to practice is built: Paige's theorem (Theorem 3) shows that loss of orthogonality is not a corruption of the eigenvalue information but a side effect of convergence, and the reorthogonalisation hierarchy (Theorem 4) trades exactly the storage and cost saved by the short recurrence against the orthogonality it sacrifices. Putting these together, the Arnoldi factorisation is at once the eigenvalue engine of large-scale computation and the inner machinery of the linear solvers GMRES and conjugate gradients; the bridge is that the same projected $H_{m}$ which yields Ritz values yields, by a small least-squares or tridiagonal solve, the residual-minimising and energy-minimising iterates of the rest of the chapter.

Full proof set Master

Proposition 1 (Krylov dimension and the grade). The dimensions $dim K_{m} (A, b)$ are strictly increasing in $m$ until the grade $ν$ , after which $K_{m} = K_{ν}$ for all $m \geq ν$ , and $K_{ν}$ is $A$ -invariant.

Proof. The inclusion $K_{m} \subseteq K_{m + 1}$ is immediate, so $dim K_{m}$ is nondecreasing and bounded by $n$ , hence eventually constant. Let $ν$ be the least $m$ with $dim K_{m + 1} = dim K_{m}$ , i.e. $A^{ν} b \in K_{ν} = span {b, \dots, A^{ν - 1} b}$ , so $A^{ν} b = \sum_{i = 0}^{ν - 1} c_{i} A^{i} b$ . Applying $A$ gives $A^{ν + 1} b = \sum_{i} c_{i} A^{i + 1} b \in span {A b, \dots, A^{ν} b} \subseteq K_{ν}$ (the last inclusion since $A^{ν} b \in K_{ν}$ ), so $K_{ν + 1} = K_{ν}$ , and inductively $K_{m} = K_{ν}$ for all $m \geq ν$ . For invariance, $A K_{ν} = span {A b, \dots, A^{ν} b} \subseteq K_{ν}$ by the same membership $A^{ν} b \in K_{ν}$ . Strict increase before $ν$ holds because $ν$ was chosen least. $□$

Proposition 2 (Arnoldi factorisation). Before breakdown, $Q_{m}$ has orthonormal columns spanning $K_{m}$ , and $AQ_m = Q_mH_m + h_{m+1,m}q_{m+1}e_m^ $w i t h$ H_m = Q_m^AQ_m$ upper-Hessenberg and unreduced.

Proof. The orthonormality and span are the induction of the Key theorem; the factorisation is the column-stacked recurrence $A q_{j} = \sum_{i = 1}^{j + 1} h_{ij} q_{i}$ , valid for $j \leq m$ , with the last row of $\tilde{H}_{m}$ contributing only $h_{m + 1, m} q_{m + 1} e_{m}^{*}$ . Left-multiplying $A Q_{m} = Q_{m + 1} \tilde{H}_{m}$ by $Q_{m}^{*}$ and using $Q_{m}^{*} Q_{m + 1} = [I_{m} 0]$ extracts $H_{m}$ . Upper-Hessenberg structure is Exercise 3: $A q_{j} \in K_{j + 1}$ forces $q_{i}^{*} A q_{j} = 0$ for $i > j + 1$ . Unreducedness is $h_{j + 1, j} = ∥ w_{j} ∥ > 0$ for $j < m$ , which is the no-breakdown hypothesis. $□$

Proposition 3 (tridiagonalisation and the three-term recurrence for Hermitian $A$ ). If $A = A^ $t h e n$ H_m = T_m $i sr e a l sy mm e t r i c t r i d ia g o na l, an d t h e A r n o l d i r ec u r r e n cer e d u ces t o$ \beta_{j+1}q_{j+1} = Aq_j - \alpha_j q_j - \beta_j q_{j-1} $w i t h$ \alpha_j = q_j^Aq_j \in \mathbb{R} $,$ \beta_{j+1} = h_{j+1,j} > 0$.

Proof. $H_{m}^{*} = (Q_{m}^{*} A Q_{m})^{*} = Q_{m}^{*} A^{*} Q_{m} = Q_{m}^{*} A Q_{m} = H_{m}$ , so $H_{m}$ is Hermitian; combined with upper-Hessenberg ( $h_{ij} = 0$ for $i > j + 1$ ) and conjugate symmetry ( $h_{j i} = \overline{h_{ij}}$ , hence $= 0$ for $j > i + 1$ ), only the diagonal and first off-diagonals survive, giving tridiagonal $T_{m}$ . Its diagonal entries $α_{j} = h_{j j} = q_{j}^{*} A q_{j}$ are real (a Hermitian-form diagonal), and the sub/super-diagonals are conjugate, $h_{j, j - 1} = \overline{h_{j - 1, j}}$ ; a phase rescaling of the $q_{j}$ makes them real positive $β_{j}$ . In the orthogonalisation $w = A q_{j} - \sum_{i \leq j} h_{ij} q_{i}$ , the coefficients $h_{ij}$ with $i < j - 1$ equal $q_{i}^{*} A q_{j}$ , which vanish by tridiagonality, leaving only $i = j$ (giving $α_{j} q_{j}$ ) and $i = j - 1$ (giving $β_{j} q_{j - 1}$ , since $h_{j - 1, j} = \overline{h_{j, j - 1}} = β_{j}$ ). Thus $w = A q_{j} - α_{j} q_{j} - β_{j} q_{j - 1}$ and $β_{j + 1} q_{j + 1} = w$ . $□$

Proposition 4 (Ritz residual identity and the interlacing bound). *For a Ritz pair $(θ, y = Q_{m} s)$ with $H_{m} s = θ s$ , $∥ s ∥ = 1$ , one has $∥ A y - θ y ∥ = ∣ h_{m + 1, m} ∣ ∣ e_{m}^{*} s ∣$ . For Hermitian $A$ the Ritz values interlace the spectrum: the eigenvalues of $T_{m}$ lie in $[λ_{m i n} (A), λ_{m a x} (A)]$ , and the extreme Ritz values are monotone in $m$ .*

Proof. The residual identity is Exercise 4: applying $A Q_{m} = Q_{m} H_{m} + h_{m + 1, m} q_{m + 1} e_{m}^{*}$ to $s$ gives $A y = θ y + h_{m + 1, m} (e_{m}^{*} s) q_{m + 1}$ , and $∥ q_{m + 1} ∥ = 1$ yields the norm. For the spectral bound, the eigenvalues of $T_{m} = Q_{m}^{*} A Q_{m}$ are the stationary values of the Rayleigh quotient $\frac{s ^{*} T _{m} s}{s ^{*} s} = \frac{x ^{*} A x}{x ^{*} x}$ with $x = Q_{m} s \in K_{m}$ ; this quotient is confined to the numerical range of $A$ , which for Hermitian $A$ is $[λ_{m i n}, λ_{m a x}]$ . Hence every Ritz value lies in that interval. Monotonicity follows from $K_{m} \subseteq K_{m + 1}$ : the maximum (minimum) of the Rayleigh quotient over the larger subspace is at least (at most) that over the smaller, so $θ_{m a x}^{(m)} \leq θ_{m a x}^{(m + 1)} \leq λ_{m a x}$ and $θ_{m i n}^{(m)} \geq θ_{m i n}^{(m + 1)} \geq λ_{m i n}$ . By the Cauchy interlacing theorem for the principal-submatrix-like compression, the full eigenvalues of $T_{m + 1}$ interlace those of $T_{m}$ . $□$

Proposition 5 (CG-Lanczos correspondence). For Hermitian positive-definite $A$ , the conjugate-gradient iterate $x_{m} \in x_{0} + K_{m} (A, r_{0})$ minimising the $A$ -norm of the error is recovered from the Lanczos tridiagonal $T_{m}$ by solving $T_{m} z_{m} = ∥ r_{0} ∥ e_{1}$ and setting $x_{m} = x_{0} + Q_{m} z_{m}$ ; the CG short recurrences are the $LDL^ $f a c t or i s a t i o n o f$ T_m$ updated one column at a time.*

Proof sketch. The CG error-minimising iterate over $K_{m}$ is the Galerkin solution: $r_{m} = b - A x_{m} ⊥ K_{m}$ , i.e. $Q_{m}^{*} r_{m} = 0$ . Writing $x_{m} = x_{0} + Q_{m} z_{m}$ and $r_{0} = ∥ r_{0} ∥ q_{1}$ , the condition $Q_{m}^{*} (r_{0} - A Q_{m} z_{m}) = 0$ becomes $∥ r_{0} ∥ e_{1} - Q_{m}^{*} A Q_{m} z_{m} = 0$ , i.e. $T_{m} z_{m} = ∥ r_{0} ∥ e_{1}$ . Since $A ≻ 0$ , $T_{m} = Q_{m}^{*} A Q_{m}$ is SPD and admits an $L D L^{*}$ factorisation; performing it incrementally — each new Lanczos column extends $T_{m}$ by one row and column, updating $L, D$ by one entry — turns the solve into the two coupled two-term recurrences for the CG search direction and iterate, with no need to store $Q_{m}$ . The full statement and the Chebyshev convergence bound are the content of 43.07.04; the correspondence here is that CG is Lanczos with the tridiagonal system solved on the fly. $□$

Connections Master

Eigenvalue, eigenvector, and the spectral radius 01.01.08 supplies the spectral objects the Ritz machinery approximates: the Ritz values are the eigenvalues of the small projected matrix $H_{m}$ , computed by the very theory of 01.01.08, and they converge to the extreme eigenvalues of the large $A$ . The Rayleigh-quotient characterisation that makes the extremal Ritz values monotone is the variational face of the eigenvalue theory there, lifted from single vectors to the nested Krylov subspaces; the minimal-polynomial degree that caps the Krylov grade is the same invariant that governs diagonalisability in that unit.
Reduction to Hessenberg/tridiagonal form 43.06.02 is the dense analogue this unit iterates: there a finite product of Householder similarities produces $Q^{*} A Q$ upper-Hessenberg (tridiagonal when Hermitian) on the whole of $F^{n}$ , here a growing Gram-Schmidt sequence produces $Q_{m}^{*} A Q_{m}$ with the identical band structure on a Krylov subspace. The two reach the same projected form by different routes — direct reflectors versus iterative orthogonalisation — and the band-preservation arguments are mirror images, so the present unit is the large-scale, matrix-free counterpart of that finite reduction; an Arnoldi run to completion ( $m = n$ ) is a Hessenberg reduction.
GMRES 43.07.03 and the conjugate gradient method 43.07.04 are the linear-solver engines this unit powers: GMRES minimises $∥ b - A x ∥$ over $x_{0} + K_{m}$ by solving a small least-squares problem on the Arnoldi $\tilde{H}_{m}$ via Givens rotations, while conjugate gradients is exactly the Lanczos process for SPD $A$ with the tridiagonal $T_{m}$ factored incrementally to give a short three-term iterate recurrence. Both inherit the Krylov-subspace optimality and the Chebyshev convergence mechanism established here, and the stationary-iteration splitting $M$ of 43.07.01 returns in both as a preconditioner that reshapes the spectrum to accelerate the Ritz and residual convergence.

Historical & philosophical context Master

The Krylov subspace is named for Aleksei Nikolaevich Krylov, the Russian naval engineer and applied mathematician who in 1931 used the sequence $b, A b, A^{2} b, \dots$ to compute the characteristic polynomial of a matrix arising in ship-vibration analysis, reducing the eigenvalue problem to the powers of $A$ acting on a single vector. The orthogonalised form was introduced by Walter Edwin Arnoldi in 1951, who proposed reducing a matrix to Hessenberg form by a Gram-Schmidt process on the Krylov sequence as a means of approximating eigenvalues of large matrices without forming the full reduction ^{[Arnoldi 1951]}.

The symmetric specialisation is due to Cornelius Lanczos, whose 1950 paper at the U.S. National Bureau of Standards introduced the three-term recurrence — he called it the method of minimized iterations — for tridiagonalising a symmetric matrix and, in a companion development, for solving linear systems ^{[Lanczos 1950]}. The method was initially distrusted because, in the finite-precision arithmetic of early computers, the Lanczos vectors rapidly lost orthogonality and produced spurious repeated eigenvalues; for two decades it was regarded as numerically unreliable. Christopher Conway Paige's 1971 doctoral thesis at the University of London rehabilitated it by proving that the loss of orthogonality is not random corruption but coincides exactly with the convergence of a Ritz value, so the computed eigenvalues remain accurate ^{[Paige 1971]}. Beresford Parlett and collaborators then developed the selective and partial reorthogonalisation schemes that made the method both efficient and trustworthy, consolidated in The Symmetric Eigenvalue Problem ^{[Parlett, B. N. — The Symmetric Eigenvalue Problem (SIAM Classics, 1998)]}. Yousef Saad built the general (non-symmetric) Arnoldi theory of Ritz-value convergence and the modern large-eigenvalue and Krylov-solver framework that situates Arnoldi and Lanczos as the common engine of GMRES and conjugate gradients ^{[Saad, Y. — Numerical Methods for Large Eigenvalue Problems (2nd ed.)]}.

Bibliography Master

@article{Krylov1931,
  author  = {Krylov, Aleksei N.},
  title   = {O chislennom reshenii uravneniya, kotorym v tekhnicheskikh voprosakh opredelyayutsya chastoty malykh kolebanii material'nykh sistem},
  journal = {Izvestiya Akademii Nauk SSSR, Otdelenie Matematicheskikh i Estestvennykh Nauk},
  volume  = {7},
  number  = {4},
  year    = {1931},
  pages   = {491--539}
}

@article{Arnoldi1951,
  author  = {Arnoldi, Walter E.},
  title   = {The Principle of Minimized Iterations in the Solution of the Matrix Eigenvalue Problem},
  journal = {Quarterly of Applied Mathematics},
  volume  = {9},
  number  = {1},
  year    = {1951},
  pages   = {17--29}
}

@article{Lanczos1950,
  author  = {Lanczos, Cornelius},
  title   = {An Iteration Method for the Solution of the Eigenvalue Problem of Linear Differential and Integral Operators},
  journal = {Journal of Research of the National Bureau of Standards},
  volume  = {45},
  number  = {4},
  year    = {1950},
  pages   = {255--282}
}

@phdthesis{Paige1971,
  author = {Paige, Christopher C.},
  title  = {The Computation of Eigenvalues and Eigenvectors of Very Large Sparse Matrices},
  school = {University of London},
  year   = {1971}
}

@book{Parlett1998,
  author    = {Parlett, Beresford N.},
  title     = {The Symmetric Eigenvalue Problem},
  publisher = {SIAM},
  series    = {Classics in Applied Mathematics},
  year      = {1998}
}

@book{Saad2011,
  author    = {Saad, Yousef},
  title     = {Numerical Methods for Large Eigenvalue Problems},
  edition   = {2},
  publisher = {Society for Industrial and Applied Mathematics},
  year      = {2011}
}

@book{TrefethenBau1997,
  author    = {Trefethen, Lloyd N. and Bau, David},
  title     = {Numerical Linear Algebra},
  publisher = {SIAM},
  address   = {Philadelphia},
  year      = {1997}
}

@book{GolubVanLoan2013,
  author    = {Golub, Gene H. and Van Loan, Charles F.},
  title     = {Matrix Computations},
  edition   = {4},
  publisher = {Johns Hopkins University Press},
  year      = {2013}
}

Prerequisites

01.01.08
01.01.09
43.06.02

Tier anchors

beginner: Building the best small matrix that mimics a giant one, by repeatedly multiplying a single starting vector and tidying the results — Strang 2016 *Introduction to Linear Algebra* 5e (Wellesley-Cambridge) Ch. 6 (eigenvalues and powers of a matrix); Trefethen-Bau 1997 *Numerical Linear Algebra* (SIAM) Lecture 33 (overview of the Arnoldi idea)
intermediate: Trefethen-Bau 1997 *Numerical Linear Algebra* (SIAM) Lectures 33-34 (Arnoldi iteration and the Arnoldi/Lanczos approximation of eigenvalues — the modified-Gram-Schmidt construction and the Hessenberg relation $AQ_m = Q_{m+1}\tilde H_m$); Golub-Van Loan 2013 *Matrix Computations* 4e (Johns Hopkins) §10.5 (the Lanczos and Arnoldi processes)
master: Saad 2011 *Numerical Methods for Large Eigenvalue Problems* 2e (SIAM) Ch. 6 (the Arnoldi method, Ritz values, and convergence) and Ch. 4 (the Lanczos algorithm in exact and finite precision); Parlett 1998 *The Symmetric Eigenvalue Problem* (SIAM Classics) Ch. 13 (the Lanczos algorithm, Paige's loss-of-orthogonality theory, and reorthogonalisation); Trefethen-Bau 1997 *Numerical Linear Algebra* (SIAM) Lectures 33-36

References

Trefethen, L. N. & Bau, D. — Numerical Linear Algebra (SIAM, 1997) · Lectures 33-34: the Krylov subspace K_m = span{b, Ab, ..., A^{m-1}b}, the Arnoldi iteration as modified Gram-Schmidt orthogonalisation of the Krylov sequence, the Hessenberg relation A Q_m = Q_{m+1} H-tilde_m and the projected H_m = Q_m^* A Q_m, and the Arnoldi/Rayleigh-Ritz approximation of eigenvalues (Ritz values as eigenvalues of H_m).
Saad, Y. — Numerical Methods for Large Eigenvalue Problems (2nd ed.) · SIAM, 2011. Ch. 6: the Arnoldi process, the orthogonal projection of A onto the Krylov subspace, Ritz values and Ritz vectors, the residual bound ||A y_i - theta_i y_i|| = |h_{m+1,m}| |e_m^* s_i| for a Ritz pair, and the convergence of extremal Ritz values; Ch. 4: the symmetric Lanczos algorithm, the three-term recurrence, the tridiagonal T_m, and its finite-precision behaviour.
Parlett, B. N. — The Symmetric Eigenvalue Problem (SIAM Classics, 1998) · Ch. 13: the Lanczos algorithm, the three-term recurrence beta_{j+1} q_{j+1} = A q_j - alpha_j q_j - beta_j q_{j-1}, Paige's theorem that loss of orthogonality coincides with convergence of a Ritz value, the appearance of spurious 'ghost' eigenvalues, and the full / selective / partial reorthogonalisation remedies.
Golub, G. H. & Van Loan, C. F. — Matrix Computations (4th ed.) · Johns Hopkins University Press, 2013. §10.1-10.5: Krylov subspace properties, the Arnoldi factorisation A Q_m = Q_m H_m + h_{m+1,m} q_{m+1} e_m^*, the symmetric Lanczos tridiagonalisation, the cost and storage of the two processes, and their role as the inner engine of GMRES and conjugate gradients.

Estimated time

beginner: 18m
intermediate: 45m
master: 88m