43.06.04 · numerical-analysis / 06-eigenvalue-algorithms

Bauer-Fike and the conditioning of eigenvalues

shipped · 3 tiers · Lean: none

Anchor (Master): Trefethen-Bau *Numerical Linear Algebra* Lectures 34, 33; Trefethen-Embree *Spectra and Pseudospectra* (Princeton, 2005) Ch. 1-2 and §52; Stewart-Sun *Matrix Perturbation Theory* (Academic Press, 1990) Ch. IV-V; Kato *Perturbation Theory for Linear Operators* Ch. II §1-2; Golub-Van Loan *Matrix Computations* (4th ed.) §7.2-§7.3

Intuition Beginner

Eigenvalues are the special stretch factors of a matrix, and a fair question is how much they move when the matrix is nudged. If you measure a matrix from real data, every entry carries a little error, and you would like to know whether the eigenvalues you compute are trustworthy or whether a tiny change in the matrix could throw them off badly.

For some matrices the eigenvalues are rock solid: a small change in the matrix produces an equally small change in every eigenvalue. These are the well-behaved matrices, the ones whose special directions sit at right angles to each other, like the axes of a tilted but undistorted grid. Symmetric matrices are the headline example, and for them an error of a certain size moves each eigenvalue by no more than that same size.

Other matrices are touchy. Their special directions lean hard against one another, almost pointing the same way, and when directions nearly coincide the matrix is close to one that has a genuine defect. For such a matrix a microscopic change in the entries can move an eigenvalue by a large amount. The single number that measures this touchiness is how skewed the eigenvector directions are, and the Bauer-Fike theorem turns that skewness into a guaranteed bound on how far the eigenvalues can wander.

Visual Beginner

The picture contrasts two matrices under the same small perturbation. For the well-behaved matrix the eigenvalues barely twitch; for the touchy one they smear out into a wide blot. The difference is entirely in how the eigenvector directions sit relative to each other.

Two facts are visible. When the eigenvector arrows are perpendicular the wiggle circles are as small as the perturbation itself. When the arrows nearly line up the wiggle circles balloon, and their radius is the perturbation size multiplied by a skewness factor that grows without bound as the directions merge.

Worked example Beginner

Take the matrix

$$ A = \begin{pmatrix} 1 & 1000 \\ 0 & 2 \end{pmatrix}. $$

Its eigenvalues are $1$ and $2$ , read straight off the diagonal because the matrix is triangular. The eigenvector for $1$ is $(1, 0)$ , and the eigenvector for $2$ turns out to be $(1000, 1)$ rescaled, which points almost flat along the horizontal. So the two eigenvector directions are nearly parallel, a warning sign of touchiness.

Step 1. Perturb the matrix in its bottom-left corner, which started at $0$ :

$$ A + E = \begin{pmatrix} 1 & 1000 \\ 0.001 & 2 \end{pmatrix}. $$

The change has size $0.001$ , very small.

Step 2. Find the new eigenvalues. They solve $(1 - t) (2 - t) - (1000) (0.001) = 0$ , that is $(1 - t) (2 - t) - 1 = 0$ , which expands to $t^{2} - 3 t + 1 = 0$ .

Step 3. Solve: $t = (3 \pm \sqrt{9 - 4}) /2 = (3 \pm \sqrt{5}) /2$ , so the eigenvalues are about $2.618$ and $0.382$ .

Step 4. Compare. The eigenvalues moved from $1$ and $2$ to about $0.382$ and $2.618$ — each shifted by roughly $0.6$ . A change of size $0.001$ in the matrix moved the eigenvalues by $0.6$ , about six hundred times as much.

What this tells us: the huge off-diagonal entry $1000$ makes the eigenvector directions nearly parallel, and that near-parallelism amplifies a tiny perturbation into a large eigenvalue shift. A symmetric matrix, whose eigenvectors are perpendicular, would never let this happen.
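The four steps can be reproduced numerically. The following is a minimal Python sketch, assuming only the standard library; the helper name `eig2` is illustrative and not part of the unit. It applies the quadratic formula to the characteristic polynomial of a $2 \times 2$ matrix, exactly as in Steps 2 and 3.

```python
import math

def eig2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from the quadratic formula.

    Assumes the eigenvalues are real, which holds for both matrices below.
    """
    tr, det = a + d, a * d - b * c
    root = math.sqrt(tr * tr - 4.0 * det)
    return (tr - root) / 2.0, (tr + root) / 2.0

lam = eig2(1.0, 1000.0, 0.0, 2.0)      # original matrix A
mu = eig2(1.0, 1000.0, 0.001, 2.0)     # perturbed matrix A + E

# Largest movement of an eigenvalue, and how much the perturbation was amplified.
shift = max(abs(mu[0] - lam[0]), abs(mu[1] - lam[1]))
amplification = shift / 0.001
print(lam, mu, shift, amplification)
```

The printed amplification is close to $618$, matching the "about six hundred times" estimate of Step 4.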

Check your understanding Beginner

Exercise (easy, multiple choice).

For which kind of matrix are the eigenvalues the least sensitive to small changes in the entries?

A. A matrix whose eigenvector directions nearly coincide. B. A symmetric matrix, whose eigenvectors are perpendicular. C. A triangular matrix with a huge off-diagonal entry. D. A matrix close to one with a repeated defective eigenvalue.

Hint

Sensitivity grows when the eigenvector directions lean against each other; it is smallest when they are at right angles.

Answer

B. A symmetric matrix, whose eigenvectors are perpendicular. Perpendicular eigenvectors give the smallest possible skewness factor, so each eigenvalue moves by no more than the size of the perturbation. Feedback-correct: symmetric and other normal matrices have perfectly conditioned eigenvalues. Feedback-wrong: nearly coinciding directions, a huge off-diagonal entry, and nearness to a defect all amplify sensitivity rather than reduce it.

Formal definition Intermediate+

Let $A \in M_{n} (\mathbb{C})$ be diagonalisable, $A = V Λ V^{- 1}$ , where $Λ = \operatorname{diag} (λ_{1}, \dots, λ_{n})$ collects the eigenvalues and the columns of $V$ are right eigenvectors, $A V = V Λ$ , in the sense of 01.01.08. Let $∥ \cdot ∥_{p}$ denote a vector $p$ -norm and the matrix norm it induces, and write $κ_{p} (V) = ∥ V ∥_{p} ∥ V^{- 1} ∥_{p}$ for the condition number of the eigenvector matrix, the same quantity that 43.01.02 defines as the conditioning of a linear solve. The spectrum is $σ (A) = \{λ_{1}, \dots, λ_{n}\}$ .

Definition (eigenvalue conditioning). A matrix $A$ is said to have well-conditioned eigenvalues when small perturbations of $A$ produce comparably small movements of $σ (A)$ , and ill-conditioned eigenvalues otherwise. The Bauer-Fike theorem below makes "comparably small" precise through the factor $κ_{p} (V)$ , and the per-eigenvalue refinement uses the left and right eigenvectors of a single eigenvalue.

Definition (left and right eigenvectors; condition number of a simple eigenvalue). Let $λ$ be a simple eigenvalue of $A$ (algebraic multiplicity one). A right eigenvector is a nonzero $x$ with $A x = λ x$ ; a left eigenvector is a nonzero $y$ with $y^{*} A = λ y^{*}$ , equivalently $A^{*} y = \bar{λ} y$ . For a simple eigenvalue $y^{*} x \neq 0$ , and the condition number of the eigenvalue is

κ (λ) = \frac{∥ x ∥ _{2} ∥ y ∥ _{2}}{∣ y ^{*} x ∣}, s (λ) = \frac{1}{κ ( λ )} = \frac{∣ y ^{*} x ∣}{∥ x ∥ _{2} ∥ y ∥ _{2}},

with $x, y$ normalised to unit length giving $κ (λ) = 1/∣ y^{*} x ∣ = 1/ cos θ$ , where $θ = ∠ (x, y)$ is the acute angle between the left and right eigenvectors. The reciprocal $s (λ)$ is the separation or eigenvalue sensitivity.

Definition ( $ϵ$ -pseudospectrum). For $ϵ > 0$ , the $ϵ$ -pseudospectrum of $A$ in the norm $∥ \cdot ∥_{2}$ is

σ_{ϵ} (A) = {z \in C : z \in σ (A + E) for some E with ∥ E ∥_{2} \leq ϵ} = {z \in C : ∥ (z I - A)^{- 1} ∥_{2} \geq ϵ^{- 1}},

with the convention $∥ (z I - A)^{- 1} ∥_{2} = \infty$ for $z \in σ (A)$ . The two descriptions coincide (proved in the Advanced results). The pseudospectrum is the qualitative picture of eigenvalue sensitivity: for a normal matrix it is the union of closed $ϵ$ -discs about the eigenvalues, while for a non-normal matrix it can bulge far beyond them.

Notation: $V^{- 1}$ is the inverse of the eigenvector matrix; $y^{*}$ is the conjugate transpose of $y$ ; $∠ (x, y)$ is the acute angle between the lines spanned by $x$ and $y$ ; $σ (A)$ is the spectrum and $σ_{ϵ} (A)$ the pseudospectrum; the resolvent is $(z I - A)^{- 1}$ , defined for $z \notin σ (A)$ . A matrix is normal when $A^{*} A = A A^{*}$ , which is equivalent to having a unitary eigenvector matrix and hence $κ_{2} (V) = 1$ .

Counterexamples to common slips

Bauer-Fike bounds the distance from every perturbed eigenvalue to some original eigenvalue; it does not pair them up. A perturbation can pull two eigenvalues toward a common point or split a tight pair, so the bound is a one-sided exclusion statement, not a matching.
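The factor is $κ (V)$ , the conditioning of the eigenvector matrix, not $κ (A)$ . A matrix can have $κ (A)$ enormous yet perfectly conditioned eigenvalues (any normal matrix with a wide spectrum), and a matrix can have modest $κ (A)$ yet ill-conditioned eigenvalues (for instance a nearly defective triangular matrix with diagonal entries $1$ and $1 + δ$ and off-diagonal entry $1$ , for small $δ > 0$ ). The second phenomenon needs $κ_{2} (A) > 1$ , because $κ_{2} (A) = 1$ forces $A$ to be a scalar multiple of a unitary matrix and hence normal. The two condition numbers measure different problems.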
The per-eigenvalue formula $1/∣ y^{*} x ∣$ requires a simple eigenvalue. At a defective eigenvalue the right and left eigenvectors become orthogonal, $y^{*} x \to 0$ , the condition number blows up, and the perturbation expansion ceases to be first order: the eigenvalue moves like a fractional power $ϵ^{1/ m}$ of the perturbation for a Jordan block of size $m$ .
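The gap between $κ (A)$ and $κ (V)$ can be seen on a small example. The sketch below is a hedged illustration in standard-library Python: the matrix, the gap `delta`, and the helper names are assumptions chosen for this example, and the induced $\infty$ -norm (maximum absolute row sum) is used only because it is easy to evaluate without a singular value decomposition.

```python
def inf_norm(M):
    """Induced infinity-norm of a matrix stored as a list of rows."""
    return max(sum(abs(x) for x in row) for row in M)

def inv2(M):
    """Inverse of a 2x2 matrix [[a, b], [c, d]] (assumes a nonzero determinant)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def cond_inf(M):
    """Condition number ||M|| * ||M^{-1}|| in the infinity-norm."""
    return inf_norm(M) * inf_norm(inv2(M))

delta = 1e-6                        # hypothetical gap between the two eigenvalues
A = [[1.0, 1.0], [0.0, 1.0 + delta]]
V = [[1.0, 1.0], [0.0, delta]]      # columns: eigenvectors (1, 0) and (1, delta)

kappa_A = cond_inf(A)               # conditioning of the linear solve
kappa_V = cond_inf(V)               # conditioning of the eigenvector basis
print(kappa_A, kappa_V)
```

Here $κ_{\infty} (A)$ stays near $4$ while $κ_{\infty} (V)$ grows like $2/δ$ , so the linear solve is benign and the eigenproblem is not.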

Key theorem with proof Intermediate+

Theorem (Bauer-Fike; Bauer-Fike 1960 ^{[source pending]}; Trefethen-Bau Lecture 34 ^{[source pending]}). Let $A \in M_{n} (C)$ be diagonalisable, $A = V Λ V^{- 1}$ , and let $μ$ be any eigenvalue of the perturbed matrix $A + E$ . Then there is an eigenvalue $λ_{i}$ of $A$ with

∣ μ - λ_{i} ∣ \leq κ_{p} (V) ∥ E ∥_{p},

where $κ_{p} (V) = ∥ V ∥_{p} ∥ V^{- 1} ∥_{p}$ and $∥ \cdot ∥_{p}$ is the matrix norm induced by a vector $p$ -norm, $1 \leq p \leq \infty$ . In particular every eigenvalue of $A + E$ lies in the union of the closed discs of radius $κ_{p} (V) ∥ E ∥_{p}$ centred at the eigenvalues of $A$ .

Proof. If $μ \in σ (A)$ the bound holds with $∣ μ - λ_{i} ∣ = 0$ , so assume $μ \notin σ (A)$ ; then $Λ - μ I$ is invertible. Let $v \neq 0$ satisfy $(A + E) v = μ v$ , that is $(A - μ I) v = - E v$ . Substitute $A = V Λ V^{- 1}$ and apply $V^{- 1}$ on the left:

(Λ - μ I) V^{- 1} v = - V^{- 1} E v = - V^{- 1} E V (V^{- 1} v) .

Write $w = V^{- 1} v \neq 0$ . Since $Λ - μ I$ is invertible,

w = - (Λ - μ I)^{- 1} V^{- 1} E V w .

Take norms and use submultiplicativity:

∥ w ∥_{p} \leq ∥ (Λ - μ I)^{- 1} ∥_{p} ∥ V^{- 1} ∥_{p} ∥ E ∥_{p} ∥ V ∥_{p} ∥ w ∥_{p} .

Divide by $∥ w ∥_{p} > 0$ :

1 \leq ∥ (Λ - μ I)^{- 1} ∥_{p} κ_{p} (V) ∥ E ∥_{p} .

Because $Λ - μ I$ is diagonal, so is its inverse, and the induced $p$ -norm of a diagonal matrix is its largest entry modulus: $∥ (Λ - μ I)^{- 1} ∥_{p} = \max_{i} ∣ λ_{i} - μ ∣^{- 1} = (\min_{i} ∣ λ_{i} - μ ∣)^{- 1}$ . Substituting,

1 \leq \frac{κ _{p} ( V ) ∥ E ∥ _{p}}{\min _{i} ∣ λ _{i} - μ ∣}, hence \min_{i} ∣ λ_{i} - μ ∣ \leq κ_{p} (V) ∥ E ∥_{p},

which is the claim with $λ_{i}$ the minimising eigenvalue. $□$
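As a sanity check, the bound can be evaluated on the matrix of the Beginner worked example. This is a minimal standard-library Python sketch under stated assumptions: it uses the induced $\infty$ -norm, takes the eigenvector columns as $(1, 0)$ and $(1, 0.001)$ (the second being $(1000, 1)$ rescaled), and the helper names are illustrative.

```python
import math

def inf_norm(M):
    """Induced infinity-norm: maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in M)

def eig2(M):
    """Eigenvalues of a 2x2 matrix; assumes they are real."""
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    root = math.sqrt(tr * tr - 4.0 * det)
    return (tr - root) / 2.0, (tr + root) / 2.0

# A = [[1, 1000], [0, 2]] with eigenvalues 1 and 2.
V = [[1.0, 1.0], [0.0, 0.001]]          # eigenvector columns (1, 0) and (1, 0.001)
V_inv = [[1.0, -1000.0], [0.0, 1000.0]]
kappa_V = inf_norm(V) * inf_norm(V_inv)

eps = 0.001
E = [[0.0, 0.0], [eps, 0.0]]
A_plus_E = [[1.0, 1000.0], [eps, 2.0]]

bound = kappa_V * inf_norm(E)           # Bauer-Fike radius
# Distance from each perturbed eigenvalue to the nearest eigenvalue of A.
dists = [min(abs(mu - lam) for lam in (1.0, 2.0)) for mu in eig2(A_plus_E)]
print(kappa_V, bound, dists)
```

With this scaling the radius is about $2.0$ against an observed movement of about $0.62$ , so the bound holds with some slack. The factor $κ_{p} (V)$ depends on how the eigenvector columns are scaled, and the theorem holds for every valid choice of $V$ .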

Bridge. The Bauer-Fike bound builds toward the per-eigenvalue condition number $1/∣ y^{*} x ∣$ and the pseudospectral picture of the Advanced results, where the global factor $κ (V)$ is refined into one sensitivity number for each eigenvalue, and it appears again in 43.06.03 as the reason the QR algorithm can compute the eigenvalues of a normal matrix to full accuracy while a non-normal matrix limits the attainable accuracy. The foundational reason the factor is $κ (V)$ and not $κ (A)$ is that the diagonalising change of basis $V$ is exactly the lens through which the perturbation $E$ is viewed: the conjugated perturbation $V^{- 1} E V$ carries the norm inflation $κ (V)$ , and this is exactly the conditioning of the eigenvector basis rather than of the linear solve. The normal case generalises the Hermitian Weyl bound of 01.01.14: when $A$ is normal, $V$ is unitary, $κ_{2} (V) = 1$ , and Bauer-Fike collapses to $∣ μ - λ_{i} ∣ \leq ∥ E ∥_{2}$ , the central insight being that perpendicularity of eigenvectors is the precise structural condition for perfect eigenvalue conditioning. Putting these together, eigenvalue sensitivity is governed not by the size of the matrix or its inverse but by how far its eigenvectors are from orthogonal, and the bridge is the recognition that the same skewness factor reappears, sharpened to a single eigenvalue, as $1/ cos ∠ (x, y)$ .

Exercises Intermediate+

Exercise 4 (medium, symbolic).

Show that if $A$ is normal then $κ_{2} (V) = 1$ for a suitable eigenvector matrix $V$ , and conclude that Bauer-Fike sharpens to $∣ μ - λ_{i} ∣ \leq ∥ E ∥_{2}$ .

Hint

A normal matrix is unitarily diagonalisable; a unitary $V$ satisfies $∥ V ∥_{2} = ∥ V^{- 1} ∥_{2} = 1$ .

Answer

By the spectral theorem for normal operators, a normal $A$ admits an orthonormal eigenbasis, so its eigenvector matrix $V$ can be taken unitary, $V^{*} V = I$ . A unitary matrix preserves the Euclidean norm, $∥ V w ∥_{2} = ∥ w ∥_{2}$ for all $w$ , so $∥ V ∥_{2} = 1$ , and likewise $V^{- 1} = V^{*}$ is unitary with $∥ V^{- 1} ∥_{2} = 1$ . Hence $κ_{2} (V) = ∥ V ∥_{2} ∥ V^{- 1} ∥_{2} = 1$ . Substituting into Bauer-Fike, $min_{i} ∣ μ - λ_{i} ∣ \leq κ_{2} (V) ∥ E ∥_{2} = ∥ E ∥_{2}$ . So for a normal matrix the eigenvalues are perfectly conditioned in the $2$ -norm. Rubric: full credit for the unitary diagonalisation, $κ_{2} (V) = 1$ , and the sharpened bound.

Exercise 5 (medium, symbolic).

Let $A$ have a simple eigenvalue $λ$ with right eigenvector $x$ ( $A x = λ x$ ) and left eigenvector $y$ ( $y^{*} A = λ y^{*}$ ), and consider the analytic perturbation $A + tE$ . Assuming the eigenvalue varies analytically as $λ (t)$ with $λ (0) = λ$ and right eigenvector $x (t)$ , derive the first-order coefficient $λ^{'} (0) = y^{*} E x / (y^{*} x)$ .

Hint

Differentiate $(A + tE) x (t) = λ (t) x (t)$ at $t = 0$ and left-multiply by $y^{*}$ , using $y^{*} A = λ y^{*}$ .

Answer

Differentiate $(A + tE) x (t) = λ (t) x (t)$ at $t = 0$ : $$ E x + A x'(0) = \lambda'(0) x + \lambda x'(0). $$ Left-multiply by $y^{*}$ : $$ y^* E x + y^* A x'(0) = \lambda'(0)\, y^* x + \lambda\, y^* x'(0). $$ Since $y^{*} A = λ y^{*}$ , the term $y^{*} A x^{'} (0) = λ y^{*} x^{'} (0)$ cancels the last term on the right. What remains is $y^{*} E x = λ^{'} (0) y^{*} x$ , and because $λ$ is simple $y^{*} x \neq 0$ , so $$ \lambda'(0) = \frac{y^* E x}{y^* x}. $$ Taking norms, $∣ λ^{'} (0) ∣ \leq ∥ E ∥_{2} ∥ x ∥_{2} ∥ y ∥_{2} /∣ y^{*} x ∣ = κ (λ) ∥ E ∥_{2}$ , which is the per-eigenvalue condition number. Rubric: full credit for the differentiation, the cancellation using the left eigenvector, and the quotient.

Exercise 7 (hard, short-answer).

Prove Gershgorin's disc theorem: every eigenvalue of $A \in M_{n} (\mathbb{C})$ lies in the union of the discs $D_{i} = \{z : ∣ z - a_{ii} ∣ \leq R_{i}\}$ with $R_{i} = \sum_{j \neq i} ∣ a_{ij} ∣$ .

Hint

Let $A x = λ x$ and let $k$ be an index where $∣ x_{k} ∣$ is largest. Look at the $k$ -th component of the eigenvector equation.

Answer

Let $λ$ be an eigenvalue with eigenvector $x \neq 0$ , and choose $k$ with $∣ x_{k} ∣ = \max_{j} ∣ x_{j} ∣ > 0$ . The $k$ -th row of $A x = λ x$ reads $\sum_{j} a_{k j} x_{j} = λ x_{k}$ , that is $$ (\lambda - a_{kk}) x_k = \sum_{j \neq k} a_{kj} x_j. $$ Take moduli and use the triangle inequality: $$ |\lambda - a_{kk}|\,|x_k| \le \sum_{j \neq k} |a_{kj}|\,|x_j| \le \Big(\sum_{j \neq k}|a_{kj}|\Big) |x_k| = R_k\,|x_k|, $$ where the second step uses $∣ x_{j} ∣ \leq ∣ x_{k} ∣$ for every $j$ . Dividing by $∣ x_{k} ∣ > 0$ gives $∣ λ - a_{k k} ∣ \leq R_{k}$ , so $λ \in D_{k} \subseteq \bigcup_{i} D_{i}$ . Since $λ$ was arbitrary, every eigenvalue lies in the union of the discs. Rubric: full credit for the largest-component choice, the $k$ -th row inequality, and the disc conclusion.
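The disc construction is easy to compute. The following standard-library Python sketch is an illustration only: the $2 \times 2$ test matrix is a hypothetical example, and the helper names are not from the unit. The disc routine works for any square matrix stored as a list of rows, while the eigenvalue helper is restricted to real $2 \times 2$ matrices with real eigenvalues.

```python
import math

def gershgorin_discs(A):
    """Return (centre, radius) pairs: centre a_ii, radius sum of |a_ij| over j != i."""
    return [(row[i], sum(abs(x) for j, x in enumerate(row) if j != i))
            for i, row in enumerate(A)]

def eig2(M):
    """Eigenvalues of a 2x2 matrix; assumes they are real."""
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    root = math.sqrt(tr * tr - 4.0 * det)
    return (tr - root) / 2.0, (tr + root) / 2.0

A = [[4.0, 1.0], [2.0, 1.0]]        # hypothetical example matrix
discs = gershgorin_discs(A)
eigs = eig2(A)

# Each eigenvalue must lie in at least one disc.
covered = [any(abs(lam - c) <= r for c, r in discs) for lam in eigs]
print(discs, eigs, covered)
```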

Exercise 8 (hard, symbolic).

Show that for $z \notin σ (A)$ the resolvent norm controls perturbation reachability: if $∥ (z I - A)^{- 1} ∥_{2} \geq 1/ ϵ$ then there exists $E$ with $∥ E ∥_{2} \leq ϵ$ such that $z \in σ (A + E)$ . (This is one inclusion of the pseudospectrum equivalence.)

Hint

Let $u$ be a unit vector with $∥ (z I - A)^{- 1} u ∥_{2} = ∥ (z I - A)^{- 1} ∥_{2}$ , set $v = (z I - A)^{- 1} u$ , and build a rank-one $E$ with $E v = u$ , which makes $v$ an eigenvector of $A + E$ for the eigenvalue $z$ .

Answer

Let $R = (z I - A)^{- 1}$ with $∥ R ∥_{2} = ∥ R u ∥_{2}$ for some unit vector $u$ , and put $v = R u$ , so $∥ v ∥_{2} = ∥ R ∥_{2} \geq 1/ ϵ$ and $(z I - A) v = u$ , that is $A v = z v - u$ . Define the rank-one perturbation $$ E = \frac{u\, v^*}{\|v\|_2^2}, \qquad \|E\|_2 = \frac{\|u\|_2\,\|v\|_2}{\|v\|_2^2} = \frac{1}{\|v\|_2} \le \epsilon. $$ Then $E v = u\,(v^* v)/\|v\|_2^2 = u$ , so $$ (A + E)v = Av + Ev = (zv - u) + u = zv. $$ Thus $z \in σ (A + E)$ with $∥ E ∥_{2} = 1/∥ v ∥_{2} \leq ϵ$ . Rubric: full credit for the rank-one construction, the norm bound $\leq ϵ$ , and the verification $(A + E) v = z v$ .
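The rank-one construction can be carried out by hand on a $2 \times 2$ matrix. The sketch below is a hedged standard-library Python illustration with assumed inputs: the matrix is the one from the Beginner worked example, the point $z = 1.5$ is a hypothetical target, and $u = (0, 1)$ is simply a convenient unit vector. The construction works for any unit $u$ with $∥ R u ∥_{2} \geq 1/ ϵ$ ; choosing the norm-attaining $u$ , as in the answer, gives the smallest possible $E$ .

```python
import math

A = [[1.0, 1000.0], [0.0, 2.0]]
z = 1.5                                   # hypothetical target point, not an eigenvalue of A

# Resolvent R = (zI - A)^{-1} via the 2x2 inverse formula.
a, b, c, d = z - A[0][0], -A[0][1], -A[1][0], z - A[1][1]
det = a * d - b * c
R = [[d / det, -b / det], [-c / det, a / det]]

u = [0.0, 1.0]                            # a unit vector (not necessarily norm-attaining)
v = [R[0][0] * u[0] + R[0][1] * u[1], R[1][0] * u[0] + R[1][1] * u[1]]
v_norm_sq = v[0] ** 2 + v[1] ** 2

# Rank-one perturbation E = u v^T / ||v||^2 (real case), so ||E||_2 = 1 / ||v||_2.
E = [[u[i] * v[j] / v_norm_sq for j in range(2)] for i in range(2)]
E_norm = 1.0 / math.sqrt(v_norm_sq)

# Residual of the eigen-equation (A + E) v = z v.
B = [[A[i][j] + E[i][j] for j in range(2)] for i in range(2)]
residual = [B[i][0] * v[0] + B[i][1] * v[1] - z * v[i] for i in range(2)]
print(E_norm, residual)
```

The perturbation has norm about $2.5 \times 10^{- 4}$ , yet it places an eigenvalue at $z = 1.5$ , a distance $0.5$ from the spectrum of $A$ .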

Advanced results Master

Theorem (condition number of a simple eigenvalue; Stewart-Sun Ch. V ^{[source pending]}; Kato Ch. II §2 ^{[source pending]}). Let $λ$ be a simple eigenvalue of $A$ with unit right eigenvector $x$ and unit left eigenvector $y$ , $A x = λ x$ , $y^{*} A = λ y^{*}$ . Under the analytic perturbation $A + tE$ the eigenvalue branch $λ (t)$ with $λ (0) = λ$ is analytic near $t = 0$ with

λ (t) = λ + t \frac{y ^{*} E x}{y ^{*} x} + O (t^{2}), ∣ λ (t) - λ ∣ \leq \frac{∥ E ∥ _{2}}{∣ y ^{*} x ∣} ∣ t ∣ + O (t^{2}) .

The condition number $κ (λ) = 1/∣ y^{*} x ∣ = 1/ \cos ∠ (x, y)$ governs the first-order sensitivity, and it equals $1$ exactly when the left and right eigenvectors are parallel.

For a normal matrix the left and right eigenvectors of a simple eigenvalue coincide up to phase, so $∣ y^{*} x ∣ = 1$ and $κ (λ) = 1$ : this is the per-eigenvalue refinement of the global statement $κ_{2} (V) = 1$ from Bauer-Fike. The condition number is independent of any scaling of $x$ and $y$ because the ratio $∥ x ∥∥ y ∥/∣ y^{*} x ∣$ is homogeneous of degree zero in each. As $A$ approaches a matrix with a defective $λ$ , the eigenvectors $x$ and $y$ rotate toward orthogonality, $y^{*} x \to 0$ , and $κ (λ) \to \infty$ , the analytic boundary at which the first-order expansion fails and the eigenvalue begins to move like a fractional power of the perturbation.
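The first-order coefficient can be compared with a finite difference. The following is a minimal standard-library Python sketch under stated assumptions: it uses the matrix $A$ of the Beginner worked example with its eigenvalue $λ = 1$ , a hypothetical unit perturbation direction $E$ in the bottom-left corner, and un-normalised real eigenvectors (so $y^{*} = y^{T}$ ).

```python
import math

# A = [[1, 1000], [0, 2]], simple eigenvalue lam = 1.
x = [1.0, 0.0]                  # right eigenvector: A x = x
y = [1.0, -1000.0]              # left eigenvector:  y^T A = y^T
E = [[0.0, 0.0], [1.0, 0.0]]    # hypothetical perturbation direction, ||E||_2 = 1

Ex = [E[0][0] * x[0] + E[0][1] * x[1], E[1][0] * x[0] + E[1][1] * x[1]]
yEx = y[0] * Ex[0] + y[1] * Ex[1]
yx = y[0] * x[0] + y[1] * x[1]

predicted_slope = yEx / yx                                # lambda'(0) = y^* E x / (y^* x)
kappa_lam = math.hypot(*x) * math.hypot(*y) / abs(yx)     # ||x|| ||y|| / |y^* x|

def lower_eig(t):
    """Smaller eigenvalue of A + tE = [[1, 1000], [t, 2]] by the quadratic formula."""
    tr, det = 3.0, 2.0 - 1000.0 * t
    return (tr - math.sqrt(tr * tr - 4.0 * det)) / 2.0

t = 1e-9
observed_slope = (lower_eig(t) - 1.0) / t                 # finite-difference estimate
print(predicted_slope, observed_slope, kappa_lam)
```

The predicted slope is $- 1000$ , and its modulus sits just below $κ (λ) ∥ E ∥_{2} \approx 1000.0005$ , so this particular direction nearly attains the condition-number bound.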

Theorem (defective eigenvalue: fractional-power perturbation; Kato Ch. II §1 ^{[source pending]}). Let $λ$ be an eigenvalue of $A$ whose largest Jordan block has size $m \geq 2$ . For a generic perturbation $A + tE$ , the $m$ eigenvalues splitting off from $λ$ behave like

λ_{k} (t) = λ + c (t)^{1/ m} ω^{k} + o (t^{1/ m}), ω = e^{2 π i / m}, k = 0, \dots, m - 1,

a Puiseux series in $t^{1/ m}$ rather than a Taylor series in $t$ . A defective eigenvalue is therefore infinitely ill-conditioned in the first-order sense: a perturbation of size $ϵ$ moves it by order $ϵ^{1/ m}$ , which dwarfs $ϵ$ as $ϵ \to 0$ . The companion matrix of a polynomial with a repeated root exhibits exactly this, and it is why root-finding through the companion eigenproblem inherits the conditioning of multiple roots. The Bauer-Fike factor $κ (V)$ diverges in this limit because $V$ becomes singular as the eigenvectors merge, consistent with the breakdown of diagonalisability.
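The square-root behaviour for $m = 2$ is visible on the smallest Jordan block. This standard-library Python sketch is an illustration under assumptions: the perturbation is placed in the bottom-left corner, and the helper name `eig2` is hypothetical. The eigenvalues of the perturbed block are $\pm \sqrt{t}$ , so the ratio of eigenvalue movement to perturbation size grows like $t^{- 1/2}$ .

```python
import cmath

def eig2(M):
    """Eigenvalues of a 2x2 matrix via the quadratic formula (complex-safe)."""
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr - root) / 2.0, (tr + root) / 2.0

ratios = []
for t in (1e-4, 1e-8, 1e-12):
    # Jordan block [[0, 1], [0, 0]] perturbed by t in the bottom-left corner.
    mu = eig2([[0.0, 1.0], [t, 0.0]])
    shift = max(abs(m) for m in mu)      # distance from the defective eigenvalue 0
    ratios.append(shift / t)
    print(t, shift, shift / t)
```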

Theorem (pseudospectrum: three equivalent definitions; Trefethen-Embree Ch. 1-2 ^{[source pending]}). For $A \in M_{n} (C)$ and $ϵ > 0$ , the following sets coincide:

σ_{ϵ} (A) = {z : ∥ (z I - A)^{- 1} ∥_{2} \geq ϵ^{- 1}} = {z : z \in σ (A + E) for some E with ∥ E ∥_{2} \leq ϵ} = {z : σ_{\min} (z I - A) \leq ϵ},

where $σ_{\min}$ is the smallest singular value (with $∥ (z I - A)^{- 1} ∥ = \infty$ for $z \in σ (A)$ ). The resolvent-norm form makes the pseudospectrum computable as a sublevel set of $z \mapsto σ_{\min} (z I - A)$ , the perturbation form gives its meaning as the reachable spectra under bounded perturbations, and the singular-value form is the bridge between the two. For a normal $A$ the resolvent norm is exactly $1/ dist (z, σ (A))$ , so $σ_{ϵ} (A)$ is the union of $ϵ$ -discs about the eigenvalues; for a non-normal $A$ the resolvent norm can be enormous far from the spectrum, and the pseudospectrum bulges accordingly. The pseudospectrum, not the spectrum, governs transient growth of $∥ e^{t A} ∥$ and $∥ A^{k} ∥$ and the behaviour of non-normal iterations, which is why an eigenvalue computation on a non-normal matrix reports the spectrum while the trustworthy information lies in the pseudospectrum.
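The singular-value form gives a direct membership test. The sketch below is a hedged standard-library Python illustration restricted to $2 \times 2$ matrices; the function names and the sample point $z = 1.5$ are assumptions made for this example. It uses the identities $σ_{\max}^{2} + σ_{\min}^{2} = ∥ M ∥_{F}^{2}$ and $σ_{\max} σ_{\min} = ∣ \det M ∣$ , computing $σ_{\min}$ as $∣ \det M ∣ / σ_{\max}$ to avoid cancellation.

```python
import math

def sigma_min_2x2(M):
    """Smallest singular value of a (possibly complex) 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = M
    fro_sq = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2
    det_abs = abs(a * d - b * c)
    # sigma_max^2 is the larger root of s^2 - fro_sq * s + det_abs^2 = 0.
    disc = max(fro_sq ** 2 - 4.0 * det_abs ** 2, 0.0)
    sigma_max = math.sqrt((fro_sq + math.sqrt(disc)) / 2.0)
    return det_abs / sigma_max if sigma_max > 0.0 else 0.0

def in_pseudospectrum(A, z, eps):
    """True when sigma_min(zI - A) <= eps, i.e. z lies in the eps-pseudospectrum."""
    M = [[z - A[0][0], -A[0][1]], [-A[1][0], z - A[1][1]]]
    return sigma_min_2x2(M) <= eps

nonnormal = [[1.0, 1000.0], [0.0, 2.0]]   # matrix of the Beginner worked example
normal = [[1.0, 0.0], [0.0, 2.0]]         # same eigenvalues, orthogonal eigenvectors
z, eps = 1.5, 0.001
print(in_pseudospectrum(nonnormal, z, eps), in_pseudospectrum(normal, z, eps))
```

The point $z = 1.5$ is at distance $0.5$ from both spectra, yet it belongs to the $0.001$ -pseudospectrum of the non-normal matrix and not to that of the diagonal one.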

Theorem (Bauer-Fike via pseudospectra; the disc-inflation form). For diagonalisable $A = V Λ V^{- 1}$ and every $ϵ > 0$ ,

σ_{ϵ} (A) \subseteq \bigcup_{i} {z : ∣ z - λ_{i} ∣ \leq κ_{2} (V) ϵ} .

Equivalently, the resolvent obeys $∥ (z I - A)^{- 1} ∥_{2} \leq κ_{2} (V) / dist (z, σ (A))$ . This is Bauer-Fike restated at the level of sets: the $ϵ$ -pseudospectrum is contained in the eigenvalues' discs inflated by the eigenvector conditioning. The resolvent bound is the engine — it is proved by the same conjugation $V^{- 1} (z I - A)^{- 1} V = (z I - Λ)^{- 1}$ that drove the theorem in the Intermediate tier, now read as a statement about the resolvent rather than a single perturbed eigenvector. The inclusion is generally not tight: the genuine pseudospectrum of a non-normal matrix can be far smaller than the Bauer-Fike disc union, which is why pseudospectra are the sharp tool and Bauer-Fike the convenient upper bound.

Synthesis. Eigenvalue conditioning is one structural fact at three magnifications, and the diagonalisation $A = V Λ V^{- 1}$ is the foundational reason the bounds take their form: Bauer-Fike charges the global $κ (V)$ , the per-eigenvalue theorem refines this to $1/∣ y^{*} x ∣ = 1/ cos ∠ (x, y)$ for a simple eigenvalue, and the pseudospectrum draws the exact reachable set as a resolvent level curve. This is exactly where non-normality becomes visible: a normal matrix has unitary $V$ , so $κ_{2} (V) = 1$ , the left and right eigenvectors coincide, $∣ y^{*} x ∣ = 1$ , and the pseudospectrum is a union of round discs, so Bauer-Fike collapses to the Hermitian Weyl bound of 01.01.14. A non-normal matrix breaks all three: skewed eigenvectors inflate $κ (V)$ , the left and right eigenvectors rotate apart so $∣ y^{*} x ∣$ shrinks, and the resolvent norm balloons far from the spectrum so the pseudospectrum bulges. The central insight is that eigenvalue sensitivity measures departure from normality, and the bridge from algebra to numerics is that the same factor controls how accurately any backward-stable algorithm — the QR algorithm of 43.06.03 above all — can report an eigenvalue: the computed spectrum is the exact spectrum of a nearby matrix, and Bauer-Fike turns that backward error into a forward error of size $κ (V) ∥ E ∥$ . Putting these together, the static perturbation theory of this unit tells the iterative algorithms of the chapter when their output can be trusted and when only the pseudospectrum carries meaning.

Full proof set Master

Proposition (Bauer-Fike, full statement and proof). For diagonalisable $A = V Λ V^{- 1}$ and any eigenvalue $μ$ of $A + E$ , $\min_{i} ∣ μ - λ_{i} ∣ \leq κ_{p} (V) ∥ E ∥_{p}$ for every induced matrix $p$ -norm.

Proof. If $μ \in σ (A)$ the claim is immediate. Otherwise let $(A + E) v = μ v$ with $v \neq 0$ , so $(A - μ I) v = - E v$ . With $A = V Λ V^{- 1}$ and $w = V^{- 1} v \neq 0$ , left-multiplication by $V^{- 1}$ gives $(Λ - μ I) w = - V^{- 1} E V w$ , hence $w = - (Λ - μ I)^{- 1} V^{- 1} E V w$ since $Λ - μ I$ is invertible. Submultiplicativity yields $$ \|w\|_p \le \|(\Lambda - \mu I)^{-1}\|_p\,\|V^{-1}\|_p\,\|E\|_p\,\|V\|_p\,\|w\|_p. $$ Cancel $∥ w ∥_{p} > 0$ and use that for a diagonal matrix $∥ (Λ - μ I)^{- 1} ∥_{p} = (\min_{i} ∣ λ_{i} - μ ∣)^{- 1}$ in any induced $p$ -norm, because the induced $p$ -norm of a diagonal matrix is its largest entry modulus. Rearranging gives $\min_{i} ∣ λ_{i} - μ ∣ \leq κ_{p} (V) ∥ E ∥_{p}$ . $□$

Proposition (first-order eigenvalue perturbation, simple eigenvalue). *Let $λ$ be a simple eigenvalue of $A$ with right eigenvector $x$ and left eigenvector $y$ , $y^{*} x \neq 0$ . The branch $λ (t)$ of $A + tE$ with $λ (0) = λ$ is analytic at $t = 0$ and $λ^{'} (0) = y^{*} E x / (y^{*} x)$ .*

Proof. Simplicity of $λ$ means the spectral projection $P = x y^{*} / (y^{*} x)$ onto the eigenline is well defined and the resolvent $(ζ I - A)^{- 1}$ has a simple pole at $ζ = λ$ . By the analytic perturbation theory of an isolated simple eigenvalue, $λ (t)$ and a right eigenvector $x (t)$ depend analytically on $t$ for small $t$ . Differentiate the eigenrelation $(A + tE) x (t) = λ (t) x (t)$ at $t = 0$ : $$ Ex + Ax'(0) = \lambda'(0)x + \lambda x'(0). $$ Apply $y^{*}$ on the left and use $y^{*} A = λ y^{*}$ to cancel $y^{*} A x^{'} (0) = λ y^{*} x^{'} (0)$ against $λ y^{*} x^{'} (0)$ : $$ y^*Ex = \lambda'(0)\,y^*x. $$ Dividing by $y^{*} x \neq 0$ gives $λ^{'} (0) = y^{*} E x / (y^{*} x)$ . The bound $∣ λ^{'} (0) ∣ \leq ∥ E ∥_{2} ∥ x ∥_{2} ∥ y ∥_{2} /∣ y^{*} x ∣ = κ (λ) ∥ E ∥_{2}$ follows from Cauchy-Schwarz. $□$

Proposition (left and right eigenvectors of a normal matrix coincide). *If $A$ is normal and $λ$ is a simple eigenvalue with unit right eigenvector $x$ , then $y = x$ is a unit left eigenvector, so $∣ y^{*} x ∣ = 1$ and $κ (λ) = 1$ .*

Proof. A normal matrix has $A^{*} A = A A^{*}$ , so $A$ and $A^{*}$ share eigenvectors: $A - λ I$ is again normal, hence $∥ (A - λ I)^{*} x ∥_{2} = ∥ (A - λ I) x ∥_{2}$ for every $x$ , and if $A x = λ x$ with $∥ x ∥ = 1$ , then $A^{*} x = \bar{λ} x$ . The latter says $x^{*} A = λ x^{*}$ , so $x$ is a left eigenvector for $λ$ . Taking $y = x$ , $y^{*} x = x^{*} x = 1$ and $κ (λ) = ∥ x ∥∥ y ∥/∣ y^{*} x ∣ = 1$ . The standard route is via the spectral theorem for normal operators, which produces an orthonormal eigenbasis; the eigenvector matrix $V$ assembling that basis is unitary, giving $κ_{2} (V) = 1$ in agreement with the per-eigenvalue value. $□$

Proposition (pseudospectrum equivalence: resolvent and perturbation forms). For $z \notin σ (A)$ , $∥ (z I - A)^{- 1} ∥_{2} \geq ϵ^{- 1}$ if and only if $z \in σ (A + E)$ for some $E$ with $∥ E ∥_{2} \leq ϵ$ .

Proof. ( $\Leftarrow$ ) Suppose $(A + E) v = z v$ with $v \neq 0$ and $∥ E ∥_{2} \leq ϵ$ . Then $(z I - A) v = E v$ , so $v = (z I - A)^{- 1} E v$ and $∥ v ∥_{2} \leq ∥ (z I - A)^{- 1} ∥_{2} ∥ E ∥_{2} ∥ v ∥_{2} \leq ∥ (z I - A)^{- 1} ∥_{2} ϵ ∥ v ∥_{2}$ . Cancelling $∥ v ∥_{2}$ gives $∥ (z I - A)^{- 1} ∥_{2} \geq ϵ^{- 1}$ .

( $\Rightarrow$ ) Suppose $∥ (z I - A)^{- 1} ∥_{2} \geq ϵ^{- 1}$ . Let $u$ be a unit vector attaining the operator norm, $∥ (z I - A)^{- 1} u ∥_{2} = ∥ (z I - A)^{- 1} ∥_{2} =: β \geq ϵ^{- 1}$ , and set $v = (z I - A)^{- 1} u$ , so $∥ v ∥_{2} = β$ and $(z I - A) v = u$ , that is $A v = z v - u$ . Define the rank-one perturbation $E = u v^{*} /∥ v ∥_{2}^{2}$ , with $∥ E ∥_{2} = ∥ u ∥_{2} ∥ v ∥_{2} /∥ v ∥_{2}^{2} = 1/ β \leq ϵ$ . Then $E v = u (v^{*} v) /∥ v ∥_{2}^{2} = u$ , so $(A + E) v = A v + E v = (z v - u) + u = z v$ . Hence $z \in σ (A + E)$ with $∥ E ∥_{2} \leq ϵ$ . The third form follows from $∥ (z I - A)^{- 1} ∥_{2} = 1/ σ_{\min} (z I - A)$ , a standard singular-value identity. $□$

Proposition (Bauer-Fike resolvent bound). For diagonalisable $A = V Λ V^{- 1}$ and $z \notin σ (A)$ , $∥ (z I - A)^{- 1} ∥_{2} \leq κ_{2} (V) / dist (z, σ (A))$ .

Proof. Conjugating, $(z I - A)^{- 1} = V (z I - Λ)^{- 1} V^{- 1}$ , since $z I - A = V (z I - Λ) V^{- 1}$ . Taking $2$ -norms and using submultiplicativity, $$ \|(zI - A)^{-1}\|_2 \le \|V\|_2\,\|(zI - \Lambda)^{-1}\|_2\,\|V^{-1}\|_2 = \kappa_2(V)\,\|(zI - \Lambda)^{-1}\|_2. $$ The diagonal resolvent has $∥ (z I - Λ)^{- 1} ∥_{2} = \max_{i} ∣ z - λ_{i} ∣^{- 1} = 1/ dist (z, σ (A))$ , giving the bound. Combining with the perturbation form of the previous proposition: if $z \in σ_{ϵ} (A)$ then $ϵ^{- 1} \leq ∥ (z I - A)^{- 1} ∥_{2} \leq κ_{2} (V) / dist (z, σ (A))$ , so $dist (z, σ (A)) \leq κ_{2} (V) ϵ$ , recovering the disc-inflation form of Bauer-Fike. $□$

Connections Master

The eigenvalue conditioning of this unit is the non-normal completion of the Hermitian perturbation theory of 01.01.14: where Weyl's inequalities and Cauchy interlacing give $∣ λ_{k} (A + E) - λ_{k} (A) ∣ \leq ∥ E ∥_{2}$ for self-adjoint $A$ — the case $κ_{2} (V) = 1$ — Bauer-Fike supplies the general diagonalisable bound $κ_{2} (V) ∥ E ∥_{2}$ , and the two agree exactly when $A$ is normal, so the symmetric theory is recovered as the perfectly conditioned boundary case rather than re-proved.

The factor $κ (V)$ is the eigenvector-matrix instance of the conditioning theory of 43.01.02: that unit defines $κ (A) = ∥ A ∥∥ A^{- 1} ∥$ as the sensitivity of a linear solve, and here the identical quantity applied to the diagonaliser $V$ measures the sensitivity of the eigenproblem — the same conditioning concept, attached to a different matrix, which is why a matrix can be well-conditioned for solving yet ill-conditioned for its eigenvalues.

Bauer-Fike is the accuracy verdict on the QR algorithm of 43.06.03: a backward-stable eigensolver returns the exact spectrum of $A + E$ with $∥ E ∥ = O (ϵ_{mach} ∥ A ∥)$ , and this unit converts that backward error into the forward eigenvalue error $κ (V) O (ϵ_{mach} ∥ A ∥)$ , so a normal matrix yields full-accuracy eigenvalues while a non-normal one limits attainable accuracy regardless of the algorithm's stability.

The condition number $1/∣ y^{*} x ∣$ built from left and right eigenvectors connects to the spectral-projection and biorthogonality structure underlying 43.06.01: the same left eigenvector that measures sensitivity is the one that makes the spectral projection $P = x y^{*} / (y^{*} x)$ , and the deflation step of power-iteration variants relies on exactly this biorthogonal pairing, so a poorly separated $y^{*} x$ degrades both the conditioning and the numerical deflation simultaneously.

Historical & philosophical context Master

The exclusion theorem now universally called Bauer-Fike appeared in the 1960 Numerische Mathematik paper of Friedrich Bauer and C. T. Fike, "Norms and exclusion theorems," which packaged a family of eigenvalue-localisation results around the single inequality $\min_{i} ∣ μ - λ_{i} ∣ \leq κ_{p} (V) ∥ E ∥_{p}$ and connected eigenvalue perturbation to the conditioning of the diagonalising transformation ^{[Bauer 1960]}. The result generalised the much older disc-localisation of Semyon Gershgorin, whose 1931 paper bounded the spectrum by the union of row discs $∣ z - a_{ii} ∣ \leq \sum_{j \neq i} ∣ a_{ij} ∣$ and is the diagonally-dominant special case that Bauer-Fike subsumes when $A$ itself is taken as a perturbation of its diagonal ^{[Gershgorin 1931]}.

The first-order perturbation formula $\dot{λ} = y^{*} E x / (y^{*} x)$ and the analytic theory of a simple eigenvalue, together with the Puiseux-series behaviour at a defective eigenvalue, were placed on their definitive footing by Tosio Kato in Perturbation Theory for Linear Operators, whose treatment of the analytic perturbation of isolated eigenvalues remains the standard reference ^{[Kato 1980]}. Stewart and Sun's Matrix Perturbation Theory consolidated the matrix-analytic side, defining the eigenvalue condition number through the biorthogonal left and right eigenvectors and developing the sensitivity theory used in numerical practice ^{[Stewart 1990]}. The pseudospectral viewpoint, which reframes non-normal sensitivity as the geometry of resolvent level sets, was developed and popularised by Lloyd Trefethen and collaborators through the 1990s and synthesised in Trefethen and Embree's Spectra and Pseudospectra, establishing that for non-normal matrices the pseudospectrum, not the spectrum, predicts the behaviour of matrix powers, exponentials, and iterative solvers ^{[Trefethen 2005]}.

Bibliography Master

@article{BauerFike1960,
  author  = {Bauer, Friedrich L. and Fike, C. T.},
  title   = {Norms and exclusion theorems},
  journal = {Numerische Mathematik},
  volume  = {2},
  year    = {1960},
  pages   = {137--141}
}

@article{Gershgorin1931,
  author  = {Gershgorin, Semyon Aronovich},
  title   = {{\"U}ber die Abgrenzung der Eigenwerte einer Matrix},
  journal = {Izvestiya Akademii Nauk SSSR, Otdelenie Fiziko-Matematicheskikh Nauk},
  volume  = {6},
  year    = {1931},
  pages   = {749--754}
}

@book{Kato1980,
  author    = {Kato, Tosio},
  title     = {Perturbation Theory for Linear Operators},
  edition   = {2nd},
  publisher = {Springer-Verlag},
  address   = {Berlin},
  year      = {1980}
}

@book{StewartSun1990,
  author    = {Stewart, G. W. and Sun, Ji-Guang},
  title     = {Matrix Perturbation Theory},
  publisher = {Academic Press},
  address   = {Boston},
  year      = {1990}
}

@book{TrefethenEmbree2005,
  author    = {Trefethen, Lloyd N. and Embree, Mark},
  title     = {Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and Operators},
  publisher = {Princeton University Press},
  address   = {Princeton},
  year      = {2005}
}

@book{TrefethenBau1997,
  author    = {Trefethen, Lloyd N. and Bau, David},
  title     = {Numerical Linear Algebra},
  publisher = {SIAM},
  address   = {Philadelphia},
  year      = {1997}
}

@book{HornJohnson2013,
  author    = {Horn, Roger A. and Johnson, Charles R.},
  title     = {Matrix Analysis},
  edition   = {2nd},
  publisher = {Cambridge University Press},
  year      = {2013}
}

@book{GolubVanLoan2013,
  author    = {Golub, Gene H. and Van Loan, Charles F.},
  title     = {Matrix Computations},
  edition   = {4th},
  publisher = {Johns Hopkins University Press},
  address   = {Baltimore},
  year      = {2013}
}

Prerequisites

01.01.08
01.01.14
43.01.02

Tier anchors

beginner: Wiggling a matrix a little and watching its eigenvalues move — a lot when the eigenvectors lean against each other, barely at all when they sit at right angles — Trefethen-Bau *Numerical Linear Algebra* Lecture 34 (introduction to pseudospectra); 3Blue1Brown *Essence of Linear Algebra* Ch. 14 (eigenvectors as the directions a matrix acts simply on)
intermediate: Trefethen-Bau *Numerical Linear Algebra* Lecture 34 (eigenvalue perturbation, the Bauer-Fike theorem, condition of an eigenvalue); Golub-Van Loan *Matrix Computations* §7.2 (perturbation theory for the eigenproblem); Horn-Johnson *Matrix Analysis* §6.3 (Bauer-Fike, Gershgorin)
master: Trefethen-Bau *Numerical Linear Algebra* Lectures 34, 33; Trefethen-Embree *Spectra and Pseudospectra* (Princeton, 2005) Ch. 1-2 and §52; Stewart-Sun *Matrix Perturbation Theory* (Academic Press, 1990) Ch. IV-V; Kato *Perturbation Theory for Linear Operators* Ch. II §1-2; Golub-Van Loan *Matrix Computations* (4th ed.) §7.2-§7.3

References

Shilov *Linear Algebra* · Ch. 6, the eigenbasis and diagonalisation $A = V\Lambda V^{-1}$ whose conditioning $\kappa(V)$ is the subject of this unit
Bauer, F. L. & Fike, C. T. — Norms and exclusion theorems · Numerische Mathematik 2 (1960), 137-141 — the theorem that every eigenvalue of $A+E$ lies within $\kappa_p(V)\|E\|_p$ of an eigenvalue of the diagonalisable $A = V\Lambda V^{-1}$
Trefethen, L. N. & Bau, D. — Numerical Linear Algebra (SIAM, 1997) · Lecture 34 — eigenvalue perturbation, the Bauer-Fike theorem, the condition number $1/|y^*x|$ of a simple eigenvalue, and pseudospectra of non-normal matrices
Trefethen, L. N. & Embree, M. — Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and Operators (Princeton, 2005) · Ch. 1-2 and §52 — the $\epsilon$-pseudospectrum as the set of eigenvalues of $A+E$ over $\|E\| \le \epsilon$, its three equivalent definitions, and the resolvent-norm picture of non-normal sensitivity
Stewart, G. W. & Sun, J.-G. — Matrix Perturbation Theory (Academic Press, 1990) · Ch. IV-V — the condition number of a simple eigenvalue via left/right eigenvectors $s(\lambda) = |y^*x|$, and the perturbation expansion $\lambda(t) = \lambda + t\, y^*Ex/(y^*x) + O(t^2)$
Gershgorin, S. A. — Über die Abgrenzung der Eigenwerte einer Matrix · Izvestiya Akademii Nauk SSSR, Otd. Fiz.-Mat. Nauk 6 (1931), 749-754 — the localisation of the spectrum in the union of the discs $|z - a_{ii}| \le \sum_{j\neq i}|a_{ij}|$
Kato, T. — Perturbation Theory for Linear Operators (Springer, 2nd ed., 1980) · Ch. II §1-2 — analytic perturbation of a simple eigenvalue, the first-order coefficient $y^*Ex/(y^*x)$, and the breakdown of the analytic expansion at a defective (non-diagonalisable) eigenvalue

Estimated time

beginner: 18m
intermediate: 45m
master: 90m