43.06.04 · numerical-analysis / 06-eigenvalue-algorithms

Bauer-Fike and the conditioning of eigenvalues

shipped3 tiersLean: none

Anchor (Master): Trefethen-Bau *Numerical Linear Algebra* Lectures 34, 33; Trefethen-Embree *Spectra and Pseudospectra* (Princeton, 2005) Ch. 1-2 and §52; Stewart-Sun *Matrix Perturbation Theory* (Academic Press, 1990) Ch. IV-V; Kato *Perturbation Theory for Linear Operators* Ch. II §1-2; Golub-Van Loan *Matrix Computations* (4th ed.) §7.2-§7.3

Intuition Beginner

Eigenvalues are the special stretch factors of a matrix, and a fair question is how much they move when the matrix is nudged. If you measure a matrix from real data, every entry carries a little error, and you would like to know whether the eigenvalues you compute are trustworthy or whether a tiny change in the matrix could throw them off badly.

For some matrices the eigenvalues are rock solid: a small change in the matrix produces an equally small change in every eigenvalue. These are the well-behaved matrices, the ones whose special directions sit at right angles to each other, like the axes of a tilted but undistorted grid. Symmetric matrices are the headline example, and for them an error of a certain size moves each eigenvalue by no more than that same size.

Other matrices are touchy. Their special directions lean hard against one another, almost pointing the same way, and when directions nearly coincide the matrix is close to one that has a genuine defect. For such a matrix a microscopic change in the entries can move an eigenvalue by a large amount. The single number that measures this touchiness is how skewed the eigenvector directions are, and the Bauer-Fike theorem turns that skewness into a guaranteed bound on how far the eigenvalues can wander.

Visual Beginner

The picture contrasts two matrices under the same small perturbation. For the well-behaved matrix the eigenvalues barely twitch; for the touchy one they smear out into a wide blot. The difference is entirely in how the eigenvector directions sit relative to each other.

Two facts are visible. When the eigenvector arrows are perpendicular the wiggle circles are as small as the perturbation itself. When the arrows nearly line up the wiggle circles balloon, and their radius is the perturbation size multiplied by a skewness factor that grows without bound as the directions merge.

Worked example Beginner

Take the matrix

Its eigenvalues are and , read straight off the diagonal because the matrix is triangular. The eigenvector for is , and the eigenvector for turns out to be rescaled, which points almost flat along the horizontal. So the two eigenvector directions are nearly parallel, a warning sign of touchiness.

Step 1. Perturb the matrix in its bottom-left corner, which started at :

The change has size , very small.

Step 2. Find the new eigenvalues. They solve , that is , which expands to .

Step 3. Solve: , so the eigenvalues are about and .

Step 4. Compare. The eigenvalues moved from and to about and — each shifted by roughly . A change of size in the matrix moved the eigenvalues by , about six hundred times as much.

What this tells us: the huge off-diagonal entry makes the eigenvector directions nearly parallel, and that near-parallelism amplifies a tiny perturbation into a large eigenvalue shift. A symmetric matrix, whose eigenvectors are perpendicular, would never let this happen.

Check your understanding Beginner

Formal definition Intermediate+

Let be diagonalisable, , where collects the eigenvalues and the columns of are right eigenvectors, , in the sense of 01.01.08. Let denote a vector -norm and the matrix norm it induces, and write for the condition number of the eigenvector matrix, the same quantity defined as the conditioning of a linear solve in 43.01.02. The spectrum is .

Definition (eigenvalue conditioning). A matrix is said to have well-conditioned eigenvalues when small perturbations of produce comparably small movements of , and ill-conditioned eigenvalues otherwise. The Bauer-Fike theorem below makes "comparably small" precise through the factor , and the per-eigenvalue refinement uses the left and right eigenvectors of a single eigenvalue.

Definition (left and right eigenvectors; condition number of a simple eigenvalue). Let be a simple eigenvalue of (algebraic multiplicity one). A right eigenvector is a nonzero with ; a left eigenvector is a nonzero with , equivalently . For a simple eigenvalue , and the condition number of the eigenvalue is

with normalised to unit length giving , where is the acute angle between the left and right eigenvectors. The reciprocal is the separation or eigenvalue sensitivity.

Definition (-pseudospectrum). For , the -pseudospectrum of in the norm is

with the convention for . The two descriptions coincide (proved in the Advanced results). The pseudospectrum is the qualitative picture of eigenvalue sensitivity: for a normal matrix it is the union of closed -discs about the eigenvalues, while for a non-normal matrix it can bulge far beyond them.

Notation: is the inverse of the eigenvector matrix; is the conjugate transpose of ; is the acute angle between the lines spanned by and ; is the spectrum and the pseudospectrum; the resolvent is defined for . A matrix is normal when , which is equivalent to having a unitary eigenvector matrix and hence .

Counterexamples to common slips

  • Bauer-Fike bounds the distance from every perturbed eigenvalue to some original eigenvalue; it does not pair them up. A perturbation can pull two eigenvalues toward a common point or split a tight pair, so the bound is a one-sided exclusion statement, not a matching.
  • The factor is , the conditioning of the eigenvector matrix, not . A matrix can have enormous yet perfectly conditioned eigenvalues (any normal matrix with a wide spectrum), and a matrix can have in some scaling yet ill-conditioned eigenvalues — the two condition numbers measure different problems.
  • The per-eigenvalue formula requires a simple eigenvalue. At a defective eigenvalue the right and left eigenvectors become orthogonal, , the condition number blows up, and the perturbation expansion ceases to be first order: the eigenvalue moves like a fractional power of the perturbation for a Jordan block of size .

Key theorem with proof Intermediate+

Theorem (Bauer-Fike; Bauer-Fike 1960 [source pending]; Trefethen-Bau Lecture 34 [source pending]). Let be diagonalisable, , and let be any eigenvalue of the perturbed matrix . Then there is an eigenvalue of with

where and is any induced matrix norm. In particular every eigenvalue of lies in the union of the closed discs of radius centred at the eigenvalues of .

Proof. If the bound holds with , so assume ; then is invertible. Let satisfy , that is . Substitute and apply on the left:

Write . Since is invertible,

Take norms and use submultiplicativity:

Divide by :

Because is diagonal, its induced -norm is the largest entry modulus of its inverse, . Substituting,

which is the claim with the minimising eigenvalue.

Bridge. The Bauer-Fike bound builds toward the per-eigenvalue condition number and the pseudospectral picture of the Advanced results, where the global factor is refined into one sensitivity number for each eigenvalue, and it appears again in 43.06.03 as the reason the QR algorithm can compute the eigenvalues of a normal matrix to full accuracy while a non-normal matrix limits the attainable accuracy. The foundational reason the factor is and not is that the diagonalising change of basis is exactly the lens through which the perturbation is viewed: the conjugated perturbation carries the norm inflation , and this is exactly the conditioning of the eigenvector basis rather than of the linear solve. The normal case generalises the Hermitian Weyl bound of 01.01.14: when is normal, is unitary, , and Bauer-Fike collapses to , the central insight being that perpendicularity of eigenvectors is the precise structural condition for perfect eigenvalue conditioning. Putting these together, eigenvalue sensitivity is governed not by the size of the matrix or its inverse but by how far its eigenvectors are from orthogonal, and the bridge is the recognition that the same skewness factor reappears, sharpened to a single eigenvalue, as .

Exercises Intermediate+

Advanced results Master

Theorem (condition number of a simple eigenvalue; Stewart-Sun Ch. V [source pending]; Kato Ch. II §2 [source pending]). Let be a simple eigenvalue of with unit right eigenvector and unit left eigenvector , , $y^ A = \lambda y^A + tE\lambda(t)\lambda(0) = \lambdat = 0$ with

The condition number $\kappa(\lambda) = 1/|y^ x| = 1/\cos\angle(x, y)1$ exactly when the left and right eigenvectors are parallel.*

For a normal matrix the left and right eigenvectors of a simple eigenvalue coincide up to phase, so and : this is the per-eigenvalue refinement of the global statement from Bauer-Fike. The condition number is independent of any scaling of and because the ratio is homogeneous of degree zero in each. As approaches a matrix with a defective , the eigenvectors and rotate toward orthogonality, , and , the analytic boundary at which the first-order expansion fails and the eigenvalue begins to move like a fractional power of the perturbation.

Theorem (defective eigenvalue: fractional-power perturbation; Kato Ch. II §1 [source pending]). Let be an eigenvalue of whose largest Jordan block has size . For a generic perturbation , the eigenvalues splitting off from behave like

a Puiseux series in rather than a Taylor series in . A defective eigenvalue is therefore infinitely ill-conditioned in the first-order sense: a perturbation of size moves it by order , which dwarfs as . The companion matrix of a polynomial with a repeated root exhibits exactly this, and it is why root-finding through the companion eigenproblem inherits the conditioning of multiple roots. The Bauer-Fike factor diverges in this limit because becomes singular as the eigenvectors merge, consistent with the breakdown of diagonalisability.

Theorem (pseudospectrum: three equivalent definitions; Trefethen-Embree Ch. 1-2 [source pending]). For and , the following sets coincide:

where is the smallest singular value (with for ). The resolvent-norm form makes the pseudospectrum computable as a level set of , the perturbation form gives its meaning as the reachable spectra under bounded perturbations, and the singular-value form is the bridge between the two. For a normal the resolvent norm is exactly , so is the union of -discs about the eigenvalues; for a non-normal the resolvent norm can be enormous far from the spectrum, and the pseudospectrum bulges accordingly. The pseudospectrum, not the spectrum, governs transient growth of and and the behaviour of non-normal iterations, which is why a non-normal eigenvalue computation reports the spectrum but the pseudospectrum reports the trustworthy information.

Theorem (Bauer-Fike via pseudospectra; the disc-inflation form). For diagonalisable and every ,

Equivalently, the resolvent obeys . This is Bauer-Fike restated at the level of sets: the -pseudospectrum is contained in the eigenvalues' discs inflated by the eigenvector conditioning. The resolvent bound is the engine — it is proved by the same conjugation that drove the theorem in the Intermediate tier, now read as a statement about the resolvent rather than a single perturbed eigenvector. The inclusion is generally not tight: the genuine pseudospectrum of a non-normal matrix can be far smaller than the Bauer-Fike disc union, which is why pseudospectra are the sharp tool and Bauer-Fike the convenient upper bound.

Synthesis. Eigenvalue conditioning is one structural fact at three magnifications, and the diagonalisation is the foundational reason the bounds take their form: Bauer-Fike charges the global , the per-eigenvalue theorem refines this to for a simple eigenvalue, and the pseudospectrum draws the exact reachable set as a resolvent level curve. This is exactly where non-normality becomes visible: a normal matrix has unitary , so , the left and right eigenvectors coincide, , and the pseudospectrum is a union of round discs, so Bauer-Fike collapses to the Hermitian Weyl bound of 01.01.14. A non-normal matrix breaks all three: skewed eigenvectors inflate , the left and right eigenvectors rotate apart so shrinks, and the resolvent norm balloons far from the spectrum so the pseudospectrum bulges. The central insight is that eigenvalue sensitivity measures departure from normality, and the bridge from algebra to numerics is that the same factor controls how accurately any backward-stable algorithm — the QR algorithm of 43.06.03 above all — can report an eigenvalue: the computed spectrum is the exact spectrum of a nearby matrix, and Bauer-Fike turns that backward error into a forward error of size . Putting these together, the static perturbation theory of this unit tells the iterative algorithms of the chapter when their output can be trusted and when only the pseudospectrum carries meaning.

Full proof set Master

Proposition (Bauer-Fike, full statement and proof). For diagonalisable and any eigenvalue of , for every induced matrix norm.

Proof. If the claim is immediate. Otherwise let with , so . With and , left-multiplication by gives , hence since is invertible. Submultiplicativity yields $$ |w|_p \le |(\Lambda - \mu I)^{-1}|_p,|V^{-1}|_p,|E|_p,|V|_p,|w|_p. $$ Cancel and use that for a diagonal matrix in any induced norm, because the induced norm of a diagonal matrix is its largest entry modulus. Rearranging gives .

Proposition (first-order eigenvalue perturbation, simple eigenvalue). *Let be a simple eigenvalue of with right eigenvector and left eigenvector , . The branch of with is analytic at and .*

Proof. Simplicity of means the spectral projection onto the eigenline is well defined and the resolvent has a simple pole at . By the analytic perturbation theory of an isolated simple eigenvalue, and a right eigenvector depend analytically on for small . Differentiate the eigenrelation at : $$ Ex + Ax'(0) = \lambda'(0)x + \lambda x'(0). $$ Apply on the left and use to cancel against : $$ y^*Ex = \lambda'(0),y^*x. $$ Dividing by gives . The bound follows from Cauchy-Schwarz.

Proposition (left and right eigenvectors of a normal matrix coincide). *If is normal and is a simple eigenvalue with unit right eigenvector , then is a unit left eigenvector, so and .*

Proof. A normal matrix has , so and share eigenvectors: if with , then . The latter says , so is a left eigenvector for . Taking , and . The standard route is via the spectral theorem for normal operators, which produces an orthonormal eigenbasis; the eigenvector matrix assembling that basis is unitary, giving in agreement with the per-eigenvalue value.

Proposition (pseudospectrum equivalence: resolvent and perturbation forms). For , if and only if for some with .

Proof. () Suppose with and . Then , so and . Cancelling gives .

() Suppose . Let be a unit vector attaining the operator norm, , and set , so and , that is . Define the rank-one perturbation , with . Then , so . Hence with . The third form follows from , a standard singular-value identity.

Proposition (Bauer-Fike resolvent bound). For diagonalisable and , .

Proof. Conjugating, , since . Taking -norms and using submultiplicativity, $$ |(zI - A)^{-1}|_2 \le |V|_2,|(zI - \Lambda)^{-1}|_2,|V^{-1}|_2 = \kappa_2(V),|(zI - \Lambda)^{-1}|_2. $$ The diagonal resolvent has , giving the bound. Combining with the perturbation form of the previous proposition: if then , so , recovering the disc-inflation form of Bauer-Fike.

Connections Master

The eigenvalue conditioning of this unit is the non-normal completion of the Hermitian perturbation theory of 01.01.14: where Weyl's inequalities and Cauchy interlacing give for self-adjoint — the case — Bauer-Fike supplies the general diagonalisable bound , and the two agree exactly when is normal, so the symmetric theory is recovered as the perfectly conditioned boundary case rather than re-proved.

The factor is the eigenvector-matrix instance of the conditioning theory of 43.01.02: that unit defines as the sensitivity of a linear solve, and here the identical quantity applied to the diagonaliser measures the sensitivity of the eigenproblem — the same conditioning concept, attached to a different matrix, which is why a matrix can be well-conditioned for solving yet ill-conditioned for its eigenvalues.

Bauer-Fike is the accuracy verdict on the QR algorithm of 43.06.03: a backward-stable eigensolver returns the exact spectrum of with , and this unit converts that backward error into the forward eigenvalue error , so a normal matrix yields full-accuracy eigenvalues while a non-normal one limits attainable accuracy regardless of the algorithm's stability.

The condition number built from left and right eigenvectors connects to the spectral-projection and biorthogonality structure underlying 43.06.01: the same left eigenvector that measures sensitivity is the one that makes the spectral projection , and the deflation step of power-iteration variants relies on exactly this biorthogonal pairing, so a poorly separated degrades both the conditioning and the numerical deflation simultaneously.

Historical & philosophical context Master

The exclusion theorem now universally called Bauer-Fike appeared in the 1960 Numerische Mathematik paper of Friedrich Bauer and C. T. Fike, "Norms and exclusion theorems," which packaged a family of eigenvalue-localisation results around the single inequality and connected eigenvalue perturbation to the conditioning of the diagonalising transformation [Bauer 1960]. The result generalised the much older disc-localisation of Semyon Gershgorin, whose 1931 paper bounded the spectrum by the union of row discs and is the diagonally-dominant special case that Bauer-Fike subsumes when itself is taken as a perturbation of its diagonal [Gershgorin 1931].

The first-order perturbation formula and the analytic theory of a simple eigenvalue, together with the Puiseux-series behaviour at a defective eigenvalue, were placed on their definitive footing by Tosio Kato in Perturbation Theory for Linear Operators, whose treatment of the analytic perturbation of isolated eigenvalues remains the standard reference [Kato 1980]. Stewart and Sun's Matrix Perturbation Theory consolidated the matrix-analytic side, defining the eigenvalue condition number through the biorthogonal left and right eigenvectors and developing the sensitivity theory used in numerical practice [Stewart 1990]. The pseudospectral viewpoint, which reframes non-normal sensitivity as the geometry of resolvent level sets, was developed and popularised by Lloyd Trefethen and collaborators through the 1990s and synthesised in Trefethen and Embree's Spectra and Pseudospectra, establishing that for non-normal matrices the pseudospectrum, not the spectrum, predicts the behaviour of matrix powers, exponentials, and iterative solvers [Trefethen 2005].

Bibliography Master

@article{BauerFike1960,
  author  = {Bauer, Friedrich L. and Fike, C. T.},
  title   = {Norms and exclusion theorems},
  journal = {Numerische Mathematik},
  volume  = {2},
  year    = {1960},
  pages   = {137--141}
}

@article{Gershgorin1931,
  author  = {Gershgorin, Semyon Aronovich},
  title   = {{\"U}ber die Abgrenzung der Eigenwerte einer Matrix},
  journal = {Izvestiya Akademii Nauk SSSR, Otdelenie Fiziko-Matematicheskikh Nauk},
  volume  = {6},
  year    = {1931},
  pages   = {749--754}
}

@book{Kato1980,
  author    = {Kato, Tosio},
  title     = {Perturbation Theory for Linear Operators},
  edition   = {2nd},
  publisher = {Springer-Verlag},
  address   = {Berlin},
  year      = {1980}
}

@book{StewartSun1990,
  author    = {Stewart, G. W. and Sun, Ji-Guang},
  title     = {Matrix Perturbation Theory},
  publisher = {Academic Press},
  address   = {Boston},
  year      = {1990}
}

@book{TrefethenEmbree2005,
  author    = {Trefethen, Lloyd N. and Embree, Mark},
  title     = {Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and Operators},
  publisher = {Princeton University Press},
  address   = {Princeton},
  year      = {2005}
}

@book{TrefethenBau1997,
  author    = {Trefethen, Lloyd N. and Bau, David},
  title     = {Numerical Linear Algebra},
  publisher = {SIAM},
  address   = {Philadelphia},
  year      = {1997}
}

@book{HornJohnson2013,
  author    = {Horn, Roger A. and Johnson, Charles R.},
  title     = {Matrix Analysis},
  edition   = {2nd},
  publisher = {Cambridge University Press},
  year      = {2013}
}

@book{GolubVanLoan2013,
  author    = {Golub, Gene H. and Van Loan, Charles F.},
  title     = {Matrix Computations},
  edition   = {4th},
  publisher = {Johns Hopkins University Press},
  address   = {Baltimore},
  year      = {2013}
}