01.01.14 · foundations / linear-algebra

Rayleigh quotient and the Courant-Fischer min-max characterisation of eigenvalues

shipped3 tiersLean: none

Anchor (Master): Shilov *Linear Algebra* Ch. 10 §10.2–§10.3; Horn-Johnson *Matrix Analysis* §4.2–§4.3 (Courant-Fischer, interlacing, Weyl); Parlett *The Symmetric Eigenvalue Problem* Ch. 10–11; Reed-Simon *Methods of Modern Mathematical Physics* Vol. IV §XIII.1 (min-max for self-adjoint operators); Kato *Perturbation Theory for Linear Operators* Ch. I §6.10

Intuition Beginner

A symmetric matrix stretches space along a set of perpendicular axes, with a stretch factor for each axis. The spectral theorem 01.01.13 guarantees those axes and factors exist; the factors are the eigenvalues. A natural question is whether you can find the biggest and smallest stretch factors without first solving for every eigenvalue. The Rayleigh quotient answers yes, and it does so by a simple measurement you can take on any vector at all.

The Rayleigh quotient of a vector is the amount the matrix stretches that vector along its own direction, measured as a ratio. Push the vector through the matrix, see how much of the result points back along the original vector, and divide by the vector's own length-squared. For an eigenvector the answer is exactly its eigenvalue. For any other vector the answer is a blend of eigenvalues, and a blend can never exceed the largest ingredient or fall below the smallest.

So the largest eigenvalue is the highest value the Rayleigh quotient ever reaches, and the smallest eigenvalue is its lowest value. The middle eigenvalues are trickier: each one is the best you can do on some restricted family of directions, and the Courant-Fischer theorem says exactly which family. This turns every eigenvalue into the answer to an optimisation problem.

Visual Beginner

The picture shows a symmetric two-by-two matrix and its Rayleigh quotient evaluated all the way around the unit circle. As a unit vector sweeps from angle zero to a full turn, the Rayleigh quotient rises and falls between two values: it peaks at the direction of the top eigenvector, where it equals the larger eigenvalue, and it bottoms out at the perpendicular direction, where it equals the smaller eigenvalue.

Two facts are visible. The curve never goes above the larger eigenvalue or below the smaller one, so the eigenvalues are the ceiling and floor of the Rayleigh quotient. The peak and the valley occur at perpendicular directions, which are the two eigendirections of the symmetric matrix.

Worked example Beginner

Take the symmetric matrix

Step 1. Write the Rayleigh quotient for a unit vector . Since has length , the denominator is and the quotient is just the top part, .

Step 2. Compute . Multiplying, .

Step 3. Dot with . We get .

Step 4. Simplify. Using and , this becomes .

Step 5. Read off the high and low values. The term ranges from to , so ranges from to . The maximum happens when , that is degrees, the direction . The minimum happens at degrees, the direction .

Step 6. Check against the eigenvalues. The characteristic polynomial is , so the eigenvalues are and . These match the floor and ceiling found above.

What this tells us: the largest eigenvalue is the maximum of the Rayleigh quotient and the smallest eigenvalue is its minimum, with both extremes reached along the eigendirections. You found the eigenvalues by maximising and minimising a single ratio, not by any extra factoring work.

Check your understanding Beginner

Formal definition Intermediate+

Let be a finite-dimensional inner-product space over in the sense of 01.01.09, with norm and the inner product linear in the first slot and conjugate-linear in the second. Throughout this unit is self-adjoint, , with adjoint as in 01.01.10. By the spectral theorem 01.01.13 the eigenvalues of are real; list them with multiplicity in decreasing order

and fix an orthonormal eigenbasis with . The largest eigenvalue and the smallest .

Definition (Rayleigh quotient). For with , the Rayleigh quotient of at is

Self-adjointness makes the numerator real: , so for all . The quotient is scale-invariant, for every nonzero scalar , so it is a function on the projective space of directions, equivalently on the unit sphere , where it reduces to .

Definition (Rayleigh-Ritz subspace value). For a subspace with , write

for the largest and smallest Rayleigh values over the directions inside . Both extrema are attained because is continuous on the compact unit sphere of . The eigenvalues will be obtained by optimising and over all subspaces of a fixed dimension.

Notation: denotes the set of -dimensional subspaces of ; is the orthogonal complement of ; is the dimension of . The Kronecker delta is when and otherwise. The notation means is positive semidefinite, that is for all .

Counterexamples to common slips

  • The reality and the range both require self-adjointness. For the non-self-adjoint on , the numerator is purely imaginary on most vectors, and the eigenvalues are not real; the extremal characterisation simply does not apply.
  • "The maximum of is attained at a unique direction" fails when is a repeated eigenvalue. If , every unit vector in the two-dimensional top eigenspace attains the maximum, so the maximiser is a whole sphere of directions, not a point.
  • Courant-Fischer requires the eigenvalues to be listed with multiplicity. Omitting repetitions shifts the index and breaks the formula; in the min-max statement is the -th eigenvalue counted with multiplicity, not the -th distinct value.

Key theorem with proof Intermediate+

Theorem (Rayleigh-Ritz extremal characterisation; Shilov Ch. 10 §10.2 [source pending]; Horn-Johnson §4.2 [source pending]). Let $A = A^V\lambda_1 \geq \cdots \geq \lambda_nu_1, \ldots, u_nv \neq 0$,*

the range of is exactly the closed interval , and

with the maximum attained precisely on the -eigenspace and the minimum on the -eigenspace.

Proof. Expand an arbitrary in the orthonormal eigenbasis: with . Orthonormality gives , and since ,

where orthonormality collapsed the double sum. Therefore

The right-hand side is a convex combination of the eigenvalues with weights summing to . A convex combination of real numbers lies between their minimum and maximum, so .

For attainment and the full range: setting puts all weight on , giving ; setting gives . For an intermediate value , choose with and take ; then . So the range is all of .

Finally, suppose for some . Then , that is . Every term is non-negative because , so each term vanishes: whenever . Hence lies in the -eigenspace. The minimum case is symmetric, replacing by .

Bridge. The Rayleigh-Ritz characterisation builds toward the full Courant-Fischer min-max theorem proved in the Advanced results, where the same convex-combination identity is restricted to subspaces to pin down every intermediate eigenvalue, and it appears again in 01.01.12 (singular value decomposition), where applying it to identifies the largest singular value as . The mechanism is the diagonalisation supplied by the spectral theorem 01.01.13: once the self-adjoint operator is written in its orthonormal eigenbasis, the Rayleigh quotient becomes a weighted average of eigenvalues, and weighted averages are governed entirely by the extreme ingredients. The foundational identity is , the spectral expansion of the quadratic form attached to . Putting these together, the extremal characterisation removes the need to compute a characteristic polynomial to find the dominant eigenvalue, justifies the power-iteration and Rayleigh-quotient-iteration algorithms of numerical linear algebra, and is the finite-dimensional shadow of the variational principle that estimates ground-state energies in quantum mechanics, which appears again in 12.07.03 (the Rayleigh-Ritz method).

Exercises Intermediate+

Lean formalization Intermediate+

Mathlib carries the Rayleigh quotient and the supremum/infimum eigenvalue facts for self-adjoint operators, but the full Courant-Fischer min-max equality, Cauchy interlacing, and Weyl's inequalities are not present as single named theorems. The readable view of the intended statements:

import Mathlib.Analysis.InnerProductSpace.Rayleigh
import Mathlib.Analysis.InnerProductSpace.Spectrum

variable {𝕜 V : Type*} [RCLike 𝕜]
  [NormedAddCommGroup V] [InnerProductSpace 𝕜 V] [FiniteDimensional 𝕜 V]

/-- The supremum of the Rayleigh quotient over the unit sphere is an
eigenvalue (the top eigenvalue). Mathlib:
`LinearMap.IsSymmetric.hasEigenvalue_iSup`. -/
example (T : V →ₗ[𝕜] V) (hT : T.IsSymmetric) (hV : 0 < Module.rank 𝕜 V) :
    Module.End.HasEigenvalue T
      (⨆ x : {v : V // v ≠ 0}, RCLike.re (T.reApplyInnerSelf x / ‖(x : V)‖ ^ 2)) :=
  hT.hasEigenvalue_iSup hV

/-- Courant-Fischer max-min characterisation of the k-th eigenvalue.
NOT a single named Mathlib theorem. -/
theorem courant_fischer_maxmin
    (T : V →ₗ[𝕜] V) (hT : T.IsSymmetric) (k : ℕ)
    (lam : ℕ → ℝ) (hsort : Antitone lam) /- eigenvalues, decreasing -/ :
    lam k = sSup { r : ℝ | ∃ S : Submodule 𝕜 V, Module.finrank 𝕜 S = k + 1
        r = sInf { R : ℝ | ∃ v : V, v ∈ S ∧ v ≠ 0
          R = RCLike.re (inner (T v) v) / ‖v‖ ^ 2 } } :=
  sorry  -- S_k test subspace for ≥; dimension-count intersection for

/-- Cauchy interlacing: eigenvalues of a compression to a codim-1
subspace interlace those of T. NOT a single named Mathlib theorem. -/
theorem cauchy_interlace
    (T : V →ₗ[𝕜] V) (hT : T.IsSymmetric) (W : Submodule 𝕜 V)
    (hW : Module.finrank 𝕜 W + 1 = Module.finrank 𝕜 V)
    (lamT lamW : ℕ → ℝ) (k : ℕ) :
    lamT (k + 1) ≤ lamW k ∧ lamW k ≤ lamT k :=
  sorry  -- apply courant_fischer to T and to the compression P_W T P_W

The proof gap to a clean contribution: (i) order the eigenvalues with multiplicity as an antitone Fin n → ℝ and connect to Module.End.eigenvalues; (ii) prove courant_fischer_maxmin from the spectral orthonormal basis using the S_k = span(u_0, ..., u_k) test subspace and the Submodule.finrank_sup_add_finrank_inf_le dimension-count for the intersection; (iii) derive cauchy_interlace and weyl_inequality as corollaries. Each is reachable from Mathlib.Analysis.InnerProductSpace.Rayleigh and the finite-dimensional spectral theorem, but no current named lemma presents the min-max double characterisation or the interlacing inequalities in packaged form.

Advanced results Master

Theorem (Courant-Fischer min-max; Fischer 1905 [source pending]; Courant 1920 [source pending]; Horn-Johnson §4.2 [source pending]). Let $A = A^V\lambda_1 \geq \cdots \geq \lambda_nk \in {1, \ldots, n}$,*

The first equality recovers as a plain maximum (the case ) and as a plain minimum (the case of the second form), so the Rayleigh-Ritz characterisation is the boundary case of the general formula. The decisive feature is that the characterisation of refers to no eigenvector other than through the optimisation over -dimensional subspaces; this is what makes the formula usable for operators whose eigenvectors are unavailable, and it is the form that survives passage to the infinite-dimensional discrete spectrum. The full proof is in the proof set below.

Theorem (Cauchy interlacing; Cauchy 1829 [source pending]). Let $A = A^n\lambda_1 \geq \cdots \geq \lambda_nBB = P A|WA(n-1)W(n-1) \times (n-1)W\mu_1 \geq \cdots \geq \mu{n-1}$. Then*

More generally, deleting rows and the corresponding columns produces eigenvalues with .

Interlacing is the workhorse behind the symmetric eigenvalue problem: the eigenvalues of a leading principal submatrix sandwich those of the full matrix, which justifies bisection methods using Sturm sequences and explains why adding a row and column to a symmetric matrix can shift each eigenvalue by at most one slot.

Theorem (Weyl's inequalities; Weyl 1912 [source pending]). Let be self-adjoint on with eigenvalues listed decreasingly. For indices with ,

and dually when . In particular , so each eigenvalue is Lipschitz in the operator: .

Weyl's inequalities are the eigenvalue-perturbation theory that controls how the spectrum moves when a self-adjoint operator is perturbed by another. The special case recovers the monotonicity , and the Lipschitz bound is the spectral stability statement underpinning numerical eigenvalue computation and the continuity of the spectrum in perturbation theory.

Theorem (spectral gap and the second eigenvalue). With , the second eigenvalue admits the constrained form

the maximum of the Rayleigh quotient over the orthogonal complement of the top eigenvector. The gap measures how sharply the maximiser is isolated: for a positive operator normalised to , the gap controls the geometric convergence rate of power iteration and, in the graph-Laplacian setting, the algebraic connectivity and mixing time.

The deflation form is the recursive version of the Rayleigh-Ritz characterisation: having found the top eigenvectors, the next eigenvalue is the maximum of on what remains. Courant-Fischer is the eigenvector-free upgrade that removes the dependence on .

Theorem (min-max for semibounded self-adjoint operators; Courant-Fischer-Weyl in the Hilbert-space form; Reed-Simon Vol. IV §XIII.1 [source pending]). Let be a self-adjoint operator on a Hilbert space , bounded below, with the bottom of the essential spectrum. Define

Then for each either and is the -th eigenvalue below the essential spectrum (counted with multiplicity), or and there are at most eigenvalues below .

This is the theorem that powers the variational estimation of bound-state energies. The finite sum over eigenvalues becomes an optimisation over finite-dimensional trial subspaces of the domain, and the Rayleigh-Ritz method computes upper bounds for the discrete eigenvalues below the continuous spectrum. The Sturm-Liouville operator on a bounded interval and the Dirichlet Laplacian on a bounded domain are the canonical instances, with the min-max principle giving Courant's nodal-domain bound and Weyl's eigenvalue-counting asymptotics.

Synthesis. The Rayleigh quotient converts the algebraic eigenvalue problem for a self-adjoint operator into a problem of optimisation, and the spectral theorem 01.01.13 is what makes the conversion exact: in the orthonormal eigenbasis is a weighted average of eigenvalues, so its extremes are the extreme eigenvalues and its critical points are the eigenvectors. The Courant-Fischer min-max formula completes the picture by characterising every eigenvalue, not only the top and bottom, through optimisation over subspaces of fixed dimension, and it does so without naming a single eigenvector. From that one formula the interlacing of Cauchy and the perturbation inequalities of Weyl follow as corollaries by applying the same max-min to a compression or to a sum, and the monotonicity for is immediate from the pointwise inequality of Rayleigh quotients. Putting these together, the gradient computation identifies critical points with eigenvectors and drives Rayleigh-quotient iteration, the second-eigenvalue gap controls the convergence of power iteration and the connectivity of graphs, and the eigenvector-free min-max form passes intact to the discrete spectrum of a semibounded self-adjoint operator, where it becomes the variational principle of 12.07.03 and the eigenvalue-asymptotics machinery of Weyl and Courant for the differential operators of mathematical physics. The Rayleigh quotient, the min-max theorem, interlacing, and the perturbation inequalities are four readings of the single fact that the quadratic form on the unit sphere has its eigenvalues as critical values.

Full proof set Master

Proposition (Courant-Fischer min-max, both forms). Let $A = A^V\lambda_1 \geq \cdots \geq \lambda_nu_1, \ldots, u_nk$,* $$ \lambda_k = \max_{\dim S = k}\ \min_{0 \neq v \in S} R(v) = \min_{\dim S = n-k+1}\ \max_{0 \neq v \in S} R(v). $$

Proof. Max-min form, lower bound. Take , which has dimension . For , $$ R(v) = \frac{\sum_{i=1}^k \lambda_i |c_i|^2}{\sum_{i=1}^k |c_i|^2} \geq \lambda_k, $$ because each for , with equality at . Hence , so the outer maximum is at least .

Max-min form, upper bound. Let be any -dimensional subspace and set , of dimension . By the subspace dimension formula , there is a nonzero . Writing , $$ R(v) = \frac{\sum_{i=k}^n \lambda_i |c_i|^2}{\sum_{i=k}^n |c_i|^2} \leq \lambda_k, $$ since each for . Thus for every such , so the outer maximum is at most . The two bounds give the max-min equality.

Min-max form. Apply the max-min form to , whose eigenvalues are in decreasing order, so the -th eigenvalue of is . The Rayleigh quotient of is , and , . The max-min statement for at index reads . Re-indexing (so ) gives .

Proposition (Cauchy interlacing). Let $A = A^nV\lambda_1 \geq \cdots \geq \lambda_nW \subseteq Vn - 1B = P_W A|WP_WWW\mu_1 \geq \cdots \geq \mu{n-1}\lambda_{k+1} \leq \mu_k \leq \lambda_k1 \leq k \leq n-1$.*

Proof. For , since and . So the Rayleigh quotient of on agrees with that of restricted to : for .

Upper bound . By Courant-Fischer for on the -dimensional , . Every -dimensional is also a -dimensional subspace of , so this maximum is over a subfamily of the subspaces appearing in the Courant-Fischer max-min for . A maximum over a smaller family is no larger, so .

Lower bound . Use the min-max form. For , . For on , . The subspaces of dimension form a subfamily of all dimension- subspaces of , and a minimum over a smaller family is no smaller, so . Combining, . The general -deletion bound follows by iterating the codimension-one step times.

Proposition (Weyl's inequality ). For self-adjoint on and indices with , the stated inequality holds.

Proof. Let be an orthonormal eigenbasis for and one for . Set $$ U = \mathrm{span}(u_i, u_{i+1}, \ldots, u_n), \quad \dim U = n - i + 1, \qquad W = \mathrm{span}(w_j, \ldots, w_n), \quad \dim W = n - j + 1. $$ On , every unit vector has (it is a combination of eigenvectors with eigenvalues ); on , . The intersection has dimension at least . For any unit , $$ R_{A+B}(v) = \langle (A+B)v, v\rangle = R_A(v) + R_B(v) \leq \lambda_i(A) + \lambda_j(B). $$ Now apply the min-max form to . Set , so a subspace of dimension exists inside . Then $$ \lambda_m(A+B) = \min_{\dim S = n-m+1}\ \max_{0\neq v\in S} R_{A+B}(v) \leq \max_{0\neq v\in U\cap W} R_{A+B}(v) \leq \lambda_i(A) + \lambda_j(B), $$ where the first inequality chooses to be an -dimensional subspace of . This is the claim . The dual lower inequality follows by applying this to and re-indexing.

Proposition (Rayleigh-quotient iteration converges cubically near a simple eigenvalue). Let $A = A^\lambdauv_{m+1} = (A - R(v_m) I)^{-1} v_mv_0u\sin\angle(v_{m+1}, u) = O\big(\sin^3\angle(v_m, u)\big)$.*

Proof. Write with , , where . The Rayleigh quotient satisfies , with bounded; so . Let act on the invariant complement , with spectral gap by simplicity. Applying scales the -component by and the -component by a factor of size at most . Hence $$ \tan\theta_{m+1} = \frac{|\text{component} \perp u|}{|\text{component along } u|} \leq \frac{O(1)\sin\theta_m}{|\lambda - R(v_m)|^{-1}\cos\theta_m} = O\big(|R(v_m) - \lambda|\big)\tan\theta_m = O(\sin^2\theta_m)\tan\theta_m. $$ Since for small angles, , giving the cubic local convergence. The self-adjointness is used twice: to make an -invariant orthogonal complement, and to make second order in rather than first order.

Connections Master

The Rayleigh quotient is the variational face of the spectral theorem 01.01.13: where the spectral theorem produces the eigenvalues by diagonalisation, the extremal characterisation produces them by optimisation, and the equivalence of the two is the identity . The self-adjoint hypothesis is exactly the condition from 01.01.10 that makes real and the eigenvalues real, without which the quotient has no order structure to extremise.

The min-max characterisation specialises to the singular value decomposition 01.01.12 through : the singular values are , so Courant-Fischer applied to the positive self-adjoint gives , the variational characterisation of singular values that underlies the Eckart-Young low-rank approximation theorem.

The interlacing and perturbation inequalities feed the quadratic-form theory of 01.01.15: Sylvester's law of inertia classifies symmetric matrices by the signs of their eigenvalues, and Weyl's monotonicity plus Cauchy interlacing govern how those signs persist under congruence, compression, and perturbation, which is the algebraic content of the inertia of a bordered or perturbed form.

The eigenvector-free min-max form is the bridge to the variational method in quantum mechanics 12.07.03: the Rayleigh-Ritz estimate of a ground-state energy is the statement for any trial state , the Rayleigh case for the Hamiltonian , and the min-max over trial subspaces produces upper bounds for the excited-state energies (MacDonald's theorem) exactly as the finite-dimensional Courant-Fischer formula predicts.

Historical & philosophical context Master

The Rayleigh quotient began in mechanics, not in linear algebra. In The Theory of Sound (1877) Lord Rayleigh studied the small oscillations of a system about equilibrium and observed that the ratio of the potential energy to the kinetic energy of a trial configuration is a stationary estimate of the squared natural frequencies, accurate to second order when the trial mode is close to a true normal mode [Rayleigh 1877]. The quotient is the abstraction of that energy ratio, with the stiffness operator and the denominator the mass form. Ritz turned the stationarity observation into an algorithm in 1909, expanding the trial configuration in a finite basis and reducing the variational problem to a generalised matrix eigenvalue problem, which is why the finite-basis variational method carries his name [Ritz 1909].

The min-max formula appeared first in Fischer's 1905 study of real quadratic forms, where the -th eigenvalue is given as a max-min over -dimensional subspaces [Fischer 1905]. Weyl, in his 1912 work on the asymptotic distribution of the eigenvalues of the Laplacian on a membrane, used the same variational characterisation to prove both the perturbation inequalities now bearing his name and the leading-order eigenvalue-counting law, establishing that the count of eigenvalues below a threshold grows like the volume of the domain times a universal constant [Weyl 1912]. Courant extended the principle to general self-adjoint differential operators in 1920 and connected it to the nodal domains of eigenfunctions, giving the geometric form of the theorem that propagated through Methoden der mathematischen Physik into the standard toolkit of mathematical physics [Courant 1920]. The finite-dimensional Cauchy interlacing relation, which the modern proof derives as a corollary of min-max, predates all of this: Cauchy established it in 1829 while analysing the secular equation of planetary perturbation theory [Cauchy 1829].

Bibliography Master

@book{Rayleigh1877,
  author    = {Rayleigh, John William Strutt, Baron},
  title     = {The Theory of Sound, Volume I},
  publisher = {Macmillan},
  address   = {London},
  year      = {1877}
}

@article{Fischer1905,
  author  = {Fischer, Ernst},
  title   = {{\"U}ber quadratische Formen mit reellen Koeffizienten},
  journal = {Monatshefte f{\"u}r Mathematik und Physik},
  volume  = {16},
  year    = {1905},
  pages   = {234--249}
}

@article{Ritz1909,
  author  = {Ritz, Walther},
  title   = {{\"U}ber eine neue Methode zur L{\"o}sung gewisser Variationsprobleme der mathematischen Physik},
  journal = {Journal f{\"u}r die reine und angewandte Mathematik},
  volume  = {135},
  year    = {1909},
  pages   = {1--61}
}

@article{Weyl1912,
  author  = {Weyl, Hermann},
  title   = {Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung)},
  journal = {Mathematische Annalen},
  volume  = {71},
  year    = {1912},
  pages   = {441--479}
}

@article{Courant1920,
  author  = {Courant, Richard},
  title   = {{\"U}ber die Eigenwerte bei den Differentialgleichungen der mathematischen Physik},
  journal = {Mathematische Zeitschrift},
  volume  = {7},
  year    = {1920},
  pages   = {1--57}
}

@article{Cauchy1829,
  author  = {Cauchy, Augustin-Louis},
  title   = {Sur l'{\'e}quation {\`a} l'aide de laquelle on d{\'e}termine les in{\'e}galit{\'e}s s{\'e}culaires des mouvements des plan{\`e}tes},
  journal = {Exercices de Math{\'e}matiques},
  volume  = {4},
  year    = {1829},
  pages   = {140--160}
}

@book{HornJohnson2013,
  author    = {Horn, Roger A. and Johnson, Charles R.},
  title     = {Matrix Analysis},
  edition   = {2nd},
  publisher = {Cambridge University Press},
  year      = {2013}
}

@book{Parlett1998,
  author    = {Parlett, Beresford N.},
  title     = {The Symmetric Eigenvalue Problem},
  publisher = {SIAM},
  series    = {Classics in Applied Mathematics},
  year      = {1998}
}

@book{ReedSimon1978,
  author    = {Reed, Michael and Simon, Barry},
  title     = {Methods of Modern Mathematical Physics, Vol. IV: Analysis of Operators},
  publisher = {Academic Press},
  address   = {New York},
  year      = {1978}
}

@book{Shilov1977,
  author    = {Shilov, Georgi E.},
  title     = {Linear Algebra},
  publisher = {Dover Publications},
  address   = {New York},
  year      = {1977},
  note      = {Translation of the 1971 Russian edition, transl. R. A. Silverman}
}