43.01.02 · numerical-analysis / 01-floating-point-conditioning

Conditioning and condition numbers of problems

shipped3 tiersLean: none

Anchor (Master): Higham 2002 *Accuracy and Stability of Numerical Algorithms* 2e (SIAM) Ch. 1-2, Ch. 6; Trefethen-Bau 1997 *Numerical Linear Algebra* (SIAM) Lectures 12-13; Golub-Van Loan 2013 *Matrix Computations* 4e (Johns Hopkins) §2.6 (the singular value/condition number theory); Demmel 1997 *Applied Numerical Linear Algebra* (SIAM) Ch. 1-2

Intuition Beginner

Suppose you want to compute an answer from some input data, and the data is slightly off — measured imperfectly, or rounded when it was stored in the computer. A natural question comes before you pick any method at all: how much can a small error in the input swing the answer? Some problems are forgiving. A tiny wobble in the input produces a tiny wobble in the output. Other problems are touchy: a barely-visible change in the input throws the answer wildly off. The conditioning of a problem is the name for this sensitivity.

The condition number is a single number that measures it. Roughly, it is the worst factor by which a small relative error in the input gets magnified into a relative error in the output. A condition number near $1$ means the problem is well-behaved: errors pass through about the same size they came in. A condition number of a million means a relative error of one part in a million in the data can become a relative error of order one in the answer — the answer can be meaningless even though the data looked clean.

The key idea, and the one that separates this from everything later, is that conditioning belongs to the problem, not to any recipe for solving it. It is fixed the moment you state what you want to compute from what data. Even a perfect, error-free method cannot beat the conditioning: if the problem amplifies input error by a factor of a million, and your input carries rounding error, the answer inherits that amplification no matter how cleverly you compute.

So conditioning sets the speed limit. Before asking whether a particular method is good, you ask whether the problem itself is sensitive. A separate notion, the stability of an algorithm, asks whether a given recipe makes things worse than the problem already forces — that comes next door. Here we measure only the problem.

Visual Beginner

Picture the input as a point and draw a tiny circle of uncertainty around it: all the inputs you cannot tell apart from the true one. The problem maps that little circle to a blob of possible outputs. For a well-conditioned problem the output blob is small — about the same size as the input circle. For an ill-conditioned problem the output blob is stretched out: the map magnifies the uncertainty.

The table below contrasts a forgiving problem with a touchy one, using the same size of input wobble in each row.

problem	input change	output change	rough magnification
double the input ( $y = 2 x$ )	$1%$	$1%$	$1$ (well-conditioned)
add two nearly-opposite numbers	$0.1%$	$50%$	huge (ill-conditioned)
solve a stretched linear system	$0.01%$	$10%$	about $1000$

The magnification factor in the last column is the condition number in disguise. The whole subject of conditioning is reading that factor off the problem before you compute anything. The stretched-ellipse picture also connects to the singular value story: the longest and shortest stretch directions of a linear map are exactly its largest and smallest singular values, and their ratio is the magnification.

Worked example Beginner

Let us measure the conditioning of one concrete problem: subtracting two close numbers. Take the function that sends an input $x$ to the output $1 - x$ , and evaluate near $x = 0.9999$ .

The true output is $1 - 0.9999 = 0.0001$ . Now nudge the input by a small relative amount. Suppose $x$ is stored with a relative error of one part in ten thousand, so instead of $0.9999$ the computer holds $0.99999$ — the input moved by about $0.00009$ , a relative change of roughly $0.009%$ .

The new output is $1 - 0.99999 = 0.00001$ . Compare the two outputs: the answer changed from $0.0001$ to $0.00001$ , a relative change of about $90%$ . A relative input change of $0.009%$ produced a relative output change of $90%$ .

The magnification is the ratio of these relative changes: about $90 \div 0.009 \approx 10000$ . So near $x = 0.9999$ this subtraction problem has condition number around ten thousand. The reason is plain once you see it: the output $1 - x$ is a small number obtained by cancelling two numbers that are both close to $1$ , so the tiny surviving difference is extremely sensitive to either of them.

What this tells us: cancellation between nearly-equal quantities is ill-conditioned, and the condition number puts a number on how ill. Four digits of input accuracy can collapse to almost none in the output. The lesson is to detect this from the problem — the small denominator, the near-cancellation — before blaming any particular calculation.

Check your understanding Beginner

Exercise (easy, multiple choice).

A problem has condition number about $1$ . What does this tell you?

A. The problem has no solution. B. Small relative errors in the input stay about the same size in the output. C. Any algorithm for it will be slow. D. The input must be an integer.

Hint

The condition number is the magnification factor from relative input error to relative output error. A factor near $1$ means almost no magnification.

Answer

B. Small relative errors in the input stay about the same size in the output. Feedback-correct: a condition number near $1$ is the definition of a well-conditioned problem — input error is passed through with little amplification. Feedback-wrong: conditioning is about error sensitivity, not solvability (A), speed (C), or the type of the data (D); those are separate questions.

Formal definition Intermediate+

A problem is a map $f : X \to Y$ between normed vector spaces, evaluated at a fixed input $x \in X$ ; the value $f (x)$ is the desired output. Conditioning measures the sensitivity of $f (x)$ to perturbations of $x$ . Throughout, $∥ \cdot ∥$ denotes the relevant norm on each space, and on matrices and on the linear maps between $X$ and $Y$ we use the induced operator norm $∥ A ∥ = sup_{∥ v ∥ = 1} ∥ A v ∥$ introduced for the singular value decomposition 01.01.12.

Absolute condition number. The absolute condition number of $f$ at $x$ is $$ \hat\kappa = \hat\kappa(x) = \lim_{\delta \to 0^+} ; \sup_{0 < |\delta x| \le \delta} \frac{|f(x + \delta x) - f(x)|}{|\delta x|}. $$ When $f$ is differentiable at $x$ with Fréchet derivative (Jacobian) $J (x)$ — the linear map best approximating $f$ near $x$ , whose multivariable form is governed by the chain rule 02.05.03 — the supremum of $∥ J (x) δ x ∥/∥ δ x ∥$ over directions is exactly the operator norm of the derivative, so $$ \hat\kappa(x) = |J(x)|. $$

Relative condition number. In numerical work relative errors are the natural currency, because floating-point data carries bounded relative error. The relative condition number of $f$ at $x$ (with $x \neq = 0$ and $f (x) \neq = 0$ ) is $$ \kappa = \kappa(x) = \lim_{\delta \to 0^+}; \sup_{0 < |\delta x| \le \delta} \left( \frac{|f(x+\delta x) - f(x)|}{|f(x)|} \Big/ \frac{|\delta x|}{|x|}\right), $$ the worst-case ratio of relative output change to relative input change. For differentiable $f$ this evaluates to $$ \kappa(x) = \frac{|J(x)|,|x|}{|f(x)|}. $$ A problem is well-conditioned at $x$ if $κ (x)$ is small (order $1$ to a few) and ill-conditioned if $κ (x)$ is large. The threshold is application-dependent, but a useful rule of thumb is that with double-precision data (about $16$ significant digits) a condition number of $1 0^{k}$ erodes roughly $k$ of those digits.

Condition number of matrix-vector multiplication. Fix a matrix $A \in F^{m \times n}$ and consider the problem $f (x) = A x$ for $x \in F^{n}$ , perturbing only $x$ . The map is linear, so $J (x) = A$ and $\overset{κ}{^} = ∥ A ∥$ . The relative condition number is $$ \kappa(x) = \frac{|A|,|x|}{|Ax|}, $$ which depends on the direction of $x$ : it is largest when $x$ aligns with the direction $A$ shrinks most.

The matrix condition number. For a square invertible $A$ , the problem of solving $A x = b$ — that is, $b \mapsto A^{- 1} b$ — has, by the same computation applied to $A^{- 1}$ , a relative condition number bounded above by the quantity $$ \kappa(A) = |A|,|A^{-1}|, $$ called the condition number of the matrix $A$ . It is independent of $b$ and serves as the worst-case sensitivity of the linear system over all right-hand sides. By convention $κ (A) = \infty$ for singular $A$ . In the operator ( $2$ -)norm, the singular value decomposition gives the closed form $$ \kappa_2(A) = \frac{\sigma_1(A)}{\sigma_n(A)}, $$ the ratio of largest to smallest singular value — the SVD-level fact stated in 01.01.12, here promoted to the organising quantity of the whole conditioning theory. Always $κ (A) \geq 1$ , with $κ_{2} (A) = 1$ exactly for scalar multiples of unitary matrices.

Counterexamples to common slips

The condition number of a matrix and the condition number of the problem of multiplying by that matrix at a particular vector are not the same number. For $f (x) = A x$ , the relative condition number $∥ A ∥∥ x ∥/∥ A x ∥$ varies with $x$ and can be as small as $1$ ; the matrix condition number $κ (A) = ∥ A ∥∥ A^{- 1} ∥$ is the maximum of that quantity over all $x$ (equivalently the condition of the solve problem), and is what people mean by "the condition number of $A$ ".
A large determinant or a small determinant says little about conditioning. The matrix $1 0^{- 10} I_{n}$ has determinant $1 0^{- 10 n}$ yet condition number $1$ (it is a scalar multiple of the identity), so it is perfectly conditioned. Conditioning is the ratio $σ_{1} / σ_{n}$ , not the scale.
Ill-conditioning is not the same as singularity. A nonsingular matrix can have $κ (A) = 1 0^{12}$ : it is invertible, the problem has a unique solution, and yet that solution is hypersensitive to data error. Conditioning measures how close the problem sits to a degenerate one, not whether it has crossed over.

Key theorem with proof Intermediate+

The central quantitative statement is that the matrix condition number governs, to first order, the sensitivity of the solution of a linear system to perturbations of the data. We prove the version perturbing the right-hand side; the coefficient-matrix version appears in the master tier.

Theorem (conditioning of the linear system $A x = b$ ). Let $A \in F^{n \times n}$ be invertible, let $b \neq = 0$ , and let $x = A^{- 1} b$ be the exact solution. If $b$ is perturbed to $b + δ b$ and $x + δ x = A^{- 1} (b + δ b)$ is the corresponding solution, then $$ \frac{|\delta x|}{|x|} \le \kappa(A),\frac{|\delta b|}{|b|}, \qquad \kappa(A) = |A|,|A^{-1}|, $$ in any norm with its induced operator norm, and the bound is attained for suitable $b$ and $δ b$ ^{[Trefethen, L. N. & Bau, D. — Numerical Linear Algebra]}.

Proof. Linearity of $A^{- 1}$ gives $δ x = A^{- 1} δ b$ , so taking operator norms, $∥ δ x ∥ = ∥ A^{- 1} δ b ∥ \leq ∥ A^{- 1} ∥ ∥ δ b ∥$ . Separately, from $b = A x$ we get $∥ b ∥ = ∥ A x ∥ \leq ∥ A ∥ ∥ x ∥$ , hence $∥ x ∥ \geq ∥ b ∥/∥ A ∥$ , equivalently $1/∥ x ∥ \leq ∥ A ∥/∥ b ∥$ . Multiplying the two inequalities, $$ \frac{|\delta x|}{|x|} \le |A^{-1}|,|\delta b| \cdot \frac{|A|}{|b|} = |A|,|A^{-1}|,\frac{|\delta b|}{|b|} = \kappa(A),\frac{|\delta b|}{|b|}. $$

For attainment in the operator norm, choose $δ b$ in the direction where $A^{- 1}$ achieves its norm, so $∥ A^{- 1} δ b ∥ = ∥ A^{- 1} ∥ ∥ δ b ∥$ ; in the SVD $A = U Σ V^{*}$ this is $δ b = u_{n}$ , the last left singular vector, giving $δ x = σ_{n}^{- 1} v_{n}$ . Choose $b$ in the direction where $A$ achieves its norm, $b = σ_{1} u_{1}$ with $x = v_{1}$ , so $∥ b ∥ = ∥ A ∥ ∥ x ∥$ . Then both inequalities are equalities and the bound is met with the value $σ_{1} / σ_{n} = κ_{2} (A)$ . $□$

Bridge. This theorem is the foundational reason the single scalar $κ (A)$ is the right summary of a linear system's difficulty: it is exactly the tight worst-case amplification of relative data error into relative solution error, and the SVD attainment argument shows this is exactly $σ_{1} / σ_{n}$ , so the geometric stretch ratio of 01.01.12 and the numerical sensitivity factor are one quantity. The same operator-norm mechanism builds toward the full perturbation theory of $A x = b$ in 43.03.03, where perturbing $A$ as well as $b$ produces the refined bound with the $(1 - κ ∥ δ A ∥/∥ A ∥)^{- 1}$ correction; and it appears again in 43.04.01, where the least-squares problem inherits a conditioning governed by $κ (A)$ together with the residual angle. Putting these together, the conditioning of a problem and the stretch geometry of its underlying linear map are dual descriptions of the same sensitivity, and the central insight is that this sensitivity is fixed before any algorithm is chosen — the bridge is that a backward-stable algorithm (treated in 43.01.03) achieves a forward error of order $κ (A) ϵ_{mach}$ and no better, so $κ (A)$ is the floor that conditioning imposes on accuracy.

Exercises Intermediate+

Exercise 3 (medium, symbolic).

Show that the relative condition number of the subtraction problem $f (x) = x - a$ at a fixed constant $a$ , perturbing $x$ , is $κ (x) = ∣ x ∣/∣ x - a ∣$ , and explain why it blows up as $x \to a$ .

Hint

$f^{'} (x) = 1$ , so $κ = ∣ f^{'} (x) ∣ ∣ x ∣/∣ f (x) ∣ = ∣ x ∣/∣ x - a ∣$ .

Answer

The derivative is $f^{'} (x) = 1$ , so $κ (x) = ∣1∣ \cdot ∣ x ∣/∣ x - a ∣ = ∣ x ∣/∣ x - a ∣$ . As $x \to a$ the denominator $∣ x - a ∣ \to 0$ while the numerator $∣ x ∣ \to ∣ a ∣ \neq = 0$ (for $a \neq = 0$ ), so $κ (x) \to \infty$ . This is the analytic statement of catastrophic cancellation: subtracting two nearly-equal quantities is arbitrarily ill-conditioned, because the small surviving difference is the ratio of a finite numerator to a vanishing denominator. The worked Beginner example, $f (x) = 1 - x$ near $x = 0.9999$ , is the case $a = 1$ giving $κ \approx 0.9999/0.0001 \approx 1 0^{4}$ .

Exercise 5 (medium, numeric).

For the matrix $$ A = \begin{pmatrix} 1 & 1 \ 1 & 1.0001 \end{pmatrix}, $$ estimate $κ_{2} (A)$ to one significant figure, and state how many digits of a double-precision right-hand side you expect to lose when solving $A x = b$ .

Hint

$κ_{2} (A) = σ_{1} / σ_{2}$ and $σ_{1} σ_{2} = ∣ det A ∣$ . Estimate $σ_{1} \approx ∥ A ∥_{2} \approx 2$ and use $det A = 0.0001$ to recover $σ_{2}$ .

Answer

The determinant is $1 \cdot 1.0001 - 1 \cdot 1 = 1 0^{- 4}$ , so $σ_{1} σ_{2} = 1 0^{- 4}$ . The largest singular value is close to the operator norm, about $2$ (the rows nearly equal $(1, 1)$ , summing constructively), so $σ_{2} \approx 1 0^{- 4} /2 = 5 \times 1 0^{- 5}$ and $κ_{2} (A) \approx 2/ (5 \times 1 0^{- 5}) = 4 \times 1 0^{4}$ . With $κ \approx 1 0^{4.6}$ , expect to lose between four and five of the roughly sixteen significant decimal digits of double precision, leaving about eleven trustworthy digits in the solution.

Exercise 6 (hard, symbolic).

Let $A$ and $B$ be invertible $n \times n$ matrices. Prove the submultiplicativity bound $κ (A B) \leq κ (A) κ (B)$ for any operator-norm condition number, and give a $2 \times 2$ example where equality fails strictly.

Hint

Use submultiplicativity of the operator norm, $∥ X Y ∥ \leq ∥ X ∥∥ Y ∥$ , on both $A B$ and $(A B)^{- 1} = B^{- 1} A^{- 1}$ .

Answer

By submultiplicativity of the operator norm, $∥ A B ∥ \leq ∥ A ∥ ∥ B ∥$ and $∥ (A B)^{- 1} ∥ = ∥ B^{- 1} A^{- 1} ∥ \leq ∥ B^{- 1} ∥ ∥ A^{- 1} ∥$ . Multiplying, $κ (A B) = ∥ A B ∥ ∥ (A B)^{- 1} ∥ \leq ∥ A ∥ ∥ B ∥ ∥ B^{- 1} ∥ ∥ A^{- 1} ∥ = κ (A) κ (B)$ . For strict inequality take $A = diag (2, 1)$ and $B = diag (1, 2)$ in the $2$ -norm: $κ_{2} (A) = κ_{2} (B) = 2$ , so the bound gives $κ_{2} (A B) \leq 4$ , but $A B = diag (2, 2) = 2 I$ has $κ_{2} (A B) = 1 < 4$ . The compensating directions cancel rather than compound.

Exercise 7 (hard, symbolic).

The relative condition number of the eigenvalue problem for a simple eigenvalue $λ$ of a diagonalizable matrix $A$ , with right eigenvector $x$ and left eigenvector $y$ (so $A x = λ x$ , $y^{*} A = λ y^{*}$ , $y^{*} x \neq = 0$ ), is the quantity $s (λ)^{- 1}$ with $s (λ) = ∣ y^{*} x ∣/ (∥ y ∥ ∥ x ∥)$ . Show that for a Hermitian matrix every eigenvalue has condition number $1$ .

Hint

For Hermitian $A$ , the left and right eigenvectors for $λ$ coincide (take $y = x$ ). Compute $s (λ)$ in that case.

Answer

For Hermitian $A$ the eigenvectors may be chosen orthonormal, and the left eigenvector for $λ$ equals the right eigenvector: $y = x$ . Then $s (λ) = ∣ x^{*} x ∣/ (∥ x ∥ ∥ x ∥) = ∥ x ∥^{2} /∥ x ∥^{2} = 1$ , so the eigenvalue condition number $s (λ)^{- 1} = 1$ . Hermitian (and more generally normal) eigenvalue problems are perfectly conditioned in this sense — a perturbation of size $ϵ$ moves each eigenvalue by at most $ϵ$ , the Weyl bound 01.01.14. The condition number degrades only when $A$ is non-normal and the left and right eigenvectors become nearly orthogonal, $∣ y^{*} x ∣ ≪ ∥ y ∥∥ x ∥$ — the regime quantified by the Bauer-Fike theorem 43.06.04.

Exercise 8 (hard, symbolic).

Show that the relative condition number of $f (x) = x$ for $x > 0$ is $\frac{1}{2}$ everywhere, and contrast this with the condition number of $g (x) = x - x - 1$ for large $x$ , explaining the discrepancy.

Hint

For $f$ , use $f^{'} (x) = \frac{1}{2 x}$ . For $g$ , rationalise: $x - x - 1 = 1/ (x + x - 1)$ , then differentiate, or argue from cancellation.

Answer

For $f (x) = x$ , $f^{'} (x) = \frac{1}{2 x}$ , so $κ_{f} (x) = ∣ f^{'} (x) ∣ ∣ x ∣/∣ f (x) ∣ = \frac{1}{2 x} \cdot x / x = \frac{1}{2}$ , constant. The square root is well-conditioned.

For $g (x) = x - x - 1$ , the problem $g$ itself is well-conditioned: rationalising gives $g (x) = 1/ (x + x - 1)$ , with $g^{'} (x) = - \frac{1}{2} (x^{- 1/2} - (x - 1)^{- 1/2}) / (x + x - 1)^{2}$ , and one finds $κ_{g} (x) \to \frac{1}{2}$ as $x \to \infty$ — no ill-conditioning in the problem. The discrepancy people observe is not in the problem but in the naïve algorithm: computing $g$ by the literal subtraction $x - x - 1$ evaluates an ill-conditioned intermediate step (subtraction of near-equal numbers, condition $\approx 2 x$ by Exercise 3) even though the overall map is benign. This separates problem conditioning (here $\frac{1}{2}$ ) from algorithm stability: the rationalised formula $1/ (x + x - 1)$ is a stable algorithm for the same well-conditioned problem.

Advanced results Master

Conditioning of the linear system under coefficient-matrix perturbation. The right-hand-side bound of the Key theorem extends to perturbations of $A$ itself. Let $A$ be invertible, $A x = b$ , and let $(A + δ A) (x + δ x) = b + δ b$ . To first order, $δ x = A^{- 1} (δ b - δ A x)$ , and a standard manipulation with the operator norm yields, whenever $κ (A) ∥ δ A ∥/∥ A ∥ < 1$ , $$ \frac{|\delta x|}{|x|} \le \frac{\kappa(A)}{1 - \kappa(A),|\delta A|/|A|}\left(\frac{|\delta A|}{|A|} + \frac{|\delta b|}{|b|}\right). $$ The leading factor is $κ (A)$ ; the denominator is a second-order correction that becomes important only when $κ (A) ∥ δ A ∥/∥ A ∥$ approaches $1$ , i.e. when the perturbation is large enough to threaten invertibility. The full a-posteriori treatment — residuals, backward error, iterative refinement — is the subject of 43.03.03.

Condition number as reciprocal distance to singularity. The $2$ -norm condition number has a geometric reading independent of any perturbation analysis. By the SVD, the distance in operator norm from an invertible $A$ to the nearest singular matrix is exactly $σ_{n}$ , the smallest singular value: $min {∥ E ∥_{2} : A + E singular} = σ_{n} (A)$ , attained by $E = - σ_{n} u_{n} v_{n}^{*}$ . Dividing by $∥ A ∥_{2} = σ_{1}$ , $$ \frac{1}{\kappa_2(A)} = \frac{\sigma_n}{\sigma_1} = \frac{\mathrm{dist}_2(A, \text{singular matrices})}{|A|_2}. $$ The reciprocal condition number is the relative distance to singularity. This is the Eckart-Young theorem of 01.01.12 specialised to the rank-deficiency boundary, and it explains why ill-conditioning ( $κ$ large) is geometrically the same as nearness to a singular (degenerate) problem.

Componentwise conditioning and the Bauer-Skeel number. The normwise condition number $κ (A)$ can overstate sensitivity when the data and perturbations have widely varying scales, because a single norm flattens that structure. The componentwise condition number measures relative perturbations entry by entry: for $∣ δ A ∣ \leq ϵ ∣ A ∣$ and $∣ δ b ∣ \leq ϵ ∣ b ∣$ (inequalities and absolute values taken entrywise), the relevant quantity is the Bauer-Skeel condition number $$ \mathrm{cond}(A, x) = \frac{\big| ,|A^{-1}|,|A|,|x| , \big|\infty}{|x|\infty}, $$ bounded above by $cond (A) = ∣ A^{- 1} ∣ ∣ A ∣_{\infty}$ , which is invariant under row scaling of $A$ and can be far smaller than $κ_{\infty} (A)$ for badly-scaled but otherwise benign systems ^{[Higham, N. J. — Accuracy and Stability of Numerical Algorithms (2nd ed.)]}.

Conditioning of polynomial roots: Wilkinson's polynomial. Root-finding can be catastrophically ill-conditioned even when each coefficient is benign. Wilkinson's celebrated example is the monic polynomial with roots $1, 2, \dots, 20$ , $$ p(x) = \prod_{k=1}^{20}(x - k) = x^{20} - 210,x^{19} + \cdots. $$ The condition number of the root $r = α_{j}$ with respect to a perturbation of the coefficient $a_{i}$ of $x^{i}$ is, by implicit differentiation of $p (α_{j}) = 0$ , $$ \frac{\partial \alpha_j}{\partial a_i} = -\frac{\alpha_j^{,i}}{p'(\alpha_j)}. $$ For $j = 16$ ( $α_{j} = 16$ ) and $i = 19$ , the value $∣ - α_{j}^{19} / p^{'} (α_{j}) ∣$ exceeds $1 0^{10}$ : a relative perturbation of order $1 0^{- 10}$ in the coefficient of $x^{19}$ splits well-separated real roots into complex pairs. The roots-from-coefficients map is the ill-conditioned problem; no root-finding algorithm can recover the roots accurately from coefficients perturbed at the level of double-precision rounding ^{[Wilkinson, J. H. — Rounding Errors in Algebraic Processes]}.

Conditioning of eigenvalues. For a simple eigenvalue $λ$ of a diagonalizable $A$ with unit right and left eigenvectors $x, y$ , first-order perturbation theory gives $δ λ = (y^{*} δ A x) / (y^{*} x) + O (∥ δ A ∥^{2})$ , so the absolute eigenvalue condition number is $1/∣ y^{*} x ∣ = 1/ s (λ)$ , with $s (λ) = ∣ y^{*} x ∣$ the eigenvalue's spectral sensitivity. Globally, the Bauer-Fike theorem bounds the migration of the entire spectrum: every eigenvalue of $A + E$ lies within $κ_{2} (V) ∥ E ∥_{2}$ of an eigenvalue of $A$ , where $A = V Λ V^{- 1}$ and $κ_{2} (V)$ is the condition number of the eigenvector matrix. For normal $A$ one may take $V$ unitary, $κ_{2} (V) = 1$ , and the spectrum is perfectly conditioned; non-normality, measured by $κ_{2} (V) ≫ 1$ , is precisely what makes eigenvalues sensitive. The full non-normal theory, including pseudospectra, is 43.06.04.

Synthesis. Conditioning is the foundational reason numerical analysis separates the difficulty of a problem from the quality of an algorithm: the condition number is a property of the map $f$ at the point $x$ , computed from its derivative as $κ = ∥ J ∥ ∥ x ∥/∥ f (x) ∥$ , and it is fixed before any method is named. This is exactly the quantity that the singular value decomposition of 01.01.12 makes geometric — for the linear-solve problem $κ_{2} (A) = σ_{1} / σ_{n}$ is the ratio of the longest to the shortest stretch of $A$ , and the central insight is that this ratio is simultaneously the worst-case error amplification, the reciprocal relative distance to a singular matrix (Eckart-Young), and the floor on accuracy that a backward-stable algorithm achieves. The matrix condition number, the polynomial-root condition number $- α^{i} / p^{'} (α)$ , and the eigenvalue condition number $1/∣ y^{*} x ∣$ are the same construction — operator norm of a derivative, relative-scaled — instantiated on three problems, and putting these together one sees that ill-conditioning, near-singularity, near-multiplicity, and catastrophic cancellation are one phenomenon viewed through different problems.

The conditioning theory built here is dual to the stability theory of 43.01.03: conditioning bounds how much error the problem must produce, stability bounds how much error the algorithm adds, and the fundamental theorem of backward-error analysis multiplies the two into the achievable forward error $O (κ ϵ_{mach})$ . The bridge from this unit to the rest of numerical linear algebra is that every later error guarantee — for $A x = b$ in 43.03.03, for least squares in 43.04.01, for eigenvalues in 43.06.04 — is a conditioning factor times a backward error, and this unit supplies the first factor.

Full proof set Master

Proposition 1 (relative condition number via the Jacobian). Let $f : X \to Y$ be Fréchet differentiable at $x \neq = 0$ with derivative $J (x)$ , and suppose $f (x) \neq = 0$ . Then the relative condition number is $κ (x) = ∥ J (x) ∥ ∥ x ∥/∥ f (x) ∥$ .

Proof. Fréchet differentiability means $f (x + δ x) - f (x) = J (x) δ x + o (∥ δ x ∥)$ as $∥ δ x ∥ \to 0$ . Hence for any direction, $$ \frac{|f(x+\delta x) - f(x)|}{|\delta x|} = \frac{|J(x),\delta x + o(|\delta x|)|}{|\delta x|} \xrightarrow{;|\delta x|\to 0;} \frac{|J(x),\delta x|}{|\delta x|}, $$ and taking the supremum over directions as $δ \to 0^{+}$ replaces the right side by $sup_{∥ v ∥ = 1} ∥ J (x) v ∥ = ∥ J (x) ∥$ , the operator norm. This is the absolute condition number $\overset{κ}{^} = ∥ J (x) ∥$ . Dividing the output relative change by the input relative change inserts the factor $∥ x ∥/∥ f (x) ∥$ uniformly (it does not depend on $δ x$ ), so the limiting supremum of the ratio is $\overset{κ}{^} ∥ x ∥/∥ f (x) ∥ = ∥ J (x) ∥ ∥ x ∥/∥ f (x) ∥$ . The $o (∥ δ x ∥)$ remainder vanishes in the limit, so the supremum is attained in the limiting sense by directions $v$ achieving $∥ J (x) v ∥ = ∥ J (x) ∥$ . $□$

Proposition 2 ( $κ_{2} (A) = σ_{1} / σ_{n}$ ). For an invertible $A \in F^{n \times n}$ with singular values $σ_{1} \geq \dots \geq σ_{n} > 0$ , $κ_{2} (A) = ∥ A ∥_{2} ∥ A^{- 1} ∥_{2} = σ_{1} / σ_{n}$ .

Proof. By the SVD $A = U Σ V^{*}$ with $Σ = diag (σ_{1}, \dots, σ_{n})$ , the operator $2$ -norm equals the largest singular value, $∥ A ∥_{2} = σ_{1}$ , since $∥ A ∥_{2} = sup_{∥ x ∥ = 1} ∥ U Σ V^{*} x ∥ = sup_{∥ y ∥ = 1} ∥Σ y ∥ = σ_{1}$ using unitary invariance of the Euclidean norm and the substitution $y = V^{*} x$ . The inverse is $A^{- 1} = V Σ^{- 1} U^{*}$ with $Σ^{- 1} = diag (σ_{1}^{- 1}, \dots, σ_{n}^{- 1})$ , whose largest diagonal entry is $σ_{n}^{- 1}$ ; by the same argument $∥ A^{- 1} ∥_{2} = σ_{n}^{- 1}$ . Therefore $κ_{2} (A) = σ_{1} \cdot σ_{n}^{- 1} = σ_{1} / σ_{n}$ . $□$

Proposition 3 (reciprocal condition number is relative distance to singularity). For invertible $A$ , $min {∥ E ∥_{2} : A + E singular} = σ_{n} (A)$ , hence $1/ κ_{2} (A) = σ_{n} / σ_{1} = dist_{2} (A, Sing) /∥ A ∥_{2}$ .

Proof. Let $A = U Σ V^{*}$ . The matrix $E = - σ_{n} u_{n} v_{n}^{*}$ , of operator norm $σ_{n}$ , gives $A + E = U (Σ - σ_{n} e_{n} e_{n}^{*}) V^{*}$ , whose last singular value is $0$ , so $A + E$ is singular; thus the minimum is at most $σ_{n}$ . Conversely, suppose $A + E$ is singular, so there is a unit vector $w$ with $(A + E) w = 0$ , i.e. $A w = - E w$ . Then $∥ E w ∥ = ∥ A w ∥ \geq σ_{n} ∥ w ∥ = σ_{n}$ , because $∥ A w ∥ = ∥Σ V^{*} w ∥ \geq σ_{n} ∥ V^{*} w ∥ = σ_{n} ∥ w ∥$ (the smallest singular value lower-bounds the stretch). Hence $∥ E ∥_{2} \geq ∥ E w ∥ \geq σ_{n}$ . The two inequalities give the minimum exactly $σ_{n}$ . Dividing by $∥ A ∥_{2} = σ_{1}$ and using Proposition 2 gives the relative-distance identity. $□$

Proposition 4 (perturbed-system bound). Let $A$ be invertible, $A x = b \neq = 0$ , and $(A + δ A) (x + δ x) = b + δ b$ with $κ (A) ∥ δ A ∥/∥ A ∥ < 1$ . Then $$ \frac{|\delta x|}{|x|} \le \frac{\kappa(A)}{1 - \kappa(A),|\delta A|/|A|}\left(\frac{|\delta A|}{|A|} + \frac{|\delta b|}{|b|}\right). $$

Proof. Subtract $A x = b$ from $(A + δ A) (x + δ x) = b + δ b$ to get $A δ x + δ A (x + δ x) = δ b$ , hence $δ x = A^{- 1} (δ b - δ A (x + δ x))$ . Taking norms and using submultiplicativity, $$ |\delta x| \le |A^{-1}|\big(|\delta b| + |\delta A|,|x| + |\delta A|,|\delta x|\big). $$ Collecting the $∥ δ x ∥$ term, $(1 - ∥ A^{- 1} ∥ ∥ δ A ∥) ∥ δ x ∥ \leq ∥ A^{- 1} ∥ (∥ δ b ∥ + ∥ δ A ∥ ∥ x ∥)$ . Divide by $∥ x ∥$ and use $∥ b ∥ \leq ∥ A ∥ ∥ x ∥$ (so $1/∥ x ∥ \leq ∥ A ∥/∥ b ∥$ ) on the $∥ δ b ∥$ term, and write $∥ A^{- 1} ∥ ∥ δ A ∥ = κ (A) ∥ δ A ∥/∥ A ∥$ throughout: $$ \Big(1 - \kappa(A)\tfrac{|\delta A|}{|A|}\Big)\frac{|\delta x|}{|x|} \le \kappa(A)\Big(\tfrac{|\delta b|}{|b|} + \tfrac{|\delta A|}{|A|}\Big). $$ The hypothesis $κ (A) ∥ δ A ∥/∥ A ∥ < 1$ makes the bracket on the left positive, and dividing through gives the stated bound. $□$

Connections Master

Singular value decomposition 01.01.12 supplies the exact closed form $κ_{2} (A) = σ_{1} / σ_{n}$ that turns the abstract definition $∥ A ∥ ∥ A^{- 1} ∥$ into a computable geometric quantity, and the Eckart-Young distance-to-singularity theorem that reads $1/ κ_{2}$ as the relative distance to the nearest singular matrix. The conditioning theory of this unit is the numerical-analysis promotion of the SVD's $σ_{1} / σ_{r}$ remark to the central organising scalar of the whole field; every condition-number identity here is an operator-norm statement about a derivative, and the SVD is what evaluates those operator norms.
Chain rule for multi-variable functions 02.05.03 is the analytic engine under the relative condition number $κ = ∥ J (x) ∥ ∥ x ∥/∥ f (x) ∥$ : the Jacobian $J (x)$ is the multivariable derivative whose composition law is the chain rule, and conditioning of a composite problem $f = g \circ h$ obeys the chain-rule-driven sub-bound $κ_{f} \leq κ_{g} \cdot κ_{h}$ at matched points. This is the precise sense in which conditioning of a pipeline can compound: the condition numbers of the stages multiply, exactly as the Jacobians compose.
Floating-point arithmetic and the IEEE model 43.01.01 is the same-wave sibling that supplies the input-error model conditioning acts on: data enters a computation with relative error bounded by the unit roundoff $ϵ_{mach}$ , and the condition number is the factor by which conditioning magnifies that floor into output error. Conditioning is meaningful precisely because floating point guarantees a relative error model, which is why the relative (not absolute) condition number is the operative quantity in numerical work.
Backward stability and backward-error analysis 43.01.03 is the dual same-wave sibling: conditioning measures the error the problem forces, stability measures the error the algorithm adds, and the fundamental theorem of backward-error analysis multiplies them into the achievable forward error, $\frac{∥ f ~ ( x ) - f ( x ) ∥}{∥ f ( x ) ∥} = O (κ (x) ϵ_{mach})$ . This unit owns the first factor $κ (x)$ ; 43.01.03 owns the second. The two together are the conceptual core of the field, and neither alone explains the accuracy of a computation.
Perturbation theory for $A x = b$ 43.03.03 takes the Key theorem's right-hand-side bound to its full form — perturbing $A$ and $b$ simultaneously, the residual estimate $∥ δ x ∥/∥ x ∥ \leq κ (A) ∥ r ∥/∥ b ∥$ , the Rigal-Gaches backward error, and iterative refinement — turning the condition number into the concrete a-posteriori guarantee for a computed solution. The conditioning here is the analytic input to that solver-level error analysis.
Bauer-Fike and the conditioning of eigenvalues 43.06.04 instantiate the condition-number construction on the eigenvalue problem, where the relevant quantity is $1/∣ y^{*} x ∣$ for a simple eigenvalue and the global bound is $κ_{2} (V) ∥ E ∥$ . Normal matrices are perfectly conditioned ( $κ_{2} (V) = 1$ , matching the Weyl bound of 01.01.14); non-normality is the eigenvalue analogue of ill-conditioning. This unit defines the conditioning framework; 43.06.04 specialises it to the spectral problem where left and right eigenvectors diverge.

Historical & philosophical context Master

The matrix condition number was introduced by Alan Turing in Rounding-off errors in matrix processes (Quarterly Journal of Mechanics and Applied Mathematics 1, 1948, 287–308), where he defined two measures of the ill-conditioning of a linear system — the "N-condition number" $N (A) N (A^{- 1})$ built from the Frobenius norm and the "M-condition number" — to explain why Gaussian elimination on certain systems lost accuracy independent of the arithmetic ^{[Turing, A. M. — Rounding-off errors in matrix processes]}. Turing's paper, written in the first years of stored-program computing, isolated the decisive conceptual point: the loss of accuracy was a property of the matrix, not of his method, and could be predicted before computing.

John von Neumann and Herman Goldstine had, in their 1947 analysis of matrix inversion, already studied the propagation of rounding error through elimination, and the operator-norm condition number $∥ A ∥ ∥ A^{- 1} ∥$ in the form used today was crystallised through the work of the numerical-analysis school of the 1950s and 1960s. James Wilkinson made the conditioning-versus-stability distinction the organising principle of the subject; his Rounding Errors in Algebraic Processes (Prentice-Hall, 1963) and The Algebraic Eigenvalue Problem (Oxford, 1965) established backward error analysis and, with the degree-20 "Wilkinson polynomial", gave the canonical demonstration that a problem can be wildly ill-conditioned while every coefficient is innocuous ^{[Wilkinson, J. H. — Rounding Errors in Algebraic Processes]}. The polynomial-root example showed that the roots-from-coefficients map, not any root-finding code, was at fault, sharpening the separation of problem from algorithm.

The synthesis into the modern framework — absolute and relative condition numbers of a general problem $f$ via its derivative, the matrix condition number as $σ_{1} / σ_{n}$ , componentwise and structured conditioning — is due to the textbook tradition consolidated by Golub and Van Loan's Matrix Computations (1983, four editions), Trefethen and Bau's Numerical Linear Algebra (SIAM, 1997), and Nicholas Higham's Accuracy and Stability of Numerical Algorithms (SIAM, 1996, 2nd ed. 2002), the last of which gave the comprehensive treatment of componentwise condition numbers and the Bauer-Skeel theory ^{[Higham, N. J. — Accuracy and Stability of Numerical Algorithms (2nd ed.)]}.

Bibliography Master

@article{turing1948rounding,
  author  = {Turing, Alan M.},
  title   = {Rounding-off Errors in Matrix Processes},
  journal = {The Quarterly Journal of Mechanics and Applied Mathematics},
  volume  = {1},
  number  = {1},
  year    = {1948},
  pages   = {287--308}
}

@article{vonneumann1947numerical,
  author  = {von Neumann, John and Goldstine, Herman H.},
  title   = {Numerical Inverting of Matrices of High Order},
  journal = {Bulletin of the American Mathematical Society},
  volume  = {53},
  number  = {11},
  year    = {1947},
  pages   = {1021--1099}
}

@book{wilkinson1963rounding,
  author    = {Wilkinson, James H.},
  title     = {Rounding Errors in Algebraic Processes},
  publisher = {Prentice-Hall},
  year      = {1963}
}

@book{wilkinson1965eigenvalue,
  author    = {Wilkinson, James H.},
  title     = {The Algebraic Eigenvalue Problem},
  publisher = {Oxford University Press},
  year      = {1965}
}

@book{trefethenbau1997,
  author    = {Trefethen, Lloyd N. and Bau, David},
  title     = {Numerical Linear Algebra},
  publisher = {SIAM},
  year      = {1997}
}

@book{golubvanloan2013,
  author    = {Golub, Gene H. and Van Loan, Charles F.},
  title     = {Matrix Computations},
  edition   = {4},
  publisher = {Johns Hopkins University Press},
  year      = {2013}
}

@book{higham2002accuracy,
  author    = {Higham, Nicholas J.},
  title     = {Accuracy and Stability of Numerical Algorithms},
  edition   = {2},
  publisher = {SIAM},
  year      = {2002}
}

@book{demmel1997applied,
  author    = {Demmel, James W.},
  title     = {Applied Numerical Linear Algebra},
  publisher = {SIAM},
  year      = {1997}
}

Prerequisites

01.01.12
02.05.03

Tier anchors

beginner: Trefethen-Bau 1997 *Numerical Linear Algebra* (SIAM) Lecture 12 (the opening 'condition of a problem' discussion); Higham 2002 *Accuracy and Stability of Numerical Algorithms* 2e (SIAM) §1.5-1.6 (informal conditioning)
intermediate: Trefethen-Bau 1997 *Numerical Linear Algebra* (SIAM) Lectures 12-13 (absolute/relative condition number, condition of matrix-vector product and of Ax=b); Higham 2002 *Accuracy and Stability of Numerical Algorithms* 2e (SIAM) Ch. 1 and §6.1
master: Higham 2002 *Accuracy and Stability of Numerical Algorithms* 2e (SIAM) Ch. 1-2, Ch. 6; Trefethen-Bau 1997 *Numerical Linear Algebra* (SIAM) Lectures 12-13; Golub-Van Loan 2013 *Matrix Computations* 4e (Johns Hopkins) §2.6 (the singular value/condition number theory); Demmel 1997 *Applied Numerical Linear Algebra* (SIAM) Ch. 1-2

References

Trefethen, L. N. & Bau, D. — Numerical Linear Algebra · SIAM, 1997. Lectures 12-13: the condition of a problem, absolute and relative condition numbers, condition of matrix-vector multiplication and of the linear system Ax=b, the matrix condition number kappa(A)=||A|| ||A^{-1}|| and its SVD value sigma_1/sigma_n.
Higham, N. J. — Accuracy and Stability of Numerical Algorithms (2nd ed.) · SIAM, 2002. Ch. 1 (principles of finite-precision computation, conditioning vs stability), §1.6 (condition numbers), Ch. 2 (floating point), Ch. 6 (condition of linear systems, componentwise condition numbers, the Bauer-Skeel number).
Golub, G. H. & Van Loan, C. F. — Matrix Computations (4th ed.) · Johns Hopkins University Press, 2013. §2.6: matrix and vector norms, the singular value decomposition, kappa_2(A)=sigma_1/sigma_n, distance to the nearest singular matrix.
Wilkinson, J. H. — Rounding Errors in Algebraic Processes · Prentice-Hall, 1963. The conditioning of polynomial roots and the famous perturbation analysis of the degree-20 'Wilkinson polynomial' demonstrating extreme ill-conditioning of root-finding.
Turing, A. M. — Rounding-off errors in matrix processes · Quarterly Journal of Mechanics and Applied Mathematics 1 (1948), 287-308. The first formal condition number of a matrix (the 'N-condition number' and 'M-condition number') for the sensitivity of a linear system.

Estimated time

beginner: 20m
intermediate: 45m
master: 90m