43.03.03 · numerical-analysis / 03-direct-linear-solvers

Perturbation theory and a posteriori error for linear systems


Anchor (Master): Higham 2002 *Accuracy and Stability of Numerical Algorithms* 2e (SIAM) Ch. 7 (normwise and componentwise perturbation theory of linear systems, the Skeel condition number) and Ch. 12 (iterative refinement, mixed-precision); Golub-Van Loan 2013 *Matrix Computations* 4e (Johns Hopkins) §2.6-2.7, §3.5 (perturbation theory, condition estimation, iterative improvement); Demmel 1997 *Applied Numerical Linear Algebra* (SIAM) Ch. 2

Intuition Beginner

You solved a linear system on a computer and got back a vector that you are calling the answer. Two honest questions follow. First: if the numbers you fed in were a little off — measured imperfectly, or rounded when stored — how far off can the true answer be? Second: now that you hold a computed answer in your hand, how can you check, after the fact, whether it is any good? The first question is perturbation theory. The second is a posteriori error estimation, which is just Latin for "checking after you have computed."

The two questions are joined by one number you already met in 43.01.02, the condition number of the matrix. It is the worst factor by which a small relative error in the data gets blown up in the answer. A condition number near one means a forgiving system: small input errors stay small. A condition number of a million means a barely-visible change in the data can swing the answer wildly, and no amount of careful computing can rescue you.

A natural instinct for checking your answer is to plug it back in. Take the computed answer, multiply by the matrix, and see how close you land to the right-hand side. The gap is called the residual. A tiny residual feels reassuring — it looks like the equation is almost satisfied. The hard lesson of this unit is that a tiny residual does not promise a tiny error when the condition number is large. The answer can be far from correct while the residual stays small, because the condition number stands between the two.

What you can trust from a small residual is something more modest and still useful: your computed answer exactly solves a problem close to the one you posed. Whether close-in-data means close-in-answer is, once again, the condition number's call.

Visual Beginner

The picture to hold is two different gaps. The residual lives on the right-hand-side, where you can see it: take your computed answer, push it through the matrix, and measure how far the result misses the target. The error lives on the answer side, where you cannot see it: it is the distance from your computed answer to the true one. The condition number is the exchange rate between them.

Read the table top to bottom. The residual is what you can measure; the error is what you actually care about. When the condition number is small the two are about the same size and the residual is a trustworthy report card. When the condition number is large the residual can be tiny while the error is enormous.

condition number	residual you can measure	true error you cannot see	residual a reliable report?
about $1$	small	small	yes
about $100$	small	up to $100$ times larger	weak
about $1000000$	small	up to a million times larger	no

The takeaway: measuring the residual is easy and the residual is genuine information, but to convert it into a statement about how wrong your answer is you must multiply by the condition number. A small residual alone is not a clean bill of health.

Worked example Beginner

Let us watch a small residual hide a large error. Take the system $A x = b$ with $$ A = \begin{pmatrix} 1 & 1 \\ 1 & 1.0001 \end{pmatrix}, \qquad b = \begin{pmatrix} 2 \\ 2.0001 \end{pmatrix}. $$ The true answer is $x = (1, 1)$ , because the two rows give $x_{1} + x_{2} = 2$ and $x_{1} + 1.0001 x_{2} = 2.0001$ , which together force $x_{2} = 1$ and then $x_{1} = 1$ .

Now suppose a computation hands you the candidate answer $\tilde{x} = (2, 0)$ instead. This is badly wrong: it misses the true answer $(1, 1)$ by a full unit in each coordinate. Let us compute its residual anyway, by plugging it back in. The first row gives $1 \times 2 + 1 \times 0 = 2$ , matching $b_{1} = 2$ exactly. The second row gives $1 \times 2 + 1.0001 \times 0 = 2$ , against $b_{2} = 2.0001$ .

So the residual is $$ r = b - A\tilde x = \begin{pmatrix} 2 \\ 2.0001 \end{pmatrix} - \begin{pmatrix} 2 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0.0001 \end{pmatrix}. $$ The residual has size about $0.0001$ (tiny, about one part in twenty thousand of the right-hand side). Yet the error in the answer is about $1.4$ , the distance from $(2, 0)$ to $(1, 1)$ . A residual of $0.0001$ sits beside an error of order one.

What this tells us: the residual was small, the error was huge, and the gap between them is the condition number of this matrix, which is roughly $40000$ . The two rows of $A$ are almost the same line, so the system barely pins down where the answer is — a small miss on the right-hand side corresponds to a large move in the answer. The residual told the truth about the equation and lied about the answer.
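The arithmetic of this example can be reproduced in a few lines. The following is a minimal sketch in plain Python; the helper names `matvec` and `norm2` are illustrative choices and not part of the unit.

```python
import math

# Data of the worked example: A x = b with true solution (1, 1).
A = [[1.0, 1.0], [1.0, 1.0001]]
b = [2.0, 2.0001]
x_true = [1.0, 1.0]
x_cand = [2.0, 0.0]  # the badly wrong candidate answer

def matvec(M, v):
    """Matrix-vector product for lists of rows."""
    return [sum(m * t for m, t in zip(row, v)) for row in M]

def norm2(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(t * t for t in v))

# Residual: what you can measure. Error: what you actually care about.
residual = [bi - yi for bi, yi in zip(b, matvec(A, x_cand))]
error = [s - t for s, t in zip(x_true, x_cand)]

print(f"residual size = {norm2(residual):.2e}")
print(f"error size    = {norm2(error):.3f}")
print(f"error / residual = {norm2(error) / norm2(residual):.0f}")
```

The last printed ratio shows how many times larger the error is than the residual for this particular candidate.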

Check your understanding Beginner

Exercise (easy, multiple choice).

You compute a solution $\tilde{x}$ to $A x = b$ and find the residual $r = b - A \tilde{x}$ is very small. The matrix is badly ill-conditioned. What can you safely conclude?

A. The error $\tilde{x} - x$ is also very small. B. $\tilde{x}$ exactly solves a system close to the one you posed, but it may still be far from the true answer. C. The matrix must actually be well-conditioned. D. Nothing at all can be learned from the residual.

Hint

A small residual is a statement about the data side of the equation. Converting it to a statement about the answer requires multiplying by the condition number.

Answer

B. Feedback-correct: correct; a small residual certifies that $\tilde{x}$ exactly solves a nearby problem (small backward error), but the forward error can still be as large as the condition number times the residual. Feedback-wrong: a small residual does not bound the error when the condition number is large (A); the condition number is a property of $A$ that the residual does not change (C); the residual is genuine, useful information about backward error, not nothing (D).

Formal definition Intermediate+

Let $A \in \mathbb{F}^{n \times n}$ with $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$ be invertible, let $b \in \mathbb{F}^{n}$ be nonzero, and let $x = A^{- 1} b$ be the exact solution of the linear system $A x = b$ . Throughout, $∥ \cdot ∥$ is a fixed vector norm with its induced operator norm on matrices, and $κ (A) = ∥ A ∥ ∥ A^{- 1} ∥$ is the condition number of 43.01.02. Let $\tilde{x}$ denote a computed or otherwise approximate solution.

Residual. The residual of $\tilde{x}$ is the vector $$ r = b - A\tilde x. $$ It measures how nearly $\tilde{x}$ satisfies the equation, and it is computable from $A$ , $b$ , and $\tilde{x}$ alone — no knowledge of the true $x$ is needed. The error is $e = x - \tilde{x}$ , which is not computable without $x$ . The two are linked by $A e = A (x - \tilde{x}) = b - A \tilde{x} = r$ , so $e = A^{- 1} r$ .

A posteriori residual bound. From $e = A^{- 1} r$ and $b = A x$ one obtains the a posteriori relative error bound $$ \frac{\|x - \tilde x\|}{\|x\|} \le \kappa(A)\,\frac{\|r\|}{\|b\|}, $$ which converts a measured relative residual $∥ r ∥/∥ b ∥$ into a guaranteed relative error, scaled by the condition number. When $κ (A)$ is modest the residual is a faithful proxy for the error; when $κ (A)$ is large the same residual certifies far less.
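As a numerical illustration, the sketch below evaluates the residual bound in the infinity norm for the Beginner worked example, together with the companion lower bound $κ (A)^{- 1} ∥ r ∥/∥ b ∥$ stated at the Master tier. The helpers `inv2`, `norm_inf_mat`, `norm_inf`, and `matvec` are hypothetical names written for this 2-by-2 case only.

```python
# Worked-example data, reused here for illustration.
A = [[1.0, 1.0], [1.0, 1.0001]]
b = [2.0, 2.0001]
x_true = [1.0, 1.0]
x_cand = [2.0, 0.0]

def inv2(M):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    (a, b_), (c, d) = M
    det = a * d - b_ * c
    return [[d / det, -b_ / det], [-c / det, a / det]]

def norm_inf_mat(M):
    """Induced infinity norm: the largest absolute row sum."""
    return max(sum(abs(t) for t in row) for row in M)

def norm_inf(v):
    return max(abs(t) for t in v)

def matvec(M, v):
    return [sum(m * t for m, t in zip(row, v)) for row in M]

kappa = norm_inf_mat(A) * norm_inf_mat(inv2(A))
r = [bi - yi for bi, yi in zip(b, matvec(A, x_cand))]
rel_res = norm_inf(r) / norm_inf(b)
rel_err = norm_inf([s - t for s, t in zip(x_true, x_cand)]) / norm_inf(x_true)

upper = kappa * rel_res   # a posteriori upper bound on the relative error
lower = rel_res / kappa   # companion lower bound
print(f"kappa_inf = {kappa:.1f}")
print(f"{lower:.2e} <= {rel_err:.2f} <= {upper:.2f}")
```

The true relative error falls between the two computed bounds, and the gap between them is a factor of $κ (A)^{2}$.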

Perturbation of the data. Suppose the data is perturbed: $A \mapsto A + δ A$ and $b \mapsto b + δ b$ , and let $x + δ x$ solve the perturbed system $(A + δ A) (x + δ x) = b + δ b$ . Provided $κ (A) ∥ δ A ∥/∥ A ∥ < 1$ , the solution perturbation obeys the normwise perturbation bound $$ \frac{\|\delta x\|}{\|x\|} \le \frac{\kappa(A)}{1 - \kappa(A)\,\|\delta A\|/\|A\|}\left(\frac{\|\delta A\|}{\|A\|} + \frac{\|\delta b\|}{\|b\|}\right). $$ The leading factor is $κ (A)$ ; the denominator $1 - κ (A) ∥ δ A ∥/∥ A ∥$ is a second-order correction that matters only when the perturbation of $A$ is large enough to threaten invertibility, that is, when $κ (A) ∥ δ A ∥/∥ A ∥$ approaches $1$ .
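The bound can be checked on a concrete perturbation. In the sketch below the matrix and right-hand side are those of the Beginner worked example, while the perturbations `dA` and `db` are hypothetical values chosen only so that the hypothesis $κ (A) ∥ δ A ∥/∥ A ∥ < 1$ holds; everything is measured in the infinity norm.

```python
A = [[1.0, 1.0], [1.0, 1.0001]]
b = [2.0, 2.0001]
dA = [[0.0, 0.0], [0.0, 1e-6]]  # assumed perturbation of A
db = [1e-6, 0.0]                # assumed perturbation of b

def inv2(M):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    (a, b_), (c, d) = M
    det = a * d - b_ * c
    return [[d / det, -b_ / det], [-c / det, a / det]]

def matvec(M, v):
    return [sum(m * t for m, t in zip(row, v)) for row in M]

def norm_inf_mat(M):
    return max(sum(abs(t) for t in row) for row in M)

def norm_inf(v):
    return max(abs(t) for t in v)

def solve2(M, rhs):
    return matvec(inv2(M), rhs)

x = solve2(A, b)
A_pert = [[p + q for p, q in zip(ra, rd)] for ra, rd in zip(A, dA)]
b_pert = [p + q for p, q in zip(b, db)]
x_pert = solve2(A_pert, b_pert)

kappa = norm_inf_mat(A) * norm_inf_mat(inv2(A))
rel_dA = norm_inf_mat(dA) / norm_inf_mat(A)
rel_db = norm_inf(db) / norm_inf(b)
rel_dx = norm_inf([p - q for p, q in zip(x_pert, x)]) / norm_inf(x)

hypothesis_ok = kappa * rel_dA < 1
bound = kappa / (1 - kappa * rel_dA) * (rel_dA + rel_db)
print(f"data change  = {rel_dA + rel_db:.2e}")
print(f"answer change = {rel_dx:.2e}  (bound {bound:.2e})")
```

Data changes of relative size about $1 0^{- 6}$ move the answer by a few percent here, which is within the bound and reflects the amplification by $κ (A)$.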

Normwise backward error. The normwise backward error of $\tilde{x}$ is the size of the smallest data perturbation that makes $\tilde{x}$ exact, $$ \eta(\tilde x) = \min\{\varepsilon : (A + \delta A)\tilde x = b + \delta b,\ \|\delta A\| \le \varepsilon\|A\|,\ \|\delta b\| \le \varepsilon\|b\|\}. $$ The Rigal-Gaches theorem (proved at the Master tier) evaluates this minimum in closed form as $η (\tilde{x}) = ∥ r ∥/ (∥ A ∥ ∥ \tilde{x} ∥ + ∥ b ∥)$ . A solver is backward stable in the sense of 43.01.03 precisely when $η (\tilde{x}) = O (ϵ_{mach})$ .
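The closed form is cheap to evaluate. A minimal sketch follows, assuming the infinity norm for convenience (the Master-tier proof in this unit is written for the 2-norm); the function name `backward_error` is an illustrative choice.

```python
def norm_inf(v):
    return max(abs(t) for t in v)

def norm_inf_mat(M):
    """Induced infinity norm: the largest absolute row sum."""
    return max(sum(abs(t) for t in row) for row in M)

def backward_error(A, b, x_tilde):
    """Scaled residual ||r|| / (||A|| ||x_tilde|| + ||b||), infinity norm."""
    r = [bi - sum(a * t for a, t in zip(row, x_tilde))
         for row, bi in zip(A, b)]
    return norm_inf(r) / (norm_inf_mat(A) * norm_inf(x_tilde) + norm_inf(b))

# Worked-example data: the candidate (2, 0) has a tiny backward error
# even though it is far from the true solution (1, 1).
A = [[1.0, 1.0], [1.0, 1.0001]]
b = [2.0, 2.0001]
eta = backward_error(A, b, [2.0, 0.0])
print(f"eta = {eta:.3e}")
```

A value of `eta` near the unit roundoff would indicate a backward-stable result; the value here is much larger than that but still small, which is consistent with the candidate solving a nearby problem.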

Componentwise (Skeel) conditioning. When the entries of $A$ and $b$ vary over many orders of magnitude, a single norm flattens that structure and $κ (A)$ overstates the sensitivity. Measuring perturbations entrywise, that is $∣ δ A ∣ \leq ε ∣ A ∣$ and $∣ δ b ∣ \leq ε ∣ b ∣$ with absolute values and inequalities read componentwise, the governing quantity is the Skeel condition number $$ \mathrm{cond}(A, x) = \frac{\big\|\,|A^{-1}|\,|A|\,|x|\,\big\|_\infty}{\|x\|_\infty} \le \mathrm{cond}(A) = \big\|\,|A^{-1}|\,|A|\,\big\|_\infty, $$ which is invariant under row scaling of $A$ and can be far smaller than $κ_{\infty} (A)$ for badly-scaled but otherwise benign systems.
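Row-scaling invariance can be seen numerically. In the sketch below the matrix `B` and the scale factor `1e8` are assumed test data, and the helper names are hypothetical; the second row of `B` is multiplied by the scale factor to produce a badly scaled `A`.

```python
def inv2(M):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    (a, b_), (c, d) = M
    det = a * d - b_ * c
    return [[d / det, -b_ / det], [-c / det, a / det]]

def norm_inf_mat(M):
    return max(sum(abs(t) for t in row) for row in M)

def abs_product(M):
    """The entrywise-nonnegative matrix |M^{-1}| |M|."""
    Minv = inv2(M)
    n = len(M)
    return [[sum(abs(Minv[i][k]) * abs(M[k][j]) for k in range(n))
             for j in range(n)] for i in range(n)]

def kappa_inf(M):
    return norm_inf_mat(M) * norm_inf_mat(inv2(M))

def skeel_cond(M):
    """cond(M) = || |M^{-1}| |M| ||_inf."""
    return norm_inf_mat(abs_product(M))

def skeel_cond_x(M, x):
    """cond(M, x) = || |M^{-1}| |M| |x| ||_inf / ||x||_inf."""
    P = abs_product(M)
    v = [sum(p * abs(t) for p, t in zip(row, x)) for row in P]
    return max(v) / max(abs(t) for t in x)

B = [[2.0, 1.0], [1.0, 3.0]]               # well-scaled matrix
A = [B[0][:], [1e8 * t for t in B[1]]]     # same system, second row scaled

print(f"kappa_inf: B {kappa_inf(B):.2f}, A {kappa_inf(A):.2e}")
print(f"cond:      B {skeel_cond(B):.2f}, A {skeel_cond(A):.2f}")
```

The normwise condition number grows with the scale factor while the Skeel number is unchanged, which is the disagreement the definition is designed to expose.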

Counterexamples to common slips

A small residual is not a small error. The matrix and right-hand side of the Beginner worked example have $\tilde{x} = (2, 0)$ with relative residual about $5 \times 1 0^{- 5}$ yet relative error about $1$ ; the gap is $κ (A) \approx 4 \times 1 0^{4}$ . The residual bounds the backward error, not the forward error.
The residual depends on the measuring matrix. Scaling a single equation by a large factor inflates that component of $r$ without changing $\tilde{x}$ or the true error. Only the relative residual $∥ r ∥/ (∥ A ∥ ∥ \tilde{x} ∥ + ∥ b ∥)$ — the backward error — is a scaling-aware report; the raw residual norm is not.
Backward stability does not imply accuracy. A backward-stable solve guarantees $η (\tilde{x}) = O (ϵ_{mach})$ , hence a forward error of order $κ (A) ϵ_{mach}$ . For an ill-conditioned $A$ this forward error can be large; the inaccuracy is charged to the conditioning of 43.01.02, not to the algorithm.
Normwise and componentwise conditioning can disagree by orders of magnitude. A diagonal scaling can send $κ_{\infty} (A)$ to $1 0^{8}$ while $cond (A, x)$ stays near $1$ ; the Skeel number, not $κ$ , predicts the achievable accuracy after row equilibration.

Key theorem with proof Intermediate+

The signature result is the normwise perturbation bound: it isolates the condition number as the single amplification factor governing how the solution of $A x = b$ responds to simultaneous perturbations of $A$ and $b$ .

Theorem (normwise perturbation bound for $A x = b$ ). Let $A \in \mathbb{F}^{n \times n}$ be invertible, $b \neq 0$ , $x = A^{- 1} b$ . Let $δ A, δ b$ satisfy $κ (A) ∥ δ A ∥/∥ A ∥ < 1$ , and let $x + δ x$ solve $(A + δ A) (x + δ x) = b + δ b$ . Then $$ \frac{\|\delta x\|}{\|x\|} \le \frac{\kappa(A)}{1 - \kappa(A)\,\|\delta A\|/\|A\|}\left(\frac{\|\delta A\|}{\|A\|} + \frac{\|\delta b\|}{\|b\|}\right), $$ and to first order in the perturbations the bound reduces to $∥ δ x ∥/∥ x ∥ \leq κ (A) (∥ δ A ∥/∥ A ∥ + ∥ δ b ∥/∥ b ∥) + O (∥ δ ∥^{2})$ ^{[Higham, N. J. — Accuracy and Stability of Numerical Algorithms (2nd ed.)]}.

Proof. Subtract $A x = b$ from the perturbed equation $(A + δ A) (x + δ x) = b + δ b$ . The left side expands to $A x + A δ x + δ A x + δ A δ x$ , so cancelling $A x = b$ on both sides leaves $$ A\,\delta x + \delta A\,(x + \delta x) = \delta b, \qquad\text{hence}\qquad \delta x = A^{-1}\big(\delta b - \delta A\,(x + \delta x)\big). $$ Take norms and apply submultiplicativity of the operator norm: $$ \|\delta x\| \le \|A^{-1}\|\big(\|\delta b\| + \|\delta A\|\,\|x\| + \|\delta A\|\,\|\delta x\|\big). $$ Collect the $∥ δ x ∥$ terms on the left, using $∥ A^{- 1} ∥ ∥ δ A ∥ = κ (A) ∥ δ A ∥/∥ A ∥$ : $$ \Big(1 - \kappa(A)\tfrac{\|\delta A\|}{\|A\|}\Big)\|\delta x\| \le \|A^{-1}\|\big(\|\delta b\| + \|\delta A\|\,\|x\|\big). $$ The hypothesis $κ (A) ∥ δ A ∥/∥ A ∥ < 1$ makes the left bracket positive. Divide both sides by it and by $∥ x ∥$ . On the right, write $∥ A^{- 1} ∥ ∥ δ A ∥ ∥ x ∥/∥ x ∥ = κ (A) ∥ δ A ∥/∥ A ∥$ , and for the $∥ δ b ∥$ term use $∥ b ∥ = ∥ A x ∥ \leq ∥ A ∥ ∥ x ∥$ , so $1/∥ x ∥ \leq ∥ A ∥/∥ b ∥$ and $∥ A^{- 1} ∥ ∥ δ b ∥/∥ x ∥ \leq ∥ A^{- 1} ∥ ∥ A ∥ ∥ δ b ∥/∥ b ∥ = κ (A) ∥ δ b ∥/∥ b ∥$ . Assembling, $$ \frac{\|\delta x\|}{\|x\|} \le \frac{1}{1 - \kappa(A)\|\delta A\|/\|A\|}\,\kappa(A)\Big(\frac{\|\delta A\|}{\|A\|} + \frac{\|\delta b\|}{\|b\|}\Big), $$ the stated bound. Expanding the prefactor $(1 - κ (A) ∥ δ A ∥/∥ A ∥)^{- 1} = 1 + O (∥ δ A ∥)$ recovers the first-order form. $□$

Bridge. This theorem is the foundational reason the residual is a usable check on a computed solution: setting $δ A = 0$ and $δ b = - r$ turns the perturbed system into the exact statement $A (x - \tilde{x}) = r$ , so the bound collapses to the a posteriori residual estimate $∥ x - \tilde{x} ∥/∥ x ∥ \leq κ (A) ∥ r ∥/∥ b ∥$ , and this is exactly the conditioning bound of 43.01.02 read off the computable residual rather than an abstract data perturbation. The result builds toward the backward-error theory of the Master tier, where the Rigal-Gaches identity reads the residual as the smallest data perturbation consistent with $\tilde{x}$ ; it appears again in iterative refinement, where the residual of one solve becomes the right-hand side of the next, so the same map $e = A^{- 1} r$ drives the correction loop. The perturbation factor $κ (A)$ generalises the per-problem sensitivity of 43.01.02 to the full data $(A, b)$ , and putting these together with the backward-error bound of 43.03.01 gives the forward error $κ (A) ρ_{n} ϵ_{mach}$ of a Gaussian-elimination solve: the central insight is that conditioning supplies the amplification and the solver supplies the residual, and the bridge is that their product is the accuracy delivered.

Exercises Intermediate+

Exercise 2 (easy, numeric).

For the system of the Beginner worked example, with $A = \begin{pmatrix} 1 & 1 \\ 1 & 1.0001 \end{pmatrix}$ , $b = (2, 2.0001)$ , and candidate $\tilde{x} = (2, 0)$ , the residual is $r = (0, 0.0001)$ . Taking $κ_{\infty} (A) \approx 4 \times 1 0^{4}$ and $∥ b ∥_{\infty} = 2.0001$ , compute the a posteriori bound on the $\infty$ -norm relative error and confirm it is consistent with the true relative error of $1$ .

Hint

$∥ r ∥_{\infty} = 1 0^{- 4}$ . Plug into $κ_{\infty} (A) ∥ r ∥_{\infty} /∥ b ∥_{\infty}$ .

Answer

The bound is $κ_{\infty} (A) ∥ r ∥_{\infty} /∥ b ∥_{\infty} \approx 4 \times 1 0^{4} \times 1 0^{- 4} /2.0001 \approx 2.0$ . The true relative error is $∥ x - \tilde{x} ∥_{\infty} /∥ x ∥_{\infty} = ∥ (- 1, 1) ∥_{\infty} /∥ (1, 1) ∥_{\infty} = 1/1 = 1$ . The bound $2.0$ correctly upper-bounds the actual relative error $1$ ; the residual alone ( $5 \times 1 0^{- 5}$ relative) would have wildly underestimated the error, and only multiplying by $κ$ recovers an honest guarantee.

Exercise 7 (hard, symbolic).

Prove the perturbation bound for the inverse: if $A$ is invertible and $∥ A^{- 1} ∥ ∥ δ A ∥ < 1$ , then $A + δ A$ is invertible and $∥ (A + δ A)^{- 1} - A^{- 1} ∥/∥ A^{- 1} ∥ \leq κ (A) ∥ δ A ∥/∥ A ∥/ (1 - κ (A) ∥ δ A ∥/∥ A ∥)$ .

Hint

Write $A + δ A = A (I + A^{- 1} δ A)$ and use the Neumann series for $(I + F)^{- 1}$ when $∥ F ∥ < 1$ , which gives $∥ (I + F)^{- 1} ∥ \leq 1/ (1 - ∥ F ∥)$ .

Answer

Set $F = A^{- 1} δ A$ , so $∥ F ∥ \leq ∥ A^{- 1} ∥ ∥ δ A ∥ < 1$ . Then $A + δ A = A (I + F)$ is invertible because $I + F$ is, with $(I + F)^{- 1} = \sum_{k \geq 0} (- F)^{k}$ (Neumann series), giving $∥ (I + F)^{- 1} ∥ \leq \sum_{k} ∥ F ∥^{k} = 1/ (1 - ∥ F ∥)$ . Now $(A + δ A)^{- 1} - A^{- 1} = (I + F)^{- 1} A^{- 1} - A^{- 1} = ((I + F)^{- 1} - I) A^{- 1}$ , and $(I + F)^{- 1} - I = - F (I + F)^{- 1}$ , so $∥ (A + δ A)^{- 1} - A^{- 1} ∥ \leq ∥ F ∥ ∥ (I + F)^{- 1} ∥ ∥ A^{- 1} ∥ \leq ∥ A^{- 1} ∥ ∥ δ A ∥ ∥ A^{- 1} ∥/ (1 - ∥ F ∥)$ . Divide by $∥ A^{- 1} ∥$ and use $∥ F ∥ \leq ∥ A^{- 1} ∥∥ δ A ∥ = κ (A) ∥ δ A ∥/∥ A ∥$ : the right side becomes $κ (A) ∥ δ A ∥/∥ A ∥/ (1 - κ (A) ∥ δ A ∥/∥ A ∥)$ , as claimed.
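A numerical spot check of the inequality, as a sketch only: the matrix `A` and the perturbation `dA` below are assumed test values, chosen so that $∥ A^{- 1} ∥ ∥ δ A ∥ < 1$ in the infinity norm, and the helper names are hypothetical.

```python
def inv2(M):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    (a, b_), (c, d) = M
    det = a * d - b_ * c
    return [[d / det, -b_ / det], [-c / det, a / det]]

def norm_inf_mat(M):
    return max(sum(abs(t) for t in row) for row in M)

A = [[2.0, 1.0], [1.0, 3.0]]
dA = [[0.01, -0.02], [0.005, 0.01]]  # assumed small perturbation

A_pert = [[p + q for p, q in zip(ra, rd)] for ra, rd in zip(A, dA)]
Ainv = inv2(A)
Apinv = inv2(A_pert)

# theta = ||A^{-1}|| ||dA|| = kappa(A) ||dA|| / ||A||
theta = norm_inf_mat(Ainv) * norm_inf_mat(dA)
diff = [[p - q for p, q in zip(rp, ra)] for rp, ra in zip(Apinv, Ainv)]
lhs = norm_inf_mat(diff) / norm_inf_mat(Ainv)
rhs = theta / (1 - theta)
print(f"theta = {theta:.4f}, relative change in inverse = {lhs:.4f}, bound = {rhs:.4f}")
```

A single example does not prove the bound, but it confirms that the two sides are being compared in the right direction.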

Exercise 8 (hard, symbolic).

One step of iterative refinement computes the residual $r = b - A \tilde{x}$ , solves $A d = r$ for a correction $d$ , and updates $\tilde{x}_{new} = \tilde{x} + d$ . Show that if the correction were solved exactly, one step would return the true solution. Then explain, in terms of the residual and the working precision, why in practice refinement converges geometrically with ratio about $κ (A) ϵ_{mach}$ until it stalls.

Hint

Exactly, $d = A^{- 1} r = A^{- 1} (b - A \tilde{x}) = x - \tilde{x}$ . For the practical analysis, the residual is computed and the correction solved both in finite precision.

Answer

Exactly, $d = A^{- 1} r = A^{- 1} (b - A \tilde{x}) = A^{- 1} b - \tilde{x} = x - \tilde{x}$ , so $\tilde{x}_{new} = \tilde{x} + (x - \tilde{x}) = x$ : one exact correction lands on the true solution, because the error map $e = A^{- 1} r$ is exactly what the correction solve inverts. In finite precision the correction is computed only to relative accuracy of order $κ (A) ϵ_{mach}$ (a backward-stable solve of $A d = r$ , by 43.03.01), so each step reduces the error by roughly this factor: $∥ e_{k + 1} ∥ \approx κ (A) ϵ_{mach} ∥ e_{k} ∥$ . Refinement therefore converges geometrically as long as $κ (A) ϵ_{mach} < 1$ , and it stalls once the error reaches the level set by the precision in which the residual is formed — at the working precision the limiting accuracy is $O (cond (A, x) ϵ_{mach})$ componentwise, and computing $r$ in higher precision lowers that floor.
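The geometric contraction can be imitated without low-precision arithmetic. The sketch below is a model, not a floating-point experiment: the inexact solver is assumed to solve exactly with a slightly wrong matrix `M = A + dA`, where `dA` is a hypothetical stand-in for the rounding committed in the factorization, and the residual is formed in ordinary double precision. Under this assumption each step multiplies the error by roughly $∥ A^{- 1} ∥ ∥ δ A ∥$ , the analogue of $κ (A) ϵ_{mach}$ .

```python
def solve2(M, rhs):
    """Solve a 2x2 system by Cramer's rule."""
    (a, b_), (c, d) = M
    det = a * d - b_ * c
    return [(d * rhs[0] - b_ * rhs[1]) / det,
            (a * rhs[1] - c * rhs[0]) / det]

def matvec(M, v):
    return [sum(m * t for m, t in zip(row, v)) for row in M]

A = [[4.0, 1.0], [2.0, 3.0]]
x_true = [1.0, 2.0]
b = matvec(A, x_true)

# Assumed model of an inexact solver: it inverts M instead of A.
dA = [[1e-3, -2e-3], [5e-4, 1e-3]]
M = [[p + q for p, q in zip(ra, rd)] for ra, rd in zip(A, dA)]

def inexact_solve(rhs):
    return solve2(M, rhs)

def err(x):
    return max(abs(s - t) for s, t in zip(x_true, x))

x = inexact_solve(b)
errors = [err(x)]
for _ in range(5):
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]  # residual with the true A
    d = inexact_solve(r)                              # correction, same inexact solver
    x = [s + t for s, t in zip(x, d)]
    errors.append(err(x))

for k, e in enumerate(errors):
    print(f"step {k}: error = {e:.2e}")
```

The error drops by a roughly constant factor per step and then stalls near the precision in which the residual was formed, matching the two phases described in the answer.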

Advanced results Master

Theorem 1 (a posteriori residual bound and its sharpness). For invertible $A$ , nonzero $b$ , exact solution $x = A^{- 1} b$ , and any $\tilde{x}$ with residual $r = b - A \tilde{x}$ , $$ \frac{\|x - \tilde x\|}{\|x\|} \le \kappa(A)\,\frac{\|r\|}{\|b\|}, $$ and there exist $b$ and $\tilde{x}$ for which equality holds in the $2$ -norm. The lower companion $∥ x - \tilde{x} ∥/∥ x ∥ \geq κ (A)^{- 1} ∥ r ∥/∥ b ∥$ also holds, so the relative error and the relative residual agree to within a factor $κ (A)$ on both sides: the residual determines the error exactly when $κ (A) = 1$ and becomes an unreliable proxy as $κ (A)$ grows ^{[Higham, N. J. — Accuracy and Stability of Numerical Algorithms (2nd ed.)]}.

Theorem 2 (Rigal-Gaches: the normwise backward error is the scaled residual). For invertible $A$ , nonzero $b$ , and any $\tilde{x}$ , the normwise relative backward error $$ \eta(\tilde x) = \min\{\varepsilon : (A + \delta A)\tilde x = b + \delta b,\ \|\delta A\|_2 \le \varepsilon\|A\|_2,\ \|\delta b\|_2 \le \varepsilon\|b\|_2\} $$ equals the scaled residual $$ \eta(\tilde x) = \frac{\|r\|_2}{\|A\|_2\,\|\tilde x\|_2 + \|b\|_2}, $$ attained (when $r \neq 0$ and $\tilde{x} \neq 0$ ) by the rank-one optimal perturbation $\delta A_* = \eta\,\|A\|_2\,\tfrac{r}{\|r\|_2}\tfrac{\tilde x^*}{\|\tilde x\|_2}$ and $\delta b_* = -\tfrac{\eta\,\|b\|_2}{\|r\|_2}\,r$ . Thus the residual is not merely an error indicator: scaled by $\|A\|_2\,\|\tilde x\|_2 + \|b\|_2$ , it is *exactly* the smallest relative data perturbation for which $\tilde x$ is a true solution ^{[Rigal, J. L. & Gaches, J. — On the compatibility of a given solution with the data of a linear system]}.

Theorem 3 (Oettli-Prager: componentwise backward error). Fix an entrywise tolerance matrix $E \geq 0$ and tolerance vector $f \geq 0$ . There exist $δ A, δ b$ with $∣ δ A ∣ \leq εE$ , $∣ δ b ∣ \leq ε f$ (entrywise) and $(A + δ A) \tilde{x} = b + δ b$ if and only if $$ |r| \le \varepsilon\,(E\,|\tilde x| + f) \quad\text{(entrywise)}, $$ and the smallest such $ε$ is $$ \omega(\tilde x) = \max_i \frac{|r_i|}{(E\,|\tilde x| + f)_i}. $$ Taking $E = ∣ A ∣$ and $f = ∣ b ∣$ gives the componentwise relative backward error $ω = max_{i} ∣ r_{i} ∣/ (∣ A ∣ ∣ \tilde{x} ∣ + ∣ b ∣)_{i}$ , computable in $O (n^{2})$ , which is the componentwise analogue of Rigal-Gaches and the natural stopping test for iterative refinement ^{[Higham, N. J. — Accuracy and Stability of Numerical Algorithms (2nd ed.)]}.
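The componentwise backward error with $E = ∣ A ∣$ and $f = ∣ b ∣$ takes one pass over the rows. A minimal sketch follows; the function name is illustrative, and rows with a zero denominator are skipped under the convention $0/0 = 0$ (for this choice of $E$ and $f$ a zero denominator forces a zero residual entry).

```python
def componentwise_backward_error(A, b, x_tilde):
    """omega = max_i |r_i| / (|A| |x_tilde| + |b|)_i, with 0/0 read as 0."""
    omega = 0.0
    for row, bi in zip(A, b):
        r_i = bi - sum(a * t for a, t in zip(row, x_tilde))
        denom = sum(abs(a) * abs(t) for a, t in zip(row, x_tilde)) + abs(bi)
        if denom == 0.0:
            continue  # then r_i is 0 as well; the row contributes nothing
        omega = max(omega, abs(r_i) / denom)
    return omega

# Worked-example data with the candidate (2, 0).
A = [[1.0, 1.0], [1.0, 1.0001]]
b = [2.0, 2.0001]
omega = componentwise_backward_error(A, b, [2.0, 0.0])
print(f"omega = {omega:.3e}")
```

In a refinement loop one would typically stop once `omega` is of the order of the working precision; the threshold itself is a design choice not fixed by the theorem.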

Theorem 4 (iterative refinement: convergence and limiting accuracy). Let $\hat{x}_{0}$ be a computed solution of $A x = b$ from a backward-stable solver, and iterate: compute $r_{k} = b - A \hat{x}_{k}$ , solve $A d_{k} = r_{k}$ with the existing factors, set $\hat{x}_{k + 1} = \hat{x}_{k} + d_{k}$ . If the residual is formed in precision $ϵ_{r}$ and the rest in working precision $ϵ_{w}$ , and $κ_{\infty} (A) ϵ_{w}$ is safely below $1$ , then the forward error contracts geometrically, $$ \frac{\|x - \hat x_{k+1}\|_\infty}{\|x\|_\infty} \lesssim (\kappa_\infty(A)\,\epsilon_w)\,\frac{\|x - \hat x_k\|_\infty}{\|x\|_\infty} + \mathrm{cond}(A,x)\,\epsilon_r, $$ so refinement converges to a limiting relative error of order $cond (A, x) ϵ_{r}$ . With $ϵ_{r} = ϵ_{w}$ (fixed-precision refinement), Skeel's theorem guarantees that a single step makes Gaussian elimination componentwise backward stable whenever $cond (A, x) ϵ_{w}$ is modest; with $ϵ_{r}$ at twice the working precision, refinement drives the forward error down to working-precision accuracy even for ill-conditioned $A$ ^{[Skeel, R. D. — Scaling for numerical stability in Gaussian elimination]}.

Synthesis. The whole chapter converges here: the conditioning of 43.01.02 and the backward error of 43.03.01 multiply, and this unit is where their product becomes an a posteriori statement about a solution actually in hand. The foundational reason a computed answer can be certified is the residual identity $A (x - \tilde{x}) = r$ , which is exactly the perturbation theorem with $δ A = 0$ , $δ b = - r$ ; the residual is the only computable handle on the error, and the condition number is the exact exchange rate, sharp on both sides by Theorem 1. The Rigal-Gaches and Oettli-Prager theorems are dual to that estimate: where the residual bound reads the error forward through $κ (A)$ , they read the residual backward as the minimal data perturbation, and the central insight is that a small residual is always a small backward error and only a small forward error when conditioning permits.

This is exactly the conditioning-times-stability decomposition of 43.01.03, now closed into a loop by iterative refinement, where the residual of one solve becomes the data of the next and the same map $A^{- 1}$ contracts the error by $κ (A) ϵ_{mach}$ per step until the precision floor is hit. The normwise theory generalises to the componentwise Skeel conditioning that survives badly-scaled data, which is dual to the row-equilibration that restores normwise sanity. Putting these together, the perturbation bound supplies the worst case the data forces, the backward error supplies what the solver achieved, and refinement supplies the mechanism that recovers the accuracy conditioning still allows; this builds toward every solver downstream, and appears again in least squares and the iterative methods, and the bridge to the rest of numerical linear algebra is that each is judged by precisely this residual-and-condition-number ledger.

Full proof set Master

Proposition 1 (a posteriori residual bound, both-sided). Let $A$ be invertible, $b \neq 0$ , $x = A^{- 1} b$ , and $r = b - A \tilde{x}$ . Then $κ (A)^{- 1} ∥ r ∥/∥ b ∥ \leq ∥ x - \tilde{x} ∥/∥ x ∥ \leq κ (A) ∥ r ∥/∥ b ∥$ , and the upper bound is attained in the $2$ -norm.

Proof. The residual identity is $A (x - \tilde{x}) = b - A \tilde{x} = r$ , so $x - \tilde{x} = A^{- 1} r$ and $r = A (x - \tilde{x})$ . For the upper bound, $∥ x - \tilde{x} ∥ = ∥ A^{- 1} r ∥ \leq ∥ A^{- 1} ∥ ∥ r ∥$ and $∥ b ∥ = ∥ A x ∥ \leq ∥ A ∥ ∥ x ∥$ , hence $∥ x - \tilde{x} ∥/∥ x ∥ \leq ∥ A^{- 1} ∥ ∥ r ∥ \cdot ∥ A ∥/∥ b ∥ = κ (A) ∥ r ∥/∥ b ∥$ . For the lower bound, $∥ r ∥ = ∥ A (x - \tilde{x}) ∥ \leq ∥ A ∥ ∥ x - \tilde{x} ∥$ and $∥ x ∥ = ∥ A^{- 1} b ∥ \leq ∥ A^{- 1} ∥ ∥ b ∥$ , so $∥ r ∥/∥ b ∥ \leq ∥ A ∥ ∥ x - \tilde{x} ∥ \cdot ∥ A^{- 1} ∥/∥ x ∥ = κ (A) ∥ x - \tilde{x} ∥/∥ x ∥$ , which rearranges to the stated lower bound. For attainment in the $2$ -norm, take the SVD $A = U Σ V^{*}$ ; choose $x - \tilde{x} = v_{n}$ (the last right singular vector), so $r = A v_{n} = σ_{n} u_{n}$ with $∥ r ∥_{2} = σ_{n}$ , and choose $b = σ_{1} u_{1}$ so $x = A^{- 1} b = σ_{1} \cdot σ_{1}^{- 1} v_{1} = v_{1}$ with $∥ x ∥_{2} = 1$ and $∥ b ∥_{2} = σ_{1}$ . Then $∥ x - \tilde{x} ∥_{2} /∥ x ∥_{2} = 1$ and $κ_{2} (A) ∥ r ∥_{2} /∥ b ∥_{2} = (σ_{1} / σ_{n}) (σ_{n} / σ_{1}) = 1$ , so equality holds. $□$
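The attainment construction is easiest to see for a diagonal matrix, where the singular vectors are the coordinate axes. The sketch below assumes $A = \mathrm{diag}(σ_{1}, σ_{n})$ with the hypothetical values $σ_{1} = 100$ and $σ_{n} = 1$ , so that $u_{i} = v_{i} = e_{i}$ .

```python
import math

sigma_1, sigma_n = 100.0, 1.0           # assumed singular values
A = [[sigma_1, 0.0], [0.0, sigma_n]]    # diagonal, so u_i = v_i = e_i

x = [1.0, 0.0]                          # x = v_1
b = [sigma_1, 0.0]                      # b = sigma_1 * u_1 = A x
x_tilde = [1.0, -1.0]                   # chosen so that x - x_tilde = v_n

def norm2(v):
    return math.sqrt(sum(t * t for t in v))

r = [bi - sum(a * t for a, t in zip(row, x_tilde)) for row, bi in zip(A, b)]
kappa_2 = sigma_1 / sigma_n
rel_err = norm2([s - t for s, t in zip(x, x_tilde)]) / norm2(x)
bound = kappa_2 * norm2(r) / norm2(b)
print(f"relative error = {rel_err}, kappa * relative residual = {bound}")
```

The error points along the direction that $A$ shrinks most while $b$ points along the direction that $A$ stretches most, which is what makes the bound an equality.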

Proposition 2 (normwise perturbation bound). Let $A$ be invertible, $b \neq 0$ , $x = A^{- 1} b$ , and let $(A + δ A) (x + δ x) = b + δ b$ with $κ (A) ∥ δ A ∥/∥ A ∥ < 1$ . Then $∥ δ x ∥/∥ x ∥ \leq \frac{κ ( A )}{1 - κ ( A ) ∥ δ A ∥/∥ A ∥} (∥ δ A ∥/∥ A ∥ + ∥ δ b ∥/∥ b ∥)$ .

Proof. Subtracting $A x = b$ from the perturbed equation gives $A δ x + δ A (x + δ x) = δ b$ , so $δ x = A^{- 1} (δ b - δ A (x + δ x))$ . Taking norms with submultiplicativity, $∥ δ x ∥ \leq ∥ A^{- 1} ∥ (∥ δ b ∥ + ∥ δ A ∥ ∥ x ∥ + ∥ δ A ∥ ∥ δ x ∥)$ . Moving the $∥ δ x ∥$ term and writing $∥ A^{- 1} ∥∥ δ A ∥ = κ (A) ∥ δ A ∥/∥ A ∥$ , $(1 - κ (A) ∥ δ A ∥/∥ A ∥) ∥ δ x ∥ \leq ∥ A^{- 1} ∥ (∥ δ b ∥ + ∥ δ A ∥ ∥ x ∥)$ . The coefficient on the left is positive by hypothesis. Divide by it and by $∥ x ∥$ ; on the right, $∥ A^{- 1} ∥∥ δ A ∥∥ x ∥/∥ x ∥ = κ (A) ∥ δ A ∥/∥ A ∥$ , and $∥ A^{- 1} ∥∥ δ b ∥/∥ x ∥ \leq κ (A) ∥ δ b ∥/∥ b ∥$ via $1/∥ x ∥ \leq ∥ A ∥/∥ b ∥$ . The result follows. $□$

Proposition 3 (Rigal-Gaches normwise backward error). For invertible $A$ , nonzero $b$ , and any $\tilde{x} \neq 0$ with $r = b - A \tilde{x}$ , the normwise relative backward error in the $2$ -norm is $η (\tilde{x}) = ∥ r ∥_{2} / (∥ A ∥_{2} ∥ \tilde{x} ∥_{2} + ∥ b ∥_{2})$ .

Proof. Lower bound. Suppose $(A + δ A) \tilde{x} = b + δ b$ with $∥ δ A ∥_{2} \leq ε ∥ A ∥_{2}$ and $∥ δ b ∥_{2} \leq ε ∥ b ∥_{2}$ . Then $r = b - A \tilde{x} = δ A \tilde{x} - δ b$ , so $∥ r ∥_{2} \leq ∥ δ A ∥_{2} ∥ \tilde{x} ∥_{2} + ∥ δ b ∥_{2} \leq ε (∥ A ∥_{2} ∥ \tilde{x} ∥_{2} + ∥ b ∥_{2})$ , giving $ε \geq ∥ r ∥_{2} / (∥ A ∥_{2} ∥ \tilde{x} ∥_{2} + ∥ b ∥_{2})$ . Hence $η (\tilde{x}) \geq ∥ r ∥_{2} / (∥ A ∥_{2} ∥ \tilde{x} ∥_{2} + ∥ b ∥_{2})$ . Upper bound (attainment). If $r = 0$ then $\tilde{x}$ is exact and $η (\tilde{x}) = 0$ with no perturbation needed, so assume $r \neq 0$ . Set $η = ∥ r ∥_{2} / (∥ A ∥_{2} ∥ \tilde{x} ∥_{2} + ∥ b ∥_{2})$ and define $$ \delta A_* = \eta\,\|A\|_2\,\frac{r\,\tilde x^*}{\|r\|_2\,\|\tilde x\|_2}, \qquad \delta b_* = -\frac{\eta\,\|b\|_2}{\|r\|_2}\,r. $$ Then $\|\delta A_*\|_2 = \eta\|A\|_2$ (a rank-one matrix $uv^*$ has $2$ -norm $\|u\|_2\|v\|_2$ ) and $\|\delta b_*\|_2 = \eta\|b\|_2$ , so the perturbations meet the tolerance with equality. Compute $\delta A_*\,\tilde x = \eta\|A\|_2\,r\,(\tilde x^*\tilde x)/(\|r\|_2\|\tilde x\|_2) = \tfrac{\eta\|A\|_2\|\tilde x\|_2}{\|r\|_2}\,r$ , and $\delta A_*\,\tilde x - \delta b_* = \tfrac{\eta(\|A\|_2\|\tilde x\|_2 + \|b\|_2)}{\|r\|_2}\,r = r$ by the definition of $\eta$ . Therefore $(A + \delta A_*)\tilde x = A\tilde x + \delta A_*\tilde x = A\tilde x + r + \delta b_* = b + \delta b_*$ , so $\tilde x$ is exact for the perturbed data at level $\eta$ . The two bounds coincide. $\square$
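The attainment step can be verified numerically by building the rank-one perturbation $δ A_{*} = η ∥ A ∥_{2} \, r \tilde{x}^{*} / (∥ r ∥_{2} ∥ \tilde{x} ∥_{2})$ and $δ b_{*} = - η ∥ b ∥_{2} \, r /∥ r ∥_{2}$ explicitly. The sketch below uses real 2-by-2 data from the Beginner worked example; `spectral_norm_2x2` is a hypothetical helper that uses the closed form for the largest singular value of a 2-by-2 matrix.

```python
import math

def norm2(v):
    return math.sqrt(sum(t * t for t in v))

def spectral_norm_2x2(M):
    """Largest singular value of a real 2x2 matrix.

    The eigenvalues of M^T M are (t +/- sqrt(t^2 - 4 d^2)) / 2 with
    t = ||M||_F^2 and d = det(M).
    """
    (a, b_), (c, d) = M
    t = a * a + b_ * b_ + c * c + d * d
    det = a * d - b_ * c
    return math.sqrt((t + math.sqrt(max(t * t - 4 * det * det, 0.0))) / 2)

def matvec(M, v):
    return [sum(m * s for m, s in zip(row, v)) for row in M]

A = [[1.0, 1.0], [1.0, 1.0001]]
b = [2.0, 2.0001]
x_t = [2.0, 0.0]

r = [bi - yi for bi, yi in zip(b, matvec(A, x_t))]
nA = spectral_norm_2x2(A)
eta = norm2(r) / (nA * norm2(x_t) + norm2(b))

# Optimal perturbations (real data, so the conjugate transpose is a transpose).
coef = eta * nA / (norm2(r) * norm2(x_t))
dA = [[coef * ri * xj for xj in x_t] for ri in r]
db = [-eta * norm2(b) / norm2(r) * ri for ri in r]

A_pert = [[p + q for p, q in zip(ra, rd)] for ra, rd in zip(A, dA)]
lhs = matvec(A_pert, x_t)
rhs = [p + q for p, q in zip(b, db)]
mismatch = max(abs(p - q) for p, q in zip(lhs, rhs))
print(f"eta = {eta:.3e}, mismatch = {mismatch:.1e}")
```

The candidate satisfies the perturbed system up to rounding, and both perturbations have relative size `eta`.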

Proposition 4 (Oettli-Prager componentwise criterion). Fix $E \geq 0$ , $f \geq 0$ entrywise. There exist $δ A, δ b$ with $∣ δ A ∣ \leq εE$ , $∣ δ b ∣ \leq ε f$ , and $(A + δ A) \tilde{x} = b + δ b$ if and only if $∣ r ∣ \leq ε (E ∣ \tilde{x} ∣ + f)$ entrywise; the least such $ε$ is $ω = max_{i} ∣ r_{i} ∣/ (E ∣ \tilde{x} ∣ + f)_{i}$ (with the convention $0/0 = 0$ and $ρ /0 = \infty$ for $ρ > 0$ ).

Proof. Necessity. If such perturbations exist, then $r = b - A \tilde{x} = δ A \tilde{x} - δ b$ , so componentwise $∣ r_{i} ∣ = ∣ \sum_{j} δ A_{ij} \tilde{x}_{j} - δ b_{i} ∣ \leq \sum_{j} ∣ δ A_{ij} ∣∣ \tilde{x}_{j} ∣ + ∣ δ b_{i} ∣ \leq ε \sum_{j} E_{ij} ∣ \tilde{x}_{j} ∣ + ε f_{i} = ε (E ∣ \tilde{x} ∣ + f)_{i}$ , which is the stated entrywise inequality. Sufficiency. Suppose $∣ r_{i} ∣ \leq ε (E ∣ \tilde{x} ∣ + f)_{i}$ for every $i$ . In any row where $ε (E ∣ \tilde{x} ∣ + f)_{i} = 0$ the hypothesis forces $r_{i} = 0$ , and the zero perturbation serves for that row. In every other row, distribute $r_{i}$ proportionally: set $δ A_{ij} = ε E_{ij} sgn (\tilde{x}_{j}) t_{i}$ and $δ b_{i} = - ε f_{i} t_{i}$ , where $t_{i} = r_{i} / (ε (E ∣ \tilde{x} ∣ + f)_{i})$ satisfies $∣ t_{i} ∣ \leq 1$ and, for complex data, $sgn (\tilde{x}_{j})$ is read as $\overline{\tilde{x}_{j}} /∣ \tilde{x}_{j} ∣$ so that $sgn (\tilde{x}_{j}) \tilde{x}_{j} = ∣ \tilde{x}_{j} ∣$ . Then $∣ δ A_{ij} ∣ = ε E_{ij} ∣ t_{i} ∣ \leq ε E_{ij}$ and $∣ δ b_{i} ∣ = ε f_{i} ∣ t_{i} ∣ \leq ε f_{i}$ , and $(δ A \tilde{x} - δ b)_{i} = ε t_{i} (\sum_{j} E_{ij} ∣ \tilde{x}_{j} ∣ + f_{i}) = ε t_{i} (E ∣ \tilde{x} ∣ + f)_{i} = r_{i}$ . Hence $δ A \tilde{x} = r + δ b$ and $(A + δ A) \tilde{x} = A \tilde{x} + r + δ b = b + δ b$ , which is the required identity. The least admissible $ε$ is the smallest one for which $∣ r_{i} ∣ \leq ε (E ∣ \tilde{x} ∣ + f)_{i}$ holds in every row, namely $ω = max_{i} ∣ r_{i} ∣/ (E ∣ \tilde{x} ∣ + f)_{i}$ . $□$

Connections Master

Conditioning and condition numbers 43.01.02 supplies the amplification factor $κ (A) = ∥ A ∥ ∥ A^{- 1} ∥$ that this unit converts into a computable a posteriori guarantee. That unit proves the right-hand-side perturbation bound $∥ δ x ∥/∥ x ∥ \leq κ (A) ∥ δ b ∥/∥ b ∥$ and the closed form $κ_{2} (A) = σ_{1} / σ_{n}$ ; this unit completes it by perturbing $A$ as well, deriving the residual estimate $∥ x - \tilde{x} ∥/∥ x ∥ \leq κ (A) ∥ r ∥/∥ b ∥$ , and is the solver-level payoff of the abstract conditioning theory built there.
Gaussian elimination, LU factorization, and its stability 43.03.01 provides the backward-error half of the ledger: GEPP returns $\tilde{x}$ with $(A + Δ A) \tilde{x} = b$ , $∥Δ A ∥ = O (ρ_{n} ϵ_{mach} ∥ A ∥)$ , which is exactly a normwise backward error of order $ρ_{n} ϵ_{mach}$ . Feeding that backward error into this unit's perturbation bound yields the forward error $κ (A) ρ_{n} ϵ_{mach}$ ; the LU factors are also what make iterative refinement cheap, since the correction solve $A d = r$ reuses the existing factorization at $O (n^{2})$ cost.
Backward stability and backward-error analysis 43.01.03 is the framework this unit instantiates at the linear-system level: the Rigal-Gaches identity $η (\tilde{x}) = ∥ r ∥/ (∥ A ∥∥ \tilde{x} ∥ + ∥ b ∥)$ is the concrete computation of the backward error that unit defines abstractly, and the fundamental theorem "forward error $\leq$ condition number $\times$ backward error" of that unit is realised here as the residual bound. That unit owns the definition of backward stability; this unit owns its a posteriori certificate for $A x = b$ .
Cholesky factorization and the symmetric positive-definite solve 43.03.02 inherits this unit's perturbation and residual theory verbatim: a Cholesky solve of an SPD system is backward stable with an a priori growth bound, so its forward error is again $κ (A) ϵ_{mach}$ , and the same residual estimate and iterative-refinement loop certify and improve its solutions. The conditioning $κ (A) = κ_{2} (A)$ of an SPD matrix is the ratio of extreme eigenvalues, sharpening the residual bound in that symmetric setting.
Least squares: normal equations vs QR vs SVD 43.04.01 extends this unit's perturbation analysis from the square solve to the overdetermined case, where the conditioning acquires a second term — the residual angle — and the normal-equations route squares $κ (A)$ . The residual and backward-error language built here is the prototype that the least-squares perturbation theorem generalises, which is why normal equations are abandoned in favour of QR and SVD when $κ (A)$ is large.

Historical & philosophical context Master

The separation of the sensitivity of a problem from the errors of an algorithm was forced into the open by the first large linear solves on stored-program computers. Alan Turing's Rounding-off errors in matrix processes (1948) introduced a matrix condition number precisely to explain why some systems lost accuracy regardless of the arithmetic, and John von Neumann and Herman Goldstine's 1947 analysis of matrix inversion studied the propagation of rounding through elimination. The perturbation bound $∥ δ x ∥/∥ x ∥ \leq κ (A) (∥ δ A ∥/∥ A ∥ + ∥ δ b ∥/∥ b ∥)$ , in the form used today, was consolidated by the numerical-analysis school of the 1950s and 1960s and given its definitive textbook treatment by James Wilkinson.

The a posteriori side (what one can certify about a solution already computed) has two landmark results. Jean-Louis Rigal and Jean Gaches, in On the compatibility of a given solution with the data of a linear system (Journal of the ACM 14, 1967, 543–548), proved that the normwise backward error of an approximate solution is exactly the scaled residual $∥ r ∥/ (∥ A ∥ ∥ \tilde{x} ∥ + ∥ b ∥)$ ^{[Rigal, J. L. & Gaches, J. — On the compatibility of a given solution with the data of a linear system]}, turning the residual from an informal indicator into a sharp optimality statement. Werner Oettli and William Prager had given the componentwise analogue in 1964. The componentwise conditioning that survives badly-scaled data is due to Robert Skeel, whose Scaling for numerical stability in Gaussian elimination (Journal of the ACM 26, 1979, 494–526) ^{[Skeel, R. D. — Scaling for numerical stability in Gaussian elimination]} introduced the condition number $cond (A, x) = ∥ ∣ A^{- 1} ∣∣ A ∣∣ x ∣ ∥/∥ x ∥$. His follow-up paper, Iterative refinement implies numerical stability for Gaussian elimination (Mathematics of Computation 35, 1980), proved that a single step of fixed-precision iterative refinement makes Gaussian elimination componentwise backward stable when that number is modest. Iterative refinement itself traces to Wilkinson's Rounding Errors in Algebraic Processes (Prentice-Hall, 1963) ^{[Wilkinson, J. H. — Rounding Errors in Algebraic Processes]}, and the comprehensive modern synthesis of normwise and componentwise perturbation theory, backward error, and refinement is Nicholas Higham's Accuracy and Stability of Numerical Algorithms (SIAM, 1996; 2nd ed. 2002).

Bibliography Master

@article{turing1948rounding,
  author  = {Turing, Alan M.},
  title   = {Rounding-off Errors in Matrix Processes},
  journal = {The Quarterly Journal of Mechanics and Applied Mathematics},
  volume  = {1},
  number  = {1},
  year    = {1948},
  pages   = {287--308}
}

@article{rigalgaches1967,
  author  = {Rigal, Jean-Louis and Gaches, Jean},
  title   = {On the Compatibility of a Given Solution with the Data of a Linear System},
  journal = {Journal of the ACM},
  volume  = {14},
  number  = {3},
  year    = {1967},
  pages   = {543--548}
}

@article{oettliprager1964,
  author  = {Oettli, Werner and Prager, William},
  title   = {Compatibility of Approximate Solution of Linear Equations with Given Error Bounds for Coefficients and Right-hand Sides},
  journal = {Numerische Mathematik},
  volume  = {6},
  number  = {1},
  year    = {1964},
  pages   = {405--409}
}

@article{skeel1979scaling,
  author  = {Skeel, Robert D.},
  title   = {Scaling for Numerical Stability in Gaussian Elimination},
  journal = {Journal of the ACM},
  volume  = {26},
  number  = {3},
  year    = {1979},
  pages   = {494--526}
}

@book{wilkinson1963rounding,
  author    = {Wilkinson, James H.},
  title     = {Rounding Errors in Algebraic Processes},
  publisher = {Prentice-Hall},
  year      = {1963}
}

@book{higham2002accuracy,
  author    = {Higham, Nicholas J.},
  title     = {Accuracy and Stability of Numerical Algorithms},
  edition   = {2},
  publisher = {Society for Industrial and Applied Mathematics},
  year      = {2002}
}

@book{trefethenbau1997,
  author    = {Trefethen, Lloyd N. and Bau, David},
  title     = {Numerical Linear Algebra},
  publisher = {Society for Industrial and Applied Mathematics},
  year      = {1997}
}

@book{golubvanloan2013,
  author    = {Golub, Gene H. and Van Loan, Charles F.},
  title     = {Matrix Computations},
  edition   = {4},
  publisher = {Johns Hopkins University Press},
  year      = {2013}
}

@book{demmel1997applied,
  author    = {Demmel, James W.},
  title     = {Applied Numerical Linear Algebra},
  publisher = {Society for Industrial and Applied Mathematics},
  year      = {1997}
}

Prerequisites

43.03.01
43.01.02
43.01.03

Tier anchors

beginner: Trefethen-Bau 1997 *Numerical Linear Algebra* (SIAM) Lectures 12-14 (conditioning of Ax=b, the residual, and the perturbation bound — opening discussion); Higham 2002 *Accuracy and Stability of Numerical Algorithms* 2e (SIAM) §7.1 (why a small residual is not a small error)
intermediate: Trefethen-Bau 1997 *Numerical Linear Algebra* (SIAM) Lectures 12-14 (the kappa(A) perturbation bound for Ax=b, the residual, and backward stability of the solve); Higham 2002 *Accuracy and Stability of Numerical Algorithms* 2e (SIAM) §7.1-7.3, §7.7 (forward/backward error, the residual bound, the Oettli-Prager and Rigal-Gaches backward-error theorems)
master: Higham 2002 *Accuracy and Stability of Numerical Algorithms* 2e (SIAM) Ch. 7 (normwise and componentwise perturbation theory of linear systems, the Skeel condition number) and Ch. 12 (iterative refinement, mixed-precision); Golub-Van Loan 2013 *Matrix Computations* 4e (Johns Hopkins) §2.6-2.7, §3.5 (perturbation theory, condition estimation, iterative improvement); Demmel 1997 *Applied Numerical Linear Algebra* (SIAM) Ch. 2

References

Trefethen, L. N. & Bau, D. — Numerical Linear Algebra · SIAM, 1997. Lectures 12-14: the conditioning of the linear system Ax=b, the perturbation bound for perturbations of A and b with the kappa(A) amplification factor, the residual r=b-A x-tilde, and the statement that backward stability plus conditioning gives the forward-error estimate kappa(A) eps_mach.
Higham, N. J. — Accuracy and Stability of Numerical Algorithms (2nd ed.) · SIAM, 2002. Ch. 7: normwise perturbation theory of Ax=b (the (1 - kappa ||dA||/||A||)^{-1} bound), the residual-based forward-error bound ||dx||/||x|| <= kappa(A) ||r||/||b||, the Rigal-Gaches normwise backward error eta = ||r||/(||A|| ||x-tilde|| + ||b||), the Oettli-Prager componentwise backward error, and the Skeel/Bauer condition number cond(A,x)=|| |A^{-1}| |A| |x| ||/||x||. Ch. 12: iterative refinement in fixed and mixed precision and its convergence/limiting-accuracy theory.
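Skeel, R. D. — Scaling for numerical stability in Gaussian elimination · Journal of the ACM 26 (1979), 494-526. The componentwise (Skeel) condition number cond(A,x) and the effect of scaling on the stability of Gaussian elimination. The proof that one step of iterative refinement in working precision makes Gaussian elimination componentwise backward stable when cond(A,x) is modest appears in Skeel's 1980 follow-up, Iterative refinement implies numerical stability for Gaussian elimination (Mathematics of Computation 35).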
Rigal, J. L. & Gaches, J. — On the compatibility of a given solution with the data of a linear system · Journal of the ACM 14 (1967), 543-548. The theorem that the smallest normwise relative perturbation of (A,b) for which x-tilde is an exact solution equals the scaled residual eta(x-tilde)=||r||/(||A|| ||x-tilde|| + ||b||).
Wilkinson, J. H. — Rounding Errors in Algebraic Processes · Prentice-Hall, 1963. The original fixed-precision iterative-refinement analysis: computing the residual, solving for a correction with the existing LU factors, and the digit-recovery heuristic governed by kappa(A) eps_mach.

Estimated time

beginner: 20m
intermediate: 45m
master: 90m