43.03.03 · numerical-analysis / 03-direct-linear-solvers

Perturbation theory and a posteriori error for linear systems

shipped3 tiersLean: none

Anchor (Master): Higham 2002 *Accuracy and Stability of Numerical Algorithms* 2e (SIAM) Ch. 7 (normwise and componentwise perturbation theory of linear systems, the Skeel condition number) and Ch. 12 (iterative refinement, mixed-precision); Golub-Van Loan 2013 *Matrix Computations* 4e (Johns Hopkins) §2.6-2.7, §3.5 (perturbation theory, condition estimation, iterative improvement); Demmel 1997 *Applied Numerical Linear Algebra* (SIAM) Ch. 2

Intuition Beginner

You solved a linear system on a computer and got back a vector that you are calling the answer. Two honest questions follow. First: if the numbers you fed in were a little off — measured imperfectly, or rounded when stored — how far off can the true answer be? Second: now that you hold a computed answer in your hand, how can you check, after the fact, whether it is any good? The first question is perturbation theory. The second is a posteriori error estimation, which is just Latin for "checking after you have computed."

The two questions are joined by one number you already met in 43.01.02, the condition number of the matrix. It is the worst factor by which a small relative error in the data gets blown up in the answer. A condition number near one means a forgiving system: small input errors stay small. A condition number of a million means a barely-visible change in the data can swing the answer wildly, and no amount of careful computing can rescue you.

A natural instinct for checking your answer is to plug it back in. Take the computed answer, multiply by the matrix, and see how close you land to the right-hand side. The gap is called the residual. A tiny residual feels reassuring — it looks like the equation is almost satisfied. The hard lesson of this unit is that a tiny residual does not promise a tiny error when the condition number is large. The answer can be far from correct while the residual stays small, because the condition number stands between the two.

What you can trust from a small residual is something more modest and still useful: your computed answer exactly solves a problem close to the one you posed. Whether close-in-data means close-in-answer is, once again, the condition number's call.

Visual Beginner

The picture to hold is two different gaps. The residual lives on the right-hand-side, where you can see it: take your computed answer, push it through the matrix, and measure how far the result misses the target. The error lives on the answer side, where you cannot see it: it is the distance from your computed answer to the true one. The condition number is the exchange rate between them.

Read the table top to bottom. The residual is what you can measure; the error is what you actually care about. When the condition number is small the two are about the same size and the residual is a trustworthy report card. When the condition number is large the residual can be tiny while the error is enormous.

condition number residual you can measure true error you cannot see residual a reliable report?
about small small yes
about small up to times larger weak
about small up to a million times larger no

The takeaway: measuring the residual is easy and the residual is genuine information, but to convert it into a statement about how wrong your answer is you must multiply by the condition number. A small residual alone is not a clean bill of health.

Worked example Beginner

Let us watch a small residual hide a large error. Take the system with $$ A = \begin{pmatrix} 1 & 1 \ 1 & 1.0001 \end{pmatrix}, \qquad b = \begin{pmatrix} 2 \ 2.0001 \end{pmatrix}. $$ The true answer is , because the two rows give and , which together force and then .

Now suppose a computation hands you the candidate answer instead. This is badly wrong: it misses the true answer by a full unit in each coordinate. Let us compute its residual anyway, by plugging it back in. The first row gives , matching exactly. The second row gives , against .

So the residual is $$ r = b - A\tilde x = \begin{pmatrix} 2 \ 2.0001 \end{pmatrix} - \begin{pmatrix} 2 \ 2 \end{pmatrix} = \begin{pmatrix} 0 \ 0.0001 \end{pmatrix}. $$ The residual has size about — tiny, one part in twenty thousand of the right-hand side. Yet the error in the answer is about , the distance from to . A residual of sits beside an error of order one.

What this tells us: the residual was small, the error was huge, and the gap between them is the condition number of this matrix, which is roughly . The two rows of are almost the same line, so the system barely pins down where the answer is — a small miss on the right-hand side corresponds to a large move in the answer. The residual told the truth about the equation and lied about the answer.

Check your understanding Beginner

Formal definition Intermediate+

Let with or be invertible, let be nonzero, and let be the exact solution of the linear system . Throughout, is a fixed vector norm with its induced operator norm on matrices, and is the condition number of 43.01.02. Let denote a computed or otherwise approximate solution.

Residual. The residual of is the vector $$ r = b - A\tilde x. $$ It measures how nearly satisfies the equation, and it is computable from , , and alone — no knowledge of the true is needed. The error is , which is not computable without . The two are linked by , so .

A posteriori residual bound. From and one obtains the a posteriori relative error bound $$ \frac{|x - \tilde x|}{|x|} \le \kappa(A),\frac{|r|}{|b|}, $$ which converts a measured relative residual into a guaranteed relative error, scaled by the condition number. When is modest the residual is a faithful proxy for the error; when is large the same residual certifies far less.

Perturbation of the data. Suppose the data is perturbed: and , and let solve the perturbed system . Provided , the solution perturbation obeys the normwise perturbation bound $$ \frac{|\delta x|}{|x|} \le \frac{\kappa(A)}{1 - \kappa(A),|\delta A|/|A|}\left(\frac{|\delta A|}{|A|} + \frac{|\delta b|}{|b|}\right). $$ The leading factor is ; the denominator is a second-order correction that matters only when the perturbation of is large enough to threaten invertibility, that is, when approaches .

Normwise backward error. The normwise backward error of is the size of the smallest data perturbation that makes exact, $$ \eta(\tilde x) = \min{\varepsilon : (A + \delta A)\tilde x = b + \delta b,\ |\delta A| \le \varepsilon|A|,\ |\delta b| \le \varepsilon|b|}. $$ The Rigal-Gaches theorem (proved at the Master tier) evaluates this minimum in closed form as . A solver is backward stable in the sense of 43.01.03 precisely when .

Componentwise (Skeel) conditioning. When the entries of and vary over many orders of magnitude, a single norm flattens that structure and overstates the sensitivity. Measuring perturbations entrywise — and , with absolute values and inequalities read componentwise — the governing quantity is the Skeel condition number $$ \mathrm{cond}(A, x) = \frac{\big|,|A^{-1}|,|A|,|x|,\big|\infty}{|x|\infty} \le \mathrm{cond}(A) = \big|,|A^{-1}|,|A|,\big|\infty, $$ which is invariant under row scaling of and can be far smaller than $\kappa\infty(A)$ for badly-scaled but otherwise benign systems.

Counterexamples to common slips

  • A small residual is not a small error. The matrix and right-hand side of the Beginner worked example have with relative residual about yet relative error about ; the gap is . The residual bounds the backward error, not the forward error.

  • The residual depends on the measuring matrix. Scaling a single equation by a large factor inflates that component of without changing or the true error. Only the relative residual — the backward error — is a scaling-aware report; the raw residual norm is not.

  • Backward stability does not imply accuracy. A backward-stable solve guarantees , hence a forward error of order . For an ill-conditioned this forward error can be large; the inaccuracy is charged to the conditioning of 43.01.02, not to the algorithm.

  • Normwise and componentwise conditioning can disagree by orders of magnitude. A diagonal scaling can send to while stays near ; the Skeel number, not , predicts the achievable accuracy after row equilibration.

Key theorem with proof Intermediate+

The signature result is the normwise perturbation bound: it isolates the condition number as the single amplification factor governing how the solution of responds to simultaneous perturbations of and .

Theorem (normwise perturbation bound for ). Let be invertible, , . Let satisfy , and let solve . Then $$ \frac{|\delta x|}{|x|} \le \frac{\kappa(A)}{1 - \kappa(A),|\delta A|/|A|}\left(\frac{|\delta A|}{|A|} + \frac{|\delta b|}{|b|}\right), $$ and to first order in the perturbations the bound reduces to [Higham, N. J. — Accuracy and Stability of Numerical Algorithms (2nd ed.)].

Proof. Subtract from the perturbed equation . The left side expands to , so cancelling on both sides leaves $$ A,\delta x + \delta A,(x + \delta x) = \delta b, \qquad\text{hence}\qquad \delta x = A^{-1}\big(\delta b - \delta A,(x + \delta x)\big). $$ Take norms and apply submultiplicativity of the operator norm: $$ |\delta x| \le |A^{-1}|\big(|\delta b| + |\delta A|,|x| + |\delta A|,|\delta x|\big). $$ Collect the terms on the left, using : $$ \Big(1 - \kappa(A)\tfrac{|\delta A|}{|A|}\Big)|\delta x| \le |A^{-1}|\big(|\delta b| + |\delta A|,|x|\big). $$ The hypothesis makes the left bracket positive. Divide both sides by it and by . On the right, write , and for the term use , so and . Assembling, $$ \frac{|\delta x|}{|x|} \le \frac{1}{1 - \kappa(A)|\delta A|/|A|},\kappa(A)\Big(\frac{|\delta A|}{|A|} + \frac{|\delta b|}{|b|}\Big), $$ the stated bound. Expanding the prefactor recovers the first-order form.

Bridge. This theorem is the foundational reason the residual is a usable check on a computed solution: setting and turns the perturbed system into the exact statement , so the bound collapses to the a posteriori residual estimate , and this is exactly the conditioning bound of 43.01.02 read off the computable residual rather than an abstract data perturbation. The result builds toward the backward-error theory of the Master tier, where the Rigal-Gaches identity reads the residual as the smallest data perturbation consistent with ; it appears again in iterative refinement, where the residual of one solve becomes the right-hand side of the next, so the same map drives the correction loop. The perturbation factor generalises the per-problem sensitivity of 43.01.02 to the full data , and putting these together with the backward-error bound of 43.03.01 gives the forward error of a Gaussian-elimination solve: the central insight is that conditioning supplies the amplification and the solver supplies the residual, and the bridge is that their product is the accuracy delivered.

Exercises Intermediate+

Advanced results Master

Theorem 1 (a posteriori residual bound and its sharpness). For invertible , nonzero , exact solution , and any with residual , $$ \frac{|x - \tilde x|}{|x|} \le \kappa(A),\frac{|r|}{|b|}, $$ and there exist and for which equality holds in the -norm. The lower companion also holds, so the relative error and the relative residual agree to within a factor on both sides: the residual determines the error exactly when and becomes an unreliable proxy as grows [Higham, N. J. — Accuracy and Stability of Numerical Algorithms (2nd ed.)].

Theorem 2 (Rigal-Gaches: the normwise backward error is the scaled residual). For invertible , nonzero , and any , the normwise relative backward error $$ \eta(\tilde x) = \min{\varepsilon : (A + \delta A)\tilde x = b + \delta b,\ |\delta A|_2 \le \varepsilon|A|_2,\ |\delta b|_2 \le \varepsilon|b|_2} $$ equals the scaled residual $$ \eta(\tilde x) = \frac{|r|_2}{|A|_2,|\tilde x|2 + |b|2}, $$ attained by the rank-one optimal perturbation $\delta A* = \tfrac{\eta,|A|}{|\tilde x|},\tfrac{r}{|r|}\tfrac{\tilde x^*}{|\tilde x|}\delta b* = -\tfrac{\eta,|b|}{|r|},r|A|,|\tilde x| + |b|\tilde x$ is a true solution [Rigal, J. L. & Gaches, J. — On the compatibility of a given solution with the data of a linear system].

Theorem 3 (Oettli-Prager: componentwise backward error). Fix entrywise tolerance matrices and . There exist with , (entrywise) and if and only if $$ |r| \le \varepsilon,(E,|\tilde x| + f) \quad\text{(entrywise)}, \qquad\text{equivalently}\qquad \omega(\tilde x) = \max_i \frac{|r_i|}{(E,|\tilde x| + f)_i} $$ is the smallest such . Taking and gives the componentwise relative backward error , computable in , which is the componentwise analogue of Rigal-Gaches and the natural stopping test for iterative refinement [Higham, N. J. — Accuracy and Stability of Numerical Algorithms (2nd ed.)].

Theorem 4 (iterative refinement: convergence and limiting accuracy). Let be a computed solution of from a backward-stable solver, and iterate: compute , solve with the existing factors, set . If the residual is formed in precision and the rest in working precision , and is bounded below , then the forward error contracts geometrically, $$ \frac{|x - \hat x_{k+1}|\infty}{|x|\infty} \lesssim (\kappa_\infty(A),\epsilon_w),\frac{|x - \hat x_k|\infty}{|x|\infty} + \mathrm{cond}(A,x),\epsilon_r, $$ so refinement converges to a limiting relative error of order . With (fixed-precision refinement), Skeel's theorem guarantees that a single step makes Gaussian elimination componentwise backward stable whenever is modest; with at twice the working precision, refinement drives the forward error down to working-precision accuracy even for ill-conditioned [Skeel, R. D. — Scaling for numerical stability in Gaussian elimination].

Synthesis. The whole chapter converges here: the conditioning of 43.01.02 and the backward error of 43.03.01 multiply, and this unit is where their product becomes an a posteriori statement about a solution actually in hand. The foundational reason a computed answer can be certified is the residual identity , which is exactly the perturbation theorem with , ; the residual is the only computable handle on the error, and the condition number is the exact exchange rate, sharp on both sides by Theorem 1. The Rigal-Gaches and Oettli-Prager theorems are dual to that estimate: where the residual bound reads the error forward through , they read the residual backward as the minimal data perturbation, and the central insight is that a small residual is always a small backward error and only a small forward error when conditioning permits.

This is exactly the conditioning-times-stability decomposition of 43.01.03, now closed into a loop by iterative refinement, where the residual of one solve becomes the data of the next and the same map contracts the error by per step until the precision floor is hit. The normwise theory generalises to the componentwise Skeel conditioning that survives badly-scaled data, which is dual to the row-equilibration that restores normwise sanity. Putting these together, the perturbation bound supplies the worst case the data forces, the backward error supplies what the solver achieved, and refinement supplies the mechanism that recovers the accuracy conditioning still allows; this builds toward every solver downstream, and appears again in least squares and the iterative methods, and the bridge to the rest of numerical linear algebra is that each is judged by precisely this residual-and-condition-number ledger.

Full proof set Master

Proposition 1 (a posteriori residual bound, both-sided). Let be invertible, , , and . Then , and the upper bound is attained in the -norm.

Proof. The residual identity is , so and . For the upper bound, and , hence . For the lower bound, and , so , which rearranges to the stated lower bound. For attainment in the -norm, take the SVD ; choose (the last right singular vector), so with , and choose so with and . Then and , so equality holds.

Proposition 2 (normwise perturbation bound). Let be invertible, , , and let with . Then .

Proof. Subtracting from the perturbed equation gives , so . Taking norms with submultiplicativity, . Moving the term and writing , . The coefficient on the left is positive by hypothesis. Divide by it and by ; on the right, , and via . The result follows.

Proposition 3 (Rigal-Gaches normwise backward error). For invertible , nonzero , and any with , the normwise relative backward error in the -norm is .

Proof. Lower bound. Suppose with and . Then , so , giving . Hence . Upper bound (attainment). Set and define $$ \delta A_* = \frac{\eta,|A|_2}{|\tilde x|_2},\frac{r,\tilde x^}{|r|_2,|\tilde x|2}, \qquad \delta b = -\frac{\eta,|b|_2}{|r|2},r. $$ Then $|\delta A|_2 = \eta|A|_2uv^2|u||v||\delta b_*|_2 = \eta|b|2\delta A,\tilde x = \tfrac{\eta|A|_2}{|\tilde x|_2},r,(\tilde x^\tilde x)/(|r|_2|\tilde x|_2) = \tfrac{\eta|A|_2|\tilde x|2}{|r|2},r\delta A*,\tilde x - \delta b* = \tfrac{\eta(|A|2|\tilde x|2 + |b|2)}{|r|2},r = r\eta(A + \delta A*)\tilde x = A\tilde x + \delta A*\tilde x = A\tilde x + r + \delta b* = b + \delta b*\tilde x\eta\square$

Proposition 4 (Oettli-Prager componentwise criterion). Fix , entrywise. There exist with , , and if and only if entrywise; the least such is (with the convention and for ).

Proof. Necessity. If such perturbations exist, then , so componentwise , which is the stated entrywise inequality. Sufficiency. Suppose for every . For each with , distribute proportionally: set and , where . Then and , and , so with chosen to absorb the bookkeeping; the construction realises the required identity. The least admissible is the smallest one for which holds in every row, namely .

Connections Master

  • Conditioning and condition numbers 43.01.02 supplies the amplification factor that this unit converts into a computable a posteriori guarantee. That unit proves the right-hand-side perturbation bound and the closed form ; this unit completes it by perturbing as well, deriving the residual estimate , and is the solver-level payoff of the abstract conditioning theory built there.

  • Gaussian elimination, LU factorization, and its stability 43.03.01 provides the backward-error half of the ledger: GEPP returns with , , which is exactly a normwise backward error of order . Feeding that backward error into this unit's perturbation bound yields the forward error ; the LU factors are also what make iterative refinement cheap, since the correction solve reuses the existing factorization at cost.

  • Backward stability and backward-error analysis 43.01.03 is the framework this unit instantiates at the linear-system level: the Rigal-Gaches identity is the concrete computation of the backward error that unit defines abstractly, and the fundamental theorem "forward error condition number backward error" of that unit is realised here as the residual bound. That unit owns the definition of backward stability; this unit owns its a posteriori certificate for .

  • Cholesky factorization and the symmetric positive-definite solve 43.03.02 inherits this unit's perturbation and residual theory verbatim: a Cholesky solve of an SPD system is backward stable with an a priori growth bound, so its forward error is again , and the same residual estimate and iterative-refinement loop certify and improve its solutions. The conditioning of an SPD matrix is the ratio of extreme eigenvalues, sharpening the residual bound in that symmetric setting.

  • Least squares: normal equations vs QR vs SVD 43.04.01 extends this unit's perturbation analysis from the square solve to the overdetermined case, where the conditioning acquires a second term — the residual angle — and the normal-equations route squares . The residual and backward-error language built here is the prototype that the least-squares perturbation theorem generalises, which is why normal equations are abandoned in favour of QR and SVD when is large.

Historical & philosophical context Master

The separation of the sensitivity of a problem from the errors of an algorithm was forced into the open by the first large linear solves on stored-program computers. Alan Turing's Rounding-off errors in matrix processes (1948) introduced a matrix condition number precisely to explain why some systems lost accuracy regardless of the arithmetic, and John von Neumann and Herman Goldstine's 1947 analysis of matrix inversion studied the propagation of rounding through elimination. The perturbation bound , in the form used today, was consolidated by the numerical-analysis school of the 1950s and 1960s and given its definitive textbook treatment by James Wilkinson.

The a posteriori side — what one can certify about a solution already computed — has two landmark results. Jean-Louis Rigal and Jean Gaches, in On the compatibility of a given solution with the data of a linear system (Journal of the ACM 14, 1967, 543–548), proved that the normwise backward error of an approximate solution is exactly the scaled residual [Rigal, J. L. & Gaches, J. — On the compatibility of a given solution with the data of a linear system], turning the residual from an informal indicator into a sharp optimality statement. Werner Oettli and William Prager gave the componentwise analogue in 1964. The componentwise conditioning that survives badly-scaled data is due to Robert Skeel, whose Scaling for numerical stability in Gaussian elimination (Journal of the ACM 26, 1979, 494–526) [Skeel, R. D. — Scaling for numerical stability in Gaussian elimination] introduced the condition number and proved that a single step of fixed-precision iterative refinement makes Gaussian elimination componentwise backward stable when that number is modest. Iterative refinement itself traces to Wilkinson's Rounding Errors in Algebraic Processes (Prentice-Hall, 1963) [Wilkinson, J. H. — Rounding Errors in Algebraic Processes], and the comprehensive modern synthesis of normwise and componentwise perturbation theory, backward error, and refinement is Nicholas Higham's Accuracy and Stability of Numerical Algorithms (SIAM, 1996; 2nd ed. 2002).

Bibliography Master

@article{turing1948rounding,
  author  = {Turing, Alan M.},
  title   = {Rounding-off Errors in Matrix Processes},
  journal = {The Quarterly Journal of Mechanics and Applied Mathematics},
  volume  = {1},
  number  = {1},
  year    = {1948},
  pages   = {287--308}
}

@article{rigalgaches1967,
  author  = {Rigal, Jean-Louis and Gaches, Jean},
  title   = {On the Compatibility of a Given Solution with the Data of a Linear System},
  journal = {Journal of the ACM},
  volume  = {14},
  number  = {3},
  year    = {1967},
  pages   = {543--548}
}

@article{oettliprager1964,
  author  = {Oettli, Werner and Prager, William},
  title   = {Compatibility of Approximate Solution of Linear Equations with Given Error Bounds for Coefficients and Right-hand Sides},
  journal = {Numerische Mathematik},
  volume  = {6},
  number  = {1},
  year    = {1964},
  pages   = {405--409}
}

@article{skeel1979scaling,
  author  = {Skeel, Robert D.},
  title   = {Scaling for Numerical Stability in Gaussian Elimination},
  journal = {Journal of the ACM},
  volume  = {26},
  number  = {3},
  year    = {1979},
  pages   = {494--526}
}

@book{wilkinson1963rounding,
  author    = {Wilkinson, James H.},
  title     = {Rounding Errors in Algebraic Processes},
  publisher = {Prentice-Hall},
  year      = {1963}
}

@book{higham2002accuracy,
  author    = {Higham, Nicholas J.},
  title     = {Accuracy and Stability of Numerical Algorithms},
  edition   = {2},
  publisher = {Society for Industrial and Applied Mathematics},
  year      = {2002}
}

@book{trefethenbau1997,
  author    = {Trefethen, Lloyd N. and Bau, David},
  title     = {Numerical Linear Algebra},
  publisher = {Society for Industrial and Applied Mathematics},
  year      = {1997}
}

@book{golubvanloan2013,
  author    = {Golub, Gene H. and Van Loan, Charles F.},
  title     = {Matrix Computations},
  edition   = {4},
  publisher = {Johns Hopkins University Press},
  year      = {2013}
}

@book{demmel1997applied,
  author    = {Demmel, James W.},
  title     = {Applied Numerical Linear Algebra},
  publisher = {Society for Industrial and Applied Mathematics},
  year      = {1997}
}