43.04.08 · numerical-analysis / 04-least-squares-qr

Updating and downdating matrix factorisations

shipped3 tiersLean: none

Anchor (Master): Golub-Van Loan 2013 *Matrix Computations* 4e (Johns Hopkins) §6.5; Bjorck 1996 *Numerical Methods for Least Squares Problems* (SIAM) Ch. 3; Stewart 1998 *Matrix Algorithms I: Basic Decompositions* (SIAM) Ch. 4-5 (orthogonal-transformation updating, hyperbolic rotations and the downdating instability)

Intuition Beginner

You have already paid for an answer, and now the question changes a little. You fit a line to a thousand measurements and got the coefficients. A new measurement arrives. You could throw away the fit and redo all thousand-and-one from scratch — or you could nudge the answer you already have. Nudging is far cheaper, and the whole subject of this unit is how to nudge a factorization safely.

A factorization is a pre-digested form of a matrix: the triangular factor from a QR or a Cholesky decomposition is the expensive part, the part that took most of the work. When the underlying data changes by a small amount — one new row of measurements, one row removed, one rank-one tweak — the triangular factor changes only a little. There is a way to walk the old factor to the new one using a chain of tiny rotations, each touching just two rows, and the total cost is proportional to the square of the matrix size rather than its cube. For a large problem that is the difference between a fraction of a second and a minute.

Adding information is the easy direction. You fold the new row in, and a sequence of small rotations cleans the factor back into triangular shape. Every rotation here is a plain turn that preserves lengths, so nothing blows up.

Removing information is the dangerous direction. Taking a row out of a fit means subtracting, and subtraction can cancel away almost everything you had, leaving a tiny remainder swamped by rounding error. The tool for removal is a different kind of rotation — a stretch-and-turn rather than a plain turn — and it can lose accuracy exactly when the row you remove was carrying most of the weight. So updating is steady and downdating is delicate, and knowing which one you are doing is half the battle.

Visual Beginner

The table follows the same fit through the two directions — folding a new observation in, then peeling an old one out — and tracks what each direction costs and how safe it is.

Operation What changes Cost How safe
Refactor from scratch recompute the whole triangular factor highest (size cubed) always safe, always slow
Update (add a row) fold one new row in, rotate back to triangular low (size squared) very safe — rotations preserve length
Downdate (remove a row) subtract one old row, stretch-rotate back low (size squared) delicate — subtraction can cancel
Sliding window add the newest, remove the oldest low (size squared) as delicate as its downdate step

The last two columns carry the lesson. Updating and downdating both cost the same cheap amount, far below a full refactor, so the temptation is to treat them as equally routine. They are not. The plain rotations of an update never lengthen anything, so the old accuracy survives. The stretch-rotations of a downdate can magnify error without bound when the removed row was a large part of the fit, because you are subtracting two nearly equal quantities and keeping the small leftover.

The sliding-window picture at the bottom is the practical case: a fit that always uses the most recent thousand measurements, dropping the oldest as each new one arrives. It is an update glued to a downdate, and it inherits the delicacy of the downdate half.

Worked example Beginner

Suppose a fit has been reduced to a tiny upper-triangular factor and we add one new observation. We will watch a single rotation clean it back to triangular.

Start with the triangular factor and a new row stacked underneath it:

The top two rows are already triangular (a zero sits in the lower-left). The new bottom row spoils that, because it has a number in the first column where we want a zero. One rotation of the first and third rows will zero it out.

Step 1. Pick the rotation. We combine row one, which begins with , and row three, which begins with . The length of that pair is . The rotation that turns the pair into uses the fractions and .

Step 2. Apply it to the first column. The new first entry of row one is . The new first entry of row three is . The first column of row three is now zero, as wanted.

Step 3. Apply it to the second column. Row one's second entry was and row three's was : the new row-one entry is , and the new row-three entry is .

After the rotation the stack reads, top to bottom, , then the untouched , then . The first column below the diagonal is clean. A second rotation, of rows two and three, would zero the and finish the job.

What this tells us: one new observation was absorbed by rotating one pair of rows at a time, never rebuilding the factor. Each rotation kept lengths fixed — the pair of length stayed length — so no accuracy was lost in the adding direction.

Check your understanding Beginner

Formal definition Intermediate+

Let with , , of full column rank, with reduced QR factorization , having orthonormal columns and upper triangular with positive diagonal 43.04.01. A modification of is a structured change producing , and the updating problem is to compute the factorization from in (or when is maintained) operations rather than recomputing it in . The four standard modifications are: a rank-one change ; a row append ; a row delete (the inverse of append); and the column analogues.

The least-squares problem couples these to the Cholesky factor of the normal-equations matrix. Writing , the QR factor is the Cholesky factor of up to the diagonal sign convention 43.03.02, so updating and updating the Cholesky factor of are the same computation viewed through two lenses.

Givens rotation. A Givens rotation is the identity except in the submatrix on rows and columns , where it equals with , . It is unitary, and given a vector with entries in positions , the choice , , maps , introducing a zero. A product of such rotations chases the spike introduced by a modification back to triangular form.

Cholesky rank-one update and downdate. Given the Cholesky factorization of a Hermitian positive-definite and a vector , the update computes the Cholesky factor of $$ \tilde{C} = C + w w^* = R^* R + w w^, $$ and the downdate (with ) computes the factor of $$ \tilde{C} = C - w w^ = R^* R - w w^, $$ which is positive-definite — and so possesses a Cholesky factor — precisely when $|R^{-} w|_2 < 1|R^{-*} w|_21$.

Hyperbolic rotation. A hyperbolic rotation is -orthogonal for the signature matrix : it satisfies rather than , and takes the form . Given with , it maps , the indefinite-norm analogue of the Givens zeroing. Its instability is structural: and are both unbounded, so as , and the transformation amplifies rounding error by exactly that factor.

Counterexamples to common slips

  • The downdate is not always possible. Removing a row from a least-squares problem yields a valid positive-definite only if the row genuinely belonged to a fit with rows to spare; if lies on the boundary of the row space, and is singular, while would make indefinite, with no Cholesky factor. The quantity is the gatekeeper.

  • Hyperbolic rotations are not orthogonal. They preserve the indefinite form , not the Euclidean norm. Treating one as a length-preserving turn and expecting it to be backward stable is the classic error; its norm is and grows without bound near the breakdown boundary.

  • Update and downdate are not symmetric in stability. Both cost , which tempts one to treat them alike, but the update appends to a sum of squares (monotone, well-conditioned) while the downdate subtracts from it (cancellation-prone). The same algorithm run backwards is not equally stable.

  • A QR update does not require forming $A^ A\hat{R}\kappa_2(A)\hat{R}A^* A\kappa_2(A)^2$ squaring the QR route was chosen to avoid.

Key theorem with proof Intermediate+

The signature result is that a row appended to a QR factorization is reabsorbed by exactly Givens rotations, each length-preserving, so the row update is both and backward stable.

Theorem (Givens-chasing row update of QR). Let have full column rank with reduced QR factorization , and let a row $w^ \in \mathbb{F}^{1 \times n}\tilde{A} = \binom{A}{w^}$. Stack the triangular factor over the new row, $$ M = \begin{pmatrix} \hat{R} \ w^* \end{pmatrix} \in \mathbb{F}^{(n+1) \times n}. $$ Then there is a product of Givens rotations , each acting on row paired with one of rows , such that with upper triangular, and is the QR factor of . The cost is flops, and because each is unitary the computed is backward stable: it is the exact factor of a row-perturbed with . [Golub, G. H. & Van Loan, C. F. — Matrix Computations (4th ed.)]

Proof. First, the appended problem has the same triangular factor as the stacked array. Because has orthonormal columns, , so . The QR factor of is the Cholesky factor of with positive diagonal 43.03.02, 43.04.01, and likewise the triangular factor of is the Cholesky factor of . Since , their positive-diagonal triangular factors coincide; computing the factor of the small array therefore yields .

Now triangularize by Givens rotations. Row holds ; the rows above are the triangular . Apply to rows and , chosen to zero the entry against the pivot : with , , , the entry becomes and becomes . This rotation alters row and row across all columns , leaving a new spike in row at column . Apply to rows and to zero against the updated , and continue. After the entries are zero and row retains a spike at column . After the entire bottom row vanishes and rows form the upper-triangular . Each touches the trailing columns of two rows, costing flops, and summing over gives .

For stability, each is unitary, so applying it in floating-point arithmetic is equivalent to applying the exact rotation to a matrix perturbed by relative to its norm, and unitary maps do not amplify perturbations: with . The perturbations accumulate additively to a single of order , which is the backward-stability claim.

Bridge. This theorem builds toward the recursive-least-squares and sliding-window algorithms of the Advanced section, and the same Givens-chasing pattern appears again in 43.07.03 (GMRES), whose inner Hessenberg least-squares solve is maintained by one Givens rotation per Arnoldi step for exactly this reason. The foundational reason the update is cheap is that the modification is rank one, so the QR factor changes by a single spike that plane rotations chase away; this is exactly the structural fact that a rank-one change to is a rank-one change to its Cholesky factor's defining equation, putting these together with 43.03.02's identification of the QR and Cholesky triangular factors. The update generalises to the rank- case (chase spikes) and is dual to the downdate, where the appended row is instead removed and the length-preserving Givens rotation is replaced by the indefinite-norm hyperbolic rotation whose conditioning the Master tier quantifies; the central insight is that adding data is a unitary operation on the augmented array while removing it is not, and the bridge from one to the other is the exchange of the Euclidean form for the signature form .

Exercises Intermediate+

Advanced results Master

Theorem (recursive least squares as a sequence of row updates). Let observations arrive sequentially, , , and let minimize . Maintaining the QR factor of the data matrix together with the transformed right-hand side , each new observation is absorbed by appending to and chasing the spike with Givens rotations, updating and in flops; the solution is recovered by one back substitution . The procedure is backward stable because every step is unitary, and it is mathematically equivalent to the information form of the recursive estimator while avoiding the explicit inversion of that the covariance form requires [Bjorck, A. — Numerical Methods for Least Squares Problems].

Theorem (the sliding-window problem and the necessity of downdating). A fixed-width window keeps only the most recent observations, so each step appends and removes . The removal is a Cholesky downdate with , well-posed iff , and realized by hyperbolic rotations whose conditioning is governed by . When the removed observation carried a large share of the window's information — a near-collinear or dominant row — this quantity is large and the downdate loses digits; the standard remedies are the mixed (orthogonal/hyperbolic) downdating scheme of Bojanczyk-Brent-Van Dooren-de Hoog, which reformulates the hyperbolic step as a sequence of better-conditioned orthogonal operations, and Saunders' corrected seminormal approach that refines the downdated factor by one step of iterative refinement [Stewart, G. W. — Matrix Algorithms, Volume I: Basic Decompositions].

Theorem (the Kalman filter as square-root factorization updating). The discrete Kalman filter propagates a state estimate and its error covariance through a measurement update (a Cholesky downdate-like contraction by the gained information) and a time update (an inflation by the process noise). Forming explicitly can lose positive-definiteness to rounding; the square-root filter instead propagates a Cholesky factor with , performing each step as a factorization update via orthogonal and hyperbolic transformations of an augmented array. The measurement update is exactly a row append to a QR factorization, and the information-filter variant is exactly the recursive-least-squares update above; the square-root form guarantees a positive-definite covariance at every step by construction, the decisive numerical advantage that made it the standard in aerospace navigation [Golub, G. H. & Van Loan, C. F. — Matrix Computations (4th ed.)].

Synthesis. The four modifications — rank-one change, row append, row delete, column add/delete — are one operation seen through the identity that the QR factor of is the Cholesky factor of 43.03.02, 43.04.01, and the foundational reason updating is cheap is that a rank-one change to is a rank-one change to , which displaces its triangular factor by a single spike that plane rotations chase away. The central insight is the asymmetry between adding and removing: appending data is a unitary operation on an augmented array — monotone in the sum of squares, unconditionally backward stable — while removing data subtracts from that sum and is dual to it through the exchange of the Euclidean form for the signature form , whose -orthogonal (hyperbolic) rotations are unbounded near the boundary where positive-definiteness is lost. This is exactly the boundary at which a sliding window or a Kalman square-root filter must be guarded, and putting these together, recursive least squares, the sliding-window estimator, the square-root Kalman filter, and the BFGS Hessian update are the same algebra of low-rank factorization modification, distinguished only by which terms are added and which subtracted. The update generalises the batch QR of 43.04.01 to the online setting, and the bridge from batch to streaming is that a factorization is a reusable object: the is paid once and amortized over an unbounded stream of modifications.

Full proof set Master

Proposition (Givens-chasing produces the updated triangular factor). Let be upper triangular with positive diagonal and . There is a product of Givens rotations, acting on rows and of the array, with $G\binom{\hat{R}}{w^} = \binom{\tilde{R}}{0}\tilde{R}^\tilde{R} = \hat{R}^\hat{R} + w w^$.

Proof. Construct the inductively. Before step the array has its first bottom-row entries zeroed and rows in upper-triangular form with a working spike at position (for the spike is the original ). Let the current diagonal entry be and the spike ; set , , , and let be the rotation on rows . It sends , zeroing , and acts on columns of those two rows, leaving a spike at . After the bottom row is zero and rows are upper triangular; call them . Each is unitary, so is unitary and , and the left side equals . The diagonal of is made positive by absorbing unit-modulus phases, which does not change .

Proposition (well-posedness and conditioning of the downdate). Let be Hermitian positive-definite and . The downdated matrix is positive-definite iff $|R^{-}w|2 < 1\lambda{\min}(\tilde{C}) \ge \lambda_{\min}(C),(1 - |R^{-}w|_2^2)|R^{-}w|_2 \to 1$.*

Proof. Set . Then , a congruence of by the nonsingular , hence is positive-definite iff is. The Hermitian matrix has eigenvalue on the -dimensional space and eigenvalue on ; positivity of all eigenvalues is the single condition . For the eigenvalue bound, Sylvester's law of inertia and the congruence give, for any unit , , using . Minimizing over gives . As the lower bound collapses to : the downdated matrix approaches singularity and the hyperbolic rotation that realizes the downdate has norm , the source of the instability.

Proposition (hyperbolic rotation realizes the scalar downdate with amplification ). For real , the -orthogonal with mapping to has , unbounded as .

Proof. Take , with ; then , so is a valid hyperbolic angle, and . The matrix has trace and determinant , so its eigenvalues are and and . As the denominator while the numerator stays near , so , and any rounding error present in the input is amplified by this factor.

Proposition (column delete via Givens retriangularization). Deleting column from yields with column removed, an upper-Hessenberg-with-a-bulge array that Givens rotations restore to upper triangular in flops, producing the QR factor of the reduced matrix.

Proof. Removing column of leaves an array whose columns are still upper triangular but whose columns (the former ) each carry one subdiagonal entry, because the removed column's diagonal pivot is gone: the array is upper triangular except for a single subdiagonal in each of the trailing columns (upper Hessenberg in the trailing block). Apply a Givens rotation to rows to zero the subdiagonal entry in column ; this introduces no new nonzero below the diagonal in earlier columns and shifts the bulge to column . Repeating for rows through zeroes each subdiagonal in turn, rotations total, each touching columns, for flops. The accumulated unitary satisfies with upper triangular, and , so is the QR factor of the column-deleted matrix.

Connections Master

  • The batch factorizations this unit modifies are exactly the QR and SVD machinery of 43.04.01 and the Cholesky factorization of 43.03.02: the identity that the QR factor of equals the positive-diagonal Cholesky factor of is what makes a row append to QR and a rank-one update to Cholesky the same computation, and the -versus- conditioning lesson of 43.04.01 is why the update is performed on directly rather than by re-Choleskying an updated cross-product.

  • The Givens-chasing pattern that retriangularizes an augmented array reappears as the engine of the iterative least-squares solvers: 43.07.03 (GMRES) maintains the QR factorization of its growing upper-Hessenberg matrix by one Givens rotation per Arnoldi step, the streaming analogue of the row update proved here, and 43.07.04 (conjugate gradient) shares the underlying SPD structure whose Cholesky factor the rank-one update preserves.

  • The downdating instability — governed by the ratio of singular values before and after the modification — is a manifestation of the same backward-error-versus-conditioning template as 43.03.02: where Cholesky on a fixed SPD matrix is unconditionally stable, the downdate's stability is conditional on , and the hyperbolic rotation that realizes it is the indefinite-signature analogue of the orthogonal Givens rotation, connecting this unit to the symmetric-indefinite factorization 43.03.08 whose pivots likewise abandon definiteness.

Historical & philosophical context Master

The systematic theory of modifying matrix factorizations was assembled in the 1974 Mathematics of Computation paper of Gill, Golub, Murray, and Saunders, Methods for modifying matrix factorizations, which catalogued the rank-one update and downdate of the Cholesky, QR, and LU factorizations and gave the Givens and hyperbolic-rotation realizations with their operation counts [Gill, P. E., Golub, G. H., Murray, W. & Saunders, M. A. — Methods for modifying matrix factorizations]. The hyperbolic rotation as the -orthogonal downdating tool, and the recognition that its instability is intrinsic rather than incidental, was sharpened over the following two decades; the mixed orthogonal/hyperbolic scheme of Bojanczyk, Brent, Van Dooren, and de Hoog (1987) and the relative-error analysis in Stewart's Matrix Algorithms (SIAM, 1998) gave the downdate its modern stable form [Stewart, G. W. — Matrix Algorithms, Volume I: Basic Decompositions].

The applications drove the theory as much as the reverse. The square-root formulation of the Kalman filter was developed for the Apollo guidance computer by James Potter in the mid-1960s, when the explicit covariance propagation lost positive-definiteness in single-precision aerospace hardware; propagating a Cholesky factor instead, through exactly the factorization updates of this unit, restored numerical reliability. In optimization, the quasi-Newton methods of Broyden, Fletcher, Goldfarb, and Shanno (independently, 1970) maintain an approximate Hessian by a two-rank update whose Cholesky factor is modified rather than recomputed each iteration, the canonical use of update-then-downdate with a curvature guard.

Bibliography Master

@book{golubvanloan2013upd,
  author    = {Golub, Gene H. and Van Loan, Charles F.},
  title     = {Matrix Computations},
  edition   = {4},
  publisher = {Johns Hopkins University Press},
  year      = {2013},
  address   = {Baltimore},
  note      = {Sec. 6.5: updating matrix factorizations -- rank-one QR and Cholesky update/downdate, row/column add/delete, Givens chasing, hyperbolic rotations}
}

@book{bjorck1996lsq,
  author    = {Bj\"{o}rck, {\AA}ke},
  title     = {Numerical Methods for Least Squares Problems},
  publisher = {Society for Industrial and Applied Mathematics},
  year      = {1996},
  address   = {Philadelphia},
  note      = {Ch. 3: modifying QR/Cholesky factorizations, recursive and sliding-window least squares}
}

@book{stewart1998matrixalg,
  author    = {Stewart, G. W.},
  title     = {Matrix Algorithms, Volume I: Basic Decompositions},
  publisher = {Society for Industrial and Applied Mathematics},
  year      = {1998},
  address   = {Philadelphia}
}

@article{ggms1974,
  author  = {Gill, Philip E. and Golub, Gene H. and Murray, Walter and Saunders, Michael A.},
  title   = {Methods for Modifying Matrix Factorizations},
  journal = {Mathematics of Computation},
  volume  = {28},
  number  = {126},
  pages   = {505--535},
  year    = {1974}
}

@article{bbvd1987downdate,
  author  = {Bojanczyk, Adam W. and Brent, Richard P. and Van Dooren, Paul and de Hoog, Frank R.},
  title   = {A Note on Downdating the Cholesky Factorization},
  journal = {SIAM Journal on Scientific and Statistical Computing},
  volume  = {8},
  number  = {3},
  pages   = {210--221},
  year    = {1987}
}

@article{potter1963kalman,
  author  = {Bellantoni, J. F. and Dodge, K. W.},
  title   = {A Square Root Formulation of the {Kalman}-{Schmidt} Filter},
  journal = {AIAA Journal},
  volume  = {5},
  number  = {7},
  pages   = {1309--1314},
  year    = {1967}
}

@article{fletcher1970bfgs,
  author  = {Fletcher, Roger},
  title   = {A New Approach to Variable Metric Algorithms},
  journal = {The Computer Journal},
  volume  = {13},
  number  = {3},
  pages   = {317--322},
  year    = {1970}
}