43.04.08 · numerical-analysis / 04-least-squares-qr

Updating and downdating matrix factorisations

shipped3 tiersLean: none

Anchor (Master): Golub-Van Loan 2013 *Matrix Computations* 4e (Johns Hopkins) §6.5; Bjorck 1996 *Numerical Methods for Least Squares Problems* (SIAM) Ch. 3; Stewart 1998 *Matrix Algorithms I: Basic Decompositions* (SIAM) Ch. 4-5 (orthogonal-transformation updating, hyperbolic rotations and the downdating instability)

Intuition Beginner

You have already paid for an answer, and now the question changes a little. You fit a line to a thousand measurements and got the coefficients. A new measurement arrives. You could throw away the fit and redo all thousand-and-one from scratch — or you could nudge the answer you already have. Nudging is far cheaper, and the whole subject of this unit is how to nudge a factorization safely.

A factorization is a pre-digested form of a matrix: the triangular factor from a QR or a Cholesky decomposition is the expensive part, the part that took most of the work. When the underlying data changes by a small amount — one new row of measurements, one row removed, one rank-one tweak — the triangular factor changes only a little. There is a way to walk the old factor to the new one using a chain of tiny rotations, each touching just two rows, and the total cost is proportional to the square of the matrix size rather than its cube. For a large problem that is the difference between a fraction of a second and a minute.

Adding information is the easy direction. You fold the new row in, and a sequence of small rotations cleans the factor back into triangular shape. Every rotation here is a plain turn that preserves lengths, so nothing blows up.

Removing information is the dangerous direction. Taking a row out of a fit means subtracting, and subtraction can cancel away almost everything you had, leaving a tiny remainder swamped by rounding error. The tool for removal is a different kind of rotation — a stretch-and-turn rather than a plain turn — and it can lose accuracy exactly when the row you remove was carrying most of the weight. So updating is steady and downdating is delicate, and knowing which one you are doing is half the battle.

Visual Beginner

The table follows the same fit through the two directions — folding a new observation in, then peeling an old one out — and tracks what each direction costs and how safe it is.

Operation	What changes	Cost	How safe
Refactor from scratch	recompute the whole triangular factor	highest (size cubed)	always safe, always slow
Update (add a row)	fold one new row in, rotate back to triangular	low (size squared)	very safe — rotations preserve length
Downdate (remove a row)	subtract one old row, stretch-rotate back	low (size squared)	delicate — subtraction can cancel
Sliding window	add the newest, remove the oldest	low (size squared)	as delicate as its downdate step

The last two columns carry the lesson. Updating and downdating both cost the same cheap amount, far below a full refactor, so the temptation is to treat them as equally routine. They are not. The plain rotations of an update never lengthen anything, so the old accuracy survives. The stretch-rotations of a downdate can magnify error without bound when the removed row was a large part of the fit, because you are subtracting two nearly equal quantities and keeping the small leftover.

The sliding-window picture at the bottom is the practical case: a fit that always uses the most recent thousand measurements, dropping the oldest as each new one arrives. It is an update glued to a downdate, and it inherits the delicacy of the downdate half.

Worked example Beginner

Suppose a fit has been reduced to a tiny upper-triangular factor and we add one new observation. We will watch a single rotation clean it back to triangular.

Start with the triangular factor and a new row stacked underneath it:

304450 .

The top two rows are already triangular (a zero sits in the lower-left). The new bottom row $(4, 0)$ spoils that, because it has a number in the first column where we want a zero. One rotation of the first and third rows will zero it out.

Step 1. Pick the rotation. We combine row one, which begins with $3$ , and row three, which begins with $4$ . The length of that pair is $3^{2} + 4^{2} = 9 + 16 = 25 = 5$ . The rotation that turns the pair $(3, 4)$ into $(5, 0)$ uses the fractions $c = 3/5 = 0.6$ and $s = 4/5 = 0.8$ .

Step 2. Apply it to the first column. The new first entry of row one is $c \times 3 + s \times 4 = 0.6 \times 3 + 0.8 \times 4 = 1.8 + 3.2 = 5$ . The new first entry of row three is $- s \times 3 + c \times 4 = - 0.8 \times 3 + 0.6 \times 4 = - 2.4 + 2.4 = 0$ . The first column of row three is now zero, as wanted.

Step 3. Apply it to the second column. Row one's second entry was $4$ and row three's was $0$ : the new row-one entry is $0.6 \times 4 + 0.8 \times 0 = 2.4$ , and the new row-three entry is $- 0.8 \times 4 + 0.6 \times 0 = - 3.2$ .

After the rotation the stack reads, top to bottom, $(5, 2.4)$ , then the untouched $(0, 5)$ , then $(0, - 3.2)$ . The first column below the diagonal is clean. A second rotation, of rows two and three, would zero the $- 3.2$ and finish the job.

What this tells us: one new observation was absorbed by rotating one pair of rows at a time, never rebuilding the factor. Each rotation kept lengths fixed — the pair of length $5$ stayed length $5$ — so no accuracy was lost in the adding direction.

Check your understanding Beginner

Exercise (easy, multiple choice).

Why is removing an observation (downdating) more dangerous than adding one (updating)?

A. Removing one is more expensive — it costs size cubed. B. Removing one subtracts, and subtraction of nearly equal amounts can cancel and magnify error. C. Removing one cannot be done at all. D. There is no difference; both are equally safe.

Hint

Think about what happens to accuracy when you subtract two numbers that are almost the same size.

Answer

B. Feedback-correct: downdating subtracts the removed row's contribution, and when that row carried most of the fit the subtraction cancels nearly everything, leaving a small remainder dominated by rounding error. Feedback-wrong: downdating costs the same cheap size-squared amount as updating (so A is wrong), and it is certainly possible (C is wrong); the two directions are not equally safe (D is wrong).

Formal definition Intermediate+

Let $A \in F^{m \times n}$ with $F \in {R, C}$ , $m \geq n$ , of full column rank, with reduced QR factorization $A = \hat{Q} \hat{R}$ , $\hat{Q} \in F^{m \times n}$ having orthonormal columns and $\hat{R} \in F^{n \times n}$ upper triangular with positive diagonal 43.04.01. A modification of $A$ is a structured change producing $\tilde{A}$ , and the updating problem is to compute the factorization $\tilde{A} = \tilde{\hat{Q}} \tilde{\hat{R}}$ from $\hat{Q}, \hat{R}$ in $O (n^{2})$ (or $O (mn)$ when $\hat{Q}$ is maintained) operations rather than recomputing it in $O (m n^{2})$ . The four standard modifications are: a rank-one change $\tilde{A} = A + u v^{*}$ ; a row append $\tilde{A} = (w ^{*} A)$ ; a row delete (the inverse of append); and the column analogues.

The least-squares problem couples these to the Cholesky factor of the normal-equations matrix. Writing $C = A^{*} A = \hat{R}^{*} \hat{R}$ , the QR factor $\hat{R}$ is the Cholesky factor of $C$ up to the diagonal sign convention 43.03.02, so updating $\hat{R}$ and updating the Cholesky factor of $C$ are the same computation viewed through two lenses.

Givens rotation. A Givens rotation $G (i, k, θ) \in F^{m \times m}$ is the identity except in the $2 \times 2$ submatrix on rows and columns $i, k$ , where it equals $(c - \overset{s}{ˉ} s c)$ with $c = cos θ \in R$ , $∣ c ∣^{2} + ∣ s ∣^{2} = 1$ . It is unitary, and given a vector with entries $a, b$ in positions $i, k$ , the choice $c = a / r$ , $s = b / r$ , $r = ∣ a ∣^{2} + ∣ b ∣^{2}$ maps $(a, b) \mapsto (r, 0)$ , introducing a zero. A product of such rotations chases the spike introduced by a modification back to triangular form.

Cholesky rank-one update and downdate. Given the Cholesky factorization $C = R^{*} R$ of a Hermitian positive-definite $C$ and a vector $w \in F^{n}$ , the update computes the Cholesky factor $\tilde{R}$ of $$ \tilde{C} = C + w w^* = R^* R + w w^, $$ and the downdate (with $σ = - 1$ ) computes the factor of $$ \tilde{C} = C - w w^ = R^* R - w w^, $$ which is positive-definite — and so possesses a Cholesky factor — precisely when $|R^{-} w|_2 < 1 $. T h e u p d a t e i sr e a l i z e d b y G i v e n sr o t a t i o n s an d i s u n co n d i t i o na l l y ba c k w a r d s t ab l e; t h e d o w n d a t e i sr e a l i z e d b y * h y p er b o l i cr o t a t i o n s * an d i ss t ab l eo n l y w h e n$ |R^{-*} w|_2 $i s b o u n d e d a w a y f r o m$ 1$.

Hyperbolic rotation. A hyperbolic rotation $H \in F^{2 \times 2}$ is $J$ -orthogonal for the signature matrix $J = diag (1, - 1)$ : it satisfies $H^{*} J H = J$ rather than $H^{*} H = I$ , and takes the form $(c o s h ϕ - s i n h ϕ - s i n h ϕ c o s h ϕ)$ . Given $(a, b)$ with $∣ a ∣ > ∣ b ∣$ , it maps $(a, b) \mapsto (∣ a ∣^{2} - ∣ b ∣^{2}, 0)$ , the indefinite-norm analogue of the Givens zeroing. Its instability is structural: $cosh ϕ$ and $sinh ϕ$ are both unbounded, so $∥ H ∥_{2} \to \infty$ as $∣ a ∣ \to ∣ b ∣$ , and the transformation amplifies rounding error by exactly that factor.

Counterexamples to common slips

The downdate is not always possible. Removing a row $w^{*}$ from a least-squares problem yields a valid positive-definite $\tilde{C} = C - w w^{*}$ only if the row genuinely belonged to a fit with rows to spare; if $w$ lies on the boundary of the row space, $∥ R^{-*} w ∥_{2} = 1$ and $\tilde{C}$ is singular, while $∥ R^{-*} w ∥_{2} > 1$ would make $\tilde{C}$ indefinite, with no Cholesky factor. The quantity $∥ R^{-*} w ∥_{2}$ is the gatekeeper.
Hyperbolic rotations are not orthogonal. They preserve the indefinite form $x^{*} J x = ∣ x_{1} ∣^{2} - ∣ x_{2} ∣^{2}$ , not the Euclidean norm. Treating one as a length-preserving turn and expecting it to be backward stable is the classic error; its norm is $cosh ϕ \geq 1$ and grows without bound near the breakdown boundary.
Update and downdate are not symmetric in stability. Both cost $O (n^{2})$ , which tempts one to treat them alike, but the update appends to a sum of squares (monotone, well-conditioned) while the downdate subtracts from it (cancellation-prone). The same algorithm run backwards is not equally stable.
A QR update does not require forming $A^ A $. * U p d a t in g$ \hat{R} $d i r ec tl y b y G i v e n sr o t a t i o n so n t h e a ug m e n t e d t r ian g u l a r a r r a y a v o i d s t h e n or ma l - e q u a t i o n s ma t r i x e n t i r e l y, p r eser v in g t h e$ \kappa_2(A) $co n d i t i o nin g o f [43.04.01]; r eco n s t r u c t in g$ \hat{R} $b y r e - C h o l es k y in g an u p d a t e d$ A^* A $w o u l d r e in t r o d u ce t h e$ \kappa_2(A)^2$ squaring the QR route was chosen to avoid.

Key theorem with proof Intermediate+

The signature result is that a row appended to a QR factorization is reabsorbed by exactly $n$ Givens rotations, each length-preserving, so the row update is both $O (n^{2})$ and backward stable.

Theorem (Givens-chasing row update of QR). Let $A \in F^{m \times n}$ have full column rank with reduced QR factorization $A = \hat{Q} \hat{R}$ , and let a row $w^ \in \mathbb{F}^{1 \times n} $b e a pp e n d e d,$ \tilde{A} = \binom{A}{w^}$. Stack the triangular factor over the new row, $$ M = \begin{pmatrix} \hat{R} \ w^* \end{pmatrix} \in \mathbb{F}^{(n+1) \times n}. $$ Then there is a product of $n$ Givens rotations $G = G_{n} \dots G_{1}$ , each acting on row $n + 1$ paired with one of rows $1, \dots, n$ , such that $GM = (0 R ~)$ with $\tilde{R}$ upper triangular, and $\tilde{R}$ is the QR factor of $\tilde{A}$ . The cost is $\frac{3}{2} n^{2} + O (n)$ flops, and because each $G_{k}$ is unitary the computed $\tilde{R}$ is backward stable: it is the exact factor of a row-perturbed $\tilde{A} + δ \tilde{A}$ with $∥ δ \tilde{A} ∥_{2} = O (ϵ_{mach} ∥ \tilde{A} ∥_{2})$ . ^{[Golub, G. H. & Van Loan, C. F. — Matrix Computations (4th ed.)]}

Proof. First, the appended problem has the same triangular factor as the stacked array. Because $\hat{Q}$ has orthonormal columns, $A^{*} A = \hat{R}^{*} \hat{R}$ , so $\tilde{A}^{*} \tilde{A} = A^{*} A + w w^{*} = \hat{R}^{*} \hat{R} + w w^{*} = M^{*} M$ . The QR factor of $\tilde{A}$ is the Cholesky factor of $\tilde{A}^{*} \tilde{A}$ with positive diagonal 43.03.02, 43.04.01, and likewise the triangular factor of $M$ is the Cholesky factor of $M^{*} M$ . Since $\tilde{A}^{*} \tilde{A} = M^{*} M$ , their positive-diagonal triangular factors coincide; computing the factor of the small array $M$ therefore yields $\tilde{R}$ .

Now triangularize $M$ by Givens rotations. Row $n + 1$ holds $w^{*} = (w_{1}, \dots, w_{n})$ ; the rows above are the triangular $\hat{R}$ . Apply $G_{1}$ to rows $1$ and $n + 1$ , chosen to zero the $(n + 1, 1)$ entry $w_{1}$ against the pivot $\hat{R}_{11}$ : with $r = ∣ \hat{R}_{11} ∣^{2} + ∣ w_{1} ∣^{2}$ , $c = \hat{R}_{11} / r$ , $s = w_{1} / r$ , the entry $(n + 1, 1)$ becomes $0$ and $(1, 1)$ becomes $r$ . This rotation alters row $1$ and row $n + 1$ across all columns $\geq 1$ , leaving a new spike in row $n + 1$ at column $2$ . Apply $G_{2}$ to rows $2$ and $n + 1$ to zero $(n + 1, 2)$ against the updated $\hat{R}_{22}$ , and continue. After $G_{k}$ the entries $(n + 1, 1), \dots, (n + 1, k)$ are zero and row $n + 1$ retains a spike at column $k + 1$ . After $G_{n}$ the entire bottom row vanishes and rows $1, \dots, n$ form the upper-triangular $\tilde{R}$ . Each $G_{k}$ touches the trailing $n - k + 1$ columns of two rows, costing $6 (n - k + 1)$ flops, and summing over $k$ gives $\frac{3}{2} n^{2} + O (n)$ .

For stability, each $G_{k}$ is unitary, so applying it in floating-point arithmetic is equivalent to applying the exact rotation to a matrix perturbed by $O (ϵ_{mach})$ relative to its norm, and unitary maps do not amplify perturbations: $∥ G_{k} v + δ ∥_{2} = ∥ v + G_{k}^{*} δ ∥_{2}$ with $∥ G_{k}^{*} δ ∥_{2} = ∥ δ ∥_{2}$ . The $n$ perturbations accumulate additively to a single $δ \tilde{A}$ of order $n ϵ_{mach} ∥ \tilde{A} ∥_{2}$ , which is the backward-stability claim. $□$

Bridge. This theorem builds toward the recursive-least-squares and sliding-window algorithms of the Advanced section, and the same Givens-chasing pattern appears again in 43.07.03 (GMRES), whose inner Hessenberg least-squares solve is maintained by one Givens rotation per Arnoldi step for exactly this reason. The foundational reason the update is cheap is that the modification is rank one, so the QR factor changes by a single spike that $n$ plane rotations chase away; this is exactly the structural fact that a rank-one change to $A^{*} A$ is a rank-one change to its Cholesky factor's defining equation, putting these together with 43.03.02's identification of the QR and Cholesky triangular factors. The update generalises to the rank- $p$ case (chase $p$ spikes) and is dual to the downdate, where the appended row is instead removed and the length-preserving Givens rotation is replaced by the indefinite-norm hyperbolic rotation whose conditioning the Master tier quantifies; the central insight is that adding data is a unitary operation on the augmented array while removing it is not, and the bridge from one to the other is the exchange of the Euclidean form for the signature form $J$ .

Exercises Intermediate+

Exercise 3 (medium, numeric).

A hyperbolic rotation must downdate the pair $(a, b) = (5, 4)$ , mapping it to $(a^{2} - b^{2}, 0)$ . Compute the surviving entry and the amplification $cosh ϕ = a / a^{2} - b^{2}$ . Then repeat for $(a, b) = (5, 4.9)$ and comment on the change.

Hint

The surviving entry is $a^{2} - b^{2}$ ; the amplification is $a$ divided by it.

Answer

For $(5, 4)$ : $25 - 16 = 9 = 3$ , and $cosh ϕ = 5/3 \approx 1.67$ , a mild amplification. For $(5, 4.9)$ : $25 - 24.01 = 0.99 \approx 0.995$ , and $cosh ϕ = 5/0.995 \approx 5.03$ , a threefold-larger amplification. As $b \to a$ the surviving entry collapses toward zero and $cosh ϕ \to \infty$ : this is the downdating instability, the boundary $∥ R^{-*} w ∥_{2} \to 1$ in scalar form. Rubric: full credit for $3$ and $5/3$ , the second pair's $\approx 0.995$ and $\approx 5.03$ , and the observation that the amplification grows as $b \to a$ .

Exercise 4 (medium, symbolic).

Show that appending a row $w^{*}$ to $A$ changes the normal-equations matrix by a rank-one term, $\tilde{A}^{*} \tilde{A} = A^{*} A + w w^{*}$ , and conclude that the updated QR factor equals the Cholesky factor of $A^{*} A + w w^{*}$ .

Hint

Write $\tilde{A} = (w ^{*} A)$ and multiply out $\tilde{A}^{*} \tilde{A}$ in block form.

Answer

With $\tilde{A} = (w ^{*} A)$ , block multiplication gives $\tilde{A}^{*} \tilde{A} = (A^{*} ∣ w) (w ^{*} A) = A^{*} A + w w^{*}$ , a rank-one modification of $A^{*} A$ . The reduced QR factor $\tilde{R}$ of $\tilde{A}$ satisfies $\tilde{A}^{*} \tilde{A} = \tilde{R}^{*} \tilde{R}$ with $\tilde{R}$ upper triangular and positive diagonal, which is precisely the defining property of the Cholesky factor of the positive-definite matrix $A^{*} A + w w^{*}$ 43.03.02. By uniqueness of the positive-diagonal Cholesky factor, the updated QR factor and the Cholesky factor of $A^{*} A + w w^{*}$ coincide. Rubric: full credit for the rank-one identity and the uniqueness argument.

Exercise 7 (hard, symbolic).

Let $H = (c o s h ϕ - s i n h ϕ - s i n h ϕ c o s h ϕ)$ with $J = diag (1, - 1)$ . Verify $H^{*} J H = J$ , compute $∥ H ∥_{2}$ , and show that the worst-case rounding amplification of a hyperbolic downdate of $(a, b)$ with $a > b > 0$ is governed by $a / a^{2} - b^{2}$ .

Hint

Choose $cosh ϕ = a / a^{2} - b^{2}$ , $sinh ϕ = b / a^{2} - b^{2}$ so that $H (a, b)^{⊤} = (a^{2} - b^{2}, 0)^{⊤}$ , and read $∥ H ∥_{2}$ off the singular values.

Answer

Compute $H^{*} J H$ : the $(1, 1)$ entry is $cosh^{2} ϕ - sinh^{2} ϕ = 1$ , the $(2, 2)$ entry is $sinh^{2} ϕ - cosh^{2} ϕ = - 1$ , and the off-diagonal is $- cosh ϕ sinh ϕ + sinh ϕ cosh ϕ = 0$ , so $H^{*} J H = diag (1, - 1) = J$ , confirming $J$ -orthogonality. The eigenvalues of $H^{*} H = (c o s h^{2} + s i n h^{2} - 2 c o s h s i n h - 2 c o s h s i n h c o s h^{2} + s i n h^{2})$ are $(cosh ϕ \pm sinh ϕ)^{2} = e^{\pm 2 ϕ}$ , so the singular values are $e^{ϕ}$ and $e^{- ϕ}$ and $∥ H ∥_{2} = e^{ϕ} = cosh ϕ + sinh ϕ$ . With $cosh ϕ = a / a^{2} - b^{2}$ and $sinh ϕ = b / a^{2} - b^{2}$ , $∥ H ∥_{2} = (a + b) / a^{2} - b^{2} = (a + b) / (a - b)$ , which grows without bound as $b \to a$ ; the amplification of any rounding error already present is $∥ H ∥_{2}$ , of the order $a / a^{2} - b^{2}$ . As $b \to a$ (the removed row carrying nearly all the weight), the downdate amplifies error catastrophically. Rubric: full credit for $H^{*} J H = J$ , the singular values $e^{\pm ϕ}$ , and the $b \to a$ blow-up.

Exercise 8 (hard, symbolic).

The BFGS quasi-Newton method maintains an approximate Hessian $B_{k}$ by the two-rank update $B_{k + 1} = B_{k} - \frac{B _{k} s s ^{*} B _{k}}{s ^{*} B _{k} s} + \frac{y y ^{*}}{y ^{*} s}$ , where $s$ is the step and $y$ the gradient change. Show that if $B_{k}$ is positive-definite and the curvature condition $y^{*} s > 0$ holds, then $B_{k + 1}$ is positive-definite, so its Cholesky factor can be updated rather than recomputed.

Hint

For nonzero $z$ , evaluate $z^{*} B_{k + 1} z$ and use the Cauchy-Schwarz inequality in the $B_{k}$ -inner product for the middle term.

Answer

Let $z \neq = 0$ . Then $z^{*} B_{k + 1} z = z^{*} B_{k} z - \frac{( z ^{*} B _{k} s ) ( s ^{*} B _{k} z )}{s ^{*} B _{k} s} + \frac{( z ^{*} y ) ( y ^{*} z )}{y ^{*} s} = (z^{*} B_{k} z - \frac{∣ s ^{*} B _{k} z ∣ ^{2}}{s ^{*} B _{k} s}) + \frac{∣ y ^{*} z ∣ ^{2}}{y ^{*} s}$ . The bracketed term is non-negative: in the inner product $⟨ u, v ⟩_{B_{k}} = u^{*} B_{k} v$ (positive-definite since $B_{k}$ is), it is $∥ z ∥_{B_{k}}^{2} - ∣ ⟨ s, z ⟩_{B_{k}} ∣^{2} /∥ s ∥_{B_{k}}^{2} \geq 0$ by Cauchy-Schwarz, with equality iff $z ∥ s$ . The second term is $\geq 0$ because $y^{*} s > 0$ . If $z \neq ∥ s$ the first term is strictly positive; if $z ∥ s$ , say $z = α s$ , the first term vanishes but the second is $∣ y^{*} (α s) ∣^{2} / (y^{*} s) = α^{2} (y^{*} s) > 0$ . Either way $z^{*} B_{k + 1} z > 0$ , so $B_{k + 1}$ is positive-definite. Its Cholesky factor is therefore obtained by a downdate (the $- s s^{*}$ term, guarded by the curvature condition) followed by an update (the $+ y y^{*}$ term), each $O (n^{2})$ . Rubric: full credit for the Cauchy-Schwarz bound on the middle term, the $y^{*} s > 0$ use, and the $z ∥ s$ case.

Advanced results Master

Theorem (recursive least squares as a sequence of row updates). Let observations $(a_{k}^{*}, b_{k})$ arrive sequentially, $a_{k} \in F^{n}$ , $b_{k} \in F$ , and let $x_{k}$ minimize $\sum_{j = 1}^{k} ∣ a_{j}^{*} x - b_{j} ∣^{2}$ . Maintaining the QR factor $\hat{R}_{k}$ of the data matrix $A_{k} = (a_{1}, \dots, a_{k})^{*}$ together with the transformed right-hand side $z_{k} = \hat{Q}_{k}^{*} b_{1 : k}$ , each new observation is absorbed by appending $(a_{k + 1}^{*}, b_{k + 1})$ to $(a _{k + 1}^{*} R ^ _{k})$ and chasing the spike with $n$ Givens rotations, updating $\hat{R}_{k} \to \hat{R}_{k + 1}$ and $z_{k} \to z_{k + 1}$ in $O (n^{2})$ flops; the solution is recovered by one back substitution $\hat{R}_{k + 1} x_{k + 1} = z_{k + 1}$ . The procedure is backward stable because every step is unitary, and it is mathematically equivalent to the information form of the recursive estimator while avoiding the explicit inversion of $A_{k}^{*} A_{k}$ that the covariance form requires ^{[Bjorck, A. — Numerical Methods for Least Squares Problems]}.

Theorem (the sliding-window problem and the necessity of downdating). A fixed-width window keeps only the most recent $ℓ$ observations, so each step appends $(a_{k + 1}^{*}, b_{k + 1})$ and removes $(a_{k - ℓ + 1}^{*}, b_{k - ℓ + 1})$ . The removal is a Cholesky downdate $\hat{R}^{*} \hat{R} - w w^{*}$ with $w = a_{k - ℓ + 1}$ , well-posed iff $∥ \hat{R}^{-*} w ∥_{2} < 1$ , and realized by hyperbolic rotations whose conditioning is governed by $1/ 1 - ∥ \hat{R}^{-*} w ∥_{2}^{2}$ . When the removed observation carried a large share of the window's information — a near-collinear or dominant row — this quantity is large and the downdate loses digits; the standard remedies are the mixed (orthogonal/hyperbolic) downdating scheme of Bojanczyk-Brent-Van Dooren-de Hoog, which reformulates the hyperbolic step as a sequence of better-conditioned orthogonal operations, and Saunders' corrected seminormal approach that refines the downdated factor by one step of iterative refinement ^{[Stewart, G. W. — Matrix Algorithms, Volume I: Basic Decompositions]}.

Theorem (the Kalman filter as square-root factorization updating). The discrete Kalman filter propagates a state estimate and its error covariance $P_{k}$ through a measurement update (a Cholesky downdate-like contraction by the gained information) and a time update (an inflation by the process noise). Forming $P_{k}$ explicitly can lose positive-definiteness to rounding; the square-root filter instead propagates a Cholesky factor $S_{k}$ with $P_{k} = S_{k} S_{k}^{*}$ , performing each step as a factorization update via orthogonal and hyperbolic transformations of an augmented array. The measurement update is exactly a row append to a QR factorization, and the information-filter variant is exactly the recursive-least-squares update above; the square-root form guarantees a positive-definite covariance at every step by construction, the decisive numerical advantage that made it the standard in aerospace navigation ^{[Golub, G. H. & Van Loan, C. F. — Matrix Computations (4th ed.)]}.

Synthesis. The four modifications — rank-one change, row append, row delete, column add/delete — are one operation seen through the identity that the QR factor of $A$ is the Cholesky factor of $A^{*} A$ 43.03.02, 43.04.01, and the foundational reason updating is cheap is that a rank-one change to $A$ is a rank-one change to $A^{*} A$ , which displaces its triangular factor by a single spike that $O (n)$ plane rotations chase away. The central insight is the asymmetry between adding and removing: appending data is a unitary operation on an augmented array — monotone in the sum of squares, unconditionally backward stable — while removing data subtracts from that sum and is dual to it through the exchange of the Euclidean form for the signature form $J = diag (I, - 1)$ , whose $J$ -orthogonal (hyperbolic) rotations are unbounded near the boundary $∥ R^{-*} w ∥_{2} = 1$ where positive-definiteness is lost. This is exactly the boundary at which a sliding window or a Kalman square-root filter must be guarded, and putting these together, recursive least squares, the sliding-window estimator, the square-root Kalman filter, and the BFGS Hessian update are the same algebra of low-rank factorization modification, distinguished only by which terms are added and which subtracted. The update generalises the batch QR of 43.04.01 to the online setting, and the bridge from batch to streaming is that a factorization is a reusable object: the $O (n^{3})$ is paid once and amortized over an unbounded stream of $O (n^{2})$ modifications.

Full proof set Master

Proposition (Givens-chasing produces the updated triangular factor). Let $\hat{R} \in F^{n \times n}$ be upper triangular with positive diagonal and $w \in F^{n}$ . There is a product $G = G_{n} \dots G_{1}$ of Givens rotations, $G_{k}$ acting on rows $k$ and $n + 1$ of the $(n + 1) \times n$ array, with $G\binom{\hat{R}}{w^} = \binom{\tilde{R}}{0} $an d$ \tilde{R}^\tilde{R} = \hat{R}^\hat{R} + w w^$.

Proof. Construct the $G_{k}$ inductively. Before step $k$ the array has its first $k - 1$ bottom-row entries zeroed and rows $1, \dots, n$ in upper-triangular form with a working spike at position $(n + 1, k)$ (for $k = 1$ the spike is the original $w_{1}$ ). Let the current diagonal entry be $ρ = \hat{R}_{k k}^{(k - 1)}$ and the spike $ω = w_{k}^{(k - 1)}$ ; set $r = ∣ ρ ∣^{2} + ∣ ω ∣^{2}$ , $c = ρ / r$ , $s = ω / r$ , and let $G_{k}$ be the rotation $(c - s \overset{s}{ˉ} c)$ on rows $k, n + 1$ . It sends $(ρ, ω) \mapsto (r, 0)$ , zeroing $(n + 1, k)$ , and acts on columns $k + 1, \dots, n$ of those two rows, leaving a spike at $(n + 1, k + 1)$ . After $G_{n}$ the bottom row is zero and rows $1, \dots, n$ are upper triangular; call them $\tilde{R}$ . Each $G_{k}$ is unitary, so $G$ is unitary and $(0 R ~)^{*} (0 R ~) = (w ^{*} R ^)^{*} G^{*} G (w ^{*} R ^) = (w ^{*} R ^)^{*} (w ^{*} R ^) = \hat{R}^{*} \hat{R} + w w^{*}$ , and the left side equals $\tilde{R}^{*} \tilde{R}$ . The diagonal of $\tilde{R}$ is made positive by absorbing unit-modulus phases, which does not change $\tilde{R}^{*} \tilde{R}$ . $□$

Proposition (well-posedness and conditioning of the downdate). Let $C = R^{*} R$ be Hermitian positive-definite and $w \in F^{n}$ . The downdated matrix $\tilde{C} = C - w w^{*}$ is positive-definite iff $|R^{-}w|2 < 1 $, in w hi c h c a se i t ss ma l l es t e i g e n v a l u es a t i s f i es$ \lambda{\min}(\tilde{C}) \ge \lambda_{\min}(C),(1 - |R^{-}w|_2^2) $, so t h eco n d i t i o nin g o f t h e d o w n d a t e d e g r a d es a s$ |R^{-}w|_2 \to 1$.*

Proof. Set $p = R^{-*} w$ . Then $\tilde{C} = R^{*} R - w w^{*} = R^{*} (I - p p^{*}) R$ , a congruence of $I - p p^{*}$ by the nonsingular $R$ , hence $\tilde{C}$ is positive-definite iff $I - p p^{*}$ is. The Hermitian matrix $I - p p^{*}$ has eigenvalue $1$ on the $(n - 1)$ -dimensional space $p^{⊥}$ and eigenvalue $1 - ∥ p ∥_{2}^{2}$ on $span (p)$ ; positivity of all eigenvalues is the single condition $∥ p ∥_{2} < 1$ . For the eigenvalue bound, Sylvester's law of inertia and the congruence give, for any unit $z$ , $z^{*} \tilde{C} z = (R z)^{*} (I - p p^{*}) (R z) \geq λ_{m i n} (I - p p^{*}) ∥ R z ∥_{2}^{2} = (1 - ∥ p ∥_{2}^{2}) ∥ R z ∥_{2}^{2} \geq (1 - ∥ p ∥_{2}^{2}) λ_{m i n} (C)$ , using $∥ R z ∥_{2}^{2} = z^{*} C z \geq λ_{m i n} (C)$ . Minimizing over $z$ gives $λ_{m i n} (\tilde{C}) \geq λ_{m i n} (C) (1 - ∥ p ∥_{2}^{2})$ . As $∥ p ∥_{2} \to 1$ the lower bound collapses to $0$ : the downdated matrix approaches singularity and the hyperbolic rotation that realizes the downdate has norm $\to \infty$ , the source of the instability. $□$

Proposition (hyperbolic rotation realizes the scalar downdate with amplification $∥ H ∥_{2} = (a + b) / (a - b)$ ). For real $a > b > 0$ , the $J$ -orthogonal $H$ with $J = diag (1, - 1)$ mapping $(a, b)^{⊤}$ to $(a^{2} - b^{2}, 0)^{⊤}$ has $∥ H ∥_{2} = (a + b) / (a - b)$ , unbounded as $b \to a$ .

Proof. Take $cosh ϕ = a / d$ , $sinh ϕ = b / d$ with $d = a^{2} - b^{2}$ ; then $cosh^{2} ϕ - sinh^{2} ϕ = (a^{2} - b^{2}) / d^{2} = 1$ , so $ϕ$ is a valid hyperbolic angle, and $H (a, b)^{⊤} = (cosh ϕ a - sinh ϕ b, - sinh ϕ a + cosh ϕ b)^{⊤} = ((a^{2} - b^{2}) / d, 0)^{⊤} = (d, 0)^{⊤}$ . The matrix $H^{*} H$ has trace $2 (cosh^{2} ϕ + sinh^{2} ϕ)$ and determinant $(cosh^{2} ϕ - sinh^{2} ϕ)^{2} = 1$ , so its eigenvalues are $e^{2 ϕ}$ and $e^{- 2 ϕ}$ and $∥ H ∥_{2} = e^{ϕ} = cosh ϕ + sinh ϕ = (a + b) / d = (a + b) / a^{2} - b^{2} = (a + b) / (a - b)$ . As $b \to a^{-}$ the denominator $a - b \to 0^{+}$ while the numerator stays near $2 a$ , so $∥ H ∥_{2} \to \infty$ , and any rounding error present in the input is amplified by this factor. $□$

Proposition (column delete via Givens retriangularization). Deleting column $j$ from $A = \hat{Q} \hat{R}$ yields $\hat{R}$ with column $j$ removed, an upper-Hessenberg-with-a-bulge array that $n - j$ Givens rotations restore to upper triangular in $O ((n - j)^{2})$ flops, producing the QR factor of the reduced matrix.

Proof. Removing column $j$ of $\hat{R}$ leaves an $n \times (n - 1)$ array whose columns $1, \dots, j - 1$ are still upper triangular but whose columns $j, \dots, n - 1$ (the former $j + 1, \dots, n$ ) each carry one subdiagonal entry, because the removed column's diagonal pivot is gone: the array is upper triangular except for a single subdiagonal in each of the trailing columns (upper Hessenberg in the trailing block). Apply a Givens rotation to rows $j, j + 1$ to zero the subdiagonal entry in column $j$ ; this introduces no new nonzero below the diagonal in earlier columns and shifts the bulge to column $j + 1$ . Repeating for rows $j + 1, j + 2$ through $n - 1, n$ zeroes each subdiagonal in turn, $n - j$ rotations total, each touching $O (n - j)$ columns, for $O ((n - j)^{2})$ flops. The accumulated unitary $G$ satisfies $G \hat{R}_{∖ j} = (0 R ~)$ with $\tilde{R}$ upper triangular, and $\tilde{R}^{*} \tilde{R} = \hat{R}_{∖ j}^{*} \hat{R}_{∖ j} = A_{∖ j}^{*} A_{∖ j}$ , so $\tilde{R}$ is the QR factor of the column-deleted matrix. $□$

Connections Master

The batch factorizations this unit modifies are exactly the QR and SVD machinery of 43.04.01 and the Cholesky factorization of 43.03.02: the identity that the QR factor of $A$ equals the positive-diagonal Cholesky factor of $A^{*} A$ is what makes a row append to QR and a rank-one update to Cholesky the same computation, and the $κ_{2} (A)$ -versus- $κ_{2} (A)^{2}$ conditioning lesson of 43.04.01 is why the update is performed on $\hat{R}$ directly rather than by re-Choleskying an updated cross-product.
The Givens-chasing pattern that retriangularizes an augmented array reappears as the engine of the iterative least-squares solvers: 43.07.03 (GMRES) maintains the QR factorization of its growing upper-Hessenberg matrix by one Givens rotation per Arnoldi step, the streaming analogue of the row update proved here, and 43.07.04 (conjugate gradient) shares the underlying SPD structure whose Cholesky factor the rank-one update preserves.
The downdating instability — governed by the ratio of singular values before and after the modification — is a manifestation of the same backward-error-versus-conditioning template as 43.03.02: where Cholesky on a fixed SPD matrix is unconditionally stable, the downdate's stability is conditional on $∥ R^{-*} w ∥_{2} < 1$ , and the hyperbolic rotation that realizes it is the indefinite-signature analogue of the orthogonal Givens rotation, connecting this unit to the symmetric-indefinite factorization 43.03.08 whose pivots likewise abandon definiteness.

Historical & philosophical context Master

The systematic theory of modifying matrix factorizations was assembled in the 1974 Mathematics of Computation paper of Gill, Golub, Murray, and Saunders, Methods for modifying matrix factorizations, which catalogued the rank-one update and downdate of the Cholesky, QR, and LU factorizations and gave the Givens and hyperbolic-rotation realizations with their operation counts ^{[Gill, P. E., Golub, G. H., Murray, W. & Saunders, M. A. — Methods for modifying matrix factorizations]}. The hyperbolic rotation as the $J$ -orthogonal downdating tool, and the recognition that its instability is intrinsic rather than incidental, was sharpened over the following two decades; the mixed orthogonal/hyperbolic scheme of Bojanczyk, Brent, Van Dooren, and de Hoog (1987) and the relative-error analysis in Stewart's Matrix Algorithms (SIAM, 1998) gave the downdate its modern stable form ^{[Stewart, G. W. — Matrix Algorithms, Volume I: Basic Decompositions]}.

The applications drove the theory as much as the reverse. The square-root formulation of the Kalman filter was developed for the Apollo guidance computer by James Potter in the mid-1960s, when the explicit covariance propagation lost positive-definiteness in single-precision aerospace hardware; propagating a Cholesky factor instead, through exactly the factorization updates of this unit, restored numerical reliability. In optimization, the quasi-Newton methods of Broyden, Fletcher, Goldfarb, and Shanno (independently, 1970) maintain an approximate Hessian by a two-rank update whose Cholesky factor is modified rather than recomputed each iteration, the canonical use of update-then-downdate with a curvature guard.

Bibliography Master

@book{golubvanloan2013upd,
  author    = {Golub, Gene H. and Van Loan, Charles F.},
  title     = {Matrix Computations},
  edition   = {4},
  publisher = {Johns Hopkins University Press},
  year      = {2013},
  address   = {Baltimore},
  note      = {Sec. 6.5: updating matrix factorizations -- rank-one QR and Cholesky update/downdate, row/column add/delete, Givens chasing, hyperbolic rotations}
}

@book{bjorck1996lsq,
  author    = {Bj\"{o}rck, {\AA}ke},
  title     = {Numerical Methods for Least Squares Problems},
  publisher = {Society for Industrial and Applied Mathematics},
  year      = {1996},
  address   = {Philadelphia},
  note      = {Ch. 3: modifying QR/Cholesky factorizations, recursive and sliding-window least squares}
}

@book{stewart1998matrixalg,
  author    = {Stewart, G. W.},
  title     = {Matrix Algorithms, Volume I: Basic Decompositions},
  publisher = {Society for Industrial and Applied Mathematics},
  year      = {1998},
  address   = {Philadelphia}
}

@article{ggms1974,
  author  = {Gill, Philip E. and Golub, Gene H. and Murray, Walter and Saunders, Michael A.},
  title   = {Methods for Modifying Matrix Factorizations},
  journal = {Mathematics of Computation},
  volume  = {28},
  number  = {126},
  pages   = {505--535},
  year    = {1974}
}

@article{bbvd1987downdate,
  author  = {Bojanczyk, Adam W. and Brent, Richard P. and Van Dooren, Paul and de Hoog, Frank R.},
  title   = {A Note on Downdating the Cholesky Factorization},
  journal = {SIAM Journal on Scientific and Statistical Computing},
  volume  = {8},
  number  = {3},
  pages   = {210--221},
  year    = {1987}
}

@article{potter1963kalman,
  author  = {Bellantoni, J. F. and Dodge, K. W.},
  title   = {A Square Root Formulation of the {Kalman}-{Schmidt} Filter},
  journal = {AIAA Journal},
  volume  = {5},
  number  = {7},
  pages   = {1309--1314},
  year    = {1967}
}

@article{fletcher1970bfgs,
  author  = {Fletcher, Roger},
  title   = {A New Approach to Variable Metric Algorithms},
  journal = {The Computer Journal},
  volume  = {13},
  number  = {3},
  pages   = {317--322},
  year    = {1970}
}

Prerequisites

43.04.01
43.03.02

Tier anchors

beginner: Golub-Van Loan 2013 *Matrix Computations* 4e (Johns Hopkins) §6.5 (updating matrix factorizations, informal); Strang 2016 *Introduction to Linear Algebra* 5e (Wellesley-Cambridge) Ch. 4 (least squares and projections, the recursive-fit picture)
intermediate: Golub-Van Loan 2013 *Matrix Computations* 4e (Johns Hopkins) §6.5.1-6.5.4 (rank-one QR and Cholesky update/downdate, appending/deleting rows, the Givens-chasing algorithm); Bjorck 1996 *Numerical Methods for Least Squares Problems* (SIAM) §3.2-3.3 (recursive and sliding-window least squares)
master: Golub-Van Loan 2013 *Matrix Computations* 4e (Johns Hopkins) §6.5; Bjorck 1996 *Numerical Methods for Least Squares Problems* (SIAM) Ch. 3; Stewart 1998 *Matrix Algorithms I: Basic Decompositions* (SIAM) Ch. 4-5 (orthogonal-transformation updating, hyperbolic rotations and the downdating instability)

References

Golub, G. H. & Van Loan, C. F. — Matrix Computations (4th ed.) · Johns Hopkins University Press, 2013. §6.5: updating matrix factorizations. §6.5.1 (rank-one changes and the Givens-rotation update of a QR factorization), §6.5.2 (appending and deleting a row, the row hyperbolic-downdate of R), §6.5.3 (appending and deleting a column), §6.5.4 (the Cholesky rank-one update via Givens rotations and the rank-one downdate via hyperbolic rotations), with operation counts O(n^2) per modification and the stability discussion of why downdating loses positive-definiteness when the data is barely consistent.
Bjorck, A. — Numerical Methods for Least Squares Problems · SIAM, 1996. Ch. 3: modifying the QR and Cholesky factorizations. §3.2 (recursive least squares, adding observations one at a time by a sequence of Givens rotations), §3.3 (the sliding-window problem: simultaneous up- and downdate; the numerical delicacy of removing the oldest observation), and the connection to the information form of the Kalman filter and the square-root filter.
Stewart, G. W. — Matrix Algorithms, Volume I: Basic Decompositions · SIAM, 1998. Ch. 4-5: orthogonal triangularization and its updating. The Givens-rotation update of an upper-triangular R after a rank-one modification, the hyperbolic-rotation downdate, the loss of orthogonality and the relative-error analysis showing that the downdate is conditioned by the ratio of the perturbed to unperturbed smallest singular value, and the mixed orthogonal/hyperbolic schemes that stabilize it.
Gill, P. E., Golub, G. H., Murray, W. & Saunders, M. A. — Methods for modifying matrix factorizations · Mathematics of Computation 28 (1974), 505-535. The systematic catalogue of rank-one update and downdate algorithms for the Cholesky, QR, and LU factorizations, the Givens and hyperbolic-rotation realizations, and the operation counts; the foundational paper that organized the subject.
Saunders, M. A. — Large-scale linear programming using the Cholesky factorization · Stanford report STAN-CS-72-252 (1972), and the BFGS quasi-Newton literature (Broyden 1970, Fletcher 1970, Goldfarb 1970, Shanno 1970): the two-rank update of an approximate Hessian or its Cholesky factor that maintains positive-definiteness across optimization iterations, the canonical optimization application of factorization updating.

Estimated time

beginner: 20m
intermediate: 45m
master: 90m