Sylvester and Lyapunov matrix equations: the Bartels-Stewart algorithm
Anchor (Master): Golub-Van Loan *Matrix Computations* (4th ed.) §7.6; Bartels-Stewart *Solution of the Matrix Equation AX+XB=C* (Comm. ACM 15, 1972); Hammarling *Numerical Solution of the Stable Non-negative Definite Lyapunov Equation* (IMA J. Numer. Anal. 2, 1982); Lancaster-Tismenetsky *The Theory of Matrices* (2nd ed., Academic Press, 1985) Ch. 12 (matrix equations); Higham *Functions of Matrices* (SIAM, 2008) §B.16 (the Sylvester operator and sep)
Intuition Beginner
You know how to solve a single linear equation like : divide and read off . There is a version of this where the unknown is a whole matrix , not a single number. The Sylvester equation asks you to find a matrix that satisfies , where , , and are given matrices. It looks exotic, but it is still a linear problem in the entries of , so there is exactly one answer in the good case, and the task is to compute it without doing anything reckless.
Why care? This equation is the workhorse of control engineering. When you ask whether a designed system settles down to rest rather than blowing up, the answer comes from a close relative called the Lyapunov equation, where is a mirror image of . Solving it tells you, with certainty, whether every starting state decays over time. Aircraft autopilots, robot balancers, and power-grid regulators all lean on this test.
The naive way to solve is to flatten into one long column of unknowns and solve a giant ordinary linear system. That works, but it is wasteful and clumsy for sizable matrices. The clever way, due to Bartels and Stewart, first reshapes and into a staircase (triangular) form, which makes the flattened system fall apart into tiny pieces you can solve one after another. Then it reshapes back. The whole trick is to make the matrix structure do the work.
Visual Beginner
The picture shows the Bartels-Stewart idea. You cannot read off the answer to when and are full, tangled matrices. So you first rotate and into triangular staircases, which carries along to a new right-hand side. In the staircase form the equation splits column by column: the first column of depends only on itself, the next depends on the first, and so on, so you sweep across solving small pieces.
FULL (hard) TRIANGULAR (easy, solve in order) UNDO
A X + X B = C A' Y + Y B' = C' X = Q Y Z^T
A,B full A' , B' staircases
rotate back
x x x x x x a * * b * * col 1: 1 small solve
x x x x x x ===> . a * . b * col 2: uses col 1
x x x x x x . . a . . b col 3: uses cols 1,2
(tangled) (read off, one block at a time)Two things are worth seeing. First, putting and into staircase form is the same Schur reduction used to find eigenvalues, so the diagonal entries you end up dividing by are exactly the eigenvalues of and . Second, the answer exists and is unique exactly when no eigenvalue of cancels an eigenvalue of with the opposite sign, because that cancellation is the only thing that can make a divisor land on zero.
Worked example Beginner
Solve the small Sylvester equation with
Both and are already diagonal, so this is the staircase case and we can read the answer off entry by entry. Write with unknown entries .
Step 1. Find the equation for one entry. The entry of is because is diagonal, and the entry of is because is diagonal. So each entry obeys , that is .
Step 2. Solve each entry. Divide by the sum :
Step 3. Assemble the answer: . You can check directly that rebuilds .
What this tells us: in the diagonal case every entry of is just the matching entry of divided by a sum of one eigenvalue from and one from . The solution exists because none of those four sums is zero. If one were zero, that entry could not be solved, and the equation would have no unique answer.
Check your understanding Beginner
Formal definition Intermediate+
Let , , and . The Sylvester equation is the linear matrix equation
for the unknown . The left-hand side defines the Sylvester operator , , a linear map on the matrix space; the equation reads .
Definition (vec and Kronecker form). The vec operator stacks the columns of a matrix into one tall vector: lists the first column of , then the second, and so on. The Kronecker product is the block matrix whose block is . With the standard identity , applied to and , the Sylvester equation becomes the ordinary linear system
whose coefficient matrix is the Kronecker matrix of the Sylvester operator. Here realises and realises after column stacking.
Definition (the Lyapunov equation). Taking , , and with gives the Lyapunov equation
the symmetric specialisation of the Sylvester equation; when is symmetric the solution is symmetric. A matrix is Hurwitz (stable) when every eigenvalue has strictly negative real part, the condition under which the linear system of 02.06.03 decays to the origin.
Definition (separation). The separation of and is
the smallest singular value of the Sylvester operator's Kronecker matrix. It is positive exactly when is invertible, and its reciprocal controls how rounding errors and perturbations in , , propagate into .
Notation: is the Sylvester operator; is the column-stacking map; is the Kronecker product; is the spectrum; is the separation; is the smallest singular value; is the Frobenius norm. The spectrum of , computed in the Key theorem, is .
Counterexamples to common slips
Solvability is not about and sharing eigenvalues; it is about meeting . For A = \begin{psmallmatrix} 1 & 0 \\ 0 & 2 \end{psmallmatrix} and B = \begin{psmallmatrix} -1 & 0 \\ 0 & 5 \end{psmallmatrix}, the matrices share no eigenvalue, yet and give , so the Sylvester operator is singular and the equation may have no solution or infinitely many.
The Sylvester operator's spectrum is the set of sums , not products or ratios. A reader who imports the generalised-eigenvalue ratio of
43.06.10here will mis-locate the singular cases; the additive structure comes from being additive in its two terms.A symmetric right-hand side does not by itself force a positive-definite Lyapunov solution. With A = \begin{psmallmatrix} 1 & 0 \\ 0 & 1 \end{psmallmatrix} (not Hurwitz) and , the equation gives , so is negative-definite; definiteness of tracks the stability of , the content of the Lyapunov theorem of
02.12.08, not merely the sign of .
Key theorem with proof Intermediate+
Theorem (solvability via the Bartels-Stewart reduction; Bartels-Stewart 1972 [source pending]; Golub-Van Loan §7.6 [source pending]). The Sylvester equation has a unique solution for every if and only if . When this holds, the solution is computed by the Bartels-Stewart algorithm: reduce to real Schur form and to real Schur form by the QR algorithm of 43.06.03, with lower quasi-triangular and upper quasi-triangular; transform the right-hand side to ; solve the quasi-triangular equation for by back-substitution over and diagonal blocks; and recover . The method costs and is backward stable.
Proof. The eigenvalues of the Sylvester operator are the sums over , ; this spectral fact, proven in the Full proof set, makes invertible exactly when no sum vanishes, that is when no equals , that is when . Invertibility of is precisely the existence of a unique solution for every , establishing the equivalence.
For the algorithm, substitute the Schur factorisations. Replacing , and left-right multiplying by the orthogonal factors:
Multiply on the left by and on the right by , and insert and to expose :
This transformed equation has the same operator spectrum because , are orthogonally similar to , and so share their eigenvalues; the reduction is therefore lossless and preserves solvability. Now read the transformed equation column by column. Let and be the -th columns of and , and let be upper quasi-triangular. The -th column of is , so the -th column of the equation is
The right-hand side uses only earlier columns , so the columns are found in order . Each step solves a quasi-triangular system with coefficient matrix , itself solved by back-substitution up the rows of , the diagonal blocks of (and of , when a block couples two columns) handled as small dense solves. Each such small solve is non-singular because its eigenvalues are under the spectral hypothesis. Reversing the transformation, , returns the solution to the original equation. The Schur reductions cost and , the transformations of and back are , and the back-substitution is ; backward stability follows because every transformation is orthogonal and the small block solves are backward stable.
Bridge. This solvability-plus-algorithm result builds toward the Lyapunov and control-theory applications of the Advanced results, where the same Schur back-substitution computes the stability Gramians of a linear system. The Bartels-Stewart reduction is exactly the same orthogonal Schur machinery that the QR algorithm of 43.06.03 produces, repurposed so that a matrix-equation solve inherits its backward stability; this is the foundational reason the method never forms the giant Kronecker system in full but instead lets the triangular structure split it into small column solves. The operator-spectrum criterion appears again in the matrix-function unit 43.06.11, where the block Parlett recurrence is a Sylvester solve across separated eigenvalue clusters, and this is exactly the well-posedness proposition proven there, which the present unit cross-references rather than reproving. The central insight is that an equation linear in a matrix unknown is an ordinary linear system in disguise, and putting these together, the bridge is the recognition that triangularising the coefficient matrices triangularises that disguised system, turning an dense solve into an structured one.
Exercises Intermediate+
Advanced results Master
Theorem (conditioning via separation; Golub-Van Loan §7.6 [source pending]). Let be invertible, with . If solves the perturbed equation with the perturbations small relative to , then
Thus is the condition number of the Sylvester solve, and the separation can be far smaller than the smallest gap between operator eigenvalues when or is non-normal.
The separation, not the bare eigenvalue gap, is the right conditioning measure because is the smallest singular value of the Kronecker matrix , whereas the eigenvalue gap is its smallest eigenvalue modulus; for a non-normal operator these diverge, and a Sylvester equation with well-separated operator eigenvalues can still be severely ill-conditioned. Estimating without forming the Kronecker matrix is done by a power-iteration on , each application a Bartels-Stewart solve on the Schur factors already in hand, so the condition estimate reuses the same triangular machinery at the cost of a few extra solves.
Theorem (Hurwitz stability and the Lyapunov equation; Golub-Van Loan §7.6.3 [source pending]; Lancaster-Tismenetsky Ch. 12 [source pending]). For the following are equivalent: (i) is Hurwitz; (ii) for some symmetric positive-definite the Lyapunov equation has a symmetric positive-definite solution ; (iii) for every symmetric positive-definite that equation has a unique symmetric positive-definite solution, given by the convergent Gramian . The solution is the observability Gramian when , and the controllability Gramian solves the dual equation .
This is the algebraic face of the Lyapunov direct method of 02.12.08: a positive-definite solving the Lyapunov equation is exactly a quadratic Lyapunov function whose derivative along trajectories of is , certifying decay to the origin. The equivalence converts the analytic stability question — do all trajectories decay? — into a single matrix-equation solve, and the Bartels-Stewart algorithm makes that solve a backward-stable computation; the eigenvalue-free certificate this provides is the basis of how control software verifies stability without ever computing the spectrum of . The controllability and observability Gramians it produces feed balanced truncation, the standard model-reduction method, where the Hankel singular values are the square roots of the eigenvalues of the product of the two Gramians.
Theorem (Hammarling's Cholesky-factor variant; Hammarling 1982 [source pending]). Let be Hurwitz and let the right-hand side be in factored form. Then the unique positive-definite solution of admits a Cholesky factorisation with upper-triangular, and can be computed directly from the real Schur form of and the factor , column block by column block, without ever forming or .
Computing rather than halves the storage of the symmetric solution and, more importantly, guarantees the computed solution is positive-semidefinite to working precision — a property that forming explicitly and then refactoring can lose to rounding. The Hammarling recursion threads the Cholesky factorisation through the same Schur back-substitution as Bartels-Stewart: at each diagonal block it solves a small triangular Lyapunov equation for a block of , and the off-diagonal blocks fall out of a Sylvester solve against the already-computed blocks. This factored form is the numerically preferred route in model-reduction codes, where the Gramian factors, not the Gramians, are what balanced truncation consumes.
Synthesis. The Sylvester equation is a linear system wearing matrix clothing, and the equivalence of its three faces — the matrix equation , the operator equation , and the Kronecker system — is the central insight that organises every algorithm and every conditioning bound for it. This is exactly why the operator spectrum is the additive set and why solvability is the separation condition , the well-posedness proposition that 43.06.11 proves and the present unit cross-references; the Bartels-Stewart algorithm generalises the scalar divide-and-read-off recipe by triangularising and through the Schur reduction of 43.06.03, so the disguised Kronecker matrix triangularises with them and the dense solve collapses into structured column solves of cost total. The Lyapunov specialisation is dual to the Sylvester general case: setting and turns the solve into a stability certificate, and the Gramian integral ties the algebraic equation back to the matrix exponential of 02.06.03 and the analytic Lyapunov direct method of 02.12.08, so the computed solution is simultaneously a number a computer returns and a quadratic Lyapunov function a theorem guarantees.
Putting these together, the bridge from the static spectral theory of and to a production solver is the recognition that the separation — the smallest singular value of the operator, not the smallest gap of its spectrum — is what conditions the answer, the foundational reason a problem with well-separated operator eigenvalues can still be ill-conditioned. Hammarling's factored variant preserves the definiteness the stability application demands, and the entire computation rides on the orthogonal Schur form, never on the eigenvector basis that non-normality would corrupt.
Full proof set Master
Proposition (spectrum of the Sylvester operator). The eigenvalues of on are exactly the sums with and , counted with the products of the corresponding generalised-eigenspace dimensions. Hence is invertible iff .
Proof. Pass to the Kronecker matrix , which represents in the basis ordered by , so . Triangularise both coefficient matrices by Schur: with upper-triangular carrying on its diagonal, and with upper-triangular carrying on its diagonal. Conjugate by the unitary and apply the mixed-product law : $$ (W \otimes U)^{\mathsf H} K (W \otimes U) = I_n \otimes R_A + R_B \otimes I_m, $$ since and similarly the second term gives . The matrix is upper-triangular: is block-diagonal with the upper-triangular block repeated, and is upper-block-triangular with diagonal blocks the scalar multiples . Its diagonal entries are therefore the sums of the -th diagonal of and the -th diagonal of , namely ranging over all . These are the eigenvalues of , hence of . Invertibility fails exactly when some , that is , that is .
Proposition (Bartels-Stewart column recurrence is well-posed). Under , the quasi-triangular equation (with , in real Schur form) is solved column-block by column-block, and each block solve has a non-singular coefficient matrix.
Proof. Order the columns . Expanding the -th column of with upper quasi-triangular gives , so the -th column of reads , that is , whose right-hand side involves only previously computed columns. The coefficient matrix has eigenvalues where and is a diagonal entry of , hence an eigenvalue . Under the hypothesis no vanishes, so is non-singular and is determined. When a block of couples columns , the pair is solved jointly as a system whose coefficient matrix is with the block; its eigenvalues are for the two (complex-conjugate) eigenvalues of the block, again non-zero, so the joint solve is non-singular.
Proposition (uniqueness and symmetry of the Lyapunov solution). For Hurwitz and symmetric , the Lyapunov equation has a unique solution, and that solution is symmetric.
Proof. Uniqueness is the Sylvester criterion with : the operator has spectrum where the first summand ranges over . (Writing the eigenvalues of as , a sum would force ; but Hurwitz makes every real part strictly negative, so every such sum has strictly negative real part and cannot vanish.) Thus the operator is invertible and the solution is unique. For symmetry, transpose the equation: , and , so solves the same Lyapunov equation; uniqueness forces .
Proposition (Gramian formula and positive-definiteness). For Hurwitz and symmetric positive-definite , converges, solves , and is symmetric positive-definite.
Proof. Convergence: since is Hurwitz, for some , so the integrand is bounded in norm by and the integral converges absolutely. That solves the equation follows by integrating the derivative identity from to : the left side telescopes to , the right to . Symmetry: each integrand satisfies because , so . Positive-definiteness: for , ; the integrand is and strictly positive near since and is invertible for all , so the integral over the positive-measure neighbourhood of is strictly positive. Hence .
Connections Master
The Bartels-Stewart algorithm is built directly on the real Schur form produced by the QR algorithm of 43.06.03: the entire method is a Schur reduction of and followed by quasi-triangular back-substitution, so the backward stability and the cost are inherited from the QR reduction, and the diagonal blocks that the real Schur form carries for complex-conjugate eigenvalue pairs are exactly the blocks the column recurrence handles as small dense solves rather than scalar divisions.
The well-posedness of the Sylvester equation is the operator-spectrum proposition that already appears in the matrix-function unit 43.06.11, where the block Parlett recurrence is a Sylvester solve across well-separated eigenvalue clusters; that unit owns and proves the solvability proposition for the block-triangular subroutine, and the present unit specialises it to the general dense and supplies the algorithm for two arbitrary matrices, so the two units are the well-posedness theorem and its general solver.
The Lyapunov equation is the algebraic counterpart of the analytic Lyapunov direct method of 02.12.08: a positive-definite solution of is precisely a quadratic Lyapunov function certifying asymptotic stability of , and the Gramian formula expresses that solution through the matrix exponential of 02.06.03, tying the matrix-equation solve to the flow whose decay it certifies.
The eigenvalue-and-spectrum theory of 01.01.08 underlies the solvability criterion: the Sylvester operator's eigenvalues are the pairwise sums of the eigenvalues of and , so the characteristic-polynomial and spectrum machinery of that foundational unit is what the separation condition rests on, and the degenerate case where a sum vanishes is the matrix-equation analogue of a vanishing characteristic value.
The separation and the generalised-eigenvalue conditioning of 43.06.10 are siblings in invariant-subspace sensitivity: the same separation that conditions the Sylvester solve governs the sensitivity of the invariant subspaces spanned by clustered eigenvalues, so a small signals both an ill-conditioned matrix-equation solve and a poorly determined invariant subspace, linking matrix equations to the perturbation theory of the eigenproblem.
Historical & philosophical context Master
The equation carries the name of James Joseph Sylvester, who studied the operator and its spectrum in the nineteenth-century theory of bilinear forms and showed that its eigenvalues are the pairwise sums of those of and , the result that fixes when the equation is uniquely solvable [Sylvester 1884]. The symmetric specialisation is named for Aleksandr Lyapunov, whose 1892 dissertation on the stability of motion introduced the direct method in which a positive-definite quadratic form decreasing along trajectories certifies stability, the analytic source of the matrix-equation stability criterion [Lyapunov 1892].
The numerical breakthrough was the 1972 algorithm of Richard Bartels and G. W. Stewart, which recognised that reducing and to Schur form by the then-recent QR algorithm turns the dense Sylvester equation into a quasi-triangular system solvable by back-substitution, giving the first backward-stable dense solver and displacing the Kronecker approach [Bartels 1972]. Sven Hammarling's 1982 variant computed the Cholesky factor of the Lyapunov solution directly from the Schur form, preserving positive-semidefiniteness to working precision and supplying the factored Gramians that balanced-truncation model reduction consumes [Hammarling 1982]. The Bartels-Stewart method, with its Hessenberg-Schur refinement due to Golub, Nash, and Van Loan, remains the dense solver behind the Sylvester and Lyapunov routines of LAPACK and SLICOT.
Bibliography Master
@article{BartelsStewart1972,
author = {Bartels, Richard H. and Stewart, G. W.},
title = {Solution of the Matrix Equation {$AX+XB=C$}},
journal = {Communications of the ACM},
volume = {15},
number = {9},
year = {1972},
pages = {820--826}
}
@article{Hammarling1982,
author = {Hammarling, Sven J.},
title = {Numerical Solution of the Stable Non-negative Definite Lyapunov Equation},
journal = {IMA Journal of Numerical Analysis},
volume = {2},
number = {3},
year = {1982},
pages = {303--323}
}
@article{GolubNashVanLoan1979,
author = {Golub, Gene H. and Nash, Stephen and Van Loan, Charles F.},
title = {A {H}essenberg-{S}chur Method for the Problem {$AX+XB=C$}},
journal = {IEEE Transactions on Automatic Control},
volume = {24},
number = {6},
year = {1979},
pages = {909--913}
}
@article{Sylvester1884,
author = {Sylvester, James Joseph},
title = {Sur l'\'equation en matrices $px = xq$},
journal = {Comptes Rendus de l'Acad\'emie des Sciences},
volume = {99},
year = {1884},
pages = {67--71, 115--116}
}
@book{Lyapunov1892,
author = {Lyapunov, Aleksandr M.},
title = {The General Problem of the Stability of Motion},
publisher = {Kharkov Mathematical Society (transl. Taylor \& Francis, 1992)},
year = {1892}
}
@book{LancasterTismenetsky1985,
author = {Lancaster, Peter and Tismenetsky, Miron},
title = {The Theory of Matrices},
edition = {2nd},
publisher = {Academic Press},
address = {Orlando},
year = {1985}
}
@book{HornJohnson1991,
author = {Horn, Roger A. and Johnson, Charles R.},
title = {Topics in Matrix Analysis},
publisher = {Cambridge University Press},
address = {Cambridge},
year = {1991}
}
@book{GolubVanLoan2013,
author = {Golub, Gene H. and Van Loan, Charles F.},
title = {Matrix Computations},
edition = {4th},
publisher = {Johns Hopkins University Press},
address = {Baltimore},
year = {2013}
}