Computing matrix functions: the matrix exponential
Anchor (Master): Higham *Functions of Matrices: Theory and Computation* (SIAM, 2008) Ch. 1, Ch. 4 (Schur-Parlett), Ch. 10 (the exponential); Moler-Van Loan *Nineteen Dubious Ways…, Twenty-Five Years Later* (SIAM Review 45, 2003); Golub-Van Loan *Matrix Computations* (4th ed.) §9.3; Higham *The Scaling and Squaring Method for the Matrix Exponential Revisited* (SIAM J. Matrix Anal. Appl. 26, 2005)
Intuition Beginner
You already know how to plug a number into a function: square it, take its exponential, take its square root. There is a way to do the same thing to a whole matrix. Feeding a matrix into the exponential produces another matrix, written , and this single object packages the entire future of a system of linear differential equations: start at a state, multiply by , and you have the state at time . So computing accurately matters a great deal in practice.
The honest surprise is that the obvious recipes for are dangerous. One obvious idea is to copy the scalar series and add up matrix powers instead. For many matrices this works, but for others the early terms grow enormous before they cancel down to a small answer, and the computer loses almost all its accuracy in the subtraction. A second obvious idea is to find the eigenvalues, exponentiate those, and rebuild the matrix. That falls apart when the matrix is close to having a repeated eigenvalue, because the rebuilding step divides by the tiny gap between eigenvalues.
The method that actually works is gentler. Shrink the matrix by repeatedly halving it until it is small and safe, compute the exponential of the small matrix with a reliable approximation, then undo the halving by squaring the result back up. Halving and squaring are both stable, so the danger is sidestepped. This shrink-then-square recipe is what real software runs when you ask for a matrix exponential.
Visual Beginner
The picture contrasts the reckless route with the safe route. The reckless route feeds the full matrix straight into a long power series, where intermediate numbers balloon to enormous size, the "hump", before collapsing back; the computer cannot track that swing and the answer is corrupted. The safe route first halves the matrix several times so the numbers stay small, exponentiates the small matrix cleanly, then squares the answer back up the same number of times.
RECKLESS SAFE (scaling and squaring)
A -> 1 + A + A^2/2 + ... A -> A/2 -> A/4 -> ... -> A/2^s (small, safe)
| |
terms balloon (the hump), E = exp(A/2^s) (clean)
then cancel; accuracy lost |
| square s times: E -> E^2 -> E^4 -> ...
e^A (corrupted) |
e^A = E^(2^s) (accurate)Two things are worth seeing. First, halving the matrix divides every entry by two, so after a handful of halvings the matrix is small enough that even a short approximation nails its exponential. Second, squaring is the exact inverse of halving for the exponential, because the exponential of twice a matrix is the square of the exponential, so undoing the shrink costs nothing in principle and very little in practice.
Worked example Beginner
Take the matrix
This matrix has a repeated eigenvalue , the kind of case that breaks the eigenvalue recipe. We compute directly and see what happens.
Step 1. Compute the powers. Multiply by itself: . Every higher power is the zero matrix too, so the series stops after two terms.
Step 2. Add the surviving terms. The exponential series is the identity plus plus plus higher terms, and only the first two survive:
Step 3. See why eigenvalues alone fail. Both eigenvalues are , so a naive "exponentiate the eigenvalues" recipe would give for both and reconstruct the identity matrix , missing the off-diagonal entirely. The correct answer keeps that , which records the coupling between the two coordinates.
What this tells us: the exponential of a matrix is not just the exponential of its eigenvalues sprinkled on the diagonal. The off-diagonal structure matters, and a method that throws it away gives the wrong answer. A trustworthy algorithm must respect the whole matrix, not only its spectrum.
Check your understanding Beginner
Formal definition Intermediate+
Let and let be a function analytic on an open set containing the spectrum . There are three standard definitions of the matrix function , and they agree.
Definition (Jordan / interpolating-polynomial form). Write the Jordan canonical form with Jordan blocks of size , in the sense of 01.01.11. Then , where acts block by block by the finite Taylor expansion
using the values of and its derivatives at up to order . Equivalently, for the unique Hermite interpolating polynomial that matches and its derivatives at each eigenvalue to the order of the largest Jordan block there. This makes depend only on the values of on together with finitely many derivatives.
Definition (Cauchy integral form). If is analytic inside and on a contour enclosing ,
the contour integral of the resolvent weighted by . The three definitions coincide for every analytic on a neighbourhood of the spectrum.
Definition (the matrix exponential). For , the matrix exponential is the everywhere-convergent series , the special case whose theory as the flow of the linear system is developed in 02.06.03. The present unit treats its numerical computation.
Definition (scaling and squaring). Fix an integer and a rational diagonal Padé approximant to of degree in numerator and denominator. The scaling-and-squaring approximation is
exact in the relation , with chosen so that lies in the region where is accurate.
Definition (Schur-Parlett). Reduce to upper-triangular Schur form via 43.06.03. Compute by setting the diagonal and filling the strict upper triangle by the Parlett recurrence
then form . The recurrence divides by and so requires a block reordering when eigenvalues cluster.
Notation: is the spectrum; is the nilpotent part of a Jordan block; is the resolvent; is the degree- diagonal Padé approximant; is a submultiplicative matrix norm; , introduced in the Advanced results, is the Fréchet derivative of at . The condition number measures the sensitivity of .
Counterexamples to common slips
- The exponential of a sum is not the product of exponentials unless the matrices commute: in general. For A = \begin{psmallmatrix} 0 & 1 \\ 0 & 0 \end{psmallmatrix} and B = \begin{psmallmatrix} 0 & 0 \\ 1 & 0 \end{psmallmatrix} the two sides differ, because . Algorithms must not assume the scalar splitting.
- A small norm of does not bound the intermediate terms when is non-normal. The truncated Taylor series for can swell to a "hump" of size far exceeding before settling, so a series truncated where looks small is still inaccurate; the hump height is governed by the departure from normality, not by alone.
- Diagonalising and exponentialising the eigenvalues, , is unstable when the eigenvector matrix is ill-conditioned: the error carries the factor , which blows up as approaches a defective (repeated-eigenvalue) matrix, exactly the regime where the Jordan structure of
01.01.11is nontrivial in size.
Key theorem with proof Intermediate+
Theorem (equivalence of the matrix-function definitions; Higham Ch. 1 [source pending]; Golub-Van Loan §9.3 [source pending]). Let and let be analytic on an open set containing . The Jordan/interpolating-polynomial definition and the Cauchy-integral definition of produce the same matrix, and that matrix equals for the Hermite interpolating polynomial matching on the spectrum to the Jordan-block orders. In particular commutes with and with every matrix commuting with , and if then .
Proof. Reduce to a single Jordan block, since gives and both definitions respect the conjugation . For a block of size with , expand the resolvent in a finite geometric series:
valid because is nilpotent of index , so the series terminates and converges for . Substitute into the Cauchy integral and integrate term by term over a contour enclosing :
By the Cauchy differentiation formula, the -th scalar integral equals . Hence the Cauchy definition yields , which is exactly in the Jordan form. Summing over blocks gives , and conjugating gives , identical to the Jordan definition. The interpolating-polynomial form follows because a polynomial with for produces the same block value ; the Hermite conditions at each eigenvalue determine on and so determine . Commutativity with holds because is a polynomial in .
Bridge. This equivalence builds toward the algorithms of the Advanced results, where each definition becomes a computational route: the interpolating-polynomial form is the foundational reason the Schur-Parlett method works on the triangular factor, the Cauchy-integral form is the basis of contour-integral quadrature methods, and the series form specialised to is what scaling-and-squaring tames. The conditioning of appears again in 43.06.04 through the eigenvector factor that wrecks the diagonalisation route, and this is exactly why the stable methods work through the orthogonal Schur form of 43.06.03 rather than the eigenvector basis. The central insight is that depends only on on the spectrum and finitely many derivatives, so computing it is a Hermite-interpolation problem in disguise; putting these together, the numerical task is not to evaluate an infinite series but to interpolate at the eigenvalues without ever dividing by the gaps between close eigenvalues, and the bridge is the recognition that scaling-and-squaring and Schur-Parlett are two disciplined ways to do that interpolation stably where the Jordan form of 01.01.11 is numerically inaccessible.
Exercises Intermediate+
Advanced results Master
Theorem (backward-error analysis of scaling and squaring; Higham 2005 [source pending]; Higham Ch. 10 [source pending]). Let be the degree- diagonal Padé approximant to , and let be chosen so that for the threshold tabulated by the backward-error bound. Then the computed satisfies exactly, with
where is the unit roundoff, provided is the largest value for which the truncation function has . The pair is the standard choice underlying production codes.
The backward-error formulation is decisive: rather than bound the forward error , which entangles the conditioning of the exponential, one shows the computed matrix is the exact exponential of a nearby matrix , and the relative perturbation is at the level of the rounding unit. The function collects the Padé truncation error as a power series with known coefficients ; bounding in terms of converts the scalar Padé error into a backward matrix perturbation, because where is applied as a matrix function of . Squaring times multiplies the argument back to , and the per-step backward error does not amplify under squaring because each squaring is the exact exponential map on the perturbed argument. Higham's analysis selects to minimise cost — squarings plus Padé matrix multiplications — subject to the backward-error bound, and identifies overscaling: choosing too large (a larger threshold would have sufficed) injects extra rounding through the additional squarings of a matrix whose off-diagonal hump is amplified, degrading accuracy for non-normal . The remedy is to base on quantities sharper than , such as for small , which detect the true growth rate.
Theorem (Schur-Parlett with reordering and blocking; Higham Ch. 4 [source pending]; Parlett 1976 [source pending]). Let $A = QTQ^{}TT = (T_{ij})f(T_{ii})\bar\lambda_i$ — and fill the off-diagonal blocks by the block Parlett recurrence*
a Sylvester equation solvable because and have disjoint spectra. Then $f(A) = Q F Q^{}$.*
The block recurrence replaces the scalar division by — catastrophic when two eigenvalues nearly coincide — by the solution of a Sylvester equation , which is well-conditioned precisely when the separation between the two blocks' spectra is bounded away from zero, the condition the clustering enforces. Within a block, where eigenvalues are deliberately close, the diagonal-block evaluation uses a method insensitive to small eigenvalue gaps, such as a truncated Taylor series of about the cluster mean applied to , whose spectrum is tiny. The cost of choosing the clustering — a parameter trading off block size against separation — is the central tuning of the algorithm, and the resulting method computes for a general analytic to backward stability comparable to the Schur reduction itself.
Theorem (conditioning of via the Fréchet derivative; Higham Ch. 3 [source pending]). The map has Fréchet derivative , the unique linear operator with . Its norm gives the relative condition number , and is computable as the block of applied to the block matrix \begin{psmallmatrix} A & E \\ 0 & A \end{psmallmatrix}.
The block-matrix formula f\!\begin{psmallmatrix} A & E \\ 0 & A \end{psmallmatrix} = \begin{psmallmatrix} f(A) & L_f(A,E) \\ 0 & f(A) \end{psmallmatrix} reduces the Fréchet derivative to one matrix-function evaluation at double size, so condition estimation reuses the same algorithm; the identity follows from the interpolating-polynomial definition applied to the block-triangular argument, whose off-diagonal block accumulates exactly the derivative terms. The condition number can be far larger than when is non-normal, which is the quantitative form of the hump: the same departure from normality that makes intermediate Taylor terms swell makes sensitive to perturbations of .
Synthesis. Computing a matrix function is, foundationally, a Hermite-interpolation problem dressed as analysis, and the equivalence of the three definitions is the central insight that organises every algorithm: depends only on and finitely many derivatives on the spectrum, so the task is to evaluate that interpolation without ever dividing by the gaps between close eigenvalues. This is exactly why the naive routes fail and the disciplined ones succeed — the eigendecomposition route divides by the eigenvector conditioning of 43.06.04 and diverges toward the defective matrices where the Jordan structure of 01.01.11 grows, while the truncated series swells over the non-normal hump whose height the Fréchet-derivative condition number measures. The stable methods route around both: scaling-and-squaring generalises the scalar identity into a backward-stable algorithm by keeping the scaled argument small enough that a single Padé approximant carries a backward error at the rounding unit, and Schur-Parlett works on the orthogonal Schur form of 43.06.03, replacing the scalar Parlett division by a Sylvester equation whose conditioning the clustering controls. Putting these together, the bridge from the static theory of as the flow of 02.06.03 to a production algorithm is the recognition that the exponential of a matrix is not the exponential of its spectrum but a spectral interpolation computed through orthogonal reductions and bounded scalings, never through the ill-conditioned eigenvector basis; this is dual to the QR-algorithm story of 43.06.03, where the same orthogonal Schur form converts a static existence theorem into a backward-stable computation.
Full proof set Master
Proposition (the resolvent expansion on a Jordan block). For a Jordan block of size with , the resolvent is for .
Proof. Write . Since is nilpotent of index , the Neumann series for terminates: , because multiplying this finite sum by telescopes to . Dividing by the scalar gives .
Proposition (scaling-and-squaring identity, exact). For every and integer , .
Proof. The matrices and commute, so , using the commuting-addition law for the exponential, itself a consequence of the absolute convergence of the defining series and the binomial rearrangement valid for commuting matrices. Induct: assume . Then . At this is .
Proposition (Padé approximant as an exact exponential of a perturbed argument). Let be the diagonal Padé approximant to and let have spectral radius small enough that is defined by its convergent series. Then where , and consequently .
Proof. The scalar function satisfies and, because matches the Taylor series of through degree , , so is analytic near with series . As matrix functions of the single matrix , all of , , and are polynomials in and hence commute, so , the last step by the commuting-addition law since and commute. Put : . Raising to the power and using the exact scaling identity on the perturbed argument, .
Proposition (block Parlett recurrence is a Sylvester equation). For block upper-triangular and , the off-diagonal blocks satisfy , uniquely solvable for when .
Proof. is a polynomial in , so . Equate the block of each side using block upper-triangularity, . Separate and : . Rearranging the two terms to the left gives the stated Sylvester equation . A Sylvester equation has a unique solution iff and share no eigenvalue, since the associated operator has spectrum , which excludes exactly when the spectra are disjoint. The clustering makes and disjoint, so is determined.
Proposition (Fréchet derivative via the block-triangular argument). For analytic , f\!\begin{psmallmatrix} A & E \\ 0 & A \end{psmallmatrix} = \begin{psmallmatrix} f(A) & L_f(A, E) \\ 0 & f(A) \end{psmallmatrix}, where is the Fréchet derivative of at in direction .
Proof. For a monomial , expand \begin{psmallmatrix} A & E \\ 0 & A \end{psmallmatrix}^k. By induction the and blocks are and the block is , the Fréchet derivative of at in direction : the base case is immediate, and multiplying the -th power by the block matrix produces block , the -term sum. By linearity the identity holds for polynomials, and the off-diagonal block is the polynomial's Fréchet derivative. For analytic , take the Hermite interpolating polynomial agreeing with on \sigma\begin{psmallmatrix} A & E \\ 0 & A\end{psmallmatrix} = \sigma(A) (with doubled multiplicities) to sufficient order; then of the block matrix equals of it, and since the Fréchet derivative depends only on on the spectrum to first order.
Connections Master
The matrix exponential computed here is the numerical realisation of the flow of the linear system defined and analysed in 02.06.03: that unit establishes as the fundamental solution and develops its theory through the Jordan form, whereas the present unit supplies the stable algorithm — scaling-and-squaring with Padé — that every ODE integrator and exponential-integrator scheme calls to advance such a system, so the two units are the theory and the computation of the same object.
The stable methods of this unit are built on the orthogonal Schur reduction produced by the QR algorithm of 43.06.03: Schur-Parlett evaluates on the triangular factor rather than on or its eigenvector basis, reusing the same backward-stable orthogonal similarity that the QR algorithm computes, and the conditioning argument that forbids the eigenvector route is the mirror image of why the QR algorithm itself avoids forming eigenvectors until the spectrum is in hand.
The instability of the eigendecomposition route, , is governed by the eigenvector conditioning that the Bauer-Fike theory of 43.06.04 identifies as the amplification factor for non-normal eigenproblems; the same that bounds eigenvalue sensitivity there bounds the error of the diagonalisation method here, and both diverge toward the defective matrices whose Jordan structure 01.01.11 describes, which is why the Jordan form is a theoretical device and never a numerical one.
The Schur-Parlett block recurrence reduces to a Sylvester equation , the same matrix-equation machinery whose well-posedness rests on the disjointness of two spectra and whose solution underlies the Bartels-Stewart algorithm; the separation that conditions this solve is the quantitative content of the eigenvalue-clustering parameter and connects matrix-function evaluation to the broader theory of invariant-subspace sensitivity.
The interpolating-polynomial definition that unifies the three forms of is Hermite interpolation on the spectrum, linking this unit to the polynomial-interpolation and divided-difference theory of the approximation chapter: the divided differences of at the eigenvalues are exactly the entries the Parlett recurrence computes, and the confluent (repeated-node) case is the numerical face of the Jordan-block derivatives of 01.01.11.
Historical & philosophical context Master
The cautionary framing of this subject is due to Cleve Moler and Charles Van Loan, whose 1978 SIAM Review paper Nineteen Dubious Ways to Compute the Exponential of a Matrix catalogued the candidate methods — Taylor series, Padé approximation, scaling and squaring, ordinary-differential-equation solvers, polynomial methods, matrix-decomposition methods, and splitting methods — and showed that nearly every one has a regime in which it fails, with the eigenvector and naive-series methods failing on non-normal and defective matrices and the hump phenomenon defeating norm-based truncation [Moler 1978]. Their 2003 sequel, Twenty-Five Years Later, revisited the survey and confirmed that scaling-and-squaring with a Padé approximant had become the method of choice, while emphasising that the conditioning of the problem, not merely the method, sets the achievable accuracy [Moler 2003].
The decisive algorithmic advance was Nicholas Higham's 2005 backward-error analysis, which replaced the older heuristic choices of Padé degree and scaling power by a rigorous bound: the computed result is the exact exponential of with , and the optimal minimising cost subject to that bound is determined by the truncation-error function ; this analysis, identifying and correcting the overscaling that had degraded earlier implementations, is the basis of the modern expm [Higham 2005]. The Parlett recurrence for functions of triangular matrices dates to Beresford Parlett's 1976 note, which gave the entrywise recurrence and recognised its breakdown for close eigenvalues [Parlett 1976]; the blocked, reordered Schur-Parlett algorithm that resolves that breakdown by solving Sylvester equations across well-separated clusters was developed by Davies and Higham and codified in Higham's 2008 monograph Functions of Matrices [Higham 2008].
Bibliography Master
@article{MolerVanLoan1978,
author = {Moler, Cleve B. and Van Loan, Charles F.},
title = {Nineteen Dubious Ways to Compute the Exponential of a Matrix},
journal = {SIAM Review},
volume = {20},
number = {4},
year = {1978},
pages = {801--836}
}
@article{MolerVanLoan2003,
author = {Moler, Cleve B. and Van Loan, Charles F.},
title = {Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later},
journal = {SIAM Review},
volume = {45},
number = {1},
year = {2003},
pages = {3--49}
}
@article{Higham2005,
author = {Higham, Nicholas J.},
title = {The Scaling and Squaring Method for the Matrix Exponential Revisited},
journal = {SIAM Journal on Matrix Analysis and Applications},
volume = {26},
number = {4},
year = {2005},
pages = {1179--1193}
}
@book{Higham2008,
author = {Higham, Nicholas J.},
title = {Functions of Matrices: Theory and Computation},
publisher = {SIAM},
address = {Philadelphia},
year = {2008}
}
@article{Parlett1976,
author = {Parlett, Beresford N.},
title = {A Recurrence Among the Elements of Functions of Triangular Matrices},
journal = {Linear Algebra and its Applications},
volume = {14},
number = {2},
year = {1976},
pages = {117--121}
}
@article{DaviesHigham2003,
author = {Davies, Philip I. and Higham, Nicholas J.},
title = {A Schur-Parlett Algorithm for Computing Matrix Functions},
journal = {SIAM Journal on Matrix Analysis and Applications},
volume = {25},
number = {2},
year = {2003},
pages = {464--485}
}
@book{GolubVanLoan2013,
author = {Golub, Gene H. and Van Loan, Charles F.},
title = {Matrix Computations},
edition = {4th},
publisher = {Johns Hopkins University Press},
address = {Baltimore},
year = {2013}
}