37.05.01 · probability / 05-markov-chains

The Markov Property, Transition Matrices, and the Chapman–Kolmogorov Equations

shipped3 tiersLean: none

Anchor (Master): Norris 1997 *Markov Chains* (Cambridge) §1.1-1.4, §1.7 (strong Markov); Meyn-Tweedie 2009 *Markov Chains and Stochastic Stability* 2e Ch. 3; Levin-Peres 2017 *Markov Chains and Mixing Times* 2e Ch. 1

Intuition Beginner

A Markov chain is a model for a system that moves between a list of possible situations, called states, one step at a time, where the only thing that matters for the next move is where the system is right now. The whole history of how it got there is irrelevant. This is the memoryless property: the past affects the future only through the present.

Think of a board game where each square tells you the chances of where you land next. If you are on the "rainy" square, the rules might say you go to "rainy" again with chance seven in ten and to "sunny" with chance three in ten, and these chances are the same no matter how you arrived at "rainy" or how long you have been playing. To run the model you need two things: a starting square, given as a list of chances over all squares, and a table that lists, for every square, the chances of jumping to each other square in one step. That table is the transition matrix.

The transition matrix is just bookkeeping for one-step moves, but it secretly contains all the longer-range information too. Suppose you want the chance of going from "rainy" today to "sunny" two days from now. You list every possible weather tomorrow, multiply the chance of reaching it tomorrow by the chance of then reaching "sunny" the day after, and add up over all the in-between possibilities. This add-up-over-the-middle rule is the Chapman–Kolmogorov equation. It says the two-step table is built by chaining two one-step tables, and more generally the table for any number of steps is built by chaining the one-step table that many times.

The reason this matters is that an enormous range of real processes are memoryless once you choose the state cleverly: shuffling a deck, a random walk on a network, a queue at a counter, the spread of a rumor through groups. The single assumption "the future depends on the past only through the present" turns each of these into an object you can compute with using nothing harder than multiplying tables.

The one-sentence takeaway: a Markov chain is a memoryless step-by-step process described by a starting distribution and a one-step transition table, and the chance of any longer trip is found by the Chapman–Kolmogorov rule of summing over all the intermediate stops.

Visual Beginner

Picture two weather states drawn as two circles, "Sunny" and "Rainy," with arrows showing the one-step chances of moving between them.

Left: the chain as a labeled graph; each arrow weight is a one-step transition probability and the arrows out of each circle sum to one. Right: the Chapman–Kolmogorov rule for two steps from Sunny to Sunny, summing over the two possible middle states. The same picture, with more circles and arrows, describes any finite-state chain; the transition matrix is just the table of arrow weights.

Worked example Beginner

We compute two-step weather chances for the chain above. The one-step table reads: from Sunny, stay Sunny with chance and switch to Rainy with chance ; from Rainy, switch to Sunny with chance and stay Rainy with chance .

Step 1. Find the chance of Sunny today to Sunny in two days. There are two ways to make the trip: Sunny then Sunny then Sunny, or Sunny then Rainy then Sunny. List the middle state and multiply along each path.

Step 2. Path through Sunny: chance to stay Sunny on day one, then chance to stay Sunny on day two, giving .

Step 3. Path through Rainy: chance to switch to Rainy on day one, then chance to switch back to Sunny on day two, giving .

Step 4. Add the two paths, because they are the separate ways the same two-step trip can happen: . So the chance of Sunny-to-Sunny in two steps is .

Step 5. Check the bookkeeping. The other two-step chance from Sunny is Sunny-to-Rainy, which must be because the system has to be somewhere. Directly: , which matches.

What this tells us: the two-step chances came entirely from multiplying and adding the one-step chances, summing over the middle state. That summing-over-the-middle is exactly the Chapman–Kolmogorov rule, and it is the same operation as multiplying the transition table by itself. No new information beyond the one-step table was ever needed.

Check your understanding Beginner

Formal definition Intermediate+

Throughout, is a countable set, the state space, and time is discrete, indexed by . All random objects live on a probability space 37.01.01.

Definition (stochastic matrix). A stochastic matrix on is a family with for all and for every . Each row is thus a probability distribution on , the transition distribution out of . A distribution on is a row vector with and .

Definition (Markov chain; -chain). A sequence of -valued random variables is a time-homogeneous Markov chain with initial distribution and transition matrix — a chain — if and, for every and every with , The first equality is the (simple) Markov property: the conditional law of the next state given the whole past depends on the past only through the present state . The second is time-homogeneity: this conditional law does not depend on .

Definition (finite-dimensional law of the chain). Equivalently, is if and only if for every and every , These finite-dimensional distributions are consistent in the sense of 37.01.01, so the Kolmogorov extension theorem produces the law of on the path space with its cylinder -algebra 02.07.01; the canonical realization takes to be the coordinate maps.

Definition (-step transition matrix). The -step transition probabilities are for with , extended to all by the matrix power. The convention (the identity matrix ) and holds, and denotes the matrix of -step probabilities. Matrix products are defined by the usual rule , the sum converging because the entries are nonnegative with bounded row sums.

Definition (notation for the chain started at ). Write for the law of the chain started deterministically at (initial distribution the point mass ), and for the corresponding expectation. Then .

Counterexamples to common slips Intermediate+

  • The Markov property is not "independence of the past." The future is generally dependent on the past; the property is that this dependence is mediated entirely by the present state. Conditioning on makes the future and the past conditionally independent — not independent.

  • A function of a Markov chain need not be Markov. If is a Markov chain and is not injective, is usually not a Markov chain, because can fail to determine the transition law that determines. Lumping states preserves the Markov property only under the Kemeny–Snell strong-lumpability condition that the collapsed transition probabilities are constant on each block.

  • Time-homogeneity is an extra assumption. Dropping it gives an inhomogeneous chain with a different matrix at each step; then is an ordered product and the clean power is unavailable. The Chapman–Kolmogorov identity survives as , but the matrices no longer commute or coincide.

  • Rows sum to one, not columns. is stochastic by rows. A matrix that is stochastic by columns transports distributions the other way; a doubly stochastic matrix (both row and column sums one) is the special case for which the uniform distribution is stationary.

Key theorem with proof Intermediate+

Theorem (Chapman–Kolmogorov; -step transitions are matrix powers). Let be on a countable state space . Then for all and all , Consequently is the -th matrix power of , and the law of from initial distribution is the row vector : [Norris 1997 §1.1-1.2].

Proof. We first record the finite-dimensional law and then split the trajectory at an intermediate time.

Step 1 (finite-dimensional law). By the definition of a -chain and the multiplication rule for conditional probabilities, each conditional factor reducing to by the Markov property and time-homogeneity. (Terms with a vanishing conditioning probability contribute zero on both sides and may be dropped.)

Step 2 (-step probability as a sum over paths). Summing the Step 1 identity over the intermediate states with , fixed, and dividing by under , which is exactly the entrywise formula for the -fold matrix product, established by induction on from . All sums are of nonnegative terms, so Tonelli permits the rearrangement regardless of .

Step 3 (the index split). Fix and . Decompose the event under according to the state at the intermediate time: By the Markov property the conditional probability depends on the past only through , and by time-homogeneity . Since , we obtain Combined with Step 2 this reads , the semigroup law for the powers of .

Step 4 (the law of ). Marginalizing the Step 1 identity over and over the initial state weighted by , so the distribution propagates by right-multiplication of the row vector by .

Bridge. This theorem builds toward the entire equilibrium and mixing theory of Markov chains and appears again in every spectral computation of long-run behavior, because converts a probabilistic question about steps into a question about the -th power of one matrix. The foundational reason it is the right organizing tool is that the Markov property collapses the path-space integral over all intermediate histories into a single matrix multiplication: this is exactly the move that turns the search for a stationary distribution into the left-eigenvector equation , and that makes a convergence statement about the second eigenvalue. The semigroup law generalises to the continuous-time Chapman–Kolmogorov relation for transition semigroups, and the matrix is dual to the generator that, in the continuous-state diffusion setting, becomes a second-order differential operator 02.15.03. The central insight is that one stochastic matrix encodes every multi-step probability through its powers, and consistency of these powers is automatic once the one-step memoryless rule is imposed; putting these together, the discrete Markov chain is the simplest nontrivial object in the theory of transition semigroups, and Chapman–Kolmogorov is the algebraic shadow of memorylessness.

Exercises Intermediate+

Advanced results Master

Beyond the one-step matrix and its powers, the theory organizes around the path-space shift, the strong Markov property at stopping times, the kernel-theoretic construction that dispenses with topology, the operator-semigroup viewpoint dual to the matrix powers, and the canonical examples that instantiate each.

Theorem 1 (canonical construction and existence; Ionescu-Tulcea route). Given any distribution and stochastic matrix on countable , a chain exists. On the path space with coordinate maps and the product -algebra 02.07.01, the prescribed finite-dimensional laws are consistent, and the Ionescu-Tulcea theorem builds the unique measure realizing them with no topological hypothesis on , the kernel sequence being the constant kernel at every step. The Kolmogorov extension theorem 37.01.01 gives an alternative existence proof when is given its discrete (hence Polish) topology.

Theorem 2 (Markov property via the shift). Let be the shift , and . For every bounded measurable , every , and every starting law , where and with . This functional form contains the elementary Markov property (take ) and exhibits memorylessness as the statement that conditioning the shifted future on the whole past collapses to a function of the present coordinate.

Theorem 3 (strong Markov property). Let be an -stopping time and . Then for bounded measurable , on , The proof decomposes on , where is deterministic and the simple Markov property (Theorem 2) applies, then sums over using and the -measurability bookkeeping. The countability of the time index — takes values in — is what makes the discrete strong Markov property an immediate corollary of the simple one, in contrast with the continuous-time case where right-continuity and the Blumenthal law are required.

Theorem 4 (transition semigroup and the operator dual). The matrices form a discrete semigroup under multiplication: and (Chapman–Kolmogorov). Acting on bounded functions by and on distributions by , the matrix is simultaneously the one-step transition operator on observables and the Markov operator on measures, adjoint under the pairing . The discrete generator is ; the forward equation and backward equation are the discrete Chapman–Kolmogorov equations, the exact analogues of Kolmogorov's forward and backward differential equations for continuous-state diffusions 02.15.03.

Synthesis. The foundational reason the whole subject coheres is that the memoryless one-step rule, encoded in a single stochastic matrix , generates every multi-step law through the powers , and putting these together, Chapman–Kolmogorov is the semigroup law that makes those powers consistent. The canonical construction (Theorem 1) realizes the chain as the coordinate process on path space, where the shift turns the simple Markov property (Theorem 2) into a statement about conditioning the future on the past, and this is exactly what generalises to the strong Markov property (Theorem 3) once a deterministic time is replaced by a stopping time. The central insight is that the transition matrix is dual to two flows at once — forward on distributions, backward on observables — so the operator and the row-vector action are two faces of one semigroup, and the matrix is dual to the second-order differential generator of a diffusion 02.15.03. The strong Markov property is the bridge to the regenerative structure: excursions between visits to a fixed state are i.i.d., the foundational reason recurrence, transience, and convergence to a stationary solving submit to renewal theory, with the second eigenvalue of controlling the geometric rate of . The Chapman–Kolmogorov identity, the matrix-power formula, the shift-form Markov property, and the operator-semigroup duality are four presentations of the single fact that the future depends on the past only through the present.

Full proof set Master

Proposition 1 (powers of a stochastic matrix are stochastic). If is stochastic on then is stochastic for every .

Proof. Let denote the all-ones column vector. Stochasticity is entrywise and . Nonnegativity of is immediate since products and sums of nonnegative numbers are nonnegative. For the row sums, induct: has , and if then . For countable the matrix-vector products are sums of nonnegative terms, so associativity holds by Tonelli. Hence and , i.e. is stochastic.

Proposition 2 (finite-dimensional law characterizes the chain). A sequence is if and only if for all and all states.

Proof. () Step 1 of the Key theorem derives the product formula from the conditional definition. () Assume the product formula. Taking gives . For the conditional, when the conditioning event has positive probability, which depends on the past only through and is independent of . Thus the simple Markov property and time-homogeneity hold, so is .

Proposition 3 (Chapman–Kolmogorov, full statement). For , with .

Proof. Step 2 of the Key theorem proves by induction; Step 3 proves the index split by conditioning on the intermediate state and invoking the Markov property and time-homogeneity. Together, by associativity of matrix multiplication, the sums converging by Tonelli on nonnegative terms.

Proposition 4 (forward and backward equations). Let be the law of and, for fixed target and horizon , let for . Then (forward) and (backward), with and .

Proof. Forward: by conditioning on and the Markov property. Iterating from gives . Backward: , using Chapman–Kolmogorov with a one-step first move; the terminal condition holds.

Proposition 5 (strong Markov from simple Markov). Let be a stopping time and bounded measurable. Then on .

Proof. Let . Decompose on the value of : Because , the simple Markov property (Theorem 2) gives . On one has , so . Summing over restores : Since this holds for all and is -measurable (it is a function of , which is -measurable), the defining property of conditional expectation gives the claim.

Proposition 6 (two-state spectral formula). For , , one has .

Proof. Eigenvalues solve , factoring as , so , . The left eigenvector for normalized to a distribution is . Writing with the rank-one stationary projection and , the projections are orthogonal idempotents, so . The entry is .

Connections Master

  • The Kolmogorov extension theorem 37.01.01 supplies the existence half of the construction: the consistent finite-dimensional family is exactly the projective system the extension theorem stitches into a measure on path space, with the discrete topology on furnishing the standard-Borel hypothesis; the Markov chain is the first nontrivial process built by feeding kernel-composed marginals to that machine.

  • The -algebra and measurable-space foundations 02.07.01 define the cylinder -algebra on , the filtration , the stopping-time -algebra , and the shift map used to state the simple and strong Markov properties; every measurability claim in this unit is an instance of that framework.

  • The continuous-state diffusion and SDE generator 02.15.03 is the differential analogue of the discrete transition matrix: the discrete Chapman–Kolmogorov semigroup law becomes the transition semigroup of a diffusion, the row-vector forward equation becomes the Fokker–Planck/Kolmogorov forward equation, and the discrete generator becomes the second-order differential generator ; the contrast is the countable state space and discrete time here versus the continuum and continuous paths there.

  • The elementary rules and named distributions of probability 26.02.01 are the concrete shadow: conditional probability, the multiplication rule used in the finite-dimensional law, and the Bernoulli increments of the simple random walk are the discrete-probability primitives this unit lifts into the matrix-and-path-space formalism, with the stationary distribution generalizing the equilibrium of an elementary two-state model.

Historical & philosophical context Master

The concept originates with Andrei A. Markov, who in his 1906 paper [Markov 1906] extended the law of large numbers to sequences of dependent random variables linked by the now-eponymous chain condition, motivated in part by his analysis of the alternation of vowels and consonants in Pushkin's Eugene Onegin. Markov's chains were finite and his interest was the persistence of limit theorems under dependence, not the construction of processes; the matrix formalism and the transition-probability language were consolidated later.

The identity bearing the joint name of Chapman and Kolmogorov has a twofold origin. Sydney Chapman derived the corresponding integral relation in 1928 [Chapman 1928] while studying Brownian displacement and thermal diffusion of suspended grains, in the continuous setting of diffusion physics. Andrei N. Kolmogorov, in his 1931 Mathematische Annalen memoir on the analytic methods of probability [Kolmogorov 1931], placed the relation at the center of a general theory of Markov processes and derived from it the forward and backward differential equations for transition probabilities, founding the analytic theory of continuous-time, continuous-state Markov processes. The discrete matrix version presented here is the elementary case of Kolmogorov's semigroup relation, with matrix multiplication replacing the integral over an intermediate state.

The modern textbook synthesis on countable state spaces, with the simple and strong Markov properties stated via the shift on path space and the construction via consistent finite-dimensional distributions, follows Norris [Norris 1997] and Levin–Peres [Levin-Peres 2017]; the kernel-theoretic construction by Ionescu-Tulcea removes the topological hypothesis that the general extension theorem requires.

Bibliography Master

@book{Norris1997,
  author    = {Norris, James R.},
  title     = {Markov Chains},
  series    = {Cambridge Series in Statistical and Probabilistic Mathematics},
  publisher = {Cambridge University Press},
  year      = {1997}
}

@article{Markov1906,
  author  = {Markov, Andrei A.},
  title   = {Rasprostranenie zakona bol'shih chisel na velichiny, zavisyashchie drug ot druga},
  journal = {Izvestiya Fiziko-matematicheskogo obshchestva pri Kazanskom universitete (2)},
  volume  = {15},
  year    = {1906},
  pages   = {135--156}
}

@article{Chapman1928,
  author  = {Chapman, Sydney},
  title   = {On the {B}rownian displacements and thermal diffusion of grains suspended in a non-uniform fluid},
  journal = {Proceedings of the Royal Society of London A},
  volume  = {119},
  year    = {1928},
  pages   = {34--54}
}

@article{Kolmogorov1931,
  author  = {Kolmogorov, Andrei N.},
  title   = {\"Uber die analytischen {M}ethoden in der {W}ahrscheinlichkeitsrechnung},
  journal = {Mathematische Annalen},
  volume  = {104},
  year    = {1931},
  pages   = {415--458}
}

@book{Durrett2019mc,
  author    = {Durrett, Rick},
  title     = {Probability: Theory and Examples},
  edition   = {5},
  publisher = {Cambridge University Press},
  year      = {2019}
}

@book{LevinPeres2017,
  author    = {Levin, David A. and Peres, Yuval},
  title     = {Markov Chains and Mixing Times},
  edition   = {2},
  publisher = {American Mathematical Society},
  year      = {2017}
}

@book{MeynTweedie2009,
  author    = {Meyn, Sean P. and Tweedie, Richard L.},
  title     = {Markov Chains and Stochastic Stability},
  edition   = {2},
  publisher = {Cambridge University Press},
  year      = {2009}
}