37.05.01 · probability / 05-markov-chains

The Markov Property, Transition Matrices, and the Chapman–Kolmogorov Equations

shipped · 3 tiers · Lean: none

Anchor (Master): Norris 1997 *Markov Chains* (Cambridge) §1.1-1.4, §1.7 (strong Markov); Meyn-Tweedie 2009 *Markov Chains and Stochastic Stability* 2e Ch. 3; Levin-Peres 2017 *Markov Chains and Mixing Times* 2e Ch. 1

Intuition Beginner

A Markov chain is a model for a system that moves between a list of possible situations, called states, one step at a time, where the only thing that matters for the next move is where the system is right now. The whole history of how it got there is irrelevant. This is the memoryless property: the past affects the future only through the present.

Think of a board game where each square tells you the chances of where you land next. If you are on the "rainy" square, the rules might say you go to "rainy" again with chance seven in ten and to "sunny" with chance three in ten, and these chances are the same no matter how you arrived at "rainy" or how long you have been playing. To run the model you need two things: a starting square, given as a list of chances over all squares, and a table that lists, for every square, the chances of jumping to each other square in one step. That table is the transition matrix.

The transition matrix is just bookkeeping for one-step moves, but it secretly contains all the longer-range information too. Suppose you want the chance of going from "rainy" today to "sunny" two days from now. You list every possible weather tomorrow, multiply the chance of reaching it tomorrow by the chance of then reaching "sunny" the day after, and add up over all the in-between possibilities. This add-up-over-the-middle rule is the Chapman–Kolmogorov equation. It says the two-step table is built by chaining two one-step tables, and more generally the table for any number of steps is built by chaining the one-step table that many times.

The reason this matters is that an enormous range of real processes are memoryless once you choose the state cleverly: shuffling a deck, a random walk on a network, a queue at a counter, the spread of a rumor through groups. The single assumption "the future depends on the past only through the present" turns each of these into an object you can compute with using nothing harder than multiplying tables.

The one-sentence takeaway: a Markov chain is a memoryless step-by-step process described by a starting distribution and a one-step transition table, and the chance of any longer trip is found by the Chapman–Kolmogorov rule of summing over all the intermediate stops.

Visual Beginner

Picture two weather states drawn as two circles, "Sunny" and "Rainy," with arrows showing the one-step chances of moving between them.

Left: the chain as a labeled graph; each arrow weight is a one-step transition probability and the arrows out of each circle sum to one. Right: the Chapman–Kolmogorov rule for two steps from Sunny to Sunny, summing over the two possible middle states. The same picture, with more circles and arrows, describes any finite-state chain; the transition matrix is just the table of arrow weights.

Worked example Beginner

We compute two-step weather chances for the chain above. The one-step table reads: from Sunny, stay Sunny with chance $0.8$ and switch to Rainy with chance $0.2$ ; from Rainy, switch to Sunny with chance $0.3$ and stay Rainy with chance $0.7$ .

Step 1. Find the chance of Sunny today to Sunny in two days. There are two ways to make the trip: Sunny then Sunny then Sunny, or Sunny then Rainy then Sunny. List the middle state and multiply along each path.

Step 2. Path through Sunny: chance $0.8$ to stay Sunny on day one, then chance $0.8$ to stay Sunny on day two, giving $0.8 \times 0.8 = 0.64$ .

Step 3. Path through Rainy: chance $0.2$ to switch to Rainy on day one, then chance $0.3$ to switch back to Sunny on day two, giving $0.2 \times 0.3 = 0.06$ .

Step 4. Add the two paths, because they are the separate ways the same two-step trip can happen: $0.64 + 0.06 = 0.70$ . So the chance of Sunny-to-Sunny in two steps is $0.70$ .

Step 5. Check the bookkeeping. The other two-step chance from Sunny is Sunny-to-Rainy, which must be $1 - 0.70 = 0.30$ because the system has to be somewhere. Directly: $0.8 \times 0.2 + 0.2 \times 0.7 = 0.16 + 0.14 = 0.30$ , which matches.

What this tells us: the two-step chances came entirely from multiplying and adding the one-step chances, summing over the middle state. That summing-over-the-middle is exactly the Chapman–Kolmogorov rule, and it is the same operation as multiplying the transition table by itself. No new information beyond the one-step table was ever needed.
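The multiply-and-add bookkeeping of Steps 2 to 5 can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original unit; the helper name `mat_mul` and the state order (Sunny, Rainy) are assumptions made for the example.

```python
# One-step table from the worked example; state order is (Sunny, Rainy).
P = [[0.8, 0.2],
     [0.3, 0.7]]

def mat_mul(A, B):
    """Multiply two square tables: entry (i, j) sums over the middle state k."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Two-step table: chaining the one-step table with itself.
P2 = mat_mul(P, P)

for name, row in zip(("Sunny", "Rainy"), P2):
    print(name, [round(x, 2) for x in row])
```

The first row of `P2` holds the Sunny-to-Sunny and Sunny-to-Rainy two-step chances computed by hand above, and each row still sums to one up to floating-point rounding.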

Check your understanding Beginner

Exercise (easy, multiple choice).

What does the "memoryless" (Markov) property say about a Markov chain?

A. The chance of where it goes next depends only on the current state, not on the earlier history
B. The chain never visits the same state twice
C. Every state is equally likely at every step
D. The chain always returns to its starting state

Hint

Recall the weather board game: the rules for leaving a square depend only on which square you are on now.

Answer

A. The next move depends only on the current state. Memorylessness means the past influences the future only through the present state; how the chain reached its current state is irrelevant. Feedback-correct: this single assumption is the whole definition of a Markov chain. Feedback-wrong: B, C, and D describe special behaviors some chains may or may not have, but none is the Markov property itself.

Formal definition Intermediate+

Throughout, $I$ is a countable set, the state space, and time is discrete, indexed by $Z_{\geq 0}$ . All random objects live on a probability space $(Ω, F, P)$ 37.01.01.

Definition (stochastic matrix). A stochastic matrix on $I$ is a family $P = (p_{ij})_{i, j \in I}$ with $p_{ij} \geq 0$ for all $i, j$ and $\sum_{j \in I} p_{ij} = 1$ for every $i \in I$ . Each row $(p_{ij})_{j \in I}$ is thus a probability distribution on $I$ , the transition distribution out of $i$ . A distribution on $I$ is a row vector $λ = (λ_{i})_{i \in I}$ with $λ_{i} \geq 0$ and $\sum_{i} λ_{i} = 1$ .

Definition (Markov chain; $(λ, P)$ -chain). A sequence of $I$ -valued random variables $(X_{n})_{n \geq 0}$ is a time-homogeneous Markov chain with initial distribution $λ$ and transition matrix $P$ — a $Markov (λ, P)$ chain — if $P (X_{0} = i_{0}) = λ_{i_{0}}$ and, for every $n \geq 0$ and every $i_{0}, \dots, i_{n + 1} \in I$ with $P (X_{0} = i_{0}, \dots, X_{n} = i_{n}) > 0$ , $P (X_{n + 1} = i_{n + 1} ∣ X_{0} = i_{0}, \dots, X_{n} = i_{n}) = P (X_{n + 1} = i_{n + 1} ∣ X_{n} = i_{n}) = p_{i_{n} i_{n + 1}} .$ The first equality is the (simple) Markov property: the conditional law of the next state given the whole past depends on the past only through the present state $X_{n}$ . The second is time-homogeneity: this conditional law does not depend on $n$ .

Definition (finite-dimensional law of the chain). Equivalently, $(X_{n})_{n \geq 0}$ is $Markov (λ, P)$ if and only if for every $n \geq 0$ and every $i_{0}, \dots, i_{n} \in I$ , $P (X_{0} = i_{0}, X_{1} = i_{1}, \dots, X_{n} = i_{n}) = λ_{i_{0}} p_{i_{0} i_{1}} p_{i_{1} i_{2}} \dots p_{i_{n - 1} i_{n}} .$ These finite-dimensional distributions are consistent in the sense of 37.01.01, so the Kolmogorov extension theorem produces the law of $(X_{n})$ on the path space $I^{Z_{\geq 0}}$ with its cylinder $σ$ -algebra 02.07.01; the canonical realization takes $X_{n} = π_{n}$ to be the coordinate maps.
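The product formula for the finite-dimensional law can be sketched directly in Python. The two-state data below (`lam`, `P`) and the helper name `path_prob` are illustrative assumptions; exact rationals are used so the consistency checks hold without rounding.

```python
from fractions import Fraction
from itertools import product

# Illustrative data (assumptions): two states 0 and 1, an arbitrary start.
lam = [Fraction(1, 2), Fraction(1, 2)]
P = [[Fraction(4, 5), Fraction(1, 5)],
     [Fraction(3, 10), Fraction(7, 10)]]

def path_prob(path, lam, P):
    """Return lam[i0] * p[i0][i1] * ... * p[i_{n-1}][i_n] for a state path."""
    prob = lam[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= P[a][b]
    return prob

# Probability of the specific path 0 -> 0 -> 1.
example = path_prob((0, 0, 1), lam, P)

# Consistency: summing out the last coordinate recovers the shorter law.
marginal = sum(path_prob((0, 0, j), lam, P) for j in range(2))

# The path probabilities of a fixed length form a distribution.
total = sum(path_prob(p, lam, P) for p in product(range(2), repeat=4))
```

Here `marginal` equals `path_prob((0, 0), lam, P)`, which is the consistency condition the extension theorem needs, checked on one instance only.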

Definition ( $n$ -step transition matrix). The $n$ -step transition probabilities are $p_{ij}^{(n)} = P (X_{n} = j ∣ X_{0} = i)$ for $i$ with $λ_{i} > 0$ , extended to all $i$ by the matrix power. By convention $p_{ij}^{(0)} = δ_{ij}$ (so $P^{(0)}$ is the identity matrix $I_{d}$ ) and $p_{ij}^{(1)} = p_{ij}$ , and $P^{(n)} = (p_{ij}^{(n)})$ denotes the matrix of $n$ -step probabilities. Matrix products are defined by the usual rule $(A B)_{ij} = \sum_{k \in I} A_{ik} B_{k j}$ , the sum converging because the entries are nonnegative with bounded row sums.

Definition (notation for the chain started at $i$ ). Write $P_{i} (\cdot) = P (\cdot ∣ X_{0} = i)$ for the law of the chain started deterministically at $i$ (initial distribution the point mass $δ_{i}$ ), and $E_{i}$ for the corresponding expectation. Then $p_{ij}^{(n)} = P_{i} (X_{n} = j)$ .

Counterexamples to common slips Intermediate+

The Markov property is not "independence of the past." The future is generally dependent on the past; the property is that this dependence is mediated entirely by the present state. Conditioning on $X_{n}$ makes the future $σ (X_{n + 1}, X_{n + 2}, \dots)$ and the past $σ (X_{0}, \dots, X_{n - 1})$ conditionally independent — not independent.
A function of a Markov chain need not be Markov. If $(X_{n})$ is a Markov chain and $f : I \to J$ is not injective, $(f (X_{n}))$ is usually not a Markov chain, because $f (X_{n})$ can fail to determine the transition law that $X_{n}$ determines. Lumping states preserves the Markov property only under the Kemeny–Snell strong-lumpability condition that the collapsed transition probabilities are constant on each block.
Time-homogeneity is an extra assumption. Dropping it gives an inhomogeneous chain with a different matrix $P_{n}$ at each step; then $P^{(m + n)} = P_{1} \dots P_{m + n}$ is an ordered product and the clean power $P^{m + n}$ is unavailable. The Chapman–Kolmogorov identity survives as $P^{(m + n)} = P^{(m)} P_{[m, m + n]}$ , where $P_{[m, m + n]} = P_{m + 1} \dots P_{m + n}$ is the transition matrix from time $m$ to time $m + n$ , but the factors no longer commute or coincide.
Rows sum to one, not columns. $P$ is stochastic by rows. A matrix that is stochastic by columns transports distributions the other way; a doubly stochastic matrix (both row and column sums one) is the special case for which the uniform distribution is stationary.
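The second slip can be made concrete with a small exact computation. The three-state cycle, the uniform start, and the lumping map `f` below are hypothetical choices made for illustration; the sketch computes the conditional law of the lumped process from the finite-dimensional law and shows that it depends on more than the current lumped state.

```python
from fractions import Fraction
from itertools import product

# Hypothetical 3-state chain: the deterministic cycle 0 -> 1 -> 2 -> 0.
P = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
lam = [Fraction(1, 3)] * 3            # uniform start (an assumption)
f = {0: "A", 1: "A", 2: "B"}          # non-injective lumping map

def path_prob(path):
    """Finite-dimensional law: lam[i0] * p[i0][i1] * ... along the path."""
    prob = lam[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= P[a][b]
    return prob

def lumped_prob(labels):
    """P(f(X_0), ..., f(X_n) = labels), summed over compatible X-paths."""
    return sum(path_prob(p)
               for p in product(range(3), repeat=len(labels))
               if all(f[s] == lab for s, lab in zip(p, labels)))

def cond_next_B(history):
    """P(f(X_{n+1}) = B | f(X_0), ..., f(X_n) = history)."""
    return lumped_prob(history + ("B",)) / lumped_prob(history)

after_AA = cond_next_B(("A", "A"))    # X must have gone 0, 1, so next is 2
after_BA = cond_next_B(("B", "A"))    # X must have gone 2, 0, so next is 1
```

Both histories end in the lumped state `A`, yet the chance of moving to `B` is one in the first case and zero in the second, so the lumped sequence is not a Markov chain for this choice of `f`.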

Key theorem with proof Intermediate+

Theorem (Chapman–Kolmogorov; $n$ -step transitions are matrix powers). Let $(X_{n})_{n \geq 0}$ be $Markov (λ, P)$ on a countable state space $I$ . Then for all $m, n \geq 0$ and all $i, j \in I$ , $p_{ij}^{(m + n)} = \sum_{k \in I} p_{ik}^{(m)} p_{k j}^{(n)}, \quad \text{equivalently} \quad P^{(m + n)} = P^{(m)} P^{(n)} .$ Consequently $P^{(n)} = P^{n}$ is the $n$ -th matrix power of $P$ , and the law of $X_{n}$ from initial distribution $λ$ is the row vector $λ P^{n}$ : $P (X_{n} = j) = (λ P^{n})_{j} = \sum_{i} λ_{i} p_{ij}^{(n)}$ ^{[Norris 1997 §1.1-1.2]}.

Proof. We first record the finite-dimensional law and then split the trajectory at an intermediate time.

Step 1 (finite-dimensional law). By the definition of a $(λ, P)$ -chain and the multiplication rule for conditional probabilities, $P (X_{0} = i_{0}, \dots, X_{n} = i_{n}) = P (X_{0} = i_{0}) \prod_{r = 0}^{n - 1} P (X_{r + 1} = i_{r + 1} ∣ X_{0} = i_{0}, \dots, X_{r} = i_{r}) = λ_{i_{0}} \prod_{r = 0}^{n - 1} p_{i_{r} i_{r + 1}},$ each conditional factor reducing to $p_{i_{r} i_{r + 1}}$ by the Markov property and time-homogeneity. (Terms with a vanishing conditioning probability contribute zero on both sides and may be dropped.)

Step 2 ( $n$ -step probability as a sum over paths). Summing the Step 1 identity over the intermediate states $i_{1}, \dots, i_{n - 1}$ with $i_{0} = i$ , $i_{n} = j$ fixed, and dividing by $λ_{i} = P (X_{0} = i)$ (equivalently, working under $P_{i}$ , where the initial distribution is $δ_{i}$ and the prefactor is $1$ ), $p_{ij}^{(n)} = P_{i} (X_{n} = j) = \sum_{i_{1}, \dots, i_{n - 1} \in I} p_{i i_{1}} p_{i_{1} i_{2}} \dots p_{i_{n - 1} j} = (P^{n})_{ij},$ which is exactly the entrywise formula for the $n$ -fold matrix product, established by induction on $n$ from $(P^{n})_{ij} = \sum_{k} (P^{n - 1})_{ik} p_{k j}$ . All sums are of nonnegative terms, so Tonelli permits the rearrangement regardless of $∣ I ∣$ .

Step 3 (the index split). Fix $m, n \geq 0$ and $i, j$ . Decompose the event $\{X_{m + n} = j\}$ under $P_{i}$ according to the state $X_{m} = k$ at the intermediate time: $p_{ij}^{(m + n)} = P_{i} (X_{m + n} = j) = \sum_{k \in I} P_{i} (X_{m} = k, X_{m + n} = j) = \sum_{k \in I} P_{i} (X_{m} = k) P_{i} (X_{m + n} = j ∣ X_{m} = k),$ where terms with $P_{i} (X_{m} = k) = 0$ contribute zero and are omitted. By the Markov property the conditional probability depends on the past $\{X_{0} = i, \dots, X_{m} = k\}$ only through $X_{m} = k$ , and by time-homogeneity $P_{i} (X_{m + n} = j ∣ X_{m} = k) = P_{k} (X_{n} = j) = p_{k j}^{(n)}$ . Since $P_{i} (X_{m} = k) = p_{ik}^{(m)}$ , we obtain $p_{ij}^{(m + n)} = \sum_{k \in I} p_{ik}^{(m)} p_{k j}^{(n)} = (P^{(m)} P^{(n)})_{ij} .$ Combined with Step 2 this reads $P^{m + n} = P^{m} P^{n}$ , the semigroup law for the powers of $P$ .

Step 4 (the law of $X_{n}$ ). Marginalizing the Step 1 identity over $i_{1}, \dots, i_{n - 1}$ and over the initial state weighted by $λ$ , $P (X_{n} = j) = \sum_{i \in I} λ_{i} p_{ij}^{(n)} = \sum_{i \in I} λ_{i} (P^{n})_{ij} = (λ P^{n})_{j},$ so the distribution propagates by right-multiplication of the row vector $λ$ by $P^{n}$ . $□$
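The statement just proved can be spot-checked numerically. The following is a minimal sketch on a hypothetical three-state matrix, with helper names `mat_mul` and `mat_pow` chosen for this example; it verifies one instance of the identity in exact arithmetic and is not a substitute for the proof.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Matrix product: entry (i, j) sums A[i][k] * B[k][j] over the middle state k."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, n):
    """n-th power by repeated multiplication, starting from the identity."""
    size = len(P)
    result = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result

# Hypothetical stochastic matrix and start (assumptions for illustration).
P = [[Fraction(1, 2), Fraction(1, 2), Fraction(0)],
     [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)],
     [Fraction(0), Fraction(1, 3), Fraction(2, 3)]]
lam = [Fraction(1), Fraction(0), Fraction(0)]

m, n = 2, 3
# Chapman-Kolmogorov: the (m+n)-step table is the product of the two tables.
ck_holds = mat_pow(P, m + n) == mat_mul(mat_pow(P, m), mat_pow(P, n))

# Law of X_n: the row vector lam times P^n.
Pn = mat_pow(P, n)
law_n = [sum(lam[i] * Pn[i][j] for i in range(3)) for j in range(3)]
```

With the point-mass start used here, `law_n` is simply the first row of `Pn`, and its entries sum to one.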

Bridge. This theorem builds toward the entire equilibrium and mixing theory of Markov chains and appears again in every spectral computation of long-run behavior, because $p_{ij}^{(n)} = (P^{n})_{ij}$ converts a probabilistic question about $n$ steps into a question about the $n$ -th power of one matrix. The foundational reason it is the right organizing tool is that the Markov property collapses the path-space integral over all intermediate histories into a single matrix multiplication: this is exactly the move that turns the search for a stationary distribution $π$ into the left-eigenvector equation $π P = π$ , and that makes $P^{n} \to 1 π$ (with $1$ the all-ones column vector, so every row of the limit is $π$ ) a convergence statement about the second eigenvalue. The semigroup law $P^{m + n} = P^{m} P^{n}$ generalises to the continuous-time Chapman–Kolmogorov relation $P_{s + t} = P_{s} P_{t}$ for transition semigroups, and the discrete generator $P - I_{d}$ is the counterpart of the generator that, in the continuous-state diffusion setting, becomes a second-order differential operator 02.15.03. The central insight is that one stochastic matrix encodes every multi-step probability through its powers, and consistency of these powers is automatic once the one-step memoryless rule is imposed; putting these together, the discrete Markov chain is the simplest nontrivial object in the theory of transition semigroups, and Chapman–Kolmogorov is the algebraic shadow of memorylessness.

Exercises Intermediate+

Exercise 3 (medium, symbolic).

Prove the Chapman–Kolmogorov identity $p_{ij}^{(m + n)} = \sum_{k} p_{ik}^{(m)} p_{k j}^{(n)}$ purely from the matrix-power fact $P^{(r)} = P^{r}$ , without re-deriving it probabilistically.

Hint

Use associativity of matrix multiplication: $P^{m + n} = P^{m} P^{n}$ .

Answer

Matrix multiplication is associative, so $P^{m + n} = P^{m} \cdot P^{n}$ . Reading off the $(i, j)$ entry with the definition $(A B)_{ij} = \sum_{k} A_{ik} B_{k j}$ gives $(P^{m + n})_{ij} = \sum_{k \in I} (P^{m})_{ik} (P^{n})_{k j}$ . Substituting $P^{(r)} = P^{r}$ yields $p_{ij}^{(m + n)} = \sum_{k} p_{ik}^{(m)} p_{k j}^{(n)}$ . The probabilistic content is entirely in the identity $P^{(r)} = P^{r}$ (Step 2 of the Key theorem); once that is granted, Chapman–Kolmogorov is associativity of matrix multiplication restated for transition matrices. Convergence of the sum is automatic for finite $I$ and holds by Tonelli for countable $I$ since all terms are nonnegative.

Exercise 4 (medium, symbolic).

Let $(X_{n})$ be $Markov (λ, P)$ . Prove that the reversed-pair conditional law factors: for $n \geq 1$ , given $X_{n} = k$ , the past $(X_{0}, \dots, X_{n - 1})$ and the future $(X_{n + 1}, X_{n + 2}, \dots)$ are conditionally independent.

Hint

Use the finite-dimensional law $P (X_{0} = i_{0}, \dots, X_{n + m} = i_{n + m}) = λ_{i_{0}} \prod p_{i_{r} i_{r + 1}}$ and factor the product at $r = n$ .

Answer

Fix $k$ and condition on ${X_{n} = k}$ (assume positive probability). For a past block $a = (i_{0}, \dots, i_{n - 1})$ ending so that the step into $k$ is allowed and a future block $b = (i_{n + 1}, \dots, i_{n + m})$ , $P (past = a, future = b ∣ X_{n} = k) = \frac{λ _{i_{0}} ( \prod _{r = 0}^{n - 1} p _{i_{r} i_{r + 1}} ) ( \prod _{r = n}^{n + m - 1} p _{i_{r} i_{r + 1}} )}{P ( X _{n} = k )},$ with $i_{n} = k$ . The numerator factors as $[λ_{i_{0}} \prod_{r = 0}^{n - 1} p_{i_{r} i_{r + 1}}] \cdot [\prod_{r = n}^{n + m - 1} p_{i_{r} i_{r + 1}}]$ , the first bracket depending only on $a$ (and $k$ ), the second only on $b$ (and $k$ ). Dividing by $P (X_{n} = k) = \sum_{i_{0}, \dots} λ_{i_{0}} \prod_{r = 0}^{n - 1} p_{i_{r} i_{r + 1}}$ (summed with $i_{n} = k$ ) normalizes the past factor to $P (past = a ∣ X_{n} = k)$ , and the future factor is $P_{k} (X_{1} = i_{n + 1}, \dots, X_{m} = i_{n + m}) = P (future = b ∣ X_{n} = k)$ . Hence the joint conditional is the product of the two conditional laws, which is conditional independence.

Exercise 6 (medium, symbolic).

Consider the simple random walk on $Z$ : $p_{i, i + 1} = p$ , $p_{i, i - 1} = q = 1 - p$ , and $p_{ij} = 0$ otherwise. Show that this is a time-homogeneous Markov chain and compute $p_{00}^{(2)}$ , the chance of returning to the origin in two steps.

Hint

The two-step return passes through $+ 1$ then back, or $- 1$ then back. Sum over the middle state.

Answer

The transition probabilities $p_{ij}$ depend only on the displacement $j - i \in \{+ 1, - 1\}$ and not on $n$ , so each row is the same shifted distribution and $\sum_{j} p_{ij} = p + q = 1$ ; hence $P$ is a stochastic matrix and the walk is a time-homogeneous Markov chain on $I = Z$ . For the two-step return, Chapman–Kolmogorov gives $p_{00}^{(2)} = \sum_{k} p_{0 k} p_{k 0} = p_{0, 1} p_{1, 0} + p_{0, - 1} p_{- 1, 0} = p \cdot q + q \cdot p = 2 pq$ . For the symmetric walk $p = q = 1/2$ this is $1/2$ . The walk is the prototype memoryless process whose increments are i.i.d.; the Markov property holds here because each step is drawn independently of the past.

Exercise 7 (hard, symbolic).

Diagonalize the two-state chain $P = \begin{pmatrix} 1 - a & a \\ b & 1 - b \end{pmatrix}$ with $a, b \in (0, 1)$ and derive a closed form for $p_{11}^{(n)}$ . Identify the stationary distribution.

Hint

The eigenvalues are $1$ and $1 - a - b$ . Write $P^{n}$ as a combination of the two eigen-projections.

Answer

The characteristic polynomial gives eigenvalues $λ_{1} = 1$ and $λ_{2} = 1 - a - b$ (since $\det P = (1 - a) (1 - b) - ab = 1 - a - b$ and $\operatorname{tr} P = 2 - a - b$ ). The stationary row vector solves $π P = π$ with $π_{1} + π_{2} = 1$ , giving $π = (\frac{b}{a + b}, \frac{a}{a + b})$ . Spectral decomposition yields $P^{n} = \begin{pmatrix} π_{1} & π_{2} \\ π_{1} & π_{2} \end{pmatrix} + (1 - a - b)^{n} \begin{pmatrix} π_{2} & - π_{2} \\ - π_{1} & π_{1} \end{pmatrix},$ so reading the $(1, 1)$ entry, $p_{11}^{(n)} = π_{1} + π_{2} (1 - a - b)^{n} = \frac{b}{a + b} + \frac{a}{a + b} (1 - a - b)^{n}$ . Since $∣ 1 - a - b ∣ < 1$ , $p_{11}^{(n)} \to b / (a + b) = π_{1}$ as $n \to \infty$ , each row of $P^{n}$ converging to $π$ . This is the simplest instance of geometric convergence to equilibrium with rate governed by the second eigenvalue $λ_{2}$ .
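The closed form can be compared against direct matrix iteration. This sketch reads the two-state matrix as having rows $(1 - a, a)$ and $(b, 1 - b)$ , uses the arbitrary illustrative values $a = 1/4$ , $b = 1/2$ , and works in exact rationals; the names `step` and `p11_closed` are assumptions made for the example.

```python
from fractions import Fraction

a, b = Fraction(1, 4), Fraction(1, 2)   # illustrative values in (0, 1)
P = [[1 - a, a],
     [b, 1 - b]]

def step(row, P):
    """One application of the map row -> row P."""
    return [sum(row[i] * P[i][j] for i in range(2)) for j in range(2)]

def p11_closed(n):
    """Closed form b/(a+b) + a/(a+b) * (1-a-b)^n from the spectral formula."""
    return b / (a + b) + a / (a + b) * (1 - a - b) ** n

# Iterate the first row of P^n (state 1 is index 0) and compare entrywise.
row = [Fraction(1), Fraction(0)]
checks = []
for n in range(8):
    checks.append(row[0] == p11_closed(n))
    row = step(row, P)
```

Every entry of `checks` is `True` for these values, which agrees with the formula for the first eight powers without proving it in general.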

Exercise 8 (hard, symbolic).

State the strong Markov property and use it to show that for a state $i$ and the first return time $T_{i} = \inf \{n \geq 1 : X_{n} = i\}$ , the successive return times to $i$ have i.i.d. increments under $P_{i}$ (the regenerative structure).

Hint

Apply the strong Markov property at $T_{i}$ : conditionally on ${T_{i} < \infty}$ , the post- $T_{i}$ chain is $Markov (δ_{i}, P)$ and independent of the pre- $T_{i}$ path.

Answer

The strong Markov property states that if $T$ is a stopping time for the filtration $F_{n} = σ (X_{0}, \dots, X_{n})$ , then conditionally on $\{T < \infty\}$ and $\{X_{T} = i\}$ , the post- $T$ process $(X_{T + m})_{m \geq 0}$ is $Markov (δ_{i}, P)$ and is independent of $F_{T}$ . Let $T_{i}^{(1)} = T_{i}$ and inductively $T_{i}^{(r + 1)} = \inf \{n > T_{i}^{(r)} : X_{n} = i\}$ , the successive return times, with increments $S_{r} = T_{i}^{(r + 1)} - T_{i}^{(r)}$ . Each $T_{i}^{(r)}$ is a stopping time with $X_{T_{i}^{(r)}} = i$ on its finiteness. Applying the strong Markov property at $T_{i}^{(r)}$ : the post- $T_{i}^{(r)}$ chain starts afresh at $i$ and is independent of the path up to $T_{i}^{(r)}$ , so $S_{r}$ , which is the first return time of the post- $T_{i}^{(r)}$ chain, has the same law as $T_{i}$ under $P_{i}$ and is independent of $T_{i}^{(1)}, S_{1}, \dots, S_{r - 1}$ . Hence the excursion lengths $T_{i}^{(1)}, S_{1}, S_{2}, \dots$ are i.i.d. with the law of $T_{i}$ under $P_{i}$ . This regenerative decomposition is the engine behind recurrence/transience dichotomies and the renewal-theoretic proof of the ergodic theorem for chains; the simple (non-strong) Markov property alone does not justify the argument, because $T_{i}^{(r)}$ is a random time rather than a fixed deterministic one.
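To see what the excursion lengths look like on a sample path, one can simulate a chain and read off the gaps between visits to a state. The two-state matrix, the seed, and the run length below are arbitrary assumptions, and the helper names `simulate` and `return_increments` are hypothetical; a simulation illustrates the regenerative decomposition but does not establish the i.i.d. claim.

```python
import random

# Hypothetical two-state chain; the seed and run length are arbitrary choices.
P = [[0.8, 0.2],
     [0.3, 0.7]]

def simulate(P, start, steps, rng):
    """Sample X_0, ..., X_steps by drawing each move from the current row."""
    path = [start]
    for _ in range(steps):
        u, acc, nxt = rng.random(), 0.0, len(P) - 1
        for j, p in enumerate(P[path[-1]]):
            acc += p
            if u < acc:          # inverse-CDF draw from the row distribution
                nxt = j
                break
        path.append(nxt)
    return path

def return_increments(path, i):
    """Gaps between successive visits to i; the path is assumed to start at i."""
    visits = [n for n, x in enumerate(path) if x == i]
    return [t - s for s, t in zip(visits, visits[1:])]

rng = random.Random(0)
path = simulate(P, start=0, steps=10_000, rng=rng)
gaps = return_increments(path, 0)   # first return time, then later increments
```

Because the path starts at state 0, the first entry of `gaps` is the first return time and the remaining entries are the later increments; their sum is the time of the last visit to 0 within the run.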

Advanced results Master

Beyond the one-step matrix and its powers, the theory organizes around the path-space shift, the strong Markov property at stopping times, the kernel-theoretic construction that dispenses with topology, the operator-semigroup viewpoint dual to the matrix powers, and the canonical examples that instantiate each.

Theorem 1 (canonical construction and existence; Ionescu-Tulcea route). Given any distribution $λ$ and stochastic matrix $P$ on countable $I$ , a $Markov (λ, P)$ chain exists. On the path space $Ω = I^{Z_{\geq 0}}$ with coordinate maps $X_{n} = π_{n}$ and the product $σ$ -algebra 02.07.01, the prescribed finite-dimensional laws $P (X_{0} = i_{0}, \dots, X_{n} = i_{n}) = λ_{i_{0}} p_{i_{0} i_{1}} \dots p_{i_{n - 1} i_{n}}$ are consistent, and the Ionescu-Tulcea theorem builds the unique measure $P_{λ}$ realizing them with no topological hypothesis on $I$ , the kernel sequence being the constant kernel $P$ at every step. The Kolmogorov extension theorem 37.01.01 gives an alternative existence proof when $I$ is given its discrete (hence Polish) topology.

Theorem 2 (Markov property via the shift). Let $θ : Ω \to Ω$ be the shift $(θ ω)_{n} = ω_{n + 1}$ , and $θ_{m} = θ^{m}$ . For every bounded measurable $F : Ω \to R$ , every $m \geq 0$ , and every starting law $λ$ , $E_{λ} [F \circ θ_{m} ∣ F_{m}] = E_{X_{m}} [F] \quad P_{λ} \text{-a.s.},$ where $F_{m} = σ (X_{0}, \dots, X_{m})$ and $E_{X_{m}} [F] = h (X_{m})$ with $h (i) = E_{i} [F]$ . This functional form contains the elementary Markov property (take $F (ω) = 1 \{ω_{n} = j\}$ ) and exhibits memorylessness as the statement that conditioning the shifted future on the whole past collapses to a function of the present coordinate.

Theorem 3 (strong Markov property). Let $T$ be an $(F_{n})$ -stopping time and $F_{T} = \{A \in F : A \cap \{T = n\} \in F_{n} \ \forall n\}$ . Then for bounded measurable $F$ , on $\{T < \infty\}$ , $E_{λ} [F \circ θ_{T} ∣ F_{T}] = E_{X_{T}} [F] \quad P_{λ} \text{-a.s. on } \{T < \infty\} .$ The proof decomposes on $\{T = n\}$ , where $T$ is deterministic and the simple Markov property (Theorem 2) applies, then sums over $n$ using $\{T = n\} \in F_{n}$ and the $F_{T}$ -measurability bookkeeping. The countability of the time index ( $T$ takes values in $Z_{\geq 0} \cup \{\infty\}$ ) is what makes the discrete strong Markov property an immediate corollary of the simple one, in contrast with the continuous-time case where right-continuity and the Blumenthal $0$ – $1$ law are required.

Theorem 4 (transition semigroup and the operator dual). The matrices $(P^{n})_{n \geq 0}$ form a discrete semigroup under multiplication: $P^{0} = I_{d}$ and $P^{m + n} = P^{m} P^{n}$ (Chapman–Kolmogorov). Acting on bounded functions $f : I \to R$ by $(P f) (i) = \sum_{j} p_{ij} f (j) = E_{i} [f (X_{1})]$ and on distributions by $λ \mapsto λ P$ , the matrix $P$ is simultaneously the one-step transition operator on observables and the Markov operator on measures, adjoint under the pairing $⟨ λ, f ⟩ = \sum_{i} λ_{i} f (i)$ . The discrete generator is $P - I_{d}$ ; the forward equation $μ_{n + 1} = μ_{n} P$ and backward equation $f_{n + 1} = P f_{n}$ are the discrete Chapman–Kolmogorov equations, the exact analogues of Kolmogorov's forward and backward differential equations for continuous-state diffusions 02.15.03.
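The adjointness in Theorem 4 amounts to the identity $⟨ λ P, f ⟩ = ⟨ λ, P f ⟩$ , which can be checked on a small example. The matrix, distribution, and observable below are illustrative assumptions, as are the helper names `act_on_measure`, `act_on_function`, and `pairing`.

```python
from fractions import Fraction

# Illustrative data (assumptions): a two-state matrix, a distribution, an observable.
P = [[Fraction(4, 5), Fraction(1, 5)],
     [Fraction(3, 10), Fraction(7, 10)]]
lam = [Fraction(1, 4), Fraction(3, 4)]   # row vector (distribution)
f = [Fraction(2), Fraction(-1)]          # column vector (bounded function)

def act_on_measure(lam, P):
    """Markov operator on measures: lam -> lam P."""
    return [sum(lam[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]

def act_on_function(P, f):
    """Transition operator on observables: (P f)(i) = sum_j p_ij f(j)."""
    return [sum(P[i][j] * f[j] for j in range(len(P))) for i in range(len(P))]

def pairing(lam, f):
    """<lam, f> = sum_i lam_i f(i), the expectation of f under lam."""
    return sum(l * x for l, x in zip(lam, f))

lhs = pairing(act_on_measure(lam, P), f)   # expectation of f after one step
rhs = pairing(lam, act_on_function(P, f))  # expectation of P f at time zero
```

Both sides compute the expectation of $f (X_{1})$ when $X_{0}$ has law $λ$ , once by pushing the distribution forward and once by pulling the observable back.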

Synthesis. The foundational reason the whole subject coheres is that the memoryless one-step rule, encoded in a single stochastic matrix $P$ , generates every multi-step law through the powers $P^{n}$ , and putting these together, Chapman–Kolmogorov is the semigroup law that makes those powers consistent. The canonical construction (Theorem 1) realizes the chain as the coordinate process on path space, where the shift turns the simple Markov property (Theorem 2) into a statement about conditioning the future on the past, and this is exactly what generalises to the strong Markov property (Theorem 3) once a deterministic time is replaced by a stopping time. The central insight is that the transition matrix is dual to two flows at once (forward on distributions, backward on observables), so the operator $P$ and the row-vector action $λ \mapsto λ P$ are two faces of one semigroup, and the matrix $P - I_{d}$ is the discrete counterpart of the second-order differential generator of a diffusion 02.15.03. The strong Markov property is the bridge to the regenerative structure: excursions between visits to a fixed state are i.i.d., the foundational reason recurrence, transience, and convergence to a stationary $π$ solving $π P = π$ submit to renewal theory, with the second eigenvalue of $P$ controlling the geometric rate of $P^{n} \to 1 π$ (with $1$ the all-ones column vector). The Chapman–Kolmogorov identity, the matrix-power formula, the shift-form Markov property, and the operator-semigroup duality are four presentations of the single fact that the future depends on the past only through the present.

Full proof set Master

Proposition 1 (powers of a stochastic matrix are stochastic). If $P$ is stochastic on $I$ then $P^{n}$ is stochastic for every $n \geq 0$ .

Proof. Let $1$ denote the all-ones column vector. Stochasticity is $P \geq 0$ entrywise and $P 1 = 1$ . Nonnegativity of $P^{n}$ is immediate since products and sums of nonnegative numbers are nonnegative. For the row sums, induct: $P^{0} = I_{d}$ has $I_{d} 1 = 1$ , and if $P^{n} 1 = 1$ then $P^{n + 1} 1 = P (P^{n} 1) = P 1 = 1$ . For countable $I$ the matrix-vector products are sums of nonnegative terms, so associativity holds by Tonelli. Hence $P^{n} \geq 0$ and $P^{n} 1 = 1$ , i.e. $P^{n}$ is stochastic. $□$

Proposition 2 (finite-dimensional law characterizes the chain). A sequence $(X_{n})$ is $Markov (λ, P)$ if and only if $P (X_{0} = i_{0}, \dots, X_{n} = i_{n}) = λ_{i_{0}} p_{i_{0} i_{1}} \dots p_{i_{n - 1} i_{n}}$ for all $n$ and all states.

Proof. ( $\Rightarrow$ ) Step 1 of the Key theorem derives the product formula from the conditional definition. ( $\Leftarrow$ ) Assume the product formula. Taking $n = 0$ gives $P (X_{0} = i_{0}) = λ_{i_{0}}$ . For the conditional, when the conditioning event has positive probability, $P (X_{n + 1} = i_{n + 1} ∣ X_{0} = i_{0}, \dots, X_{n} = i_{n}) = \frac{λ _{i_{0}} p _{i_{0} i_{1}} \dots p _{i_{n} i_{n + 1}}}{λ _{i_{0}} p _{i_{0} i_{1}} \dots p _{i_{n - 1} i_{n}}} = p_{i_{n} i_{n + 1}},$ which depends on the past only through $i_{n}$ and is independent of $n$ . Thus the simple Markov property and time-homogeneity hold, so $(X_{n})$ is $Markov (λ, P)$ . $□$

Proposition 3 (Chapman–Kolmogorov, full statement). For $m, n \geq 0$ , $P^{(m + n)} = P^{(m)} P^{(n)}$ with $P^{(r)} = P^{r}$ .

Proof. Step 2 of the Key theorem proves $P^{(r)} = P^{r}$ by induction; Step 3 proves the index split by conditioning on the intermediate state and invoking the Markov property and time-homogeneity. Together, $P^{(m + n)} = P^{m + n} = P^{m} P^{n} = P^{(m)} P^{(n)}$ by associativity of matrix multiplication, the sums converging by Tonelli on nonnegative terms. $□$

Proposition 4 (forward and backward equations). Let $μ_{n}$ be the law of $X_{n}$ and, for fixed target $j$ and horizon $N$ , let $u_{n} (i) = P (X_{N} = j ∣ X_{n} = i) = p_{ij}^{(N - n)}$ for $0 \leq n \leq N$ . Then $μ_{n + 1} = μ_{n} P$ (forward) and $u_{n} = P u_{n + 1}$ (backward), with $μ_{n} = λ P^{n}$ and $u_{N} (i) = δ_{ij}$ .

Proof. Forward: $μ_{n + 1} (j) = P (X_{n + 1} = j) = \sum_{i} P (X_{n} = i) p_{ij} = (μ_{n} P) (j)$ by conditioning on $X_{n}$ and the Markov property. Iterating from $μ_{0} = λ$ gives $μ_{n} = λ P^{n}$ . Backward: $u_{n} (i) = p_{ij}^{(N - n)} = \sum_{k} p_{ik} p_{k j}^{(N - n - 1)} = \sum_{k} p_{ik} u_{n + 1} (k) = (P u_{n + 1}) (i)$ , using Chapman–Kolmogorov with a one-step first move; the terminal condition $u_{N} (i) = p_{ij}^{(0)} = δ_{ij}$ holds. $□$
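The two recursions can be run side by side. In this sketch $u_{n} (i)$ is read as the chance of being at $j$ at time $N$ given state $i$ at time $n$ ; the three-state matrix, the start `lam`, the horizon `N`, and the target `j` are hypothetical choices. A useful consequence, checked below, is that the pairing $\sum_{i} μ_{n} (i) u_{n} (i)$ does not depend on $n$ , since it always equals $P (X_{N} = j)$ .

```python
from fractions import Fraction

# Hypothetical three-state chain, start, horizon and target (assumptions).
P = [[Fraction(1, 2), Fraction(1, 2), Fraction(0)],
     [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)],
     [Fraction(0), Fraction(1, 3), Fraction(2, 3)]]
lam = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]
N, j = 4, 2
size = len(P)

def forward(mu):
    """Forward equation: mu_{n+1} = mu_n P."""
    return [sum(mu[i] * P[i][k] for i in range(size)) for k in range(size)]

def backward(u):
    """Backward equation: u_n = P u_{n+1}."""
    return [sum(P[i][k] * u[k] for k in range(size)) for i in range(size)]

# Forward pass from mu_0 = lam.
mus = [lam]
for _ in range(N):
    mus.append(forward(mus[-1]))

# Backward pass from the terminal condition u_N = indicator of j.
us = [None] * (N + 1)
us[N] = [Fraction(int(i == j)) for i in range(size)]
for n in range(N - 1, -1, -1):
    us[n] = backward(us[n + 1])

# The pairing of mu_n with u_n is the same number for every n.
pairings = [sum(m * u for m, u in zip(mus[n], us[n])) for n in range(N + 1)]
```

At $n = N$ the pairing reduces to `mus[N][j]`, and at $n = 0$ it is $\sum_{i} λ_{i} p_{ij}^{(N)}$ , so the constancy of `pairings` is the matrix identity $λ P^{n} P^{N - n} = λ P^{N}$ in disguise.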

Proposition 5 (strong Markov from simple Markov). Let $T$ be a stopping time and $F$ bounded measurable. Then $E_{λ} [F \circ θ_{T} ∣ F_{T}] = E_{X_{T}} [F]$ on ${T < \infty}$ .

Proof. Let $A \in F_{T}$ . Decompose on the value of $T$ : $E_{λ} [1_{A} 1_{\{T < \infty\}} F \circ θ_{T}] = \sum_{n \geq 0} E_{λ} [1_{A \cap \{T = n\}} F \circ θ_{n}] .$ Because $A \cap \{T = n\} \in F_{n}$ , the simple Markov property (Theorem 2) gives $E_{λ} [1_{A \cap \{T = n\}} F \circ θ_{n}] = E_{λ} [1_{A \cap \{T = n\}} E_{X_{n}} [F]]$ . On $\{T = n\}$ one has $X_{n} = X_{T}$ , so $E_{X_{n}} [F] = E_{X_{T}} [F]$ . Summing over $n$ restores $\{T < \infty\}$ : $E_{λ} [1_{A} 1_{\{T < \infty\}} F \circ θ_{T}] = E_{λ} [1_{A} 1_{\{T < \infty\}} E_{X_{T}} [F]] .$ Since this holds for all $A \in F_{T}$ and $E_{X_{T}} [F]$ is $F_{T}$ -measurable (it is a function of $X_{T}$ , which is $F_{T}$ -measurable), the defining property of conditional expectation gives the claim. $□$

Proposition 6 (two-state spectral formula). For $P = \begin{pmatrix} 1 - a & a \\ b & 1 - b \end{pmatrix}$ , $a + b \in (0, 2)$ , one has $p_{11}^{(n)} = \frac{b}{a + b} + \frac{a}{a + b} (1 - a - b)^{n}$ .

Proof. Eigenvalues solve $det (P - λ I_{d}) = (1 - a - λ) (1 - b - λ) - ab = λ^{2} - (2 - a - b) λ + (1 - a - b) = 0$ , factoring as $(λ - 1) (λ - (1 - a - b))$ , so $λ_{1} = 1$ , $λ_{2} = 1 - a - b$ . The left eigenvector for $λ_{1}$ normalized to a distribution is $π = (b, a) / (a + b)$ . Writing $P = λ_{1} Π_{1} + λ_{2} Π_{2}$ with $Π_{1} = 1 π$ the rank-one stationary projection and $Π_{2} = I_{d} - Π_{1}$ , the projections are orthogonal idempotents, so $P^{n} = Π_{1} + λ_{2}^{n} Π_{2}$ . The $(1, 1)$ entry is $π_{1} + (1 - a - b)^{n} (1 - π_{1}) = \frac{b}{a + b} + \frac{a}{a + b} (1 - a - b)^{n}$ . $□$
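The algebraic claims about the two projections are easy to confirm on an instance. This sketch takes the matrix with rows $(1 - a, a)$ and $(b, 1 - b)$ at the arbitrary values $a = 1/4$ , $b = 1/2$ (assumptions for illustration) and checks idempotence, mutual annihilation, and the decomposition of $P$ in exact arithmetic.

```python
from fractions import Fraction

a, b = Fraction(1, 4), Fraction(1, 2)    # illustrative values (assumptions)
P = [[1 - a, a],
     [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]           # stationary row vector
lam2 = 1 - a - b                          # second eigenvalue

def mat_mul(A, B):
    """2x2 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Pi1 = [pi[:], pi[:]]                      # all-ones column times pi: each row is pi
Pi2 = [[int(i == j) - Pi1[i][j] for j in range(2)] for i in range(2)]
zero = [[0, 0], [0, 0]]

idempotent = mat_mul(Pi1, Pi1) == Pi1 and mat_mul(Pi2, Pi2) == Pi2
annihilate = mat_mul(Pi1, Pi2) == zero and mat_mul(Pi2, Pi1) == zero
decomposes = [[Pi1[i][j] + lam2 * Pi2[i][j] for j in range(2)]
              for i in range(2)] == P
```

All three flags come out `True` here, which is what allows the powers to be read off as $P^{n} = Π_{1} + λ_{2}^{n} Π_{2}$ .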

Connections Master

The Kolmogorov extension theorem 37.01.01 supplies the existence half of the construction: the consistent finite-dimensional family $P (X_{0} = i_{0}, \dots, X_{n} = i_{n}) = λ_{i_{0}} \prod p_{i_{r} i_{r + 1}}$ is exactly the projective system the extension theorem stitches into a measure on path space, with the discrete topology on $I$ furnishing the standard-Borel hypothesis; the Markov chain is the first nontrivial process built by feeding kernel-composed marginals to that machine.
The $σ$ -algebra and measurable-space foundations 02.07.01 define the cylinder $σ$ -algebra on $I^{Z_{\geq 0}}$ , the filtration $F_{n} = σ (X_{0}, \dots, X_{n})$ , the stopping-time $σ$ -algebra $F_{T}$ , and the shift map used to state the simple and strong Markov properties; every measurability claim in this unit is an instance of that framework.
The continuous-state diffusion and SDE generator 02.15.03 is the differential analogue of the discrete transition matrix: the discrete Chapman–Kolmogorov semigroup law $P^{m + n} = P^{m} P^{n}$ becomes the law $P_{s + t} = P_{s} P_{t}$ for the transition semigroup of a diffusion, the row-vector forward equation $μ_{n + 1} = μ_{n} P$ becomes the Fokker–Planck/Kolmogorov forward equation, and the discrete generator $P - I_{d}$ becomes the second-order differential generator $L$ ; the contrast is the countable state space and discrete time here versus the continuum and continuous paths there.
The elementary rules and named distributions of probability 26.02.01 are the concrete shadow: conditional probability, the multiplication rule used in the finite-dimensional law, and the Bernoulli increments of the simple random walk are the discrete-probability primitives this unit lifts into the matrix-and-path-space formalism, with the stationary distribution generalizing the equilibrium of an elementary two-state model.

Historical & philosophical context Master

The concept originates with Andrei A. Markov, who in his 1906 paper ^{[Markov 1906]} extended the law of large numbers to sequences of dependent random variables linked by the now-eponymous chain condition, motivated in part by his analysis of the alternation of vowels and consonants in Pushkin's Eugene Onegin. Markov's chains were finite and his interest was the persistence of limit theorems under dependence, not the construction of processes; the matrix formalism and the transition-probability language were consolidated later.

The identity bearing the joint name of Chapman and Kolmogorov has a twofold origin. Sydney Chapman derived the corresponding integral relation in 1928 ^{[Chapman 1928]} while studying Brownian displacement and thermal diffusion of suspended grains, in the continuous setting of diffusion physics. Andrei N. Kolmogorov, in his 1931 Mathematische Annalen memoir on the analytic methods of probability ^{[Kolmogorov 1931]}, placed the relation at the center of a general theory of Markov processes and derived from it the forward and backward differential equations for transition probabilities, founding the analytic theory of continuous-time, continuous-state Markov processes. The discrete matrix version presented here is the elementary case of Kolmogorov's semigroup relation, with matrix multiplication replacing the integral over an intermediate state.

The modern textbook synthesis on countable state spaces, with the simple and strong Markov properties stated via the shift on path space and the construction via consistent finite-dimensional distributions, follows Norris ^{[Norris 1997]} and Levin–Peres ^{[Levin-Peres 2017]}; the kernel-theoretic construction by Ionescu-Tulcea removes the topological hypothesis that the general extension theorem requires.

Bibliography Master

@book{Norris1997,
  author    = {Norris, James R.},
  title     = {Markov Chains},
  series    = {Cambridge Series in Statistical and Probabilistic Mathematics},
  publisher = {Cambridge University Press},
  year      = {1997}
}

@article{Markov1906,
  author  = {Markov, Andrei A.},
  title   = {Rasprostranenie zakona bol'shih chisel na velichiny, zavisyashchie drug ot druga},
  journal = {Izvestiya Fiziko-matematicheskogo obshchestva pri Kazanskom universitete (2)},
  volume  = {15},
  year    = {1906},
  pages   = {135--156}
}

@article{Chapman1928,
  author  = {Chapman, Sydney},
  title   = {On the {B}rownian displacements and thermal diffusion of grains suspended in a non-uniform fluid},
  journal = {Proceedings of the Royal Society of London A},
  volume  = {119},
  year    = {1928},
  pages   = {34--54}
}

@article{Kolmogorov1931,
  author  = {Kolmogorov, Andrei N.},
  title   = {\"Uber die analytischen {M}ethoden in der {W}ahrscheinlichkeitsrechnung},
  journal = {Mathematische Annalen},
  volume  = {104},
  year    = {1931},
  pages   = {415--458}
}

@book{Durrett2019mc,
  author    = {Durrett, Rick},
  title     = {Probability: Theory and Examples},
  edition   = {5},
  publisher = {Cambridge University Press},
  year      = {2019}
}

@book{LevinPeres2017,
  author    = {Levin, David A. and Peres, Yuval},
  title     = {Markov Chains and Mixing Times},
  edition   = {2},
  publisher = {American Mathematical Society},
  year      = {2017}
}

@book{MeynTweedie2009,
  author    = {Meyn, Sean P. and Tweedie, Richard L.},
  title     = {Markov Chains and Stochastic Stability},
  edition   = {2},
  publisher = {Cambridge University Press},
  year      = {2009}
}

Prerequisites

37.01.01
02.07.01
26.02.01

Tier anchors

beginner: Norris 1997 *Markov Chains* (Cambridge) §1.1; informal memorylessness as a board game whose next move depends only on the current square
intermediate: Norris 1997 *Markov Chains* (Cambridge) §1.1-1.4; Durrett 2019 *Probability: Theory and Examples* 5e §5.1-5.2
master: Norris 1997 *Markov Chains* (Cambridge) §1.1-1.4, §1.7 (strong Markov); Meyn-Tweedie 2009 *Markov Chains and Stochastic Stability* 2e Ch. 3; Levin-Peres 2017 *Markov Chains and Mixing Times* 2e Ch. 1

References

Norris — Markov Chains · Cambridge University Press 1997, §1.1-1.4, §1.7
Markov — Rasprostranenie zakona bol'shih chisel na velichiny, zavisyashchie drug ot druga · Izvestiya Fiziko-matematicheskogo obshchestva pri Kazanskom universitete (2) 15 (1906), 135-156
Chapman — On the Brownian displacements and thermal diffusion of grains suspended in a non-uniform fluid · Proc. Roy. Soc. London A 119 (1928), 34-54
Kolmogorov — Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung · Mathematische Annalen 104 (1931), 415-458
Durrett — Probability: Theory and Examples, 5e · Cambridge University Press 2019, §5.1-5.2 (Markov chains, construction, examples)
Levin-Peres — Markov Chains and Mixing Times, 2e · American Mathematical Society 2017, Ch. 1
Meyn-Tweedie — Markov Chains and Stochastic Stability, 2e · Cambridge University Press 2009, Ch. 3

Estimated time

beginner: 18m
intermediate: 55m
master: 90m