37.01.01 · probability / 01-measure-foundations

Probability Spaces and the Kolmogorov Extension Theorem

shipped · 3 tiers · Lean: none

Anchor (Master): Kallenberg 2021 *Foundations of Modern Probability* 3e Ch. 1-3, 8; Bogachev 2007 *Measure Theory* Vol. 2 §7.7; Parthasarathy 1967 *Probability Measures on Metric Spaces* Ch. V

Intuition Beginner

Probability begins with a bookkeeping question: before any experiment runs, what could possibly happen, and how should we weigh each possibility? A probability space is the answer written down carefully. It has three parts. The first is a list of every outcome the experiment could produce, called the sample space. The second is a collection of questions we are allowed to ask about the outcome, called events, where each event is a bundle of outcomes. The third is a rule that assigns each event a weight between zero and one, the probability, with the whole sample space weighing exactly one.

Think of rolling a die. The sample space is the six faces. An event might be "the result is even," which bundles together the faces two, four, and six. The weight rule gives this event the value one half for a fair die. Three requirements make this honest bookkeeping: the weight of an impossible event is zero, the weight of the certain event is one, and the weight of a bundle made from separate non-overlapping events is the sum of their individual weights.
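The die bookkeeping can be sketched in a few lines of Python. This is only an illustration: the names `omega`, `weight`, and `prob` are made up for the sketch, and exact fractions are used so the sums come out without rounding.

```python
from fractions import Fraction

# Sample space: the six faces of a die.
omega = {1, 2, 3, 4, 5, 6}
# Weight rule for a fair die: each face has weight 1/6.
weight = {face: Fraction(1, 6) for face in omega}

def prob(event):
    """Weight of an event, i.e. of a bundle (subset) of outcomes."""
    return sum((weight[o] for o in event), Fraction(0))

even = {2, 4, 6}
low = {1}

print(prob(even))    # 1/2
print(prob(set()))   # 0  (impossible event)
print(prob(omega))   # 1  (certain event)
# Additivity for the non-overlapping events `even` and `low`:
print(prob(even | low) == prob(even) + prob(low))  # True
```

Summing outcome weights works here because the sample space is finite; the infinite case discussed next is exactly where this shortcut stops being available.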

The subtle part is what happens with infinitely many outcomes. If you flip a coin forever, a single outcome is an entire endless sequence of heads and tails, and there are uncountably many such sequences. You cannot weigh each one individually and add them up. Instead you only ever directly weigh events that depend on finitely many flips at a time, such as "the first three flips are heads." These finite-window events are called cylinder events, because they pin down a few coordinates and leave the rest free.

The Kolmogorov extension theorem is the promise that this works. If you specify the weights for every finite window in a way that does not contradict itself, then there is one and only one weight rule on the full infinite space of sequences that agrees with all of your finite-window weights at once. The non-contradiction requirement is called consistency: the weights you assign to the first three flips must match what you get from the weights on the first four flips after ignoring the fourth.

The one-sentence takeaway: a probability space is a sample space plus a list of askable events plus a weight rule summing to one, and the Kolmogorov extension theorem builds the weight rule on an infinite space out of consistent weights on finite windows.

Visual Beginner

Picture an infinite sequence of coin flips as an unending row of slots, each slot eventually filled with H or T. You can never see the whole row at once, but you can put a window over the first few slots.

Top: the infinite sequence with a finite window over the first three coordinates; the pattern inside the window is a cylinder event. Bottom: the eight weights for three-slot patterns sum to one. The arrow shows consistency: widening the window to four slots and then summing over the last slot must return the original three-slot weights.

The picture behind the extension theorem is that all these finite-window weight charts, one for each window size, fit together like nested rulers. Consistency is the statement that the nested rulers agree wherever they overlap, and the theorem says the nested rulers determine a single weight rule on the whole infinite row.

Worked example Beginner

We build the probability rule for two independent fair coin flips and check that it extends consistently to a third flip.

Step 1. The sample space for two flips is the four patterns HH, HT, TH, TT. Each flip is fair and the two flips are independent, so each pattern has weight one quarter: the weight of a two-flip pattern is the weight of its first flip, one half, times the weight of its second flip, one half, giving one quarter.

Step 2. Check the event "first flip is heads." This event bundles HH and HT. Its weight is one quarter plus one quarter, which is one half. That matches the weight of a single heads on the first flip, as it must.

Step 3. Now widen to three flips. The eight patterns each get weight one eighth, again because each is one half times one half times one half.

Step 4. Verify consistency between the three-flip weights and the two-flip weights. Take the two-flip pattern HT. In the three-flip world it corresponds to the two patterns HTH and HTT, since the third flip is free. Their weights are one eighth plus one eighth, which is one quarter. That equals the original two-flip weight of HT.

Step 5. Read off the conclusion. The three-flip weights, when you ignore the third flip by summing over its two values, reproduce the two-flip weights exactly. The family of weights for every window size agrees on overlaps, so it is consistent.

What this tells us: the finite-window weights for independent fair coins are consistent, so the extension theorem guarantees one weight rule on the infinite space of all coin-flip sequences. That rule is the probability model for flipping a coin forever, and it is built entirely from the simple finite-window weights we just checked.
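Steps 1 to 5 can be checked mechanically. The sketch below assumes independent fair flips; `window_weights` and `drop_last` are hypothetical helper names, not part of any library.

```python
from fractions import Fraction
from itertools import product

def window_weights(n):
    """Weights of all length-n patterns for independent fair flips."""
    return {"".join(p): Fraction(1, 2) ** n for p in product("HT", repeat=n)}

def drop_last(weights):
    """Ignore the last flip by summing over its two values."""
    out = {}
    for pattern, w in weights.items():
        out[pattern[:-1]] = out.get(pattern[:-1], Fraction(0)) + w
    return out

two, three = window_weights(2), window_weights(3)
print(three["HTH"] + three["HTT"])   # 1/4, the Step 4 check for HT
print(drop_last(three) == two)       # True, the Step 5 check for all patterns
```

The same comparison `drop_last(window_weights(n + 1)) == window_weights(n)` can be run for any window size `n`, which is the consistency condition in miniature.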

Check your understanding Beginner

Formal definition Intermediate+

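Definition (probability space). A probability space is a triple $(\Omega, \mathcal{F}, \mathbb{P})$ where $\Omega$ is a set (the sample space), $\mathcal{F}$ is a $\sigma$-algebra of subsets of $\Omega$ 02.07.01 (the events), and $\mathbb{P}\colon \mathcal{F} \to [0,1]$ is a probability measure: a countably additive set-function with $\mathbb{P}(\Omega) = 1$. Countable additivity means $\mathbb{P}\bigl(\bigcup_{n} A_n\bigr) = \sum_{n} \mathbb{P}(A_n)$ for any countable disjoint family $(A_n) \subset \mathcal{F}$. A probability space is a normalized measure space, so the entire integration apparatus of measure theory applies with total mass one.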

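Definition (random variable and its law). Given a measurable space $(S, \mathcal{S})$, a random variable with values in $S$ is a measurable map $X\colon \Omega \to S$, that is, $X^{-1}(B) \in \mathcal{F}$ for every $B \in \mathcal{S}$. The law (or distribution) of $X$ is the pushforward measure $\mathbb{P}_X = \mathbb{P} \circ X^{-1}$ on $(S, \mathcal{S})$, defined by $\mathbb{P}_X(B) = \mathbb{P}(X^{-1}(B))$. The pushforward is a probability measure on $(S, \mathcal{S})$, and the change-of-variables identity $\int_\Omega f(X)\, d\mathbb{P} = \int_S f \, d\mathbb{P}_X$ holds for every non-negative measurable $f\colon S \to [0, \infty]$.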
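On a finite space the pushforward and the change-of-variables identity reduce to finite sums. The sketch below uses two fair dice and their sum as the random variable; the names `P`, `X`, `pushforward`, and the test function `f` are choices made for this illustration only.

```python
from fractions import Fraction
from itertools import product

# Underlying space: ordered pairs of fair die rolls, each of weight 1/36.
P = {w: Fraction(1, 36) for w in product(range(1, 7), repeat=2)}

def X(w):
    """Random variable: the sum of the two rolls."""
    return w[0] + w[1]

def pushforward(P, X):
    """Law of X on a finite space: P_X({s}) = P(X^{-1}({s}))."""
    law = {}
    for w, p in P.items():
        law[X(w)] = law.get(X(w), Fraction(0)) + p
    return law

law = pushforward(P, X)
f = lambda s: s * s  # a non-negative test function

lhs = sum(f(X(w)) * p for w, p in P.items())   # integral of f(X) dP
rhs = sum(f(s) * q for s, q in law.items())    # integral of f dP_X
print(law[7], sum(law.values()), lhs == rhs)   # 1/6 1 True
```

Once `law` is computed, nothing about the 36-point space is needed to evaluate expectations of functions of the sum.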

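Definition (independence of $\sigma$-algebras). Sub-$\sigma$-algebras $\mathcal{G}_1, \dots, \mathcal{G}_n \subset \mathcal{F}$ are independent if $\mathbb{P}(A_1 \cap \dots \cap A_n) = \prod_{i=1}^{n} \mathbb{P}(A_i)$ for every choice $A_i \in \mathcal{G}_i$. An arbitrary family $(\mathcal{G}_t)_{t \in T}$ is independent if every finite subfamily is. Random variables $(X_t)_{t \in T}$ are independent when the generated $\sigma$-algebras $\sigma(X_t)$ are independent; equivalently the joint law of any finite subfamily is the product 02.07.07 of the marginal laws.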

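Definition (cylinder $\sigma$-algebra on a product). Let $(S_t, \mathcal{S}_t)_{t \in T}$ be a family of measurable spaces indexed by a set $T$, and let $S^T = \prod_{t \in T} S_t$ with coordinate projections $\pi_t\colon S^T \to S_t$. For a finite subset $F \subset T$ write $S^F = \prod_{t \in F} S_t$, $\mathcal{S}^F = \bigotimes_{t \in F} \mathcal{S}_t$, and $\pi_F\colon S^T \to S^F$ for the projection onto the coordinates in $F$. A cylinder set with base $F$ is a set $\pi_F^{-1}(B)$ for $B \in \mathcal{S}^F$. The cylinder sets over all finite $F$ form an algebra $\mathcal{C}$. The cylinder (or product) $\sigma$-algebra is $\mathcal{S}^T = \sigma(\mathcal{C})$, the smallest $\sigma$-algebra making every $\pi_t$ measurable.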

Definition (finite-dimensional distributions and consistency). A finite-dimensional distribution (f.d.d.) family is a collection $(\mu_F)$ of probability measures $\mu_F$ on $(S^F, \mathcal{S}^F)$, one for each finite $F \subset T$. The family is consistent (or projective) if it is stable under marginalization: whenever $F \subset G \subset T$ are finite, the canonical projection $\pi_{G,F}\colon S^G \to S^F$ satisfies $\mu_G \circ \pi_{G,F}^{-1} = \mu_F$. Concretely, integrating out the extra coordinates $G \setminus F$ of $\mu_G$ returns $\mu_F$.
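For finitely supported measures the consistency condition is a finite computation. The sketch below is a toy check under that assumption: measures are dictionaries keyed by tuples, index sets are tuples of labels, and `marginal` is a hypothetical helper playing the role of the pushforward under the projection from the larger base to the smaller one.

```python
from fractions import Fraction

def marginal(mu_G, G, F):
    """Push mu_G (a dict on tuples indexed by G) forward to the coordinates F."""
    keep = [G.index(t) for t in F]
    out = {}
    for point, p in mu_G.items():
        key = tuple(point[i] for i in keep)
        out[key] = out.get(key, Fraction(0)) + p
    return out

half, quarter = Fraction(1, 2), Fraction(1, 4)
G, F = ("s", "t"), ("s",)
mu_G = {(0, 0): half, (1, 1): half}           # a perfectly correlated pair
mu_F_good = {(0,): half, (1,): half}          # matches the marginal of mu_G
mu_F_bad = {(0,): quarter, (1,): 3 * quarter} # does not match

print(marginal(mu_G, G, F) == mu_F_good)  # True: consistent
print(marginal(mu_G, G, F) == mu_F_bad)   # False: inconsistent
```

The second comparison shows that consistency is a genuine constraint: each `mu_F` may be a perfectly good probability measure on its own while the family still fails to fit together.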

Counterexamples to common slips Intermediate+

  • The cylinder $\sigma$-algebra is much smaller than the power set. For $T$ uncountable, every set in $\mathcal{S}^T$ depends on only countably many coordinates: if $A \in \mathcal{S}^T$ there is a countable $J \subset T$ and a set $B \in \mathcal{S}^J$ with $A = \pi_J^{-1}(B)$. Hence the singleton $\{\omega\}$ for a fixed path $\omega$ is generally not in $\mathcal{S}^T$, and sets like "the path is everywhere continuous" need not be measurable. This is why path-regularity is a separate construction (a modification or a measure on a smaller path space), not automatic from the extension.

  • Consistency is necessary but the topological hypothesis is also load-bearing. A consistent f.d.d. family need not extend to a countably additive measure if the coordinate spaces are arbitrary. The cylinder pre-measure $\mu_0$ is always finitely additive on $\mathcal{C}$; countable additivity (needed for Carathéodory 02.07.02) requires a regularity/inner-compactness input. The standard sufficient hypothesis is that each $S_t$ is a Polish space (or standard Borel). Andersen and Jessen exhibited consistent families on general spaces with no extension.

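  • Pairwise independence is strictly weaker than independence. Three events can be pairwise independent yet not mutually independent: on two fair coin flips let $A = \{\text{first H}\}$, $B = \{\text{second H}\}$, $C = \{\text{flips differ}\}$. Each pair is independent ($\mathbb{P}(A \cap B) = \mathbb{P}(A \cap C) = \mathbb{P}(B \cap C) = \tfrac14$) but $\mathbb{P}(A \cap B \cap C) = 0 \neq \tfrac18 = \mathbb{P}(A)\,\mathbb{P}(B)\,\mathbb{P}(C)$. Independence is a condition on the joint law of every finite subfamily, not just pairs.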

  • The law lives downstream; the space is interchangeable. Two different probability spaces can carry random variables with the same law. The pushforward $\mathbb{P}_X$ records everything statistically observable about $X$; the underlying $(\Omega, \mathcal{F}, \mathbb{P})$ is scaffolding. This is why the canonical space $(S^T, \mathcal{S}^T, \mu)$ with $X_t = \pi_t$ is a universal model: any process is realized on it by the extension theorem.
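The pairwise-independence counterexample in the list above is small enough to enumerate. The following sketch assumes two fair flips with the uniform weight on the four outcomes; the set names mirror the three events of that bullet.

```python
from fractions import Fraction
from itertools import product

omega = ["".join(p) for p in product("HT", repeat=2)]   # HH, HT, TH, TT

def prob(event):
    """Uniform weight on the four outcomes."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == "H"}      # first flip is heads
B = {w for w in omega if w[1] == "H"}      # second flip is heads
C = {w for w in omega if w[0] != w[1]}     # the flips differ

pairwise = all(prob(E & F) == prob(E) * prob(F)
               for E, F in [(A, B), (A, C), (B, C)])
mutual = prob(A & B & C) == prob(A) * prob(B) * prob(C)
print(pairwise, mutual)  # True False
```

The triple intersection is empty, so its weight is zero, while the product of the three individual weights is one eighth.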

Key theorem with proof Intermediate+

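Theorem (Kolmogorov / Daniell extension; Kolmogorov 1933 Grundbegriffe §III; Daniell 1919 Ann. of Math. 20, 281). Let $T$ be an index set and let each $(S_t, \mathcal{S}_t)$ be a standard Borel space (a Borel subset of a Polish space with its Borel $\sigma$-algebra); for concreteness take each $S_t = \mathbb{R}$ with $\mathcal{S}_t = \mathcal{B}(\mathbb{R})$. Let $(\mu_F)_{F \subset T \text{ finite}}$ be a consistent family of finite-dimensional distributions. Then there exists a unique probability measure $\mu$ on the cylinder $\sigma$-algebra $\mathcal{S}^T$ of $S^T = \prod_{t \in T} S_t$ such that $\mu \circ \pi_F^{-1} = \mu_F$ for every finite $F \subset T$. Equivalently, $\mu(\pi_F^{-1}(B)) = \mu_F(B)$ for all finite $F$ and $B \in \mathcal{S}^F$.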

Proof. Define the cylinder pre-measure $\mu_0$ on the algebra $\mathcal{C}$ of cylinder sets by $\mu_0(\pi_F^{-1}(B)) = \mu_F(B)$.

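Step 1 (well-definedness). A cylinder set has many representations: $\pi_F^{-1}(B) = \pi_G^{-1}(\pi_{G,F}^{-1}(B))$ whenever $F \subset G$ and $B \in \mathcal{S}^F$. Consistency gives $\mu_G(\pi_{G,F}^{-1}(B)) = \mu_F(B)$, so $\mu_0$ does not depend on the representing base. For two arbitrary representations, pass to the common refinement $F \cup G$ and apply consistency to both. Hence $\mu_0$ is well-defined on $\mathcal{C}$, and $\mu_0(S^T) = 1$.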

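Step 2 (finite additivity). If $A, A' \in \mathcal{C}$ are disjoint, choose a finite base $F$ large enough to express both ($A = \pi_F^{-1}(B)$ and $A' = \pi_F^{-1}(B')$ with $B, B' \in \mathcal{S}^F$ disjoint, by enlarging bases as in Step 1). Then $\mu_0(A \cup A') = \mu_F(B \cup B') = \mu_F(B) + \mu_F(B') = \mu_0(A) + \mu_0(A')$, using additivity of the single measure $\mu_F$.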

Step 3 (countable additivity on the algebra: the compactness step). By the standard equivalence, a finitely additive probability on an algebra is countably additive if and only if it is continuous at $\emptyset$: for every decreasing sequence $(A_n)$ in $\mathcal{C}$ with $\bigcap_n A_n = \emptyset$, one has $\mu_0(A_n) \to 0$. We prove the contrapositive: suppose $\mu_0(A_n) \ge \varepsilon > 0$ for all $n$; we show $\bigcap_n A_n \neq \emptyset$.

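Each $A_n$ has a finite base; by enlarging we may take the bases to be an increasing sequence $F_1 \subset F_2 \subset \cdots$ with $A_n = \pi_{F_n}^{-1}(B_n)$, $B_n \in \mathcal{S}^{F_n}$. Each $S^{F_n}$ is a Polish space, so its Borel probability $\mu_{F_n}$ is inner regular by compact sets (tightness on Polish spaces). Choose a compact $K_n \subset B_n$ with $\mu_{F_n}(B_n \setminus K_n) < \varepsilon\, 2^{-n-1}$, and set $C_n = \pi_{F_n}^{-1}(K_n)$. Replacing $C_n$ by $D_n = C_1 \cap \dots \cap C_n$ (still a compact-based cylinder, still inside $A_n$), and using $A_n \setminus D_n \subset \bigcup_{k \le n} (A_k \setminus C_k)$, the union-bound estimate gives $\mu_0(D_n) \ge \mu_0(A_n) - \sum_{k \le n} \mu_0(A_k \setminus C_k) > \varepsilon - \varepsilon/2 > 0$, so each $D_n$ is nonempty.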

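Pick $\omega^{(n)} \in D_n$. Its projection $\pi_{F_m}(\omega^{(n)})$ lies in the compact $K_m$ for all $n \ge m$; pass to a subsequence converging in $K_m$. Diagonalize over the increasing bases $F_1 \subset F_2 \subset \cdots$: by repeatedly extracting subsequences, obtain a single subsequence along which $\pi_{F_m}(\omega^{(n)})$ converges in $K_m$ for every $m$. The limits are coordinatewise consistent and define a point $\omega \in S^T$ (with coordinates outside $\bigcup_m F_m$ chosen arbitrarily) with $\pi_{F_m}(\omega) \in K_m$ for all $m$, hence $\omega \in C_m \subset A_m$ for all $m$. Thus $\bigcap_m A_m \neq \emptyset$, which proves the contrapositive. So $\mu_0$ is countably additive on $\mathcal{C}$.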

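Step 4 (Carathéodory extension). A countably additive probability $\mu_0$ on the algebra $\mathcal{C}$ extends, by the Carathéodory extension theorem 02.07.02, to a measure $\mu$ on $\sigma(\mathcal{C}) = \mathcal{S}^T$. Since $\mu(S^T) = \mu_0(S^T) = 1$, $\mu$ is a probability measure, and $\mu(\pi_F^{-1}(B)) = \mu_0(\pi_F^{-1}(B)) = \mu_F(B)$ gives $\mu \circ \pi_F^{-1} = \mu_F$ for every finite $F$.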

Step 5 (uniqueness). The cylinder sets form a $\pi$-system generating $\mathcal{S}^T$, and any two probability measures agreeing on a generating $\pi$-system agree on the $\sigma$-algebra (Dynkin's $\pi$-$\lambda$ theorem). Two measures with the same f.d.d.'s agree on all cylinders, hence coincide.

Bridge. This theorem builds toward the entire probabilistic spine and appears again in every later construction of a stochastic process from its finite-dimensional laws. The foundational reason it is the right organizing tool is that it converts an analytically intractable object, a measure on an uncountable-dimensional path space, into the elementary data of consistent finite-dimensional marginals. This is exactly the move that constructs an i.i.d. sequence (take $\mu_F = \nu^{\otimes F}$ for a fixed marginal $\nu$, automatically consistent by Fubini-Tonelli 02.07.07) and, more generally, any process once its f.d.d.'s are specified. The compactness argument of Step 3 generalises the tightness criterion that recurs in weak-convergence theory, and the Carathéodory backbone is exactly the one used to build Lebesgue and product measures 02.07.02, so putting these together the extension theorem is the probabilistic face of the same outer-measure machine. The central insight is that consistency of marginals is necessary, and together with a topological regularity hypothesis on the coordinate spaces it is sufficient, to manufacture a single coherent law; this law is the canonical model on which the process is realized by the coordinate maps.

Exercises Intermediate+

Advanced results Master

The structure organizing the foundations splits into the canonical-space realization, the projective-limit formulation, the role of regularity, the regular-conditional-distribution refinement, and the special-case constructions (i.i.d. sequences, Markov chains via Ionescu-Tulcea, Gaussian processes) that the extension theorem subsumes or complements.

Theorem 1 (canonical realization). For any consistent f.d.d. family $(\mu_F)$ on standard Borel coordinates, the canonical space $(S^T, \mathcal{S}^T, \mu)$ with coordinate processes $X_t = \pi_t$ realizes a process whose finite-dimensional distributions are exactly the prescribed $\mu_F$. Every process $(Y_t)_{t \in T}$ with those f.d.d.'s on any other space $(\Omega', \mathcal{F}', \mathbb{P}')$ has the same law as the coordinate process under the law-isomorphism $\omega' \mapsto (Y_t(\omega'))_{t \in T}$. The canonical space is therefore universal: it is the terminal object among realizations, and statistical statements transfer along the law map.

Theorem 2 (projective-limit formulation; Bochner). A consistent family $(\mu_F)$ is a projective system of probability measures along the directed set of finite subsets $F \subset T$ ordered by inclusion, with bonding maps the marginal projections $\pi_{G,F}$. The Kolmogorov measure $\mu$ is the projective limit $\mu = \varprojlim_F \mu_F$ in the category of probability spaces, characterized by $\mu \circ \pi_F^{-1} = \mu_F$ together with the universal property that any cone of measure-preserving maps factors uniquely through $(S^T, \mathcal{S}^T, \mu)$. Bochner's theorem on projective limits gives existence under the same regularity hypothesis; the cylinder-algebra proof and the projective-limit proof are two presentations of one construction.

Theorem 3 (Nelson regularity / inner-compact criterion; Nelson 1959 Ann. of Math. 69, 630). Countable additivity of the cylinder pre-measure $\mu_0$ holds whenever each $\mu_F$ is inner regular by compact sets in a topology on $S^F$ that is compatible across marginalization. Polish coordinate spaces satisfy this; more generally, a perfect family or a Radon family suffices. The criterion isolates the exact analytic input: the algebraic data (consistency) is not enough on its own in general, so a further ingredient, here topological tightness, is required, and the requirement is sharp in the sense of the Dieudonné counterexample.

Theorem 4 (regular conditional distributions and disintegration). On standard Borel spaces, for a random variable $X$ and a sub-$\sigma$-algebra $\mathcal{G} \subset \mathcal{F}$, there is a regular conditional distribution: a kernel $\kappa(\omega, B)$ with $\kappa(\omega, \cdot)$ a probability measure and $\kappa(\cdot, B)$ a version of $\mathbb{P}(X \in B \mid \mathcal{G})$. The disintegration theorem factors a joint law into a marginal and a measurable family of conditionals, $\mathbb{P}_{(X,Y)}(dx, dy) = \mathbb{P}_X(dx)\, \kappa(x, dy)$. This refines the product construction: independence is the special case $\kappa(x, \cdot)$ constant in $x$. Standard Borel structure is exactly what makes such regular versions exist.
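In the discrete case the disintegration is just division by the marginal. The sketch below assumes a joint law with finite support in which every listed point has positive weight; `disintegrate` and the two example laws are hypothetical and serve only to illustrate the factorization and the constant-kernel characterization of independence.

```python
from fractions import Fraction

def disintegrate(joint):
    """Split a joint pmf on pairs (x, y) into a marginal in x and a kernel x -> law of y."""
    marg = {}
    for (x, _), p in joint.items():
        marg[x] = marg.get(x, Fraction(0)) + p
    kernel = {x: {} for x in marg}
    for (x, y), p in joint.items():
        kernel[x][y] = p / marg[x]   # assumes marg[x] > 0
    return marg, kernel

q = Fraction(1, 4)
dependent = {(0, 0): 2 * q, (1, 0): q, (1, 1): q}
independent = {(x, y): q for x in (0, 1) for y in (0, 1)}

for joint in (dependent, independent):
    marg, kernel = disintegrate(joint)
    # Recombine marginal and kernel; this should reproduce the joint law.
    rebuilt = {(x, y): marg[x] * k for x in marg for y, k in kernel[x].items()}
    # The kernel is constant in x exactly when all conditional laws coincide.
    constant = len({tuple(sorted(k.items())) for k in kernel.values()}) == 1
    print(rebuilt == joint, constant)
# prints "True False" for the dependent law and "True True" for the independent one
```

In the general theorem the division step is unavailable (conditioning events can have probability zero), which is where the standard Borel hypothesis does its work.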

Theorem 5 (Ionescu-Tulcea: extension without topology for kernels). If instead of a consistent family one is given a sequence of probability kernels $\kappa_n$ from $S_0 \times \cdots \times S_{n-1}$ to $S_n$ and an initial law $\nu_0$ on $S_0$, the Ionescu-Tulcea theorem constructs a measure on $\prod_{n \ge 0} S_n$ with the prescribed conditionals, requiring no topological hypothesis on the $S_n$. This is the natural tool for Markov chains and for any sequential model defined by transition kernels. The Kolmogorov theorem and the Ionescu-Tulcea theorem are complementary: Kolmogorov needs consistency plus topology and handles arbitrary index sets; Ionescu-Tulcea needs a kernel sequence (an ordered index) and dispenses with topology.
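A two-state Markov chain gives a finite illustration of the kernel route. The initial law `init` and transition kernel `step` below are invented for the sketch; the point is only that path laws built by composing kernels are automatically consistent, because each kernel row sums to one.

```python
from fractions import Fraction
from itertools import product

half, third = Fraction(1, 2), Fraction(1, 3)
init = {0: half, 1: half}                      # initial law (hypothetical)
step = {0: {0: third, 1: 2 * third},           # transition kernel (hypothetical)
        1: {0: half, 1: half}}

def path_law(n):
    """Law of (X_0, ..., X_n): initial weight times successive kernel weights."""
    law = {}
    for path in product((0, 1), repeat=n + 1):
        p = init[path[0]]
        for a, b in zip(path, path[1:]):
            p *= step[a][b]
        law[path] = p
    return law

def drop_last(law):
    """Marginalize out the final coordinate."""
    out = {}
    for path, p in law.items():
        out[path[:-1]] = out.get(path[:-1], Fraction(0)) + p
    return out

print(all(drop_last(path_law(n + 1)) == path_law(n) for n in range(4)))  # True
```

Here the kernel depends only on the last state; the theorem as stated allows each kernel to depend on the whole past.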

Synthesis. The foundational reason the probability spine can be built at all is that the Kolmogorov extension theorem reduces the construction of a measure on an uncountable-dimensional space to consistent finite-dimensional data, and this is exactly the move that recurs at every level above it. Putting these together, the canonical-space realization (Theorem 1) makes the coordinate process the universal model, the projective-limit formulation (Theorem 2) is dual to the cylinder-algebra construction and exhibits $\mu$ as $\varprojlim_F \mu_F$, and the regularity criterion (Theorem 3) isolates the load-bearing tightness input that the algebraic consistency cannot supply. The central insight is that consistency generalises the Fubini-Tonelli product structure 02.07.07: an i.i.d. law is the product family, a Markov chain is the kernel-composed family, and a Gaussian process is the family of multivariate normals with a consistent covariance, each a different consistent system fed to the same machine. The bridge is between the outer-measure Carathéodory backbone 02.07.02 and the entire theory of stochastic processes: the extension theorem appears again in the construction of Brownian motion (where Kolmogorov supplies the law on $\mathbb{R}^{[0,\infty)}$ and a separate continuity theorem selects a continuous modification), in the strong law and the ergodic theorem (which live on the canonical i.i.d. space), and in the disintegration theory (Theorem 4) that conditions the canonical law. This is the structural fact that organises the foundations: every later existence result is a specialization, and the regularity-versus-kernels dichotomy (Kolmogorov versus Ionescu-Tulcea) is the recurring decision about which hypothesis to pay for.

Full proof set Master

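Proposition 1 (countable additivity equals continuity at $\emptyset$). Let $\mu_0$ be a finitely additive probability content on an algebra $\mathcal{A}$. Then $\mu_0$ is countably additive on $\mathcal{A}$ (in the sense that $A = \bigsqcup_n A_n \in \mathcal{A}$ with $A_n \in \mathcal{A}$ disjoint implies $\mu_0(A) = \sum_n \mu_0(A_n)$) if and only if $A_n \downarrow \emptyset$ in $\mathcal{A}$ implies $\mu_0(A_n) \to 0$.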

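Proof. ($\Rightarrow$) If $A_n \downarrow \emptyset$, the differences $D_k = A_k \setminus A_{k+1}$ are disjoint with $A_n = \bigsqcup_{k \ge n} D_k$ (union in $\mathcal{A}$ as a tail). Countable additivity gives $\mu_0(A_1) = \sum_{k \ge 1} \mu_0(D_k) < \infty$, so the tail $\mu_0(A_n) = \sum_{k \ge n} \mu_0(D_k) \to 0$. ($\Leftarrow$) Suppose continuity at $\emptyset$. Let $A = \bigsqcup_n A_n$ with $A_n$ disjoint in $\mathcal{A}$. The remainders $R_N = A \setminus \bigsqcup_{n \le N} A_n$ lie in $\mathcal{A}$, decrease to $\emptyset$, so $\mu_0(R_N) \to 0$. Finite additivity gives $\mu_0(A) = \sum_{n \le N} \mu_0(A_n) + \mu_0(R_N)$; letting $N \to \infty$ yields $\mu_0(A) = \sum_n \mu_0(A_n)$.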

Proposition 2 (well-definedness of the cylinder pre-measure). Under consistency, $\mu_0(\pi_F^{-1}(B)) = \mu_F(B)$ is independent of the chosen base representation.

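Proof. Suppose $C = \pi_F^{-1}(B) = \pi_G^{-1}(B')$. Let $H = F \cup G$ with projections $\pi_{H,F}\colon S^H \to S^F$, $\pi_{H,G}\colon S^H \to S^G$. Pulling back, $\pi_{H,F}^{-1}(B) = \pi_{H,G}^{-1}(B')$ as subsets of $S^H$ (both equal $\pi_H(C)$, the base of $C$ over $H$, since $\pi_H$ is surjective). Consistency gives $\mu_F(B) = \mu_H(\pi_{H,F}^{-1}(B)) = \mu_H(\pi_{H,G}^{-1}(B')) = \mu_G(B')$. Hence the two candidate values coincide.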

Proposition 3 (tightness on Polish spaces). Every Borel probability measure $\mu$ on a Polish space $S$ is inner regular by compact sets: $\mu(B) = \sup\{\mu(K) : K \subset B,\ K \text{ compact}\}$ for every Borel $B \subset S$.

Proof. Fix a complete separable metric. For $\varepsilon > 0$ and each $k \ge 1$, separability gives countably many closed balls of radius $1/k$ covering $S$; finitely many of them, say with union $U_k$, satisfy $\mu(U_k) > 1 - \varepsilon\, 2^{-k}$. The set $K = \bigcap_k U_k$ is closed and totally bounded, hence compact by completeness, with $\mu(S \setminus K) \le \sum_k \varepsilon\, 2^{-k} = \varepsilon$. Thus $\mu(K) \ge 1 - \varepsilon$. Inner regularity of Borel sets then follows from outer/inner regularity by closed/open sets (standard for metric measures) intersected with this compact exhaustion.
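For a concrete feel for tightness, take the standard normal law on the real line, where the compact sets can be chosen as intervals $[-r, r]$ and the mass is $\operatorname{erf}(r/\sqrt{2})$. The sketch below is only this one-dimensional special case; `compact_radius` is a hypothetical helper, and restricting to integer radii is a simplification.

```python
import math

def compact_radius(eps):
    """Smallest integer r with N(0,1)([-r, r]) >= 1 - eps."""
    r = 1
    while math.erf(r / math.sqrt(2)) < 1 - eps:
        r += 1
    return r

for eps in (0.1, 0.01, 0.001):
    r = compact_radius(eps)
    # The compact interval [-r, r] carries all but at most eps of the mass.
    print(eps, r, math.erf(r / math.sqrt(2)) >= 1 - eps)
```

However small the tolerance, a compact interval capturing all but that much mass exists; the proposition says the same holds for every Borel probability measure on every Polish space.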

Proposition 4 (the compactness/diagonal step). With the notation of Step 3 of the Key theorem, if $\mu_0(A_n) \ge \varepsilon > 0$ for all $n$ then $\bigcap_n A_n \neq \emptyset$.

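Proof. As constructed, $D_n = C_1 \cap \dots \cap C_n$ with $C_k = \pi_{F_k}^{-1}(K_k)$ are nonempty compact-based cylinders with $D_n \subset A_n$ and $D_{n+1} \subset D_n$. Pick $\omega^{(n)} \in D_n$. The points $\pi_{F_1}(\omega^{(n)}) \in K_1$ (compact) admit a convergent subsequence; inductively refine so that for each fixed $m$, $\pi_{F_m}(\omega^{(n)})$ converges in $K_m$ along a single diagonal subsequence. The limits are consistent under the projections $\pi_{F_{m+1}, F_m}$ (projections are continuous, limits commute), so they assemble into $\omega \in S^T$ with $\pi_{F_m}(\omega) \in K_m$ for every $m$. Then $\omega$ lies in each of the $D_m$-defining sets $C_k$, $k \le m$, giving $\omega \in D_m \subset A_m$ for every $m$, so $\bigcap_m A_m \neq \emptyset$.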

Proposition 5 (uniqueness via $\pi$-$\lambda$). A probability measure on $\mathcal{S}^T$ is determined by its values on cylinders.

Proof. As in Exercise 7: cylinders form a $\pi$-system generating $\mathcal{S}^T$; the agreement set of two probability measures is a $\lambda$-system; Dynkin's theorem forces agreement on $\sigma(\text{cylinders}) = \mathcal{S}^T$.

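Proposition 6 (pushforward is a probability measure). If $X\colon (\Omega, \mathcal{F}) \to (S, \mathcal{S})$ is measurable then $\mathbb{P}_X = \mathbb{P} \circ X^{-1}$ is a probability measure on $(S, \mathcal{S})$.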

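Proof. $\mathbb{P}_X(S) = \mathbb{P}(X^{-1}(S)) = \mathbb{P}(\Omega) = 1$. For disjoint $B_n \in \mathcal{S}$, the preimages $X^{-1}(B_n)$ are disjoint in $\mathcal{F}$ and $X^{-1}\bigl(\bigcup_n B_n\bigr) = \bigcup_n X^{-1}(B_n)$, so countable additivity of $\mathbb{P}$ transfers: $\mathbb{P}_X\bigl(\bigcup_n B_n\bigr) = \sum_n \mathbb{P}_X(B_n)$. Non-negativity is inherited.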

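Proposition 7 (consistency of i.i.d. and of Gaussian families). The product family $\mu_F = \nu^{\otimes F}$ is consistent (Exercise 3). The Gaussian family $\mu_F = \mathcal{N}(0, \Sigma_F)$ with $\Sigma_F = (K(s,t))_{s,t \in F}$, for a symmetric positive-semidefinite kernel $K\colon T \times T \to \mathbb{R}$, is consistent.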

Proof. For the Gaussian case, marginalizing a multivariate normal $\mathcal{N}(0, \Sigma_G)$ onto the coordinates in $F \subset G$ yields $\mathcal{N}(0, \Sigma_F)$, because the marginal of a jointly Gaussian vector is Gaussian with the corresponding sub-covariance block (the projection of a normal law is normal, with covariance the principal submatrix). Positive-semidefiniteness of $K$ guarantees each $\Sigma_F$ is a valid covariance, so each $\mu_F$ exists. Marginal-consistency is the sub-block identity, so the family is projective and the extension yields a centered Gaussian process with covariance $K$.
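The sub-block identity can be checked by hand for a small index set. The sketch below uses the kernel $K(s,t) = \min(s,t)$ purely as an illustrative choice and relies on the standard fact that the image of $\mathcal{N}(0, \Sigma)$ under a linear map $P$ is $\mathcal{N}(0, P \Sigma P^{\mathsf T})$; the helper names `cov` and `project` are hypothetical.

```python
def cov(kernel, idx):
    """Covariance matrix (K(s, t)) for s, t in idx, as nested lists."""
    return [[kernel(s, t) for t in idx] for s in idx]

def project(sigma, G, F):
    """Covariance of the F-coordinates of a centered normal on G: P sigma P^T."""
    # P is the 0/1 coordinate-selection matrix of the projection from G to F.
    P = [[1 if g == f else 0 for g in G] for f in F]
    PS = [[sum(P[i][k] * sigma[k][j] for k in range(len(G))) for j in range(len(G))]
          for i in range(len(F))]
    return [[sum(PS[i][k] * P[j][k] for k in range(len(G))) for j in range(len(F))]
            for i in range(len(F))]

brownian = min                  # K(s, t) = min(s, t), an illustrative kernel
G, F = [1, 2, 3, 5], [2, 5]
print(project(cov(brownian, G), G, F) == cov(brownian, F))  # True
```

Since a centered normal law is determined by its covariance, equality of the two matrices is exactly the consistency of the pair of Gaussian marginals on `G` and `F`.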

Connections Master

  • The product-measure and Fubini-Tonelli theory 02.07.07 is the finite-dimensional engine of consistency: an i.i.d. family is the projective system of product measures, and marginalizing a product is precisely a Tonelli integration over the dropped coordinates. The extension theorem is the infinite-index continuation of the finite product construction, so this unit generalises 02.07.07 from finite to arbitrary index sets.

  • The Carathéodory extension and outer-measure construction 02.07.02 supplies the backbone of Step 4: the cylinder pre-measure, once shown countably additive on the algebra, is extended to the generated $\sigma$-algebra by the same outer-measure machine that builds Lebesgue measure. The probabilistic content is entirely in establishing countable additivity; the lift is borrowed verbatim from 02.07.02.

  • The $\sigma$-algebra and Borel-space foundations 02.07.01 define the cylinder $\sigma$-algebra and the standard-Borel hypothesis that makes the regularity step work; measurability of the coordinate maps and of random variables is exactly the framework of 02.07.01, and the countable-coordinate-dependence phenomenon is a structural fact about product $\sigma$-algebras introduced there.

  • The elementary rules and distributions of probability 26.02.01 are the concrete shadow of this unit: the axioms and additivity, independence as a product rule, and named distributions all reappear here as the measure-theoretic objects they abstract, with 26.02.01's finite and discrete cases embedded as the f.d.d.'s the extension theorem stitches together.

Historical & philosophical context Master

The measure-theoretic foundation of probability was settled by Andrei Kolmogorov's 1933 monograph Grundbegriffe der Wahrscheinlichkeitsrechnung [Kolmogorov 1933], which identified a random experiment with a measure space of total mass one and a random variable with a measurable function. The extension theorem appears in Chapter III, where Kolmogorov constructs measures on infinite-dimensional product spaces from consistent finite-dimensional distributions, the existence result that legitimized the study of stochastic processes as honest mathematical objects rather than heuristic limits.

The analytic heart of the construction predates Kolmogorov. Percy Daniell's 1919 paper Integrals in an infinite number of dimensions [Daniell 1919] built integration on infinite products and proved the extension for product-type families, and the result is often called the Daniell-Kolmogorov theorem in recognition. The Carathéodory extension machinery it relies on came from Constantin Carathéodory's 1918 Vorlesungen über reelle Funktionen [Caratheodory 1918], the outer-measure axiomatization used identically for Lebesgue measure.

The topological hypothesis was clarified over the following decades. Jean Dieudonné (1948) and the work surveyed by Erik Sparre Andersen and Børge Jessen produced consistent families on non-regular coordinate spaces with no countably additive extension, showing that consistency alone is insufficient. Edward Nelson's 1959 Regular probability measures on function space [Nelson 1959] gave a clean inner-compactness criterion, and K. R. Parthasarathy's 1967 Probability Measures on Metric Spaces [Parthasarathy 1967] established the theorem in its standard Polish-space form, with the Ionescu-Tulcea kernel construction providing the complementary topology-free route for sequentially defined models.

Bibliography Master

@book{Kolmogorov1933,
  author    = {Kolmogorov, Andrei N.},
  title     = {Grundbegriffe der {W}ahrscheinlichkeitsrechnung},
  series    = {Ergebnisse der Mathematik und ihrer Grenzgebiete},
  publisher = {Springer},
  address   = {Berlin},
  year      = {1933}
}

@article{Daniell1919,
  author  = {Daniell, Percy J.},
  title   = {Integrals in an infinite number of dimensions},
  journal = {Annals of Mathematics (2)},
  volume  = {20},
  year    = {1919},
  pages   = {281--288}
}

@book{Caratheodory1918,
  author    = {Carath\'eodory, Constantin},
  title     = {Vorlesungen \"uber reelle {F}unktionen},
  publisher = {Teubner},
  address   = {Leipzig and Berlin},
  year      = {1918}
}

@article{Nelson1959,
  author  = {Nelson, Edward},
  title   = {Regular probability measures on function space},
  journal = {Annals of Mathematics (2)},
  volume  = {69},
  year    = {1959},
  pages   = {630--643}
}

@book{Parthasarathy1967,
  author    = {Parthasarathy, K. R.},
  title     = {Probability Measures on Metric Spaces},
  publisher = {Academic Press},
  address   = {New York},
  year      = {1967}
}

@book{Durrett2019,
  author    = {Durrett, Rick},
  title     = {Probability: Theory and Examples},
  edition   = {5},
  publisher = {Cambridge University Press},
  year      = {2019}
}

@book{Kallenberg2021,
  author    = {Kallenberg, Olav},
  title     = {Foundations of Modern Probability},
  edition   = {3},
  publisher = {Springer},
  year      = {2021}
}

@book{Billingsley1995,
  author    = {Billingsley, Patrick},
  title     = {Probability and Measure},
  edition   = {3},
  publisher = {Wiley},
  year      = {1995}
}

@book{Bogachev2007v2,
  author    = {Bogachev, Vladimir I.},
  title     = {Measure Theory, Volume 2},
  publisher = {Springer},
  year      = {2007}
}