42.03.05 · mathematical-logic / set-theory-forcing

The Axiom of Choice and Its Equivalents

shipped3 tiersLean: none

Anchor (Master): Kunen 2011 *Set Theory* (College Publications) Ch. I §12 and the independence remarks in Ch. IV-VII (Cohen forcing) and the Fraenkel-Mostowski permutation models; Jech 1973 *The Axiom of Choice* (North-Holland) Ch. 1-5 (equivalents, weak forms ACω and DC, the Banach-Tarski paradox, Vitali sets, independence); Herrlich 2006 *Axiom of Choice* (Springer LNM 1876) for the disasters-with-and-without panorama

Intuition Beginner

Imagine you own infinitely many pairs of socks and you want to grab exactly one sock from each pair. With shoes this is simple: take the left shoe from every pair, and you are done with a single uniform rule. Socks are different. The two socks in a pair are identical, so there is no rule like "take the left one." You can still believe a selection exists — surely you could reach into each pair and pull one out — but you cannot write down a recipe that names your pick in every pair at once. The Axiom of Choice is exactly the assumption that such a simultaneous selection always exists, even when no rule describes it.

For finitely many pairs you never need the axiom: pick from the first, then the second, and stop. The trouble starts only with infinitely many collections, where "keep picking" never finishes. The axiom says the finished result is there anyway, as a single object that chooses one element from each set at once.

This sounds modest, and most working mathematicians use it without a second thought. What makes it remarkable is that it is equivalent to several statements that look completely different: that every collection can be lined up in a row with a first element, a next element, and so on past infinity; and that certain step-by-step building processes always reach a stopping point you cannot improve. In this unit you meet the axiom, three of its famous disguises, and a glimpse of why some of its consequences are strange.

Visual Beginner

The axiom of choice is a selection: from each box in a family of nonempty boxes, it hands you exactly one item, all at once. The picture shows three of infinitely many boxes and the selected element circled in each.

   box A        box B        box C        ...  (infinitely many boxes)
  +-------+    +-------+    +-------+
  | . (o) |    | (o) . |    | . . (o)|     o = the element the
  | .   . |    |  .  . |    | (o). . |         choice function picks
  +-------+    +-------+    +-------+
      |            |            |
      v            v            v
   pick a_A     pick a_B     pick a_C      ...  one pick per box, all at once

   the OUTPUT is a single function:  box  -->  one element of that box

Read it left to right. Each box is a nonempty set; the choice function reaches into every box and returns one of its members. For finitely many boxes you could just point at picks one box at a time. The content of the axiom is that the function exists even when there are infinitely many boxes and no uniform rule tells you which member to take.

family of boxes is a choice function guaranteed without the axiom?
finitely many nonempty boxes yes, pick one at a time
pairs of shoes (left/right) yes, the rule "take the left" works
pairs of identical socks not by any rule; the axiom asserts one exists
every nonempty set of reals not in general; the axiom asserts one exists

The key idea: a choice function is one object that selects from every box simultaneously, and the axiom is the promise that this object is always available.

Worked example Beginner

Let us see the axiom in action on a small family, and then see why a slightly bigger family forces the issue. Write a choice function for a family as a rule that returns one member of each set in the family.

Step 1. Take the family of three sets , , and . A choice function picks one number from each. One valid choice is: from take ; from take ; from take . So the choice function outputs the three numbers .

Step 2. Notice we did not need any axiom here. There are only three sets, so we just made three picks by hand and finished. Every finite family can be handled this way.

Step 3. Now make the family infinite but still ruled: for each whole number , take the set , giving , , , and so on forever. A simple rule selects from every one of these at once: "take the positive member." That rule outputs , one pick per set, with no axiom needed, because the rule names the pick.

Step 4. Finally, break the rule. Split the real number line into groups where two numbers go in the same group when their difference is a fraction. Each group is nonempty, but the groups have no order and no labels, so there is no describable rule like "take the positive one" or "take the smallest." Here a hand-written recipe runs out. The axiom of choice steps in and asserts that a selection across all the groups exists anyway.

What this tells us: when a rule names your pick, you never need the axiom; when the sets are too featureless for any rule, the axiom is the only thing that delivers a simultaneous selection.

Check your understanding Beginner

Formal definition Intermediate+

All of the following takes place inside ZF (cross-ref 42.03.01); the only primitive is , and ordinals, well-orders, and transfinite recursion are as developed in 42.03.02. A choice function on a family of sets is a function with domain such that for every nonempty .

The Axiom of Choice (AC) is the sentence: every family of nonempty sets has a choice function. Three equivalent formulations are standard [Kunen Ch. I §12]: every set of nonempty pairwise-disjoint sets has a transversal (a set meeting each member in exactly one point); every surjection has a right inverse with ; and the Cartesian product of nonempty sets is nonempty.

A partial order is a reflexive, antisymmetric, transitive relation. A chain is a subset linearly ordered by . An element is maximal if no has . A chain is bounded above if some has for all ; a poset is chain-bounded when every chain (the empty chain included, so ) has an upper bound.

The four principal equivalents of AC, all stated over ZF, are:

  • Well-Ordering Theorem (WO). Every set admits a well-ordering: a linear order in which every nonempty subset has a least element.
  • Zorn's Lemma (ZL). Every chain-bounded partial order has a maximal element.
  • Hausdorff Maximal Principle (HMP). In every partial order, each chain is contained in a -maximal chain.
  • Cardinal Comparability (CC). For any two sets , either injects into or injects into .

A family of sets has finite character if exactly when every finite subset of belongs to . The Teichmüller-Tukey Lemma (TT) states that every nonempty family of finite character has a -maximal member; it is the form of the maximal principle tuned to the existence of bases, maximal ideals, and ultrafilters.

A weaker relative is the Boolean Prime Ideal theorem (BPI), equivalently the Ultrafilter Lemma: every Boolean algebra has a prime ideal, equivalently every filter on a set extends to an ultrafilter. Among the fragments analysis uses, Countable Choice (ACω) asserts a choice function for every countable family of nonempty sets, and Dependent Choice (DC) asserts that for any entire relation on a nonempty set () there is a sequence with for all . The strict implications hold and neither reverses [Jech AC Ch. 2].

Counterexamples to common slips Intermediate+

  • "Choice functions always require the axiom." For a family of nonempty sets of natural numbers, "take the least element" is a choice function definable in ZF with no appeal to AC. The axiom is needed only when the members carry no definable selector — a well-ordering of the union supplies one uniformly, which is precisely why WO implies AC.

  • "Zorn's Lemma needs the maximum of a chain, not merely an upper bound." The upper bound of a chain need not lie in the chain; Zorn requires only that some element of dominate the chain. Demanding the bound be the chain's maximum would be false in most applications (the union of a chain of subspaces bounds it but is rarely in the chain).

  • "BPI is just AC for Boolean algebras, hence equivalent to it." BPI is strictly weaker: it holds in the basic Cohen model where AC fails, yet it already yields the Ultrafilter Lemma, the compactness theorem for first-order logic, and Hahn-Banach. The gap is the content of the Halpern-Läuchli and Halpern-Lévy independence results.

  • "Countable choice suffices for everything in analysis." DC, not ACω, is what underwrites the equivalence of sequential and topological continuity on general metric spaces and the existence of choice sequences in the Baire-category and completeness arguments; ACω alone is too weak for some of these, and full AC is needed for a basis of every vector space.

Key theorem with proof Intermediate+

The organising theorem of the subject is that the four headline statements are one statement in four costumes: a fixed choice function can be run along the ordinals to well-order any set, a well-ordering instantly yields choice functions and maximal elements, and a maximal element feeds back a choice function. The proof exhibits a cycle of implications.

Theorem (equivalence of AC, WO, ZL). Over ZF, the following are equivalent: (i) the Axiom of Choice; (ii) the Well-Ordering Theorem; (iii) Zorn's Lemma [Kunen Ch. I §12].

Proof. We show (i) (ii) (iii) (i).

(i) (ii). Let be a set; by AC fix a choice function on , so for every nonempty . Pick a set . Define a class function on ordinals by transfinite recursion (cross-ref 42.03.02): given the values already assigned below , set $$ a_\alpha = G(\langle a_\beta : \beta < \alpha\rangle) = \begin{cases} f\big(A \setminus {a_\beta : \beta < \alpha}\big) & \text{if } A \setminus {a_\beta : \beta < \alpha} \neq \varnothing,\[2pt]

  • & \text{otherwise.} \end{cases} $$ While the enumerated part omits some element of , the recursion picks a fresh element , and these are pairwise distinct (each lies outside ). By Hartogs' theorem there is an ordinal that does not inject into (the proof below in the Advanced section), so the injection cannot run through all of without hitting the value . Let be the least ordinal with ; then and is a bijection , transporting the ordinal order on to a well-ordering of .

(ii) (iii). Let be chain-bounded; assume for contradiction it has no maximal element, so every has a strict successor. Well-order as by (ii). Build a chain by recursion: having chosen an increasing chain , let be an upper bound (chain-boundedness), and set to be the -least-indexed strictly above — one exists because is not maximal. Each strictly exceeds all earlier , so (index of ) is a strictly increasing class function from into the set of indices , impossible since is a proper class (Burali-Forti, cross-ref 42.03.02). Hence has a maximal element.

(iii) (i). Let be a family of nonempty sets. Consider the poset of partial choice functions on — functions with and — ordered by inclusion of graphs. The union of a chain of partial choice functions is again one (single-valuedness is preserved along a chain), so every chain is bounded above by its union; is chain-bounded and nonempty (the empty function lies in it). By ZL it has a maximal element . If some were outside , picking any (which exists, ) extends by the pair , contradicting maximality. So and is a choice function.

Bridge. The equivalence is the foundational reason the cardinal and ordinal machinery of 42.03.02 and 42.03.03 applies to arbitrary sets rather than only to manifestly well-orderable ones: the Well-Ordering Theorem is exactly what lets every set inherit an ordinal length, so the counting theorem assigns it a cardinal. This is exactly the cycle a working mathematician invokes whenever a maximal object is summoned, since Zorn's Lemma is the face of AC dual to direct construction — instead of enumerating along the ordinals one bounds chains and reads off a maximum. The Hartogs aleph, run through Zermelo's recursion, generalises ordinary induction past every infinite stage, and the whole argument builds toward cardinal comparability, where Hartogs again forces one of two sets to inject into the other. Putting these together, the central insight is that a single choice function, fed into transfinite recursion, manufactures a well-ordering, and a well-ordering manufactures every other selection in sight; this circle appears again in the constructible universe 42.03.06, whose internal well-ordering makes AC a theorem rather than an axiom.

Exercises Intermediate+

Advanced results Master

The Hartogs aleph is the engine that converts a choice function into a well-ordering and forces cardinal comparability; the maximal principles are the chain-bounding reformulation; and the independence results mark the boundary the whole circle respects.

Theorem 1 (Hartogs' theorem; ZF). For every set there is a least ordinal that does not inject into [Jech AC Ch. 2]. The well-orderable subsets of carry well-orderings whose order types form a set of ordinals by Replacement (each type is the image of an injection from an ordinal into , and the collection of such injections is a subset of cut out by Separation); is an ordinal, and it cannot inject into lest it appear in and exceed itself. Hartogs' theorem needs no choice — it is the choice-free reservoir of ordinals that Zermelo's recursion exhausts, and it is exactly what makes the enumeration terminate.

Theorem 2 (cardinal comparability is equivalent to AC). Over ZF, AC holds if and only if any two sets are -comparable by injection [Kunen Ch. I §12]. AC gives WO, so any two sets become well-orderable, hence comparable as ordinals project to comparable cardinals. Conversely, given comparability, compare an arbitrary with its Hartogs ordinal : since does not inject into , comparability forces to inject into , and an injection of into an ordinal transports the ordinal's well-ordering back to — so is well-orderable, and WO (hence AC) follows.

Theorem 3 (Tychonoff's theorem is equivalent to AC). The statement "every product of compact topological spaces is compact" is equivalent to AC over ZF (Kelley) [Jech AC Ch. 2]. Tychonoff AC because the one-point compactifications of nonempty sets with the cofinite-plus-point topology are compact, and a point of the nonempty compact product avoiding every on a suitable closed slice yields a choice function. The Tychonoff theorem restricted to Hausdorff spaces drops in strength to exactly BPI — the ultrafilter-convergence proof of compactness needs only an ultrafilter on each index, not a full choice function, which is why the Hausdorff case sits at the BPI level.

Theorem 4 (independence of AC from ZF). If ZF is consistent, so is [Jech AC Ch. 4-5]. Two methods establish this. The Fraenkel-Mostowski permutation models build a model of ZF with atoms in which a group of atom-permutations acts and one restricts to hereditarily symmetric sets; choosing the group so that an infinite set of atoms (e.g. Russell's "socks", a family of unordered pairs) has no symmetric choice function gives , though atoms keep this from being a model of full ZF. Cohen's symmetric extension transplants the construction into pure ZF: forcing adds a denumerable set of mutually generic reals, and the symmetric submodel fixed by a group of automorphisms of the forcing contains the set but no well-ordering of it, so AC fails while ZF holds. The full forcing apparatus is developed in 42.03.06; here it is recorded that is consistent and that the consistency of already yields the consistency of via Gödel's , the converse arrow co-produced in 42.03.06.

Theorem 5 (pathologies of full AC). AC yields a non-Lebesgue-measurable set and the Banach-Tarski paradox [Jech AC Ch. 1]. Vitali's set picks one representative from each coset of in — a choice function on the quotient — and its rational translates partition into countably many congruent pieces, an arrangement no countably additive translation-invariant measure can assign a consistent value, so the set is non-measurable. Banach-Tarski sharpens this: using a free non-abelian subgroup of the rotation group and a choice of orbit representatives, the unit ball decomposes into finitely many pieces reassembled by rigid motions into two unit balls. Both proofs are choice-driven and both fail under the competing axiom of determinacy, where every set of reals is Lebesgue measurable.

Synthesis. The foundational reason the axiom of choice organises the transfinite is that one choice function, threaded through the transfinite recursion of 42.03.02, is exactly enough to well-order an arbitrary set, and a well-ordering extracts every other selection — this is the central insight that makes WO, ZL, HMP, TT, and cardinal comparability a single principle wearing five faces. Hartogs' aleph, the choice-free reservoir of ordinals, closes the loop: it terminates Zermelo's enumeration and forces comparability, so the appearance of an ordinal too large to inject powers both the well-ordering proof and Theorem 2. Zorn's Lemma is the face dual to enumeration, replacing the climb up the ordinals by the bounding of chains, and it is the form the rest of the corpus silently uses — bases, maximal ideals, algebraic closures, and Tychonoff are each a chain-bounding argument, while the Boolean Prime Ideal theorem is the strictly weaker fragment that already buys the ultrafilter lemma, compactness, and Hahn-Banach. Putting these together, AC is at once indispensable and double-edged: it makes cardinal arithmetic total and the hierarchy choice-respecting through 42.03.06, yet it manufactures the Vitali set and the Banach-Tarski decomposition. The bridge is that AC is independent of ZF — consistent to assume and consistent to deny — so this unit fixes the principle the rest of the corpus assumes, while 42.03.06 establishes that assuming it costs nothing in consistency, an independence that appears again in every theorem the corpus proves under the silent ZFC contract.

Full proof set Master

Proposition 1 (Hartogs' aleph; full proof). For every set there is an ordinal admitting no injection into , and a least such ordinal exists.

Proof. Call a pair a well-ordering on if and well-orders . Each such is an element of , so the collection of all well-orderings on is a set by Separation. The order type is an ordinal (counting theorem, 42.03.02), and by Replacement is a set of ordinals; it is an initial segment of , since a type's predecessors are types of initial segments of the same well-order. Let , the least ordinal above every element of . If injected into by , then transports the ordinal well-order of onto , exhibiting as the type of a well-ordering on , so , whence , impossible. Leastness: any smaller ordinal lies in , so is a type of a well-ordering on and does inject into . Hence is the least non-injecting ordinal.

Proposition 2 (Bourbaki-Witt; ZL from a choice function without ordinals). Let be chain-bounded and a function with for all . Then has a fixed point. Consequently every chain-bounded poset has a maximal element.

Proof. Fix any . Call admissible if , , and the supremum of every chain in that has one lies in . The intersection of all admissible sets is admissible and is the least admissible set. One shows is a chain by proving every is comparable to all of : the set of such is itself admissible (the verification uses that for an "extreme" point with no element strictly between and , every satisfies or , established by admissibility of ). Since is a chain in the chain-bounded it has an upper bound, and being admissible it contains the supremum of . Then and , while by hypothesis, so : a fixed point. For the maximality consequence, suppose chain-bounded had no maximal element; using a choice function on (AC) define to be a chosen strict upper bound of , contradicting the fixed point . Thus a maximal element exists.

Proposition 3 (Teichmüller-Tukey from Zorn). Every nonempty family of finite character has a -maximal member.

Proof. Order by . Let be a chain and . Every finite lies in a single member of the chain (each point of comes from some member; finitely many members in a chain have a largest containing all), so by membership in that member together with finite character downward. Since every finite subset of is in , finite character gives , so bounds in . By Zorn, has a maximal element.

Proposition 4 (DC implies ACω; the implication is strict). Dependent Choice implies Countable Choice. The converse fails relative to ZF.

Proof. Let be nonempty sets; we produce a choice function. Let be the set of finite sequences with , including the empty sequence, and define when extends by one term, with . Each has an -successor (since ), so is entire on the nonempty . By DC there is a sequence ; its terms cohere into a function with , a choice function for the family. Strictness: a Cohen-style symmetric model satisfies ACω while failing DC, as DC implies the existence of an infinite -branch through certain trees that the model omits; this is recorded in [Jech AC Ch. 8]. Rubric for the forward direction is the tree-of-partial-choices construction; the strictness is cited, not reproved here.

Proposition 5 (Vitali set; non-measurability under AC). Under AC there is a set that is not Lebesgue measurable.

Proof. Define on by . The equivalence classes partition ; by AC choose one representative from each class, collecting them into . Enumerate and let , the translation of by modulo . The are pairwise disjoint (two representatives differing by a rational would be -equivalent, hence equal) and their union is all of (every is to its class representative, so differs from it by some ). If were measurable with measure , translation invariance gives each measure , and countable additivity forces . But is if and if , never — a contradiction. So is not measurable.

Connections Master

  • Ordinals and transfinite recursion 42.03.02 are the machinery this unit runs: Zermelo's well-ordering proof is a transfinite recursion along driven by a fixed choice function, and the Burali-Forti fact that is a proper class is what forces the recursion to terminate inside any target set. The counting theorem of 42.03.02, assigning each well-order a unique ordinal type, is what lets the Well-Ordering Theorem deliver an ordinal length for every set; without the ordinal recursion of that unit, AC could not be cashed out as well-ordering, and the equivalence circle would have no engine.

  • Cardinals and cardinal arithmetic 42.03.03 depend on this unit for totality: a cardinal is an initial ordinal, and the assignment of a cardinality to an arbitrary set requires that the set be well-orderable, which is exactly the Well-Ordering Theorem proved here. The equivalence of AC with cardinal comparability (Theorem 2) is the precise statement that the cardinal -order is linear if and only if choice holds — under there are incomparable cardinalities and infinite Dedekind-finite sets, so the clean -arithmetic of 42.03.03 is a choice-dependent luxury this unit underwrites.

  • The constructible universe and forcing 42.03.06 complete the foundational picture in both directions. Gödel's carries a definable well-ordering built by recursion along the ordinals, so , giving the arrow co-produced as 42.03.06; conversely Cohen's symmetric extensions, the forcing technique also developed in 42.03.06, establish . Together these make AC independent of ZF, so this unit fixes a principle that the rest of the corpus may assume freely precisely because 42.03.06 shows the assumption is consistency-neutral.

Historical & philosophical context Master

The axiom was isolated by Ernst Zermelo in 1904 to prove that every set can be well-ordered [Zermelo 1904], answering a problem Cantor had posed. Zermelo's Auswahlaxiom — a simultaneous selection of one element from each member of a family of nonempty sets — drew immediate fire from the French analysts Borel, Baire, and Lebesgue, and from Peano, who objected to asserting the existence of a function specified by no rule. Zermelo's 1908 second proof recast the argument through the theory of "-chains," the kernel of the later Bourbaki-Witt tower, and was published alongside his first axiomatisation of set theory, partly to make the use of choice explicit and defensible.

The equivalents accumulated over the following decades: Felix Hausdorff stated his maximal-chain principle in Grundzüge der Mengenlehre (1914); Kazimierz Kuratowski (1922) and Max Zorn (1935) gave the maximal-element form now bearing Zorn's name, Zorn emphasising its use in algebra to avoid explicit transfinite recursion; Oswald Teichmüller (1939) and John Tukey (1940) formulated the finite-character version. Friedrich Hartogs had proved in 1915, without any choice, that every set is dominated by an ordinal — the result that powers cardinal comparability's equivalence with AC. The non-measurable set is due to Giuseppe Vitali (1905) and the paradoxical decomposition to Stefan Banach and Alfred Tarski (1924).

The independence of AC from ZF was settled in two stages. Abraham Fraenkel (1922) and Andrzej Mostowski (1939) built permutation models with atoms refuting AC, and Kurt Gödel (1938) proved the consistency of AC by constructing . Paul Cohen's 1963 invention of forcing produced symmetric models of in pure set theory [Jech AC Ch. 5], closing the question and confirming that the axiom is neither provable nor refutable from the other axioms of Zermelo-Fraenkel set theory.

Bibliography Master

@book{kunen2011set,
  author    = {Kunen, Kenneth},
  title     = {Set Theory},
  series    = {Studies in Logic},
  volume    = {34},
  publisher = {College Publications},
  year      = {2011}
}

@book{jech1973axiom,
  author    = {Jech, Thomas},
  title     = {The Axiom of Choice},
  series    = {Studies in Logic and the Foundations of Mathematics},
  volume    = {75},
  publisher = {North-Holland},
  year      = {1973}
}

@article{zermelo1904beweis,
  author  = {Zermelo, Ernst},
  title   = {Beweis, dass jede Menge wohlgeordnet werden kann},
  journal = {Mathematische Annalen},
  volume  = {59},
  year    = {1904},
  pages   = {514--516}
}

@article{zermelo1908neuer,
  author  = {Zermelo, Ernst},
  title   = {Neuer Beweis f{\"u}r die M{\"o}glichkeit einer Wohlordnung},
  journal = {Mathematische Annalen},
  volume  = {65},
  year    = {1908},
  pages   = {107--128}
}

@article{hartogs1915problem,
  author  = {Hartogs, Friedrich},
  title   = {{\"U}ber das Problem der Wohlordnung},
  journal = {Mathematische Annalen},
  volume  = {76},
  year    = {1915},
  pages   = {438--443}
}

@article{zorn1935remark,
  author  = {Zorn, Max},
  title   = {A remark on method in transfinite algebra},
  journal = {Bulletin of the American Mathematical Society},
  volume  = {41},
  year    = {1935},
  pages   = {667--670}
}

@article{vitali1905problema,
  author  = {Vitali, Giuseppe},
  title   = {Sul problema della misura dei gruppi di punti di una retta},
  journal = {Tipografia Gamberini e Parmeggiani, Bologna},
  year    = {1905}
}

@article{banach1924decomposition,
  author  = {Banach, Stefan and Tarski, Alfred},
  title   = {Sur la d{\'e}composition des ensembles de points en parties respectivement congruentes},
  journal = {Fundamenta Mathematicae},
  volume  = {6},
  year    = {1924},
  pages   = {244--277}
}

@article{godel1938consistency,
  author  = {G{\"o}del, Kurt},
  title   = {The consistency of the axiom of choice and of the generalized continuum-hypothesis},
  journal = {Proceedings of the National Academy of Sciences},
  volume  = {24},
  year    = {1938},
  pages   = {556--557}
}

@article{cohen1963independence,
  author  = {Cohen, Paul J.},
  title   = {The independence of the continuum hypothesis},
  journal = {Proceedings of the National Academy of Sciences},
  volume  = {50},
  year    = {1963},
  pages   = {1143--1148}
}