40.07.04 · combinatorics / probabilistic-method

The Lovász Local Lemma and the Moser-Tardos Algorithm

shipped3 tiersLean: none

Anchor (Master): Alon-Spencer 2016 *The Probabilistic Method* 4e (Wiley) Ch. 5 and §5.6-5.7 (the Moser-Tardos algorithmic local lemma, the lopsided version); Erdős-Lovász 1975 *Colloq. Math. Soc. János Bolyai* 10 (Infinite and Finite Sets) (the original lemma); Moser-Tardos 2010 *J. ACM* 57 (the constructive proof via witness trees / entropy compression); Shearer 1985 *Combinatorica* 5 (the optimal lattice-gas / independence-polynomial bound)

Intuition Beginner

The first-moment method proves a good object exists when the average number of flaws is below one. But often there are so many possible flaws that their average is enormous — thousands of bad spots on average — and the first moment says nothing useful. Yet you may still feel the object should exist, because each flaw involves only a few ingredients, and flaws far apart in the structure have nothing to do with one another. The Lovász Local Lemma turns that feeling into a proof.

Here is the idea. List all the bad events you want to avoid. Make each one rare on its own, with probability at most some small . Now suppose each bad event is tangled up with only a few others — it shares ingredients with at most of them — and is otherwise completely independent of the rest. The lemma says: if rarity beats local crowding, by the clean rule that times about stays below a fixed budget, then with positive probability every single bad event is avoided at once. You do not need the average flaw count below one; you only need each flaw rare and each flaw lonely.

The second half of the story is that you can actually find the good object, not just know it exists. Start with a random guess. Whenever some bad event is currently happening, locally re-roll just the ingredients it touches, and repeat. This patient "fix the conflict in front of you" loop stops quickly, leaving a configuration with no bad events left.

Visual Beginner

Picture each bad event as a dot, and draw a line between two dots when those two events share an ingredient and so can influence each other. This picture is the dependency graph. A dot with few lines is a bad event that is nearly independent of everything; a dot with many lines is heavily entangled. The local lemma cares only about the largest number of lines at any dot — the local crowding — not about how many dots there are in total. A million lonely dots, each touching at most three others, are all avoidable together.

Quantity Meaning Local-lemma role
how rare each bad event is smaller is better
most neighbours any event has the local crowding
budget rarity versus crowding must stay at most

The table is the whole criterion: keep the product at or below , and a flawless configuration is guaranteed to exist.

Worked example Beginner

Suppose you have a long list of logical clauses, and you want one true/false setting of the variables that satisfies all of them. Each clause is a small demand on, say, variables; a clause is "broken" by a random coin-flip setting exactly when all of its variables come out the one wrong way. We will see when a satisfying setting must exist.

Step 1. Find how rare one broken clause is. A clause of variables is broken only by one specific combination out of the equally likely settings of its three variables. So the chance a fixed clause is broken is .

Step 2. Count local crowding. Say each clause shares a variable with at most other clauses. Two clauses can conflict only when they share a variable, so is the largest number of clause-neighbours any clause has.

Step 3. Apply the budget rule . Using about and , the rule reads , that is . So can be as large as , giving .

Step 4. Read off the guarantee. If every clause shares variables with at most other clause, the budget holds and a satisfying setting is guaranteed to exist — no matter how many clauses there are in total.

What this tells us: rarity () and crowding () trade off through one clean inequality, and the total number of clauses never enters. For -variable clauses the honest threshold grows like neighbours once the constant is handled sharply; the rough hand-count here keeps the arithmetic plain.

Check your understanding Beginner

Formal definition Intermediate+

Throughout, are events on a common probability space, thought of as the bad events one wishes to avoid simultaneously; the goal is to show . Basic probability — conditional probability, independence of families of events — is imported from the probability units and applied here, not reproved.

Definition (dependency graph). A graph on vertex set is a dependency graph for the events if for each the event is mutually independent of the family — that is, is independent of every event built from the with not adjacent to . Write for the neighbourhood; the events outside exert no influence on .

The general (asymmetric) Lovász Local Lemma. Suppose is a dependency graph for and there exist reals with $$ \Pr[A_i] ;\le; x_i \prod_{j \in \Gamma(i)} (1 - x_j) \qquad \text{for every } i. $$ Then $$ \Pr\Big[\bigcap_{i=1}^{n} \overline{A_i}\Big] ;\ge; \prod_{i=1}^{n} (1 - x_i) ;>; 0 , $$ so with positive probability none of the bad events occurs [Alon-Spencer Ch. 5]. The weights are a free parameter chosen per application; the inequality says each event must be rarer than a budget discounted by the weights on its neighbours.

The symmetric Lovász Local Lemma. If each has and the dependency graph has maximum degree at most (each event mutually independent of all but at most others), and $$ e,p,(d + 1) \le 1 , $$ then . This follows from the general form by setting every and using .

The notation , , , the dependency graph , and the weights are registered in _meta/NOTATION.md.

Counterexamples to common slips Intermediate+

  • "A dependency graph records all statistical correlation." It records enough independence: must be independent of the joint behaviour of all non-neighbours, not merely pairwise uncorrelated. Pairwise independence does not suffice; mutual independence of the complement family is the requirement.

  • "The symmetric form needs ." The correct factor is , not : an event is counted among its own "forbidden" set, so the budget is . Dropping the overstates how much crowding is tolerated.

  • " must equal ." The are slack variables, not the probabilities; one needs only , and choosing a little above to leave room is the standard move. The conclusion is then a valid lower bound on the survival probability.

  • "Positive probability means a large fraction." The bound is typically exponentially small; the lemma is an existence statement, and a positive—however tiny—probability is exactly what existence requires.

Key theorem with proof Intermediate+

The signature result is the general local lemma itself; its applications to -SAT and hypergraph colouring follow as corollaries. The proof is the classical conditional-probability induction of Erdős and Lovász [Erdős-Lovász 1975].

Theorem (general Lovász Local Lemma). Let have dependency graph , and let satisfy for all . Then for every and every subset , $$ \Pr\Big[A_i ,\Big|, \bigcap_{j \in S} \overline{A_j}\Big] ;\le; x_i , $$ and consequently .

Proof. The conditional bound is proved by induction on . For , since each factor is at most . For the inductive step, split where are the neighbours of in and the non-neighbours. Write the conditional probability as a ratio, $$ \Pr\Big[A_i ,\Big|, \bigcap_{j \in S} \overline{A_j}\Big] = \frac{\Pr\big[A_i \cap \bigcap_{j \in S_1}\overline{A_j} ,\big|, \bigcap_{k \in S_2}\overline{A_k}\big]} {\Pr\big[\bigcap_{j \in S_1}\overline{A_j} ,\big|, \bigcap_{k \in S_2}\overline{A_k}\big]} . $$ Bound the numerator from above. Dropping the conditioning on inside the numerator and using that is mutually independent of (as ), $$ \Pr\Big[A_i \cap \bigcap_{j \in S_1}\overline{A_j} ,\Big|, \bigcap_{k \in S_2}\overline{A_k}\Big] \le \Pr\Big[A_i ,\Big|, \bigcap_{k \in S_2}\overline{A_k}\Big] = \Pr[A_i] \le x_i \prod_{j \in \Gamma(i)}(1 - x_j). $$

Now bound the denominator from below. Enumerate and expand by the chain rule, $$ \Pr\Big[\bigcap_{\ell=1}^{r}\overline{A_{j_\ell}} ,\Big|, \bigcap_{k \in S_2}\overline{A_k}\Big] = \prod_{\ell=1}^{r}\Pr\Big[\overline{A_{j_\ell}} ,\Big|, \bigcap_{m < \ell}\overline{A_{j_m}} \cap \bigcap_{k \in S_2}\overline{A_k}\Big] \ge \prod_{\ell=1}^{r}(1 - x_{j_\ell}), $$ where each factor uses the induction hypothesis on a conditioning set of size to give , hence . Since , the product is a sub-product of , and all factors lie in , so the denominator is at least . Dividing, $$ \Pr\Big[A_i ,\Big|, \bigcap_{j \in S}\overline{A_j}\Big] \le \frac{x_i \prod_{j \in \Gamma(i)}(1 - x_j)}{\prod_{j \in \Gamma(i)}(1 - x_j)} = x_i . $$

Finally, by the chain rule over all events, $$ \Pr\Big[\bigcap_{i=1}^{n}\overline{A_i}\Big] = \prod_{i=1}^{n}\Pr\Big[\overline{A_i} ,\Big|, \bigcap_{m < i}\overline{A_m}\Big] \ge \prod_{i=1}^{n}(1 - x_i) > 0, $$ each factor bounded below using the conditional estimate just proved.

Corollary (-SAT). Let be a -CNF formula in which every clause shares a variable with at most other clauses. If , then is satisfiable. Proof. Assign each variable true/false by an independent fair coin. For clause let be the event that is unsatisfied; a -clause is violated by exactly one of the settings of its variables, so . Two clause-events are dependent only when the clauses share a variable, so the dependency degree is at most . The hypothesis is ; the symmetric local lemma gives , so a satisfying assignment exists.

Corollary (hypergraph 2-colouring / property B). A -uniform hypergraph in which every edge meets at most others is 2-colourable (property B) whenever . Proof. Colour each vertex red/blue independently and uniformly. For edge let be the event that is monochromatic; , and depends only on edges sharing a vertex with , at most of them. Then is , and the symmetric form gives a proper 2-colouring.

Bridge. The local lemma builds toward every existence proof where the first moment is hopeless because there are too many bad events: it replaces the global demand "expected defects below one" with the local pair "each defect rare, each defect lonely", and the foundational reason this works is the conditional estimate , which says that avoiding other bad events never inflates a given one beyond its private budget . This is exactly the move that the deletion method of 40.07.02 could only approximate by removing defects after the fact; here the defects are forbidden simultaneously, and the bridge is that the same indicator-per-defect bookkeeping which gave there is now organised by the dependency graph rather than summed flat. The -SAT and colouring corollaries appear again in the algorithmic story below, where the Moser-Tardos process does not merely certify existence but constructs the assignment; putting these together, the local lemma is the dependency-graph generalisation of the union bound, and its constructive counterpart is the central insight of the modern theory.

Exercises Intermediate+

Advanced results Master

The existence lemma is the floor; its algorithmic, optimal, and lopsided refinements are the structure built on it. The Moser-Tardos algorithm makes the lemma constructive under the identical hypothesis, Shearer's bound identifies the exact boundary of avoidability for a fixed dependency graph, and the lopsided form trades independence for one-sided negative dependence.

Theorem 1 (Moser-Tardos algorithmic local lemma; 2010). Suppose each event is determined by a subset of independent random variables , with the dependency graph given by shared variables, and the asymmetric condition holds. The sequential resampling process — repeatedly pick any currently-violated and resample all of — resamples event an expected number of times at most , so the expected total number of resampling steps is at most , and the process terminates almost surely with an assignment avoiding all [Moser-Tardos 2010]. In the symmetric regime this is expected resamplings per event in the relevant scaling, a polynomial-time constructive local lemma.

Theorem 2 (witness-tree encoding). Each resampling of during a run is assigned a witness tree : a rooted tree labelled by events, recording the causal history of resamplings that forced this one, in which a child's event shares a variable with (or equals) its parent's event, and siblings carry distinct events. Two key facts drive the analysis. First, distinct resamplings of the same event produce distinct witness trees, so the number of resamplings is bounded by the number of proper witness trees that occur. Second, a fixed witness tree occurs in the execution with probability at most , because the random choices that the tree certifies are independent and each must realise its event. Summing over all witness trees rooted at — a sum controlled by a Galton-Watson branching process with offspring weights — gives the bound of Theorem 1 [Moser-Tardos 2010].

Theorem 3 (entropy compression, Moser's earlier argument). The witness-tree bound has an information-theoretic shadow: a run of resamplings consumes uniformly random bits, yet the entire run can be reconstructed from the final state plus an encoding of the witness forest whose length is when the local-lemma slack is strict. A run longer than a threshold would compress a random string below its entropy, which is impossible; hence the run is short with overwhelming probability. This entropy-compression method (Moser, STOC 2009) preceded the clean witness-tree probability bound and seeded a family of direct combinatorial arguments [Moser-Tardos 2010].

Theorem 4 (Shearer's optimal bound; 1985). For a fixed dependency graph , the set of probability vectors for which every compatible family of events can be avoided is described exactly by the independence polynomial of : avoidance is guaranteed iff for all induced subgraphs the alternating independent-set sums stay positive. For the -regular dependency graph the optimal threshold is , asymptotic to as — improving the symmetric criterion's by the missing additive unit and provably best possible: above Shearer's boundary there exist events meeting the probability constraints that cannot all be avoided [Shearer 1985]. Shearer's condition is the local lemma's tight form; the Erdős-Lovász criterion is a convenient sufficient relaxation.

Theorem 5 (lopsided / lopsidependency local lemma). The mutual-independence hypothesis can be weakened to one-sided negative dependence: is a lopsidependency graph if for every and every . Under the same conclusion holds. This covers random-permutation models — Latin transversals, the Erdős-Spencer pattern-avoidance bounds — where conditioning on avoiding some conflicts only helps avoid others, and the events are not independent but are negatively associated [Alon-Spencer Ch. 5].

Synthesis. Putting these together, the local lemma is a single inequality read four ways: the existence form of Erdős-Lovász, the constructive form of Moser-Tardos, the optimal form of Shearer, and the negatively-dependent lopsided form, and the foundational reason all four cohere is the conditional bound — once that holds, the chain rule delivers a positive survival probability, a branching-process sum delivers a polynomial resampling count, and the independence polynomial delivers the exact boundary. This is exactly the generalisation of the deletion method of 40.07.02 that the bridge there pointed to: where deletion tolerates a few defects and removes them, the local lemma forbids all defects at once by exploiting that they are locally sparse, and the Moser-Tardos process is the algorithmic deletion — it resamples defects out of existence. The central insight is that bounded dependence is as good as independence for existence, and the entropy-compression argument shows it is as good for computation: a long failing run would compress randomness past its entropy. This is dual to the second-moment method of 40.07.03, which controls the variance of a single global count, whereas the local lemma controls the interaction structure of many local events; both convert local information into a global existence verdict, and the modern theory of algorithmic sampling and approximate counting grows from Shearer's exact boundary.

Full proof set Master

Proposition 1 (general local lemma, conditional estimate). Under the hypotheses with , for every and every , .

Proof. Induct on . If , . Otherwise partition with , ; assume the conditioning event has positive probability (else there is nothing to bound). Then $$ \Pr\Big[A_i ,\Big|, \bigcap_{j\in S}\overline{A_j}\Big] = \frac{\Pr[A_i \cap \bigcap_{j\in S_1}\overline{A_j}\mid \bigcap_{k\in S_2}\overline{A_k}]}{\Pr[\bigcap_{j\in S_1}\overline{A_j}\mid \bigcap_{k\in S_2}\overline{A_k}]}. $$ For the numerator, , and since the event is independent of , so the numerator is . For the denominator, write and apply the chain rule; each factor has conditioning set of size , so the induction hypothesis gives and the factor is . Thus the denominator is , since and the extra factors lie in . Dividing, .

Proposition 2 (positive survival probability). Under the same hypotheses, .

Proof. By the chain rule, . Each factor equals by Proposition 1 applied to . Hence the product is , which is positive since each .

Proposition 3 (symmetric criterion). If for all , the dependency graph has maximum degree , and , then .

Proof. Set . For , , since . Therefore , the middle inequality from . The general local lemma (Propositions 1-2) applies and gives .

Proposition 4 (-SAT satisfiability). A -CNF formula in which every clause shares a variable with at most other clauses is satisfiable.

Proof. Take an independent uniform assignment of the variables. For each clause , has , the single falsifying assignment of 's variables. The dependency graph joins clauses sharing a variable, of maximum degree . Then , so Proposition 3 gives ; an assignment satisfying every clause exists.

Proposition 5 (Moser-Tardos termination). In the variable model with , the resampling process resamples an expected times; the expected total number of resamplings is , finite, so the process halts almost surely at an assignment avoiding all .

Proof sketch (witness trees). Order the resamplings as they occur and attach to the -th resampling (of some event ) a rooted witness tree : the root is labelled , and reading the resampling log backwards, each earlier resampling of an event sharing a variable with a node already in the tree is attached as a child of the deepest such node; siblings receive distinct labels. Distinct resamplings of yield distinct witness trees (the log reconstructs uniquely), so the number of times is resampled is at most the number of proper witness trees rooted at that occur. A fixed tree occurs only if, for each node , the independent fresh variables drawn when was resampled realised ; these draws are independent across nodes, so . Summing over all witness trees rooted at , $$ \mathbb{E}[#\text{resamplings of }A_i] \le \sum_{\tau\text{ rooted at }A_i}\prod_{v\in\tau}\Pr[A_{[v]}] \le \sum_{\tau}\prod_{v\in\tau}x_{[v]}!!\prod_{j\in\Gamma([v])}!!(1 - x_j). $$ The right side is the total weight of a Galton-Watson branching process in which a node labelled has, for each , an independent child labelled with "probability" ; the expected size of this process rooted at is exactly (the standard generating-function identity for the sub-critical branching weights, valid since the satisfy the local-lemma inequality). Summing over gives . A process with finite expected length halts almost surely, and when it halts no is violated.

Proposition 6 (lopsided local lemma). If is a lopsidependency graph — for all and — and , then .

Proof. Repeat the induction of Proposition 1. The only place mutual independence was used is the numerator bound with . Under the lopsidependency hypothesis this equality is replaced by the inequality , which is all the argument requires, since the numerator is bounded above. Every other step — the chain-rule lower bound on the denominator and the final telescoping — is unchanged, giving and hence .

Connections Master

  • The deletion method of 40.07.02 is the local lemma's predecessor on the same problems: deletion removes the rare defects from a random object after the fact, while the local lemma forbids them simultaneously by exploiting that each defect is mutually independent of all but a bounded number of others. The Ramsey and hypergraph-2-colouring applications recur in both, but the local-lemma constant is sharper — property B needs only the local edge-meeting degree below , not the total edge count below — because bounded dependence replaces the global first-moment budget with a local one.

  • The second-moment method of 40.07.03 is the dual tool: where the second moment controls the variance of one global count to force a single structure to appear, the local lemma controls the dependency structure of many local events to force them all to be avoided. Both convert local data — pairwise edge-sharing of subgraph copies there, the dependency neighbourhood here — into a global existence verdict, and the random-graph colouring thresholds of that unit are exactly where local-lemma list-colouring bounds (bounded-degree graphs are -list-colourable, sharpened toward for triangle-free graphs by the local lemma) become the sharpest known.

  • The entropy / compression analysis of the Moser-Tardos algorithm is the algorithmic face of the entropy method developed in the co-produced 40.07.06: there entropy bounds the number of configurations of a constrained random structure, here the impossibility of compressing a random bit-string below its entropy bounds the length of a resampling run. The witness-tree counting and the Shannon-entropy lower bound are two encodings of the same information-theoretic obstruction, and the local lemma's constructive form is where the probabilistic method becomes an efficient algorithm rather than a pure existence proof.

Historical & philosophical context Master

The Lovász Local Lemma was introduced by Paul Erdős and László Lovász in their 1975 paper on -chromatic hypergraphs [Erdős-Lovász 1975], where the symmetric criterion first appeared as the tool that proved property B for hypergraphs far denser than the first moment could handle. The lemma answered a recurring difficulty: many natural existence questions place the expected number of bad events well above one, yet the bad events are locally sparse, and no prior method exploited that sparsity. The asymmetric weighted form and the conditional-probability induction are due to the same paper, refined by Spencer; the dependency-graph language became standard through Spencer's 1994 lectures [Spencer 1994].

For three decades the lemma was non-constructive: the survival probability is exponentially small, so it certified existence without an efficient search. Partial algorithmic versions (Beck 1991, Alon, Molloy-Reed, Srinivasan) progressively reduced the slack required. The decisive resolution is Robin Moser and Gábor Tardos's 2010 paper [Moser-Tardos 2010], which showed the naive "resample a violated event" process terminates in expected steps under the unmodified asymmetric condition; Moser's 2009 entropy-compression argument for the symmetric -SAT case preceded it and is the conceptual seed. James Shearer's 1985 paper [Shearer 1985] had earlier determined the exact boundary of avoidability for a fixed dependency graph through its independence polynomial, giving the optimal -regular threshold against which the Erdős-Lovász criterion is a clean but lossy sufficient condition.

Bibliography Master

@incollection{erdoslovasz1975,
  author    = {Erd{\H{o}}s, Paul and Lov{\'a}sz, L{\'a}szl{\'o}},
  title     = {Problems and results on 3-chromatic hypergraphs and some related questions},
  booktitle = {Infinite and Finite Sets},
  editor    = {Hajnal, A. and Rado, R. and S{\'o}s, V. T.},
  series    = {Colloquia Mathematica Societatis J{\'a}nos Bolyai},
  volume    = {10},
  publisher = {North-Holland},
  pages     = {609--627},
  year      = {1975}
}

@article{mosertardos2010,
  author  = {Moser, Robin A. and Tardos, G{\'a}bor},
  title   = {A constructive proof of the general Lov{\'a}sz Local Lemma},
  journal = {Journal of the ACM},
  volume  = {57},
  number  = {2},
  pages   = {Art. 11},
  year    = {2010}
}

@article{shearer1985,
  author  = {Shearer, James B.},
  title   = {On a problem of Spencer},
  journal = {Combinatorica},
  volume  = {5},
  number  = {3},
  pages   = {241--245},
  year    = {1985}
}

@inproceedings{moser2009,
  author    = {Moser, Robin A.},
  title     = {A constructive proof of the Lov{\'a}sz Local Lemma},
  booktitle = {Proceedings of the 41st Annual ACM Symposium on Theory of Computing (STOC)},
  pages     = {343--350},
  year      = {2009}
}

@book{molloyreed2002,
  author    = {Molloy, Michael and Reed, Bruce},
  title     = {Graph Colouring and the Probabilistic Method},
  publisher = {Springer},
  year      = {2002}
}

@book{spencer1994ten,
  author    = {Spencer, Joel H.},
  title     = {Ten Lectures on the Probabilistic Method},
  series    = {CBMS-NSF Regional Conference Series in Applied Mathematics},
  volume    = {64},
  publisher = {SIAM},
  year      = {1994}
}

@book{alonspencer2016,
  author    = {Alon, Noga and Spencer, Joel H.},
  title     = {The Probabilistic Method},
  edition   = {4},
  publisher = {Wiley-Interscience},
  year      = {2016}
}