19.04.01 · eco-evo-bio / drift

Genetic drift


Anchor (Master): Ewens Mathematical Population Genetics I; Wakeley Coalescent Theory; primary literature — Wright 1931, Kimura 1968, Kingman 1982, Ohta 1973

Intuition [Beginner]

Genetic drift is random change in allele frequencies due to chance events in reproduction. In any population, not every individual passes on their genes equally — some have more offspring by luck, not because their genes are better. This randomness is genetic drift.

Drift is strongest in small populations. Imagine a population of just 10 individuals, where allele A is at 50% frequency (10 of the 20 gene copies in these diploid individuals). By chance, the individuals carrying A might have fewer offspring than those carrying a, not because A is worse but simply because of random variation in reproductive success. In the next generation, A might drop to 30% or rise to 60%. Over many generations, this random fluctuation can drive an allele to fixation (100% frequency) or loss (0%), even without any selective advantage or disadvantage.

In a large population (say, 1 million individuals), the same random fluctuations occur, but their effect on allele frequency is much smaller: by the law of large numbers, the allele frequency in the next generation stays close to its expected value. More precisely, the variance of the per-generation change in allele frequency, which measures the strength of drift, is inversely proportional to population size.

The key insight: drift is evolution by chance, not by adaptation. A neutral allele (one with no fitness effect) can spread to fixation purely by drift, and a slightly beneficial allele can be lost by drift if the population is small enough. The relative strength of selection versus drift is captured by the product $N s$ (population size times selection coefficient): when $N s$ is much greater than 1, selection dominates; when $N s$ is much less than 1, drift dominates.

Visual [Beginner]

Imagine 20 simulations of drift in a population of $N = 20$ diploid individuals (so 40 gene copies), starting with allele frequency $p = 0.5$ . Each line tracks allele frequency over generations.

Some replicates fix the A allele quickly; others lose it. The average time until the allele is either fixed or lost depends on population size: in a population of $N$ diploids ($2N$ gene copies), the expected time to absorption (fixation or loss) of a neutral allele starting at frequency $p$ is approximately $-4N[p \ln p + (1 - p) \ln(1 - p)]$ generations.
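The Visual can be reproduced with a short simulation. The sketch below is a minimal, standard-library Python illustration (not the code behind the original figure): each generation resamples all $2N$ gene copies binomially, as in the Wright-Fisher model, and the mean time until A is fixed or lost is compared with the diffusion formula. The function names are hypothetical, and `Random.binomialvariate` is used only on Python 3.12+, with a slower Bernoulli-sum fallback otherwise.

```python
import math
import random


def binomial(rng, n, p):
    """One Binomial(n, p) draw: Random.binomialvariate on Python 3.12+, else n Bernoulli trials."""
    if hasattr(rng, "binomialvariate"):
        return rng.binomialvariate(n, p)
    return sum(rng.random() < p for _ in range(n))


def absorption_time(n_diploid, p0, rng):
    """Generations until allele A is fixed or lost in one Wright-Fisher replicate."""
    two_n = 2 * n_diploid
    count, t = round(p0 * two_n), 0
    while 0 < count < two_n:
        # Each of the 2N copies in the next generation picks a parent copy at random.
        count = binomial(rng, two_n, count / two_n)
        t += 1
    return t


def diffusion_absorption_time(n_diploid, p0):
    """Diffusion approximation -4N [p ln p + (1 - p) ln(1 - p)] to the mean absorption time."""
    q0 = 1 - p0
    return -4 * n_diploid * (p0 * math.log(p0) + q0 * math.log(q0))


rng = random.Random(1)
N, p0, reps = 20, 0.5, 2000
times = [absorption_time(N, p0, rng) for _ in range(reps)]
print(f"simulated mean absorption time: {sum(times) / reps:.1f} generations")
print(f"diffusion approximation:        {diffusion_absorption_time(N, p0):.1f} generations")
```

With $N = 20$ the simulated mean should come out close to the diffusion value of about 55 generations; the diffusion formula is an approximation that becomes exact only as $N \to \infty$, so a small gap is expected.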

Worked example [Beginner]

In a population of $N = 50$ diploid individuals, a new neutral mutation arises as a single copy. What is the probability that it eventually fixes?

The fixation probability of a single new neutral allele in a diploid population of size $N$ is:

P_{fix} = \frac{1}{2 N} .

For $N = 50$: $P_{fix} = 1/(2 \times 50) = 1/100 = 1\%$.

Step 1. There are $2 N = 100$ gene copies. The new mutation is 1 copy out of 100, so its initial frequency is $p_{0} = 1/100$ .

Step 2. Under neutral drift, the expected frequency at any future time equals the current frequency. The process has no directional bias.

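Step 3. Eventually the allele fixes ($p = 1$) or is lost ($p = 0$). Because the expected frequency never changes, the expected final frequency, $P_{fix} \times 1 + (1 - P_{fix}) \times 0 = P_{fix}$, must equal the initial frequency, so $P_{fix} = 1/(2N) = 1\%$.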

What this tells us: 99% of new neutral mutations are lost by drift. But the occasional one drifts to fixation purely by chance, taking on average about $4 N = 200$ generations to do so.
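The $1/(2N)$ result can be checked by brute force. This Monte Carlo sketch, under the same assumptions as the worked example (neutral allele, Wright-Fisher sampling, $N = 50$), follows many independent new mutations until each is fixed or lost; the helper names are illustrative only.

```python
import random


def binomial(rng, n, p):
    """One Binomial(n, p) draw: Random.binomialvariate on Python 3.12+, else n Bernoulli trials."""
    if hasattr(rng, "binomialvariate"):
        return rng.binomialvariate(n, p)
    return sum(rng.random() < p for _ in range(n))


def new_mutation_fixes(n_diploid, rng):
    """Follow one new neutral mutation (1 copy among 2N) until it is fixed (True) or lost (False)."""
    two_n = 2 * n_diploid
    count = 1
    while 0 < count < two_n:
        count = binomial(rng, two_n, count / two_n)
    return count == two_n


rng = random.Random(42)
N, reps = 50, 10_000
p_hat = sum(new_mutation_fixes(N, rng) for _ in range(reps)) / reps
print(f"simulated P_fix = {p_hat:.4f}   theory 1/(2N) = {1 / (2 * N):.4f}")
```

With 10,000 replicates the Monte Carlo standard error is about 0.001, so estimates between roughly 0.008 and 0.012 are consistent with $1/(2N) = 0.01$.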

Check your understanding [Beginner]

Exercise (easy, multiple choice).

A bottleneck (sharp reduction in population size) and a founder effect (colonisation by a few individuals) both amplify genetic drift. Which statement correctly distinguishes them?

A. A bottleneck increases genetic diversity; a founder effect decreases it.
B. A bottleneck reduces a large population; a founder effect starts a new one from a small sample.
C. A founder effect only occurs in plants; bottlenecks only occur in animals.
D. They are identical processes with different names.

Hint

Both reduce genetic variation through sampling error, but the triggering events differ.

Answer

B. A population bottleneck occurs when a large population is drastically reduced in size (e.g., by a natural disaster). A founder effect occurs when a small number of individuals colonise a new habitat. Both reduce genetic diversity through the same sampling mechanism (the survivors or founders carry only a subset of the original variation), but bottlenecks reduce an existing population while founder effects establish a new one. Why the other options are wrong: A is backwards (both decrease diversity); C is false because both processes occur in all kinds of organisms; D ignores the different triggering events.

Formal definition [Intermediate+]

The Wright-Fisher model

The Wright-Fisher model 19.02.05 is the standard mathematical model of genetic drift. Consider a diploid population of constant size $N$ (so $2 N$ gene copies at a locus). Let $X_{t}$ be the number of copies of allele A in generation $t$ . The model assumes:

Generations are non-overlapping (discrete time).
Each gene copy in generation $t + 1$ is drawn independently and uniformly at random from the $2 N$ copies in generation $t$ (with replacement, multinomial sampling).

Under these assumptions, $X_{t + 1}$ given $X_{t} = i$ follows a binomial distribution:

X_{t + 1} \sim Binomial (2 N, p_{t}),

where $p_{t} = i / (2 N)$ is the allele frequency in generation $t$ . The allele frequency in the next generation is:

p_{t + 1} = X_{t + 1} / (2 N) .

The expected value is $E [p_{t + 1}] = p_{t}$ (drift has no directional bias). The variance is $Var (p_{t + 1}) = p_{t} (1 - p_{t}) / (2 N)$ . This variance — the magnitude of random fluctuation per generation — is inversely proportional to population size.
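The mean and variance of one Wright-Fisher step can be confirmed exactly, without simulation, by summing over the binomial distribution with rational arithmetic. A minimal Python sketch (the function name is hypothetical):

```python
from fractions import Fraction
from math import comb


def next_generation_moments(two_n, i):
    """Exact E[p'] and Var(p') for p' = X'/(2N), where X' ~ Binomial(2N, i/(2N))."""
    p = Fraction(i, two_n)
    pmf = [comb(two_n, j) * p**j * (1 - p) ** (two_n - j) for j in range(two_n + 1)]
    mean = sum(w * Fraction(j, two_n) for j, w in enumerate(pmf))
    var = sum(w * (Fraction(j, two_n) - mean) ** 2 for j, w in enumerate(pmf))
    return mean, var


two_n, i = 20, 6                      # N = 10 diploids, p_t = 0.3
mean, var = next_generation_moments(two_n, i)
print(mean, var)                      # 3/10 21/2000
p = Fraction(i, two_n)
assert mean == p and var == p * (1 - p) / two_n
```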

Fixation probability

For a neutral allele at initial frequency $p_{0}$ in a Wright-Fisher population of size $N$ :

P_{fix} = p_{0} .

For a new mutation (initial frequency $p_{0} = 1/ (2 N)$ ):

P_{fix} = \frac{1}{2 N} .

This result follows from the martingale property of neutral drift: the expected allele frequency at any future time equals the current frequency.

Effective population size

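The effective population size $N_{e}$ is the size of an ideal Wright-Fisher population that would produce the same rate of genetic drift as the actual population. Deviations from the ideal (unequal sex ratio, variation in offspring number, population size fluctuations) typically reduce $N_{e}$ below the census size $N$ ($N_{e}$ can exceed $N$ only when reproductive success is more even than in the ideal population, for example when the variance in offspring number is below the Poisson value), so for most natural populations:

N_{e} \leq N .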

For example, in a population with unequal sex ratio ( $N_{m}$ males, $N_{f}$ females):

N_{e} = \frac{4 N _{m} N _{f}}{N _{m} + N _{f}} .

If there are 10 males and 990 females ( $N = 1000$ ), $N_{e} = 4 \times 10 \times 990/1000 = 39.6$ . The effective population size is only 40 despite 1000 individuals, because the few males are the genetic bottleneck.
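A one-line helper makes the sex-ratio formula easy to explore; this is a sketch of the formula above, nothing more. Holding census size at 1000 and varying the number of breeding males shows how quickly $N_e$ collapses when one sex is rare.

```python
def ne_sex_ratio(n_males, n_females):
    """Effective size with unequal sex ratio: N_e = 4 N_m N_f / (N_m + N_f)."""
    return 4 * n_males * n_females / (n_males + n_females)


print(ne_sex_ratio(10, 990))     # 39.6
print(ne_sex_ratio(500, 500))    # 1000.0 (equal sex ratio: N_e = N)
for males in (1, 10, 100, 500):  # census size fixed at N = 1000
    print(males, round(ne_sex_ratio(males, 1000 - males), 1))
```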

Time to fixation

The expected time to fixation (conditional on fixation occurring) for a neutral allele at initial frequency $p_{0}$ in a Wright-Fisher population is:

T_{fix}(p_{0}) = -4 N_{e} \frac{1 - p_{0}}{p_{0}} \ln(1 - p_{0}) .

For a new mutation ( $p_{0} = 1/ (2 N)$ ), $T_{fix} \approx 4 N_{e}$ generations.
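The $4N_e$ approximation can be seen numerically by evaluating the conditional fixation-time formula at $p_0 = 1/(2N_e)$ for several sizes. This sketch assumes $N = N_e$; `math.log1p` is used for accuracy when $p_0$ is tiny.

```python
import math


def t_fix(p0, ne):
    """Mean time to fixation, conditional on fixation: -4 N_e (1 - p0) / p0 * ln(1 - p0)."""
    return -4 * ne * (1 - p0) / p0 * math.log1p(-p0)


for ne in (50, 500, 5000):
    p0 = 1 / (2 * ne)                       # a single new copy among 2N_e
    print(ne, round(t_fix(p0, ne)), "vs 4N_e =", 4 * ne)
```

The ratio to $4N_e$ is about $1 - 1/(4N_e)$, so the approximation is already within 1% at $N_e = 50$.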

The coalescent

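The coalescent (Kingman, 1982) traces the genealogy of a sample of genes backward in time. Two gene copies coalesce (find their most recent common ancestor) at rate $1/(2N_{e})$ per generation. With $k$ lineages there are $\binom{k}{2}$ pairs, so the time until the next coalescent event is approximately exponentially distributed with rate $\binom{k}{2}/(2N_{e})$, that is, with mean $2N_{e}/\binom{k}{2}$ generations. Summing these means from $k = n$ down to $k = 2$ gives the expected time to the most recent common ancestor (TMRCA) of the entire sample:

E[T_{MRCA}] = \sum_{k=2}^{n} \frac{2 N_{e}}{\binom{k}{2}} = 4 N_{e} \left(1 - \frac{1}{n}\right),

where $n$ is the sample size. The sum telescopes because $1/\binom{k}{2} = 2\left(\frac{1}{k-1} - \frac{1}{k}\right)$, so even for a very large sample $E[T_{MRCA}]$ never exceeds $4N_{e}$ generations.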
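The coalescent lends itself to very short simulations because only the waiting times matter. The sketch below (hypothetical function name, standard library only) draws the exponential waiting time for each number of lineages $k = n, \dots, 2$ and averages the resulting TMRCA over many replicates, for comparison with the exact mean $4N_e(1 - 1/n)$ obtained by summing the epoch means $2N_e/\binom{k}{2}$.

```python
import random


def kingman_tmrca(n, ne, rng):
    """Simulate T_MRCA (generations) for n lineages under Kingman's coalescent."""
    t = 0.0
    for k in range(n, 1, -1):
        rate = (k * (k - 1) / 2) / (2 * ne)   # C(k, 2) pairs, each coalescing at rate 1/(2 N_e)
        t += rng.expovariate(rate)            # exponential wait while k lineages remain
    return t


rng = random.Random(7)
n, ne, reps = 10, 1000, 20_000
mean_t = sum(kingman_tmrca(n, ne, rng) for _ in range(reps)) / reps
print(f"simulated E[T_MRCA] = {mean_t:.0f}   4 N_e (1 - 1/n) = {4 * ne * (1 - 1 / n):.0f}")
```

The final epoch with two lineages alone has mean $2N_e$, which here is 2000 of the 3600 expected generations: the genealogy spends more than half its depth waiting for the last two lineages to merge.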

Counterexamples to common slips

Drift only matters in small populations. Drift operates in all finite populations. Its strength (the per-generation sampling variance) scales as $1/(2N)$; drift dominates selection whenever $|s| < 1/(2N)$. Even in populations of millions, sufficiently weak selection is indistinguishable from drift.
Drift always reduces fitness. Drift can fix deleterious alleles, neutral alleles, or beneficial alleles. The net fitness effect depends on population size and the distribution of fitness effects among mutations. Drift has no directional bias — it simply magnifies sampling variance.
Neutral theory says most mutations have no effect. Neutral theory says most fixed substitutions are neutral or nearly neutral. Most new mutations are deleterious and removed by purifying selection before they can drift.

Key theorem with proof [Intermediate+]

Theorem (Fixation probability of a new neutral allele). In the Wright-Fisher model with population size $N$ , the probability that a newly arisen neutral allele present in a single copy eventually reaches fixation is $1/ (2 N)$ .

Proof. Let $p_{0} = 1/(2N)$ be the initial frequency of the new allele. The Wright-Fisher model for a neutral allele is a martingale: $E[p_{t+1} \mid p_{t}] = p_{t}$. The process is bounded in $[0, 1]$ and is absorbed at 0 or 1 with probability 1, so the optional stopping theorem applies: the expected frequency at the time of absorption (fixation or loss) equals the initial frequency:

E [p_{\infty}] = p_{0} .

At absorption, $p_{\infty}$ is either 1 (fixation) or 0 (loss). Let $P_{fix}$ be the probability of fixation. Then:

E [p_{\infty}] = P_{fix} \times 1 + (1 - P_{fix}) \times 0 = P_{fix} .

Therefore $P_{fix} = p_{0} = 1/ (2 N)$ . $□$

Bridge. The martingale argument is the starting point for 19.02.05, where the diffusion approximation generalises fixation probabilities to selected alleles through Kolmogorov's backward equation. That step connects discrete Wright-Fisher sampling to the continuous stochastic calculus of allele-frequency dynamics: under selection $p_{t}$ is no longer a martingale, but the fixation probability $E[p_{\infty}] = u(p_{0})$ can still be computed, and it becomes $(1 - e^{-4Nsp_{0}})/(1 - e^{-4Ns})$ when drift and selection act together. Drift and selection fit into a single framework because both are captured by the infinitesimal mean and variance of the diffusion process; the same decomposition appears in 19.05.01 as the quantitative-genetics partition of phenotypic change into selection response and drift.

Exercises [Intermediate+]

Exercise 4 (medium, symbolic).

In the Wright-Fisher model, show that the expected heterozygosity $H_{t} = 2 p_{t} (1 - p_{t})$ satisfies $E [H_{t}] = H_{0} (1 - 1/ (2 N))^{t}$ . Derive this recurrence from the binomial sampling variance.

Hint

Compute $E [p_{t + 1} (1 - p_{t + 1}) ∣ p_{t}]$ using $Var (p_{t + 1} ∣ p_{t}) = p_{t} (1 - p_{t}) / (2 N)$ .

Answer

Write $p_{t + 1} = p_{t} + ϵ$ where $E [ϵ ∣ p_{t}] = 0$ and $Var (ϵ ∣ p_{t}) = p_{t} (1 - p_{t}) / (2 N)$ . Then:

E [p_{t + 1} (1 - p_{t + 1}) ∣ p_{t}] = E [(p_{t} + ϵ) (1 - p_{t} - ϵ) ∣ p_{t}] = p_{t} (1 - p_{t}) - E [ϵ^{2} ∣ p_{t}] .

Since $E [ϵ^{2} ∣ p_{t}] = Var (ϵ ∣ p_{t}) = p_{t} (1 - p_{t}) / (2 N)$ :

E [H_{t + 1} ∣ p_{t}] = H_{t} - \frac{H _{t}}{2 N} = H_{t} (1 - \frac{1}{2 N}) .

Iterating: $E [H_{t}] = H_{0} (1 - 1/ (2 N))^{t}$ . Heterozygosity decays exponentially at rate $1/ (2 N)$ per generation, and the population eventually becomes monomorphic.
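The one-generation identity behind this recurrence can be verified exactly for a small population by summing over every possible next-generation count with rational arithmetic. A minimal sketch, with a hypothetical function name:

```python
from fractions import Fraction
from math import comb


def expected_next_h(two_n, i):
    """Exact E[H_{t+1} | X_t = i], with H = 2p(1 - p) and X_{t+1} ~ Binomial(2N, i/(2N))."""
    p = Fraction(i, two_n)
    total = Fraction(0)
    for j in range(two_n + 1):
        q = Fraction(j, two_n)
        prob = comb(two_n, j) * p**j * (1 - p) ** (two_n - j)
        total += prob * 2 * q * (1 - q)
    return total


two_n = 12                                  # N = 6 diploids
for i in range(two_n + 1):
    p = Fraction(i, two_n)
    h = 2 * p * (1 - p)
    # One generation of sampling multiplies expected heterozygosity by exactly 1 - 1/(2N).
    assert expected_next_h(two_n, i) == h * (1 - Fraction(1, two_n))
print("E[H_{t+1}] = H_t (1 - 1/(2N)) holds exactly for every starting count")
```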

Exercise 6 (hard, numeric).

A population of $N = 1000$ diploids has a neutral allele at frequency $p_{0} = 0.3$ . Using the diffusion approximation, compute the probability of fixation. Then compute the expected time to absorption (fixation or loss) in generations.

Hint

For a neutral allele, $P_{fix} = p_{0}$. The expected time to absorption is $T_{abs} = -4N_{e}[p_{0} \ln p_{0} + (1 - p_{0}) \ln(1 - p_{0})]$.

Answer

$P_{fix} = p_{0} = 0.3$ . Expected time to absorption:

T_{abs} = -4 \times 1000 \times [0.3 \ln(0.3) + 0.7 \ln(0.7)] = -4000 \times [0.3 \times (-1.20397) + 0.7 \times (-0.35667)] = -4000 \times (-0.61086) \approx 2443 \text{ generations} .

With probability 0.3 the allele fixes, taking on average $-4N_{e} \frac{1 - p_{0}}{p_{0}} \ln(1 - p_{0}) \approx 3329$ generations, and with probability 0.7 it is lost, taking on average $-4N_{e} \frac{p_{0}}{1 - p_{0}} \ln p_{0} \approx 2064$ generations. The unconditional absorption time is the weighted average of these conditional times: $0.3 \times 3329 + 0.7 \times 2064 \approx 2443$ generations.
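A few lines of Python reproduce the numbers in this answer and confirm that the conditional times average to the unconditional one. The sketch assumes the diffusion formulas quoted above, with the conditional loss time obtained from the fixation-time formula by the symmetry $p_0 \leftrightarrow 1 - p_0$.

```python
import math


def absorption_times(p0, ne):
    """Diffusion-approximation mean times (generations) for a neutral allele at frequency p0."""
    q0 = 1 - p0
    t_abs = -4 * ne * (p0 * math.log(p0) + q0 * math.log(q0))   # fixation or loss
    t_fix = -4 * ne * q0 / p0 * math.log(q0)                     # conditional on fixation
    t_loss = -4 * ne * p0 / q0 * math.log(p0)                    # conditional on loss
    return t_abs, t_fix, t_loss


t_abs, t_fix, t_loss = absorption_times(0.3, 1000)
print(round(t_abs), round(t_fix), round(t_loss))   # 2443 3329 2064
# The unconditional time is the P_fix-weighted average of the conditional times.
assert math.isclose(t_abs, 0.3 * t_fix + 0.7 * t_loss)
```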

Exercise 7 (hard, symbolic).

For the Wright-Fisher model with selection coefficient $s$ (additive fitness: genotypes AA, Aa, aa have fitness $1 + 2s$, $1 + s$, $1$, so each copy of A adds $s$), derive the diffusion approximation for the fixation probability $u(p)$ of allele A starting at frequency $p$.

Hint

The backward Kolmogorov equation is $(1/2) V (p) u^{''} (p) + M (p) u^{'} (p) = 0$ with $M (p) = s p (1 - p)$ and $V (p) = p (1 - p) / (2 N)$ . Solve with boundary conditions $u (0) = 0$ , $u (1) = 1$ .

Answer

The backward equation is:

\frac{p ( 1 - p )}{4 N} u^{''} (p) + s p (1 - p) u^{'} (p) = 0.

Dividing by $p (1 - p) /4 N$ : $u^{''} (p) + 4 N s \cdot u^{'} (p) = 0$ . The solution is $u (p) = A + B e^{- 4 N s p}$ . Applying $u (0) = 0$ : $A + B = 0$ , so $B = - A$ . Applying $u (1) = 1$ : $A (1 - e^{- 4 N s}) = 1$ , giving $A = 1/ (1 - e^{- 4 N s})$ . Therefore:

u (p) = \frac{1 - e ^{- 4 N s p}}{1 - e ^{- 4 N s}} .

When $Ns \gg 1$: $u(p) \approx 1 - e^{-4Nsp}$, so $u(1/(2N)) \approx 1 - e^{-2s} \approx 2s$ for small $s$; selection dominates. When $|Ns| \ll 1$: $u(p) \approx p$; drift dominates. The boundary $Ns \approx 1$ separates the two regimes.
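The derivation can be sanity-checked numerically: plug the closed form into the backward equation with finite differences, confirm the boundary conditions, and evaluate $u(1/(2N))$ in the two regimes. This is a sketch with hypothetical helper names; `math.expm1` keeps $1 - e^{-x}$ accurate when $x$ is small.

```python
import math


def u(p, n, s):
    """Fixation probability (1 - e^{-4Nsp}) / (1 - e^{-4Ns}) for s != 0, via expm1 for accuracy."""
    return math.expm1(-4 * n * s * p) / math.expm1(-4 * n * s)


def ode_residual(p, n, s, h=1e-4):
    """Central-difference value of (p(1-p)/(4N)) u'' + s p(1-p) u'; should be close to 0."""
    up, u0, um = u(p + h, n, s), u(p, n, s), u(p - h, n, s)
    d1 = (up - um) / (2 * h)
    d2 = (up - 2 * u0 + um) / h**2
    return p * (1 - p) / (4 * n) * d2 + s * p * (1 - p) * d1


n, s = 100, 0.01                                 # Ns = 1, near the threshold
print(u(0.0, n, s), u(1.0, n, s))                # boundary conditions u(0) = 0, u(1) = 1
print(max(abs(ode_residual(p, n, s)) for p in (0.1, 0.3, 0.5, 0.9)))
# Regimes for a single new copy, p = 1/(2N):
print(u(1 / 20_000, 10_000, 1e-3), "vs 2s =", 2e-3)    # Ns = 10: selection dominates
print(u(1 / 200, 100, 1e-5), "vs 1/(2N) =", 1 / 200)   # Ns = 0.001: drift dominates
```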

Exercise 8 (hard, symbolic).

In the island model with $d$ demes of equal size $N$ , each generation a fraction $m$ of each deme is replaced by migrants drawn from the total population. Derive Wright's $F_{S T} = 1/ (4 N m + 1)$ as a balance between drift (which differentiates demes) and migration (which homogenises them).

Hint

Track the probability that two genes drawn from the same deme are identical by descent. Drift increases this probability by $1/ (2 N)$ per generation; migration decreases it by a factor involving $m$ .

Answer

Let $F_{t}$ be the probability that two genes sampled from the same deme are identical by descent (IBD) in generation $t$. Under drift within demes, two genes coalesce with probability $1/(2N)$ per generation. Under migration, each gene has probability $m$ of being a migrant, so the probability that both are non-migrants (and thus could be IBD from within the deme) is $(1 - m)^{2}$. Migrants are assumed to carry genes unrelated to the resident ones, which is accurate when the number of demes $d$ is large. The recursion is:

F_{t + 1} = (1 - m)^{2} [\frac{1}{2 N} + (1 - \frac{1}{2 N}) F_{t}] .

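At equilibrium ($F_{t+1} = F_{t} = F_{ST}$), solving the recursion gives

F_{ST} = \frac{(1 - m)^{2} / (2N)}{1 - (1 - m)^{2} (1 - 1/(2N))} .

For small $m$ and large $N$, the denominator is approximately $2m + 1/(2N)$ (dropping terms of order $m^{2}$ and $m/N$), so multiplying numerator and denominator by $2N$ gives

F_{ST} \approx \frac{1}{4 N m + 1} .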

When $N m ≫ 1$ (high migration), $F_{S T} \approx 0$ : demes are genetically homogeneous. When $N m ≪ 1$ (low migration), $F_{S T} \approx 1$ : demes are fully differentiated by drift.
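The recursion and its small-$m$ approximation can be compared directly. The sketch below iterates the recursion to equilibrium, evaluates its exact fixed point, and prints both alongside $1/(4Nm + 1)$; parameter values are illustrative and the function names are hypothetical.

```python
def fst_iterate(n, m, generations=20_000):
    """Iterate F_{t+1} = (1 - m)^2 [1/(2N) + (1 - 1/(2N)) F_t] from F_0 = 0."""
    f = 0.0
    for _ in range(generations):
        f = (1 - m) ** 2 * (1 / (2 * n) + (1 - 1 / (2 * n)) * f)
    return f


def fst_exact(n, m):
    """Exact fixed point of the recursion (no small-m approximation)."""
    a = (1 - m) ** 2
    return (a / (2 * n)) / (1 - a * (1 - 1 / (2 * n)))


for n, m in [(100, 0.01), (100, 0.001), (1000, 0.01)]:
    approx = 1 / (4 * n * m + 1)
    print(n, m, round(fst_iterate(n, m), 4), round(fst_exact(n, m), 4), round(approx, 4))
```

The approximation stays within a few percent of the exact fixed point as long as $m$ is small and $N$ is large.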

Wright-Fisher, Moran, and the diffusion approximation [Master]

The Wright-Fisher model 19.02.05 is a discrete-time Markov chain on the state space $\{0, 1, \dots, 2N\}$ with transition matrix $P_{ij} = \binom{2N}{j} (i/2N)^{j} (1 - i/2N)^{2N - j}$. States 0 and $2N$ are absorbing. The eigenvalues of this matrix are known exactly: $\lambda_{k} = \prod_{m=0}^{k-1} (1 - m/(2N))$ for $k = 0, 1, \dots, 2N$. The two unit eigenvalues $\lambda_{0} = \lambda_{1} = 1$ correspond to the two absorbing states, and for $k$ small relative to $\sqrt{N}$, $\lambda_{k} \approx 1 - k(k - 1)/(4N)$. The rate of convergence to absorption is governed by the largest non-unit eigenvalue $\lambda_{2} = 1 - 1/(2N)$. This eigenvalue determines the rate of heterozygosity decay: $E[H_{t}] = H_{0} \lambda_{2}^{t} = H_{0} (1 - 1/(2N))^{t}$, recovering the exponential decay derived at the intermediate level.
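The eigenvalue structure can be checked exactly for a small chain with rational arithmetic. The sketch below assumes the exact formula $\lambda_{k} = \prod_{m=0}^{k-1}(1 - m/(2N))$, $k = 0, \dots, 2N$, and tests it two ways: the trace of $P$ must equal the sum of the eigenvalues, and $f(i) = i(2N - i)$ (proportional to heterozygosity) must be a right eigenvector with the largest non-unit eigenvalue $1 - 1/(2N)$. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def wf_transition(two_n):
    """Exact Wright-Fisher matrix: P[i][j] = Pr(X_{t+1} = j | X_t = i), X ~ Binomial(2N, i/(2N))."""
    rows = []
    for i in range(two_n + 1):
        p = Fraction(i, two_n)
        rows.append([comb(two_n, j) * p**j * (1 - p) ** (two_n - j) for j in range(two_n + 1)])
    return rows


def wf_eigenvalues(two_n):
    """Claimed eigenvalues lambda_k = prod_{m=0}^{k-1} (1 - m/(2N)), for k = 0..2N."""
    lams, lam = [], Fraction(1)
    for k in range(two_n + 1):
        if k > 0:
            lam *= 1 - Fraction(k - 1, two_n)
        lams.append(lam)
    return lams


two_n = 10                                   # N = 5 diploids
P, lams = wf_transition(two_n), wf_eigenvalues(two_n)

# Check 1: the trace of P equals the sum of the eigenvalues.
assert sum(P[i][i] for i in range(two_n + 1)) == sum(lams)

# Check 2: f(i) = i(2N - i) is a right eigenvector with eigenvalue lambda_2.
f = [i * (two_n - i) for i in range(two_n + 1)]
Pf = [sum(P[i][j] * f[j] for j in range(two_n + 1)) for i in range(two_n + 1)]
assert Pf == [lams[2] * x for x in f]
print(lams[2])                               # 9/10
```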

The Moran model provides an overlapping-generations alternative. At each time step, one individual is chosen to reproduce (proportional to fitness) and one is chosen to die (uniformly at random). For a neutral allele with $i$ copies in a haploid population of size $N$, the per-step transition probabilities are $i \to i + 1$ with probability $i(N - i)/N^{2}$ and $i \to i - 1$ with probability $i(N - i)/N^{2}$. The Moran model has the same fixation probability as Wright-Fisher ($P_{fix} = p_{0}$ for neutral alleles), but the time scale differs: the conditional fixation time of a new mutant is approximately $N^{2}$ time steps, which translates to $N^{2} \times (1/N) = N$ generations (each generation corresponds to $N$ birth-death events). This is half the fixation time of $\sim 2N$ generations in a haploid Wright-Fisher population of the same size, because the variance in offspring number per generation is about twice as large in the Moran model (2 rather than 1), so drift acts twice as fast and the Moran effective size is $N/2$.

The diffusion approximation bridges both discrete models to continuous stochastic calculus. Rescale time as $τ = t / (2 N)$ and allele frequency as $p = X / (2 N)$ . As $N \to \infty$ , both the Wright-Fisher and Moran processes converge to the same diffusion on $[0, 1]$ with infinitesimal mean $M (p) = α p (1 - p)$ (where $α = 2 N s$ for selection) and infinitesimal variance $V (p) = p (1 - p)$ . The probability density $ϕ (p, τ)$ of the allele frequency at time $τ$ satisfies the Kolmogorov forward equation (Fokker-Planck equation):

\frac{\partial ϕ}{\partial τ} = - \frac{\partial}{\partial p} [M (p) ϕ] + \frac{1}{2} \frac{\partial ^{2}}{\partial p ^{2}} [V (p) ϕ] .

For the neutral case ( $M (p) = 0$ , $V (p) = p (1 - p)$ ), this reduces to:

\frac{\partial ϕ}{\partial τ} = \frac{1}{2} \frac{\partial ^{2}}{\partial p ^{2}} [p (1 - p) ϕ] .

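The solution is an eigenfunction expansion in the Gegenbauer polynomials $C_{k-1}^{(3/2)}(1 - 2p)$, $k = 1, 2, \dots$, whose $k$-th mode decays as $e^{-k(k+1)\tau/2}$. In generations ($t = 2N\tau$) this is $e^{-k(k+1)t/(4N)}$, so the slowest mode ($k = 1$) decays at rate $1/(2N)$, recovering the eigenvalue structure of the discrete chain in the continuous limit ^{[Ewens 2004]}.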

The backward Kolmogorov equation governs fixation probabilities. For an allele with selection coefficient $s$ and additive fitness, the infinitesimal mean is $M (p) = s p (1 - p)$ and the fixation probability $u (p)$ satisfies:

\frac{1}{2} V (p) u^{''} (p) + M (p) u^{'} (p) = 0, u (0) = 0, u (1) = 1.

This yields $u (p) = (1 - e^{- 4 N s p}) / (1 - e^{- 4 N s})$ , the central result connecting drift and selection. The product $N s$ partitions evolutionary dynamics into two regimes: $N s ≫ 1$ where selection determines outcomes ( $u (1/ (2 N)) \approx 2 s$ ), and $N s ≪ 1$ where drift overrides selection ( $u (1/ (2 N)) \approx 1/ (2 N)$ ). The boundary $N s \approx 1$ is the drift-selection threshold, the population-genetic equivalent of the boundary between deterministic and stochastic regimes in statistical mechanics.

Effective population size $N_{e}$ has three operational definitions that coincide for the ideal Wright-Fisher population but diverge under realistic conditions ^{[Hartl & Clark 2007]}:

Variance $N_{e}$ : the size that yields $Var (Δ p) = p (1 - p) / (2 N_{e}^{(v)})$ . Under variable offspring number with variance $σ_{k}^{2}$ in reproductive output: $N_{e}^{(v)} = (4 N - 2) / (σ_{k}^{2} + 2)$ .
Inbreeding $N_{e}$ : the size that yields the observed rate of increase in the inbreeding coefficient. For a population fluctuating in size over generations: $1/ N_{e}^{(i)} = (1/ t) \sum_{k = 1}^{t} 1/ N_{k}$ , the harmonic mean of census sizes.
Eigenvalue $N_{e}$ : the size whose leading non-unit eigenvalue $1 - 1/ (2 N_{e}^{(λ)})$ matches the observed rate of decay of heterozygosity. For overlapping generations with age structure: $N_{e}^{(λ)}$ depends on the generation time and the age-specific survivorship schedule.

The harmonic-mean formula for fluctuating populations has a stark implication: a single generation of severe bottleneck dominates $N_{e}$. A population that spends 99 generations at $N = 10{,}000$ and one generation at $N = 10$ has $N_{e} = 1/(0.99 \times 0.0001 + 0.01 \times 0.1) \approx 910$, whereas its arithmetic mean census size is about 9,900: the single bottleneck generation cuts the effective size more than tenfold.
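The harmonic-mean calculation is a one-liner; this sketch reproduces the bottleneck example and contrasts it with the arithmetic mean (the function name is hypothetical).

```python
def inbreeding_ne(sizes):
    """Inbreeding effective size: harmonic mean of per-generation census sizes."""
    return len(sizes) / sum(1 / n for n in sizes)


history = [10_000] * 99 + [10]        # 99 generations at 10,000 and one at 10
print(round(inbreeding_ne(history)))  # 910
print(sum(history) / len(history))    # 9900.1, the arithmetic mean for contrast
```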

Coalescent theory and gene genealogies [Master]

Kingman's coalescent (1982) derives from a simple observation: in a finite population, any sample of genes shares common ancestors when traced backward in time. For a sample of $n$ genes from a diploid population of effective size $N_{e}$, the probability that a specific pair of lineages shares a parent in the preceding generation is $1/(2N_{e})$. With $k$ remaining lineages there are $\binom{k}{2}$ pairs, so the probability that some pair coalesces in a given generation is approximately $\binom{k}{2}/(2N_{e})$ (ignoring the negligible chance of two events at once), and the waiting time $T_{k}$ until the next coalescence is approximately geometrically distributed with mean $2N_{e}/\binom{k}{2}$ generations.

As $N_{e} \to \infty$ with $n$ fixed, the rescaled process converges to a continuous-time Markov chain on set partitions, Kingman's coalescent, in which $T_{k}$ becomes exponential with rate $\binom{k}{2}/(2N_{e})$. This convergence holds whenever the offspring distribution has finite variance (the Cannings model generalisation), establishing the coalescent as the universal genealogical process for populations with moderate variance in reproductive success.

The total expected time to the most recent common ancestor (TMRCA) of a sample of $n$ genes is:

E[T_{MRCA}] = \sum_{k=2}^{n} E[T_{k}] = 2 N_{e} \sum_{k=2}^{n} \frac{2}{k(k - 1)} = 4 N_{e} \left(1 - \frac{1}{n}\right) .

For the entire population ( $n = 2 N_{e}$ ), $E [T_{MRCA}] \approx 4 N_{e}$ generations. The total tree length, which determines the expected number of mutations observed in the sample, is:

E[L_{total}] = \sum_{k=2}^{n} k \cdot E[T_{k}] = 2 N_{e} \sum_{k=2}^{n} \frac{2}{k - 1} = 4 N_{e} \sum_{j=1}^{n-1} \frac{1}{j} = 4 N_{e} H_{n-1} \approx 4 N_{e} \ln(n),

where $H_{n - 1}$ is the $(n - 1)$ -th harmonic number. Most of the total tree length accumulates during the period when many lineages are still present (the recent branches), which concentrates mutations toward the tips of the genealogy — a prediction with direct implications for detecting recent population growth (excess of rare variants).
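The two epoch sums can be evaluated exactly with rational arithmetic, which confirms the closed forms and shows how much of the tree length comes from the final two-lineage epoch. A minimal sketch (hypothetical function name):

```python
from fractions import Fraction


def coalescent_expectations(n, ne):
    """Exact E[T_MRCA] and E[L_total] (generations) under Kingman's coalescent, summing over epochs."""
    e_tmrca = e_length = Fraction(0)
    for k in range(2, n + 1):
        e_tk = Fraction(2 * ne) / (k * (k - 1) // 2)   # E[T_k] = 2 N_e / C(k, 2)
        e_tmrca += e_tk
        e_length += k * e_tk                          # k branches coexist during epoch k
    return e_tmrca, e_length


n, ne = 10, 1000
e_tmrca, e_length = coalescent_expectations(n, ne)
harmonic = sum(Fraction(1, j) for j in range(1, n))   # H_{n-1}
assert e_tmrca == 4 * ne * (1 - Fraction(1, n))
assert e_length == 4 * ne * harmonic
print(float(e_tmrca), round(float(e_length), 1))      # 3600.0 11315.9
print(f"share of tree length from the 2-lineage epoch: {float(4 * ne / e_length):.2f}")
```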

The coalescent provides natural estimators of the population-genetic parameter $θ = 4 N_{e} μ$ (where $μ$ is the mutation rate per generation per site). Two canonical estimators are:

Watterson's estimator uses the number of segregating sites $S$ (positions where the sample contains two or more distinct nucleotides). Under the infinite-sites model (each mutation occurs at a novel site), $E [S] = μ \cdot E [L_{total}] = θ \cdot a_{n}$ where $a_{n} = \sum_{j = 1}^{n - 1} 1/ j$ . This gives $\hat{θ}_{W} = S / a_{n}$ .

Tajima's estimator uses the average number of pairwise differences $π = \frac{2}{n ( n - 1 )} \sum_{i < j} d_{ij}$ where $d_{ij}$ is the number of sites at which sequences $i$ and $j$ differ. Under neutrality, $E [π] = θ$ , giving $\hat{θ}_{π} = π$ .

Both estimators converge to $θ$ under neutrality, but they weight different parts of the genealogy differently. Watterson's estimator weights all branches equally (through total tree length), while Tajima's estimator weights by pairwise path lengths, which disproportionately reflects the deeper branches. Tajima's D $= (\hat{θ}_{π} - \hat{θ}_{W}) / SE$ exploits this difference: under neutrality $E [D] = 0$ ; significantly negative $D$ indicates an excess of rare variants (population expansion or purifying selection); significantly positive $D$ indicates a deficit of rare variants (population bottleneck or balancing selection) ^{[Hartl & Clark 2007]}.
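A small sketch makes the two estimators and Tajima's D concrete. It assumes aligned sequences of equal length with no missing data, uses the standard constants ($a_1$, $a_2$, $b_1$, $b_2$, $c_1$, $c_2$, $e_1$, $e_2$) from the usual formula for the standard error in Tajima's D, and runs on a hypothetical four-sequence toy alignment. A sample this small has essentially no power, so the value of D here only illustrates the arithmetic.

```python
from itertools import combinations
from math import sqrt


def diversity_stats(seqs):
    """Return (S, theta_W, pi, Tajima's D) for equal-length aligned sequences."""
    n = len(seqs)
    # Segregating sites: alignment columns with more than one distinct character.
    S = sum(len(set(col)) > 1 for col in zip(*seqs))
    # pi: mean number of differences over all n(n-1)/2 pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = S / a1
    # Constants for the standard error of pi - theta_W.
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    se = sqrt(e1 * S + e2 * S * (S - 1))
    D = (pi - theta_w) / se if S > 0 else float("nan")
    return S, theta_w, pi, D


seqs = ["ACGTACGT", "ACGTACGA", "ACCTACGT", "ACCTACGA"]   # hypothetical toy alignment
S, theta_w, pi, D = diversity_stats(seqs)
print(S, round(theta_w, 3), round(pi, 3), round(D, 3))   # 2 1.091 1.333 1.893
```

Here both segregating sites are at intermediate frequency (2 of 4 copies), so $\hat{\theta}_{\pi}$ exceeds $\hat{\theta}_{W}$ and D is positive, the direction the text associates with a deficit of rare variants.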

Several extensions broaden the coalescent framework:

The structured coalescent models a population divided into $d$ demes with migration rate $m_{ij}$ between demes $i$ and $j$ . Lineages are tracked backward through both coalescence (within demes) and migration (between demes). A lineage in deme $i$ migrates to deme $j$ at rate $m_{ij}$ , and two lineages in deme $i$ coalesce at rate $1/ (2 N_{i})$ . The interplay between migration and coalescence determines the expected coalescence time for genes from the same vs. different demes, which is the genealogical basis for $F_{S T}$ -based inference of migration rates.

The coalescent with recombination (Hudson 1983; Griffiths and Marjoram 1997) traces the ancestry of multiple linked loci simultaneously. Recombination splits a lineage into two ancestral lineages, each carrying part of the genetic material, producing an ancestral recombination graph (ARG). Recombination enters through the population-scaled rate $\rho = 4N_{e}r$, where $r$ is the per-generation recombination probability across the locus, just as mutation enters through $\theta = 4N_{e}\mu$. The ARG contains more branches than a simple tree, and these carry the information used by haplotype-based inference methods and linkage-disequilibrium mapping.

Multiple-merger coalescents ( $Λ$ -coalescents; Pitman 1999, Sagitov 1999) arise when the offspring distribution has heavy tails, as in sweepstakes reproduction (marine species with high fecundity and high juvenile mortality). In these models, a single parent can contribute a large fraction of the next generation, producing simultaneous coalescence of multiple lineages. The Kingman coalescent is the special case where the rate measure $Λ$ is concentrated at 0; Beta-coalescents (where $Λ$ follows a Beta distribution) model marine species with high variance in reproductive success and predict reduced genetic diversity relative to Kingman-based expectations.

Molecular clock, neutral theory, and nearly neutral theory [Master]

The molecular clock hypothesis emerged from the observation that amino-acid differences between homologous proteins in different species accumulate roughly linearly with time since divergence. Zuckerkandl and Pauling (1962) noted that haemoglobin sequences from different mammalian lineages showed approximately constant rates of substitution when calibrated against the fossil record ^{[Zuckerkandl & Pauling 1962]}. This clock-like regularity posed a puzzle: if most substitutions were driven by positive selection, the rate should depend on the rate of environmental change and the strength of selective pressures — both of which vary across lineages and time periods.

Kimura (1968) resolved this with the neutral theory of molecular evolution ^{[Kimura 1968]}. The argument proceeds in two steps:

First, the rate of substitution at a neutral locus. Let $μ$ be the mutation rate to a new neutral allele per generation per gene copy. In a diploid population of size $N$ , there are $2 N μ$ new neutral mutations per generation. Each has fixation probability $1/ (2 N)$ . The substitution rate is:

k = 2 N μ \times \frac{1}{2 N} = μ .

The substitution rate equals the mutation rate and is independent of population size. This is the neutral molecular clock: the rate of molecular evolution is set entirely by the biochemical rate of replication errors, not by population size or selective advantage.

Second, the genetic load argument. If most substitutions were driven by positive selection with selective advantage $s$ , the rate of substitution would be $k = 2 N μ \times 2 s = 4 N s μ$ (using the approximation $u \approx 2 s$ for beneficial alleles when $N s ≫ 1$ ). Haldane (1957) estimated that the total selective deaths a population can tolerate per generation — the substitutional load — is bounded. If substitutions occur at rate $k$ across $L$ loci, the total load is proportional to $\sum_{i = 1}^{L} s_{i}$ , which rapidly becomes unsustainable for realistic genome sizes and observed substitution rates. The neutral theory avoids this load because neutral substitutions impose no selective cost: every substitution occurs by drift, not by selective replacement.

The neutral theory does not claim that selection is unimportant. Its claims are:

The vast majority of new mutations are deleterious and removed by purifying selection before reaching appreciable frequency.
Among the mutations that escape purifying selection, most are effectively neutral (fitness effect $∣ s ∣ ≪ 1/ (2 N_{e})$ ).
Most fixed differences between species are neutral or nearly neutral substitutions fixed by drift, not adaptive substitutions driven by selection.

Selection is responsible for adaptation at the phenotypic level; drift is responsible for most molecular change at the sequence level.

Ohta's nearly neutral theory (1973) refined Kimura's framework by recognising that mutations with small but nonzero fitness effects ( $∣ s ∣ \sim 1/ (2 N_{e})$ ) behave differently in populations of different size ^{[Ohta 1973]}. In large populations ( $N_{e}$ large), $∣ s ∣ ≫ 1/ (2 N_{e})$ and selection removes mildly deleterious mutations efficiently. In small populations ( $N_{e}$ small), $∣ s ∣ ≪ 1/ (2 N_{e})$ and the same mutations behave as effectively neutral, drifting to fixation with probability $1/ (2 N_{e})$ rather than being eliminated by selection. This predicts a negative correlation between substitution rate and effective population size for slightly deleterious mutations — a pattern observed in mitochondrial DNA, where species with smaller $N_{e}$ show higher rates of non-synonymous substitution.
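Kimura's fixation formula quantifies Ohta's argument. The sketch below computes the substitution rate relative to the neutral rate, $2N\,u(1/(2N))$, for a hypothetical slightly deleterious mutation with $s = -10^{-4}$ across a range of effective sizes, taking $N = N_e$ in the formula. The rewrite used for $s < 0$ avoids floating-point overflow when $4N_e|s|$ is large; function names are hypothetical.

```python
import math


def fixation_prob(p, n_e, s):
    """Kimura's u(p) = (1 - e^{-4Nsp}) / (1 - e^{-4Ns}), evaluated stably."""
    if s == 0:
        return p
    a, b = -4 * n_e * s * p, -4 * n_e * s
    if b < 0:                        # beneficial: both expm1 values lie in (-1, 0)
        return math.expm1(a) / math.expm1(b)
    # Deleterious: write e^b - 1 = e^b (1 - e^{-b}) so a large b cannot overflow.
    return math.expm1(a) * math.exp(-b) / -math.expm1(-b)


def relative_substitution_rate(n_e, s):
    """Substitution rate divided by the neutral rate mu: 2N u(1/(2N)) with N = N_e."""
    return 2 * n_e * fixation_prob(1 / (2 * n_e), n_e, s)


s = -1e-4                            # hypothetical slightly deleterious effect
for n_e in (100, 1_000, 10_000, 100_000):
    print(n_e, f"{relative_substitution_rate(n_e, s):.3g}")
```

The same mutation substitutes at nearly the neutral rate when $N_e = 100$ but is almost never fixed when $N_e = 100{,}000$, which is the negative correlation between $N_e$ and substitution rate described above.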

The dN/dS ratio operationalises the neutral and nearly neutral framework. Let $d_{N}$ be the rate of non-synonymous substitution (amino-acid-changing mutations) and $d_{S}$ be the rate of synonymous substitution (mutations that do not change the amino acid, presumed neutral). Under strict neutrality, $d_{N} / d_{S} = 1$ (both classes fix at rate $\mu$). Under purifying selection, $d_{N} / d_{S} < 1$ (most non-synonymous mutations are deleterious and removed). Under positive selection, $d_{N} / d_{S} > 1$ (non-synonymous mutations are favoured and fix faster than the neutral rate). The McDonald-Kreitman test (1991) compares within-species polymorphism to between-species divergence at synonymous and non-synonymous sites, providing a direct test of the neutral prediction that the ratio of non-synonymous to synonymous variation should be the same at both levels.

The generation-time hypothesis addresses why molecular clocks tick at different rates in organisms with different generation times. Under the neutral theory with rate $\mu$ per generation, organisms with shorter generations should show faster molecular clocks per year (more generations per unit time). Data broadly support this for synonymous sites, but non-synonymous sites and mitochondrial DNA show weaker generation-time effects. This is consistent with Ohta's nearly neutral theory: species with short generation times tend to have large $N_{e}$, so slightly deleterious non-synonymous mutations are removed efficiently, which lowers their substitution rate per generation and offsets the larger number of generations per year; long-lived species tend to have small $N_{e}$ and fix more of these mutations per generation.

Genetic drift in conservation and population structure [Master]

Genetic drift has direct consequences for conservation biology because endangered species exist as small populations where drift is strong. The primary consequence is loss of genetic variation: heterozygosity declines at rate $1/ (2 N_{e})$ per generation, reducing the population's capacity to respond to future environmental change. A secondary consequence is accumulation of deleterious mutations: mildly deleterious alleles that would be eliminated by selection in large populations drift to higher frequency in small populations, reducing mean fitness — a process called mutational meltdown when it becomes self-reinforcing (reduced fitness reduces population size, which accelerates drift, which further reduces fitness).

Bottleneck effects are particularly severe for allelic diversity (the number of distinct alleles maintained) because rare alleles are lost first. Heterozygosity, which weights alleles by frequency, is relatively resilient to bottlenecks: a single-generation bottleneck of size $N_{b}$ multiplies expected heterozygosity by $1 - 1/(2N_{b})$, a loss of only $1/(2N_{b})$ of its value. The number of alleles declines much faster: an allele at frequency $q$ in the source population survives the bottleneck with probability $1 - (1 - q)^{2N_{b}}$, which is only about $2N_{b}q$ when $q \ll 1/(2N_{b})$, so such rare alleles are usually lost. This asymmetry (heterozygosity recovers through mutation over $10^{5}$ to $10^{6}$ generations, but allelic diversity is irreplaceable on ecological time scales) is the genetic basis for the conservation principle that allelic richness is a more sensitive indicator of bottleneck severity than heterozygosity.
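The asymmetry between heterozygosity and allele number can be computed directly from the two expressions above. The allele frequencies in this sketch are hypothetical, chosen to include a few rare alleles, and the function name is illustrative.

```python
def bottleneck_retention(freqs, n_b):
    """Expected fractions of heterozygosity and of alleles retained after one generation at N_b diploids."""
    two_nb = 2 * n_b
    h_kept = 1 - 1 / two_nb                                       # heterozygosity factor
    expected_alleles = sum(1 - (1 - q) ** two_nb for q in freqs)  # allele survives if sampled at least once
    return h_kept, expected_alleles / len(freqs)


freqs = [0.5, 0.2, 0.1, 0.1, 0.05, 0.05]    # hypothetical source-population allele frequencies
h, a = bottleneck_retention(freqs, n_b=2)
print(f"heterozygosity kept: {h:.2f}   alleles kept: {a:.2f}")   # 0.75 and 0.43
```

A bottleneck of two individuals keeps three quarters of the expected heterozygosity but, on average, well under half of the alleles, and the losses fall almost entirely on the rare ones.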

The 50/500 rule (Franklin 1980) provides operational thresholds ^{[Franklin 1980]}. An effective population size of $N_{e} \geq 50$ is sufficient to prevent inbreeding depression from causing immediate extinction in the short term (roughly 5 generations), because the rate of increase in inbreeding per generation is $1/(2N_{e}) \leq 1\%$, which most mammal populations tolerate. An $N_{e} \geq 500$ maintains enough additive genetic variance for quantitative traits that the population can respond to selection over evolutionary time scales (the equilibrium additive variance under drift-mutation balance is $V_{A} = 2N_{e}V_{M}$ where $V_{M}$ is the mutational variance per generation, and Franklin showed that $N_{e} = 500$ balances drift-induced loss against mutation-generated gain for typical trait architectures). These numbers are guidelines, not sharp thresholds; the actual minimum viable population depends on life history, mating system, generation time, and environmental variability.

Genetic rescue, the introduction of genetic material from a different population to counteract inbreeding depression, was dramatically demonstrated in the Florida panther (Puma concolor coryi). By the early 1990s, the population had declined to $N \approx 20$–25 adults and showed severe inbreeding depression: cryptorchidism (undescended testicles, in $\sim$90% of males), kinked tails, atrial septal defects, and reduced sperm quality. In 1995, eight female Texas pumas (from the closest extant population, separated from Florida panthers roughly 100–200 years ago) were introduced. Within two generations, heterozygosity increased by 24%, the frequency of cryptorchidism dropped from 90% to below 40%, kitten survival increased threefold, and the population grew to $\sim$200 individuals by 2010 ^{[Hartl & Clark 2007]}. The rescue worked because the introduced alleles broke up homozygosity at loci carrying deleterious recessives: a direct demonstration that drift-induced fixation of deleterious alleles is reversible through gene flow.

F-statistics quantify the partitioning of genetic variation within and among subpopulations. Wright defined three related quantities, building on the island-model analysis of his 1931 paper ^{[Wright 1931]}:

$F_{I S}$ : the inbreeding coefficient of an individual relative to its subpopulation — the probability that two alleles at a locus in an individual are identical by descent, given the allele frequencies in the individual's subpopulation. Positive $F_{I S}$ indicates inbreeding (excess homozygosity within demes); negative $F_{I S}$ indicates outbreeding.

$F_{S T}$ : the proportion of the total genetic variance that lies among subpopulations rather than within them. For a biallelic locus, $F_{S T} = (H_{T} - H_{S}) / H_{T}$, where $H_{T}$ is the expected heterozygosity in the total (pooled) population and $H_{S}$ is the average expected heterozygosity within subpopulations. Under the infinite-island model at equilibrium between drift and migration, $F_{S T} = 1/(4 N_{e} m + 1)$, where $m$ is the migration rate and $N_{e} m$ is the number of migrants entering each subpopulation per generation.

$F_{I T}$ : the inbreeding coefficient of an individual relative to the total population — the overall probability that two alleles in an individual are identical by descent. These satisfy the hierarchy $1 - F_{I T} = (1 - F_{I S}) (1 - F_{S T})$ .
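The three definitions translate directly into heterozygosity ratios. The sketch below computes them for one biallelic locus and checks the hierarchy identity. It assumes subpopulations weighted by relative size (equal by default) and applies no sample-size correction, so it illustrates the definitions rather than serving as an estimator for real samples. The function name and example numbers are hypothetical.

```python
def f_statistics(subpop_p, observed_het, weights=None):
    """Return (F_IS, F_ST, F_IT) for one biallelic locus.

    subpop_p      -- allele frequency p in each subpopulation
    observed_het  -- observed proportion of heterozygotes in each subpopulation
    weights       -- relative subpopulation sizes (equal if omitted)
    """
    n = len(subpop_p)
    w = weights if weights is not None else [1.0] * n
    total = sum(w)
    w = [wi / total for wi in w]
    h_i = sum(wi * ho for wi, ho in zip(w, observed_het))          # mean observed
    h_s = sum(wi * 2 * p * (1 - p) for wi, p in zip(w, subpop_p))   # mean expected within
    p_bar = sum(wi * p for wi, p in zip(w, subpop_p))
    h_t = 2 * p_bar * (1 - p_bar)                                  # expected in pooled population
    f_is = (h_s - h_i) / h_s
    f_st = (h_t - h_s) / h_t
    f_it = (h_t - h_i) / h_t
    return f_is, f_st, f_it


# Two equal-sized demes with p = 0.2 and 0.8, each showing 24% heterozygotes.
f_is, f_st, f_it = f_statistics([0.2, 0.8], [0.24, 0.24])
print(f"F_IS = {f_is:.2f}, F_ST = {f_st:.2f}, F_IT = {f_it:.2f}")
print("hierarchy holds:", abs((1 - f_it) - (1 - f_is) * (1 - f_st)) < 1e-12)
```

Here $H_S = 0.32$ and $H_T = 0.5$, giving $F_{IS} = 0.25$, $F_{ST} = 0.36$, and $F_{IT} = 0.52$. The identity holds exactly because $1 - F_{IT} = H_I/H_T = (H_I/H_S)(H_S/H_T)$.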

AMOVA (Analysis of Molecular Variance; Excoffier et al. 1992) extends F-statistics to molecular data by partitioning the total sum of squared pairwise differences into within-individual, among-individuals-within-populations, and among-populations components. The method applies to any distance metric on sequence data (nucleotide substitutions, microsatellite stepwise mutation model, SNP haplotypes) and provides a hypothesis-testing framework via permutation tests for the significance of population structure.
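The core of AMOVA can be sketched for the simplest case: haplotypes nested in populations, with no within-individual level. Each haplotype here is a hypothetical string of 0/1 sites, so the count of differing sites is a squared Euclidean distance. The sketch computes sums of squared deviations from pairwise distances, derives variance components from mean squares, and reports $Φ_{ST}$ (the AMOVA analogue of $F_{ST}$). It then tests $Φ_{ST}$ by permuting population labels. It is an illustrative sketch, not a substitute for an established AMOVA implementation.

```python
import random


def sq_dist(a: str, b: str) -> int:
    """Number of differing sites; equals squared Euclidean distance for 0/1 vectors."""
    return sum(x != y for x, y in zip(a, b))


def ssd(haps, idx):
    """Sum of squared deviations from the centroid, computed from pairwise distances."""
    return sum(sq_dist(haps[i], haps[j]) for i in idx for j in idx) / (2 * len(idx))


def phi_st(haps, labels):
    """Two-level AMOVA: share of molecular variance among populations."""
    pops = sorted(set(labels))
    n, n_pops = len(haps), len(pops)
    groups = [[i for i in range(n) if labels[i] == pop] for pop in pops]
    ssd_within = sum(ssd(haps, g) for g in groups)
    ssd_among = ssd(haps, range(n)) - ssd_within
    msd_within = ssd_within / (n - n_pops)
    msd_among = ssd_among / (n_pops - 1)
    n0 = (n - sum(len(g) ** 2 for g in groups) / n) / (n_pops - 1)
    var_among = (msd_among - msd_within) / n0
    return var_among / (var_among + msd_within)


def permutation_test(haps, labels, n_perm=999, seed=1):
    """Observed Phi_ST and one-sided p-value from shuffling population labels."""
    rng = random.Random(seed)
    observed = phi_st(haps, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if phi_st(haps, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


haps = ["0000", "0000", "0000", "1111", "1111", "1111"]
labels = ["A", "A", "A", "B", "B", "B"]
print(permutation_test(haps, labels))
```

With this tiny, perfectly separated sample $Φ_{ST} = 1$, yet only 2 of the 20 distinct label arrangements are that extreme, so the permutation p-value is only about 0.1: small samples limit the power of the test. $Φ_{ST}$ can also be negative when populations are less differentiated than random labelling would give; `phi_st(["00", "11", "00", "11"], ["A", "A", "B", "B"])` returns $-1$.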

The cheetah (Acinonyx jubatus) provides a second illustrative case of severe drift-induced erosion. Modern cheetahs show remarkably low genetic variation at MHC loci, microsatellites, and allozymes — skin grafts between unrelated cheetahs are accepted as though the animals were identical twins, a hallmark of extreme homozygosity across the genome. The bottleneck likely occurred near the Pleistocene-Holocene transition roughly 10,000 years ago, when cheetah populations crashed to an estimated $N_{e}$ of a few dozen individuals across their range. The consequence is a species that is genetically depauperate: allozyme polymorphism sits at roughly one-tenth the level observed in other felids, and the surviving variation is insufficient for effective immune response to novel pathogens, contributing to high cub mortality in both wild and captive populations. Unlike the Florida panther rescue, no closely related population exists from which to source genetic material, making the loss effectively permanent on any meaningful time scale.

The conservation implications are direct: populations with high $F_{S T}$ relative to the island-model expectation have restricted gene flow and are vulnerable to local drift-induced loss of variation. Management strategies that maintain $N_{e} m \geq 1$ (one migrant per generation) keep $F_{S T} \leq 0.2$ and maintain sufficient gene flow to prevent substantial differentiation — a rule of thumb derived from Wright's island model that remains the most widely cited gene-flow threshold in conservation genetics.
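The island-model threshold behind this rule of thumb is easy to tabulate. A minimal sketch, assuming Wright's infinite-island equilibrium $F_{ST} = 1/(4N_e m + 1)$ holds (many demes, equal migration, drift-migration equilibrium). The function names are illustrative.

```python
def island_fst(nem: float) -> float:
    """Equilibrium F_ST under the infinite-island model with N_e m migrants per generation."""
    return 1.0 / (4.0 * nem + 1.0)


def migrants_from_fst(fst: float) -> float:
    """Invert the island-model formula: N_e m = (1 / F_ST - 1) / 4."""
    return (1.0 / fst - 1.0) / 4.0


for nem in (0.1, 0.5, 1.0, 2.0, 10.0):
    print(f"N_e m = {nem:>4}: equilibrium F_ST = {island_fst(nem):.3f}")
```

One migrant per generation gives $F_{ST} = 0.2$. Once $N_e m$ is well above 1, each further halving of $F_{ST}$ requires roughly doubling the number of migrants. The inverse function is the usual route from an observed $F_{ST}$ to an indirect gene-flow estimate, and that estimate is only as reliable as the island-model assumptions behind it.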

Full proof set [Master]

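Proposition 1 (Spectral decomposition of the Wright-Fisher chain). In the Wright-Fisher model with $2 N$ gene copies and no mutation or selection, the transition matrix $P_{ij} = \binom{2N}{j} (i/2N)^{j} (1 - i/2N)^{2N - j}$ ($i, j = 0, 1, \dots, 2N$) has eigenvalues

$$λ_{k} = \frac{(2N)!}{(2N - k)! \, (2N)^{k}} = \prod_{j = 0}^{k - 1} \left(1 - \frac{j}{2N}\right), \qquad k = 0, 1, \dots, 2N,$$

so that $λ_{0} = λ_{1} = 1$ (one for each absorbing state, loss and fixation), $λ_{2} = 1 - 1/(2N)$, and $λ_{k} \approx 1 - k(k - 1)/(4N)$ for $k ≪ N$. The rate of decay of heterozygosity is determined by $λ_{2}$, the largest eigenvalue below 1.

Proof. Let $X_{t}$ be the number of copies of the focal allele. Given $X_{t} = i$, $X_{t+1} \sim \mathrm{Binomial}(2N, i/2N)$, whose factorial moments are $E[(X_{t+1})_{k} \mid X_{t} = i] = (2N)_{k} (i/2N)^{k} = λ_{k} i^{k}$, where $(x)_{k} = x(x - 1) \cdots (x - k + 1)$. Because $(x)_{k}$ equals $x^{k}$ plus terms of lower degree, $E[X_{t+1}^{k} \mid X_{t} = i] = λ_{k} i^{k} +$ (a polynomial in $i$ of degree less than $k$). The monomials $1, i, \dots, i^{2N}$ span all functions on the $2N + 1$ states, and in this basis $P$ acts as a triangular matrix with diagonal entries $λ_{0}, \dots, λ_{2N}$; these are therefore its eigenvalues.

For $k = 2$ the eigenfunction is explicit. Writing $H(i) = 2 (i/2N)(1 - i/2N)$ and using $E[X_{t+1}^{2} \mid X_{t} = i] = i(2N - i)/(2N) + i^{2}$,

$$E[X_{t+1}(2N - X_{t+1}) \mid X_{t} = i] = 2N i - \frac{i(2N - i)}{2N} - i^{2} = \left(1 - \frac{1}{2N}\right) i (2N - i),$$

so $E[H_{t+1} \mid X_{t}] = (1 - 1/(2N)) H_{t}$ exactly. Iterating gives $E[H_{t}] = H_{0} λ_{2}^{t} = H_{0} (1 - 1/(2N))^{t}$, recovering the intermediate-tier result. $□$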
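A numerical check of the spectral claim, assuming NumPy is available. The sketch builds the transition matrix for a small population ($2N = 8$) and compares its eigenvalues with the product formula $λ_k = \prod_{j<k}(1 - j/2N)$. That formula gives eigenvalue 1 twice, one per absorbing state, and $1 - 1/(2N)$ as the largest eigenvalue below 1. The sketch then confirms that heterozygosity is an exact eigenvector for that eigenvalue. Helper names are illustrative.

```python
import math

import numpy as np


def wright_fisher_matrix(two_n: int) -> np.ndarray:
    """P[i, j] = Pr(X_{t+1} = j | X_t = i) under binomial sampling of 2N copies."""
    P = np.empty((two_n + 1, two_n + 1))
    for i in range(two_n + 1):
        p = i / two_n
        for j in range(two_n + 1):
            P[i, j] = math.comb(two_n, j) * p**j * (1 - p) ** (two_n - j)
    return P


def predicted_eigenvalues(two_n: int) -> np.ndarray:
    """lambda_k = prod_{j<k} (1 - j / 2N) for k = 0, ..., 2N."""
    return np.array([math.prod(1 - j / two_n for j in range(k)) for k in range(two_n + 1)])


two_n = 8
P = wright_fisher_matrix(two_n)
numeric = np.sort(np.linalg.eigvals(P).real)
predicted = np.sort(predicted_eigenvalues(two_n))
print("max eigenvalue error:", np.max(np.abs(numeric - predicted)))

# Heterozygosity H(i) = 2 (i/2N)(1 - i/2N) is an exact right eigenvector for 1 - 1/(2N).
i = np.arange(two_n + 1)
H = 2 * (i / two_n) * (1 - i / two_n)
print("P H == (1 - 1/2N) H:", np.allclose(P @ H, (1 - 1 / two_n) * H))
```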

Proposition 2 (Coalescent waiting times are exponential). In a diploid population of effective size $N_{e}$ with $k$ distinct lineages, the waiting time $T_{k}$ (in generations) until the next coalescence event is approximately exponential with rate $\binom{k}{2}/(2N_{e})$. Equivalently, $T_{k}/(2N_{e})$ converges to an exponential distribution with rate $\binom{k}{2}$ as $N_{e} \to \infty$ with $k$ fixed.

Proof. In any single generation, the probability that a specific pair of lineages shares a parent is $1/(2N_{e})$ (both must draw the same parental gene copy). The probability that at least one of the $\binom{k}{2}$ pairs coalesces in one generation is $p_{k} = \binom{k}{2}/(2N_{e}) + O(1/N_{e}^{2})$, where the correction term accounts for multiple simultaneous coalescences. The probability that no coalescence occurs for $t$ consecutive generations is $(1 - p_{k})^{t} \approx e^{-p_{k} t}$ for small $p_{k}$. Writing $t = 2N_{e} τ$ and letting $N_{e} \to \infty$, this becomes $e^{-\binom{k}{2} τ}$, the survival function of an exponential distribution with rate $\binom{k}{2}$. $□$
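The one-generation argument can be checked by simulation. The sketch assumes each of $k$ lineages picks its parental gene copy uniformly and independently from $2N_e$ copies, which is the Wright-Fisher sampling step. The mean simulated waiting time should then be close to $2N_e/\binom{k}{2}$ generations. Helper names and parameter values are illustrative.

```python
import math
import random


def coalescence_wait(k: int, two_n: int, rng: random.Random) -> int:
    """Generations until at least two of k lineages pick the same parental gene copy."""
    t = 0
    while True:
        t += 1
        parents = [rng.randrange(two_n) for _ in range(k)]
        if len(set(parents)) < k:
            return t


def mean_wait(k: int, two_n: int, reps: int = 2000, seed: int = 42) -> float:
    rng = random.Random(seed)
    return sum(coalescence_wait(k, two_n, rng) for _ in range(reps)) / reps


two_n, k = 1000, 4                      # 2N_e gene copies, k sampled lineages
simulated = mean_wait(k, two_n)
predicted = two_n / math.comb(k, 2)     # mean of Exp(rate C(k,2) / 2N_e), in generations
print(f"simulated mean {simulated:.1f} vs 2N_e / C(k,2) = {predicted:.1f} generations")
```

The small remaining gap reflects the $O(1/N_e^2)$ correction and Monte Carlo error.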

Proposition 3 (Neutral substitution rate equals mutation rate). Under the infinite-sites model with mutation rate $μ$ per generation per gene copy in a diploid population of effective size $N_{e}$ , the rate of neutral substitution is $k = μ$ , independent of $N_{e}$ .

Proof. The number of new neutral mutations arising per generation is $2 N_{e} μ$ (one for each of the $2 N_{e}$ gene copies). Each new mutation starts at frequency $1/ (2 N_{e})$ and, by the martingale fixation probability, fixes with probability $1/ (2 N_{e})$ . The expected number of substitutions per generation is therefore $2 N_{e} μ \times 1/ (2 N_{e}) = μ$ . The $N_{e}$ dependence cancels exactly: larger populations produce more mutations but each has proportionally lower fixation probability. $□$
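The cancellation can be checked by simulation. The sketch runs a neutral Wright-Fisher chain from a single new mutant copy until the allele is lost or fixed. It estimates the fixation probability and multiplies it by the $2N_e\mu$ mutations arising per generation. The parameters are illustrative, and $2N_e$ is used as the number of gene copies, as in an ideal population. Binomial sampling is done copy by copy to keep the code dependency-free, which is slow but adequate for small populations.

```python
import random


def neutral_fixation(two_n: int, rng: random.Random) -> bool:
    """Follow one new neutral mutant (1 of two_n copies) until loss or fixation."""
    count = 1
    while 0 < count < two_n:
        p = count / two_n
        # Binomial(two_n, p) draw: each copy in the next generation picks a parent at random.
        count = sum(rng.random() < p for _ in range(two_n))
    return count == two_n


def fixation_fraction(two_n: int, reps: int = 10_000, seed: int = 7) -> float:
    rng = random.Random(seed)
    return sum(neutral_fixation(two_n, rng) for _ in range(reps)) / reps


mu = 1e-8  # assumed mutation rate per gene copy per generation
results = {}
for two_n in (10, 20):
    u = fixation_fraction(two_n)
    results[two_n] = u
    rate = two_n * mu * u  # new mutations per generation x fixation probability
    print(f"2N_e = {two_n}: u = {u:.4f} (1/2N_e = {1 / two_n:.4f}), substitution rate = {rate:.2e}")
```

Both population sizes give a substitution rate close to `mu`, up to Monte Carlo error.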

Connections [Master]

Wright-Fisher model and diffusion approximation 19.02.05. The mathematical machinery of drift — the Wright-Fisher Markov chain, its diffusion limit, the Kolmogorov equations — is developed in full in the Wright-Fisher unit. The present unit applies that machinery to derive population-genetic consequences: fixation probabilities, heterozygosity decay, and the drift-selection threshold. The bridge is that the Wright-Fisher chain's spectral decomposition determines the rate of all drift-driven processes.
Mendelian genetics 19.01.01 pending. Drift operates on the allele-segregation machinery established by Mendelian inheritance: the locus-allele framework, dominance relationships, and the concept of genotype frequency. Drift changes allele frequencies through random sampling of Mendelian segregation events, and the binomial structure of the Wright-Fisher model directly encodes Mendelian assortment into a stochastic process.
Natural selection 19.03.01 pending. Selection and drift are the two forces that change allele frequencies, and the product $N_{e} s$ determines which dominates at any given locus. The fixation probability formula $u(p) = (1 - e^{-4 N_{e} s p}) / (1 - e^{-4 N_{e} s})$ unifies both: when $N_{e} s ≫ 1$, $u(p) \approx 1 - e^{-4 N_{e} s p}$ (selection-dominated), and when $|N_{e} s| ≪ 1$, $u(p) \approx p$ (drift-dominated). This unit provides the drift side of that balance; 19.03.01 pending provides the selection side.
Quantitative genetics 19.05.01 pending. The partitioning of phenotypic change into a selection response ($R = h^{2} S$ via the breeder's equation) and a drift component ($E[\Delta \bar{z}_{\text{drift}}] = 0$ with variance $V_{A} / N_{e}$ per generation) directly applies the Wright-Fisher variance result. Quantitative-trait drift is allele-frequency drift summed across all loci contributing to the trait, and the drift-selection threshold $N_{e} s \approx 1$ generalises to $N_{e} h^{2} σ_{s}^{2} \approx V_{P}$ for quantitative characters.
Phylogenetics 19.07.01. The coalescent — the retrospective genealogy of sampled genes — provides the null model for phylogenetic tree inference. Coalescent-based species delimitation (using multi-species coalescent models) and demographic inference (Bayesian skyline plots) are direct applications of the genealogical framework developed here. The neutral molecular clock ( $k = μ$ ) provides the time calibration that converts genetic divergence into absolute time in molecular phylogenies.
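The limits quoted for the fixation probability can be checked numerically. The sketch evaluates Kimura's formula, as written in the natural-selection entry above, for a single new mutant ($p = 1/(2N_e)$). It uses `math.expm1` so that very small $4N_e s$ does not lose precision. The function name is illustrative. For $|4N_e s|$ beyond about 700 the exponentials overflow, and a log-space form would be needed.

```python
import math


def fixation_probability(p: float, ne: float, s: float) -> float:
    """Kimura: u(p) = (1 - exp(-4 N_e s p)) / (1 - exp(-4 N_e s)); u(p) = p when s = 0."""
    if s == 0:
        return p
    # 1 - exp(-x) == -expm1(-x); the minus signs cancel in the ratio
    return math.expm1(-4 * ne * s * p) / math.expm1(-4 * ne * s)


ne = 100
p = 1 / (2 * ne)  # one new mutant copy
for s in (-0.01, -0.001, 0.0, 0.001, 0.01, 0.05):
    u = fixation_probability(p, ne, s)
    print(f"N_e s = {ne * s:+.1f}: u = {u:.5f}  (neutral: {p:.5f}, 2s = {2 * s:+.3f})")
```

When $|N_e s| ≪ 1$ the result stays close to $p$. When $N_e s ≫ 1$ it approaches $1 - e^{-4N_e s p} = 1 - e^{-2s} \approx 2s$ for a single new mutant.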

Historical & philosophical context [Master]

Sewall Wright introduced genetic drift (which he called "random drift") in his 1931 paper Evolution in Mendelian Populations ^{[Wright 1931]}. Wright saw drift as one component of his shifting balance theory: populations explore an adaptive landscape through drift (crossing fitness valleys that selection alone cannot traverse), then selection drives them up new peaks, and interdemic selection spreads the new adaptation. Fisher, in The Genetical Theory of Natural Selection (1930), had emphasised the dominance of selection in large populations and regarded drift as a negligible force except in vanishingly small groups. The Wright-Fisher debate — whether drift or selection is the primary driver of evolution — defined population genetics for decades and remains unresolved in its strongest form, though the neutral and nearly neutral theories have largely settled the molecular version.

Motoo Kimura's neutral theory (1968) was the most consequential application of drift ^{[Kimura 1968]}. Kimura argued that the observed rate of molecular evolution was too high to be explained by positive selection without implausible genetic load, and that the rate of neutral substitution equals the mutation rate. Tomoko Ohta (1973) extended this to the nearly neutral theory, incorporating slightly deleterious mutations whose fate depends on effective population size ^{[Ohta 1973]}. John Kingman's coalescent (1982) unified the retrospective approach to population genetics, providing a mathematically rigorous genealogical framework that underlies virtually all modern statistical methods in the field ^{[Kingman 1982]}.

Bibliography [Master]

@article{Wright1931,
  author = {Wright, Sewall},
  title = {Evolution in {M}endelian populations},
  journal = {Genetics},
  volume = {16},
  year = {1931},
  pages = {97--159},
}

@article{Kimura1968,
  author = {Kimura, Motoo},
  title = {Evolutionary rate at the molecular level},
  journal = {Nature},
  volume = {217},
  year = {1968},
  pages = {624--626},
}

@article{Ohta1973,
  author = {Ohta, Tomoko},
  title = {Slightly deleterious mutant substitutions in evolution},
  journal = {Nature},
  volume = {246},
  year = {1973},
  pages = {96--98},
}

@article{Kingman1982,
  author = {Kingman, J. F. C.},
  title = {The coalescent},
  journal = {Stochastic Processes and their Applications},
  volume = {13},
  year = {1982},
  pages = {235--248},
}

@book{Kimura1983,
  author = {Kimura, Motoo},
  title = {The Neutral Theory of Molecular Evolution},
  publisher = {Cambridge University Press},
  year = {1983},
}

@book{HartlClark2007,
  author = {Hartl, Daniel L. and Clark, Andrew G.},
  title = {Principles of Population Genetics},
  edition = {4th},
  publisher = {Sinauer Associates},
  year = {2007},
}

@book{Ewens2004,
  author = {Ewens, Warren J.},
  title = {Mathematical Population Genetics {I}: Theoretical Introduction},
  edition = {2nd},
  publisher = {Springer},
  year = {2004},
}

@book{Wakeley2008,
  author = {Wakeley, John},
  title = {Coalescent Theory: An Introduction},
  publisher = {Roberts \& Company},
  year = {2008},
}

@book{Futuyma2017,
  author = {Futuyma, Douglas J.},
  title = {Evolution},
  edition = {4th},
  publisher = {Sinauer Associates},
  year = {2017},
}

@incollection{ZuckerkandlPauling1962,
  author = {Zuckerkandl, Emile and Pauling, Linus},
  title = {Molecular disease, evolution, and genic heterogeneity},
  booktitle = {Horizons in Biochemistry},
  editor = {Kasha, Michael and Pullman, Bernard},
  publisher = {Academic Press},
  year = {1962},
  pages = {189--225},
}

@incollection{Franklin1980,
  author = {Franklin, Ian R.},
  title = {Evolutionary change in small populations},
  booktitle = {Conservation Biology: An Evolutionary-Ecological Perspective},
  editor = {Soul{\'e}, Michael E. and Wilcox, Bruce A.},
  publisher = {Sinauer Associates},
  year = {1980},
  pages = {135--149},
}

Prerequisites

19.01.01 pending
19.02.05 pending

Tier anchors

beginner: Coyne Why Evolution Is True Ch. 7; Campbell Biology 12th ed. Ch. 23; Crash Course Biology genetic drift episodes
intermediate: Hartl & Clark Principles of Population Genetics 4th ed. Ch. 3, 7; Futuyma Evolution 4th ed. Ch. 10
master: Ewens Mathematical Population Genetics I; Wakeley Coalescent Theory; primary literature — Wright 1931, Kimura 1968, Kingman 1982, Ohta 1973

References

Wright, S. — Evolution in Mendelian populations (Genetics 16, 97-159, 1931) · Originator paper for genetic drift, F-statistics, and the shifting balance theory · see docs/catalogs/NEED_TO_SOURCE.md#bio-wright-1931
Kimura, M. — Evolutionary rate at the molecular level (Nature 217, 624-626, 1968) · Originator paper for the neutral theory of molecular evolution · see docs/catalogs/NEED_TO_SOURCE.md#bio-kimura-1968
Kingman, J. F. C. — The coalescent (Stochastic Processes and their Applications 13, 235-248, 1982) · Originator paper for the coalescent process · see docs/catalogs/NEED_TO_SOURCE.md#bio-kingman-1982
Ohta, T. — Slightly deleterious mutant substitutions in evolution (Nature 246, 96-98, 1973) · Originator paper for the nearly neutral theory · see docs/catalogs/NEED_TO_SOURCE.md#bio-ohta-1973
Hartl, D. L. & Clark, A. G. — Principles of Population Genetics, 4th ed. (Sinauer, 2007) · Ch. 3 Random genetic drift; Ch. 7 The coalescent · see docs/catalogs/NEED_TO_SOURCE.md#bio-hartl-clark-2007
Ewens, W. J. — Mathematical Population Genetics I, 2nd ed. (Springer, 2004) · Ch. 3 Stochastic theory; Ch. 7 The coalescent · see docs/catalogs/NEED_TO_SOURCE.md#bio-ewens-2004
Wakeley, J. — Coalescent Theory: An Introduction (Roberts & Company, 2008) · Ch. 1-3, 5-6; the canonical modern coalescent reference · see docs/catalogs/NEED_TO_SOURCE.md#bio-wakeley-2008
Futuyma, D. J. — Evolution, 4th ed. (Sinauer, 2017) · Ch. 10 Genetic drift · see docs/catalogs/NEED_TO_SOURCE.md#bio-futuyma-2017
Zuckerkandl, E. & Pauling, L. — Molecular disease, evolution, and genic heterogeneity; in Kasha & Pullman eds., Horizons in Biochemistry (Academic Press, 1962), pp. 189-225 · Originator paper for the molecular clock hypothesis · see docs/catalogs/NEED_TO_SOURCE.md#bio-zuckerkandl-pauling-1962
Franklin, I. R. — Evolutionary change in small populations; in Soulé & Wilcox eds., Conservation Biology (Sinauer, 1980), pp. 135-149 · Originator of the 50/500 rule for minimum viable population · see docs/catalogs/NEED_TO_SOURCE.md#bio-franklin-1980
tong
raw/pdfs/mathbio/mathbio.pdf · Mathematical biology background — stochastic processes, Markov chains, and coalescent theory

Reviewer

Tyler (pending external biology reviewer per BIOLOGY_PLAN §6)

Estimated time

beginner: 14m
intermediate: 35m
master: 70m