19.02.05 · eco-evo-bio / pop-genetics

Wright-Fisher model and the diffusion approximation

shipped · 3 tiers · Lean: none · pending prereqs

Anchor (Master): Ewens, *Mathematical Population Genetics I* 2nd ed. (Springer, 2004), Ch. 3 (the Wright-Fisher Markov chain) + Ch. 4 (the diffusion approximation); Crow & Kimura, *An Introduction to Population Genetics Theory* (Harper & Row, 1970), Ch. 3 + Ch. 8 + Ch. 9; Charlesworth & Charlesworth, *Elements of Evolutionary Genetics* (Roberts, 2010), Ch. 5; primary literature — Wright 1931 *Genetics* 16; Fisher 1930 *Genetical Theory*; Kimura 1955 *PNAS* 41; Kimura 1962 *Genetics* 47; Kingman 1982 *Stoch. Proc. Appl.* 13

Intuition [Beginner]

Take a small island population — say, 10 squirrels — and look at one gene with two versions, call them $A$ and $a$ . Say 5 of the 20 gene copies in the population are the $A$ version and 15 are $a$ . If each squirrel in the next generation gets its two gene copies by reaching at random into the parental gene pool, then on average we expect the same 5-to-15 ratio in the next generation. But on any given run, by sheer luck, we might draw 4 or 6 or 7 copies of $A$ instead. The frequency wobbles.

The Wright-Fisher model is the simplest version of this story. Each generation the next pool of gene copies is drawn at random from the current one, with replacement. The expected frequency stays the same, but the actual frequency drifts. Run the experiment long enough and one of two endpoints arrives: either every gene copy in the population is $A$ , or every gene copy is $a$ . One allele wins, the other is gone. This endpoint is called fixation, and fixation is the long-term fate of every population under drift alone.

A coin-flip analogy makes this concrete. Flip 20 fair coins. The expected number of heads is 10, but most batches miss 10 exactly: some give 8 heads, some give 12. Now imagine that the proportion of heads from your last batch sets the bias of the coin in the next batch. If you got 12 heads, the coin in the next batch lands heads with probability 0.6, so the next batch is expected to give 12 heads again, not 10, and its own sampling noise is added on top. Run this loop and the proportion wanders away from 1/2 with no pull back toward it, eventually hitting 0 or 1 and staying there. That is the Wright-Fisher process. Sampling noise compounds because each generation's outcome sets the next generation's bias.

Two things flow from this picture. First, drift is strongest when the population is small: the variance of the per-generation wobble is proportional to one over the population size, so the typical size of the wobble shrinks like one over the square root of the population size. A population of a million squirrels barely drifts at all in a single generation; a population of ten drifts substantially every generation. Second, drift erases variation. A population that started with both $A$ and $a$ ends with only one of them, and the diversity within the population (what biologists call the heterozygosity) decreases over time. The two effects together explain why small isolated populations lose genetic variation quickly: drift is fast in small populations, and it always destroys variation.

Once we add selection back in — one of the alleles is slightly better than the other — the picture becomes richer. Selection pushes the frequency one way; drift wobbles it both ways. When selection is strong and the population is large, selection wins and the better allele fixes. When selection is weak or the population is small, drift can override selection and the worse allele can win by luck. The boundary between these two regimes is captured by the product $2 N s$ , where $N$ is the population size and $s$ is the selection coefficient — the central scaling parameter of population genetics, and the key insight this unit builds toward.

Visual [Beginner]

A picture is worth a thousand simulations here. The classical Wright-Fisher diagram plots allele frequency on the vertical axis from 0 to 1, with generations running left to right, and overlays many independent runs of the process starting from the same initial frequency. Each line wobbles up and down, some hitting 0 (loss), some hitting 1 (fixation), and the spread of lines fanning out over time captures the variance growth.

A complementary visual is the diffusion density: instead of tracking individual sample paths, track the probability distribution of the allele frequency at each time. Starting from a delta function at the initial frequency, the distribution spreads, flattens, and gradually concentrates at the two absorbing boundaries 0 and 1. After many generations, almost all of the probability has been absorbed: the population has either fixed or lost the allele, and the residual probability somewhere in between vanishes.

The picture captures three lessons that the formal theory will sharpen. The allele-frequency process is a martingale: under neutrality it has no systematic upward or downward push. The variance of the frequency grows over time, increasing the chance of hitting an absorbing boundary. And the long-run fate is absorption at one of the two endpoints, with the probability of fixing $A$ equal to its starting frequency.

Worked example [Beginner]

A textbook population of 10 diploid individuals — so 20 gene copies — has an allele $A$ at initial frequency $p_{0} = 0.4$ . That means 8 copies of $A$ and 12 copies of $a$ in the founding generation. With no selection, no mutation, and no migration, we want to see how the frequency moves over the next few generations.

Step 1. The next generation is built by drawing 20 gene copies at random, with replacement, from the parental pool. Each draw is $A$ with probability 0.4, independently. The number of $A$ copies in the next generation is therefore a binomial random variable with 20 trials and success probability 0.4. Its expected value is $20 \times 0.4 = 8$ , the same as the parent generation.

Step 2. The variance of that binomial count is $20 \times 0.4 \times 0.6 = 4.8$ , with standard deviation about 2.2. So a typical run produces between 6 and 10 copies of $A$ in the next generation — a frequency between 0.3 and 0.5. Some runs will give 5 copies, some 11, some 8 exactly. The frequency wobbles around the parental value, with the size of the wobble set by the binomial variance.

Step 3. Say generation 1 happens to land on 6 copies of $A$ , a frequency of 0.3. Now generation 2 is sampled from a parental pool with $p = 0.3$ : the binomial parameters are 20 trials and probability 0.3, expected value 6, variance 4.2. The frequency in generation 2 wobbles around 0.3.

Step 4. Iterate. The frequency follows a random walk whose step sizes depend on where it currently sits — the variance per step is $p (1 - p) / (2 N)$ for diploid population size $N$ , largest at $p = 0.5$ and zero at the boundaries $p = 0$ and $p = 1$ . Once the frequency reaches 0 or 1 it cannot leave: the boundaries are absorbing. Run for long enough and one of two endpoints arrives.

What this tells us: even with no selection, allele frequencies are not preserved across generations in a small population. They drift, and the drift accumulates until one allele takes over. The smaller the population, the faster the drift.
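The four steps above can be sketched as a short simulation. This is a minimal illustration rather than a reference implementation: the function names `next_generation` and `simulate`, the seed, and the 200-generation cap are choices made here, and binomial sampling is done by summing Bernoulli draws from Python's standard `random` module.

```python
import random

def next_generation(count_a, n_copies, rng):
    """Draw n_copies gene copies with replacement; return the new count of A."""
    p = count_a / n_copies
    return sum(1 for _ in range(n_copies) if rng.random() < p)

def simulate(count_a, n_copies, generations, rng):
    """Return the list of A counts, starting with the founding generation."""
    path = [count_a]
    for _ in range(generations):
        count_a = next_generation(count_a, n_copies, rng)
        path.append(count_a)
    return path

rng = random.Random(1)             # fixed seed so a run can be repeated
path = simulate(8, 20, 200, rng)   # 8 copies of A among 20, as in the example
print([x / 20 for x in path[:10]]) # first few frequencies of one run
```

Changing the seed gives a different trajectory each time; once a run reaches 0 or 20 copies it stays there, which is the absorbing behaviour described in Step 4.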

Check your understanding [Beginner]

Exercise (easy, numeric).

A single new mutation arises in a Wright-Fisher population of 100 diploid individuals. Under neutrality, what is the probability that this mutation eventually fixes in the population?

Hint

A new mutation starts at frequency $1/ (2 N)$ . The neutral fixation probability equals the starting frequency.

Answer

$0.005$ (or 1 in 200). A single mutation in 100 diploids represents 1 out of $2 \times 100 = 200$ gene copies, so the starting frequency is $1/200 = 0.005$ . The neutral fixation probability equals the starting frequency, so the new mutation has a 0.5% chance of fixing and a 99.5% chance of being lost. This is why most neutral mutations die out; the molecular clock works precisely because the small fixation probability is exactly balanced by the high input rate of mutations.

Formal definition [Intermediate+]

Fix a single autosomal locus with two alleles $A$ and $a$ in a diploid population of constant size $N$ across generations, with $2N$ gene copies per generation. Let $X_{n} \in \{0, 1, \dots, 2N\}$ denote the number of $A$ alleles at generation $n$, and write $p_{n} := X_{n} / (2N) \in [0, 1]$ for the allele frequency. The Wright-Fisher model specifies that generation $n + 1$ is built by drawing $2N$ gene copies independently and with replacement from the parental gene pool; equivalently, conditional on $X_{n} = j$, the count $X_{n+1}$ is a binomial random variable with $2N$ trials and success probability $j / (2N)$.

Definition (Wright-Fisher Markov chain). The Wright-Fisher chain at population size $N$ with no selection, mutation, or migration is the discrete-time Markov chain on the state space $\{0, 1, \dots, 2N\}$ with transition probabilities

$$P(X_{n+1} = k \mid X_{n} = j) = \binom{2N}{k} \left(\frac{j}{2N}\right)^{k} \left(1 - \frac{j}{2N}\right)^{2N-k},$$

for $j, k \in \{0, 1, \dots, 2N\}$. The states $0$ and $2N$ are absorbing: once $X_{n} = 0$ the allele $A$ is lost and stays so; once $X_{n} = 2N$ the allele $A$ is fixed and stays so. The remaining states $\{1, 2, \dots, 2N-1\}$ are transient.

The conditional expectation and variance of the next-generation count are

$$E[X_{n+1} \mid X_{n} = j] = j, \qquad \operatorname{Var}(X_{n+1} \mid X_{n} = j) = 2N \cdot \frac{j}{2N} \cdot \left(1 - \frac{j}{2N}\right) = j\left(1 - \frac{j}{2N}\right).$$

Translating to the frequency $p_{n} = X_{n} / (2N)$,

$$E[p_{n+1} \mid p_{n}] = p_{n}, \qquad \operatorname{Var}(p_{n+1} \mid p_{n}) = \frac{p_{n}(1 - p_{n})}{2N}.$$

The frequency process is a martingale: its conditional expectation given the past is its current value. The variance of a single-generation step is $p (1 - p) / (2 N)$ , which vanishes at the boundaries $p = 0$ and $p = 1$ and is maximised at the interior point $p = 1/2$ .
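The conditional mean and variance can be checked directly from the binomial transition probabilities. The sketch below is only an illustration: the helper name `transition_row` and the choice $2N = 20$, $j = 8$ are assumptions made here, and the row is built with `math.comb` from the standard library.

```python
from math import comb

def transition_row(j, two_n):
    """P(X_{n+1} = k | X_n = j) for k = 0..two_n under neutral Wright-Fisher."""
    q = j / two_n
    return [comb(two_n, k) * q**k * (1 - q)**(two_n - k) for k in range(two_n + 1)]

two_n, j = 20, 8
row = transition_row(j, two_n)
mean = sum(k * pk for k, pk in enumerate(row))
var = sum((k - mean)**2 * pk for k, pk in enumerate(row))
print(sum(row), mean, var)   # row sum, conditional mean, conditional variance
```

The mean comes out at $j$ and the variance at $j(1 - j/(2N))$, and the rows for $j = 0$ and $j = 2N$ put all their mass on the starting state, which is the absorbing property.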

Definition (Wright-Fisher with selection and mutation). Adding selection coefficient $s$ against allele $a$ (so allele $A$ has relative fitness $1$ and allele $a$ has relative fitness $1 - s$ ) and forward/backward mutation rates $μ$ from $A$ to $a$ and $ν$ from $a$ to $A$ , the per-generation expected post-selection-and-mutation frequency is

$$\psi(p) = \frac{p(1 - \mu) + (1 - p)\,\nu\,(1 - s)}{1 - s(1 - p)} \approx p + s\,p(1 - p) - \mu p + \nu(1 - p) \quad \text{(weak forces)},$$

and the Wright-Fisher transition is binomial sampling around $ψ (p)$ : $X_{n + 1} ∣ X_{n} = j$ is binomial $(2 N, ψ (j / (2 N)))$ . The four forces — drift (from sampling), selection, forward mutation, backward mutation — are superposed in a single transition kernel.
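A quick numerical comparison shows how close the weak-force approximation is to the full expression. The function names `psi_exact` and `psi_weak` and the parameter values are illustrative assumptions; the formulas are the two sides of the displayed relation for $\psi(p)$.

```python
def psi_exact(p, s, mu, nu):
    """Expected frequency of A after selection against a, then mutation."""
    return (p * (1 - mu) + (1 - p) * nu * (1 - s)) / (1 - s * (1 - p))

def psi_weak(p, s, mu, nu):
    """First-order approximation when s, mu and nu are all small."""
    return p + s * p * (1 - p) - mu * p + nu * (1 - p)

p, s, mu, nu = 0.3, 0.01, 1e-4, 1e-4
print(psi_exact(p, s, mu, nu), psi_weak(p, s, mu, nu))
```

The two values differ at second order in the small parameters, so the gap shrinks quadratically as $s$, $\mu$ and $\nu$ are reduced.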

Counterexamples to common slips

Drift does not have a direction. The Wright-Fisher chain has $E [X_{n + 1} ∣ X_{n}] = X_{n}$ under neutrality, so drift on average leaves the frequency unchanged. The bias toward fixation or loss arises from absorption at the boundaries, not from any per-step push.
Drift is not "selection of the random kind." Selection systematically shifts the expected frequency; drift adds zero-mean noise on top of any systematic shift. The two are dimensionally distinct: selection enters as the drift coefficient $s p (1 - p)$ of the diffusion, mutation as additional drift terms, drift in the random-walk sense as the variance coefficient $p (1 - p) / (2 N)$ .
The Wright-Fisher chain is not the same as the Moran model. The Moran process (Moran 1958) replaces one individual per time step rather than the entire population — it has a different variance per unit time but the same diffusion limit. Their continuous-time scaling differs by a factor of 2.
Variance per generation depends on where the frequency sits. The boundary regions, where one allele is nearly fixed, have small per-generation variance because $p (1 - p)$ is small. The chain moves slowly near the boundaries and quickly through the middle of the interval.

Key theorem with proof [Intermediate+]

The signature result of Wright-Fisher theory under selection is Kimura's fixation-probability formula, first stated in Kimura 1962 Genetics 47, derived via the diffusion limit and the backward Kolmogorov equation.

Theorem (Kimura, 1962). Let $u(p)$ denote the probability that allele $A$ eventually fixes in a Wright-Fisher population of $2N$ gene copies, starting from allele frequency $p \in [0, 1]$, where $A$ has selective advantage $s$ over $a$ (so allele $A$ has fitness $1 + s$ relative to $a$ at fitness $1$, with $s$ small). In the diffusion limit $N \to \infty$ with $\sigma := 2Ns$ held fixed, the fixation probability is

$$u(p) = \frac{1 - e^{-2\sigma p}}{1 - e^{-2\sigma}} = \frac{1 - e^{-4Nsp}}{1 - e^{-4Ns}}.$$

Under neutrality $s \to 0$, the formula reduces to $u(p) = p$.

Proof. Measure time in units of $2N$ generations. The diffusion limit replaces the discrete Wright-Fisher chain with a continuous-state Markov process $p(t)$ on $[0, 1]$, whose infinitesimal generator is

$$L f(p) = \frac{1}{2} V(p) \frac{d^{2} f}{dp^{2}} + M(p) \frac{df}{dp},$$

with variance coefficient $V(p) = p(1 - p)$ and drift coefficient $M(p) = \sigma p(1 - p)$. These are the per-generation variance $p(1 - p)/(2N)$ and per-generation mean change $s\,p(1 - p)$, each multiplied by the $2N$ generations in one unit of time. The fixation probability $u(p)$ satisfies the backward Kolmogorov equation $L u = 0$ on $(0, 1)$ with boundary conditions $u(0) = 0$ (an allele at frequency $0$ never fixes) and $u(1) = 1$ (an allele already fixed stays fixed).

Writing out $L u = 0$ in coordinates,

$$\frac{1}{2} p(1 - p)\, u''(p) + \sigma p(1 - p)\, u'(p) = 0.$$

Cancel the common factor $p(1 - p)$ (which is positive on $(0, 1)$):

$$\frac{1}{2} u''(p) + \sigma u'(p) = 0,$$

a first-order linear ordinary differential equation for $u'(p)$. Let $v(p) := u'(p)$; then $v' = -2\sigma v$, with solution $v(p) = C_{1} e^{-2\sigma p}$ for some constant $C_{1}$. Integrating,

$$u(p) = C_{2} - \frac{C_{1}}{2\sigma} e^{-2\sigma p}.$$

Apply the boundary conditions. From $u(0) = 0$: $C_{2} = C_{1}/(2\sigma)$, so

$$u(p) = \frac{C_{1}}{2\sigma} \left(1 - e^{-2\sigma p}\right).$$

From $u(1) = 1$: $C_{1}/(2\sigma) = 1/(1 - e^{-2\sigma})$, so

$$u(p) = \frac{1 - e^{-2\sigma p}}{1 - e^{-2\sigma}}.$$

Substituting $\sigma = 2Ns$ gives the formula in the theorem, with exponent $4Nsp$. Under $s \to 0$, expand both numerator and denominator: $1 - e^{-2\sigma p} \approx 2\sigma p$ and $1 - e^{-2\sigma} \approx 2\sigma$, so $u(p) \to p$, the neutral result. $□$
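The diffusion formula can be compared against the discrete chain itself. The sketch below is a hedged illustration under stated assumptions: selection acts through $\psi(p) = p(1+s)/(1+sp)$ (allele $A$ with fitness $1+s$), the per-generation drift coefficient is therefore $s\,p(1-p)$ to first order and the variance is $p(1-p)/(2N)$, which puts $4Nsp$ in the exponent of the diffusion formula. The names `transition_matrix`, `fixation_exact`, `fixation_diffusion` and the values $N = 20$, $s = 0.05$ are choices made here. The exact fixation probabilities are obtained by repeatedly applying the transition matrix to the indicator of fixation.

```python
from math import comb, exp

def transition_matrix(two_n, s):
    """Binomial sampling around psi(p) = p(1+s)/(1+sp), A having fitness 1+s."""
    rows = []
    for j in range(two_n + 1):
        p = j / two_n
        q = p * (1 + s) / (1 + s * p)
        rows.append([comb(two_n, k) * q**k * (1 - q)**(two_n - k)
                     for k in range(two_n + 1)])
    return rows

def fixation_exact(two_n, s, sweeps=1500):
    """Iterate u <- P u from the indicator of fixation until it settles."""
    P = transition_matrix(two_n, s)
    u = [0.0] * two_n + [1.0]
    for _ in range(sweeps):
        u = [sum(pk * uk for pk, uk in zip(row, u)) for row in P]
    return u

def fixation_diffusion(p, n, s):
    """Diffusion formula with exponent 4Nsp; returns p in the neutral case."""
    return p if s == 0 else (1 - exp(-4 * n * s * p)) / (1 - exp(-4 * n * s))

n, s = 20, 0.05
u = fixation_exact(2 * n, s)
print(u[1], fixation_diffusion(1 / (2 * n), n, s))  # single new copy: chain vs diffusion
```

For a population this small the two numbers agree only approximately, since the diffusion formula is a large-$N$, small-$s$ limit; with $s = 0$ the iteration returns $j/(2N)$ for every starting count $j$.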

Bridge. The diffusion-limit derivation builds toward 19.02.01 pending Hardy-Weinberg, where the deterministic frequency-equilibrium is the leading-order picture and Wright-Fisher drift is the first-order stochastic correction. The same formula appears again in 19.03.01 pending natural selection, where the strong-selection asymptotics $u \approx 2 s$ for $2 N s ≫ 1$ recovers Haldane's 1927 rule that the probability of fixation of a beneficial new mutation is approximately twice its selection coefficient. The foundational reason is that the backward Kolmogorov equation identifies the fixation probability with a harmonic function for the diffusion generator, and the central insight is that the diffusion-limit ODE is integrable in closed form because the variance and drift coefficients share a common factor $p (1 - p)$ that cancels. Putting these together, the $u (p)$ formula is the cleanest analytic bridge between the discrete genetics of Wright and the analytic methods of Kimura — the bridge is exactly the diffusion approximation, and it generalises through the entire toolkit of one-dimensional diffusion theory.

How the formula behaves

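With per-generation drift coefficient $s\,p(1 - p)$ and variance $p(1 - p)/(2N)$, the exponent in Kimura's formula is $4Nsp$, and the formula has four limiting regimes that the master tier will exploit.

| Regime | Condition | Approximation |
|---|---|---|
| Strong positive selection | $2Ns \gg 1$ and $p \gg 1/(2Ns)$ | $u(p) \approx 1$ |
| New mutation under strong selection | $2Ns \gg 1$ and $p = 1/(2N)$ | $u \approx 1 - e^{-2s} \approx 2s$ |
| Effectively neutral | $\lvert 2Ns \rvert \ll 1$ | $u(p) \approx p$ |
| Strong negative selection | $2Ns \ll -1$ | $u(p) \to 0$ (allele lost) |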

The crossover near $∣2 N s ∣ \sim 1$ is the nearly-neutral threshold of Ohta 1973: mutations whose selection coefficient $∣ s ∣ ≪ 1/ (2 N)$ behave as effectively neutral and obey the $p$ rule, while those with $∣ s ∣ ≫ 1/ (2 N)$ behave as effectively deterministic. The location of the threshold depends on $N$ , so identical mutations behave neutrally in small populations and selectively in large ones.

Exercises [Intermediate+]

Exercise 1 (easy, numeric).

A new neutral mutation arises in a Wright-Fisher population of 10000 diploid individuals. Compute (a) the fixation probability of the new mutation and (b) the expected number of generations until either fixation or loss, given the standard result $\bar{t}(p) \approx -4N[p \ln p + (1 - p) \ln(1 - p)]$ for the expected time to absorption (fixation or loss, whichever comes first) starting from frequency $p$.

Hint

The fixation probability of a neutral new mutation is $1/ (2 N)$ . For the expected time formula, plug $p = 1/ (2 N)$ into the closed-form expression.

Answer

(a) Fixation probability $u = 1/(2N) = 1/20000 = 5 \times 10^{-5}$. (b) Plugging $p = 1/20000$ into $-4N[p \ln p + (1 - p) \ln(1 - p)]$ with $N = 10000$ gives $\bar{t} \approx -40000 \times [(5 \times 10^{-5})(-9.90) + (1)(-5 \times 10^{-5})] \approx -40000 \times [-4.95 \times 10^{-4} - 5 \times 10^{-5}] \approx 21.8$ generations. New neutral mutations are nearly always lost, and the typical loss happens in tens of generations rather than the much larger fixation timescale of order $4N$.

Exercise 2 (easy, numeric).

A Wright-Fisher population of 50 diploid individuals starts with allele frequency $p_{0} = 0.2$ . What is the variance of the allele frequency after a single generation, and (under repeated independent runs) what is the expected variance of the frequency after 10 generations under the standard approximation that variance grows linearly with time as long as the boundaries have not been hit?

Hint

Single-generation variance is $p (1 - p) / (2 N)$ . For short times, the variance accumulates approximately additively before drift drives the frequency near the boundaries.

Answer

Single-generation variance $= 0.2 \times 0.8/100 = 1.6 \times 10^{-3}$, so the standard deviation of the frequency after one generation is about $0.04$. For 10 generations, assuming the variance per generation stays close to its initial value (a reasonable approximation while $p$ remains in the interior), the cumulative variance is $\approx 10 \times 1.6 \times 10^{-3} = 0.016$ and the standard deviation is $\sqrt{0.016} \approx 0.126$. After ten generations the frequency has typically wandered by about $\pm 0.13$, which is substantial drift relative to its starting value of $0.2$ and is why populations of this size lose alleles rapidly.
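The linear-growth approximation can be set beside the exact neutral variance. The sketch assumes the standard closed form $\operatorname{Var}(p_t) = p_0(1 - p_0)\,[1 - (1 - 1/(2N))^{t}]$, which follows from the martingale property together with the geometric decay of $E[p_t(1 - p_t)]$; the function names are hypothetical.

```python
def variance_linear(p0, n, t):
    """Short-time approximation: t independent steps of variance p0(1-p0)/(2N)."""
    return t * p0 * (1 - p0) / (2 * n)

def variance_exact(p0, n, t):
    """Exact neutral result Var(p_t) = p0(1-p0)[1 - (1 - 1/(2N))^t]."""
    return p0 * (1 - p0) * (1 - (1 - 1 / (2 * n))**t)

for t in (1, 10, 100):
    print(t, variance_linear(0.2, 50, t), variance_exact(0.2, 50, t))
```

At $t = 10$ the two agree to within about 5%, which supports the approximation used in the answer; by $t = 100$ the linear estimate overshoots noticeably because the exact variance saturates at $p_0(1 - p_0)$.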

Exercise 3 (medium, numeric).

A beneficial mutation with selection coefficient $s = 0.01$ arises as a single copy in a Wright-Fisher population of 1000 diploid individuals. Compute (a) its fixation probability using Kimura's formula and (b) compare to the strong-selection approximation $u \approx 2 s$ .

Hint

Use Kimura's formula in per-generation terms (drift coefficient $s\,p(1 - p)$, variance $p(1 - p)/(2N)$), $u(p) = (1 - e^{-4Nsp}) / (1 - e^{-4Ns})$, with $p = 1/(2N)$, $N = 1000$, $s = 0.01$, so $4Ns = 40$ and $4Nsp = 2s = 0.02$.

Answer

(a) With $4Nsp = 0.02$ and $4Ns = 40$: $u = (1 - e^{-0.02}) / (1 - e^{-40}) \approx 0.0198 / 1 = 0.0198$. (b) The strong-selection approximation gives $u \approx 2s = 0.02$, about 1% above the full formula. The denominator is indistinguishable from $1$ once $4Ns \gg 1$, so the whole discrepancy comes from the expansion $1 - e^{-2s} = 2s - 2s^{2} + \dots$ in the numerator. Haldane's 1927 result $u \approx 2s$ is therefore accurate whenever $2Ns \gg 1$ and $s \ll 1$; for moderate or small $2Ns$ the denominator matters and Kimura's full formula is needed.

Exercise 4 (medium, symbolic).

Starting from the backward Kolmogorov equation $\frac{1}{2} V (p) u^{''} + M (p) u^{'} = 0$ , derive the general formula for $u (p)$ in terms of an arbitrary drift coefficient $M (p)$ and the standard variance coefficient $V (p) = p (1 - p)$ . Show that with $M (p) = s p (1 - p)$ you recover Kimura's formula, and that with $M (p) = 0$ you recover the neutral result $u (p) = p$ .

Hint

Define $\psi(p) := \exp\left(-\int 2M/V \, dp\right)$; then $u'(p) = C\psi(p)$ for a constant $C$, so $u$ is a normalised integral of $\psi$.

Answer

Rewriting the equation as $u''/u' = -2M(p)/V(p)$, integrate to get $\ln u'(p) = -\int_{0}^{p} 2M(y)/V(y)\, dy + \text{const}$, so

$$u'(p) = C\,\psi(p), \qquad \psi(p) := \exp\left(-\int_{0}^{p} \frac{2M(y)}{V(y)}\, dy\right).$$

Integrating once more and applying $u(0) = 0$ and $u(1) = 1$:

$$u(p) = \frac{\int_{0}^{p} \psi(y)\, dy}{\int_{0}^{1} \psi(y)\, dy}.$$

With $M(p) = s\,p(1 - p)$ and $V(p) = p(1 - p)$ the ratio is $2M/V = 2s$, so $\psi(y) = e^{-2sy}$ and the formula reduces to $u(p) = (1 - e^{-2sp}) / (1 - e^{-2s})$. This is Kimura's formula with $s$ read as the selection coefficient per unit of diffusion time ($2N$ generations); in per-generation terms the exponent is $4Nsp$. With $M = 0$ the function $\psi$ is identically $1$ and $u(p) = p$, the neutral result. The integral formula is the canonical handle for any one-dimensional diffusion with the same variance coefficient and an arbitrary smooth drift.
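The integral formula lends itself to a direct numerical check. The sketch below is one possible discretisation, not a prescribed method: it uses the trapezoid rule on a uniform grid, takes the already-simplified quotient $2M(y)/V(y)$ as a callable `ratio` (so the $0/0$ at the boundaries never arises), and rounds $p$ to the nearest grid point. The name `fixation_general` and the constant `c` are assumptions.

```python
from math import exp

def fixation_general(ratio, p, steps=2000):
    """u(p) = int_0^p psi / int_0^1 psi with psi(y) = exp(-int_0^y ratio)."""
    h = 1.0 / steps
    inner = 0.0       # running value of int_0^y ratio
    outer = 0.0       # running value of int_0^y psi
    psi_prev = 1.0    # psi(0) = exp(0)
    target = round(p * steps)   # grid index closest to p
    at_p = 0.0
    for i in range(1, steps + 1):
        y0, y1 = (i - 1) * h, i * h
        inner += 0.5 * h * (ratio(y0) + ratio(y1))
        psi = exp(-inner)
        outer += 0.5 * h * (psi_prev + psi)
        psi_prev = psi
        if i == target:
            at_p = outer
    return at_p / outer

c = 2.0   # constant selection coefficient per unit of diffusion time
numeric = fixation_general(lambda y: 2 * c, 0.3)
closed = (1 - exp(-2 * c * 0.3)) / (1 - exp(-2 * c))
print(numeric, closed)
```

Passing `lambda y: 0.0` as the ratio returns the neutral value $u(p) = p$, and any other smooth ratio can be supplied to explore drift terms that have no closed form.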

Exercise 5 (medium, numeric).

A deleterious mutation has $s = - 0.001$ (mildly deleterious). For populations of effective size $N_{e} = 100$ , $N_{e} = 10000$ , and $N_{e} = 1 0^{6}$ , compute $2 N_{e} s$ and the corresponding fixation probability of a new copy (starting at frequency $p = 1/ (2 N_{e})$ ) using Kimura's formula. Comment on the regime.

Hint

A negative $s$ in Kimura's formula flips the sign in the exponent: in per-generation terms (drift coefficient $s\,p(1 - p)$, variance $p(1 - p)/(2N_{e})$) use $u(p) = (1 - e^{-4N_{e}sp}) / (1 - e^{-4N_{e}s})$ with $s = -0.001$, so that $4N_{e}sp = 2s$ for a new copy.

Answer

$2N_{e}s$ is $-0.2$, $-20$, and $-2000$ respectively, so the exponent $4N_{e}s$ is $-0.4$, $-40$, and $-4000$. For $N_{e} = 100$: $u(1/200) = (1 - e^{0.002}) / (1 - e^{0.4}) \approx (-0.002002)/(-0.4918) \approx 0.00407$, about four fifths of the neutral value $1/200 = 0.005$, so drift dominates. For $N_{e} = 10000$: $u(1/20000) = (1 - e^{0.002}) / (1 - e^{40}) \approx (-0.002002)/(-2.35 \times 10^{17}) \approx 8.5 \times 10^{-21}$, vastly suppressed below the neutral value $1/20000 = 5 \times 10^{-5}$, so selection dominates and the deleterious allele is effectively never fixed. For $N_{e} = 10^{6}$: $u$ is essentially zero. The same selection coefficient moves from effectively neutral in a small population to effectively unfixable in a large one, which is the qualitative content of Ohta's nearly-neutral theory.

Exercise 6 (hard, symbolic).

Show that the neutral Wright-Fisher allele-frequency process is a martingale, and use the optional-stopping theorem on the stopping time $T = \inf\{n : X_{n} \in \{0, 2N\}\}$ to derive the neutral fixation probability $u(p) = p$ directly from the discrete chain, without invoking the diffusion limit.

Hint

A bounded martingale stopped at an almost-surely finite stopping time satisfies $E [X_{T}] = E [X_{0}]$ .

Answer

The Wright-Fisher chain satisfies $E[X_{n+1} \mid X_{n}] = X_{n}$, so $\{X_{n}\}$ is a martingale. It is bounded ($0 \leq X_{n} \leq 2N$), and the absorbing set $\{0, 2N\}$ is reached in finite expected time: from every transient state $j$ the chain jumps straight to $0$ with probability $(1 - j/(2N))^{2N} > 0$, and since there are finitely many transient states the probability of remaining unabsorbed decays geometrically in $n$. By the optional-stopping theorem,

$$E[X_{T}] = E[X_{0}] = X_{0}.$$

But $X_{T} \in \{0, 2N\}$, so $E[X_{T}] = 2N \cdot \Pr(X_{T} = 2N) + 0 \cdot \Pr(X_{T} = 0) = 2N \cdot u$, where $u$ is the fixation probability. Therefore $2N u = X_{0}$, so $u = X_{0}/(2N) = p$. The neutral fixation result drops out of the martingale structure directly, with no diffusion limit required. The selection case requires the diffusion approximation because $\{X_{n}\}$ is no longer a martingale under selection; in the diffusion limit its role is taken over by $e^{-4Ns\,p(t)}$, the exponential that appears in Kimura's formula, which is annihilated by the generator.

Exercise 7 (hard, numeric).

The expected time to fixation of a new neutral mutation that eventually fixes is approximately $\bar{t}_{\mathrm{fix}} \approx 4N$ generations (Kimura-Ohta 1969 result). For a population of $N_{e} = 10^{4}$ diploid humans and a per-generation mutation rate of $\mu = 10^{-8}$ per nucleotide, compute (a) the expected number of substitutions per nucleotide per generation under the neutral theory, and (b) the timescale over which a single new neutral nucleotide variant fixes, given that it does fix.

Hint

Under neutrality the substitution rate per nucleotide per generation equals the per-lineage mutation rate $μ$ — the input rate $2 N μ$ times the fixation probability $1/ (2 N)$ cancels.

Answer

(a) The substitution rate per nucleotide per generation is the input rate $2N\mu$ times the fixation probability $1/(2N)$, equalling $\mu = 10^{-8}$. The molecular clock runs at the per-lineage mutation rate independent of population size, which is the central prediction of Kimura's neutral theory and the foundation of molecular dating. (b) A single new neutral variant that eventually fixes takes about $4N_{e} = 4 \times 10^{4}$ generations to fix. For humans with a generation time of 25 years, that is about $10^{6}$ years. So although new mutations enter the population every generation, an individual fixation event is slow: a variant destined to fix spends on the order of a million years segregating, and most of the nucleotide variation segregating in a human population at any moment will be lost rather than fix.

Exercise 8 (hard, symbolic).

Derive the expected heterozygosity decay under Wright-Fisher drift. Define $H_{n} := 2 E [p_{n} (1 - p_{n})]$ to be the expected heterozygosity at generation $n$ . Show that $H_{n + 1} = (1 - 1/ (2 N)) H_{n}$ and hence $H_{n} = (1 - 1/ (2 N))^{n} H_{0}$ , so the per-generation decay rate is $1/ (2 N)$ .

Hint

Use $E [p_{n + 1}^{2} ∣ p_{n}] = Var (p_{n + 1} ∣ p_{n}) + (E [p_{n + 1} ∣ p_{n}])^{2} = p_{n} (1 - p_{n}) / (2 N) + p_{n}^{2}$ , then compute $E [p_{n + 1} (1 - p_{n + 1}) ∣ p_{n}]$ .

Answer

Conditional on $p_{n}$ ,

$$E[p_{n+1}(1 - p_{n+1}) \mid p_{n}] = E[p_{n+1} \mid p_{n}] - E[p_{n+1}^{2} \mid p_{n}].$$

Compute $E[p_{n+1}^{2} \mid p_{n}] = \operatorname{Var}(p_{n+1} \mid p_{n}) + (E[p_{n+1} \mid p_{n}])^{2} = p_{n}(1 - p_{n})/(2N) + p_{n}^{2}$. Therefore

$$E[p_{n+1}(1 - p_{n+1}) \mid p_{n}] = p_{n} - p_{n}^{2} - \frac{p_{n}(1 - p_{n})}{2N} = p_{n}(1 - p_{n})\left(1 - \frac{1}{2N}\right).$$

Taking unconditional expectations, $H_{n+1} = (1 - 1/(2N)) H_{n}$, so $H_{n} = (1 - 1/(2N))^{n} H_{0}$. The heterozygosity decays geometrically with per-generation rate $1/(2N)$, equivalently with timescale $2N$ generations. This is one of the load-bearing results of population genetics: drift erases variation at a rate exactly set by the inverse population size. After $2N$ generations the heterozygosity has dropped to about $1/e \approx 37\%$ of its initial value; after $4N$ generations it is down to about $e^{-2} \approx 14\%$.
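The geometric decay can be confirmed exactly for a small population by pushing the full probability distribution of the count through the binomial transition. This is an illustrative sketch: the names `step` and `heterozygosity`, the size $2N = 20$, and the starting count of 8 are choices made here.

```python
from math import comb

def step(dist, two_n):
    """Push a probability distribution over counts 0..two_n through one generation."""
    new = [0.0] * (two_n + 1)
    for j, mass in enumerate(dist):
        if mass == 0.0:
            continue
        q = j / two_n
        for k in range(two_n + 1):
            new[k] += mass * comb(two_n, k) * q**k * (1 - q)**(two_n - k)
    return new

def heterozygosity(dist, two_n):
    """H = 2 E[p(1-p)] under the distribution dist."""
    return sum(m * 2 * (j / two_n) * (1 - j / two_n) for j, m in enumerate(dist))

two_n = 20
dist = [0.0] * (two_n + 1)
dist[8] = 1.0                        # start at p0 = 0.4
h_values = [heterozygosity(dist, two_n)]
for _ in range(5):
    dist = step(dist, two_n)
    h_values.append(heterozygosity(dist, two_n))
ratios = [b / a for a, b in zip(h_values, h_values[1:])]
print(ratios)   # successive ratios H_{n+1} / H_n
```

Each ratio equals $1 - 1/(2N)$ up to floating-point rounding, regardless of the starting frequency, because the recursion holds conditionally on every value of $p_{n}$.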

From discrete to diffusion — the formal limit [Master]

The discrete Wright-Fisher chain is exactly solvable for small $N$ — one can explicitly diagonalise the $2 N + 1$ by $2 N + 1$ binomial transition matrix and read off the eigenvalues and eigenvectors. Wright did exactly this in his 1931 Genetics paper, deriving the principal eigenvalue $λ_{1} = 1 - 1/ (2 N)$ that controls the long-run rate of heterozygosity decay. But the matrix-diagonalisation route does not scale: for biologically realistic $N$ on the order of $1 0^{4}$ or $1 0^{6}$ , the matrix is unwieldy and the eigenvalue structure is hard to interpret. The diffusion approximation is the analytic substitute that recovers the same eigenvalue at leading order and opens the route to closed-form fixation probabilities, fixation times, and stationary distributions.

The setup. Take a sequence of Wright-Fisher chains $X_{n}^{(N)}$ at population sizes $N = N_{1}, N_{2}, \dots$ tending to infinity, with selection coefficient $s_{N}$ and mutation rates $\mu_{N}, \nu_{N}$ scaled so that $\sigma := 2N s_{N}$, $\theta_{\mu} := 4N\mu_{N}$, and $\theta_{\nu} := 4N\nu_{N}$ remain fixed in the limit. Rescale time so that one diffusion-time unit corresponds to $2N$ generations: $p^{(N)}(t) := X_{\lfloor 2Nt \rfloor}^{(N)} / (2N)$. The claim (Feller 1951, Karlin-Taylor 1981, Ewens 2004 Ch. 4 for the modern statement) is that as $N \to \infty$ the process $p^{(N)}(\cdot)$ converges in distribution (in the Skorokhod sense on path space) to a Markov diffusion $p(t)$ on $[0, 1]$ with infinitesimal generator $L f = \frac{1}{2} V(p) f''(p) + M(p) f'(p)$ for variance coefficient $V(p) = p(1 - p)$ and drift coefficient $M(p) = \sigma p(1 - p) - \theta_{\mu} p/2 + \theta_{\nu}(1 - p)/2$ (the factors of $1/2$ on the mutation terms arise because $2N\mu_{N} = \theta_{\mu}/2$ under the time rescaling; conventions vary).

Why does the limit work? Two observations. Per generation, the conditional mean of the rescaled increment is $E [Δ p^{(N)} ∣ p_{n}^{(N)}] = s_{N} p (1 - p) + O (s_{N}^{2})$ , of order $s_{N} = O (1/ N)$ . The conditional variance per generation is $Var (Δ p^{(N)} ∣ p_{n}^{(N)}) = p (1 - p) / (2 N)$ , also of order $1/ N$ . Both quantities scale identically. Rescaling time by a factor of $2 N$ multiplies the per-step mean by $2 N \cdot s_{N} = σ$ (the held-fixed combination) and multiplies the per-step variance by $2 N \cdot p (1 - p) / (2 N) = p (1 - p)$ (the unscaled variance coefficient). So in the limit, the rescaled mean per unit time is $σ p (1 - p)$ (finite, fixed) and the rescaled variance per unit time is $p (1 - p)$ (also finite, fixed). These are precisely the drift and variance coefficients of the limiting diffusion.

The deeper structural fact is that the higher moments $E[(\Delta p)^{k} \mid p]$ for $k \geq 3$ are at most of order $1/N^{k/2}$, so even after multiplication by the $2N$ steps per unit of rescaled time they vanish as $N \to \infty$. This is the standard hypothesis for the diffusion-limit theorem: drift scales as $1/N$, variance scales as $1/N$, and all higher moments are $o(1/N)$. The Wright-Fisher chain satisfies it because binomial sampling keeps the frequency within order $1/\sqrt{N}$ of its mean each generation, the same square-root scaling as a Brownian increment over a time step of length $1/N$.
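The scaling argument can be made concrete by evaluating the rescaled one-generation moments for growing $N$. The sketch assumes selection only (no mutation), with $\psi(p) = p(1+s)/(1+sp)$ and $s = \sigma/(2N)$, and uses the exact binomial expressions for the variance and the third central moment of the sampled frequency. The function name and parameter values are hypothetical.

```python
def rescaled_moments(n, sigma, p):
    """Return 2N times (mean change, variance, third central moment) per generation."""
    s = sigma / (2 * n)
    q = p * (1 + s) / (1 + s * p)        # expected frequency after selection
    two_n = 2 * n
    mean = q - p
    var = q * (1 - q) / two_n
    third = q * (1 - q) * (1 - 2 * q) / two_n**2   # third central moment of the frequency
    return two_n * mean, two_n * var, two_n * third

sigma, p = 2.0, 0.3
for n in (10**2, 10**4, 10**6):
    print(n, rescaled_moments(n, sigma, p))
```

As $N$ grows the first entry settles toward $\sigma p(1 - p)$, the second toward $p(1 - p)$, and the third shrinks like $1/N$, which is the pattern the diffusion-limit hypothesis requires.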

The diffusion limit is not merely a calculational convenience; it captures universality. Several different microscopic models of finite-population evolution — the Wright-Fisher chain with non-overlapping generations, the Moran model with overlapping generations, the Cannings family of exchangeable models — all converge to the same diffusion limit at leading order in $1/ N$ , with possibly rescaled time coordinates. The diffusion equation $\partial_{t} ϕ = \frac{1}{2} \partial_{p}^{2} [p (1 - p) ϕ] - \partial_{p} [M (p) ϕ]$ is the universal continuum description of evolutionary stochasticity at the single-locus level, in the same way that the heat equation is the universal continuum description of random walks. The microscopic details — whether sampling is binomial or hypergeometric or Polya-urn — drop out in the limit.

A precise statement. The forward Kolmogorov (Fokker-Planck) equation for the density $ϕ (p, t)$ of the allele frequency is

$$\frac{\partial \phi}{\partial t} = -\frac{\partial}{\partial p}\left[M(p)\,\phi\right] + \frac{1}{2} \frac{\partial^{2}}{\partial p^{2}}\left[V(p)\,\phi\right],$$

with boundary behaviour determined by whether $0$ and $1$ are accessible (for the Wright-Fisher diffusion they are, and the boundary points are absorbing in the no-mutation case and reflecting once mutation is present). The dual backward Kolmogorov equation for any expectation $u (p) = E_{p} [g (p (T))]$ at fixed terminal time $T$ is $\partial_{t} u = L u$ , run backward in time, with $L = \frac{1}{2} V \partial_{p}^{2} + M \partial_{p}$ . The fixation-probability derivation in the Key Theorem above is the time-independent case of the backward equation, $L u = 0$ .

In stochastic-differential-equation language the diffusion is

$$dp(t) = M(p)\,dt + \sqrt{V(p)}\,dW(t),$$

with $W(t)$ a standard Wiener process and the boundary points $\{0, 1\}$ accessible in finite time. The diffusion coefficient $\sqrt{V(p)} = \sqrt{p(1 - p)}$ is not Lipschitz at the boundaries, only Hölder continuous with exponent $1/2$; existence and uniqueness of solutions follow from the Yamada-Watanabe criterion (Ikeda-Watanabe 1981 Ch. IV), and the singular boundaries are absorbing under the relevant boundary classification (Feller 1952 Trans. AMS 77). The connection to math §02 is direct: the Wright-Fisher diffusion is one of the canonical one-dimensional Itô processes whose generator is degenerate at the boundary and whose study motivated much of mid-20th-century stochastic-process theory.
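A rough way to get a feel for the stochastic differential equation is to discretise it. The sketch below uses the Euler-Maruyama scheme, a generic method chosen here for illustration rather than anything the source prescribes. It assumes selection only, time in units of $2N$ generations, drift $\sigma p(1 - p)$, and noise amplitude $\sqrt{p(1 - p)}$ (the standard Itô form for variance coefficient $V$). Overshoots past a boundary are clipped and treated as absorption, which introduces a small discretisation bias. The function name, seed, step size and path count are assumptions.

```python
import random
from math import sqrt

def simulate_path(p0, sigma, dt, t_max, rng):
    """Euler-Maruyama for dp = sigma p(1-p) dt + sqrt(p(1-p)) dW, stopped at 0 or 1."""
    p, t = p0, 0.0
    while 0.0 < p < 1.0 and t < t_max:
        noise = rng.gauss(0.0, 1.0) * sqrt(dt)
        p += sigma * p * (1 - p) * dt + sqrt(p * (1 - p)) * noise
        p = min(1.0, max(0.0, p))   # overshoot past a boundary counts as absorption
        t += dt
    return p

rng = random.Random(7)
finals = [simulate_path(0.3, 0.0, 0.01, 30.0, rng) for _ in range(500)]
fixed = sum(1 for p in finals if p == 1.0) / len(finals)
print(fixed)   # neutral case: fraction of paths absorbed at 1
```

In the neutral case the fraction of paths absorbed at $1$ should land near the starting frequency, up to Monte Carlo noise of a few percent with 500 paths; setting `sigma` above zero shifts it upward.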

The neutral Wright-Fisher and the $u (p) = p$ formula [Master]

Setting $s = 0$ collapses the drift coefficient $M (p)$ to zero, leaving the purely diffusive Wright-Fisher process with generator $L f = \frac{1}{2} p (1 - p) f^{''} (p)$ . The backward equation $L u = 0$ becomes $u^{''} (p) = 0$ on $(0, 1)$ , with boundary conditions $u (0) = 0, u (1) = 1$ . The solution is the linear function $u (p) = p$ .

This is Kimura's neutral-fixation formula. It says: under neutrality, the probability that an allele eventually fixes equals its current frequency in the population. A new mutation at frequency $1/ (2 N)$ has fixation probability $1/ (2 N)$ . A polymorphism at frequency 0.5 has fixation probability 0.5 — it is equally likely to win or lose. A polymorphism near fixation at frequency 0.99 has fixation probability 0.99 — it almost certainly wins, not because of any push but because the random walk is much closer to the upper boundary than the lower.

The proof has two illuminating routes. The diffusion route (above) reduces to a one-line ODE. The martingale route, sketched in Exercise 6, observes that $\{X_{n}\}$ is a bounded martingale under neutrality, and the optional-stopping theorem gives $E[X_{T}] = X_{0}$, where $T$ is the absorption time. Since $X_{T} \in \{0, 2N\}$, we have $E[X_{T}] = 2N \cdot u$, which immediately yields $u = X_{0}/(2N) = p$. The martingale argument is elementary; the diffusion argument is computationally heavier but generalises to selection, mutation, migration, and arbitrary one-dimensional state-dependent drift.
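The neutral result can also be checked directly on the discrete chain. The sketch below builds the binomial transition matrix for a small population and iterates it until the fixation probabilities converge; the population size `two_n = 10` and the function names are assumptions made for illustration.

```python
from math import comb

def wf_transition_matrix(two_n):
    """Neutral Wright-Fisher matrix: P[i][j] is the Binomial(2N, i/2N) pmf at j."""
    rows = []
    for i in range(two_n + 1):
        p = i / two_n
        rows.append([comb(two_n, j) * p**j * (1 - p)**(two_n - j)
                     for j in range(two_n + 1)])
    return rows

def fixation_probabilities(two_n, n_iter=500):
    """Iterate u <- P u starting from the indicator of the fixed state.

    After n steps u[i] equals P(X_n = 2N | X_0 = i), which increases
    to the fixation probability as n grows.
    """
    P = wf_transition_matrix(two_n)
    u = [0.0] * two_n + [1.0]
    for _ in range(n_iter):
        u = [sum(p_ij * u_j for p_ij, u_j in zip(row, u)) for row in P]
    return u

two_n = 10
for i, u_i in enumerate(fixation_probabilities(two_n)):
    print(i, round(u_i, 6), i / two_n)  # computed value next to i / 2N
```

Convergence is geometric at rate $1 - 1/(2N)$, so a few hundred iterations are ample at this size; larger populations would need more iterations or a direct linear solve.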

Why does the formula matter biologically? Three points.

First, the input-output structure of neutral evolution. A population of $2 N$ gene copies introduces new mutations at total rate $2 N μ$ per generation, where $μ$ is the per-copy per-generation mutation rate. Each new mutation has fixation probability $1/ (2 N)$ . The substitution rate — new alleles fixing per generation — is therefore $2 N μ \times 1/ (2 N) = μ$ , independent of population size. This is the molecular clock, Kimura's most consequential single result and the foundation of phylogenetic dating. The rate at which neutral substitutions accumulate along a lineage equals the per-lineage mutation rate, and time can be inferred from sequence divergence by inverting this relation. The cancellation between population-scale input and per-copy fixation probability is the cleanest expression of why population size, which dominates so many population-genetic quantities, drops out of the substitution dynamics.
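The cancellation is easy to see numerically. The sketch below multiplies mutational input by the neutral fixation probability for two very different population sizes, and then inverts the clock under the added assumption (not stated in the text) that pairwise divergence accumulates as $d = 2μt$, that is, two lineages each substituting at rate $μ$, with multiple hits and ancestral polymorphism ignored. All names and numerical values are hypothetical.

```python
import math

def neutral_substitution_rate(n_diploid, mu):
    """Substitutions per generation: (new mutations) x (neutral fixation probability)."""
    new_mutations = 2 * n_diploid * mu            # 2N gene copies, each mutating at rate mu
    fixation_probability = 1 / (2 * n_diploid)    # u(p) = p at p = 1/(2N)
    return new_mutations * fixation_probability

def divergence_time(d, mu):
    """Generations since two lineages split, ASSUMING d = 2 * mu * t."""
    return d / (2 * mu)

mu = 1e-8  # hypothetical per-copy, per-generation mutation rate
for n in (1_000, 1_000_000):
    print(n, neutral_substitution_rate(n, mu))  # same rate for both sizes, up to rounding

print(divergence_time(0.02, mu))  # generations implied by 2% divergence
```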

Second, the time to fixation versus time to loss. Even though new mutations fix with probability $1/(2N)$, the time to fixation (conditional on fixing) is on the order of $4N$ generations, the diffusion result of Kimura-Ohta 1969 derived by integrating the backward equation with absorption at $p = 1$ as the terminal condition. The time to loss (conditional on losing) is much shorter, on the order of $2 \ln(2N)$ generations: most neutral mutations die out within a handful of generations of arising. The disparity between fixation time ($4N$) and loss time ($2 \ln(2N)$) means that at any moment, the polymorphism in a population is dominated by alleles on their way to loss, with a small tail of alleles slowly transiting toward fixation. The expected number of polymorphic sites in a sample of $n$ chromosomes is $θ \cdot \sum_{k=1}^{n-1} 1/k$ where $θ = 4Nμ$; this is Watterson's 1975 result and the foundation of statistical inference from sequence data.
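The two time scales can be compared numerically. The sketch assumes the standard neutral diffusion closed forms for the conditional mean times with $N_e = N$, namely $\bar{t}_{1}(p) = -4N(1-p)\ln(1-p)/p$ for fixation and $\bar{t}_{0}(p) = -4N\,p\ln(p)/(1-p)$ for loss; these expressions are quoted as assumptions here, not derived in the text, and the function names are illustrative.

```python
import math

def mean_time_to_fixation(p, n):
    """Neutral mean time (generations) to fixation, conditional on fixing."""
    return -4 * n * (1 - p) * math.log1p(-p) / p

def mean_time_to_loss(p, n):
    """Neutral mean time (generations) to loss, conditional on being lost."""
    return -4 * n * p * math.log(p) / (1 - p)

n = 1000
p_new = 1 / (2 * n)  # a single new mutant copy
print(mean_time_to_fixation(p_new, n), "vs 4N =", 4 * n)
print(mean_time_to_loss(p_new, n), "vs 2 ln(2N) =", 2 * math.log(2 * n))
```

For a new mutation both expressions approach the quoted orders of magnitude as $N$ grows, which is the sense in which $4N$ and $2\ln(2N)$ are the relevant scales.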

Third, the failure mode of the strict neutral theory. Empirically, real populations often show less polymorphism than $θ = 4Nμ$ predicts under strict neutrality combined with realistic census-scale $N$ and $μ$, and the site-frequency spectrum of polymorphisms is skewed toward rare alleles relative to the neutral prediction. The standard explanations, namely background selection (Charlesworth-Morgan-Charlesworth 1993), recurrent selective sweeps (Maynard Smith-Haigh 1974), and demographic history, all act by perturbing the neutral diffusion in specific, parametrically identifiable ways. The neutral $u(p) = p$ formula is the null hypothesis against which every modern molecular-evolution test (Tajima's $D$, Fu-Li tests, McDonald-Kreitman) is constructed. Empirical neutralism, in the sense of Kimura's strong claim that most molecular variation is neutral, remains contested; but neutralism as the formal baseline of the field is uncontroversial.

Selection added — the famous formula and weak-selection asymptotics [Master]

The introduction of selection turns the deterministic Wright-Fisher mean into $E [Δ p ∣ p] \approx s p (1 - p)$ per generation (for small $s$ and additive selection). The diffusion-limit generator, written with the selection coefficient in rescaled (diffusion-time) units, is

L f = \frac{1}{2} p (1 - p) f^{''} (p) + s p (1 - p) f^{'} (p),

and the fixation probability satisfying $L u = 0$ , $u (0) = 0$ , $u (1) = 1$ is the Kimura formula

u (p) = \frac{1 - e ^{- 2 N_{e} s p}}{1 - e ^{- 2 N_{e} s}},

where $N_{e}$ is the effective population size (the size of the idealised Wright-Fisher population that would experience the same amount of drift as the actual population). The scaling parameter is $σ = 2 N_{e} s$, the effective selection strength. Three regimes.

Strong selection, $σ ≫ 1$. The denominator of $u(p)$ is then close to $1$. For a new beneficial mutation at $p = 1/(2N)$ with $N_{e} = N$ we have $σ p = s$, so the formula as written gives $u \approx 1 - e^{-s} \approx s$ for small $s$. In Kimura's diploid parametrisation, where $s$ is the advantage of the heterozygote and the exponent is $4 N_{e} s p$, the same calculation gives $u \approx 2s$. The precise constant therefore depends on how $s$ is defined, on the dominance assumption, and on the underlying chain (Wright-Fisher and Moran differ by a factor of $2$ in the asymptote). Haldane's 1927 result $u \approx 2s$ for a new dominant beneficial mutation in a haploid (or codominant diploid) model is the canonical strong-selection limit. The fixation probability rises linearly with $s$, not exponentially; even a strongly beneficial new mutation has only a modest fixation probability, because most copies are lost to drift in the first few generations, before selection can amplify them.

Effectively neutral, $∣ σ ∣ ≪ 1$. Expanding the numerator and denominator of $u(p)$ in powers of $σ$ gives $u(p) \approx p + O(σ)$; the leading-order behaviour is the neutral $u = p$ formula. The correction is $u(p) = p + \frac{σ}{2} p(1-p) + O(σ^{2})$, a small selection-dependent bow above (for $σ > 0$) or below (for $σ < 0$) the diagonal. Alleles in this regime (Ohta's nearly-neutral mutations) accumulate at a rate intermediate between the strict-neutral rate $μ$ and the strong-selection rate $2Nμ \cdot u$, with $u$ of order $s$. The nearly-neutral theory predicts a slight elevation of the substitution rate of weakly beneficial mutations and a slight suppression of the rate of weakly deleterious mutations relative to the strict-neutral baseline. Ohta 1973 Nature 246 is the foundational reference; the resulting predictions for codon usage and for synonymous-vs-nonsynonymous substitution ratios are central to modern molecular-evolution inference.

Strong negative selection, $σ ≪ -1$. Writing the formula as $u(p) = (e^{∣σ∣ p} - 1)/(e^{∣σ∣} - 1)$, it evaluates to $u(p) \approx ∣σ∣ \, p \, e^{-∣σ∣}$ for $p \cdot ∣σ∣ ≪ 1$ and to $u(p) \approx e^{-∣σ∣(1-p)} \approx 0$ for finite $p$. A new strongly deleterious mutation has essentially zero fixation probability: the input-output product is $2Nμ \cdot u(1/(2N)) \approx μ \, ∣σ∣ \, e^{-∣σ∣}$, exponentially suppressed below the neutral rate $μ$. The substitution rate of strongly deleterious mutations is therefore negligible. The equilibrium frequency under mutation-selection balance, which in turn sets the genetic load, is fixed by the balance between recurrent input $μ$ and selective elimination $s$: $\hat{q} \approx \sqrt{μ/s}$ for a fully recessive deleterious allele (selection then acts only on homozygotes, so the allele reaches a higher frequency) and $\hat{q} \approx μ/(h s)$ for a partially dominant one with dominance coefficient $h$.

The crossover near $∣ σ ∣ \sim 1$, equivalently $∣ s ∣ \sim 1/(2 N_{e})$, is the nearly-neutral threshold, conceptually the most consequential single quantity in molecular evolution. Mutations of identical absolute selection coefficient behave neutrally in small populations and selectively in large ones. The effective population size of Drosophila is on the order of $10^{6}$, of humans on the order of $10^{4}$; mutations of selection coefficient $10^{-5}$ are nearly neutral in humans ($2 N_{e} s \approx 0.2$, drift wins) but effectively selected in Drosophila ($2 N_{e} s \approx 20$, selection wins). This is the foundational reason that synonymous codon usage is biased in Drosophila and essentially random in humans: drift overwhelms codon-bias selection in the smaller-$N_{e}$ species. Cross-species comparisons of $d_{N}/d_{S}$ ratios encode exactly this difference, mapping the ratio of nonsynonymous to synonymous substitution rates to the underlying ratio of selectively-driven to neutrally-driven fixation.
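The three regimes can be explored with a direct evaluation of the formula as stated in this unit, $u(p) = (1 - e^{-σ p})/(1 - e^{-σ})$ with $σ = 2N_{e}s$. The sketch below is one numerically careful way to do that; the function name, the use of `expm1`, and the rearrangement for negative $σ$ are implementation choices, and the two population sizes are the order-of-magnitude values quoted above.

```python
import math

def kimura_fixation(p, n_e, s):
    """u(p) = (1 - exp(-sigma p)) / (1 - exp(-sigma)) with sigma = 2 N_e s."""
    sigma = 2.0 * n_e * s
    if abs(sigma) < 1e-12:
        return p  # neutral limit u(p) = p
    if sigma > 0:
        return math.expm1(-sigma * p) / math.expm1(-sigma)
    # sigma < 0: algebraically identical form that avoids overflow of exp(|sigma|)
    return math.exp(sigma * (1.0 - p)) * math.expm1(sigma * p) / math.expm1(sigma)

s = 1e-5
for label, n_e in (("human-like", 1e4), ("Drosophila-like", 1e6)):
    p_new = 1.0 / (2.0 * n_e)  # one new copy, taking N = N_e
    ratio = kimura_fixation(p_new, n_e, s) / p_new
    print(label, "sigma =", 2 * n_e * s, "u / u_neutral =", round(ratio, 3))
```

The ratio `u / u_neutral` stays close to 1 when $∣σ∣ ≪ 1$ and grows roughly like $σ$ once selection dominates, which is the threshold behaviour described above.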

The closed-form selection formula has a beautiful structural interpretation. Define the integrating factor $ψ(p) := e^{-2 N_{e} s p}$. Then $u(p) = (ψ(0) - ψ(p))/(ψ(0) - ψ(1))$: the fixation probability is the change in $ψ$ between the lower boundary and the current frequency, divided by its change between the two boundaries. The same structure governs first-passage problems for arbitrary one-dimensional diffusions: the integrating factor is $ψ(p) = \exp(-\int^{p} 2M/V)$, the fixation probability is $u(p) = \int_{0}^{p} ψ / \int_{0}^{1} ψ$, and sojourn times and stationary distributions are likewise expressible through integrals involving $ψ$. Here $ψ$ is an exponential, so its integrals reduce to the differences of $ψ$ above. The Wright-Fisher diffusion is the prototypical example in mathematical biology of this general one-dimensional theory, and the closed-form solvability stems from the fact that the variance coefficient $V(p) = p(1-p)$ and the drift coefficient $M(p) \propto s p(1-p)$ share the common factor $p(1-p)$, which cancels in the integrating-factor formula.
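The general recipe can be sketched numerically: integrate $2M/V$ to get $ψ$, integrate $ψ$ again, and normalise. The code below assumes the caller supplies the ratio $2M(x)/V(x)$ directly (which sidesteps the $0/0$ at the boundaries when both coefficients vanish there) and uses a plain trapezoid rule; the function name, the grid size, and the test value $σ = 5$ are illustrative assumptions.

```python
import math

def fixation_probability(p, two_m_over_v, n_grid=2000):
    """u(p) = int_0^p psi / int_0^1 psi, with psi(x) = exp(-int_0^x 2M/V).

    `two_m_over_v` maps x to 2 M(x) / V(x). Integrals use the trapezoid
    rule on a uniform grid with n_grid intervals.
    """
    h = 1.0 / n_grid
    ratio = [two_m_over_v(i * h) for i in range(n_grid + 1)]

    # Cumulative integral of 2M/V, then psi on the grid.
    exponent = [0.0]
    for i in range(1, n_grid + 1):
        exponent.append(exponent[-1] + 0.5 * h * (ratio[i - 1] + ratio[i]))
    psi = [math.exp(-e) for e in exponent]

    # Cumulative integral of psi.
    scale = [0.0]
    for i in range(1, n_grid + 1):
        scale.append(scale[-1] + 0.5 * h * (psi[i - 1] + psi[i]))

    # Linear interpolation of the cumulative integral at p.
    k = min(int(p * n_grid), n_grid - 1)
    frac = p * n_grid - k
    s_p = scale[k] + frac * (scale[k + 1] - scale[k])
    return s_p / scale[-1]

sigma = 5.0  # constant ratio 2M/V = sigma reproduces the closed form
numeric = fixation_probability(0.1, lambda x: sigma)
closed = math.expm1(-sigma * 0.1) / math.expm1(-sigma)
print(numeric, closed)
```

Passing a non-constant ratio, for instance one that encodes dominance or frequency dependence, gives fixation probabilities for which no closed form is available.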

Coalescent dual and the genealogical view [Master]

A different, equally illuminating route to Wright-Fisher results runs in reverse time. Instead of asking how present-day allele frequencies evolve forward into the future, ask: where did the gene copies in a present-day sample come from? Trace each backward, generation by generation, until lineages meet at common ancestors. The resulting random tree on $n$ sampled lineages is Kingman's coalescent (Kingman 1982 Stoch. Proc. Appl. 13), the time-reversal of the Wright-Fisher diffusion in the large- $N$ limit.

The construction. In a Wright-Fisher population of $2N$ gene copies, two distinct lineages at the present coalesce in the previous generation with probability $1/(2N)$, the probability that the two randomly chosen offspring happened to share a parent. With $k$ distinct lineages, each of the $\binom{k}{2}$ pairs coalesces with probability $1/(2N)$, so to leading order in $1/N$ the per-generation total coalescence rate is $\binom{k}{2}/(2N)$. Rescaling time in units of $2N$ generations, the coalescence rate is $\binom{k}{2}$ per unit rescaled time. The continuous-time limit (letting $N \to \infty$ with the rescaled time held fixed) is a pure-jump Markov process on the partitions of $\{1, \dots, n\}$ in which each pair of lineages coalesces at exponential rate $1$. Each merge reduces the lineage count by one; the process terminates when a single lineage remains.

The expected time between $k$ and $k - 1$ lineages is $1/\binom{k}{2} = 2/(k(k-1))$ rescaled time units. The expected total height of the coalescent tree, that is, the expected time to the most recent common ancestor (TMRCA) of a sample of $n$, is

E [T_{MRCA}] = \sum_{k = 2}^{n} \frac{2}{k (k - 1)} = 2 (1 - \frac{1}{n}),

approaching 2 rescaled time units (i.e., $4 N$ generations) as $n \to \infty$ . The expected total branch length in the tree — the sum of all branches in coalescent units — is $E [L_{n}] = 2 \sum_{k = 1}^{n - 1} 1/ k = 2 a_{n}$ , the harmonic-number coefficient.
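Both expectations can be checked exactly with rational arithmetic and, more loosely, by simulating the exponential waiting times. The sketch works in coalescent units of $2N$ generations; the function names, the sample size, and the number of simulated trees are arbitrary choices.

```python
from fractions import Fraction
import random

def expected_tmrca(n):
    """E[T_MRCA]: sum over k = 2..n of 1 / C(k, 2) = 2 / (k (k - 1))."""
    return sum(Fraction(2, k * (k - 1)) for k in range(2, n + 1))

def expected_total_length(n):
    """E[L_n]: k lineages each persist for expected time 2 / (k (k - 1))."""
    return sum(k * Fraction(2, k * (k - 1)) for k in range(2, n + 1))

def simulate_tmrca(n, rng):
    """One draw of T_MRCA: exponential waiting times with rate C(k, 2)."""
    t = 0.0
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2)
    return t

n = 10
print(expected_tmrca(n), 2 * (1 - Fraction(1, n)))  # both 9/5
print(expected_total_length(n))                     # equals 2 * a_n

rng = random.Random(0)
draws = [simulate_tmrca(n, rng) for _ in range(20000)]
print(sum(draws) / len(draws))  # Monte Carlo estimate of E[T_MRCA]
```

Note that the final coalescence, from two lineages to one, contributes an expected 1 unit on its own, more than half of the total tree height for any sample size.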

The relevance to population genetics is direct. Under the infinite-sites neutral model, in which every mutation falls at a previously-unmutated site, the expected number of segregating sites in a sample of $n$ chromosomes is $θ \cdot a_{n}$, where $θ = 4 N_{e} μ$ is the population mutation rate. The reasoning: mutations arise on each branch of the coalescent tree at rate $μ$ per generation, so along total branch length $L_{n}$ in coalescent units (which corresponds to $2N L_{n}$ actual generations) the expected number of mutations is $2Nμ L_{n} = θ L_{n}/2$. Substituting $E[L_{n}] = 2 a_{n}$ gives $E[S] = θ a_{n}$, and hence Watterson's estimator $\hat{θ}_{W} = S/a_{n}$ for the population mutation rate from the observed number of segregating sites $S$.
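As a small worked illustration, the estimator takes only a few lines. The sample values are hypothetical and the function names are not from any particular library.

```python
def harmonic_coefficient(n):
    """a_n = sum_{k=1}^{n-1} 1/k for a sample of n chromosomes."""
    return sum(1.0 / k for k in range(1, n))

def watterson_theta(segregating_sites, n):
    """Watterson's estimator theta_W = S / a_n."""
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    return segregating_sites / harmonic_coefficient(n)

# Hypothetical sample: 10 chromosomes showing 25 segregating sites.
print(watterson_theta(25, 10))
```

Because $a_{n}$ grows only logarithmically with $n$, doubling the sample size adds comparatively few new segregating sites, which is why the estimator divides by $a_{n}$ rather than by $n$.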

The coalescent is the load-bearing tool of modern phylogenetics and population-genetic inference for two reasons. First, it reduces a forward-time stochastic dynamics on $2 N$ gene copies to a backward-time random tree on at most $n$ lineages — a dramatic dimensional reduction that makes simulation and likelihood computation tractable. Second, it is modular: changes to the demographic history (variable population size, population subdivision, ancient admixture, selection) can be encoded as deformations of the coalescent — branching-rate functions of time, structured coalescents with migration, ancestral recombination graphs — without changing the underlying tree-on-lineages formalism. The Wright-Fisher diffusion and the Kingman coalescent are the dual descriptions of the same neutral evolutionary process: the diffusion in forward time at the population level, the coalescent in backward time at the lineage level.

The connection back to this unit's main theorem: the closed-form fixation probability $u (p) = (1 - e^{- 2 N s p}) / (1 - e^{- 2 N s})$ has a coalescent interpretation as the probability that a randomly chosen present-day chromosome carries a lineage that traces back to an ancestor of allele $A$ rather than allele $a$ . Under neutrality this probability equals the founding frequency $p$ ; under selection it is biased by the integrating factor $e^{- 2 N s p}$ that distinguishes selectively-favoured from selectively-disfavoured lineages. The coalescent and the diffusion are the same theory in two languages — one looks forward in time at frequencies, the other looks backward in time at lineages — and population genetics in the post-Kingman era is increasingly written in the coalescent language because it scales better to the data scales of modern genomics.

Connections [Master]

Hardy-Weinberg equilibrium 19.02.01 pending. Hardy-Weinberg is the deterministic null model — the leading-order theory at $N = \infty$ with no selection, mutation, migration, or non-random mating. The Wright-Fisher chain is the canonical finite-population correction: relaxing the infinite-population assumption replaces the deterministic frequency-equilibrium with the stochastic process described here. This unit is the immediate generalisation of 19.02.01 pending in the stochastic direction, and every quantitative test of Hardy-Weinberg implicitly compares observed data against the Wright-Fisher null with finite $N_{e}$ .
Natural selection 19.03.01 pending. The selection equation $Δ p = s p (1 - p) / \bar{w}$ derived in the natural-selection unit is the deterministic limit of the Wright-Fisher drift coefficient; Kimura's $u (p)$ formula derived here is the finite-population stochastic correction. The strong-selection asymptote $u \approx 2 s$ recovers Haldane's 1927 fixation rule. The interplay between selection and drift, captured by the scaling parameter $2 N_{e} s$, is the substantive connection: when $∣ 2 N_{e} s ∣ ≫ 1$ selection dominates, when $∣ 2 N_{e} s ∣ ≪ 1$ drift dominates, and the threshold is the foundational quantity of the nearly-neutral theory.
Genetic drift and the neutral theory 19.04.01. The genetic-drift unit takes the Wright-Fisher chain and its diffusion approximation developed here as the formal substrate of the neutral theory. The molecular-clock prediction (substitution rate $= μ$ independent of $N$ ), the heterozygosity-decay formula $H_{n} = (1 - 1/ (2 N))^{n} H_{0}$ , and the site-frequency-spectrum predictions all flow from the $u (p) = p$ neutral result and the diffusion-limit machinery. This unit provides the engine; the drift unit develops the empirical applications.
Quantitative genetics and the breeder's equation 19.05.01 pending. Wright-Fisher drift at a single locus generalises to multi-locus quantitative-trait evolution under the infinitesimal model: a quantitative trait controlled by many loci each of small effect responds to selection at rate $h^{2} S$ (Robertson-Price, Lande) while drifting at rate $V_{A} / (2 N_{e})$ per generation. The single-locus Wright-Fisher diffusion derived here is the per-locus building block; the quantitative-genetic theory aggregates over many loci and replaces the binomial sampling here with a Gaussian sampling at the trait level.
Phylogenetics and the molecular clock 19.07.01. The Kingman coalescent developed in the Master sub-section above is the substrate of modern phylogenetic inference. Bayesian coalescent-based methods (BEAST, MrBayes, PhyloNet) parametrise demographic history and selection as deformations of the neutral Kingman coalescent, and the neutral substitution rate $μ$, which follows from the $u (p) = p$ result, is the calibrating quantity of the molecular clock that turns sequence divergence into time. This unit is the population-genetic side of that bridge; the phylogenetics unit builds the inference framework on it.
Stochastic differential equations and Brownian motion 02.13.05 pending. The Wright-Fisher diffusion is one of the canonical one-dimensional Itô processes; its variance coefficient $p (1 - p)$ is degenerate at the absorbing boundaries and its boundary classification is a load-bearing example in the Feller theory of one-dimensional diffusions. The math-side connection — to the rectification theorem for vector fields and the broader theory of SDEs on bounded domains — is the natural cross-direction hook into math §02. The Wright-Fisher diffusion is to mathematical population genetics what Brownian motion on the line is to probability theory: the prototype example whose closed-form solvability anchors the general theory.
Sexual selection 19.03.02. Sexual selection imposes selection coefficients with variance components — male reproductive success is more dispersed than female under polygyny, runaway Fisherian dynamics generate frequency-dependent fitness landscapes, and good-genes mechanisms couple ornament alleles to viability alleles — and the Wright-Fisher machinery developed here is what converts that mating-success variance into allele-frequency change at finite $N_{e}$ . The sexual-selection unit at 19.03.02 uses the diffusion approximation to set fixation probabilities for ornament alleles and to bound the rate at which condition-dependent display traits can spread, and the effective-population-size deflation under high reproductive skew (a classic Wright-Fisher correction) is the quantitative entry point for analysing Y-chromosome and mtDNA bottlenecks under male-biased variance.
Mutation and repair 17.06.01 pending. The per-generation mutation rate $μ$ that appears as a parameter in the Wright-Fisher model is the output of the molecular mutation-rate machinery: the balance between endogenous damage, exogenous damage, and repair fidelity. The mutation-selection balance derived in the mutation and repair unit is the deterministic limit of the stochastic dynamics developed here, and the molecular clock for neutral evolution is set by the rate at which new alleles are introduced.

Historical & philosophical context [Master]

Sewall Wright 1931 Genetics 16 ^[Wright1931] introduced the discrete Wright-Fisher process as a model of allele-frequency dynamics in finite Mendelian populations. The paper, a 63-page monograph in Genetics (pp. 97-159), was Wright's foundational statement of the role of chance in evolution and the source of the term random genetic drift. Wright derived the eigenvalue $1 - 1/ (2 N)$ governing heterozygosity decay, introduced his $F$ -statistics for population structure, and proposed the shifting balance theory of evolution under combined drift and selection on a rugged adaptive landscape. R. A. Fisher's 1930 The Genetical Theory of Natural Selection ^[Fisher1930] is the parallel monograph from the selection side; Fisher's treatment used continuous-time deterministic differential equations and effectively anticipated the diffusion limit, though without Wright's explicit handling of finite-population stochasticity. The Wright-Fisher chain is named for both: Fisher's 1922 Proc. Roy. Soc. Edinburgh 42 paper On the dominance ratio contained an early version of the binomial-sampling process.

The diffusion limit was made precise by Motoo Kimura in a 1955 PNAS paper ^[Kimura1955], Solution of a process of random genetic drift with a continuous model — Kimura's first major paper, written while a graduate student at the University of Wisconsin. The paper solved the no-selection Fokker-Planck equation by separation of variables in terms of Gegenbauer polynomials, derived the heterozygosity-decay rate, and obtained the no-selection fixation-probability density. The selection extension and the closed-form $u (p) = (1 - e^{- 2 N s p}) / (1 - e^{- 2 N s})$ formula came in Kimura 1962 Genetics 47 ^[Kimura1962], a paper whose three-page derivation is one of the most cited results in evolutionary biology.

Kimura's 1968 Nature paper Evolutionary rate at the molecular level ^[Kimura1968] used the Wright-Fisher diffusion to propose the neutral theory of molecular evolution — the empirical claim that most molecular polymorphism within species and most molecular divergence between species is due to drift on neutral mutations rather than selection on advantageous ones. The 1968 paper triggered the longest-running debate in evolutionary biology (the neutralist-selectionist controversy), and Kimura's 1983 monograph The Neutral Theory of Molecular Evolution remains the canonical statement. Tomoko Ohta 1973 Nature 246 ^[Ohta1973] refined the strict-neutral theory to the nearly-neutral theory, recognising that mutations with $∣ s ∣ \sim 1/ (2 N_{e})$ occupy a regime where neither strict drift nor strict selection captures the dynamics and additional structure is required.

The coalescent dual was developed by John Kingman 1982 ^[Kingman1982a] in twin papers The coalescent (Stoch. Proc. Appl. 13) and On the genealogy of large populations (J. Appl. Prob. 19A). Kingman showed that the genealogy of a sample of $n$ lineages from a Wright-Fisher population, traced backward in time, converges in the large- $N$ limit to a specific random tree process, the Kingman coalescent. The forward-time Wright-Fisher diffusion and the backward-time coalescent are dual descriptions of the same neutral process, related by Watterson's 1975 Theor. Pop. Biol. 7 estimator that bridges sequence-level polymorphism to coalescent branch lengths. Patrick A. P. Moran 1958 Proc. Camb. Phil. Soc. 54 ^[Moran1958] introduced the alternative continuous-time Moran model, replacing one individual per time step rather than the entire generation; the Moran model has the same diffusion limit as Wright-Fisher up to a factor-of-2 time rescaling, illustrating the universality of the diffusion description.

Philosophically the Wright-Fisher diffusion is the cleanest case in evolutionary biology where a microscopic mechanism (binomial offspring sampling) produces a continuum limit (one-dimensional diffusion on $[0, 1]$ ) with universal structure — independent of microscopic details, sensitive only to the variance and drift coefficients at the population scale. This is the same continuum-limit logic that produces the heat equation from a random walk and the Black-Scholes equation from a binomial option-pricing tree, applied here to gene-frequency dynamics. The interpretive content — that evolutionary outcomes depend on the product $N s$ rather than on $N$ and $s$ separately — has been a recurring touchstone in the philosophy-of-biology literature on chance, contingency, and the unit of selection (Sober 1984; Beatty 1987; Walsh 2007). The neutral theory's claim that drift, rather than selection, is the dominant force shaping molecular variation makes it a rare case where a quantitative population-genetic theory has direct philosophical bearing on the structure of evolutionary explanation.

Bibliography [Master]

@article{Wright1931,
  author    = {Wright, S.},
  title     = {Evolution in {M}endelian Populations},
  journal   = {Genetics},
  year      = {1931},
  volume    = {16},
  pages     = {97--159},
}

@book{Fisher1930,
  author    = {Fisher, R. A.},
  title     = {The Genetical Theory of Natural Selection},
  publisher = {Clarendon Press},
  year      = {1930},
  address   = {Oxford},
}

@article{Kimura1955,
  author    = {Kimura, M.},
  title     = {Solution of a Process of Random Genetic Drift with a Continuous Model},
  journal   = {Proceedings of the National Academy of Sciences USA},
  year      = {1955},
  volume    = {41},
  pages     = {144--150},
}

@article{Kimura1962,
  author    = {Kimura, M.},
  title     = {On the Probability of Fixation of Mutant Genes in a Population},
  journal   = {Genetics},
  year      = {1962},
  volume    = {47},
  pages     = {713--719},
}

@article{Kimura1968,
  author    = {Kimura, M.},
  title     = {Evolutionary Rate at the Molecular Level},
  journal   = {Nature},
  year      = {1968},
  volume    = {217},
  pages     = {624--626},
}

@book{Kimura1983,
  author    = {Kimura, M.},
  title     = {The Neutral Theory of Molecular Evolution},
  publisher = {Cambridge University Press},
  year      = {1983},
}

@article{Ohta1973,
  author    = {Ohta, T.},
  title     = {Slightly Deleterious Mutant Substitutions in Evolution},
  journal   = {Nature},
  year      = {1973},
  volume    = {246},
  pages     = {96--98},
}

@article{Kingman1982a,
  author    = {Kingman, J. F. C.},
  title     = {The Coalescent},
  journal   = {Stochastic Processes and their Applications},
  year      = {1982},
  volume    = {13},
  pages     = {235--248},
}

@article{Kingman1982b,
  author    = {Kingman, J. F. C.},
  title     = {On the Genealogy of Large Populations},
  journal   = {Journal of Applied Probability},
  year      = {1982},
  volume    = {19A},
  pages     = {27--43},
}

@article{Moran1958,
  author    = {Moran, P. A. P.},
  title     = {Random Processes in Genetics},
  journal   = {Proceedings of the Cambridge Philosophical Society},
  year      = {1958},
  volume    = {54},
  pages     = {60--71},
}

@article{Watterson1975,
  author    = {Watterson, G. A.},
  title     = {On the Number of Segregating Sites in Genetical Models without Recombination},
  journal   = {Theoretical Population Biology},
  year      = {1975},
  volume    = {7},
  pages     = {256--276},
}

@article{KimuraOhta1969,
  author    = {Kimura, M. and Ohta, T.},
  title     = {The Average Number of Generations until Fixation of a Mutant Gene in a Finite Population},
  journal   = {Genetics},
  year      = {1969},
  volume    = {61},
  pages     = {763--771},
}

@article{Feller1951,
  author    = {Feller, W.},
  title     = {Diffusion Processes in Genetics},
  journal   = {Proc. Second Berkeley Symp. Math. Stat. and Prob.},
  year      = {1951},
  pages     = {227--246},
}

@book{Ewens2004,
  author    = {Ewens, W. J.},
  title     = {Mathematical Population Genetics {I}: Theoretical Introduction},
  publisher = {Springer},
  edition   = {2},
  year      = {2004},
}

@book{CrowKimura1970,
  author    = {Crow, J. F. and Kimura, M.},
  title     = {An Introduction to Population Genetics Theory},
  publisher = {Harper and Row},
  year      = {1970},
}

@book{CharlesworthCharlesworth2010,
  author    = {Charlesworth, B. and Charlesworth, D.},
  title     = {Elements of Evolutionary Genetics},
  publisher = {Roberts and Company},
  year      = {2010},
}

@book{HartlClark2007,
  author    = {Hartl, D. L. and Clark, A. G.},
  title     = {Principles of Population Genetics},
  publisher = {Sinauer Associates},
  edition   = {4},
  year      = {2007},
}

@book{Gillespie2004,
  author    = {Gillespie, J. H.},
  title     = {Population Genetics: A Concise Guide},
  publisher = {Johns Hopkins University Press},
  edition   = {2},
  year      = {2004},
}

@book{Futuyma2017,
  author    = {Futuyma, D. J.},
  title     = {Evolution},
  publisher = {Sinauer Associates},
  edition   = {4},
  year      = {2017},
}

@article{Haldane1927,
  author    = {Haldane, J. B. S.},
  title     = {A Mathematical Theory of Natural and Artificial Selection, Part V: Selection and Mutation},
  journal   = {Proceedings of the Cambridge Philosophical Society},
  year      = {1927},
  volume    = {23},
  pages     = {838--844},
}

Prerequisites

19.01.01 pending

Tier anchors

beginner: Coyne, *Why Evolution Is True* (Viking, 2009), Ch. 5 on the role of chance; Hartl & Clark, *Principles of Population Genetics* 4th ed. (Sinauer, 2007), Ch. 3 introductory sections on drift; Gillespie, *Population Genetics: A Concise Guide* 2nd ed. (Johns Hopkins, 2004), Ch. 2 (the Wright-Fisher process in plain language)
intermediate: Hartl & Clark, *Principles of Population Genetics* 4th ed. Ch. 3 + Ch. 7; Gillespie, *Population Genetics: A Concise Guide* 2nd ed. Ch. 2–3; Futuyma, *Evolution* 4th ed. (Sinauer, 2017), Ch. 10 (genetic drift); Crow & Kimura, *An Introduction to Population Genetics Theory* (Harper & Row, 1970), Ch. 3
master: Ewens, *Mathematical Population Genetics I* 2nd ed. (Springer, 2004), Ch. 3 (the Wright-Fisher Markov chain) + Ch. 4 (the diffusion approximation); Crow & Kimura, *An Introduction to Population Genetics Theory* (Harper & Row, 1970), Ch. 3 + Ch. 8 + Ch. 9; Charlesworth & Charlesworth, *Elements of Evolutionary Genetics* (Roberts, 2010), Ch. 5; primary literature — Wright 1931 *Genetics* 16; Fisher 1930 *Genetical Theory*; Kimura 1955 *PNAS* 41; Kimura 1962 *Genetics* 47; Kingman 1982 *Stoch. Proc. Appl.* 13

References

Wright, S. — *Genetics* 16, 97–159 (1931) · Originator paper, *Evolution in Mendelian populations* — the discrete Wright-Fisher process, drift coefficient, fixation
Fisher, R. A. — *The Genetical Theory of Natural Selection* (Clarendon Press, 1930) · Precursor monograph — gene-frequency change as a continuous process; antecedent of the diffusion limit
Kimura, M. — *Proc. Natl. Acad. Sci. USA* 41, 144–150 (1955) · Originator paper on the diffusion equation for genetic drift, *Solution of a process of random genetic drift with a continuous model*
Kimura, M. — *Genetics* 47, 713–719 (1962) · Originator paper, *On the probability of fixation of mutant genes in a population* — the famous $u(p)$ formula
Kimura, M. — *Nature* 217, 624–626 (1968); *The Neutral Theory of Molecular Evolution* (Cambridge, 1983) · Originator papers on the neutral theory; molecular-clock corollary of the diffusion picture
Ewens, W. J. — *Mathematical Population Genetics I: Theoretical Introduction*, 2nd ed. (Springer, 2004) · Ch. 3 the Wright-Fisher chain; Ch. 4 the diffusion approximation; canonical modern reference
Crow, J. F. & Kimura, M. — *An Introduction to Population Genetics Theory* (Harper & Row, 1970) · Ch. 3 random genetic drift; Ch. 8 stochastic processes in genetics; Ch. 9 diffusion methods
Charlesworth, B. & Charlesworth, D. — *Elements of Evolutionary Genetics* (Roberts and Company, 2010) · Ch. 5 effective population size, drift, and the coalescent
Hartl, D. L. & Clark, A. G. — *Principles of Population Genetics*, 4th ed. (Sinauer Associates, 2007) · Ch. 3 causes of evolution; Ch. 7 the diffusion approximation in applied form
Gillespie, J. H. — *Population Genetics: A Concise Guide*, 2nd ed. (Johns Hopkins, 2004) · Ch. 2 the Wright-Fisher process; Ch. 3 drift; teaching-grade presentation
Kingman, J. F. C. — *Stochastic Processes and their Applications* 13, 235–248 (1982); *Journal of Applied Probability* 19A, 27–43 (1982) · Originator papers on the coalescent process — the time-reversed dual of the Wright-Fisher diffusion
Futuyma, D. J. — *Evolution*, 4th ed. (Sinauer Associates, 2017) · Ch. 10 the role of chance in evolution; conceptual companion to the formal treatment here
Moran, P. A. P. — *Proc. Camb. Phil. Soc.* 54, 60–71 (1958) · The Moran model, a continuous-time alternative to the Wright-Fisher chain with identical diffusion limit

Reviewer

Tyler (pending external evolutionary-biology reviewer per BIOLOGY_PLAN §7 — top recruitment priority; bio-side population-genetics specialist needed for master-tier sign-off, especially on the diffusion-limit theorem and the coalescent dual)

Estimated time

beginner: 16m
intermediate: 40m
master: 75m

