Wright-Fisher model and the diffusion approximation
Anchor (Master): Ewens, *Mathematical Population Genetics I* 2nd ed. (Springer, 2004), Ch. 3 (the Wright-Fisher Markov chain) + Ch. 4 (the diffusion approximation); Crow & Kimura, *An Introduction to Population Genetics Theory* (Harper & Row, 1970), Ch. 3 + Ch. 8 + Ch. 9; Charlesworth & Charlesworth, *Elements of Evolutionary Genetics* (Roberts, 2010), Ch. 5; primary literature — Wright 1931 *Genetics* 16; Fisher 1930 *Genetical Theory*; Kimura 1955 *PNAS* 41; Kimura 1962 *Genetics* 47; Kingman 1982 *Stoch. Proc. Appl.* 13
Intuition [Beginner]
Take a small island population, say 10 squirrels, and look at one gene with two versions, call them $A$ and $a$. Say 5 of the 20 gene copies in the population are the $A$ version and 15 are $a$. If each squirrel in the next generation gets its two gene copies by reaching at random into the parental gene pool, then on average we expect the same 5-to-15 ratio in the next generation. But on any given run, by sheer luck, we might draw 4 or 6 or 7 copies of $A$ instead. The frequency wobbles.
The Wright-Fisher model is the simplest version of this story. Each generation the next pool of gene copies is drawn at random from the current one, with replacement. The expected frequency stays the same, but the actual frequency drifts. Run the experiment long enough and one of two endpoints arrives: either every gene copy in the population is $A$, or every gene copy is $a$. One allele wins, the other is gone. This endpoint is called fixation, and fixation is the long-term fate of every population under drift alone.
A coin-flip analogy makes this concrete. Flip 20 fair coins. The expected number of heads is 10, but you almost never get exactly 10; some flips give 8 heads, some give 12. Now imagine that the proportion of heads from your last batch sets the bias of the coin in the next batch. If you got 12 heads, the coin in the next batch is slightly biased toward heads, so the next batch is likely to give a few more heads still. Run this loop and the proportion drifts further and further from 1/2 — eventually hitting 0 or 1 and staying there. That is the Wright-Fisher process. Sampling noise compounds because each generation's noise determines the next generation's bias.
Two things flow from this picture. First, drift is strongest when the population is small: the variance of the per-generation wobble is proportional to one over the population size. A population of a million squirrels barely drifts at all in a single generation; a population of ten drifts substantially every generation. Second, drift erases variation. A population that started with both $A$ and $a$ ends with only one of them, and the diversity within the population, which biologists call the heterozygosity, decreases over time. The two effects together explain why small isolated populations lose genetic variation quickly: drift is fast in small populations, and it always destroys variation.
Once we add selection back in, so that one of the alleles is slightly better than the other, the picture becomes richer. Selection pushes the frequency one way; drift wobbles it both ways. When selection is strong and the population is large, selection wins and the better allele fixes. When selection is weak or the population is small, drift can override selection and the worse allele can win by luck. The boundary between these two regimes is captured by the product $Ns$, where $N$ is the population size and $s$ is the selection coefficient. This product is the central scaling parameter of population genetics, and the key insight this unit builds toward.
Visual [Beginner]
A picture is worth a thousand simulations here. The classical Wright-Fisher diagram plots allele frequency on the vertical axis from 0 to 1, with generations running left to right, and overlays many independent runs of the process starting from the same initial frequency. Each line wobbles up and down, some hitting 0 (loss), some hitting 1 (fixation), and the spread of lines fanning out over time captures the variance growth.
A complementary visual is the diffusion density: instead of tracking individual sample paths, track the probability distribution of the allele frequency at each time. Starting from a delta function at the initial frequency, the distribution spreads, flattens, and gradually concentrates at the two absorbing boundaries 0 and 1. After many generations, almost all of the probability has been absorbed: the population has either fixed or lost the allele, and the residual probability somewhere in between vanishes.
The picture captures three lessons that the formal theory will sharpen. Sample paths of allele frequency are martingales — they have no systematic upward or downward push under neutrality. The variance of the frequency grows over time, increasing the chance of hitting an absorbing boundary. And the long-run fate is fixation at one of the two endpoints, with probabilities that match the starting frequency.
Worked example [Beginner]
A textbook population of 10 diploid individuals, so 20 gene copies, has an allele $A$ at initial frequency $p_0 = 0.4$. That means 8 copies of $A$ and 12 copies of $a$ in the founding generation. With no selection, no mutation, and no migration, we want to see how the frequency moves over the next few generations.
Step 1. The next generation is built by drawing 20 gene copies at random, with replacement, from the parental pool. Each draw is $A$ with probability 0.4, independently. The number of $A$ copies in the next generation is therefore a binomial random variable with 20 trials and success probability 0.4. Its expected value is $20 \times 0.4 = 8$, the same as the parent generation.
Step 2. The variance of that binomial count is $20 \times 0.4 \times 0.6 = 4.8$, with standard deviation about 2.2. So a typical run produces between 6 and 10 copies of $A$ in the next generation, a frequency between 0.3 and 0.5. Some runs will give 5 copies, some 11, some 8 exactly. The frequency wobbles around the parental value, with the size of the wobble set by the binomial variance.
Step 3. Say generation 1 happens to land on 6 copies of $A$, a frequency of 0.3. Now generation 2 is sampled from a parental pool with $p = 0.3$: the binomial parameters are 20 trials and probability 0.3, expected value 6, variance 4.2. The frequency in generation 2 wobbles around 0.3.
Step 4. Iterate. The frequency follows a random walk whose step sizes depend on where it currently sits: the variance per step is $p(1-p)/(2N)$ for diploid population size $N$, largest at $p = 1/2$ and zero at the boundaries $p = 0$ and $p = 1$. Once the frequency reaches 0 or 1 it cannot leave: the boundaries are absorbing. Run for long enough and one of two endpoints arrives.
What this tells us: even with no selection, allele frequencies are not preserved across generations in a small population. They drift, and the drift accumulates until one allele takes over. The smaller the population, the faster the drift.
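The four steps above can be sketched as a short simulation. This is a minimal illustration rather than a reference implementation: the function names `next_generation` and `run_until_absorbed`, the seed, and the generation cap are assumptions introduced here, and the binomial draw is built from independent uniform draws so that only the standard library is needed.

```python
import random

def next_generation(count, copies, rng):
    """Draw the next count of A: `copies` independent draws with replacement."""
    p = count / copies
    return sum(1 for _ in range(copies) if rng.random() < p)

def run_until_absorbed(count, copies, rng, max_generations=100_000):
    """Repeat the binomial resampling until A is lost (0) or fixed (copies)."""
    trajectory = [count]
    while 0 < count < copies and len(trajectory) <= max_generations:
        count = next_generation(count, copies, rng)
        trajectory.append(count)
    return trajectory

copies, start = 20, 8                  # 10 diploid individuals, p0 = 0.4
p0 = start / copies
mean_count = copies * p0               # Step 1: expected number of A copies
var_count = copies * p0 * (1 - p0)     # Step 2: binomial variance of the count

rng = random.Random(1)                 # fixed seed only to make the run repeatable
trajectory = run_until_absorbed(start, copies, rng)
print("mean", mean_count, "variance", round(var_count, 2))
print("absorbed at", trajectory[-1], "after", len(trajectory) - 1, "generations")
```

Changing the seed changes which boundary is reached and how long it takes; the mean and variance of a single step do not depend on the seed.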
Check your understanding [Beginner]
Formal definition [Intermediate+]
Fix a single autosomal locus with two alleles $A$ and $a$ in a diploid population of constant size $N$ across generations, with $2N$ gene copies per generation. Let $X_t$ denote the number of $A$ alleles at generation $t$, and write $p_t = X_t/(2N)$ for the allele frequency. The Wright-Fisher model specifies that generation $t+1$ is built by drawing $2N$ gene copies independently and with replacement from the parental gene pool. Equivalently, conditional on $X_t = i$, the count $X_{t+1}$ is a binomial random variable with $2N$ trials and success probability $i/(2N)$.
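Definition (Wright-Fisher Markov chain). The Wright-Fisher chain at population size $N$ with no selection, mutation, or migration is the discrete-time Markov chain on the state space $\{0, 1, \ldots, 2N\}$ with transition probabilities
$$P_{ij} = \Pr(X_{t+1} = j \mid X_t = i) = \binom{2N}{j} \left(\frac{i}{2N}\right)^{j} \left(1 - \frac{i}{2N}\right)^{2N-j}$$
for $i, j \in \{0, 1, \ldots, 2N\}$. The states $0$ and $2N$ are absorbing: once $X_t = 0$ the allele is lost and stays so; once $X_t = 2N$ the allele is fixed and stays so. The remaining states $1, \ldots, 2N-1$ are transient.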
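The conditional expectation and variance of the next-generation count are
$$\mathbb{E}[X_{t+1} \mid X_t = i] = i, \qquad \operatorname{Var}(X_{t+1} \mid X_t = i) = 2N \cdot \frac{i}{2N}\left(1 - \frac{i}{2N}\right).$$
Translating to the frequency $p_t = X_t/(2N)$,
$$\mathbb{E}[p_{t+1} \mid p_t] = p_t, \qquad \operatorname{Var}(p_{t+1} \mid p_t) = \frac{p_t(1-p_t)}{2N}.$$
The frequency process $(p_t)$ is a martingale: its conditional expectation given the past is its current value. The variance of a single-generation step is $p_t(1-p_t)/(2N)$, which vanishes at the boundaries $p = 0$ and $p = 1$ and is maximised at the interior point $p = 1/2$.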
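The definition can be checked numerically by building the binomial transition matrix for a small population and reading the conditional mean and variance off one row. The sketch below assumes the worked-example size of 20 gene copies; `transition_matrix` is a name chosen here for illustration.

```python
from math import comb

def transition_matrix(copies):
    """Row i holds P(X_{t+1} = j | X_t = i) under binomial resampling."""
    matrix = []
    for i in range(copies + 1):
        p = i / copies
        matrix.append([comb(copies, j) * p**j * (1 - p)**(copies - j)
                       for j in range(copies + 1)])
    return matrix

copies = 20
P = transition_matrix(copies)

i = 8                                    # current count of A, frequency 0.4
mean_next = sum(j * P[i][j] for j in range(copies + 1))
var_next = sum((j - mean_next) ** 2 * P[i][j] for j in range(copies + 1))
print(round(mean_next, 6), round(var_next, 6))   # martingale mean and binomial variance
print(P[0][0], P[copies][copies])                # absorbing states keep all their mass
```

Each row sums to one, the rows for states 0 and 20 place all their mass on themselves, and the row for state 8 has mean 8 and variance 4.8, matching the formulas above.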
Definition (Wright-Fisher with selection and mutation). Adding selection coefficient $s$ against allele $a$ (so allele $A$ has relative fitness $1$ and allele $a$ has relative fitness $1-s$) and forward/backward mutation rates $\mu$ from $A$ to $a$ and $\nu$ from $a$ to $A$, the per-generation expected post-selection-and-mutation frequency is
$$p^{*} = (1-\mu)\,\tilde{p} + \nu\,(1-\tilde{p}), \qquad \tilde{p} = \frac{p}{p + (1-s)(1-p)},$$
and the Wright-Fisher transition is binomial sampling around $p^{*}$: $X_{t+1}$ is binomial with $2N$ trials and success probability $p^{*}$. The four forces (drift from sampling, selection, forward mutation, backward mutation) are superposed in a single transition kernel.
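One way to make the selection-and-mutation step concrete is the sketch below. It rests on assumptions that are not fixed by the text: selection acts on gene copies (fitness 1 for $A$, $1-s$ for $a$), selection precedes mutation within a generation, and the parameter names `mu` (from $A$ to $a$) and `nu` (from $a$ to $A$) are illustrative.

```python
def expected_next_frequency(p, s, mu, nu):
    """Expected frequency of A after selection against a, then mutation.

    Assumed order of events: selection first, mutation second.
    """
    p_sel = p / (p + (1 - s) * (1 - p))      # A has fitness 1, a has fitness 1 - s
    return (1 - mu) * p_sel + nu * (1 - p_sel)

p = 0.4
print(expected_next_frequency(p, 0.0, 0.0, 0.0))    # no forces: frequency unchanged
print(expected_next_frequency(p, 0.05, 0.0, 0.0))   # selection against a raises p
print(expected_next_frequency(p, 0.0, 1e-3, 0.0))   # mutation away from A lowers p
```

Binomial sampling of $2N$ copies around the returned value then gives the next generation.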
Counterexamples to common slips
- Drift does not have a direction. The Wright-Fisher chain has $\mathbb{E}[p_{t+1} \mid p_t] = p_t$ under neutrality, so drift on average leaves the frequency unchanged. The bias toward fixation or loss arises from absorption at the boundaries, not from any per-step push.
- Drift is not "selection of the random kind." Selection systematically shifts the expected frequency; drift adds zero-mean noise on top of any systematic shift. The two are dimensionally distinct: selection enters as the drift coefficient of the diffusion, mutation as additional drift terms, drift in the random-walk sense as the variance coefficient $p(1-p)/(2N)$.
- The Wright-Fisher chain is not the same as the Moran model. The Moran process (Moran 1958) replaces one individual per time step rather than the entire population — it has a different variance per unit time but the same diffusion limit. Their continuous-time scaling differs by a factor of 2.
- Variance per generation depends on where the frequency sits. The boundary regions, where one allele is nearly fixed, have small per-generation variance because $p(1-p)$ is small. The chain moves slowly near the boundaries and quickly through the middle of the interval.
Key theorem with proof [Intermediate+]
The signature result of Wright-Fisher theory under selection is Kimura's fixation-probability formula, first stated in Kimura 1962 Genetics 47, derived via the diffusion limit and the backward Kolmogorov equation.
Theorem (Kimura, 1962). Let $u(p)$ denote the probability that allele $A$ eventually fixes in a Wright-Fisher population of $2N$ gene copies, starting from allele-frequency $p$, where $A$ has selective advantage $s$ over $a$ (so allele $A$ has fitness $1+s$ relative to $a$ at fitness $1$, with $s$ small). In the diffusion limit $N \to \infty$ with $2Ns$ held fixed, the fixation probability is
$$\boxed{\,u(p) = \frac{1 - e^{-4Nsp}}{1 - e^{-4Ns}}\,}$$
Under neutrality $s \to 0$, the formula reduces to $u(p) = p$.
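Proof. The diffusion limit replaces the discrete Wright-Fisher chain with a continuous-state Markov process on $[0,1]$, whose infinitesimal generator is
$$\mathcal{L} = \frac{1}{2}\, a(x) \frac{d^2}{dx^2} + b(x) \frac{d}{dx}$$
with variance coefficient $a(x) = x(1-x)$ and drift coefficient $b(x) = \sigma x(1-x)$, where $\sigma$ denotes the scaled selection parameter. The fixation probability $u$ satisfies the backward Kolmogorov equation $\mathcal{L}u = 0$ on $(0,1)$ with boundary conditions $u(0) = 0$ (an allele at frequency $0$ never fixes) and $u(1) = 1$ (an allele already fixed stays fixed).
Writing out $\mathcal{L}u = 0$ in coordinates,
$$\frac{1}{2}\, x(1-x)\, u''(x) + \sigma x(1-x)\, u'(x) = 0.$$
Cancel the common factor $x(1-x)$ (which is positive on $(0,1)$):
$$u''(x) + 2\sigma\, u'(x) = 0,$$
an ordinary differential equation for $u$. Let $v = u'$; then $v' = -2\sigma v$, with solution $v(x) = C e^{-2\sigma x}$ for some constant $C$. Integrating,
$$u(x) = D - \frac{C}{2\sigma}\, e^{-2\sigma x}.$$
Apply the boundary conditions. From $u(0) = 0$: $D = C/(2\sigma)$, so
$$u(x) = \frac{C}{2\sigma}\left(1 - e^{-2\sigma x}\right).$$
From $u(1) = 1$: $C/(2\sigma) = 1/(1 - e^{-2\sigma})$, so
$$u(p) = \frac{1 - e^{-2\sigma p}}{1 - e^{-2\sigma}}.$$
In the diffusion scaling $\sigma$ is replaced by $2Ns$ (the parameter held fixed in the limit is $2Ns$), giving the boxed formula $u(p) = (1 - e^{-4Nsp})/(1 - e^{-4Ns})$. Under $s \to 0$, expand both numerator and denominator: $1 - e^{-4Nsp} \approx 4Nsp$ and $1 - e^{-4Ns} \approx 4Ns$, so $u(p) \to p$, the neutral result.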
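The closed form is easy to evaluate numerically. The sketch below assumes the $4Ns$ form of the formula, $u(p) = (1 - e^{-4Nsp})/(1 - e^{-4Ns})$; the function name `kimura_fixation` is introduced here for illustration. It uses `math.expm1` to stay accurate when $4Ns$ is close to zero.

```python
from math import expm1

def kimura_fixation(p, N, s):
    """Diffusion fixation probability, assuming u(p) = (1 - e^{-4Nsp}) / (1 - e^{-4Ns}).

    Returns p when s == 0 (the neutral limit). Very large negative 4Ns
    would overflow expm1, so keep |4Ns| moderate in this sketch.
    """
    if s == 0:
        return p
    return expm1(-4 * N * s * p) / expm1(-4 * N * s)

N = 10_000
print(kimura_fixation(0.3, N, 0.0))              # neutral: equals p
print(kimura_fixation(0.3, N, 1e-9))             # nearly neutral: close to p
print(kimura_fixation(1 / (2 * N), N, 0.01))     # new beneficial mutation: close to 2s
```

The last line illustrates the strong-selection limit for a single new copy, where the value is close to twice the selection coefficient.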
Bridge. The diffusion-limit derivation builds toward 19.02.01 pending Hardy-Weinberg, where the deterministic frequency-equilibrium is the leading-order picture and Wright-Fisher drift is the first-order stochastic correction. The same formula appears again in 19.03.01 pending natural selection, where the strong-selection asymptotics for $2Ns \gg 1$ recovers Haldane's 1927 rule that the probability of fixation of a beneficial new mutation is approximately twice its selection coefficient. The foundational reason is that the backward Kolmogorov equation identifies the fixation probability with a harmonic function for the diffusion generator, and the central insight is that the diffusion-limit ODE is integrable in closed form because the variance and drift coefficients share a common factor $x(1-x)$ that cancels. Putting these together, the formula is the cleanest analytic bridge between the discrete genetics of Wright and the analytic methods of Kimura: the bridge is exactly the diffusion approximation, and it generalises through the entire toolkit of one-dimensional diffusion theory.
How the formula behaves
The formula has three limiting regimes that the master tier will exploit.
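| Regime | Condition | Approximation |
|---|---|---|
| Strong positive selection | $2Ns \gg 1$ and $2Nsp \gg 1$ | $u(p) \approx 1$ |
| New mutation under strong selection | $2Ns \gg 1$ and $p = 1/(2N)$ | $u \approx 2s$ |
| Effectively neutral | $\lvert 2Ns \rvert \ll 1$ | $u(p) \approx p$ |
| Strong negative selection | $2Ns \ll -1$ | $u(p) \approx 0$ (allele lost) |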
The crossover near $\lvert 2Ns \rvert \approx 1$ is the nearly-neutral threshold of Ohta 1973: mutations whose selection coefficient satisfies $\lvert s \rvert \ll 1/(2N)$ behave as effectively neutral and obey the $u(p) = p$ rule, while those with $\lvert s \rvert \gg 1/(2N)$ behave as effectively deterministic. The location of the threshold depends on $N$, so identical mutations behave neutrally in small populations and selectively in large ones.
Exercises [Intermediate+]
From discrete to diffusion — the formal limit [Master]
The discrete Wright-Fisher chain is exactly solvable for small $N$: one can explicitly diagonalise the $(2N+1)$ by $(2N+1)$ binomial transition matrix and read off the eigenvalues and eigenvectors. Wright did exactly this in his 1931 Genetics paper, deriving the principal eigenvalue $1 - 1/(2N)$ that controls the long-run rate of heterozygosity decay. But the matrix-diagonalisation route does not scale: for biologically realistic $N$ on the order of $10^4$ or $10^6$, the matrix is unwieldy and the eigenvalue structure is hard to interpret. The diffusion approximation is the analytic substitute that recovers the same eigenvalue at leading order and opens the route to closed-form fixation probabilities, fixation times, and stationary distributions.
The setup. Take a sequence of Wright-Fisher chains at population sizes $N$ tending to infinity, with selection coefficient $s$ and mutation rates $\mu$, $\nu$ scaled so that $2Ns$, $2N\mu$, and $2N\nu$ remain fixed in the limit. Rescale time so that one diffusion-time unit corresponds to $2N$ generations: $\tau = t/(2N)$. The claim (Feller 1951, Karlin-Taylor 1981, Ewens 2004 Ch. 4 for the modern statement) is that as $N \to \infty$ the process $p_{\lfloor 2N\tau \rfloor}$ converges in distribution (in the Skorokhod sense on path space) to a Markov diffusion on $[0,1]$ with infinitesimal generator $\mathcal{L} = \tfrac{1}{2}\, a(x)\, \partial_x^2 + b(x)\, \partial_x$ for variance coefficient $a(x) = x(1-x)$ and, when mutation is absent, drift coefficient $b(x) = 2Ns\, x(1-x)$ (the factor of $2N$ comes from the time rescaling; conventions vary).
Why does the limit work? Two observations. Per generation, the conditional mean of the frequency increment is $s\, p(1-p)$, of order $1/N$ because $s$ is scaled as $1/N$. The conditional variance per generation is $p(1-p)/(2N)$, also of order $1/N$. Both quantities scale identically. Rescaling time by a factor of $2N$ multiplies the per-step mean by $2N$, producing the factor $2Ns$ (the held-fixed combination), and multiplies the per-step variance by $2N$, producing $p(1-p)$ (the unscaled variance coefficient). So in the limit, the rescaled mean per unit time is $2Ns\, p(1-p)$ (finite, fixed) and the rescaled variance per unit time is $p(1-p)$ (also finite, fixed). These are precisely the drift and variance coefficients of the limiting diffusion.
The deeper structural fact is that the higher moments, $\mathbb{E}[(p_{t+1} - p_t)^k \mid p_t]$ for $k \geq 3$, are of order $1/N^2$ or smaller, so they vanish faster than the time-rescaling can amplify them. This is the standard hypothesis for the diffusion-limit theorem: variance scales as $1/N$, drift scales as $1/N$, all higher cumulants are negligible. The Wright-Fisher chain satisfies it because the binomial distribution concentrates around its mean at rate $1/\sqrt{N}$, exactly the rate at which Brownian motion concentrates.
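The moment scaling can be checked directly from the binomial distribution. The sketch below covers the neutral case only and multiplies the first three per-generation moments of the frequency increment by $2N$; `scaled_moments` is a name introduced here, and $N$ is kept at or below 500 so the binomial coefficients still fit in floating point.

```python
from math import comb

def scaled_moments(N, p):
    """Return 2N * E[(p' - p)^k | p] for k = 1, 2, 3 under neutral binomial sampling."""
    copies = 2 * N
    probs = [comb(copies, j) * p**j * (1 - p)**(copies - j) for j in range(copies + 1)]
    deltas = [j / copies - p for j in range(copies + 1)]
    return tuple(copies * sum(pr * d**k for pr, d in zip(probs, deltas))
                 for k in (1, 2, 3))

for N in (10, 100, 500):
    mean, variance, third = scaled_moments(N, 0.3)
    print(N, f"{mean:.2e}", f"{variance:.4f}", f"{third:.2e}")
```

The rescaled mean stays at zero, the rescaled variance stays at $p(1-p) = 0.21$, and the rescaled third moment shrinks in proportion to $1/N$, which is the pattern the diffusion-limit hypothesis requires.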
The diffusion limit is not merely a calculational convenience; it captures universality. Several different microscopic models of finite-population evolution (the Wright-Fisher chain with non-overlapping generations, the Moran model with overlapping generations, the Cannings family of exchangeable models) all converge to the same diffusion limit at leading order in $1/N$, with possibly rescaled time coordinates. The diffusion equation is the universal continuum description of evolutionary stochasticity at the single-locus level, in the same way that the heat equation is the universal continuum description of random walks. The microscopic details, such as whether sampling is binomial or hypergeometric or Polya-urn, drop out in the limit.
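A precise statement. With $a$ and $b$ the variance and drift coefficients, the forward Kolmogorov (Fokker-Planck) equation for the density $\phi(x,t)$ of the allele frequency is
$$\frac{\partial \phi}{\partial t} = \frac{1}{2} \frac{\partial^2}{\partial x^2}\bigl[a(x)\,\phi\bigr] - \frac{\partial}{\partial x}\bigl[b(x)\,\phi\bigr]$$
with boundary behaviour determined by whether $x = 0$ and $x = 1$ are accessible (for the Wright-Fisher diffusion they are, and the boundary points are absorbing in the no-mutation case and reflecting once mutation is present). The dual backward Kolmogorov equation for any expectation $f(x,t) = \mathbb{E}[g(X_T) \mid X_t = x]$ at fixed terminal time $T$ is $-\partial_t f = \mathcal{L} f$, run backward in time, with $f(x,T) = g(x)$. The fixation-probability derivation in the Key Theorem above is the time-independent case of the backward equation, $\mathcal{L}u = 0$.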
In stochastic-differential-equation language the diffusion is
$$dX_t = b(X_t)\,dt + \sqrt{a(X_t)}\,dW_t, \qquad a(x) = x(1-x),$$
with $W_t$ a standard Wiener process and the boundary $\{0, 1\}$ accessible in finite time. The noise coefficient $\sqrt{x(1-x)}$ is non-Lipschitz at the boundaries; existence and uniqueness of solutions follow from the Yamada-Watanabe criterion (Ikeda-Watanabe 1981 Ch. IV) and the singular boundaries are absorbing under the relevant boundary classification (Feller 1952 Trans. AMS 77). The connection to math §02 is direct: the Wright-Fisher diffusion is one of the canonical one-dimensional Itô processes whose generator is degenerate at the boundary and whose study motivated much of mid-20th-century stochastic-process theory.
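A crude way to see the stochastic differential equation in action is an Euler-Maruyama discretisation. The sketch below assumes rescaled time with $a(x) = x(1-x)$ and $b(x) = \sigma x(1-x)$, where `sigma` stands for the scaled selection parameter; the step size, the clipping at the boundaries, and the function name are choices made for illustration, and clipping is only a rough treatment of the degenerate boundary.

```python
import math
import random

def simulate_wf_diffusion(x0, sigma, dt, t_max, rng):
    """Euler-Maruyama path of dX = sigma X(1-X) dt + sqrt(X(1-X)) dW, stopped at 0 or 1."""
    x, t = x0, 0.0
    while 0.0 < x < 1.0 and t < t_max:
        noise = rng.gauss(0.0, math.sqrt(dt))            # Wiener increment over dt
        x += sigma * x * (1 - x) * dt + math.sqrt(x * (1 - x)) * noise
        x = min(1.0, max(0.0, x))                        # clip overshoot past a boundary
        t += dt
    return x

rng = random.Random(0)                                   # fixed seed for repeatability
finals = [simulate_wf_diffusion(0.5, 1.0, 1e-3, 20.0, rng) for _ in range(200)]
fixed = sum(1 for x in finals if x == 1.0)
print("fraction of paths absorbed at 1:", fixed / len(finals))
```

With a finer step and more paths, the fraction absorbed at 1 can be compared against the closed-form fixation probability; the discretisation error near the boundaries is the main caveat.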
The neutral Wright-Fisher and the $u(p) = p$ formula [Master]
Setting $s = 0$ collapses the drift coefficient to zero, leaving the purely diffusive Wright-Fisher process with generator $\mathcal{L} = \tfrac{1}{2}\, x(1-x)\, \partial_x^2$. The backward equation becomes $u''(x) = 0$ on $(0,1)$, with boundary conditions $u(0) = 0$, $u(1) = 1$. The solution is the linear function $u(p) = p$.
This is Kimura's neutral-fixation formula. It says: under neutrality, the probability that an allele eventually fixes equals its current frequency in the population. A new mutation at frequency $1/(2N)$ has fixation probability $1/(2N)$. A polymorphism at frequency 0.5 has fixation probability 0.5: it is equally likely to win or lose. A polymorphism near fixation at frequency 0.99 has fixation probability 0.99: it almost certainly wins, not because of any push but because the random walk is much closer to the upper boundary than the lower.
The proof has two illuminating routes. The diffusion route (above) reduces to a one-line ODE. The martingale route, sketched in Exercise 6, observes that $p_t$ is a bounded martingale under neutrality, and the optional-stopping theorem gives $\mathbb{E}[p_T] = p_0$, where $T$ is the absorption time. Since $p_T \in \{0, 1\}$, this immediately yields $\Pr(p_T = 1) = p_0$. The martingale argument is elementary; the diffusion argument is computationally heavier but generalises to selection, mutation, migration, and arbitrary one-dimensional state-dependent drift.
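The neutral result can also be confirmed on the discrete chain itself, without any diffusion approximation, by pushing a starting distribution through the binomial transition matrix until essentially all mass is absorbed. The sketch below assumes a small population of 10 gene copies and 2000 generations, both chosen only to keep the computation quick; `fixation_probability` is an illustrative name.

```python
from math import comb

def fixation_probability(start, copies, generations=2000):
    """Mass absorbed at state `copies` after iterating the neutral Wright-Fisher chain."""
    P = [[comb(copies, j) * (i / copies) ** j * (1 - i / copies) ** (copies - j)
          for j in range(copies + 1)] for i in range(copies + 1)]
    dist = [0.0] * (copies + 1)
    dist[start] = 1.0                      # all probability on the starting count
    for _ in range(generations):
        dist = [sum(dist[i] * P[i][j] for i in range(copies + 1))
                for j in range(copies + 1)]
    return dist[copies]

copies = 10
for start in (1, 3, 5):
    print(start / copies, round(fixation_probability(start, copies), 6))
```

Each printed pair shows the starting frequency next to the absorbed mass at fixation, and the two agree, as the optional-stopping argument predicts.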
Why does the formula matter biologically? Three points.
First, the input-output structure of neutral evolution. A population of $2N$ gene copies introduces new mutations at total rate $2N\mu$ per generation, where $\mu$ is the per-copy per-generation mutation rate. Each new mutation has fixation probability $1/(2N)$. The substitution rate, new alleles fixing per generation, is therefore $2N\mu \cdot 1/(2N) = \mu$, independent of population size. This is the molecular clock, Kimura's most consequential single result and the foundation of phylogenetic dating. The rate at which neutral substitutions accumulate along a lineage equals the per-lineage mutation rate, and time can be inferred from sequence divergence by inverting this relation. The cancellation between population-scale input and per-copy fixation probability is the cleanest expression of why population size, which dominates so many population-genetic quantities, drops out of the substitution dynamics.
Second, the time to fixation versus time to loss. Even though new mutations fix with probability $1/(2N)$, the time to fixation (conditional on fixing) is on the order of $4N$ generations, the diffusion result of Kimura-Ohta 1969 derived by integrating the backward equation with absorption at $x = 1$ as the terminal condition. The time to loss (conditional on losing) is much shorter, on the order of $2\ln(2N)$ generations: most neutral mutations die out within a handful of generations of arising. The disparity between fixation time ($\sim 4N$) and loss time ($\sim 2\ln(2N)$) means that at any moment, the polymorphism in a population is dominated by alleles on their way to loss, with a small tail of alleles slowly transiting toward fixation. The expected number of polymorphic sites in a sample of $n$ chromosomes is $\theta \sum_{i=1}^{n-1} 1/i$ where $\theta = 4N\mu$. This is Watterson's 1975 result and the foundation of statistical inference from sequence data.
Third, the failure mode of the strict neutral theory. Empirically, real populations show less polymorphism than $\theta = 4N\mu$ predicts under strict neutrality combined with realistic $N$ and $\mu$, and the site-frequency spectrum of polymorphisms is skewed toward rare alleles relative to the neutral prediction. The standard explanations, which include background selection (Charlesworth-Morgan-Charlesworth 1993), recurrent selective sweeps (Maynard Smith-Haigh 1974), and demographic history, all act by perturbing the neutral diffusion in specific, parametrically-identifiable ways. The neutral $u(p) = p$ formula is the null hypothesis against which every modern molecular-evolution test (Tajima's $D$, Fu-Li tests, McDonald-Kreitman) is constructed. Empirical neutralism, in the sense of Kimura's strong claim that most molecular variation is neutral, remains contested; but neutralism as the formal baseline of the field is uncontroversial.
Selection added — the famous formula and weak-selection asymptotics [Master]
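The introduction of selection turns the deterministic Wright-Fisher mean into $p + s\,p(1-p)$ per generation (for small $s$ and additive selection). In rescaled time the diffusion-limit generator is
$$\mathcal{L} = \frac{1}{2}\, x(1-x) \frac{d^2}{dx^2} + 2N_e s\, x(1-x) \frac{d}{dx}$$
and the fixation probability satisfying $\mathcal{L}u = 0$, $u(0) = 0$, $u(1) = 1$ is the Kimura formula
$$u(p) = \frac{1 - e^{-4N_e s p}}{1 - e^{-4N_e s}}$$
where $N_e$ is the effective population size (the conversion factor between actual population size and the size of the equivalent Wright-Fisher idealisation). The scaling parameter is $2N_e s$, the effective selection strength. Three regimes.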
Strong selection, $2N_e s \gg 1$. For a new beneficial mutation at $p = 1/(2N)$, the formula evaluates to $u \approx 2s$ for very small $s$ (since $1 - e^{-2s} \approx 2s$ and the denominator is close to $1$), and more generally to $2s\,N_e/N$ when one carefully takes the diffusion limit of the discrete chain. The precise result depends on the dominance assumption and the underlying chain (Wright-Fisher vs Moran differ by a factor of $2$ in the asymptote). Haldane's 1927 result $u \approx 2s$ for a new dominant beneficial mutation in a haploid (or codominant diploid) model is the canonical strong-selection limit. The fixation probability rises linearly with $s$, not exponentially; even a strongly beneficial new mutation has only modest fixation probability, because most copies are lost in the first few generations to drift before selection can amplify them.
Effectively neutral, $\lvert 2N_e s \rvert \ll 1$. Expanding the numerator and denominator of $u(p)$ in powers of $4N_e s$ gives $u(p) = p + 2N_e s\, p(1-p) + O\bigl((N_e s)^2\bigr)$; the leading-order behaviour is the neutral $u(p) = p$ formula. The correction is $2N_e s\, p(1-p)$, a small selection-dependent bow above (for $s > 0$) or below (for $s < 0$) the diagonal. Alleles in this regime, Ohta's nearly-neutral mutations, accumulate at a rate intermediate between the strict neutral $\mu$ and the strict-selection $4N_e s \mu$. The nearly-neutral theory predicts a slight elevation of the substitution rate of weakly beneficial mutations and a slight suppression of the rate of weakly deleterious mutations relative to the strict-neutral baseline. Ohta 1973 Nature 246 is the foundational reference; the resulting predictions on the codon-usage and synonymous-vs-nonsynonymous substitution ratios are central to modern molecular-evolution inference.
Strong negative selection, $2N_e s \ll -1$. The formula evaluates to $u \to 0$ for $N_e \lvert s \rvert \to \infty$ and to $u \approx 2\lvert s \rvert\, e^{-4N_e \lvert s \rvert}$ for finite $N_e$ when $p = 1/(2N)$. A new strongly deleterious mutation has essentially zero fixation probability; the input-output product $2N\mu \cdot u$ is exponentially suppressed below the neutral rate $\mu$. The substitution rate of strongly deleterious mutations is therefore negligible. The genetic load, the equilibrium mutation-selection-balance frequency, is set by the balance between recurrent input $\mu$ and selective elimination $s$, giving $\hat{q} \approx \sqrt{\mu/s}$ for a fully recessive deleterious allele (since selection is then weak against heterozygotes) and $\hat{q} \approx \mu/(hs)$ for a partially-dominant one with dominance coefficient $h$.
The crossover near $\lvert 2N_e s \rvert \approx 1$, equivalently $\lvert s \rvert \approx 1/(2N_e)$, is the nearly-neutral threshold, conceptually the most consequential single quantity in molecular evolution. Mutations of identical absolute selection coefficient behave neutrally in small populations and selectively in large ones. The effective population size of Drosophila is on the order of $10^6$, of humans on the order of $10^4$; mutations of selection coefficient $10^{-5}$ are nearly-neutral in humans (drift wins) but effectively selected in Drosophila (selection wins). This is the foundational reason that synonymous codon usage is biased in Drosophila and essentially random in humans: drift overwhelms codon-bias selection in the smaller-$N_e$ species. Cross-species comparisons of $d_N/d_S$ ratios encode exactly this difference, mapping the ratio of nonsynonymous to synonymous substitution rates to the underlying ratio of selectively-driven to neutrally-driven fixation.
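The population-size dependence of the threshold can be illustrated by comparing one selection coefficient in two populations. The sketch below assumes the $4N_e s$ form of the fixation formula, census size equal to effective size, and round illustrative values ($s = 10^{-5}$, $N_e = 10^4$ and $10^6$); the function names are introduced here.

```python
from math import expm1

def kimura_fixation(p, N, s):
    """Assumed form u(p) = (1 - e^{-4Nsp}) / (1 - e^{-4Ns}); returns p when s == 0."""
    if s == 0:
        return p
    return expm1(-4 * N * s * p) / expm1(-4 * N * s)

def fixation_relative_to_neutral(N, s):
    """Fixation probability of a single new copy divided by its neutral value 1/(2N)."""
    p = 1 / (2 * N)
    return kimura_fixation(p, N, s) / p

s = 1e-5
small = fixation_relative_to_neutral(10_000, s)       # 2Ns = 0.2: drift dominates
large = fixation_relative_to_neutral(1_000_000, s)    # 2Ns = 20: selection dominates
print(round(small, 3), round(large, 3))
```

In the smaller population the mutation fixes only slightly more often than a neutral one, while in the larger population it fixes about $4N_e s$ times more often.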
The closed-form selection formula has a beautiful structural interpretation. Define the integrating factor $\psi(x) = e^{-4N_e s x}$. Then $u(p) = \dfrac{\psi(0) - \psi(p)}{\psi(0) - \psi(1)}$: the fixation probability is a ratio of differences of the integrating factor at the boundary, the interior, and the other boundary. The same structure governs first-passage problems for arbitrary one-dimensional diffusions: the integrating factor is $\psi(x) = \exp\bigl(-\int^x 2b(y)/a(y)\,dy\bigr)$, and fixation probabilities, sojourn times, and stationary distributions are all expressible as integrals of $\psi$. The Wright-Fisher diffusion is the prototypical example in mathematical biology of this general one-dimensional theory, and the closed-form solvability stems from the fact that the variance coefficient and the drift coefficient share the common factor $x(1-x)$ that cancels in the integrating-factor formula.
Coalescent dual and the genealogical view [Master]
A different, equally illuminating route to Wright-Fisher results runs in reverse time. Instead of asking how present-day allele frequencies evolve forward into the future, ask: where did the $n$ gene copies in a present-day sample come from? Trace each backward, generation by generation, until lineages meet at common ancestors. The resulting random tree on $n$ sampled lineages is Kingman's coalescent (Kingman 1982 Stoch. Proc. Appl. 13), the time-reversal of the Wright-Fisher diffusion in the large-$N$ limit.
The construction. In a Wright-Fisher population of $2N$ gene copies, two distinct lineages at the present coalesce in the previous generation with probability $1/(2N)$, the probability that the two randomly chosen offspring happened to share a parent. With $k$ distinct lineages, any pair coalesces with probability $1/(2N)$, so the per-generation total coalescence rate is $\binom{k}{2}/(2N)$. Rescaling time in units of $2N$ generations, the coalescence rate is $\binom{k}{2}$ per unit rescaled time. The continuous-time limit, letting $N \to \infty$ with the rescaled time held fixed, is a pure-jump Markov process on the partitions of $\{1, \ldots, n\}$ in which pairs of lineages coalesce at total exponential rate $\binom{k}{2}$. Each merge reduces the lineage count by one; the process terminates when a single lineage remains.
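The expected time between $k$ and $k-1$ lineages is $1/\binom{k}{2} = 2/(k(k-1))$ rescaled time units. The expected total height of the coalescent tree, the expected time to the most recent common ancestor (TMRCA) of a sample of $n$, is
$$\mathbb{E}[T_{\mathrm{MRCA}}] = \sum_{k=2}^{n} \frac{2}{k(k-1)} = 2\left(1 - \frac{1}{n}\right),$$
approaching 2 rescaled time units (i.e., $4N$ generations) as $n \to \infty$. The expected total branch length in the tree, the sum of all branches in coalescent units, is $\mathbb{E}[L] = 2\sum_{i=1}^{n-1} 1/i$, the harmonic-number coefficient.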
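Both expectations follow from summing the mean waiting times level by level, which can be done in exact arithmetic. The sketch below works in coalescent time units of $2N$ generations; the function names are chosen here for illustration.

```python
from fractions import Fraction

def expected_tmrca(n):
    """Sum of mean waiting times 2 / (k (k - 1)) while k = n, ..., 2 lineages remain."""
    return sum(Fraction(2, k * (k - 1)) for k in range(2, n + 1))

def expected_total_branch_length(n):
    """Each level with k lineages contributes k branches of mean length 2 / (k (k - 1))."""
    return sum(k * Fraction(2, k * (k - 1)) for k in range(2, n + 1))

for n in (2, 10, 100):
    print(n, float(expected_tmrca(n)), float(expected_total_branch_length(n)))
```

The tree height saturates near 2 as the sample grows, while the total branch length keeps growing like a harmonic sum, so adding samples mostly adds short recent branches.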
The relevance to population genetics is direct. Under the infinite-sites neutral model, in which every mutation falls at a previously-unmutated site, the expected number of segregating sites in a sample of $n$ chromosomes is $\mathbb{E}[S] = \theta \sum_{i=1}^{n-1} 1/i$, where $\theta = 4N_e\mu$ is the population mutation rate. The reasoning: mutations arise on each branch of the coalescent tree at rate $\mu$ per unit generation, so along total branch length $L$ in coalescent units (which corresponds to $2N_e L$ actual generations) the expected number of mutations is $2N_e \mu L$. Substituting $\mathbb{E}[L] = 2\sum_{i=1}^{n-1} 1/i$ gives Watterson's estimator $\hat{\theta}_W = S / \sum_{i=1}^{n-1} 1/i$ for the population mutation rate from the observed number of segregating sites $S$.
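Watterson's estimator is a one-line computation once the harmonic coefficient is in hand. The sketch below takes an observed count of segregating sites and a sample size; the function name and the example numbers are illustrative, not data from the text.

```python
def watterson_theta(segregating_sites, n):
    """Estimate theta as S divided by a_n = sum_{i=1}^{n-1} 1/i (requires n >= 2)."""
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites / a_n

print(watterson_theta(3, 3))      # a_3 = 1.5, so the estimate is 2.0
print(watterson_theta(40, 20))    # hypothetical sample: 40 segregating sites in 20 chromosomes
```

The result is an estimate of $\theta$ for the whole sequenced region; dividing by the number of sites gives a per-site value.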
The coalescent is the load-bearing tool of modern phylogenetics and population-genetic inference for two reasons. First, it reduces a forward-time stochastic dynamics on $2N$ gene copies to a backward-time random tree on at most $n$ lineages, a dramatic dimensional reduction that makes simulation and likelihood computation tractable. Second, it is modular: changes to the demographic history (variable population size, population subdivision, ancient admixture, selection) can be encoded as deformations of the coalescent (branching-rate functions of time, structured coalescents with migration, ancestral recombination graphs) without changing the underlying tree-on-lineages formalism. The Wright-Fisher diffusion and the Kingman coalescent are the dual descriptions of the same neutral evolutionary process: the diffusion in forward time at the population level, the coalescent in backward time at the lineage level.
The connection back to this unit's main theorem: the closed-form fixation probability $u(p)$ has a coalescent interpretation as the probability that a randomly chosen present-day chromosome carries a lineage that traces back to an ancestor of allele $A$ rather than allele $a$. Under neutrality this probability equals the founding frequency $p$; under selection it is biased by the integrating factor $e^{-4N_e s x}$ that distinguishes selectively-favoured from selectively-disfavoured lineages. The coalescent and the diffusion are the same theory in two languages: one looks forward in time at frequencies, the other looks backward in time at lineages. Population genetics in the post-Kingman era is increasingly written in the coalescent language because it scales better to the data scales of modern genomics.
Connections [Master]
Hardy-Weinberg equilibrium
19.02.01 pending. Hardy-Weinberg is the deterministic null model: the leading-order theory at $N = \infty$ with no selection, mutation, migration, or non-random mating. The Wright-Fisher chain is the canonical finite-population correction: relaxing the infinite-population assumption replaces the deterministic frequency-equilibrium with the stochastic process described here. This unit is the immediate generalisation of 19.02.01 pending in the stochastic direction, and every quantitative test of Hardy-Weinberg implicitly compares observed data against the Wright-Fisher null with finite $N$.
Natural selection
19.03.01 pending. The selection equation derived in the natural-selection unit is the deterministic limit of the Wright-Fisher drift coefficient; Kimura's formula derived here is the finite-population stochastic correction. The strong-selection asymptote $u \approx 2s$ recovers Haldane's 1927 fixation rule. The interplay between selection and drift, captured by the scaling parameter $2N_e s$, is the substantive connection: when $\lvert 2N_e s \rvert \gg 1$ selection dominates, when $\lvert 2N_e s \rvert \ll 1$ drift dominates, and the threshold $\lvert 2N_e s \rvert \approx 1$ is the foundational quantity of the nearly-neutral theory.
Genetic drift and the neutral theory
19.04.01. The genetic-drift unit takes the Wright-Fisher chain and its diffusion approximation developed here as the formal substrate of the neutral theory. The molecular-clock prediction (substitution rate $\mu$ independent of $N$), the heterozygosity-decay formula $H_t = H_0\,(1 - 1/(2N))^t$, and the site-frequency-spectrum predictions all flow from the neutral $u(p) = p$ result and the diffusion-limit machinery. This unit provides the engine; the drift unit develops the empirical applications.
Quantitative genetics and the breeder's equation
19.05.01 pending. Wright-Fisher drift at a single locus generalises to multi-locus quantitative-trait evolution under the infinitesimal model: a quantitative trait controlled by many loci each of small effect responds to selection at rate (Robertson-Price, Lande) while drifting at rate per generation. The single-locus Wright-Fisher diffusion derived here is the per-locus building block; the quantitative-genetic theory aggregates over many loci and replaces the binomial sampling here with a Gaussian sampling at the trait level.
Phylogenetics and the molecular clock
19.07.01. The Kingman coalescent developed in the Master sub-section above is the substrate of modern phylogenetic inference. Bayesian coalescent-based methods (BEAST, MrBayes, PhyloNet) parametrise demographic history and selection as deformations of the neutral Kingman coalescent, and the neutral substitution rate is the calibrating quantity of the molecular clock that turns sequence divergence into time. This unit is the population-genetic side of that bridge; the phylogenetics unit builds the inference framework on it.
Stochastic differential equations and Brownian motion
02.13.05 pending. The Wright-Fisher diffusion is one of the canonical one-dimensional Itô processes; its variance coefficient, proportional to $p(1-p)$, is degenerate at the absorbing boundaries, and its boundary classification is a load-bearing example in the Feller theory of one-dimensional diffusions. The math-side connection, to the rectification theorem for vector fields and the broader theory of SDEs on bounded domains, is the natural cross-direction hook into math §02. The Wright-Fisher diffusion is to mathematical population genetics what Brownian motion on the line is to probability theory: the prototype example whose closed-form solvability anchors the general theory.
Sexual selection
19.03.02. Sexual selection imposes selection coefficients with variance components (male reproductive success is more dispersed than female under polygyny, runaway Fisherian dynamics generate frequency-dependent fitness landscapes, and good-genes mechanisms couple ornament alleles to viability alleles), and the Wright-Fisher machinery developed here is what converts that mating-success variance into allele-frequency change at finite $N_e$. The sexual-selection unit (19.03.02) uses the diffusion approximation to set fixation probabilities for ornament alleles and to bound the rate at which condition-dependent display traits can spread, and the effective-population-size deflation under high reproductive skew (a classic Wright-Fisher correction) is the quantitative entry point for analysing Y-chromosome and mtDNA bottlenecks under male-biased variance.
Mutation and repair
17.06.01 pending. The per-generation mutation rate $\mu$ that appears as a parameter in the Wright-Fisher model is the output of the molecular mutation-rate machinery: the balance between endogenous damage, exogenous damage, and repair fidelity. The mutation-selection balance derived in the mutation and repair unit is the deterministic limit of the stochastic dynamics developed here, and the molecular clock for neutral evolution is set by the rate at which new alleles are introduced.
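The quantities these connections lean on can be checked numerically. The following is a minimal sketch, not a definitive implementation: it assumes the common textbook form of Kimura's fixation probability under genic selection, $u(p) = (1 - e^{-4N_e s p})/(1 - e^{-4N_e s})$, and the standard pure-drift heterozygosity decay $H_t = H_0 (1 - 1/(2N_e))^t$. The function names and the parameter values are illustrative choices, and the notation may differ from the scaling used elsewhere in this unit.

```python
import math


def kimura_fixation_probability(p, n_e, s):
    """Diffusion-approximation fixation probability for an allele at frequency p.

    Assumed form (genic selection, coefficient s, effective size n_e):
        u(p) = (1 - exp(-4*n_e*s*p)) / (1 - exp(-4*n_e*s))
    """
    alpha = 4.0 * n_e * s
    if abs(alpha) < 1e-12:
        # Neutral limit: fixation probability equals the starting frequency.
        return p
    if alpha > 0:
        # expm1 keeps precision when alpha * p is small.
        return math.expm1(-alpha * p) / math.expm1(-alpha)
    # Deleterious case: algebraically identical form that avoids overflow.
    beta = -alpha
    return math.exp(-beta * (1.0 - p)) * math.expm1(-beta * p) / math.expm1(-beta)


def expected_heterozygosity(h0, n_e, t):
    """Expected heterozygosity after t generations of pure drift."""
    return h0 * (1.0 - 1.0 / (2.0 * n_e)) ** t


if __name__ == "__main__":
    n_e = 10_000                # illustrative effective population size
    p_new = 1.0 / (2 * n_e)     # a single new mutant copy
    for s in (-1e-3, 0.0, 1e-5, 1e-3, 1e-2):
        u = kimura_fixation_probability(p_new, n_e, s)
        print(f"s={s:g}  N_e*s={n_e * s:g}  u={u:.3e}  2s={2 * s:.3e}")
    # After 2*N_e generations heterozygosity has fallen by roughly a factor e.
    print(expected_heterozygosity(0.5, n_e, 2 * n_e))
```

Under these assumptions, a new mutant with $N_e s \gg 1$ fixes with probability close to Haldane's $2s$, a mutant with $|N_e s| \ll 1$ fixes with probability close to its initial frequency, and a deleterious mutant with $N_e s \ll -1$ fixes with probability far below the neutral value.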
Historical & philosophical context [Master]
Sewall Wright 1931 Genetics 16 [Wright1931] introduced the discrete Wright-Fisher process as a model of allele-frequency dynamics in finite Mendelian populations. The paper, a monograph-length article in Genetics (pp. 97-159), was Wright's foundational statement of the role of chance in evolution and the source of the term random genetic drift. Wright derived the eigenvalue governing heterozygosity decay, introduced his $F$-statistics for population structure, and proposed the shifting balance theory of evolution under combined drift and selection on a rugged adaptive landscape. R. A. Fisher's 1930 The Genetical Theory of Natural Selection [Fisher1930] is the parallel monograph from the selection side; Fisher's treatment used continuous-time deterministic differential equations and effectively anticipated the diffusion limit, though without Wright's explicit handling of finite-population stochasticity. The Wright-Fisher chain is named for both: Fisher's 1922 Proc. Roy. Soc. Edinburgh 42 paper On the dominance ratio contained an early version of the binomial-sampling process.
The diffusion limit was made precise by Motoo Kimura in a 1955 PNAS paper [Kimura1955], Solution of a process of random genetic drift with a continuous model, Kimura's first major paper, written while he was a graduate student at the University of Wisconsin. The paper solved the no-selection Fokker-Planck equation by separation of variables in terms of Gegenbauer polynomials, derived the heterozygosity-decay rate, and obtained the no-selection fixation-probability density. The extension to selection and the closed-form fixation-probability formula came in Kimura 1962 Genetics 47 [Kimura1962], a paper whose three-page derivation is one of the most cited results in evolutionary biology.
Kimura's 1968 Nature paper Evolutionary rate at the molecular level [Kimura1968] used the Wright-Fisher diffusion to propose the neutral theory of molecular evolution: the empirical claim that most molecular polymorphism within species and most molecular divergence between species is due to drift on neutral mutations rather than selection on advantageous ones. The 1968 paper triggered one of the longest-running debates in evolutionary biology (the neutralist-selectionist controversy), and Kimura's 1983 monograph The Neutral Theory of Molecular Evolution [Kimura1983] remains the canonical statement. Tomoko Ohta 1973 Nature 246 [Ohta1973] refined the strict-neutral theory to the nearly-neutral theory, recognising that mutations with $|N_e s| \sim 1$ occupy a regime where neither strict drift nor strict selection captures the dynamics and additional structure is required.
The coalescent dual was developed by John Kingman in 1982 in twin papers, The coalescent (Stoch. Proc. Appl. 13) [Kingman1982a] and On the genealogy of large populations (J. Appl. Prob. 19A) [Kingman1982b]. Kingman showed that the genealogy of a sample of $n$ lineages from a Wright-Fisher population, traced backward in time, converges in the large-$N$ limit to a specific random tree process, the Kingman coalescent. The forward-time Wright-Fisher diffusion and the backward-time coalescent are dual descriptions of the same neutral process, and Watterson's 1975 Theor. Pop. Biol. 7 estimator [Watterson1975] bridges sequence-level polymorphism to coalescent branch lengths. Patrick A. P. Moran 1958 Proc. Camb. Phil. Soc. 54 [Moran1958] introduced the alternative overlapping-generations Moran model, which replaces one individual per time step rather than the entire generation; the Moran model has the same diffusion limit as Wright-Fisher up to a factor-of-2 time rescaling, illustrating the universality of the diffusion description.
Philosophically, the Wright-Fisher diffusion is the cleanest case in evolutionary biology where a microscopic mechanism (binomial offspring sampling) produces a continuum limit (one-dimensional diffusion on $[0,1]$) with universal structure: independent of microscopic details, sensitive only to the variance and drift coefficients at the population scale. This is the same continuum-limit logic that produces the heat equation from a random walk and the Black-Scholes equation from a binomial option-pricing tree, applied here to gene-frequency dynamics. The interpretive content, that evolutionary outcomes depend on the product $N_e s$ rather than on $N_e$ and $s$ separately, has been a recurring touchstone in the philosophy-of-biology literature on chance, contingency, and the unit of selection (Sober 1984; Beatty 1987; Walsh 2007). The neutral theory's claim that drift, rather than selection, is the dominant force shaping molecular variation makes it a rare case where a quantitative population-genetic theory has direct philosophical bearing on the structure of evolutionary explanation.
Bibliography [Master]
@article{Wright1931,
author = {Wright, S.},
title = {Evolution in {M}endelian Populations},
journal = {Genetics},
year = {1931},
volume = {16},
pages = {97--159},
}
@book{Fisher1930,
author = {Fisher, R. A.},
title = {The Genetical Theory of Natural Selection},
publisher = {Clarendon Press},
year = {1930},
address = {Oxford},
}
@article{Kimura1955,
author = {Kimura, M.},
title = {Solution of a Process of Random Genetic Drift with a Continuous Model},
journal = {Proceedings of the National Academy of Sciences USA},
year = {1955},
volume = {41},
pages = {144--150},
}
@article{Kimura1962,
author = {Kimura, M.},
title = {On the Probability of Fixation of Mutant Genes in a Population},
journal = {Genetics},
year = {1962},
volume = {47},
pages = {713--719},
}
@article{Kimura1968,
author = {Kimura, M.},
title = {Evolutionary Rate at the Molecular Level},
journal = {Nature},
year = {1968},
volume = {217},
pages = {624--626},
}
@book{Kimura1983,
author = {Kimura, M.},
title = {The Neutral Theory of Molecular Evolution},
publisher = {Cambridge University Press},
year = {1983},
}
@article{Ohta1973,
author = {Ohta, T.},
title = {Slightly Deleterious Mutant Substitutions in Evolution},
journal = {Nature},
year = {1973},
volume = {246},
pages = {96--98},
}
@article{Kingman1982a,
author = {Kingman, J. F. C.},
title = {The Coalescent},
journal = {Stochastic Processes and their Applications},
year = {1982},
volume = {13},
pages = {235--248},
}
@article{Kingman1982b,
author = {Kingman, J. F. C.},
title = {On the Genealogy of Large Populations},
journal = {Journal of Applied Probability},
year = {1982},
volume = {19A},
pages = {27--43},
}
@article{Moran1958,
author = {Moran, P. A. P.},
title = {Random Processes in Genetics},
journal = {Proceedings of the Cambridge Philosophical Society},
year = {1958},
volume = {54},
pages = {60--71},
}
@article{Watterson1975,
author = {Watterson, G. A.},
title = {On the Number of Segregating Sites in Genetical Models without Recombination},
journal = {Theoretical Population Biology},
year = {1975},
volume = {7},
pages = {256--276},
}
@article{KimuraOhta1969,
author = {Kimura, M. and Ohta, T.},
title = {The Average Number of Generations until Fixation of a Mutant Gene in a Finite Population},
journal = {Genetics},
year = {1969},
volume = {61},
pages = {763--771},
}
@inproceedings{Feller1951,
author = {Feller, W.},
title = {Diffusion Processes in Genetics},
booktitle = {Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability},
publisher = {University of California Press},
year = {1951},
pages = {227--246},
}
@book{Ewens2004,
author = {Ewens, W. J.},
title = {Mathematical Population Genetics {I}: Theoretical Introduction},
publisher = {Springer},
edition = {2},
year = {2004},
}
@book{CrowKimura1970,
author = {Crow, J. F. and Kimura, M.},
title = {An Introduction to Population Genetics Theory},
publisher = {Harper and Row},
year = {1970},
}
@book{CharlesworthCharlesworth2010,
author = {Charlesworth, B. and Charlesworth, D.},
title = {Elements of Evolutionary Genetics},
publisher = {Roberts and Company},
year = {2010},
}
@book{HartlClark2007,
author = {Hartl, D. L. and Clark, A. G.},
title = {Principles of Population Genetics},
publisher = {Sinauer Associates},
edition = {4},
year = {2007},
}
@book{Gillespie2004,
author = {Gillespie, J. H.},
title = {Population Genetics: A Concise Guide},
publisher = {Johns Hopkins University Press},
edition = {2},
year = {2004},
}
@book{Futuyma2017,
author = {Futuyma, D. J. and Kirkpatrick, M.},
title = {Evolution},
publisher = {Sinauer Associates},
edition = {4},
year = {2017},
}
@article{Haldane1927,
author = {Haldane, J. B. S.},
title = {A Mathematical Theory of Natural and Artificial Selection, Part V: Selection and Mutation},
journal = {Proceedings of the Cambridge Philosophical Society},
year = {1927},
volume = {23},
pages = {838--844},
}