19.03.01 · eco-evo-bio / selection

Natural selection — directional, stabilizing, and disruptive

Status: draft · 3 tiers · Lean: none · prereqs pending

Anchor (Master): Futuyma Ch. 11-12; Crow & Kimura Introduction to Population Genetics Theory; primary literature — Darwin 1859, Fisher 1930, Haldane 1932

Intuition [Beginner]

Natural selection is one of the main mechanisms by which populations change over time. The idea is simple: individuals with traits that make them better suited to their environment tend to survive and reproduce more than those without those traits. Over many generations, the advantageous traits become more common in the population.

Three conditions must be met for natural selection to occur. First, there must be variation — individuals in the population must differ in their traits. Second, the variation must be heritable — parents must pass their traits to offspring. Third, the variation must affect fitness — the ability to survive and reproduce. Without all three, natural selection cannot operate.

Fitness is a measure of reproductive success. An individual with high fitness produces more offspring that survive to reproduce themselves. Fitness is not about being the strongest or fastest — it is about leaving more descendants in the next generation.

Natural selection can act in three ways on a quantitative trait (one that varies continuously, like body size):

Directional selection favours individuals at one extreme of the trait distribution (e.g., the largest individuals). Over time, the average trait value in the population shifts in that direction.

Stabilizing selection favours individuals near the average and selects against both extremes. The trait distribution becomes narrower over time. Birth weight in humans is a classic example: both very small and very large babies have higher mortality.

Disruptive selection favours individuals at both extremes and selects against the average. If strong enough, it can split a population into two distinct groups.

Visual [Beginner]

Imagine a trait like beak size in a bird population. The distribution of beak sizes is shown as a bell curve. An arrow indicates the direction of selection:

In directional selection, the peak of the curve moves to the right (toward larger beaks). In stabilizing selection, the curve stays centred but becomes narrower. In disruptive selection, the single peak splits into two peaks at the extremes, with a valley in the middle.

Worked example [Beginner]

The peppered moth (Biston betularia) provides a classic example of directional selection. In pre-industrial England, light-coloured moths were well camouflaged against lichen-covered trees. Dark (melanic) moths were rare and easily spotted by birds.

During the Industrial Revolution, soot darkened the trees. On darkened bark, light moths were conspicuous and dark moths were camouflaged. The selection coefficient against light moths in polluted areas was approximately $s = 0.1$ — meaning light moths had 10% lower survival than dark moths.

The change in allele frequency per generation is approximately:

$$\Delta p \approx s\,pq,$$

where $p$ is the frequency of the advantageous (dark) allele, $q$ is the frequency of the disadvantageous (light) allele, and $s = 0.1$. Starting from $p = 0.01$ (1% dark allele), $q = 0.99$:

$$\Delta p \approx 0.1 \times 0.01 \times 0.99 = 0.00099 \approx 0.001.$$

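At first the dark allele gains only about 0.001 in frequency (0.1 percentage points) per generation. Because $\Delta p \approx s\,pq$ grows as $p$ rises toward $0.5$, the gains accelerate: iterating the approximation takes the dark allele past 50% within about 50 generations (roughly 50 years, since the peppered moth has about one generation per year). A rapid rise of this kind is what was observed in industrial England.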
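A minimal Python sketch of the calculation above iterates the approximation $\Delta p \approx s\,pq$ from $p = 0.01$ with $s = 0.1$. The function name `moth_trajectory` is hypothetical, and the code uses only the textbook approximation, not a model fitted to real moth data.

```python
def moth_trajectory(p0, s, generations):
    """Iterate p -> p + s*p*(1-p), the approximation Delta p ~ s*p*q."""
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = p + s * p * (1.0 - p)  # one generation of selection for the dark allele
        freqs.append(p)
    return freqs

freqs = moth_trajectory(p0=0.01, s=0.1, generations=50)
first_step = freqs[1] - freqs[0]  # 0.1 * 0.01 * 0.99
for t in (0, 10, 20, 30, 40, 50):
    print(f"generation {t:2d}: dark allele frequency {freqs[t]:.3f}")
```

The printed frequencies creep up slowly over the first generations and climb much faster later, which is the acceleration described above.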

Check your understanding [Beginner]

Exercise (medium, short answer).

Explain why natural selection cannot produce "perfect" organisms.

Hint

What constraints limit the power of selection? Consider the available genetic variation, trade-offs, and historical contingency.

Answer

Several constraints prevent perfection. (1) Lack of genetic variation: selection can only act on existing alleles; if no mutation has produced a beneficial variant, selection cannot create it. (2) Trade-offs: improving one trait often worsens another (e.g., large body size increases strength but reduces speed and requires more food). (3) Historical constraints: evolution works with existing structures, not from scratch. The vertebrate eye has a blind spot (where the optic nerve exits) because of the inverted retina — a historical accident of development that cannot be "un-evolved." (4) Genetic drift: in finite populations, chance can override selection, especially for weakly selected traits. (5) Environmental change: selection adapts organisms to past conditions; if the environment changes, adaptations may become obsolete.

Formal definition [Intermediate+]

Fitness

Absolute fitness ($W$) is the expected number of offspring an individual contributes to the next generation. Relative fitness ($w$) is absolute fitness divided by a reference value: either the population mean fitness or, by a common convention, the absolute fitness of the fittest genotype, which is then assigned $w = 1$ so that the others are expressed as fractions.

For a single locus with two alleles (A, a) and genotypes AA, Aa, aa, the relative fitnesses are typically written:

Genotype	AA	Aa	aa
Fitness	1	$1 - h s$	$1 - s$

Here $s$ is the selection coefficient against the aa homozygote, and $h$ is the dominance coefficient of the a allele's fitness effect: $h = 0$ when a is completely recessive, $h = 0.5$ when its effect is additive, and $h = 1$ when it is completely dominant.

Allele frequency change

For a diploid population with alleles A (frequency $p$) and a (frequency $q = 1 - p$), random mating produces zygotes in Hardy-Weinberg proportions, and viability selection on those zygotes then produces adult genotype frequencies:

$$f(AA) = \frac{p^{2} w_{AA}}{\bar w}, \qquad f(Aa) = \frac{2pq\, w_{Aa}}{\bar w}, \qquad f(aa) = \frac{q^{2} w_{aa}}{\bar w},$$

where $\bar w = p^{2} w_{AA} + 2pq\, w_{Aa} + q^{2} w_{aa}$ is the mean fitness.

The new allele frequency after selection is:

$$p' = \frac{p^{2} w_{AA} + pq\, w_{Aa}}{\bar w}.$$

The change per generation is:

$$\Delta p = p' - p = \frac{pq\,\big[p\,(w_{AA} - w_{Aa}) + q\,(w_{Aa} - w_{aa})\big]}{\bar w}.$$

For the special case of a recessive deleterious allele ($w_{AA} = w_{Aa} = 1$, $w_{aa} = 1 - s$):

$$\Delta p = \frac{s\,p\,q^{2}}{1 - s\,q^{2}}.$$

This shows that selection against a recessive allele is very weak when $q$ is small (because most copies are hidden in heterozygotes where they are not exposed to selection).
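To see how slow this is, the small Python sketch below iterates the exact recursion for $q = 1 - p$ implied by the formula above, $q' = q(1 - sq)/(1 - sq^{2})$. For a recessive lethal ($s = 1$) this simplifies to $q' = q/(1 + q)$, so $1/q$ rises by exactly 1 per generation: halving $q$ from $0.5$ to $0.25$ takes two generations, but halving it from $0.01$ to $0.005$ takes 100. The function names are hypothetical choices for illustration.

```python
def next_q_recessive(q, s):
    """One generation of selection against a recessive allele a (w_aa = 1 - s)."""
    # Delta p = s p q^2 / (1 - s q^2), so q' = q - s p q^2 / (1 - s q^2)
    return q * (1.0 - s * q) / (1.0 - s * q * q)

def iterate_q(q0, s, generations):
    """Apply next_q_recessive for the given number of generations."""
    q = q0
    for _ in range(generations):
        q = next_q_recessive(q, s)
    return q

# Recessive lethal (s = 1): 1/q increases by exactly one per generation.
q_after_100 = iterate_q(0.01, s=1.0, generations=100)
q_after_2 = iterate_q(0.5, s=1.0, generations=2)
print(q_after_100, q_after_2)
```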

Fisher's fundamental theorem

Fisher (1930) showed that the rate of increase in mean fitness due to natural selection equals the additive genetic variance in fitness:

$$\frac{d\bar w}{dt} = V_{A}(w).$$

This elegant result states that the more genetic variation in fitness exists in a population, the faster selection can increase mean fitness. As selection fixes favourable alleles, variation is depleted, and the rate of adaptation slows.

Key theorem with proof [Intermediate+]

Theorem (Directional selection fixes advantageous alleles). Consider an additive locus ($h = 0.5$) with genotype fitnesses $w_{AA} = 1 + s$, $w_{Aa} = 1$, $w_{aa} = 1 - s$, where $0 < s < 1$, so that allele A is advantageous and each copy of A adds $s$ to fitness. Starting from any initial frequency $0 < p_{0} < 1$, the advantageous allele A goes to fixation ($p_{t} \to 1$) in an infinite population under directional selection.

Proof. With additive fitness ($w_{AA} = 1 + s$, $w_{Aa} = 1$, $w_{aa} = 1 - s$), the allele frequency change is:

$$\Delta p = \frac{pq\,\big[p\,(w_{AA} - w_{Aa}) + q\,(w_{Aa} - w_{aa})\big]}{\bar w} = \frac{pq\,(ps + qs)}{\bar w} = \frac{s\,pq}{\bar w},$$

where $\bar w = p^{2}(1 + s) + 2pq + q^{2}(1 - s) = 1 + s(p - q)$, which is positive whenever $0 < s < 1$.

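Since $s > 0$ and $p, q > 0$ (for $0 < p < 1$), $\Delta p > 0$, so the allele frequency increases monotonically. It also never reaches 1 in finite time: $1 - p' = q\,(1 - sq)/\bar w > 0$ whenever $q > 0$. The sequence $p_{t}$ is therefore increasing and bounded above by 1, so it converges to a limit $p_{\infty}$ with $p_{0} < p_{\infty} \le 1$. Because the update map is continuous, $p_{\infty}$ must be a fixed point, i.e. a solution of $\Delta p = 0$; the only fixed points in $[0, 1]$ are $p = 0$ and $p = 1$, and $p_{\infty} > p_{0} > 0$, so $p_{\infty} = 1$. The approach is asymptotic: as $p \to 1$, $q \to 0$ and $\Delta p \to 0$. $\square$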

The additive case is the cleanest: selection acts on heterozygotes as well as homozygotes, so there is no "hiding" of deleterious alleles. The rate of increase $\Delta p = s\,pq/\bar w$ is, for small $s$, approximately proportional to $pq$: it is largest near $p = 0.5$ (where genetic variation is greatest) and slows as the allele approaches fixation.
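The proof can be checked numerically. This Python sketch iterates the exact additive recursion $p' = p + s\,pq/\bar w$ with $\bar w = 1 + s(p - q)$; the parameter values ($s = 0.05$, $p_{0} = 0.001$) are arbitrary illustrative choices, and `additive_trajectory` is a hypothetical helper name.

```python
def additive_trajectory(p0, s, generations):
    """Exact recursion for w_AA = 1 + s, w_Aa = 1, w_aa = 1 - s."""
    traj = [p0]
    p = p0
    for _ in range(generations):
        q = 1.0 - p
        w_bar = 1.0 + s * (p - q)      # mean fitness
        p = p + s * p * q / w_bar      # Delta p = s p q / w_bar
        traj.append(p)
    return traj

traj = additive_trajectory(p0=0.001, s=0.05, generations=300)
steps = [b - a for a, b in zip(traj, traj[1:])]
peak_generation = max(range(len(steps)), key=steps.__getitem__)
print(f"final frequency {traj[-1]:.6f}")
print(f"largest step taken from p = {traj[peak_generation]:.3f}")
```

Every step is positive, the frequency stays strictly below 1, and the largest step occurs close to $p = 0.5$, matching the monotone, asymptotic approach in the proof.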

Exercises [Intermediate+]

Exercise 4 (medium, short answer).

Explain why selection against a recessive deleterious allele becomes very slow when the allele is rare.

Hint

When $q$ is small, most copies of the a allele are in heterozygotes (Aa). What is the fitness of Aa?

Answer

When $q$ is small, most copies of the deleterious allele are in heterozygotes (Aa) rather than homozygotes (aa). The ratio of heterozygotes to aa homozygotes is $2pq : q^{2}$, that is $2p/q$, which becomes very large when $q$ is small. Since Aa has the same fitness as AA (the allele is recessive), selection "cannot see" the deleterious allele in heterozygotes. Only the $q^{2}$ fraction of homozygotes is exposed to selection. The result: $\Delta q = -s\,p\,q^{2}/(1 - s q^{2}) \approx -s q^{2}$ when $q$ is small, so the rate of decrease is proportional to $q^{2}$. Selection is ineffective at removing the last copies of a recessive deleterious allele.

Exercise 5 (hard, symbolic).

Consider an overdominant locus (heterozygote advantage) where $w_{AA} = 1 - s_{1}$, $w_{Aa} = 1$, $w_{aa} = 1 - s_{2}$ with $s_{1}, s_{2} > 0$. Derive the equilibrium allele frequency $p^{*}$ and show that it is stable.

Hint

At equilibrium, $\Delta p = 0$. Solve for $p$. Then check whether selection pushes $p$ back toward $p^{*}$ from nearby values.

Answer

Setting $\Delta p = 0$:

$$\Delta p = \frac{pq\,\big[-p\,s_{1} + q\,s_{2}\big]}{\bar w} = 0.$$

Ignoring the boundary solutions $p = 0$ and $q = 0$: $-p\,s_{1} + q\,s_{2} = 0$, so $s_{2}(1 - p) = s_{1}\,p$, giving:

$$p^{*} = \frac{s_{2}}{s_{1} + s_{2}}.$$

For stability: when $p < p^{*}$, the bracket $q\,s_{2} - p\,s_{1} = s_{2} - p\,(s_{1} + s_{2})$ is positive, so $\Delta p > 0$ (the allele increases); when $p > p^{*}$, the bracket is negative and $\Delta p < 0$ (the allele decreases). The equilibrium is therefore stable: a protected polymorphism. Both alleles are maintained indefinitely because each homozygote has lower fitness than the heterozygote. The classic example is sickle-cell anaemia in regions with endemic malaria. Here $s_{1}$ is the fitness cost to AA (normal homozygotes, susceptible to malaria) relative to Aa (heterozygotes, which are partly protected against malaria), and $s_{2}$ is the fitness cost to aa (sickle-cell disease). The equilibrium frequency of the sickle-cell allele, $q^{*} = 1 - p^{*} = s_{1}/(s_{1} + s_{2})$, depends on the balance of these two selection pressures.
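A short Python check of this answer iterates the overdominant recursion from both sides of $p^{*}$. The selection coefficients $s_{1} = 0.2$ and $s_{2} = 0.3$ are hypothetical values (so $p^{*} = 0.6$), and `overdominant_step` is an illustrative name.

```python
def overdominant_step(p, s1, s2):
    """One generation with w_AA = 1 - s1, w_Aa = 1, w_aa = 1 - s2."""
    q = 1.0 - p
    w_bar = p * p * (1.0 - s1) + 2.0 * p * q + q * q * (1.0 - s2)
    return p + p * q * (-p * s1 + q * s2) / w_bar

def run(p0, s1, s2, generations=500):
    """Iterate the overdominant map from p0."""
    p = p0
    for _ in range(generations):
        p = overdominant_step(p, s1, s2)
    return p

s1, s2 = 0.2, 0.3
p_star = s2 / (s1 + s2)
from_below = run(0.05, s1, s2)
from_above = run(0.95, s1, s2)
print(p_star, from_below, from_above)
```

Starting well below or well above $p^{*}$, the frequency settles on the same interior value, as the sign argument predicts.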

Exercise 7 (hard, short answer).

Fisher's fundamental theorem states that the rate of increase in mean fitness equals the additive genetic variance in fitness. Explain why this theorem implies that natural selection cannot decrease mean fitness.

Hint

Variance is always non-negative. What does this imply about the sign of $d\bar w/dt$?

Answer

The additive genetic variance $V_{A}(w)$ is a variance, so it is always non-negative. Therefore $d\bar w/dt = V_{A}(w) \geq 0$. Natural selection (acting alone, in an infinite population, with constant fitnesses) can only increase or maintain mean fitness, never decrease it. This is why natural selection is described as an "optimising" process: it drives the population toward higher mean fitness on the adaptive landscape. (Caveats: in finite populations, drift can oppose selection; in changing environments, fitness values shift and yesterday's optimum may be today's liability.)

Exercise 8 (medium, short answer).

Give an example of disruptive selection in nature and explain how it could lead to speciation.

Hint

Think about a situation where the environment provides two distinct resources, and intermediates are poorly adapted to both.

Answer

Darwin's finches on the Galapagos Islands provide an example. On islands where two distinct seed sizes are available (large hard seeds and small soft seeds), birds with either very large beaks (for cracking hard seeds) or very small beaks (for handling small seeds efficiently) have higher fitness than birds with intermediate beaks. This disruptive selection can favour assortative mating (large-beaked birds mating with other large-beaked birds) and eventually reproductive isolation between the two beak-size groups. If gene flow between the groups is restricted, they diverge genetically and may become separate species, a process called sympatric speciation. The African finch Pyrenestes ostrinus shows the first step of this pattern: two distinct bill morphs maintained by disruptive selection on seed hardness, although the morphs still interbreed and remain a single species.

Deterministic Wright-Fisher dynamics and Fisher's fundamental theorem [Master]

The Intermediate tier introduced the per-generation allele-frequency change $\Delta p$ as a discrete recursion. At the Master level the same dynamics admit two complementary descriptions: the discrete Wright-Fisher update, which mirrors a life cycle with non-overlapping generations, and the continuous-time selection equation, which strips the dynamics to their analytic skeleton. Both are deterministic and describe the infinite-population limit; the stochastic correction for finite populations is taken up in the next sub-section.

In the discrete Wright-Fisher selection model, each generation begins with adult genotype frequencies, multiplies them by relative fitnesses to obtain weighted contributions to the gamete pool, normalises by mean fitness $\bar w$, and finally draws the next generation's genotypes under random mating from those gamete frequencies. For a single locus with alleles A and a at frequencies $p$ and $q = 1 - p$, the deterministic limit (infinite population, no sampling noise) is the one-step map already derived in the Intermediate tier:

$$p_{t+1} = \frac{p_{t}^{2} w_{AA} + p_{t} q_{t} w_{Aa}}{\bar w_{t}}, \qquad \bar w_{t} = p_{t}^{2} w_{AA} + 2p_{t} q_{t} w_{Aa} + q_{t}^{2} w_{aa}.$$

This map is the canonical population-genetic dynamical system. Unless all three fitnesses are equal (in which case every frequency is a fixed point), it has at most three fixed points on the unit interval: the two boundary points $p = 0$ and $p = 1$, and at most one interior point where $p\,(w_{AA} - w_{Aa}) + q\,(w_{Aa} - w_{aa}) = 0$. Directional selection (additive or directional dominance) makes one boundary point a source and the other a sink. Heterozygote advantage ($w_{Aa} > w_{AA}, w_{aa}$) makes the interior point a global attractor and both boundaries repellors: a protected polymorphism. Heterozygote disadvantage ($w_{Aa} < w_{AA}, w_{aa}$) flips this: the interior point becomes unstable, and selection drives the population to whichever boundary lies on the same side of the interior equilibrium as its starting frequency. The three modes of selection on a quantitative trait (directional, stabilizing, disruptive) are loosely analogous to these three regimes, but the correspondence is only heuristic once many loci of small effect are coarse-grained (sub-section 3).
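The boundary behaviour under heterozygote disadvantage can be sketched in Python. Solving $p\,(w_{AA} - w_{Aa}) + q\,(w_{Aa} - w_{aa}) = 0$ gives the interior point $p^{*} = (w_{aa} - w_{Aa})/(w_{AA} - 2w_{Aa} + w_{aa})$. With the hypothetical fitnesses $w_{AA} = 1$, $w_{Aa} = 0.9$, $w_{aa} = 0.95$, this is $p^{*} = 1/3$, so a population starting at $p = 0.4$ (nearer to 0 than to 1) still goes to $p = 1$, because it starts above $p^{*}$. Function names are illustrative.

```python
def wf_step(p, w_AA, w_Aa, w_aa):
    """Deterministic Wright-Fisher selection map p_t -> p_{t+1}."""
    q = 1.0 - p
    w_bar = p * p * w_AA + 2.0 * p * q * w_Aa + q * q * w_aa
    return (p * p * w_AA + p * q * w_Aa) / w_bar

def interior_fixed_point(w_AA, w_Aa, w_aa):
    """Root of p(w_AA - w_Aa) + q(w_Aa - w_aa) = 0, or None if it is not in (0, 1)."""
    denom = w_AA - 2.0 * w_Aa + w_aa
    if denom == 0.0:
        return None  # linear case: no isolated interior root
    p_star = (w_aa - w_Aa) / denom
    return p_star if 0.0 < p_star < 1.0 else None

def iterate(p, fitnesses, generations=2000):
    """Iterate wf_step with fitnesses = (w_AA, w_Aa, w_aa)."""
    for _ in range(generations):
        p = wf_step(p, *fitnesses)
    return p

underdominant = (1.0, 0.9, 0.95)  # w_AA, w_Aa, w_aa: heterozygote disadvantage
p_star = interior_fixed_point(*underdominant)
print(p_star, iterate(0.4, underdominant), iterate(0.3, underdominant))
```

For a directional case such as $(1, 0.95, 0.9)$ the root falls outside $(0, 1)$ and the function returns `None`, consistent with the two-boundary picture.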

Passing to the continuous-time limit is the analyst's move. When selection coefficients are small, $w_{ij} = 1 + s_{ij}$ with $|s_{ij}| \ll 1$, the per-generation discrete change $\Delta p$ over a time-step of one generation is well-approximated by a smooth derivative. For a haploid population with allele A of fitness $1 + s$ competing with allele a of fitness $1$, the change per generation $\Delta p = s\,p(1 - p) + O(s^{2})$ becomes the logistic-style ordinary differential equation

$$\frac{dp}{dt} = s\,p\,(1 - p),$$

with explicit solution $p(t) = p_{0}/\big(p_{0} + (1 - p_{0})\,e^{-st}\big)$. This is the cleanest possible statement of directional selection: a sigmoid sweep from initial frequency $p_{0}$ to fixation at a rate set by the selection coefficient. The time for the allele to reach frequency $1/2$ is $t_{1/2} = \ln\big((1 - p_{0})/p_{0}\big)/s \approx \ln(1/p_{0})/s$ when $p_{0} \ll 1$: logarithmic in the rarity of the advantageous allele, inverse in the selection coefficient. By the symmetry of the sigmoid, the sweep from $1/2$ to within $p_{0}$ of fixation takes as long again. A 1% selective advantage starting from one copy in a population of $10^{6}$ therefore requires roughly $14/0.01 \approx 1400$ generations to reach frequency $1/2$, and about 2800 generations to come within $10^{-6}$ of fixation deterministically; a 10% advantage requires only about 140 and 280 generations respectively. For diploids with additive fitness $w_{aa} = 1$, $w_{Aa} = 1 + s/2$, $w_{AA} = 1 + s$, the equation is identical but with $s$ replaced by $s/2$, since each copy of A adds only $s/2$ to fitness. For diploids with arbitrary dominance, $dp/dt = pq\,\bar\alpha/\bar w$, where $\bar\alpha = p\,(w_{AA} - w_{Aa}) + q\,(w_{Aa} - w_{aa})$ is the average effect of an allelic substitution at the locus, a quantity that will reappear in Fisher's theorem in a moment.
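The closed-form sweep and its timing are easy to check numerically. This Python sketch (function names are illustrative) evaluates $p(t)$, confirms with a central finite difference that it satisfies $dp/dt = s\,p(1 - p)$, and computes $t_{1/2} = \ln\big((1 - p_{0})/p_{0}\big)/s$ for the example of $s = 0.01$ and $p_{0} = 10^{-6}$.

```python
import math

def logistic_sweep(t, p0, s):
    """Closed-form solution of dp/dt = s p (1 - p)."""
    return p0 / (p0 + (1.0 - p0) * math.exp(-s * t))

def half_time(p0, s):
    """Time for the allele to go from p0 to frequency 1/2."""
    return math.log((1.0 - p0) / p0) / s

p0, s = 1e-6, 0.01
t_half = half_time(p0, s)

# Central-difference check of the ODE at an arbitrary time point.
t, h = 900.0, 1e-3
lhs = (logistic_sweep(t + h, p0, s) - logistic_sweep(t - h, p0, s)) / (2 * h)
p_t = logistic_sweep(t, p0, s)
rhs = s * p_t * (1.0 - p_t)

print(f"t_1/2 = {t_half:.1f} generations; p(t_1/2) = {logistic_sweep(t_half, p0, s):.6f}")
print(f"time to reach 1 - p0: {2 * t_half:.0f} generations")
```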

Fisher's fundamental theorem of natural selection is the global organising statement of this whole framework. Fisher (1930) proved that, under selection alone, the instantaneous rate of increase in mean fitness equals the additive genetic variance in fitness:

$$\frac{d\bar w}{dt} = V_{A}(w).$$

Three features of this theorem deserve careful attention, because the theorem is routinely misstated in the modern literature.

First, the variance on the right-hand side is additive genetic variance, not total genetic variance. It is the variance explained by a least-squares regression of individual fitness on counts of each allele — the portion of fitness variation that is heritable in a strict additive sense. Dominance variance, epistatic variance, and environmental variance are excluded. This is why the theorem says nothing about dominance interactions: those contribute to total fitness variation but cannot be exploited by selection acting on allele frequencies in the next generation, because heterozygote effects cannot be passed on intact.

Second, $V_{A}(w) \geq 0$ because it is a variance, so $d\bar w/dt \geq 0$ under selection alone in a constant environment. Selection is monotone: it cannot decrease mean fitness. This is the formal expression of the climbing-the-adaptive-landscape metaphor and the justification for treating mean fitness as a Lyapunov function for the deterministic dynamics. (The Lyapunov interpretation is exact for a single locus with constant viability fitnesses, whatever the dominance; with epistasis and recombination among several loci, mean fitness can decrease from one generation to the next, so the landscape picture is then only approximate.)

Third, and most often misquoted, the theorem holds for the partial change in mean fitness due to changes in allele frequencies alone, with all other contributions (changes in environment, dominance effects, frequency-dependence, demographic stochasticity) held constant. Total mean fitness in a real population can decrease over time if the environment deteriorates faster than selection can compensate, or if mean fitness was inflated by transient heterozygosity that selection then erodes. Fisher's theorem is a local statement about the gradient component of evolutionary change, not a global guarantee about the trajectory of $\bar w$.

The proof in the single-locus case is short. Write $\bar w = p^{2} w_{AA} + 2pq\, w_{Aa} + q^{2} w_{aa}$. Differentiating with respect to $p$ (with $q = 1 - p$ and the fitnesses held constant) gives $d\bar w/dp = 2\big[p\,(w_{AA} - w_{Aa}) + q\,(w_{Aa} - w_{aa})\big] = 2\bar\alpha$, where $\bar\alpha$ is the average effect. Combining this with the deterministic update $\dot p = pq\,\bar\alpha/\bar w$ by the chain rule,

$$\frac{d\bar w}{dt} = \frac{d\bar w}{dp}\,\dot p = 2\bar\alpha \cdot \frac{pq\,\bar\alpha}{\bar w} = \frac{2pq\,\bar\alpha^{2}}{\bar w}.$$

The numerator $2pq\,\bar\alpha^{2}$ is precisely the additive genetic variance in fitness at this locus, $V_{A}(w)$: the variance of the fitness values predicted by the least-squares regression of fitness on allele count, whose slope is $\bar\alpha$. The result therefore reads $d\bar w/dt = V_{A}(w)/\bar w$, or equivalently $d\ln\bar w/dt = V_{A}(w/\bar w)$, the additive genetic variance in relative fitness; it reduces to the displayed form $d\bar w/dt = V_{A}(w)$ when fitnesses are scaled so that $\bar w \approx 1$, as under weak selection. The multi-locus statement is obtained by summing this contribution across loci; linkage disequilibrium adds further terms but does not change the structure of the result. Hardy-Weinberg equilibrium (19.02.01) is the implicit substrate: it sets the genotype frequencies at the start of each generation against which the average effect $\bar\alpha$ is measured. Without random mating, the partition of variance into additive and non-additive components shifts, and Fisher's theorem must be restated in terms of the inheritance-only component of variance.
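The regression reading of $V_{A}(w)$ can be verified directly. This Python sketch uses hypothetical genotype fitnesses with partial dominance and illustrative function names. It regresses genotype fitness on the count of A alleles under Hardy-Weinberg weights, checks that the slope equals $\bar\alpha$ and that the variance of the fitted values equals $2pq\,\bar\alpha^{2}$, and compares a finite-difference estimate of $d\bar w/dp$ with $2\bar\alpha$.

```python
def hw_regression(p, w_AA, w_Aa, w_aa):
    """Least-squares regression of genotype fitness on A-allele count under HWE."""
    q = 1.0 - p
    probs = [p * p, 2 * p * q, q * q]   # genotype frequencies AA, Aa, aa
    counts = [2, 1, 0]                  # number of A alleles
    fits = [w_AA, w_Aa, w_aa]
    mean_x = sum(f * x for f, x in zip(probs, counts))  # equals 2p
    mean_w = sum(f * w for f, w in zip(probs, fits))    # equals w_bar
    cov = sum(f * (x - mean_x) * (w - mean_w) for f, x, w in zip(probs, counts, fits))
    var_x = sum(f * (x - mean_x) ** 2 for f, x in zip(probs, counts))  # equals 2pq
    slope = cov / var_x
    return slope, slope ** 2 * var_x, mean_w  # slope, V_A(w), w_bar

def mean_fitness(p, w_AA, w_Aa, w_aa):
    q = 1.0 - p
    return p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa

p, fit = 0.3, (1.0, 0.97, 0.9)  # hypothetical fitnesses w_AA, w_Aa, w_aa
q = 1.0 - p
alpha_bar = p * (fit[0] - fit[1]) + q * (fit[1] - fit[2])
slope, v_a, w_bar = hw_regression(p, *fit)

h = 1e-6
dwbar_dp = (mean_fitness(p + h, *fit) - mean_fitness(p - h, *fit)) / (2 * h)
dwbar_dt = dwbar_dp * p * q * alpha_bar / w_bar  # chain rule with dp/dt = pq*alpha_bar/w_bar
print(slope, alpha_bar, v_a, 2 * p * q * alpha_bar ** 2, dwbar_dt, v_a / w_bar)
```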

What Fisher's theorem does not say is also important. It does not predict the rate at which a population approaches an evolutionary optimum, because as selection fixes favourable alleles it depletes the additive variance that fuels further change — the rate slows, often dramatically, as $V_{A} \to 0$ . It does not guarantee that a population reaches a global fitness peak: with multiple peaks on the landscape, deterministic selection climbs the local gradient and stops at the nearest peak. And it does not apply to frequency-dependent fitness — the case where $w_{ij}$ depends on the genotype frequencies themselves — which is the regime where evolutionary game theory takes over (sub-section 4).

Stochastic drift and the diffusion limit [Master]

The deterministic Wright-Fisher dynamics of sub-section 1 are the leading-order behaviour as population size grows without bound. In any finite population, the gamete pool is a stochastic sample from the parental distribution, and the next-generation allele frequency is a binomial random variable rather than a deterministic value. This sampling noise is genetic drift, and its interplay with selection is the subject of stochastic population genetics. The cleanest mathematical treatment comes from the diffusion approximation worked out by Wright in the 1930s and developed into a full theory by Kimura in the 1950s and 60s.

The setup is as follows. In a population of $N$ diploid individuals (so $2N$ gene copies sampled each generation), the conditional distribution of the next-generation allele count given current allele frequency $p$ is binomial, $\mathrm{Bin}(2N, p^{*})$, where $p^{*}$ is the deterministic post-selection frequency from the Wright-Fisher update. Conditional on $p$, the next-generation frequency has mean $p^{*}$ and variance $p^{*}(1 - p^{*})/(2N)$. The deterministic dynamics capture the mean; drift adds a stochastic increment of root-mean-square magnitude $\sqrt{p^{*}(1 - p^{*})/(2N)}$ per generation. When $N$ is moderate and $s$ is comparable to $1/N$, the deterministic and stochastic contributions are of similar size, and neither dominates.
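A minimal Monte Carlo sketch in Python, using only the standard library's `random` module, shows that one Wright-Fisher generation has mean $p^{*}$ and variance $p^{*}(1 - p^{*})/(2N)$. The binomial draw is written as $2N$ Bernoulli trials so that it does not depend on any particular binomial sampler; $N$, the fitnesses, the seed, and the replicate count are arbitrary illustrative values, and the function names are hypothetical.

```python
import random

def post_selection_freq(p, w_AA, w_Aa, w_aa):
    """Deterministic Wright-Fisher selection step p -> p*."""
    q = 1.0 - p
    w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return (p * p * w_AA + p * q * w_Aa) / w_bar

def wright_fisher_generation(p, N, fitnesses, rng):
    """Selection followed by binomial sampling of 2N gene copies."""
    p_star = post_selection_freq(p, *fitnesses)
    count = sum(1 for _ in range(2 * N) if rng.random() < p_star)
    return count / (2 * N)

rng = random.Random(12345)
N, p, fitnesses = 50, 0.4, (1.1, 1.05, 1.0)
reps = 20000
samples = [wright_fisher_generation(p, N, fitnesses, rng) for _ in range(reps)]

mean = sum(samples) / reps
var = sum((x - mean) ** 2 for x in samples) / (reps - 1)
p_star = post_selection_freq(p, *fitnesses)
print(f"p* = {p_star:.4f}, simulated mean = {mean:.4f}")
print(f"predicted var = {p_star * (1 - p_star) / (2 * N):.5f}, simulated var = {var:.5f}")
```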

Taking $N \to \infty$ naively kills the stochasticity (the variance vanishes as $1/N$). The informative diffusion limit is obtained by holding the scaled selection coefficient $\sigma = 2Ns$ fixed as $N \to \infty$ and rescaling time so that one unit of diffusion time corresponds to $2N$ generations. In this limit, the discrete Markov chain on $\{0, 1, \dots, 2N\}/(2N)$ converges (Donsker-type) to the Wright-Fisher diffusion on $[0, 1]$, governed by the stochastic differential equation

$$dp = \sigma\,p\,(1 - p)\,dt + \sqrt{p\,(1 - p)}\;dB_{t},$$

where $B_{t}$ is standard Brownian motion and time $t$ is now measured in units of $2N$ generations. Over one unit of rescaled time, the selective change $s\,p(1 - p)$ per generation accumulates to $\sigma\,p(1 - p)$, and the sampling variance $p(1 - p)/(2N)$ per generation accumulates to $p(1 - p)$. The forward Kolmogorov equation for the transition density $\phi(p, t \mid p_{0})$ is Kimura's celebrated diffusion equation:

$$\frac{\partial \phi}{\partial t} = -\frac{\partial}{\partial p}\big[\sigma\,p\,(1 - p)\,\phi\big] + \frac{1}{2}\,\frac{\partial^{2}}{\partial p^{2}}\big[p\,(1 - p)\,\phi\big].$$

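The first term is the deterministic drift (selection); the second is the stochastic diffusion (sampling). When $s = 0$, the equation reduces to pure drift: the mean of $p$ stays at $p_{0}$, while its variance across replicate populations is $p_{0}(1 - p_{0})(1 - e^{-t})$, with $t$ in units of $2N$ generations. The variance therefore grows at initial rate $p_{0}(1 - p_{0})$ and saturates at $p_{0}(1 - p_{0})$ as the boundaries absorb the trajectories.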
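In discrete generations, neutral binomial sampling gives the exact replicate variance $p_{0}(1 - p_{0})\big[1 - (1 - 1/(2N))^{\tau}\big]$ after $\tau$ generations, which matches $p_{0}(1 - p_{0})(1 - e^{-t})$ with $t = \tau/(2N)$ for large $N$. The Python sketch below (illustrative parameter values and function name, standard-library `random` only) simulates replicate populations and compares the sample variance with both expressions.

```python
import math
import random

def drift_replicates(p0, N, generations, reps, rng):
    """Neutral Wright-Fisher: binomial resampling of 2N gene copies each generation."""
    finals = []
    for _ in range(reps):
        p = p0
        for _ in range(generations):
            if p in (0.0, 1.0):
                break  # absorbed at a boundary
            p = sum(1 for _ in range(2 * N) if rng.random() < p) / (2 * N)
        finals.append(p)
    return finals

p0, N, tau, reps = 0.5, 10, 10, 5000
finals = drift_replicates(p0, N, tau, reps, random.Random(2024))
mean = sum(finals) / reps
sim_var = sum((x - mean) ** 2 for x in finals) / (reps - 1)

exact_var = p0 * (1 - p0) * (1 - (1 - 1 / (2 * N)) ** tau)
diffusion_var = p0 * (1 - p0) * (1 - math.exp(-tau / (2 * N)))
print(f"simulated {sim_var:.4f}, discrete exact {exact_var:.4f}, diffusion {diffusion_var:.4f}")
```

Even with only $2N = 20$ gene copies the diffusion expression is close to the exact discrete one, which is why the diffusion limit is useful well outside its formal $N \to \infty$ regime.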

Three quantitative consequences of this framework deserve separate treatment, because each is a textbook fixture of modern population genetics.

The first is the fixation probability of a new allele with selection coefficient $s$ entering at frequency $p_{0} = 1/ (2 N)$ . Solving the backward Kolmogorov equation with appropriate boundary conditions gives, in the diffusion limit,

$$u(p_{0}) = \frac{1 - e^{-4Nsp_{0}}}{1 - e^{-4Ns}}.$$

For a single new copy at $p_{0} = 1/(2N)$ this becomes exactly $u = (1 - e^{-2s})/(1 - e^{-4Ns})$, which for $|s| \ll 1$ and $Ns \gg 1$ simplifies to Haldane's approximation $u \approx 2s$ for a beneficial allele. The interpretation is striking: even an allele with a substantial advantage has only a small probability of fixing when introduced as a single copy, because early in its trajectory the deterministic push is overwhelmed by drift. A 1% advantage has roughly a 2% fixation chance; a 10% advantage roughly 20% (the full formula gives about 18%, since $2s$ overstates $u$ once $s$ is no longer small). Most beneficial mutations are lost to drift before selection can amplify them. In the limit $s \to 0$ the formula gives $u = p_{0}$, so a new neutral mutation fixes with probability $1/(2N)$: the standard result that a neutral allele's fixation probability equals its initial frequency.
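Kimura's formula is straightforward to evaluate, but a naive implementation overflows for deleterious alleles in large populations, where $e^{-4Ns}$ is huge. The Python sketch below (illustrative function name and parameter values) multiplies numerator and denominator by $e^{4Ns}$ when $s < 0$, uses `math.expm1` for accuracy when $4Ns$ is small, and compares the single-copy result with Haldane's $2s$, the neutral value $1/(2N)$, and the deleterious approximation $2|s|\,e^{-4N|s|}$.

```python
import math

def fixation_probability(p0, N, s):
    """Kimura's u(p0) = (1 - exp(-4 N s p0)) / (1 - exp(-4 N s))."""
    a = 4.0 * N * s
    if a == 0.0:
        return p0  # neutral limit
    if a > 0.0:
        return math.expm1(-a * p0) / math.expm1(-a)
    # s < 0: multiply numerator and denominator by exp(a) to avoid overflow
    return (math.exp(a * (1.0 - p0)) - math.exp(a)) / -math.expm1(a)

N = 1000
p_single = 1.0 / (2 * N)
u_benefit = fixation_probability(p_single, N, 0.01)
u_big = fixation_probability(p_single, N, 0.1)
u_neutral = fixation_probability(p_single, N, 1e-10)
u_harm = fixation_probability(p_single, N, -0.01)

print(f"s = +0.01: u = {u_benefit:.5f} (Haldane 2s = 0.02)")
print(f"s = +0.10: u = {u_big:.4f} (Haldane 2s = 0.2)")
print(f"s ~ 0:     u = {u_neutral:.6f} (1/2N = {p_single:.6f})")
print(f"s = -0.01: u = {u_harm:.3e} (2|s|e^(-4N|s|) = {0.02 * math.exp(-40):.3e})")
print(fixation_probability(1.0 / (2 * 100000), 100000, -0.01))  # large N, deleterious: no overflow
```

The same function covers the asymmetry discussed below: beneficial alleles fix with probability of order $2s$, while deleterious ones are suppressed by the factor $e^{-4N|s|}$.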

The second is the selection-drift balance regime. The relative importance of selection versus drift is controlled by the product $Ns$: strictly $4Ns$ for diploids, from the exponential factor above, but conventionally $|Ns|$ is the order-of-magnitude diagnostic. When $|Ns| \gg 1$ selection dominates and fates are essentially deterministic. When $|Ns| \ll 1$ drift dominates: the allele behaves as effectively neutral and its fixation probability is close to $p_{0}$. The crossover region $|Ns| \sim 1$ is where the two forces are commensurate, and is where many molecular polymorphisms are thought to lie; questions about this regime motivated Kimura's neutral theory of molecular evolution and its nearly neutral extensions. The whole point of the diffusion approximation is to give a quantitative theory of this crossover regime that neither pure selection theory nor pure drift theory can describe.

The third is the role of effective population size $N_{e}$ rather than census size $N$. Real populations violate the Wright-Fisher assumptions in many ways: sex ratios are unequal, family sizes are variable, populations fluctuate seasonally, generations overlap, and populations are spatially structured. Most of these violations amplify drift relative to what the census size alone would predict. The effective population size is defined as the size of an idealised Wright-Fisher population that would experience drift at the observed rate. For most species $N_{e}$ is smaller than $N$, often by an order of magnitude or more. Humans have a census population of $\sim 8 \times 10^{9}$ but a long-term effective size of only $\sim 10^{4}$, reflecting population bottlenecks during the species' deep history. All the formulas above (fixation probability, diffusion time scale, selection-drift threshold) are computed using $N_{e}$, not $N$. The practical upshot is that drift is a much stronger force in real populations than the raw census numbers suggest. Genetic drift (19.04.01) develops these threads further, including its consequences for variation maintenance and the neutral theory.

The deterministic limit of sub-section 1 is the $N_{e} \to \infty$ limit of this stochastic theory. In the stochastic version, Fisher's fundamental theorem holds only in expectation and only approximately: selection still pushes mean fitness up on average, but drift adds random fluctuations that can fix deleterious alleles or lose beneficial ones. The asymmetry between the fixation probabilities of beneficial ($\sim 2s$) and deleterious ($\sim 2|s|\,e^{-4N|s|}$, exponentially suppressed for $N|s| \gg 1$) alleles is what allows selection to remain effective even in finite populations: drift adds noise but does not erase the directional bias.

Quantitative-trait extension — breeder's equation and heritability [Master]

The single-locus framework treats alleles as the unit of selection. The quantitative-trait framework treats continuously-varying phenotypes — height, beak depth, milk yield, brain volume — as the unit, and folds the underlying multi-locus genetic architecture into statistical aggregates. The two frameworks are not in conflict; they are different coordinate systems on the same dynamics, and the choice is dictated by what is measurable.

For a quantitative trait $z$ with mean $\bar z$ and phenotypic variance $\sigma_{P}^{2}$, the breeder's equation predicts the response to selection:

$$R = h^{2} S,$$

where $S$ is the selection differential (the difference between the mean of selected parents and the population mean) and $h^{2}$ is the narrow-sense heritability (the fraction of phenotypic variance due to additive genetic effects: $h^{2} = V_{A} / V_{P}$ ). This equation, central to both evolutionary biology and artificial breeding, partitions the response to selection into a component due to the strength of selection ( $S$ ) and a component due to the genetic architecture ( $h^{2}$ ).
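As a worked sketch of the breeder's equation with hypothetical numbers (six phenotype values, truncation selection of the top half, and an assumed $h^{2} = 0.4$), the Python below computes $S$ as the difference between the selected parents' mean and the population mean, then predicts $R = h^{2}S$. Function names are illustrative.

```python
def selection_differential(population, selected):
    """S = mean of selected parents minus population mean."""
    return sum(selected) / len(selected) - sum(population) / len(population)

def breeders_response(h2, S):
    """Breeder's equation R = h^2 * S."""
    return h2 * S

phenotypes = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]    # hypothetical trait values
parents = sorted(phenotypes)[len(phenotypes) // 2:]  # truncation: keep the top half
S = selection_differential(phenotypes, parents)
R = breeders_response(0.4, S)
offspring_mean = sum(phenotypes) / len(phenotypes) + R
print(f"S = {S}, R = {R:.2f}, predicted offspring mean = {offspring_mean:.2f}")
```

Here the parents are 3 units above the population mean, but with only 40% of the phenotypic variance additive, the offspring are predicted to recover only 1.2 of those 3 units.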

Directional selection corresponds to $S \neq 0$. Stabilizing selection, when the population mean sits at the fitness optimum, gives $S = 0$ but reduces $V_{P}$ (the variance decreases because intermediates have higher fitness). Disruptive selection, when the population mean sits at the fitness minimum, also gives $S = 0$ but increases $V_{P}$.

For a vector of correlated traits, the Lande equation gives the response to multivariate selection:

$$\Delta\bar z = G\,\beta,$$

where $G$ is the additive genetic variance-covariance matrix and $β$ is the vector of selection gradients (partial regression coefficients of fitness on traits). This framework allows the analysis of correlated responses to selection and evolutionary constraints imposed by genetic correlations between traits.
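A two-trait sketch of the Lande equation in plain Python, with a hypothetical G-matrix and selection-gradient vector: selection acts directly only on trait 1 ($\beta_{2} = 0$), yet trait 2 responds through the genetic covariance $G_{12}$. The function name is illustrative.

```python
def lande_response(G, beta):
    """Delta z_bar = G beta for a square matrix G (list of rows) and a vector beta."""
    return [sum(g_ij * b_j for g_ij, b_j in zip(row, beta)) for row in G]

G = [[1.0, 0.5],    # hypothetical additive genetic (co)variances
     [0.5, 1.0]]
beta = [0.2, 0.0]   # direct selection on trait 1 only

delta_z = lande_response(G, beta)
print(delta_z)  # trait 2 changes through its genetic covariance with trait 1

G_uncorrelated = [[1.0, 0.0], [0.0, 1.0]]
print(lande_response(G_uncorrelated, beta))  # no covariance, no correlated response
```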

The mapping from the allele-frequency view to the trait view goes through the infinitesimal model. Imagine many loci, each of small additive effect, contributing to the trait. The central limit theorem then implies that the breeding value (the additively heritable component of phenotype) is approximately normally distributed; with Hardy-Weinberg proportions and no linkage disequilibrium, its variance $V_{A}$ equals the sum of per-locus variances $2p_{i}q_{i}\alpha_{i}^{2}$ over all contributing loci $i$. Environmental noise adds an independent contribution of variance $V_{E}$; allowing also for dominance and epistatic (interaction) effects, the phenotypic variance decomposes as $V_{P} = V_{A} + V_{D} + V_{I} + V_{E}$, the additive, dominance, interaction, and environmental variances respectively. Only the additive part $V_{A}$ is faithfully passed to offspring through random mating; the rest is reshuffled each generation. This is why the narrow-sense heritability $h^{2} = V_{A}/V_{P}$ appears in the breeder's equation rather than the broad-sense $H^{2} = (V_{A} + V_{D} + V_{I})/V_{P}$.
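The per-locus sum $V_{A} = \sum_{i} 2p_{i}q_{i}\alpha_{i}^{2}$ can be checked by simulation. This Python sketch assumes Hardy-Weinberg proportions, no linkage disequilibrium, purely additive effects, and independent Gaussian environmental noise; the allele frequencies, effects, $V_{E}$, and seed are hypothetical values, and the function names are illustrative.

```python
import random

def simulate_phenotypes(freqs, effects, v_env, n, rng):
    """Breeding value = sum over loci of effect * (count of the focal allele)."""
    breeding, pheno = [], []
    for _ in range(n):
        a = 0.0
        for p, alpha in zip(freqs, effects):
            count = (rng.random() < p) + (rng.random() < p)  # two independent allele draws
            a += alpha * count
        breeding.append(a)
        pheno.append(a + rng.gauss(0.0, v_env ** 0.5))  # add environmental noise
    return breeding, pheno

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

rng = random.Random(7)
freqs = [0.1 + 0.04 * i for i in range(20)]             # hypothetical allele frequencies
effects = [0.3 if i % 2 else 0.2 for i in range(20)]    # hypothetical additive effects
v_env = 1.0

v_a_theory = sum(2 * p * (1 - p) * a * a for p, a in zip(freqs, effects))
breeding, pheno = simulate_phenotypes(freqs, effects, v_env, 20000, rng)
v_a_sim, v_p_sim = variance(breeding), variance(pheno)
h2_theory = v_a_theory / (v_a_theory + v_env)
print(f"V_A: theory {v_a_theory:.4f}, simulated {v_a_sim:.4f}")
print(f"h^2: theory {h2_theory:.3f}, simulated {v_a_sim / v_p_sim:.3f}")
```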

The breeder's equation is a one-step prediction. Iterated over many generations, it remains predictive only as long as $V_{A}$ stays roughly constant, which holds in the infinitesimal-model limit because each contributing locus moves only slightly. Over long timescales $V_{A}$ is depleted as selection fixes favourable alleles at the contributing loci (Fisher's fundamental theorem in action) and replenished by new mutations, which add a mutational variance $V_{M}$ per generation. The long-run equilibrium under stabilising selection on a quantitative trait is set by this mutation-selection balance: $V_{A}^{*} \approx V_{M}/s$ in the simplest models, where $s$ is the per-trait selection strength. This is why many quantitative traits in nature show substantial standing heritabilities (typically 30-60%) despite continuous selection: input variation from mutation roughly balances the depletion by selection.

The three modes of selection on a quantitative trait have crisp predictions in this framework. Directional selection is implemented as a fitness function $w(z)$ that is monotone in $z$, producing a non-zero selection differential $S$ in the direction of fitness increase; the breeder's equation gives a response $R = h^{2}S$, shifting the population mean. Stabilising selection is a fitness function with a single interior optimum, say Gaussian, $w(z) = \exp\big(-(z - z_{\text{opt}})^{2}/2\omega^{2}\big)$. The selection differential is zero when the population is centred on the optimum, but for a normally distributed phenotype the within-generation variance after selection is reduced to $V_{P}' = V_{P}\,\omega^{2}/(\omega^{2} + V_{P})$: the phenotypic distribution narrows. Over many generations, strong stabilising selection (small $\omega^{2}$) depletes $V_{A}$ to a low equilibrium level, while under weak selection mutation-selection balance maintains a substantial residual variance. Disruptive selection is the inverse: a fitness function with a minimum at the population mean and higher fitness at both extremes (a U-shaped or bimodal $w(z)$). The within-generation variance after selection increases: $V_{P}' > V_{P}$. If the disruptive selection is strong and persistent, and if assortative mating develops between the two extreme phenotypes, the population can split into two distinct modes. This is one of the mechanisms of sympatric speciation discussed in 19.06.01.
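The variance-reduction formula for Gaussian stabilising selection follows from multiplying a normal phenotype density by the Gaussian fitness function. The Python sketch below checks it by trapezoid-rule integration on a grid, with hypothetical values $V_{P} = 1$, $\omega^{2} = 4$, and the population centred on the optimum. It also applies a U-shaped fitness function $w(z) = 1 + 0.5\,z^{2}/V_{P}$, chosen here purely as a simple stand-in for disruptive selection, to show the variance increasing instead. Function names and both fitness functions are illustrative choices.

```python
import math

def post_selection_variance(fitness, v_p, mean=0.0, lo=-12.0, hi=12.0, steps=24000):
    """Variance after weighting a N(mean, v_p) phenotype density by fitness(z)."""
    dz = (hi - lo) / steps
    total = first = second = 0.0
    for k in range(steps + 1):
        z = lo + k * dz
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid rule end-point weights
        dens = math.exp(-(z - mean) ** 2 / (2 * v_p)) * fitness(z) * weight
        total += dens
        first += dens * z
        second += dens * z * z
    m = first / total
    return second / total - m * m  # normalising constants and dz cancel

v_p, omega2, z_opt = 1.0, 4.0, 0.0

def stabilising(z):
    return math.exp(-(z - z_opt) ** 2 / (2 * omega2))

def disruptive(z):
    return 1.0 + 0.5 * (z - z_opt) ** 2 / v_p  # hypothetical U-shaped w(z)

v_stab = post_selection_variance(stabilising, v_p)
v_disr = post_selection_variance(disruptive, v_p)
print(f"stabilising: numeric {v_stab:.6f}, formula {v_p * omega2 / (omega2 + v_p):.6f}")
print(f"disruptive:  numeric {v_disr:.6f} (> V_P = {v_p})")
```

For this quadratic stand-in the post-selection variance is $(1 + 0.5 \cdot 3)/(1 + 0.5) = 5/3$ when $V_{P} = 1$, since $E[z^{4}] = 3$ for a standard normal.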

The G-matrix $G$ encapsulates the genetic architecture across multiple traits. Its diagonal entries are the per-trait additive genetic variances $V_{A}^{(i)}$; its off-diagonal entries are additive genetic covariances $\mathrm{Cov}_{A}(z_{i}, z_{j})$ that arise from pleiotropy (single alleles affecting multiple traits) and from linkage disequilibrium between loci affecting different traits. The Lande equation $\Delta \bar{z} = G \beta$ says that the response to multivariate selection is the selection gradient $\beta$ transformed by the genetic covariance structure. A trait under no direct selection ($\beta_{i} = 0$) can still evolve if it is genetically correlated with a trait under selection. This correlated response underlies both side-effects of artificial breeding (selection for rapid growth in beef cattle raised birth weight and, with it, calving difficulty, through the positive genetic correlation between the two) and constraints on natural evolution (the inability to optimise two traits independently when they share a pleiotropic genetic basis).
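Because the Lande equation is a matrix-vector product, a correlated response takes only a few lines to illustrate. In this sketch the G-matrix and selection gradient are hypothetical numbers: trait 2 has $\beta_{2} = 0$ yet moves, purely through its positive genetic covariance with trait 1.

```python
def lande_response(G, beta):
    """Multivariate response to selection, delta_zbar = G beta (Lande equation)."""
    return [sum(G[i][j] * beta[j] for j in range(len(beta))) for i in range(len(G))]


# Hypothetical two-trait example: trait 1 is under direct selection, trait 2 is not.
G = [[1.0, 0.5],   # V_A(z1),        Cov_A(z1, z2)
     [0.5, 2.0]]   # Cov_A(z2, z1),  V_A(z2)
beta = [0.2, 0.0]  # selection gradient, with beta_2 = 0
print(lande_response(G, beta))  # [0.2, 0.1]: trait 2 moves only via the covariance
```

Flipping the sign of the covariance makes trait 2 move in the opposite direction, which is the formal version of a genetic constraint between the two traits.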

The G-matrix itself evolves over time. Selection depletes variance unequally across traits; mutation regenerates variance with its own covariance structure; recombination breaks down linkage disequilibrium. On short timescales the G-matrix is approximately stable and the Lande equation is predictive; on long timescales the G-matrix can rotate and stretch as the genetic architecture responds to the same selection pressures. Empirical comparisons of G-matrices across related species often show striking conservation over millions of years, supporting the assumption of approximate stability that underlies most quantitative-genetic predictions. Where G-matrices have diverged, this divergence is itself informative about the history of selection. Full development of these threads — heritability estimation, G-matrix inference, response to artificial selection — is the subject of 19.05.01.

The conceptual unification with the single-locus framework is via the average effect $\bar{\alpha}$ that appeared in Fisher's theorem. At a single locus, $\bar{\alpha}$ is the regression coefficient of fitness on allele count, and $2pq\bar{\alpha}^{2}$ is that locus's contribution to $V_{A}(w)$. Across many loci affecting a trait, the analogous decomposition replaces fitness with the trait value: $V_{A}(z) = \sum_{i} 2 p_{i} q_{i} \alpha_{i}^{2}$, where now $\alpha_{i}$ is the regression coefficient of the trait on the count of the focal allele at locus $i$. The breeder's equation is then Fisher's theorem applied to the trait-fitness regression rather than to fitness itself, and the heritability $h^{2}$ is the squared correlation between phenotype and breeding value. This is why the same quantities (additive variance, average effect, heritability) appear in both the allele-frequency and quantitative-trait frameworks: they are the same underlying objects measured in different units.
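The two claims above, $V_{A}(z) = \sum_{i} 2 p_{i} q_{i} \alpha_{i}^{2}$ and $h^{2} = \mathrm{corr}(\text{phenotype}, \text{breeding value})^{2}$, can be checked by simulation. This sketch assumes purely additive loci in Hardy-Weinberg proportions, no linkage disequilibrium, and Gaussian environmental noise independent of genotype; the allele frequencies, effects and function names are hypothetical.

```python
import random


def simulate_additive_trait(p, alpha, V_E, n_ind, seed=1):
    """Breeding values A and phenotypes P = A + E for n_ind simulated individuals.

    Assumes Hardy-Weinberg genotypes (focal-allele count at locus i ~ Binomial(2, p_i)),
    no linkage disequilibrium, and Gaussian environmental noise E ~ N(0, V_E).
    Breeding value: A = sum_i alpha_i * (count_i - 2 p_i).
    """
    rng = random.Random(seed)
    A, P = [], []
    for _ in range(n_ind):
        a = sum(al * ((rng.random() < pi) + (rng.random() < pi) - 2 * pi)
                for pi, al in zip(p, alpha))
        A.append(a)
        P.append(a + rng.gauss(0.0, V_E ** 0.5))
    return A, P


def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def correlation(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (variance(xs) * variance(ys)) ** 0.5


p = [0.5, 0.2, 0.7]         # hypothetical allele frequencies
alpha = [1.0, 0.5, 2.0]     # hypothetical average effects
V_A_theory = sum(2 * pi * (1 - pi) * al ** 2 for pi, al in zip(p, alpha))  # = 2.26
V_E = V_A_theory            # chosen so that h^2 = 0.5
A, P = simulate_additive_trait(p, alpha, V_E, n_ind=20000)
print(V_A_theory, variance(A))   # theory vs simulation
print(correlation(A, P) ** 2)    # close to h^2 = V_A / (V_A + V_E) = 0.5
```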

Connection to evolutionary game theory and replicator dynamics [Master]

The frameworks of sub-sections 1–3 assume that the fitness of each genotype is a fixed parameter of the population: at most a function of the environment, but not of the population's own composition. Many of the most interesting selective regimes in nature violate this assumption, because the fitness of a strategy depends on what other individuals in the population are doing. A peacock's elaborate tail is favoured because most peahens prefer it, and a peahen's preference pays off because her sons inherit the tail that most other peahens prefer; a hawk in a population of doves prospers because doves yield, but if everyone becomes a hawk the gains evaporate in mutual escalation. Frequency-dependent selection, in which fitness depends on the genotype frequencies themselves, requires a separate mathematical framework, and that framework is evolutionary game theory.

The connection between population genetics and game theory was made explicit by Maynard Smith and Price in 1973 with the introduction of the evolutionarily stable strategy (ESS). An ESS is a strategy that, once adopted by an entire population, cannot be invaded by any rare mutant strategy. Write $w(s, s')$ for the payoff to an individual playing $s$ against an opponent playing $s'$. Formally, strategy $s^{*}$ is an ESS if, for every alternative strategy $s \neq s^{*}$, either (i) $w(s^{*}, s^{*}) > w(s, s^{*})$, so the resident does strictly better against itself than the mutant does against the resident; or (ii) $w(s^{*}, s^{*}) = w(s, s^{*})$ and $w(s^{*}, s) > w(s, s)$, so the resident and mutant do equally well against the resident, but the resident does better against the mutant than the mutant does against itself. The first condition is the primary one; the second is a tie-breaking requirement that matters only when the mutant is neutral against the resident, and it ensures that such a mutant is selected against as soon as it begins to meet other mutants, rather than drifting freely.

The dynamical complement of the ESS concept is the replicator equation. For a population of $n$ strategies with frequencies $x_{i}$ and per-encounter payoffs $a_{ij}$ (payoff to strategy $i$ when meeting strategy $j$ ), the rate of change of strategy frequencies under frequency-dependent selection is

$$\dot{x}_{i} = x_{i}\left(f_{i}(x) - \bar{f}(x)\right), \qquad f_{i}(x) = \sum_{j} a_{ij} x_{j}, \qquad \bar{f}(x) = \sum_{i} x_{i} f_{i}(x).$$

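The structural resemblance to deterministic selection is close: a strategy grows at a rate equal to the difference between its own expected payoff and the population-mean payoff, just as an allele's frequency grows at a rate set by its fitness relative to the mean fitness. The difference is that $f_{i}$ now depends on $x$, making the dynamical system genuinely nonlinear in the frequencies. In the special case of constant payoffs $a_{ij} = w_{i}$ (independent of opponent), $f_{i} = w_{i}$ and the replicator equation reduces to the continuous-time form of deterministic haploid selection, $\dot{x}_{i} = x_{i}(w_{i} - \bar{w})$. More generally, the $n$-strategy replicator equation is equivalent to an $(n-1)$-species Lotka-Volterra system under the change of variables $y_{i} = x_{i}/x_{n}$ together with a rescaling of time.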
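A minimal forward-Euler integrator for the replicator equation, offered as a sketch rather than a production ODE solver (the step size and function names are illustrative). With constant payoffs $a_{ij} = w_{i}$ the exact solution is $x_{i}(t) \propto x_{i}(0) e^{w_{i} t}$, which gives a direct check on the reduction to deterministic selection described above.

```python
import math


def replicator_step(x, A, dt):
    """One forward-Euler step of dx_i/dt = x_i (f_i - fbar), with f_i = sum_j a_ij x_j."""
    n = len(x)
    f = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    fbar = sum(x[i] * f[i] for i in range(n))
    x_new = [x[i] + dt * x[i] * (f[i] - fbar) for i in range(n)]
    total = sum(x_new)  # equals 1 in exact arithmetic; renormalise against rounding drift
    return [xi / total for xi in x_new]


def integrate_replicator(x0, A, t_end, dt=1e-3):
    x = list(x0)
    for _ in range(int(round(t_end / dt))):
        x = replicator_step(x, A, dt)
    return x


# Hypothetical constant payoffs a_ij = w_i (independent of the opponent j).
w = [1.0, 1.2, 0.9]
A = [[wi] * len(w) for wi in w]
x0 = [0.5, 0.25, 0.25]
t_end = 5.0
x_num = integrate_replicator(x0, A, t_end)
z = [xi * math.exp(wi * t_end) for xi, wi in zip(x0, w)]
x_exact = [zi / sum(z) for zi in z]
print([round(v, 3) for v in x_num])
print([round(v, 3) for v in x_exact])   # agree closely
```

Any payoff matrix can be passed as `A`; only the constant-payoff case has this simple closed form to compare against.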

The three modes of selection have crisp replicator-equation analogues. Directional selection corresponds to a payoff matrix with a unique dominant strategy that drives all alternatives to extinction; the dominant strategy is an attracting vertex of the simplex. Stabilising selection in the game-theoretic sense corresponds to a mixed ESS: an interior equilibrium that is attracting under the dynamics, with the population maintaining a stable mixture of strategies. Disruptive selection in this framework corresponds to an unstable interior equilibrium (a repeller in one dimension, a saddle in higher dimensions), with the dynamics flowing toward whichever pure strategy's basin of attraction contains the initial condition. In adaptive-dynamics theory the analogous situation produces branching points, where the evolutionary trajectory through trait space splits into two diverging branches.
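For two strategies the classification can be made completely explicit. Writing $x$ for the frequency of strategy 1, the replicator equation becomes $\dot{x} = x(1 - x)\left[x d_{1} + (1 - x) d_{2}\right]$ with $d_{1} = a_{11} - a_{21}$ and $d_{2} = a_{12} - a_{22}$, so the signs of $d_{1}$ and $d_{2}$ decide between a dominant strategy, an attracting interior equilibrium and a repelling one, located at $x^{*} = d_{2}/(d_{2} - d_{1})$. The sketch below encodes that case analysis; the function name and example payoff matrices are hypothetical.

```python
def classify_two_strategy_game(a11, a12, a21, a22):
    """Classify replicator dynamics for a 2x2 payoff matrix (a_ij = payoff to i against j).

    Returns (mode, x_star), where x_star is the stable or unstable frequency of strategy 1.
    """
    d1, d2 = a11 - a21, a12 - a22
    if d1 > 0 and d2 > 0:
        return ("directional", 1.0)                # strategy 1 dominates
    if d1 < 0 and d2 < 0:
        return ("directional", 0.0)                # strategy 2 dominates
    if d1 < 0 < d2:
        return ("stabilising", d2 / (d2 - d1))     # attracting interior equilibrium
    if d1 > 0 > d2:
        return ("disruptive", d2 / (d2 - d1))      # repelling interior equilibrium
    return ("degenerate", None)                    # d1 or d2 is zero: ties need separate analysis


print(classify_two_strategy_game(-1.0, 2.0, 0.0, 1.0))  # Hawk-Dove, V = 2, C = 4: ('stabilising', 0.5)
print(classify_two_strategy_game(2.0, 0.0, 0.0, 1.0))   # ('disruptive', 0.333...)
print(classify_two_strategy_game(3.0, 0.0, 5.0, 1.0))   # ('directional', 0.0)
```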

The classical worked example is the Hawk-Dove game, due to Maynard Smith. Two individuals contest a resource of value $V$. Each chooses a strategy from $\{H, D\}$ ("Hawk" = escalate; "Dove" = display and retreat if challenged). The payoffs to the focal (row) strategy against the opponent (column) strategy are:

Focal \ Opponent	Hawk	Dove
Hawk	$(V - C)/2$	$V$
Dove	$0$	$V/2$

Hawk vs Hawk: both escalate; each wins the resource half the time and suffers the injury cost $C$ half the time, for an expected payoff of $(V - C)/2$ per individual. Hawk vs Dove: Dove yields and Hawk takes all of $V$. Dove vs Dove: the two split the resource amicably, $V/2$ each. The interesting regime is $C > V$: fighting is more costly than the resource is worth. In this regime, neither pure strategy is an ESS. A population of all Hawks gets $(V - C)/2 < 0$ per encounter, and a single Dove invader gets $0 > (V - C)/2$; a population of all Doves gets $V/2$, but a Hawk invader extracts the full $V > V/2$. The unique ESS is the mixed strategy "play Hawk with probability $p^{*} = V/C$." At this frequency, the expected payoffs to Hawk and Dove are equal:

$$\mathbb{E}[\text{Hawk}] = p^{*} \cdot \frac{V - C}{2} + (1 - p^{*}) \cdot V = (1 - p^{*}) \cdot \frac{V}{2} = \mathbb{E}[\text{Dove}],$$

both equal to $V(C - V)/(2C)$. The replicator dynamics confirm that this interior point is a global attractor for every initial condition $0 < p < 1$: if $p < p^{*}$, Hawks do better than the average and increase; if $p > p^{*}$, Hawks do worse and decrease. The system settles at $p = V/C$.
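The Hawk-Dove analysis can be reproduced in a few lines: test Maynard Smith's conditions for each pure strategy, evaluate the expected payoffs at $p^{*} = V/C$, and integrate $\dot{p} = p(1 - p)(\mathbb{E}[\text{Hawk}] - \mathbb{E}[\text{Dove}])$ from both sides of the equilibrium. This is a sketch with hypothetical values $V = 2$, $C = 5$ and illustrative function names; for a two-strategy matrix game, testing a pure strategy against the other pure strategy is sufficient for the ESS conditions.

```python
def hawk_dove_payoffs(V, C):
    """Payoff to the row strategy; rows and columns ordered (Hawk, Dove)."""
    return [[(V - C) / 2, V], [0.0, V / 2]]


def pure_is_ess(A, i):
    """Maynard Smith's ESS conditions for pure strategy i against the other pure strategy j."""
    j = 1 - i
    return A[i][i] > A[j][i] or (A[i][i] == A[j][i] and A[i][j] > A[j][j])


def expected_payoffs(A, p):
    """(E[Hawk], E[Dove]) when a fraction p of opponents play Hawk."""
    return (p * A[0][0] + (1 - p) * A[0][1], p * A[1][0] + (1 - p) * A[1][1])


def hawk_frequency(V, C, p0, t_end=200.0, dt=0.01):
    """Forward-Euler integration of dp/dt = p (1 - p) (E[Hawk] - E[Dove])."""
    A = hawk_dove_payoffs(V, C)
    p = p0
    for _ in range(int(round(t_end / dt))):
        e_hawk, e_dove = expected_payoffs(A, p)
        p += dt * p * (1 - p) * (e_hawk - e_dove)
    return p


V, C = 2.0, 5.0                                   # hypothetical values with C > V
A = hawk_dove_payoffs(V, C)
print(pure_is_ess(A, 0), pure_is_ess(A, 1))       # False False
print(expected_payoffs(A, V / C))                 # both V (C - V) / (2 C) = 0.6, up to rounding
print(hawk_frequency(V, C, 0.05), hawk_frequency(V, C, 0.95))  # both converge to V / C = 0.4
```

With $C < V$ (for example `hawk_dove_payoffs(4.0, 2.0)`), `pure_is_ess(A, 0)` returns `True`: Hawk becomes a pure ESS and the game falls into the directional regime.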

The biological interpretation is direct. Many real conflicts over resources in animal populations (territorial contests, mating displays, dominance hierarchies) show the qualitative Hawk-Dove signature: the cost of escalated conflict can outweigh the resource gain, and populations maintain a mixture of aggressive and submissive strategies rather than fixing either one. The mixed strategy can be interpreted either as a population polymorphism (some individuals always play Hawk, others always Dove, at frequencies $V/C$ and $1 - V/C$) or as a mixed individual strategy (every individual plays Hawk with probability $V/C$ in each encounter). The equilibrium analysis does not distinguish them, but additional biological information, such as the heritability of the strategy or the role of contextual cues, typically does.

Two important extensions of the basic Hawk-Dove framework are kin selection and public goods. In kin selection, the recipient of a strategy may be a genetic relative of the actor, so the fitness consequences must be weighted by the relatedness coefficient $r$. Hamilton's rule $rB > C$ governs whether an apparently altruistic strategy can spread: the benefit $B$ to the recipient, weighted by relatedness $r$, must exceed the cost $C$ to the actor (here $C$ is the actor's fitness cost, not the injury cost of the Hawk-Dove game). Under suitable assumptions the rule can be derived from the replicator equation applied to kin-structured populations; it is the subject of 19.03.03. In public-goods games, payoffs depend not on pairwise encounters but on the strategy composition of a whole group, with characteristic phenomena including the tragedy of the commons (cooperation undermined by free-riders) and the stabilising effect of punishing defectors. Both extensions preserve the basic replicator-equation structure and recover the directional/stabilising/disruptive classification as different qualitative regimes of the underlying payoff matrix.
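In their most basic forms both extensions reduce to simple payoff arithmetic. The sketch below assumes the common linear public-goods game (each cooperator pays `c`, the pot is multiplied by `m` and shared equally among all `N` group members), which is one standard formalisation rather than the only one; all names and numbers are hypothetical. It shows that, when $m < N$, a focal individual always gains $c(1 - m/N)$ by defecting, even though a group of cooperators earns more than a group of defectors.

```python
def hamiltons_rule_favours(r, benefit, cost):
    """Hamilton's rule: an altruistic act can spread if r * benefit > cost to the actor."""
    return r * benefit > cost


def public_goods_payoff(is_cooperator, k_others, N, c, m):
    """Payoff to a focal individual in a linear public-goods game.

    k_others of the other N - 1 group members cooperate. Every cooperator pays c;
    the pot is multiplied by m and shared equally among all N members.
    """
    k = k_others + (1 if is_cooperator else 0)
    return m * c * k / N - (c if is_cooperator else 0.0)


N, c, m = 5, 1.0, 3.0   # hypothetical group size, cost and multiplier (1 < m < N)
print(hamiltons_rule_favours(0.5, 3.0, 1.0))    # True: full siblings, rB = 1.5 > 1
print(hamiltons_rule_favours(0.125, 3.0, 1.0))  # False: first cousins, rB = 0.375 < 1
for k_others in range(N):
    gain = public_goods_payoff(False, k_others, N, c, m) - public_goods_payoff(True, k_others, N, c, m)
    print(k_others, round(gain, 6))  # always c * (1 - m / N) = 0.4: defecting pays
print(public_goods_payoff(True, N - 1, N, c, m), public_goods_payoff(False, 0, N, c, m))  # 2.0 0.0
```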

The deeper structural lesson is that the three modes of selection identified phenotypically — directional, stabilising, disruptive — are not three separate mechanisms but three regimes of the same underlying dynamical system. Whether the fitness landscape is flat-with-slope (directional), domed (stabilising), or saddle-shaped (disruptive) determines the qualitative behaviour, but the equations of motion are identical. Adding frequency-dependence allows the landscape itself to deform with the population's location on it, producing genuinely new phenomena like protected polymorphisms and evolutionary branching that the fixed-landscape framework cannot describe. Sexual selection 19.03.02 is one of the most important biological arenas where frequency-dependence dominates: the fitness of a mating display is a function of how many other individuals carry it and how many prefer it, leading to runaway dynamics, Fisherian sons-and-daughters arguments, and the full apparatus of co-evolutionary game theory.

Connections [Master]

Mendelian genetics 19.01.01 pending provides the genotype-phenotype framework that selection acts upon. Allele frequencies change because genotypes differ in fitness.
Hardy-Weinberg equilibrium 19.02.01 pending is the null model against which selection is detected. Deviation from Hardy-Weinberg expectations at a locus is a signature of selection (or other evolutionary forces).
Sexual selection 19.03.02 is a special form of natural selection where the fitness differences arise from variation in mating success rather than survival.
Genetic drift 19.04.01 is the other major force changing allele frequencies. The relative strength of selection versus drift is quantified by $N s$ (population size times selection coefficient): when $N s ≫ 1$ , selection dominates; when $N s ≪ 1$ , drift dominates.
Wright-Fisher model and the diffusion approximation 19.02.05. The selection-coupled stochastic dynamics analysed in the Master sub-sections above (stochastic drift and the diffusion limit; deterministic Wright-Fisher dynamics) are developed in full in the dedicated Wright-Fisher unit. The selection equation $\Delta p = s p (1 - p) / \bar{w}$ derived here is the deterministic limit of the Wright-Fisher drift coefficient, and Kimura's fixation probability $u(p) = (1 - e^{-2Nsp}) / (1 - e^{-2Ns})$, developed in 19.02.05, is the finite-population stochastic correction to the deterministic fixation criterion. For a single new mutant ($p = 1/N$) with $s \ll 1$ and $Ns \gg 1$, $u \approx 2s$, recovering Haldane's 1927 rule. The scaling parameter $2 N_{e} s$ separates the regimes treated here (deterministic) from those treated in 19.02.05 (stochastic), and the Kingman coalescent dual developed there is the genealogical substrate underneath the selection-coupled forward dynamics analysed here.
Molecular biology 17.05.01 pending provides the mechanistic basis for mutations that create the variation upon which selection acts.
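Kimura's formula and the deterministic selection recursion quoted in the Wright-Fisher entry above are easy to evaluate directly. This sketch (hypothetical function names and parameter values; haploid fitnesses $1 + s$ and $1$, so that $\bar{w} = 1 + sp$) checks that a single new mutant fixes with probability close to $2s$ when $s \ll 1 \ll Ns$, that $u(p) \to p$ in the neutral limit, and that the deterministic recursion multiplies the allele's odds $p/(1 - p)$ by exactly $1 + s$ each generation.

```python
import math


def kimura_fixation_probability(p, N, s):
    """u(p) = (1 - exp(-2 N s p)) / (1 - exp(-2 N s)); reduces to u = p when s = 0."""
    if s == 0:
        return p
    # expm1 keeps precision when 2 N s is small: 1 - exp(-x) == -expm1(-x)
    return math.expm1(-2 * N * s * p) / math.expm1(-2 * N * s)


def deterministic_trajectory(p0, s, generations):
    """Iterate delta_p = s p (1 - p) / wbar, with haploid fitnesses 1 + s and 1 (wbar = 1 + s p)."""
    p, traj = p0, [p0]
    for _ in range(generations):
        p += s * p * (1 - p) / (1 + s * p)
        traj.append(p)
    return traj


def odds(q):
    return q / (1 - q)


N, s = 1000, 0.01                                       # hypothetical population size and selection coefficient
print(kimura_fixation_probability(1 / N, N, s), 2 * s)  # about 0.0198 vs 0.02
print(kimura_fixation_probability(0.3, N, 1e-9))        # about 0.3 (neutral limit)
traj = deterministic_trajectory(0.01, s, 100)
print(odds(traj[100]) / odds(traj[0]), (1 + s) ** 100)  # equal up to rounding
```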

Historical & philosophical context [Master]

Darwin's On the Origin of Species (1859) presented natural selection as the mechanism driving evolutionary change, built on two observations (variation exists, and more offspring are produced than can survive) and one inference (those with favourable variations survive and reproduce more). Darwin lacked a correct theory of heredity: he assumed a form of blending inheritance, and his own later hypothesis of pangenesis was wrong. It was not until the rediscovery of Mendel's laws in 1900 that the genetic basis for selection could be understood.

The modern synthesis (1930s-1950s) unified Mendelian genetics with Darwinian selection. Fisher's The Genetical Theory of Natural Selection (1930) provided the mathematical framework, showing that continuous variation could be explained by many Mendelian loci with small effects. Haldane (1932) worked out the dynamics of selection at single loci. Wright (1932) developed the adaptive landscape metaphor and emphasised the role of drift. Together, these three established population genetics as the mathematical core of evolutionary theory.

The peppered moth story, studied by Kettlewell in the 1950s, became the textbook example of natural selection in action. More recently, the Grants' long-term study of Darwin's finches on Daphne Major (1973-present) has provided one of the most detailed records of natural selection in the wild, showing measurable evolutionary changes over single-year timescales in response to drought and food availability.

Philosophically, natural selection raises the question of whether evolution is teleological (goal-directed). It is not. Selection has no foresight; it favours whatever works now, regardless of future consequences. The appearance of design in organisms is the product of a mindless algorithm: variation, differential reproduction, and inheritance, iterated over many generations. Richard Dawkins's metaphor of the "blind watchmaker" captures this precisely.

Bibliography [Master]

Darwin, C., On the Origin of Species by Means of Natural Selection (John Murray, 1859).
Fisher, R. A., The Genetical Theory of Natural Selection (Clarendon Press, 1930).
Haldane, J. B. S., The Causes of Evolution (Longmans Green, 1932).
Wright, S., "The roles of mutation, inbreeding, crossbreeding, and selection in evolution", Proc. 6th Int. Congr. Genet. 1 (1932), 356-366.
Kettlewell, H. B. D., "Selection experiments on industrial melanism in the Lepidoptera", Heredity 9 (1955), 323-342.
Grant, P. R. & Grant, B. R., How and Why Species Multiply: The Radiation of Darwin's Finches (Princeton UP, 2008).
Futuyma, D. J., Evolution, 4th ed. (Sinauer, 2017).
Hartl, D. L. & Clark, A. G., Principles of Population Genetics, 4th ed. (Sinauer, 2007).

Prerequisites

19.01.01 pending

Tier anchors

beginner: Campbell Biology 12th ed. Ch. 22-23; Coyne Why Evolution Is True; Crash Course Biology evolution episodes
intermediate: Futuyma Evolution 4th ed. Ch. 10-11; Hartl & Clark Principles of Population Genetics 4th ed. Ch. 3-4
master: Futuyma Ch. 11-12; Crow & Kimura Introduction to Population Genetics Theory; primary literature — Darwin 1859, Fisher 1930, Haldane 1932

References

TODO_REF pending
Darwin, C. — On the Origin of Species by Means of Natural Selection (John Murray, 1859) · Originator monograph for the theory of natural selection · see docs/catalogs/NEED_TO_SOURCE.md#bio-darwin-1859
TODO_REF pending
Fisher, R. A. — The Genetical Theory of Natural Selection (Clarendon Press, 1930) · Originator monograph for the mathematical theory of selection · see docs/catalogs/NEED_TO_SOURCE.md#bio-fisher-1930
TODO_REF pending
Futuyma, D. J. — Evolution, 4th ed. (Sinauer, 2017) · Ch. 10-11 Natural selection and adaptation · see docs/catalogs/NEED_TO_SOURCE.md#bio-futuyma-2017
TODO_REF pending
Hartl, D. L. & Clark, A. G. — Principles of Population Genetics, 4th ed. (Sinauer, 2007) · Ch. 3-4 Selection and mutation · see docs/catalogs/NEED_TO_SOURCE.md#bio-hartl-clark-2007
TODO_REF pending
Kettlewell, H. B. D. — Selection experiments on industrial melanism in the Lepidoptera (Heredity 9, 323-342, 1955) · Classic field study of natural selection in peppered moths · see docs/catalogs/NEED_TO_SOURCE.md#bio-kettlewell-1955
tong
raw/pdfs/mathbio/mathbio.pdf · Mathematical biology background — continuous population models and fitness landscape dynamics

Reviewer

Tyler (pending external biology reviewer per BIOLOGY_PLAN §6)

Estimated time

beginner: 14m
intermediate: 35m
master: 55m