21.14.04 · number-theory / sieve-methods-large-sieve

Combinatorial Sieve Methods: Brun and Selberg

shipped3 tiersLean: none

Anchor (Master): Iwaniec-Kowalski 2004 *Analytic Number Theory* Ch. 6 (combinatorial and Selberg sieves, sieve dimension, the parity problem); Halberstam-Richert 1974 *Sieve Methods* (the canonical monograph); Friedlander-Iwaniec 2010 *Opera de Cribro* (AMS Colloquium 57) Ch. 1-6 (modern combinatorial sieve, the parity barrier and its evasion); Brun 1919/1920 (the original pure sieve and twin-prime convergence); Selberg 1947 *Norsk Vid. Selsk. Forh.* 19 (the quadratic upper-bound sieve)

Intuition Beginner

Suppose you want to count the prime numbers up to a hundred. The oldest method, the sieve of Eratosthenes, is to list every number, cross out the multiples of $2$ , then the multiples of $3$ , then of $5$ , and so on; whatever survives is prime. The crossing-out can be turned into an exact count using a bookkeeping trick: add back what you removed twice, subtract what you added back too often, and keep correcting. This back-and-forth correction is called inclusion-exclusion, and for counting primes it goes by the name of Legendre.

The trouble is that the corrections pile up. Each one carries a small rounding error, and there are so many corrections that the errors swamp the answer. Legendre's exact count is useless for large numbers because of this error flood. Viggo Brun's breakthrough, around $1919$ , was simple to state: stop correcting early. If you cut off the inclusion-exclusion after a controlled number of steps, the leftover is provably an over-count or an under-count, never a wild error. You trade an exact-but-useless formula for an approximate-but-trustworthy bound.

Brun used his sieve to prove something striking about twin primes — pairs like $(11, 13)$ or $(29, 31)$ . Nobody knows whether there are infinitely many, but Brun showed that if you add up one-over-each-twin-prime, the total settles down to a finite number. Twin primes are rare enough that their reciprocals converge, even though ordinary primes' reciprocals do not.

Visual Beginner

A row of the numbers $1$ through $30$ . First the multiples of $2$ are shaded, then a second pass shades the multiples of $3$ , then a third pass the multiples of $5$ . Numbers shaded by two different passes (like $6$ , shaded by both $2$ and $3$ ) are double-counted if you simply add the shaded counts, which is why a correction is needed. The unshaded survivors past the first few are exactly the primes between $5$ and $30$ together with $1$ .

sifting primes used	numbers removed	over/under after adding	correction
just $2$	$15$	exact	none
$2, 3$	$20$	over-removed $6, 12, 18, 24, 30$	add back $5$
$2, 3, 5$	$22$	over- and under-counts	add/subtract overlaps

The table shows the seesaw: each new prime you sift by removes some numbers but creates new double-counts among the multiples that share two primes. The full correction is an alternating sum, and Brun's idea is that chopping that alternating sum off at the right place still pens the true count between known limits.

Worked example Beginner

Count, by inclusion-exclusion, how many numbers from $1$ to $30$ have no prime factor among $2, 3, 5$ .

Step 1. Start with all $30$ numbers. Remove multiples of $2$ (there are $15$ ), of $3$ (there are $10$ ), of $5$ (there are $6$ ). Naive total removed: $15 + 10 + 6 = 31$ , already more than $30$ , because of overlaps.

Step 2. Add back the numbers removed twice: multiples of $2 \times 3 = 6$ (five of them), of $2 \times 5 = 10$ (three), of $3 \times 5 = 15$ (two). Add back $5 + 3 + 2 = 10$ .

Step 3. Subtract the numbers removed three times: multiples of $2 \times 3 \times 5 = 30$ (just one, namely $30$ ). Subtract $1$ .

Step 4. The count of survivors is $30 - 31 + 10 - 1 = 8$ . Check: the numbers from $1$ to $30$ with no factor of $2, 3, 5$ are $1, 7, 11, 13, 17, 19, 23, 29$ — exactly eight.

What this tells us: the exact answer came from a four-term alternating sum that matched the structure $30 \times (1 - \frac{1}{2}) (1 - \frac{1}{3}) (1 - \frac{1}{5}) = 30 \times \frac{4}{30} = 8$ . The product of the "survival fractions" predicts the count. Brun's sieve keeps this product as the main term and bounds the error from stopping the alternating sum early.

Check your understanding Beginner

Exercise (easy, multiple choice).

Why is Legendre's exact inclusion-exclusion formula for counting primes hard to use for large $x$ ?

A. It gives the wrong answer
B. The number of correction terms is huge and their rounding errors accumulate
C. It only works for $x$ a power of $2$
D. It requires knowing all the primes in advance

Hint

Each divisor $d$ of the product of sifting primes contributes a term, and each term replaces a count by an approximation with a small error.

Answer

B. Legendre's formula is a sum over all $2^{k}$ products of the first $k$ sifting primes; each term approximates a count of multiples and carries a rounding error of size up to $1$ , so the accumulated error can reach $2^{k}$ , dwarfing the main term. Option A is wrong — the formula is exact. Option D is partly true of any sieve but is not the reason it fails. Brun's fix is to truncate so that only a controlled number of terms appear.

Formal definition Intermediate+

We follow Iwaniec-Kowalski ^{[Iwaniec-Kowalski Ch. 6]}. Throughout, $p$ denotes a prime and $μ$ , $ω (d)$ (the number of distinct prime factors), and squarefreeness are as in 21.11.01.

Definition (the sieve problem). Let $A = (a_{n})_{n \leq N}$ be a finite sequence of nonnegative reals, $X = \sum_{n} a_{n}$ its total mass, and $P$ a set of primes. For a divisor $d$ supported on $P$ put $A_{d} = \sum_{n \equiv 0 (d)} a_{n}$ , the mass of the subsequence with index divisible by $d$ . Fix a sifting level $z \geq 2$ and write $$ P(z) = \prod_{\substack{p < z \ p \in \mathcal{P}}} p . $$ The sifting function is the surviving mass $$ S(\mathcal{A}, \mathcal{P}, z) = \sum_{\substack{n \le N \ (n, P(z)) = 1}} a_n . $$ To make the problem tractable one posits a density: a multiplicative function $g$ with $0 \leq g (p) < 1$ and an approximation $$ A_d = g(d), X + r_d, \qquad g(d) = \prod_{p \mid d} g(p), $$ where $r_{d}$ is the remainder and $g (p)$ is the expected proportion of $A$ killed by $p$ . Write $V (z) = \prod_{p < z, p \in P} (1 - g (p))$ for the sifting density, the heuristic survival probability.

Definition (Legendre sieve). Since $1_{(n, P (z)) = 1} = \sum_{d ∣ (n, P (z))} μ (d)$ by Möbius inversion of 21.11.01, $$ S(\mathcal{A}, \mathcal{P}, z) = \sum_{d \mid P(z)} \mu(d), A_d = X \sum_{d \mid P(z)} \mu(d), g(d) + \sum_{d \mid P(z)} \mu(d), r_d = X,V(z) + \sum_{d \mid P(z)} \mu(d), r_d . $$ The main term $X V (z)$ is exactly what the survival-fraction product predicts. The defect of the Legendre identity is the remainder sum: $P (z)$ has $\sim z / lo g z$ prime factors, so it has $2^{π (z)}$ squarefree divisors, and even if $∣ r_{d} ∣ \leq 1$ the error can reach $2^{π (z)}$ , which is far larger than $X$ unless $z$ is tiny. The exact identity is therefore inert.

Definition (sieve dimension). The density $g$ has dimension $κ \geq 0$ if $$ \sum_{w \le p < z} g(p) \log p = \kappa \log\frac{z}{w} + O(1) \qquad (2 \le w \le z), $$ equivalently $V (z) ≍ (lo g z)^{- κ}$ . The case $κ = 1$ (the linear sieve) governs sequences like $A = {n (n + 2) : n \leq x}$ used for twin primes, where $g (p) = 2/ p$ for odd $p$ .

A truncated inclusion-exclusion replaces $μ$ by a sieve weight $(λ_{d})$ supported on squarefree $d ∣ P (z)$ with $λ_{1} = 1$ , chosen so the surviving sum is one-sided. An upper-bound sieve requires $\sum_{d ∣ n} λ_{d} \geq 1_{n = 1}$ ; a lower-bound sieve requires $\sum_{d ∣ n} λ_{d} \leq 1_{n = 1}$ . Then $S (A, P, z) \leq$ or $\geq X \sum_{d} λ_{d} g (d) + \sum_{d} ∣ λ_{d} ∣ ∣ r_{d} ∣$ , and the art is choosing $(λ_{d})$ with main term close to $V (z)$ and few nonzero entries.

Counterexamples to common slips

"The Legendre identity is wrong." It is exactly correct; it is merely useless, because the controlled main term $X V (z)$ is accompanied by a remainder sum over $2^{π (z)}$ divisors whose errors need not cancel. Truncation does not fix an error in the identity — it limits how many remainder terms appear.
"A sieve detects primes." A sieve bounds the count of integers with no small prime factor — almost-primes — not primes themselves. The parity problem shows a sieve of this combinatorial type cannot distinguish integers with an even number of prime factors from those with an odd number, so it cannot isolate the primes (one prime factor) on its own.
"Selberg's weights are an approximation to $μ$ ." The Selberg weights $λ_{d}$ minimise a quadratic form and are generally not truncations of $μ (d)$ ; they are real numbers in $[- 1, 1]$ chosen by optimisation, and only the constraint $λ_{1} = 1$ ties them to the Möbius identity.

Key theorem with proof Intermediate+

The signature theorem is Selberg's upper-bound sieve: a single nonnegativity trick produces, after diagonalising a quadratic form, an upper bound for the sifting function with main term $1/ G (z)$ .

Theorem (Selberg's $Λ^{2}$ upper bound). Let $g$ be a multiplicative density with $0 \leq g (p) < 1$ for $p ∣ P (z)$ , and define the multiplicative function $h$ on squarefree $d ∣ P (z)$ by $h (d) = \prod_{p ∣ d} \frac{g ( p )}{1 - g ( p )}$ , together with $G (z) = \sum_{d ∣ P (z), d < z} h (d)$ . Then $$ S(\mathcal{A}, \mathcal{P}, z) ;\le; \frac{X}{G(z)} ;+; \sum_{\substack{d_1, d_2 < z \ d_i \mid P(z)}} |\lambda_{d_1} \lambda_{d_2}|; \big|r_{[d_1, d_2]}\big|, $$ where the optimal weights are $λ_{d} = μ (d) \frac{g ( d )}{1 - ( \dots )} \cdot \frac{G _{d} ( z )}{G ( z )}$ with $G_{d} (z) = \sum_{e < z / d, (e, d) = 1} h (e)$ , and $∣ λ_{d} ∣ \leq 1$ .

Proof. For any reals $(λ_{d})_{d ∣ P (z)}$ with $λ_{1} = 1$ , the square

(d ∣ (n, P (z)) \sum λ_{d})^{2} \geq 1_{(n, P (z)) = 1},

because when $(n, P (z)) = 1$ the only divisor is $d = 1$ and the left side is $λ_{1}^{2} = 1$ , while otherwise the left side is a square, hence $\geq 0$ . Summing $a_{n}$ against this inequality,

S (A, P, z) \leq n \sum a_{n} (d ∣ (n, P (z)) \sum λ_{d})^{2} = d_{1}, d_{2} \sum λ_{d_{1}} λ_{d_{2}} A_{[d_{1}, d_{2}]},

since $d_{1} ∣ n$ and $d_{2} ∣ n$ together mean $[d_{1}, d_{2}] ∣ n$ . Inserting $A_{[d_{1}, d_{2}]} = g ([d_{1}, d_{2}]) X + r_{[d_{1}, d_{2}]}$ and using that $g$ is multiplicative with $g ([d_{1}, d_{2}]) g ((d_{1}, d_{2})) = g (d_{1}) g (d_{2})$ on squarefree arguments, the main term is the quadratic form

Q (λ) = d_{1}, d_{2} < z \sum λ_{d_{1}} λ_{d_{2}} g ([d_{1}, d_{2}]) .

Diagonalise $Q$ . Write $g ([d_{1}, d_{2}]) = g (d_{1}) g (d_{2}) / g ((d_{1}, d_{2}))$ and use the identity $\frac{1}{g (( d _{1} , d _{2} ))} = \sum_{e ∣ (d_{1}, d_{2})} h (e)$ , valid because $h$ is the multiplicative function with $\sum_{e ∣ m} h (e) = 1/ g (m)$ on squarefree $m$ (locally $1 + \frac{g ( p )}{1 - g ( p )} = \frac{1}{1 - g ( p )}$ , and the telescoping product gives $1/ g$ only after the substitution below; concretely set $h (p) = g (p) / (1 - g (p))$ ). Substituting and swapping the order of summation,

Q (λ) = e < z \sum h (e) (d < z, e ∣ d \sum λ_{d} g (d))^{2} =: e < z \sum h (e) y_{e}^{2}, y_{e} = e ∣ d \sum λ_{d} g (d) .

The change of variables $λ_{d} \mapsto y_{e}$ is invertible by Möbius inversion, and the constraint $λ_{1} = 1$ becomes the linear constraint $\sum_{e} μ (e) y_{e} = 1$ (taking $d = 1$ ). Minimising the positive-definite form $\sum_{e} h (e) y_{e}^{2}$ subject to $\sum_{e} μ (e) y_{e} = 1$ is a Lagrange-multiplier problem; the minimum is attained at $y_{e} = μ (e) / (h (e) G (z))$ with $G (z) = \sum_{e < z} μ (e)^{2} / h (e)$ . Since $μ (e)^{2} / h (e) = \prod_{p ∣ e} \frac{1 - g ( p )}{g ( p )}$ , a routine re-indexing identifies $G (z)$ with $\sum_{d < z} h (d)$ for the reciprocal density, and the minimal value of $Q$ is $1/ G (z)$ . Thus the main term is $X / G (z)$ , and tracking the remainder through $A_{[d_{1}, d_{2}]}$ gives the stated error sum. The bound $∣ λ_{d} ∣ \leq 1$ follows from the explicit formula for $λ_{d}$ in terms of the partial sums $G_{d} (z) / G (z) \leq 1$ . $□$

Bridge. Selberg's sieve builds toward 21.16.04 pending the large sieve and Bombieri-Vinogradov, where the same quadratic-form positivity reappears as the dual large-sieve inequality and the remainder sum $\sum ∣ r_{[d_{1}, d_{2}]} ∣$ is controlled on average over moduli; it appears again in 21.14.03 the linear sieve, where the Selberg upper bound is paired with a matching lower bound to pin almost-primes. The foundational reason the method works is that $(\sum_{d ∣ n} λ_{d})^{2}$ is a nonnegative majorant of $1_{n = 1}$ for any weights with $λ_{1} = 1$ , so optimisation over a finite-dimensional space of weights replaces the divergent Legendre alternating sum; this is exactly the move from an exact-but-inert identity to an inequality with a controllable error. The diagonalisation $Q (λ) = \sum_{e} h (e) y_{e}^{2}$ generalises the orthogonality that the large sieve will exploit, and the main term $1/ G (z) \sim V (z)$ as $z \to \infty$ shows the optimised bound recovers the heuristic survival density. Putting these together, the Selberg main term is dual to Brun's truncated inclusion-exclusion: both approximate $X V (z)$ , one by an optimised quadratic majorant and one by a Bonferroni-truncated alternating sum, and the central insight is that controlling the number of remainder terms — not their individual size — is what rescues the sieve.

Exercises Intermediate+

Exercise 3 (medium, symbolic).

Prove the Bonferroni inequalities: for any finite collection of events and the indicator of "no event occurs", the truncated inclusion-exclusion sum $\sum_{∣ S ∣ \leq 2 k} (- 1)^{∣ S ∣} 1_{⋂_{i \in S}}$ is an over-count, and the truncation at $\leq 2 k + 1$ is an under-count. Deduce that Brun's truncation of $\sum_{d ∣ P (z)} μ (d) A_{d}$ at $ω (d) \leq 2 k$ gives an upper bound for $S$ .

Hint

For a number $n$ with exactly $r \geq 1$ of the sifting primes dividing it, the relevant truncated sum is $\sum_{j = 0}^{2 k} (j r) (- 1)^{j}$ . Show this partial binomial sum has sign $(- 1)^{2 k} = +$ relative to the tail.

Answer

Fix $n$ and let $r = ω ((n, P (z)))$ be the number of sifting primes dividing $n$ . The full sum $\sum_{d ∣ (n, P (z))} μ (d) = \sum_{j = 0}^{r} (j r) (- 1)^{j} = (1 - 1)^{r} = 1_{r = 0}$ . The truncation keeping $ω (d) \leq 2 k$ is $T_{2 k} (r) = \sum_{j = 0}^{m i n (2 k, r)} (j r) (- 1)^{j}$ . The identity $\sum_{j = 0}^{m} (j r) (- 1)^{j} = (- 1)^{m} (m r - 1)$ (proved by induction on $m$ via Pascal's rule) gives $T_{2 k} (r) - 1_{r = 0} = (- 1)^{2 k} (2 k r - 1) = (2 k r - 1) \geq 0$ for $r \geq 1$ , and $= 0$ for $r = 0$ . Hence $T_{2 k} (r) \geq 1_{r = 0}$ pointwise, and similarly $T_{2 k + 1} (r) = 1_{r = 0} - (2 k + 1 r - 1) \leq 1_{r = 0}$ . Multiplying by $a_{n} \geq 0$ and summing, $\sum_{n} a_{n} T_{2 k} (ω (n, P (z))) \geq \sum_{n} a_{n} 1_{(n, P (z)) = 1} = S$ . The left side equals $\sum_{d ∣ P (z), ω (d) \leq 2 k} μ (d) A_{d}$ , so Brun's even truncation is an upper bound. Rubric: full credit for the partial-binomial identity and the sign deduction; the lower bound is the odd-truncation analogue.

Exercise 4 (medium, symbolic).

Show that Selberg's quadratic form $Q (λ) = \sum_{d_{1}, d_{2}} λ_{d_{1}} λ_{d_{2}} g ([d_{1}, d_{2}])$ is positive definite on the space of weights, and that its minimum subject to $λ_{1} = 1$ is $1/ G (z)$ where $G (z) = \sum_{d < z} h (d)$ .

Hint

Use the diagonalisation $Q (λ) = \sum_{e} h (e) y_{e}^{2}$ with $y_{e} = \sum_{e ∣ d} λ_{d} g (d)$ , then minimise $\sum_{e} h (e) y_{e}^{2}$ under the linear constraint $\sum_{e} μ (e) y_{e} = 1$ by Cauchy-Schwarz.

Answer

After the substitution $g ([d_{1}, d_{2}]) = \sum_{e ∣ (d_{1}, d_{2})} h (e) g (d_{1}) g (d_{2})$ (valid because $\sum_{e ∣ m} h (e) = 1/ g (m)$ on squarefree $m$ for $h (p) = g (p) / (1 - g (p))$ ), swapping summation order gives $Q (λ) = \sum_{e} h (e) y_{e}^{2}$ with $y_{e} = \sum_{e ∣ d} λ_{d} g (d)$ . Since $h (e) > 0$ , $Q \geq 0$ with equality only if all $y_{e} = 0$ , i.e. $λ \equiv 0$ ; so $Q$ is positive definite. The constraint $λ_{1} = 1$ in the $y$ -variables is $\sum_{e} μ (e) y_{e} = 1$ (invert $y_{e} = \sum_{e ∣ d} λ_{d} g (d)$ at $d = 1$ ). By Cauchy-Schwarz, $$ 1 = \Big(\sum_e \mu(e) y_e\Big)^2 = \Big(\sum_e \frac{\mu(e)}{\sqrt{h(e)}} \cdot \sqrt{h(e)}, y_e\Big)^2 \le \Big(\sum_e \frac{\mu(e)^2}{h(e)}\Big)\Big(\sum_e h(e) y_e^2\Big) = G(z), Q(\lambda), $$ so $Q (λ) \geq 1/ G (z)$ , with equality when $h (e) y_{e} \propto μ (e) / h (e)$ , i.e. $y_{e} = μ (e) / (h (e) G (z))$ . Re-indexing $μ (e)^{2} / h (e) = \prod_{p ∣ e} (1 - g (p)) / g (p)$ identifies $\sum_{e} μ (e)^{2} / h (e)$ with $\sum_{d < z} h^{*} (d)$ for the reciprocal density, written $G (z)$ . Rubric: full credit for positive-definiteness, the Cauchy-Schwarz lower bound, and the equality case.

Exercise 5 (medium, symbolic).

Using the twin-prime density $g (p) = 2/ p$ for odd $p$ and $g (2) = 1/2$ (the sequence $A = {n (n + 2) : n \leq x}$ ), show $V (z) = \prod_{p < z} (1 - g (p)) ≍ (lo g z)^{- 2}$ , identifying the sieve dimension $κ$ .

Hint

Take logarithms: $lo g V (z) = \sum_{p < z} lo g (1 - g (p)) = - \sum_{p < z} g (p) + O (1)$ . Use Mertens' theorem $\sum_{p < z} 1/ p = lo g lo g z + O (1)$ from 21.11.03.

Answer

For $p$ odd, $g (p) = 2/ p$ , so $lo g (1 - g (p)) = - 2/ p + O (1/ p^{2})$ . Summing, $$ \log V(z) = \sum_{p < z}\log(1 - g(p)) = -\sum_{2 < p < z}\frac{2}{p} + O(1) = -2\big(\log\log z + O(1)\big) + O(1) = -2\log\log z + O(1), $$ using Mertens' second theorem $\sum_{p < z} 1/ p = lo g lo g z + M + o (1)$ from 21.11.03 and the convergence of $\sum_{p} 1/ p^{2}$ . Exponentiating, $V (z) = e^{O (1)} (lo g z)^{- 2} ≍ (lo g z)^{- 2}$ . Comparing with $V (z) ≍ (lo g z)^{- κ}$ gives dimension $κ = 2$ — the twin-prime / Goldbach sieve is two-dimensional because each prime $p$ removes two residue classes (the roots of $n (n + 2) \equiv 0$ ). Rubric: full credit for the Mertens input and the identification $κ = 2$ .

Exercise 6 (medium, symbolic).

State and apply the fundamental lemma: if $g$ has dimension $κ$ and $s = lo g X / lo g z \geq 1$ , then $S (A, P, z) = X V (z) (1 + O (e^{- s})) + (remainder)$ . For $A = {n \leq x}$ , $z = x^{1/ s}$ , estimate the count of integers $\leq x$ free of prime factors below $z$ when $s$ is large.

Hint

Here $g (p) = 1/ p$ , $κ = 1$ , and $V (z) \sim e^{- γ} / lo g z$ by Mertens' third theorem. Substitute $z = x^{1/ s}$ .

Answer

The fundamental lemma asserts that for a $κ$ -dimensional sieve with $s = lo g X / lo g z$ , both the upper and lower Brun (or Selberg) bounds equal $X V (z) (1 + O (e^{- s l o g s + O (s)}))$ , so for fixed large $s$ the relative error is $O (e^{- s})$ uniformly, with a remainder $\sum_{d < z^{O (s)}} ∣ r_{d} ∣$ . For $A = {n \leq x}$ , $g (p) = 1/ p$ (dimension $1$ ), Mertens gives $V (z) \sim e^{- γ} / lo g z$ , so the count of $n \leq x$ with no prime factor $< z = x^{1/ s}$ is $$ \Phi(x, z) \sim x V(z)(1 + O(e^{-s})) \sim \frac{e^{-\gamma} x}{\log z} = \frac{e^{-\gamma} s, x}{\log x}. $$ The factor $e^{- γ}$ is the Mertens constant of 21.11.03; the true Buchstab function $ω (s)$ refines $e^{- γ}$ for bounded $s$ , but the fundamental lemma's $X V (z)$ is exact to within $O (e^{- s})$ as $s \to \infty$ . Rubric: full credit for stating the lemma with the $O (e^{- s})$ error and substituting Mertens' third theorem.

Exercise 7 (hard, symbolic).

Prove the twin-prime upper bound $π_{2} (x) = # {p \leq x : p + 2 prime} ≪ x (lo g lo g x)^{2} / (lo g x)^{2}$ from Brun's pure sieve, and deduce that $\sum_{p : p + 2 prime} 1/ p$ converges (Brun's theorem).

Hint

Sieve $A = {n (n + 2) : n \leq x}$ by primes $p < z$ with $z = x^{1/ (10 l o g l o g x)}$ or similar; the dimension is $κ = 2$ , so $V (z) ≍ (lo g z)^{- 2}$ . Twin primes $p \leq x$ with $p > z$ survive the sieve. Then apply Abel summation against the count.

Answer

Take $A = {n (n + 2) : n \leq x}$ , sift by all primes $p < z$ . Any twin prime pair $(p, p + 2)$ with $z \leq p \leq x$ produces an $n = p$ with $n (n + 2)$ free of prime factors $< z$ , so $π_{2} (x) \leq S (A, P, z) + z$ . Brun's pure sieve with truncation parameter $k ≍ lo g lo g x$ gives $S (A, P, z) ≪ x V (z) + R$ , where the remainder $R ≪ \sum_{d < z^{2 k}} ∣ r_{d} ∣ ≪ z^{2 k} \leq x^{1/2}$ for $z = x^{1/ (20 l o g l o g x)}$ and the implied truncation, hence negligible. With $κ = 2$ , $V (z) ≍ (lo g z)^{- 2} ≍ (lo g lo g x)^{2} / (lo g x)^{2}$ for this $z$ . Therefore $π_{2} (x) ≪ x (lo g lo g x)^{2} / (lo g x)^{2}$ . For convergence of $\sum 1/ p$ over twin primes, set $T (x) = π_{2} (x)$ and apply Abel summation: $$ \sum_{\substack{p \le x \ p+2 \text{ prime}}}\frac1p = \frac{T(x)}{x} + \int_2^x \frac{T(t)}{t^2},dt \ll \int_2^\infty \frac{(\log\log t)^2}{t (\log t)^2},dt < \infty, $$ since the integrand is $≪ (lo g t)^{- 3/2} / t$ for large $t$ , which is integrable to infinity. Hence the twin-prime reciprocal sum converges to a finite limit (Brun's constant). Rubric: full credit for the sieve bound with the correct $κ = 2$ density, the negligible-remainder choice of $z$ , and the Abel-summation convergence.

Exercise 8 (hard, symbolic).

Apply the Selberg sieve to bound the number of representations of an even $N$ as a sum of two primes: show $r (N) = # {p + q = N : p, q prime} ≪ \prod_{p ∣ N} (1 + \frac{1}{p}) \frac{N}{( lo g N ) ^{2}}$ , the expected order of the Goldbach count up to the singular-series constant.

Hint

Sieve $A = {n (N - n) : n \leq N}$ by primes $p < z = N^{1/4}$ . The density is $g (p) = 2/ p$ for $p ∤ N$ and $g (p) = 1/ p$ for $p ∣ N$ ; dimension $κ = 2$ . The Selberg main term is $N / G (z)$ .

Answer

Each Goldbach representation $N = p + q$ with $p, q > z$ gives an $n = p \leq N$ with $n (N - n) = pq$ free of prime factors $< z$ , so $r (N) \leq S (A, P, z) + 2 z$ for $A = {n (N - n)}$ . The local density is $g (p) = 2/ p$ when $p ∤ N$ (two roots of $n (N - n) \equiv 0 mod p$ ) and $g (p) = 1/ p$ when $p ∣ N$ (the roots coincide), so $κ = 2$ . Selberg's theorem gives $S \leq N / G (z) + \sum_{d_{1}, d_{2} < z} ∣ r_{[d_{1}, d_{2}]} ∣$ , and with $z = N^{1/4}$ the remainder is $≪ \sum_{d < N^{1/2}} τ (d) ≪ N^{1/2} lo g N$ , negligible against the main term. Evaluating $G (z) = \sum_{d < z} h (d)$ with $h (p) = g (p) / (1 - g (p))$ : for the two-dimensional density $G (z) ≍ \frac{1}{2} S (N)^{- 1} (lo g z)^{2}$ where the local factors assemble into the singular series $S (N) = \prod_{p ∣ N} \frac{p - 1}{p - 2} \prod_{p ∤ N} (1 - \frac{1}{( p - 1 ) ^{2}}) ≍ \prod_{p ∣ N} (1 + 1/ p)$ up to an absolute constant. Hence $$ r(N) \ll \frac{N}{G(z)} \ll \mathfrak{S}(N),\frac{N}{(\log z)^2} \ll \prod_{p \mid N}\Big(1 + \frac1p\Big),\frac{N}{(\log N)^2}, $$ using $lo g z = \frac{1}{4} lo g N$ . This is the correct order: the Hardy-Littlewood conjecture predicts $r (N) \sim S (N) N / (lo g N)^{2}$ , and the sieve delivers the upper bound with the right constant up to a factor (the sieve gives an upper bound $\leq C S (N) N / (lo g N)^{2}$ with $C$ near $8$ in the unoptimised form). Rubric: full credit for the density with the $p ∣ N$ distinction, the $κ = 2$ main term $N / G (z)$ , and the singular-series identification.

Advanced results Master

Theorem 1 (Brun's pure sieve and the fundamental lemma). For a $κ$ -dimensional density $g$ and $z \leq X^{1/ s}$ , the truncated Möbius sum over squarefree $d ∣ P (z)$ with $ω (d) \leq 2 k$ satisfies ^{[Iwaniec-Kowalski Ch. 6]} $$ S(\mathcal{A}, \mathcal{P}, z) = X V(z)\Big(1 + O\big(e^{-s(\log s - \log\log 3s - 2)}\big)\Big) + O\Big(\sum_{\substack{d < z^{2k} \ d \mid P(z)}} |r_d|\Big), $$ with both an upper bound (even truncation $2 k$ ) and a lower bound (odd truncation $2 k + 1$ ) by the Bonferroni inequalities (Exercise 3). The error $e^{- s l o g s + O (s)}$ is the fundamental lemma: once the sifting level $z$ is a small power of $X$ ( $s$ large), the sifting function is asymptotic to the heuristic $X V (z)$ with super-polynomially small relative error. Brun's original $1919$ version ^{[Brun 1920]}, with cruder truncation, already sufficed for the twin-prime bound $π_{2} (x) ≪ x (lo g lo g x)^{2} / (lo g x)^{2}$ and the convergence of $\sum_{twins} 1/ p$ to Brun's constant.

Theorem 2 (Selberg's sieve and its self-duality). The $Λ^{2}$ upper bound of the Key Theorem ^{[Selberg 1947]} gives $S \leq X / G (z) + R$ with $G (z) = \sum_{d < z} h (d)$ , and is optimal among all upper-bound sieves of the form $(\sum_{d} λ_{d})^{2}$ . The Selberg weights, unlike Brun's, are not $\pm 1$ but real numbers obtained by minimising a positive-definite quadratic form, and the diagonalisation $Q (λ) = \sum_{e} h (e) y_{e}^{2}$ is the discrete analogue of completing the square. As $z \to \infty$ , $1/ G (z) \sim V (z)$ , so the optimised constant matches the heuristic; for the linear sieve $κ = 1$ the Selberg upper bound is within a factor $2 + o (1)$ of the truth, the deficit being precisely the parity barrier.

Theorem 3 (the sieve dimension and the linear sieve). The hypothesis $\prod_{w \leq p < z} (1 - g (p))^{- 1} = (lo g z / lo g w)^{κ} (1 + O (1/ lo g w))$ defines the dimension $κ$ ^{[Halberstam-Richert]}. For $κ = 1$ the Jurkat-Richert theorem furnishes the optimal continuous sieve functions $F (s), f (s)$ solving the delay-differential system $(s F (s))^{'} = f (s - 1)$ , $(s f (s))^{'} = F (s - 1)$ with $F (s) = 2 e^{γ} / s$ , $f (s) = 0$ for $s \leq 2$ , yielding both upper and lower bounds $$ X V(z)\big(f(s) + o(1)\big) \le S(\mathcal{A}, \mathcal{P}, z) \le X V(z)\big(F(s) + o(1)\big), \qquad s = \frac{\log X}{\log z}. $$ The lower bound becomes positive only for $s > 2$ , which is the linear-sieve form of the parity barrier: one cannot take $z$ as large as $X^{1/2}$ and still guarantee survivors are prime.

Theorem 4 (Chen's theorem). Chen 1973 ^{[Halberstam-Richert]}: every sufficiently large even $N$ is $p + P_{2}$ , a prime plus a number with at most two prime factors; equivalently there are infinitely many primes $p$ with $p + 2 = P_{2}$ . The proof is a weighted lower-bound sieve (the Selberg sieve combined with a switching trick and the Bombieri-Vinogradov theorem on primes in arithmetic progressions) that subtracts off the over-counted configurations, pushing the linear sieve to the very edge of the parity barrier. Chen's theorem is the closest unconditional approach to the twin-prime and binary Goldbach conjectures by sieve methods alone.

Theorem 5 (the parity problem). Selberg's parity principle ^{[Friedlander-Iwaniec]}: a sieve that uses only the sums $A_{d}$ (and bounds them by $g (d) X + r_{d}$ ) cannot distinguish integers with an even number of prime factors from those with an odd number, because the Liouville function $λ (n) = (- 1)^{Ω (n)}$ of 21.11.01 is, to the sieve, indistinguishable from a constant of the same average. Concretely, there exist sequences with the same $g$ supported entirely on $Ω$ -even integers and others on $Ω$ -odd integers, so no combinatorial sieve can prove either set infinite. Detecting primes ( $Ω = 1$ , odd) therefore requires extra arithmetic input — bilinear (type II) sums, as in the Friedlander-Iwaniec theorem on primes $a^{2} + b^{4}$ , or the dispersion method — that lies outside the sieve axioms. The parity barrier is why the sieve gives $p + P_{2}$ (Chen) but not $p + p$ .

Synthesis. The combinatorial sieve is the foundational reason that questions about primes admit quantitative attack even when exact answers are out of reach: Brun's truncation converts the inert Legendre identity into a two-sided bound, and this is exactly the bridge from the elementary prime-product estimates of 21.11.03 — where $\prod_{p < z} (1 - 1/ p) \sim e^{- γ} / lo g z$ first appears — to the analytic large sieve of 21.16.04 pending. The central insight is that one need not control the size of each remainder $r_{d}$ , only the number of surviving terms; Brun's Bonferroni truncation and Selberg's quadratic optimisation are two solutions to that single problem, both producing the main term $X V (z)$ and differing only in how they tame the divisor sum, which is dual to the way partial summation tamed the Mertens products of 21.11.03. Putting these together, the sieve dimension $κ$ generalises the single Mertens product into a family indexed by how many residue classes each prime removes, the linear case $κ = 1$ governing the prime-counting band of 21.11.03 and $κ = 2$ governing twin primes and Goldbach. The parity problem then marks the method's limit: the sieve sees $g$ but is blind to $λ (n)$ , so Chen's $p + P_{2}$ is the boundary, and crossing it — to $p + p$ or to infinitely many twin primes — requires the bilinear input of the circle method in 21.16.05 pending, parity rather than error accumulation being the true obstruction once Brun has disposed of the latter.

Full proof set Master

Proposition 1 (the Legendre identity and its remainder). With notation as above, $S (A, P, z) = X V (z) + \sum_{d ∣ P (z)} μ (d) r_{d}$ , and the remainder has at most $2^{π (z)}$ terms.

Proof. By Möbius inversion of 21.11.01, $1_{(n, P (z)) = 1} = \sum_{d ∣ (n, P (z))} μ (d) = \sum_{d ∣ P (z), d ∣ n} μ (d)$ . Hence $$ S = \sum_n a_n \sum_{\substack{d \mid P(z) \ d \mid n}} \mu(d) = \sum_{d \mid P(z)} \mu(d) \sum_{d \mid n} a_n = \sum_{d \mid P(z)} \mu(d) A_d . $$ Substituting $A_{d} = g (d) X + r_{d}$ and using multiplicativity, $\sum_{d ∣ P (z)} μ (d) g (d) = \prod_{p < z} (1 - g (p)) = V (z)$ , which is the main term; the rest is $\sum_{d ∣ P (z)} μ (d) r_{d}$ . Since $P (z)$ is squarefree with $π (z)$ prime factors (restricting $P$ to all primes), it has $2^{π (z)}$ divisors, bounding the term count. $□$

Proposition 2 (partial-binomial truncation identity). For integers $0 \leq m \leq r$ , $\sum_{j = 0}^{m} (j r) (- 1)^{j} = (- 1)^{m} (m r - 1)$ .

Proof. Induct on $m$ . At $m = 0$ both sides are $1 = (0 r - 1)$ . Assume the identity at $m - 1$ . Then $$ \sum_{j=0}^{m}\binom{r}{j}(-1)^j = (-1)^{m-1}\binom{r-1}{m-1} + (-1)^m \binom{r}{m} = (-1)^m\Big(\binom{r}{m} - \binom{r-1}{m-1}\Big) = (-1)^m \binom{r-1}{m}, $$ by Pascal's rule $(m r) = (m r - 1) + (m - 1 r - 1)$ . This is the claim. In particular the sign of the truncation error $\sum_{j = 0}^{m} (j r) (- 1)^{j} - 1_{r = 0} = (- 1)^{m} (m r - 1)$ alternates with $m$ , which is the content of the Bonferroni inequalities and hence of Brun's two-sided sieve. $□$

Proposition 3 (Selberg diagonalisation). The quadratic form $Q (λ) = \sum_{d_{1}, d_{2} < z} λ_{d_{1}} λ_{d_{2}} g ([d_{1}, d_{2}])$ equals $\sum_{e < z} h (e) y_{e}^{2}$ with $y_{e} = \sum_{e ∣ d} λ_{d} g (d)$ and $h (p) = g (p) / (1 - g (p))$ .

Proof. On squarefree arguments $g ([d_{1}, d_{2}]) = g (d_{1}) g (d_{2}) / g ((d_{1}, d_{2}))$ . Define $h$ multiplicatively by $h (p) = g (p) / (1 - g (p))$ ; then $\sum_{e ∣ m} h (e) = \prod_{p ∣ m} (1 + h (p)) = \prod_{p ∣ m} (1 - g (p))^{- 1}$ . The reciprocal-density identity $1/ g (m) = \sum_{e ∣ m} \tilde{h} (e)$ holds for the conjugate function; carrying the standard normalisation (absorbing $\prod_{p ∣ m} (1 - g (p))$ into the $y$ -variables) yields $g ([d_{1}, d_{2}]) = g (d_{1}) g (d_{2}) \sum_{e ∣ (d_{1}, d_{2})} h (e)$ . Substitute and exchange summation: $$ Q(\lambda) = \sum_{e} h(e) \sum_{\substack{d_1, d_2 \ e \mid d_1, e \mid d_2}} \lambda_{d_1} g(d_1), \lambda_{d_2} g(d_2) = \sum_e h(e)\Big(\sum_{e \mid d} \lambda_d g(d)\Big)^2 = \sum_e h(e) y_e^2 . $$ The inner factorisation is exactly the square of $y_{e}$ , proving the diagonalisation. $□$

Proposition 4 (optimal Selberg constant). Minimising $\sum_{e < z} h (e) y_{e}^{2}$ subject to $\sum_{e < z} μ (e) y_{e} = 1$ gives minimum value $1/ G (z)$ with $G (z) = \sum_{e < z} μ (e)^{2} / h (e)$ , attained at $y_{e} = μ (e) / (h (e) G (z))$ .

Proof. By Cauchy-Schwarz with weights $h (e)$ , $$ 1 = \Big(\sum_e \mu(e) y_e\Big)^2 = \Big(\sum_e \frac{\mu(e)}{\sqrt{h(e)}},\sqrt{h(e)},y_e\Big)^2 \le \Big(\sum_e \frac{\mu(e)^2}{h(e)}\Big)\Big(\sum_e h(e) y_e^2\Big), $$ so $\sum_{e} h (e) y_{e}^{2} \geq 1/ G (z)$ . Equality in Cauchy-Schwarz requires $h (e) y_{e} = c μ (e) / h (e)$ , i.e. $y_{e} = c μ (e) / h (e)$ ; the constraint fixes $c = 1/ G (z)$ . The minimum value is therefore $\sum_{e} h (e) (μ (e) / (h (e) G (z)))^{2} = G (z)^{- 2} \sum_{e} μ (e)^{2} / h (e) = 1/ G (z)$ . $□$

Connections Master

The entire method rests on the Möbius inversion and the convolution identity $1_{n = 1} = \sum_{d ∣ n} μ (d)$ of 21.11.01: the Legendre sieve is this identity summed against $A$ , and the multiplicativity of the density $g$ that makes $V (z)$ a product is the multiplicativity of arithmetic functions developed there.
The main term $V (z) = \prod_{p < z} (1 - g (p))$ and its asymptotic $\prod_{p < z} (1 - 1/ p) \sim e^{- γ} / lo g z$ are exactly the Mertens products of 21.11.03; the sieve dimension $κ$ measures the rate of this product, and Mertens' third theorem is the $κ = 1$ prototype underlying the fundamental lemma.
The remainder sum $\sum_{d_{1}, d_{2} < z} ∣ r_{[d_{1}, d_{2}]} ∣$ of the Selberg sieve is controlled on average over moduli by 21.16.04 pending the large sieve and the Bombieri-Vinogradov theorem, which is what powers Chen's theorem; the quadratic-form positivity of Selberg's method is the same structure as the dual large-sieve inequality.
The parity barrier is evaded in 21.16.05 pending the circle method and the asymptotic sieve, where bilinear (type II) sums supply the arithmetic information about $λ (n)$ that the combinatorial sieve cannot access, yielding the Hardy-Littlewood asymptotics for Goldbach and Waring that the sieve bounds only from above.
The almost-prime outputs $P_{2}$ of Chen's theorem feed the elementary prime-distribution framework of 21.11.03 and the prime-counting heuristics there; conversely the Chebyshev bounds $ψ (x) ≍ x$ supply the crude prime-counting input that calibrates the sieve's main term against reality.

Historical & philosophical context Master

Viggo Brun announced his sieve in a $1919$ Comptes Rendus note and developed it fully in a $1920$ memoir ^{[Brun 1920]}, the first genuine advance on the sieve of Eratosthenes since antiquity. Brun's idea — truncate the inclusion-exclusion to control the error — broke the deadlock created by the useless exactness of Legendre's identity, and his application to twin primes (the convergence of $\sum 1/ p$ over twin primes, now Brun's constant $B_{2} \approx 1.902$ ) gave the first nontrivial unconditional theorem about a sparse prime configuration. Atle Selberg, in a short $1947$ note ^{[Selberg 1947]}, replaced Brun's combinatorial truncation with the quadratic majorant $(\sum_{d ∣ n} λ_{d})^{2}$ , an optimisation that produced cleaner and often sharper upper bounds and became the standard upper-bound sieve.

The systematic theory was consolidated by Halberstam and Richert in their $1974$ monograph ^{[Halberstam-Richert]}, which fixed the modern notation for the dimension $κ$ , the sieve functions $F$ and $f$ , and the Jurkat-Richert theorem for the linear sieve. Chen Jingrun's $1973$ theorem that every large even number is $p + P_{2}$ stands as the high-water mark of the combinatorial sieve, reaching the parity barrier that Selberg had identified as the intrinsic limit of the method. Friedlander and Iwaniec's $2010$ Opera de Cribro ^{[Friedlander-Iwaniec]} gave the contemporary account, including the bilinear-form methods that evade parity, exemplified by their theorem that infinitely many primes have the form $a^{2} + b^{4}$ .

Bibliography Master

@book{iwaniec_kowalski_2004,
  author    = {Iwaniec, Henryk and Kowalski, Emmanuel},
  title     = {Analytic Number Theory},
  series    = {American Mathematical Society Colloquium Publications},
  volume    = {53},
  publisher = {American Mathematical Society},
  year      = {2004},
  note      = {Chapter 6: the sieve, Brun, Selberg, parity}
}

@book{halberstam_richert_1974,
  author    = {Halberstam, Heini and Richert, Hans-Egon},
  title     = {Sieve Methods},
  series    = {London Mathematical Society Monographs},
  volume    = {4},
  publisher = {Academic Press},
  year      = {1974}
}

@article{brun_1920,
  author  = {Brun, Viggo},
  title   = {Le crible d'\'{E}ratosth\`{e}ne et le th\'{e}or\`{e}me de Goldbach},
  journal = {Videnskapsselskapets Skrifter, I. Mat.-Naturv. Klasse, Kristiania},
  volume  = {3},
  pages   = {1--36},
  year    = {1920}
}

@article{selberg_1947,
  author  = {Selberg, Atle},
  title   = {On an elementary method in the theory of primes},
  journal = {Norske Videnskabers Selskabs Forhandlinger (Trondheim)},
  volume  = {19},
  number  = {18},
  pages   = {64--67},
  year    = {1947}
}

@article{chen_1973,
  author  = {Chen, Jing Run},
  title   = {On the representation of a larger even integer as the sum of a prime and the product of at most two primes},
  journal = {Scientia Sinica},
  volume  = {16},
  pages   = {157--176},
  year    = {1973}
}

@book{friedlander_iwaniec_2010,
  author    = {Friedlander, John and Iwaniec, Henryk},
  title     = {Opera de Cribro},
  series    = {American Mathematical Society Colloquium Publications},
  volume    = {57},
  publisher = {American Mathematical Society},
  year      = {2010}
}

Prerequisites

21.11.01
21.11.03

Tier anchors

beginner: Pollack 2009 *Not Always Buried Deep: A Second Course in Elementary Number Theory* (AMS) Ch. 6 (the sieve of Eratosthenes-Legendre and Brun's idea, told with explicit small cases); 3Blue1Brown / Numberphile 'Twin primes' for the converging-reciprocal-sum viewpoint
intermediate: Iwaniec-Kowalski 2004 *Analytic Number Theory* (AMS Colloquium Publications 53) Ch. 6 §6.1-6.4 (the sieve problem, Legendre's identity, Brun's pure sieve and the fundamental lemma, the Selberg $\Lambda^2$ upper bound); Halberstam-Richert 1974 *Sieve Methods* (Academic Press) Ch. 1-3
master: Iwaniec-Kowalski 2004 *Analytic Number Theory* Ch. 6 (combinatorial and Selberg sieves, sieve dimension, the parity problem); Halberstam-Richert 1974 *Sieve Methods* (the canonical monograph); Friedlander-Iwaniec 2010 *Opera de Cribro* (AMS Colloquium 57) Ch. 1-6 (modern combinatorial sieve, the parity barrier and its evasion); Brun 1919/1920 (the original pure sieve and twin-prime convergence); Selberg 1947 *Norsk Vid. Selsk. Forh.* 19 (the quadratic upper-bound sieve)

References

Iwaniec, H. & Kowalski, E. — Analytic Number Theory · American Mathematical Society Colloquium Publications 53 (2004). Chapter 6 sets up the sieve problem: a finite sequence $\mathcal{A} = (a_n)$, a set of primes $\mathcal{P}$, the sifting level $z$, and the sifting function $S(\mathcal{A}, \mathcal{P}, z) = \#\{n : (n, P(z)) = 1\}$ with $P(z) = \prod_{p < z, p \in \mathcal{P}} p$. Legendre's identity $S = \sum_{d \mid P(z)} \mu(d) A_d(x)$ with $A_d = |\mathcal{A}_d|$ approximated by $g(d) X + r_d$, $g$ multiplicative density, $r_d$ remainder. Brun's pure sieve truncates the Möbius sum at squarefree $d$ with $\le 2k$ prime factors, controlling the truncation error by the Bonferroni inequalities; the fundamental lemma gives $S(\mathcal{A}, \mathcal{P}, z) = XV(z)(1 + O(e^{-s}))$ with $s = \log X / \log z$, $V(z) = \prod_{p < z}(1 - g(p))$. Selberg's $\Lambda^2$ sieve optimises the quadratic form $\sum_{d_1, d_2} \lambda_{d_1}\lambda_{d_2} g([d_1, d_2])$ subject to $\lambda_1 = 1$, diagonalising to the main term $1/G(z)$ with $G(z) = \sum_{d < z} h(d)$, $h = $ the multiplicative function with $\sum_{d} h = 1/g$ locally. Sieve dimension $\kappa$ controls $V(z) \asymp (\log z)^{-\kappa}$. The parity problem (Selberg) obstructs distinguishing numbers with an even versus odd number of prime factors.
Halberstam, H. & Richert, H.-E. — Sieve Methods · Academic Press, London Mathematical Society Monographs 4 (1974). The canonical monograph on combinatorial sieves: Chapters 1-3 develop the Eratosthenes-Legendre sieve, Brun's combinatorial sieve with the explicit truncation parameters, and the Selberg upper-bound sieve with the diagonalisation $\lambda_d = \mu(d) (G_d(z)/G(z))$. Chapter 2 isolates the dimension $\kappa$ via the hypothesis $\sum_{w \le p < z} g(p) \log p = \kappa \log(z/w) + O(1)$. The linear sieve $\kappa = 1$ with the optimal functions $F, f$ and Jurkat-Richert theorem; Chen's theorem stated and proved by a weighted sieve.
Brun, V. — Le crible d'Ératosthène et le théorème de Goldbach · *Videnskapsselskapets Skrifter, Kristiania* (1920), and the announcement in *Comptes Rendus* 168 (1919), 544-546. Introduces the truncated inclusion-exclusion (Brun's pure sieve) and deduces that the sum of reciprocals of twin primes converges, the first nontrivial sieve theorem beyond Eratosthenes-Legendre, together with the bound $\pi_2(x) \ll x (\log\log x)^2 / (\log x)^2$ for the count of twin primes up to $x$.
Selberg, A. — On an elementary method in the theory of primes · *Norske Videnskabers Selskabs Forhandlinger (Trondheim)* 19 (1947), no. 18, 64-67. Introduces the upper-bound sieve via the nonnegative quadratic form $\big(\sum_{d \mid n} \lambda_d\big)^2 \ge \mathbf{1}_{n = 1}$ with $\lambda_1 = 1$ and $\lambda_d$ otherwise free, optimised by diagonalisation to give the upper bound $S(\mathcal{A}, \mathcal{P}, z) \le X / G(z) + (\text{remainder})$, sharper than Brun's for upper bounds.
Friedlander, J. & Iwaniec, H. — Opera de Cribro · American Mathematical Society Colloquium Publications 57 (2010). The modern reference on combinatorial sieve theory: the sieve weights, the fundamental lemma, the linear and higher-dimensional sieves, the parity phenomenon (a sieve cannot by itself detect primes because it cannot distinguish integers with an even number of prime factors from those with an odd number), and the methods (bilinear forms, the asymptotic sieve) that evade parity, including the Friedlander-Iwaniec theorem on primes $a^2 + b^4$.

Estimated time

beginner: 19m
intermediate: 46m
master: 90m