40.06.06 · combinatorics / design-coding-theory

Linear Codes and the Hamming, Singleton, and Gilbert-Varshamov Bounds

shipped3 tiersLean: none

Anchor (Master): van Lint-Wilson 2001 *A Course in Combinatorics* 2e (Cambridge) Ch. 18-20 (linear codes, weight enumerators, MacWilliams, the bounds); MacWilliams-Sloane 1977 *The Theory of Error-Correcting Codes* (North-Holland) Ch. 1-5, 19 (the definitive treatment of weight enumerators and the bounds); Hamming 1950 *Bell System Tech. J.* 29 (the sphere-packing bound and the original single-error-correcting codes); Gilbert 1952 *Bell System Tech. J.* 31; Varshamov 1957 *Dokl. Akad. Nauk SSSR* 117; Singleton 1964 *IEEE Trans. Inform. Theory* 10; MacWilliams 1963 (the weight-enumerator duality)

Intuition Beginner

When a message travels down a noisy channel, some of its symbols flip. If you send the single bit $1$ and it arrives as $0$ , the receiver has no way to notice, let alone repair the damage. The fix is to add planned redundancy: instead of one bit, send a longer string in which the extra symbols are bound to the message by fixed rules. If noise breaks one of those rules, the receiver sees that something is wrong, and if only a little noise occurred, the receiver can even guess back the original. A code is a chosen collection of allowed strings, called codewords; everything you ever transmit is one of them.

The key number is how far apart the codewords sit. Measure the distance between two strings by counting the positions where they differ. If every pair of codewords differs in at least $d$ positions, then a handful of flipped symbols cannot carry one codeword all the way to another. The received string stays closer to the codeword you sent than to any other, so decoding to the nearest codeword recovers the message. The larger $d$ is, the more errors you survive.

This sets up a tug of war. You want many codewords, so you can send many different messages; you want them long-spaced, so you correct many errors; and you want the strings short, so you waste little channel. These three wishes fight each other, and the whole subject is about the exact terms of the trade. The bounds in this unit are the rules of that fight: how good a code is even possible, and how good a code is guaranteed to exist.

Visual Beginner

Picture every string of length $n$ as a point, and draw an edge between two strings that differ in exactly one position. The codewords are a scattered handful of these points, deliberately spread out. Around each codeword draw a ball: all the strings within a small number of flips of it. If the balls do not overlap, then any lightly corrupted string sits inside exactly one ball, and you decode it to that ball's centre.

string	differs from $00000$ in	inside the ball of $00000$ ?
$00000$	$0$ positions	yes (it is the centre)
$10000$	$1$ position	yes, if the ball has radius $1$
$11000$	$2$ positions	no, if the ball has radius $1$

The packing question is geometric: how many non-overlapping balls of a given radius fit in the space of strings? Too many codewords and the balls collide, so decoding becomes ambiguous. The bounds count how the balls fit.

Worked example Beginner

Build the simplest useful code by hand: the binary repetition code that sends each bit three times. The two codewords are $000$ and $111$ . The allowed strings have length $n = 3$ , and there are $2$ of them, one for the message $0$ and one for the message $1$ .

Step 1. Find the distance between the two codewords. Compare $000$ and $111$ position by position: they differ in all $3$ places. So the minimum distance is $d = 3$ .

Step 2. Decide how many flips you survive. With $d = 3$ , two codewords are $3$ apart, so a received string within $1$ flip of a codeword is still strictly closer to that codeword than to the other. The code corrects $1$ error.

Step 3. Decode a received word. Suppose $101$ arrives. Count its distance to each codeword: it differs from $000$ in $2$ positions and from $111$ in $1$ position. The nearer codeword is $111$ , so you decode to $111$ , recovering the message bit $1$ .

Step 4. Decode by majority. Notice the rule is just a majority vote: $101$ has two $1$ s and one $0$ , so the answer is $1$ . Nearest-codeword decoding and majority voting agree here.

What this tells us: spending three symbols to send one bit buys the power to fix any single flip. The price is the rate $1/3$ — one message bit per three channel symbols. Every code is a deal of this shape, and the bounds say how good a deal you can strike.

Check your understanding Beginner

Exercise (easy, true-false).

True or false: the binary repetition code ${000, 111}$ can detect that two of its three symbols were flipped, even though it cannot always correct two flips.

Hint

If you send $000$ and two symbols flip, you receive a string with two $1$ s. Is that string one of the two codewords?

Answer

True. Sending $000$ and flipping two symbols gives something like $110$ , which is neither $000$ nor $111$ , so the receiver knows an error occurred — detection succeeds. But $110$ is closer to $111$ than to $000$ , so nearest-codeword decoding would correct it the wrong way. With $d = 3$ the code detects up to $2$ errors but corrects only $1$ . Feedback-correct: detection asks only whether the word is a codeword; correction asks which codeword is nearest. Feedback-wrong: detecting and correcting are different powers, and a code generally detects more than it corrects.

Formal definition Intermediate+

Throughout, $F_{q}$ is the finite field with $q$ elements (its existence and structure are developed in 21.02.01; here it is applied, not reconstructed), and $F_{q}^{n}$ is the standard $n$ -dimensional vector space of row vectors, whose linear-algebraic theory is taken from 01.01.04. The integer $q \geq 2$ is a prime power.

Definition (linear code). A linear code of length $n$ over $F_{q}$ is a $k$ -dimensional linear subspace $C \subseteq F_{q}^{n}$ . Its elements are codewords; the parameters are written $[n, k]_{q}$ , and $∣ C ∣ = q^{k}$ . The rate is $R = k / n$ .

Definition (Hamming weight and distance). For $x, y \in F_{q}^{n}$ , the Hamming distance $d (x, y) = ∣ {i : x_{i} \neq = y_{i}} ∣$ counts the positions in which $x$ and $y$ differ; it is a metric. The Hamming weight $wt (x) = d (x, 0)$ counts the nonzero coordinates. The minimum distance of $C$ is $$ d(C) = \min_{\substack{x, y \in C \ x \neq y}} d(x, y). $$

Lemma (linearity collapses distance to weight). Because $C$ is closed under subtraction, $d (x, y) = wt (x - y)$ with $x - y \in C$ , so $$ d(C) = \min_{\substack{c \in C \ c \neq 0}} \mathrm{wt}(c). $$ A code with parameters $[n, k]_{q}$ and minimum distance $d$ is called an $[n, k, d]_{q}$ code. By the metric triangle inequality, nearest-codeword (minimum-distance) decoding corrects any error pattern of weight at most $t = ⌊(d - 1) /2 ⌋$ and detects any nonzero pattern of weight at most $d - 1$ .

Definition (generator and parity-check matrices). A generator matrix $G$ is a $k \times n$ matrix over $F_{q}$ whose rows are a basis of $C$ , so $C = {u G : u \in F_{q}^{k}}$ . The dual code is $C^{⊥} = {v \in F_{q}^{n} : v \cdot c = 0 for all c \in C}$ , where $v \cdot c = \sum_{i} v_{i} c_{i}$ ; it is an $[n, n - k]$ code. A parity-check matrix $H$ is an $(n - k) \times n$ generator matrix of $C^{⊥}$ , so that $$ C = { x \in \mathbb{F}q^n : H x^{\mathsf T} = 0 }. $$ When $G = [I_{k} ∣ A]$ is in standard form, $H = [, -A^{\mathsf T} \mid I{n-k} ,] $i s a p a r i t y - c h ec k ma t r i x, s in ce t h e n$ G H^{\mathsf T} = -A + A = 0$.

Definition (syndrome and standard array). The syndrome of a received word $y \in F_{q}^{n}$ is $S (y) = H y^{T} \in F_{q}^{n - k}$ . Two words have the same syndrome iff they lie in the same coset of $C$ ; writing $y = c + e$ with $c \in C$ , the syndrome $S (y) = H e^{T}$ depends only on the error $e$ . Syndrome decoding fixes, in each coset, a coset leader of least weight, and decodes $y$ to $y - (the coset leader of S (y))$ . The table of cosets, with leaders down the first column, is the standard array.

Counterexamples to common slips Intermediate+

Minimum distance is a property of the whole code, not of one pair. It is the least weight over all nonzero codewords; finding two codewords at distance $3$ does not establish $d = 3$ unless no nonzero codeword has smaller weight.
The dual code is not the set-complement. $C^{⊥}$ is the orthogonal subspace under the standard bilinear form, and over $F_{q}$ a code may be self-dual ( $C = C^{⊥}$ , forcing $n = 2 k$ ) or self-orthogonal ( $C \subseteq C^{⊥}$ ); a codeword can be orthogonal to itself since $c \cdot c = \sum c_{i}^{2}$ may vanish in characteristic dividing the weight.
Parity-check rows count, not parity-check equations chosen ad hoc. Any $(n - k) \times n$ matrix of rank $n - k$ whose row space is $C^{⊥}$ is a valid $H$ ; different choices give the same code but different syndromes, hence different standard arrays.
A generator matrix need not be in standard form. Row operations and column permutations bring $G$ to $[I_{k} ∣ A]$ only up to a permutation-equivalent code; the permutation changes which coordinates are message symbols but preserves all parameters $[n, k, d]$ .

Key theorem with proof Intermediate+

The structural fact that turns minimum distance into linear algebra is the parity-check criterion: the distance of a code is read directly off the columns of any parity-check matrix. Everything downstream — the Singleton bound, MDS codes, the construction half of Gilbert-Varshamov — flows from it.

Theorem (parity-check distance criterion). Let $C$ be an $[n, k]_{q}$ code with parity-check matrix $H$ , and let $d \geq 1$ . Then $d (C) \geq d$ if and only if every set of $d - 1$ columns of $H$ is linearly independent. Equivalently, $d (C)$ equals the least number of linearly dependent columns of $H$ ^{[van Lint 1999]}.

Proof. Write the columns of $H$ as $h_{1}, \dots, h_{n} \in F_{q}^{n - k}$ . A vector $x \in F_{q}^{n}$ lies in $C$ iff $H x^{T} = 0$ , that is, iff $\sum_{i = 1}^{n} x_{i} h_{i} = 0$ . Let $c \in C$ be nonzero with support $supp (c) = {i : c_{i} \neq = 0}$ of size $w = wt (c)$ . Then $\sum_{i \in supp (c)} c_{i} h_{i} = 0$ is a nonzero linear dependence among exactly $w$ columns of $H$ , with all coefficients $c_{i}$ nonzero. Conversely, any linear dependence $\sum_{i \in T} a_{i} h_{i} = 0$ with $T \neq = \emptyset$ and all $a_{i} \neq = 0$ defines a codeword $c$ with $c_{i} = a_{i}$ for $i \in T$ and $c_{i} = 0$ otherwise, of weight $∣ T ∣$ .

Hence the weights of nonzero codewords are exactly the sizes of minimal sets of columns of $H$ admitting a dependence with all coefficients nonzero — which, after discarding zero coefficients, are the sizes of linearly dependent column sets. The minimum weight $d (C)$ is therefore the least size of a linearly dependent set of columns. In particular $d (C) \geq d$ exactly when no $d - 1$ columns are dependent, i.e. every $d - 1$ columns are independent. $□$

Corollary (Singleton bound). Every $[n, k, d]_{q}$ code satisfies $d \leq n - k + 1$ ^{[Singleton 1964]}. The matrix $H$ has $n - k$ rows, so any $n - k + 1$ columns are linearly dependent. By the criterion the least dependent column set has size at most $n - k + 1$ , i.e. $d \leq n - k + 1$ . A code meeting this with equality, $d = n - k + 1$ , is maximum distance separable (MDS).

Bridge. The parity-check criterion builds toward all three classical bounds by converting a metric question into a question about column rank, and it appears again in the master-tier construction of Gilbert-Varshamov, where a parity-check matrix is grown one column at a time so that no small set of columns becomes dependent. The foundational reason the Singleton bound holds is exactly this: a matrix of $n - k$ rows cannot keep more than $n - k$ columns independent, so the simple case of full-rank linear algebra already caps the distance. This is dual to the Hamming bound, which limits distance from below by counting volume rather than rank: where Singleton is an upper bound on $d$ forced by the row count of $H$ , sphere-packing is an upper bound on $∣ C ∣$ forced by the radius of decoding balls. Putting these together, an MDS code is the case where the rank obstruction is the only obstruction, and the central insight — that the columns of $H$ encode the code's whole error-correcting geometry — is what generalises, in the master tier, into the weight enumerator and its MacWilliams dual.

Exercises Intermediate+

Exercise 3 (medium, symbolic).

Let $C$ be the binary code with parity-check matrix $H = (1001111001)$ . Find $n$ , $k$ , and $d$ , and write down a generator matrix.

Hint

$H$ is $(n - k) \times n$ . Use the distance criterion on the columns; then read $G = [I_{k} ∣ A]$ from $H = [A^{T} ∣ I_{n - k}]$ after suitable column ordering, or directly find a basis of ${x : H x^{T} = 0}$ .

Answer

Here $H$ is $2 \times 5$ , so $n = 5$ , $n - k = 2$ , and $k = 3$ . The columns are $h_{1} = (1, 0)^{T}$ , $h_{2} = (0, 1)^{T}$ , $h_{3} = (1, 1)^{T}$ , $h_{4} = (1, 0)^{T}$ , $h_{5} = (0, 1)^{T}$ . No single column is zero, so $d \geq 2$ . Columns $1$ and $4$ are equal, giving $h_{1} + h_{4} = 0$ , a dependence of size $2$ , so $d = 2$ . Thus $C$ is a $[5, 3, 2]_{2}$ code. Writing the parity equations $x_{1} + x_{3} + x_{4} = 0$ and $x_{2} + x_{3} + x_{5} = 0$ , take $x_{1}, x_{2}, x_{3}$ free: a generator matrix is $$ G = \begin{pmatrix} 1 & 0 & 0 & 1 & 0 \ 0 & 1 & 0 & 0 & 1 \ 0 & 0 & 1 & 1 & 1 \end{pmatrix}, $$ whose rows satisfy $G H^{T} = 0$ . Rubric: full credit for correct $n, k, d$ and any valid $G$ with $G H^{T} = 0$ and rank $3$ .

Exercise 5 (medium, symbolic).

Show that a $q$ -ary MDS code with parameters $[n, k, n - k + 1]_{q}$ has the property that any $k$ coordinates may serve as message positions: the projection of $C$ onto any $k$ coordinates is a bijection onto $F_{q}^{k}$ .

Hint

If two codewords agreed on some $k$ coordinates, their difference would be a nonzero codeword vanishing on $k$ positions, hence of weight at most $n - k$ . Compare with the minimum distance.

Answer

Fix a set $T$ of $k$ coordinates and consider the projection $π_{T} : C \to F_{q}^{k}$ . Suppose $π_{T} (c) = π_{T} (c^{'})$ for codewords $c \neq = c^{'}$ . Then $c - c^{'}$ is a nonzero codeword that is zero on all $k$ positions of $T$ , so its weight is at most $n - k$ . But the minimum distance is $d = n - k + 1 > n - k$ , a contradiction. Hence $π_{T}$ is injective; since $∣ C ∣ = q^{k} = ∣ F_{q}^{k} ∣$ , it is a bijection. Thus every $k$ -subset of coordinates is an information set, which is the defining feature of MDS codes (and the reason Reed-Solomon codes, the canonical MDS family, support flexible erasure recovery). Rubric: full credit for the difference-codeword weight argument and the cardinality count giving bijectivity.

Exercise 6 (medium, numeric).

Use the Gilbert-Varshamov bound to show a binary linear code of length $n = 10$ and minimum distance $d = 3$ exists with dimension $k \geq 5$ . The sufficient condition is $V_{2} (n - 1, d - 2) < 2^{n - k}$ . Find the largest $n - k$ this forces and conclude the dimension.

Hint

Compute $V_{2} (9, 1) = 1 + 9 = 10$ . You need $2^{n - k} > 10$ , i.e. the redundancy $n - k$ satisfies $2^{n - k} > 10$ . Solve for the least $n - k$ .

Answer

The Varshamov sufficient condition for an $[n, k, d]_{2}$ code is $V_{2} (n - 1, d - 2) < 2^{n - k}$ . Here $V_{2} (9, 1) = 1 + 9 = 10$ , so it suffices that $2^{n - k} > 10$ , i.e. $n - k \geq 4$ (since $2^{4} = 16 > 10$ while $2^{3} = 8 < 10$ ). With $n = 10$ this gives a code of redundancy $4$ , hence dimension $k = 10 - 4 = 6 \geq 5$ . So a $[10, 6, 3]_{2}$ code is guaranteed to exist. Rubric: full credit for $V_{2} (9, 1) = 10$ , the least redundancy $n - k = 4$ , and dimension $k = 6$ .

Exercise 7 (hard, symbolic).

Prove the Gilbert-Varshamov bound in its parity-check (Varshamov) form: if $V_{q} (n - 1, d - 2) < q^{n - k}$ , then an $[n, k, d]_{q}$ linear code exists. Construct a parity-check matrix column by column.

Hint

Build $H$ with $n - k$ rows by choosing each new column to avoid all linear combinations of $d - 2$ or fewer previously chosen columns. Count how many vectors that excludes.

Answer

We construct an $(n - k) \times n$ matrix $H$ over $F_{q}$ such that every $d - 1$ columns are linearly independent; by the parity-check distance criterion the resulting code has minimum distance $\geq d$ . Choose columns $h_{1}, \dots, h_{n} \in F_{q}^{n - k}$ greedily. Having chosen $h_{1}, \dots, h_{j - 1}$ (with $j \leq n$ ), we must pick $h_{j}$ that is not a linear combination of any $d - 2$ or fewer of the previous columns — for then no set of $\leq d - 1$ columns including $h_{j}$ is dependent. The number of vectors expressible as a combination of at most $d - 2$ of the $j - 1 \leq n - 1$ earlier columns is at most $$ \sum_{i=0}^{d-2} \binom{n-1}{i}(q-1)^i = V_q(n-1, d-2), $$ counting choices of $i$ columns and nonzero coefficients (the zero combination is included as the $i = 0$ term). If $V_{q} (n - 1, d - 2) < q^{n - k}$ , this forbidden set is strictly smaller than all of $F_{q}^{n - k}$ , so a valid $h_{j}$ remains. Inductively all $n$ columns can be chosen, yielding $H$ with every $d - 1$ columns independent, hence an $[n, k, d]_{q}$ code (its dimension is $\geq n - rank (H) = k$ , taking $H$ of full row rank). Rubric: full credit for the column-avoidance count, the strict inequality giving a free vector, and the appeal to the distance criterion.

Exercise 8 (hard, symbolic).

Prove the Plotkin bound in the binary case: for an $[n, k, d]_{2}$ code with $2 d > n$ , $$ |C| \leq 2 \left\lfloor \frac{d}{2d - n} \right\rfloor. $$ Use the average pairwise distance.

Hint

Sum $d (x, y)$ over all ordered pairs of distinct codewords two ways: as $\geq M (M - 1) d$ where $M = ∣ C ∣$ , and column by column. In each column with $a$ zeros and $M - a$ ones, the pairs contribute $2 a (M - a) \leq M^{2} /2$ .

Answer

Let $M = ∣ C ∣$ and $S = \sum_{x \neq = y} d (x, y)$ over ordered pairs of distinct codewords. Since each unordered pair is at distance $\geq d$ , $S \geq M (M - 1) d$ . Computing $S$ by columns: in coordinate $j$ , let $a_{j}$ be the number of codewords with a $0$ there, so $M - a_{j}$ have a $1$ ; the ordered pairs differing in column $j$ number $2 a_{j} (M - a_{j})$ . Thus $S = \sum_{j = 1}^{n} 2 a_{j} (M - a_{j}) \leq \sum_{j = 1}^{n} \frac{M ^{2}}{2} = \frac{n M ^{2}}{2}$ , using $2 a (M - a) \leq M^{2} /2$ . Combining, $$ M(M-1) d \leq \tfrac{n M^2}{2} \quad\Longrightarrow\quad M(2d - n) \leq 2d, $$ after dividing by $M /2$ and rearranging. When $2 d > n$ the factor $2 d - n > 0$ , so $M \leq \frac{2 d}{2 d - n}$ . A parity/integrality refinement (splitting on the value of a fixed coordinate and applying the estimate to each half) sharpens this to $M \leq 2 ⌊ d / (2 d - n)⌋$ . Rubric: full credit for the two-way count of $S$ , the column estimate $2 a (M - a) \leq M^{2} /2$ , and the rearrangement to $M (2 d - n) \leq 2 d$ ; the floor refinement earns the final mark.

Advanced results Master

The bounds frame the achievable region; the weight enumerator records a code's full distance profile and is governed by a single duality, the MacWilliams identity. We collect the structural results: the asymptotic Gilbert-Varshamov bound, the sphere-packing bound and perfect codes, the weight enumerator and MacWilliams transform, and the place of MDS codes.

Theorem 1 (Hamming sphere-packing bound, $q$ -ary). Any $q$ -ary code (linear or not) of length $n$ , size $M$ , and minimum distance $d$ correcting $t = ⌊(d - 1) /2 ⌋$ errors satisfies $M \cdot V_{q} (n, t) \leq q^{n}$ , where $V_{q} (n, r) = \sum_{i = 0}^{r} (i n) (q - 1)^{i}$ is the volume of a Hamming ball of radius $r$ ^{[Hamming 1950]}. The $t$ -balls about codewords are disjoint and lie in $F_{q}^{n}$ . Equality holds exactly for perfect codes, where the balls tile the space; the binary and $q$ -ary Hamming codes, the binary $[23, 12, 7]$ and ternary $[11, 6, 5]$ Golay codes, the repetition codes, and the degenerate whole-space codes exhaust all perfect codes over fields (van Lint-Tietäväinen), pursued in the co-produced perfect-codes unit.

Theorem 2 (asymptotic Gilbert-Varshamov bound). Define the $q$ -ary entropy $H_{q} (δ) = δ lo g_{q} (q - 1) - δ lo g_{q} δ - (1 - δ) lo g_{q} (1 - δ)$ for $0 \leq δ \leq 1 - 1/ q$ . Then for every relative distance $δ \in (0, 1 - 1/ q)$ there exist (linear) codes of rate $R$ arbitrarily close to $1 - H_{q} (δ)$ with relative minimum distance $\geq δ$ , because $V_{q} (n, δ n) = q^{n H_{q} (δ) (1 + o (1))}$ and the Varshamov condition $V_{q} (n - 1, d - 2) < q^{n - k}$ becomes $R < 1 - H_{q} (δ) - o (1)$ ^{[Gilbert 1952]}. For $q < 49$ this remained, until the 1982 Tsfasman-Vlăduţ-Zink algebraic-geometry codes, the best known lower bound on achievable rate; for binary codes it still is.

Theorem 3 (weight enumerator). The weight enumerator of an $[n, k]_{q}$ code $C$ is the homogeneous polynomial $$ W_C(x, y) = \sum_{c \in C} x^{,n - \mathrm{wt}(c)}, y^{,\mathrm{wt}(c)} = \sum_{i=0}^{n} A_i, x^{n-i} y^{i}, $$ where $A_{i} = ∣ {c \in C : wt (c) = i} ∣$ is the weight distribution; $A_{0} = 1$ and $A_{i} = 0$ for $0 < i < d$ . The enumerator packages error probabilities, the undetected-error rate, and (via $A_{i}$ ) the minimum distance.

Theorem 4 (MacWilliams identity). For a linear $[n, k]_{q}$ code $C$ with dual $C^{⊥}$ , $$ W_{C^\perp}(x, y) = \frac{1}{|C|}, W_C\big(x + (q-1)y,; x - y\big) $$ ^{[MacWilliams-Sloane Ch. 5]}. Equivalently the dual weight distribution is $A_{i}^{⊥} = \frac{1}{∣ C ∣} \sum_{j = 0}^{n} A_{j} K_{i} (j)$ , where $K_{i} (j) = \sum_{s = 0}^{i} (- 1)^{s} (q - 1)^{i - s} (s j) (i - s n - j)$ is the $q$ -ary Krawtchouk polynomial. The proof is a discrete Fourier/Poisson-summation argument over the additive characters of $F_{q}^{n}$ : summing the character sum $\sum_{c \in C} χ (c \cdot v)$ over $C$ collapses to $∣ C ∣$ on $C^{⊥}$ and $0$ off it, and re-expanding the generating function in the weight variables produces the Krawtchouk transform. The identity lets one compute $C^{⊥}$ 's spectrum from $C$ 's without enumerating $C^{⊥}$ , and underlies the linear-programming (Delsarte) bound.

Theorem 5 (MDS duality and the singleton case). If $C$ is MDS with parameters $[n, k, n - k + 1]_{q}$ , then $C^{⊥}$ is MDS with parameters $[n, n - k, k + 1]_{q}$ , and the weight distribution of an MDS code is determined entirely by $n, k, q$ : $$ A_i = \binom{n}{i}\sum_{j=0}^{,i - d}(-1)^j\binom{i}{j}\big(q^{,i - d + 1 - j} - 1\big), \qquad i \geq d = n-k+1, $$ a closed form forced by the MacWilliams relations together with $A_{0} = 1$ , $A_{1} = \dots = A_{d - 1} = 0$ . Generalised Reed-Solomon codes realise MDS parameters for all $1 \leq k \leq n \leq q$ ; the MDS conjecture asserts $n \leq q + 1$ for MDS codes with $2 \leq k \leq q - 1$ (except two exceptions at even $q$ ), proved for $q$ prime by Ball (2012).

Synthesis. Putting these together, the three bounds and the weight enumerator are one circle of ideas pinned by the parity-check matrix. The Singleton bound is the rank ceiling on distance; the sphere-packing bound is the volume ceiling on size; the Gilbert-Varshamov bound is the volume floor guaranteeing existence — and the foundational reason all three speak the same language is that each counts column-configurations of $H$ , exactly as the distance criterion of the Key theorem demands. This is dual, in the precise sense of the MacWilliams identity, to a statement about $C^{⊥}$ : the weight enumerator of a code and of its dual determine each other through the Krawtchouk transform, so the geometry of decoding balls (sphere-packing) and the algebra of parity checks (Singleton) are two readings of the same generating function. The central insight is that an MDS code is the simultaneous extremal case — meeting Singleton with equality, self-dual in parameters, with a weight distribution forced by MacWilliams alone — and this is exactly where the achievable region of $(R, δ)$ touches its rank boundary. The bridge from these finite identities to the asymptotic theory is the entropy expansion of the ball volume, which turns Gilbert-Varshamov into the rate $1 - H_{q} (δ)$ and turns sphere-packing into the Hamming rate bound, framing Shannon's channel-coding theorem (stated as a pointer, its information-theoretic proof reserved) as the probabilistic completion of this combinatorial picture.

Full proof set Master

Proposition 1 (dimension of the dual; reflexivity). For an $[n, k]_{q}$ code $C$ , $dim C^{⊥} = n - k$ and $(C^{⊥})^{⊥} = C$ .

Proof. The bilinear form $⟨ u, v ⟩ = \sum_{i} u_{i} v_{i}$ on $F_{q}^{n}$ is nondegenerate. The map $Φ : F_{q}^{n} \to (F_{q}^{n})^{*}$ , $Φ (v) = ⟨ v, - ⟩$ , is a linear isomorphism (nondegeneracy), and $C^{⊥} = Φ^{- 1} (Ann (C))$ where $Ann (C) = {φ : φ ∣_{C} = 0}$ . By the rank-nullity / annihilator dimension formula of 01.01.05, $dim Ann (C) = n - dim C = n - k$ , so $dim C^{⊥} = n - k$ . Applying this to $C^{⊥}$ gives $dim (C^{⊥})^{⊥} = n - (n - k) = k = dim C$ , and $C \subseteq (C^{⊥})^{⊥}$ holds by definition of the dual; equal finite dimensions force $(C^{⊥})^{⊥} = C$ . $□$

Proposition 2 (Singleton bound and the MDS dual is MDS). Every $[n, k, d]_{q}$ code has $d \leq n - k + 1$ , and if equality holds then $C^{⊥}$ is $[n, n - k, k + 1]_{q}$ , again meeting Singleton.

Proof. The Singleton inequality is the Corollary to the Key theorem: a parity-check matrix has $n - k$ rows, so any $n - k + 1$ columns are dependent, forcing $d \leq n - k + 1$ . Suppose $d = n - k + 1$ (MDS). Project $C$ onto any $k$ coordinates: as shown for MDS codes, no nonzero codeword vanishes on $k$ coordinates (such a word would have weight $\leq n - k < d$ ), so the projection is injective, hence bijective onto $F_{q}^{k}$ , and every $k$ -subset of coordinates is an information set. Dually, a generator matrix $G$ of $C$ has every $k$ columns of rank $k$ (independent). The parity-check matrix $H$ of $C$ is a generator matrix of $C^{⊥}$ ; the MDS property of $C$ states every $k$ columns of $G$ are independent, equivalently (a standard determinantal duality for complementary minors) every $n - k$ columns of $H$ are independent, so $C^{⊥}$ has minimum distance $\geq (n - k) + 1 = k + 1$ . Singleton for $C^{⊥}$ gives $d^{⊥} \leq n - (n - k) + 1 = k + 1$ , so $d^{⊥} = k + 1$ and $C^{⊥}$ is MDS. $□$

Proposition 3 (sphere-packing bound; equality is perfection). Any code of length $n$ , size $M$ , minimum distance $d$ over $F_{q}$ satisfies $M \sum_{i = 0}^{t} (i n) (q - 1)^{i} \leq q^{n}$ with $t = ⌊(d - 1) /2 ⌋$ , with equality iff the radius- $t$ balls about codewords partition $F_{q}^{n}$ .

Proof. For $c \neq = c^{'}$ in $C$ , the balls $B (c, t)$ and $B (c^{'}, t)$ are disjoint: a common point $z$ would give $d (c, c^{'}) \leq d (c, z) + d (z, c^{'}) \leq 2 t \leq d - 1 < d$ , contradicting the minimum distance. Each ball has exactly $V_{q} (n, t) = \sum_{i = 0}^{t} (i n) (q - 1)^{i}$ points — choose $i$ of the $n$ coordinates to alter and one of $q - 1$ nonzero changes in each. The $M$ disjoint balls lie in $F_{q}^{n}$ , so $M \cdot V_{q} (n, t) \leq q^{n}$ . Equality holds iff the balls exhaust $F_{q}^{n}$ with no leftover points, i.e. every word is within $t$ of a unique codeword — the definition of a perfect code. $□$

Proposition 4 (Gilbert-Varshamov existence, Varshamov form). If $V_{q} (n - 1, d - 2) < q^{n - k}$ , then an $[n, k, d]_{q}$ linear code exists.

Proof. Construct a full-row-rank $(n - k) \times n$ parity-check matrix $H$ with every $d - 1$ columns independent, by the greedy column choice of Exercise 7: having placed $j - 1 \leq n - 1$ columns, the vectors that are linear combinations of at most $d - 2$ of them number at most $\sum_{i = 0}^{d - 2} (i n - 1) (q - 1)^{i} = V_{q} (n - 1, d - 2)$ ; the hypothesis makes this strictly less than $q^{n - k} = ∣ F_{q}^{n - k} ∣$ , so a column $h_{j}$ outside every such combination exists, and adding it keeps every $\leq d - 1$ columns independent. After $n$ steps the code $C = {x : H x^{T} = 0}$ has, by the parity-check distance criterion, minimum distance $\geq d$ and dimension $n - rank (H) = k$ . $□$

Proposition 5 (MacWilliams identity). For a linear $[n, k]_{q}$ code, $W_{C^{⊥}} (x, y) = ∣ C ∣^{- 1} W_{C} (x + (q - 1) y, x - y)$ .

Proof. Fix a nonprincipal additive character $χ$ of $F_{q}$ . For $v \in F_{q}^{n}$ define $g (v) = x^{n - wt (v)} y^{wt (v)}$ and its character transform $\overset{g}{^} (u) = \sum_{v \in F_{q}^{n}} χ (⟨ u, v ⟩) g (v)$ . Because $g$ factors over coordinates, $\overset{g}{^}$ does too: in each coordinate $\sum_{a \in F_{q}} χ (u_{i} a) x^{[a = 0]} y^{[a \neq = 0]}$ equals $x + (q - 1) y$ when $u_{i} = 0$ and $x - y$ when $u_{i} \neq = 0$ (since $\sum_{a \neq = 0} χ (u_{i} a) = - 1$ for $u_{i} \neq = 0$ ). Hence $\overset{g}{^} (u) = (x + (q - 1) y)^{n - wt (u)} (x - y)^{wt (u)}$ . Now apply the Poisson summation formula for the finite group $F_{q}^{n}$ and the subgroup $C$ : $$ \sum_{u \in C^\perp} g(u) ;=; \frac{1}{|C|}\sum_{v \in C}\hat g(v). $$ The left side is $W_{C^{⊥}} (x, y)$ . The right side is $∣ C ∣^{- 1} \sum_{v \in C} (x + (q - 1) y)^{n - wt (v)} (x - y)^{wt (v)} = ∣ C ∣^{- 1} W_{C} (x + (q - 1) y, x - y)$ . The two are equal. $□$

Connections Master

The parity-check distance criterion rests on the rank theory of finite-dimensional spaces in 01.01.05 (rank-nullity) and 01.01.04 (subspace, basis, dimension): a code is a subspace, its dual is the annihilator under the standard form, and $dim C + dim C^{⊥} = n$ is the rank-nullity identity applied to the syndrome map $x \mapsto H x^{T}$ . The foundational reason the bounds are linear-algebraic, not merely combinatorial, is that the whole error-geometry is encoded in the column space of $H$ .
The arithmetic of the alphabet is the finite-field theory of 21.02.01: codes over $F_{q}$ inherit MDS constructions (Reed-Solomon codes evaluate polynomials at field elements, so $n \leq q$ or $q + 1$ ) and the MacWilliams proof runs over the additive characters of $F_{q}^{n}$ , where the nondegenerate trace form supplies the Fourier duality. This is dual to the multiplicative structure used in cyclic codes, generated by factorisations of $x^{n} - 1$ over $F_{q}$ .
The co-produced unit 40.06.07 on perfect and Hamming codes is the equality case of the sphere-packing bound proved here: the Hamming codes are exactly the single-error-correcting perfect codes whose parity-check columns enumerate all nonzero vectors of $F_{q}^{n - k}$ , and the Golay codes are the remaining nondegenerate perfect codes. This unit supplies the sphere-packing inequality; 40.06.07 characterises when it is met.
The co-produced unit 40.06.05 on Hadamard matrices and codes connects through the Plotkin bound and its equality case: a Hadamard matrix of order $4 m$ yields a $[4 m, lo g_{2} (8 m), 2 m]$ code meeting the Plotkin bound, the high-distance regime $2 d > n$ where the bounds of this unit are sharpest, and the same matrices build the symmetric $2$ -designs of this chapter, tying codes to designs through their incidence structure.

Historical & philosophical context Master

Coding theory begins in 1948-1950 at Bell Labs. Claude Shannon's 1948 A Mathematical Theory of Communication proved, non-constructively, that reliable transmission is possible at any rate below the channel capacity, but exhibited no codes; the channel-coding theorem is recorded here only as the pointer to which the combinatorial bounds aspire. Richard Hamming, frustrated by weekend batch-computer failures, built the first explicit single-error-correcting codes and the sphere-packing bound, published in 1950 ^{[Hamming 1950]}; the $[7, 4]$ Hamming code is the prototype perfect code.

The existence bound is due to Edgar Gilbert in 1952 ^{[Gilbert 1952]}, who gave a sphere-covering greedy argument over arbitrary codes; Rom Varshamov in 1957 ^{[Varshamov 1957]} sharpened it to guarantee a linear code by the column-avoidance construction reproduced above, so the combined statement is the Gilbert-Varshamov bound. Richard Singleton stated the rank bound $d \leq n - k + 1$ in 1964 ^{[Singleton 1964]} and named the maximum-distance-separable codes meeting it, though the bound was implicit in earlier work of Joshi and Komamiya. Jessie MacWilliams established the weight-enumerator duality in her 1962 Harvard thesis and 1963 paper, giving the identity that bears her name and that, with Neil Sloane, became the organising theorem of the 1977 treatise ^{[MacWilliams-Sloane Ch. 5]}. The asymptotic Gilbert-Varshamov bound stood as the best known lower bound on binary code rate for thirty years and was first surpassed, for $q \geq 49$ , by the 1982 algebraic-geometry codes of Tsfasman, Vlăduţ, and Zink.

Bibliography Master

@article{shannon1948,
  author  = {Shannon, Claude E.},
  title   = {A Mathematical Theory of Communication},
  journal = {Bell System Technical Journal},
  volume  = {27},
  year    = {1948},
  pages   = {379--423, 623--656}
}

@article{hamming1950,
  author  = {Hamming, Richard W.},
  title   = {Error detecting and error correcting codes},
  journal = {Bell System Technical Journal},
  volume  = {29},
  number  = {2},
  year    = {1950},
  pages   = {147--160}
}

@article{gilbert1952,
  author  = {Gilbert, Edgar N.},
  title   = {A comparison of signalling alphabets},
  journal = {Bell System Technical Journal},
  volume  = {31},
  year    = {1952},
  pages   = {504--522}
}

@article{varshamov1957,
  author  = {Varshamov, Rom R.},
  title   = {Estimate of the number of signals in error correcting codes},
  journal = {Doklady Akademii Nauk SSSR},
  volume  = {117},
  year    = {1957},
  pages   = {739--741}
}

@article{singleton1964,
  author  = {Singleton, Richard C.},
  title   = {Maximum distance $q$-nary codes},
  journal = {IEEE Transactions on Information Theory},
  volume  = {10},
  number  = {2},
  year    = {1964},
  pages   = {116--118}
}

@article{macwilliams1963,
  author  = {MacWilliams, F. Jessie},
  title   = {A theorem on the distribution of weights in a systematic code},
  journal = {Bell System Technical Journal},
  volume  = {42},
  year    = {1963},
  pages   = {79--94}
}

@book{macwilliamssloane1977,
  author    = {MacWilliams, F. Jessie and Sloane, Neil J. A.},
  title     = {The Theory of Error-Correcting Codes},
  publisher = {North-Holland},
  year      = {1977}
}

@book{vanlint1999,
  author    = {van Lint, Jacobus H.},
  title     = {Introduction to Coding Theory},
  series    = {Graduate Texts in Mathematics},
  volume    = {86},
  edition   = {3rd},
  publisher = {Springer},
  year      = {1999}
}

@book{vanlintwilson2001,
  author    = {van Lint, Jacobus H. and Wilson, Richard M.},
  title     = {A Course in Combinatorics},
  edition   = {2nd},
  publisher = {Cambridge University Press},
  year      = {2001}
}

@article{tvz1982,
  author  = {Tsfasman, Michael A. and Vl\u{a}du\c{t}, Serge G. and Zink, Thomas},
  title   = {Modular curves, Shimura curves, and Goppa codes, better than Varshamov-Gilbert bound},
  journal = {Mathematische Nachrichten},
  volume  = {109},
  year    = {1982},
  pages   = {21--28}
}

Prerequisites

none — this is a leaf unit

Tier anchors

beginner: Hill 1986 *A First Course in Coding Theory* (Oxford) Ch. 1-5 (codes as sets of strings, Hamming distance, error correction by nearest neighbour, the parity-check idea); 3Blue1Brown 'Hamming codes' parts 1-2 for the parity-check picture
intermediate: van Lint 1999 *Introduction to Coding Theory* 3e (Springer GTM 86) Ch. 3-5 (linear codes, generator and parity-check matrices, the Singleton and Hamming bounds, MDS and perfect codes, syndrome decoding); van Lint-Wilson 2001 *A Course in Combinatorics* 2e (Cambridge) Ch. 18-20
master: van Lint-Wilson 2001 *A Course in Combinatorics* 2e (Cambridge) Ch. 18-20 (linear codes, weight enumerators, MacWilliams, the bounds); MacWilliams-Sloane 1977 *The Theory of Error-Correcting Codes* (North-Holland) Ch. 1-5, 19 (the definitive treatment of weight enumerators and the bounds); Hamming 1950 *Bell System Tech. J.* 29 (the sphere-packing bound and the original single-error-correcting codes); Gilbert 1952 *Bell System Tech. J.* 31; Varshamov 1957 *Dokl. Akad. Nauk SSSR* 117; Singleton 1964 *IEEE Trans. Inform. Theory* 10; MacWilliams 1963 (the weight-enumerator duality)

References

van Lint, J. H. — Introduction to Coding Theory · Springer Graduate Texts in Mathematics 86, 3rd edition (1999). Chapters 3-5 develop linear codes over a finite field, the generator matrix and parity-check matrix, the dual code, the characterisation of minimum distance as the least number of linearly dependent columns of a parity-check matrix, syndrome decoding and the standard array, the Singleton bound and MDS codes, the Hamming (sphere-packing) bound and perfect codes, and the Gilbert-Varshamov existence bound.
van Lint, J. H. & Wilson, R. M. — A Course in Combinatorics · Cambridge University Press, 2nd edition (2001). Chapters 18-20 treat codes inside combinatorics: linear codes as subspaces, the weight enumerator, the MacWilliams identity relating a code's and its dual's weight enumerators, the Singleton, Hamming, Plotkin, and Gilbert-Varshamov bounds, and the link between codes, designs, and Hadamard matrices.
MacWilliams, F. J. & Sloane, N. J. A. — The Theory of Error-Correcting Codes · North-Holland (1977). The standard reference: Chapters 1-5 cover linear codes, generator/parity-check matrices, the standard array, the Singleton, Hamming, Plotkin, and Gilbert-Varshamov bounds, and MDS and perfect codes; Chapter 19 develops the weight enumerator and the MacWilliams identity in full, including the dual-distance distribution and the linear-programming bound.
Hamming, R. W. — Error detecting and error correcting codes · Bell System Technical Journal 29 (1950), 147-160. The founding paper: Hamming distance, the sphere-packing (Hamming) bound, the single-error-correcting Hamming codes, and the perfect-code condition.
Gilbert, E. N. — A comparison of signalling alphabets · Bell System Technical Journal 31 (1952), 504-522. The greedy/sphere-covering existence argument giving a lower bound on the size of a code with prescribed minimum distance.
Varshamov, R. R. — Estimate of the number of signals in error correcting codes · Doklady Akademii Nauk SSSR 117 (1957), 739-741. The parity-check (column-avoidance) refinement of Gilbert's bound, guaranteeing a *linear* code of the stated parameters.
Singleton, R. C. — Maximum distance q-nary codes · IEEE Transactions on Information Theory 10 (1964), 116-118. The bound d <= n - k + 1 and the definition of maximum-distance-separable (MDS) codes meeting it with equality.

Estimated time

beginner: 18m
intermediate: 45m
master: 85m