17.05.03 · mol-cell-bio / gene-expression

Translation

draft · 3 tiers · Lean: none · pending prereqs

Anchor (Master): Steitz, *A structural understanding of the dynamic ribosome machine* (2008 Nobel Lecture); Noller, *Evolution of protein synthesis from an RNA world* (2004); Ramakrishnan, *Ribosome Structure and the Mechanism of Translation* (2002)

Intuition [Beginner]

Transcription made an RNA copy of a gene. Translation reads that RNA copy and builds a protein. The name is literal: the ribosome translates a nucleotide "language" into an amino acid "language."

The translation dictionary is the genetic code. Every three nucleotides (one codon) specifies one amino acid. There are 64 possible codons ( $4^{3}$ ) and only 20 standard amino acids, so most amino acids are encoded by more than one codon. This redundancy is called degeneracy and it serves as a buffer against mutations.

The key players are: the mRNA (the message), the ribosome (the machine that reads it), and transfer RNA (tRNA) molecules (the adapters). Each tRNA has an anticodon on one end that base-pairs with a specific codon on the mRNA, and carries the corresponding amino acid on the other end.

Translation proceeds in three stages. Initiation: the ribosome assembles on the mRNA at the start codon (AUG). Elongation: the ribosome moves along the mRNA, one codon at a time, adding amino acids to the growing chain. Termination: when a stop codon (UAA, UAG, UGA) is reached, the ribosome releases the finished protein.

Visual [Beginner]

Picture the ribosome as a molecular factory with three workstations: the A site (aminoacyl; think "arrival"), the P site (peptidyl; think "polymerization"), and the E site (exit). The mRNA threads through the ribosome like tape through a player piano.

A tRNA carrying the correct amino acid enters the A site, matching its anticodon to the mRNA codon. The ribosome transfers the growing protein chain from the tRNA in the P site to the amino acid on the tRNA in the A site, forming a new peptide bond. Then the ribosome shifts one codon: the empty tRNA moves to E and exits, the chain-bearing tRNA shifts from A to P, and a new tRNA arrives at A.

Worked example [Beginner]

Translate the mRNA sequence AUG-CCU-UAU-GAA-UAA.

Step by step, using the genetic code:

Codon	Amino acid
AUG	Met (start)
CCU	Pro
UAU	Tyr
GAA	Glu
UAA	Stop

The protein is: Met-Pro-Tyr-Glu (4 amino acids). The AUG at the beginning is the start codon (always codes for methionine), and UAA is one of three stop codons that signal termination. The stop codon is NOT translated into an amino acid — it tells the ribosome to release the chain.
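The table lookup above can be sketched in a few lines of Python. This is a minimal illustration, not a full model of translation: the helper name `translate` is hypothetical, the codon table is built from the conventional U, C, A, G ordering of the standard code, and reading simply starts at the first AUG found.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}

def translate(mrna):
    """Translate from the first AUG until a stop codon (hypothetical helper)."""
    mrna = mrna.replace("-", "").upper()
    start = mrna.find("AUG")
    if start == -1:
        return []
    residues = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon: the chain is released, nothing is added
            break
        residues.append(THREE_LETTER[aa])
    return residues

print("-".join(translate("AUG-CCU-UAU-GAA-UAA")))  # Met-Pro-Tyr-Glu
```

The stop codon ends the loop without appending anything, which mirrors the point that UAA is not translated into an amino acid.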

Check your understanding [Beginner]

Formal definition [Intermediate+]

Translation is the process by which the ribosome synthesizes a polypeptide chain by reading the nucleotide sequence of an mRNA and using tRNA adaptors to incorporate the corresponding amino acids in the specified order.

Ribosome structure

The ribosome is a ribonucleoprotein complex composed of a large and a small subunit, each built from rRNA and proteins.

Bacterial ribosome (70S):

Small subunit (30S): 16S rRNA + 21 proteins
Large subunit (50S): 23S rRNA + 5S rRNA + 31 proteins

Eukaryotic ribosome (80S):

Small subunit (40S): 18S rRNA + ~33 proteins
Large subunit (60S): 28S rRNA + 5.8S rRNA + 5S rRNA + ~49 proteins

The S values (Svedberg units) measure sedimentation rate and are not additive. The ribosome's catalytic center (peptidyl transferase) is composed entirely of rRNA — the ribosome is a ribozyme (RNA enzyme), with proteins playing structural and regulatory roles.

The genetic code

The standard genetic code maps 61 sense codons to 20 amino acids and 3 codons (UAA, UAG, UGA) to stop signals. The code is nearly universal across all life, with minor variations in mitochondria and some protozoa.

Key properties:

Degeneracy: 61 codons for 20 amino acids. The third position is often redundant (wobble).
Non-overlapping: Each nucleotide belongs to exactly one codon.
Commaless (no gaps): Codons are read sequentially without skipped nucleotides.
Unambiguous: Each codon specifies exactly one amino acid.

Wobble base pairing

The genetic code's degeneracy is accommodated by wobble at the third codon position. The base at the 5' position of the anticodon (first position of the anticodon, which pairs with the 3' position of the codon) can form non-standard base pairs:

G in anticodon pairs with U or C in codon
U in anticodon pairs with A or G in codon
Inosine (I, a modified base in tRNA) pairs with U, C, or A

This means a single tRNA species can recognize multiple codons. For example, a tRNA with anticodon 3'-GAG-5' recognizes both CUC and CUU codons (both code for leucine): the G at the anticodon's 5' end pairs with either C or U at the third codon position.
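The wobble rules listed above can be turned into a small lookup. The sketch below is illustrative only: `codons_read` is a hypothetical helper, anticodons are written 5'→3', and the rows for C and A at the wobble position are plain Watson-Crick pairings added here as an assumption (they are not part of the list above).

```python
# Pairing options for the anticodon's 5' base (reads codon position 3).
# G, U and I follow the wobble rules in the text; C and A are assumed
# to pair by ordinary Watson-Crick rules only.
WOBBLE = {"G": "UC", "U": "AG", "I": "UCA", "C": "G", "A": "U"}
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

def codons_read(anticodon_5to3):
    """Return the codons (5'->3') that one anticodon (5'->3') can decode."""
    first, middle, last = anticodon_5to3
    codon_pos1 = WATSON_CRICK[last]    # anticodon 3' base pairs with codon position 1
    codon_pos2 = WATSON_CRICK[middle]  # middle bases pair with each other
    return sorted(codon_pos1 + codon_pos2 + third for third in WOBBLE[first])

print(codons_read("GCC"))  # ['GGC', 'GGU']
print(codons_read("ICC"))  # ['GGA', 'GGC', 'GGU']
```

Because pairing is antiparallel, the anticodon is read back to front: its last base sets the first codon position, and its first base is the one that wobbles.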

Translation mechanism

Initiation (eukaryotic):

The 40S subunit binds eIF1, eIF1A, eIF3, and the ternary complex (eIF2-GTP-Met-tRNAi).
The 43S pre-initiation complex binds the 5' cap of mRNA (via eIF4F).
The complex scans the mRNA 5' $\to$ 3' until it finds the AUG start codon in a favorable Kozak context (GCCACCAUGG).
eIF2 hydrolyzes GTP, initiation factors dissociate, and the 60S subunit joins to form the 80S initiation complex.

Elongation:

A-site entry: eEF1A-GTP delivers the correct aminoacyl-tRNA to the A site. Correct codon-anticodon matching triggers GTP hydrolysis and locks the tRNA in place.
Peptidyl transfer: The peptidyl transferase center (23S rRNA in bacteria; 28S rRNA in eukaryotes) transfers the growing peptide from the P-site tRNA to the A-site aminoacyl-tRNA, forming a new peptide bond.
Translocation: eEF2-GTP drives the ribosome forward by one codon. The deacylated tRNA moves from P to E (exit), the peptidyl-tRNA moves from A to P, and the A site opens for the next tRNA. GTP hydrolysis makes this step irreversible.

Termination: When a stop codon enters the A site, release factors (eRF1 in eukaryotes; RF1/RF2 in bacteria) recognize it. eRF1 mimics a tRNA shape and binds the A site, triggering hydrolysis of the peptidyl-tRNA bond in the P site. The polypeptide is released, and the ribosome dissociates into subunits (with help from eRF3-GTP and ABCE1).

Counterexamples to common slips

"The ribosome is a protein enzyme." In fact, the peptidyl transferase center is composed entirely of rRNA: no protein side chain lies within 18 Angstroms of the catalytic site. The ribosome is a ribozyme, and its proteins serve structural roles.
"Each codon requires its own tRNA." In fact, wobble base pairing (G-U, inosine) allows approximately 45 tRNA species to decode all 61 sense codons. A single tRNA with inosine at the wobble position can read three codons.
"Translation always starts at the first AUG." In fact, leaky scanning, internal ribosome entry sites (IRES), and reinitiation after upstream open reading frames (uORFs) all allow initiation at AUGs other than the first.

Key theorem with proof [Intermediate+]

Theorem (Crick's frozen accident). The standard genetic code maps 61 sense codons to 20 amino acids and 3 codons to stop signals. This mapping is used (with very minor variations limited to mitochondria and a few protozoa) by all three domains of life — Bacteria, Archaea, and Eukarya. Since the mapping is chemically arbitrary (no physical principle links a particular codon to a particular amino acid once the aminoacyl-tRNA synthetases have evolved), and the probability of independent convergence on the same arbitrary code is vanishingly small, the code was established once in the last universal common ancestor and inherited by all descendants.

Proof. The argument proceeds in five steps.

The number of possible genetic codes is astronomically large. Assigning 20 amino acids to 61 sense codons (with 3 codons reserved for stop) admits approximately $20^{61}$ possible mappings, though physical constraints (error-buffering, stereochemical preferences) reduce the viable subset.
The mapping is arbitrary. Once aminoacyl-tRNA synthetases evolved to charge specific tRNAs with specific amino acids, the codon-amino acid correspondence was fixed by the synthetase-tRNA recognition identity, not by direct codon-amino acid chemical affinity.
All known organisms use the same standard code. The exceptions (mitochondrial codes, ciliate nuclear codes) are derived reassignments from the standard code — they differ at only 1-4 codon assignments and are phylogenetically restricted, consistent with rare secondary changes in an otherwise frozen code ^{[Crick 1968]}.
The probability that two independent origins of life would converge on the same code from among $\sim 10^{80}$ candidate mappings ($20^{61} \approx 2 \times 10^{79}$) is of order $10^{-80}$, which is negligibly small.
Therefore the code was established once, before the divergence of the three domains of life, and inherited by all descendants with only rare local modifications.

$□$
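The size of the code space used in the proof is easy to check numerically. The sketch below uses the same simplification as the argument itself (each of the 61 sense codons independently receives one of 20 amino acids, without requiring that every amino acid be used), so it is an order-of-magnitude estimate rather than an exact count of valid codes.

```python
import math

n_codes = 20 ** 61              # one of 20 amino acids per sense codon
exponent = math.log10(n_codes)  # Python integers are exact, so this is safe

print(f"20^61 is about 10^{exponent:.1f}")  # 20^61 is about 10^79.4
```

The result sits between $10^{79}$ and $10^{80}$, in line with the order of magnitude quoted in the proof.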

Bridge. The universality argument builds toward 17.06.01 pending, where the code's degeneracy provides the first line of defence against point mutations — synonymous changes in the third codon position are silently tolerated. This is exactly the error-buffering architecture that makes the genetic code robust against random nucleotide substitutions: the foundational reason the code has survived billions of years of evolution is that its degenerate structure converts many potentially deleterious mutations into silent ones. The central insight — that an arbitrary mapping is conserved because changing it is lethal to every downstream process — appears again in 15.13.01 pending, where the uniform sugar-phosphate backbone of nucleic acids plays a similar conservative role in maintaining the chemical substrate for genetic information storage. The bridge is between the code's information content and the thermodynamic cost of preserving it, a theme that generalises across molecular biology.

Exercises [Intermediate+]

Exercise 3 (medium, symbolic).

A frameshift mutation deletes one nucleotide from the coding sequence. The original mRNA reads: AUG-CCU-UAU-GAA-UAA. If the third nucleotide (the G of AUG) is deleted, write the new reading frame and translate it.

Hint

After deleting the G, re-read in triplets from the start.

Answer

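Original: AUG | CCU | UAU | GAA | UAA
After deleting G: AUCCUUAUGAAUAA
New reading frame: AUC-CUU-AUG-AAU-AA...

Translation: Ile-Leu-Met-Asn... (continuing until a stop codon is encountered). This reads the sequence mechanically from its first nucleotide; in a cell, this particular deletion would also destroy the AUG start codon, so initiation would have to occur at another AUG.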

The frameshift changes every amino acid downstream of the deletion and almost certainly eliminates the original stop codon, producing a completely different (and likely non-functional) protein. This is why frameshift mutations are typically much more damaging than point mutations.
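The frameshift in this exercise can be reproduced with a short script. This is a minimal sketch: `read_frame` is a hypothetical helper that reads triplets mechanically from position 0 (it does not search for a start codon), and amino acids are printed in one-letter code with `*` for a stop.

```python
# Standard genetic code in one-letter form, U/C/A/G ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def read_frame(mrna):
    """Translate consecutive triplets from position 0; '*' marks a stop codon."""
    codons = [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]
    return "".join(CODON_TABLE[c] for c in codons)

original = "AUGCCUUAUGAAUAA"
mutant = original[:2] + original[3:]  # delete the third nucleotide (the G)

print(read_frame(original))  # MPYE*
print(read_frame(mutant))    # ILMN
```

The mutant output has no `*`: the original UAA is now split across two triplets, so the stop signal is lost from the new frame.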

Exercise 4 (medium, symbolic).

Explain the wobble hypothesis. How does wobble explain why there are only ~45 different tRNA molecules in human cells rather than 61 (one per sense codon)?

Hint

The third codon position is less constrained. What does this mean for anticodon-codon pairing?

Answer

Crick's wobble hypothesis (1966) states that the base-pairing between the third position of the codon and the first position of the anticodon is less stringent than at the first two positions. This allows a single tRNA to recognize multiple codons that differ only in the third position.

For example, the four glycine codons (GGU, GGC, GGA, GGG) can be read by just two tRNAs: one with anticodon GCC (reads GGU and GGC) and one with anticodon UCC (reads GGA and GGG), with both anticodons written 5'→3'. With wobble pairing (G-U, I-U/C/A), ~45 tRNAs can cover all 61 sense codons. This economy of tRNA species is evolutionarily efficient.

Exercise 5 (medium, symbolic).

Puromycin is an antibiotic that resembles an aminoacyl-tRNA. It enters the A site and the ribosome adds the growing peptide chain to puromycin instead of to a real aminoacyl-tRNA. However, puromycin then dissociates from the ribosome. Explain why this prematurely terminates translation.

Hint

What happens when the peptide chain is transferred to something that does not stay in the ribosome?

Answer

Puromycin has an amino group (mimicking the aminoacyl end of tRNA) that accepts the peptidyl chain through normal peptide bond formation. But puromycin lacks the tRNA body and anticodon, so it has no way to remain anchored in the ribosome or to be translocated to the P site. After the peptide is transferred to puromycin, the puromycin-peptide conjugate diffuses out of the ribosome. The ribosome is left with only a deacylated tRNA in the P site and no nascent chain, so it cannot continue elongation. This is why puromycin is a chain terminator: it causes premature release of incomplete polypeptides.

Exercise 6 (hard, numeric).

Calculate the minimum number of high-energy phosphate bonds (GTP + ATP) consumed to synthesize a protein of 100 amino acids. Account for: initiation (1 GTP for eIF2, 1 GTP for eIF5B in eukaryotes), each elongation cycle (1 GTP for eEF1A delivery + 1 GTP for eEF2 translocation), aminoacyl-tRNA charging (2 ATP equivalents per amino acid — AMP + PPi, so 2 high-energy bonds), and termination (1 GTP for eRF3).

Hint

Add up: charging costs for 100 amino acids + initiation costs + 99 elongation cycles + termination cost.

Answer

Aminoacyl-tRNA charging: 100 amino acids $\times$ 2 high-energy bonds = 200 ATP equivalents.

Initiation: 2 GTP = 2 high-energy bonds.

Elongation: 99 cycles (first amino acid placed during initiation, so 99 additions) $\times$ 2 GTP/cycle = 198 GTP.

Termination: 1 GTP = 1 high-energy bond.

Total: 200 + 2 + 198 + 1 = 401 high-energy phosphate bonds per 100-amino-acid protein.

This works out to ~4 high-energy bonds per amino acid: 2 for charging the tRNA and ~2 for the ribosome cycle. This enormous energy cost is why translation is one of the most energy-intensive processes in the cell — a rapidly growing bacterial cell devotes up to 80% of its energy budget to protein synthesis.
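The bookkeeping in this answer generalises to any chain length. The function below is a sketch that follows the exercise's accounting exactly (it ignores rejected ternary complexes, scanning helicase activity, and ribosome recycling, which are assumptions of the exercise, not claims about real cells); the function name is hypothetical.

```python
def translation_energy_cost(n_residues):
    """High-energy phosphate bonds for one protein, per the exercise's accounting."""
    charging = 2 * n_residues          # ATP -> AMP + PPi for each aminoacyl-tRNA
    initiation = 2                     # eIF2 + eIF5B
    elongation = 2 * (n_residues - 1)  # eEF1A delivery + eEF2 translocation
    termination = 1                    # eRF3
    return charging + initiation + elongation + termination

print(translation_energy_cost(100))  # 401
```

Algebraically the total is $4n + 1$ for a chain of $n$ residues, which is where the figure of roughly 4 bonds per amino acid comes from.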

Exercise 7 (hard, symbolic).

The ribosome's peptidyl transferase center is composed entirely of rRNA — no protein side chains are within 18 Angstroms of the catalytic site. Explain why this discovery (by Steitz, Moore, and colleagues in 2000) supports the "RNA world" hypothesis for the origin of life.

Hint

If the most fundamental catalytic activity in biology is performed by RNA, what does that suggest about early evolution?

Answer

The RNA world hypothesis proposes that early life used RNA both as genetic material and as catalyst, before the evolution of DNA (more stable storage) and protein (more versatile catalysis). The ribosome — the molecular machine that synthesizes all proteins — has a catalytic center made of RNA. This means that the most fundamental and ancient enzymatic activity in biology (peptide bond formation) is performed by RNA, not protein.

The argument is:

Translation is the process that makes all proteins.
Translation is catalyzed by a ribozyme (RNA enzyme).
A protein-based ribosome would create a chicken-and-egg problem (proteins would be needed to make proteins).
The RNA-based ribosome resolves the paradox: RNA can both store information and catalyze reactions, so an RNA-based translation system could have existed before proteins evolved.
The conserved core of the ribosome (the peptidyl transferase center and the decoding center) is composed of rRNA that is universal across all life, indicating it predates the divergence of the three domains of life.

Exercise 8 (hard, symbolic).

A point mutation changes the codon GAG (glutamic acid) to GUG (valine) in the beta-globin gene. This is the sickle-cell mutation. Explain why this single amino acid change causes hemoglobin to polymerize under low oxygen conditions, leading to sickle-shaped red blood cells. Consider the chemical properties of glutamic acid vs. valine.

Hint

Glutamic acid is charged and hydrophilic. Valine is non-polar and hydrophobic. What happens when a hydrophobic patch appears on the surface of a protein?

Answer

Glutamic acid (Glu, E) has a negatively charged carboxylate side chain at physiological pH. It is hydrophilic and typically faces outward on protein surfaces, interacting with water.

Valine (Val, V) is a non-polar, hydrophobic branched-chain amino acid. When it replaces Glu at position 6 of beta-globin, it creates a hydrophobic patch on the surface of deoxygenated hemoglobin.

This hydrophobic patch fits into a complementary hydrophobic pocket on a neighboring hemoglobin molecule. One hemoglobin's Val-6 inserts into another's hydrophobic pocket, and vice versa, creating long polymer chains (fibers) of deoxygenated hemoglobin. These fibers distort the red blood cell into the characteristic sickle shape.

The sickle cells are rigid and block small blood vessels (vaso-occlusion), causing pain crises, organ damage, and hemolytic anemia. The mutation is maintained at high frequency in malaria-endemic regions because heterozygotes (carriers) have partial protection against Plasmodium falciparum malaria — an example of balancing selection.

Lean formalization [Intermediate+]

lean_status: none. Mathlib does not provide definitions for biological translation processes: the ribosome, tRNA, the genetic code as a codon-to-amino-acid mapping, or elongation-factor kinetics. The genetic code could in principle be formalised as a function from Fin 64 to a sum type of 20 amino acid constructors plus a stop constructor, but the biochemical machinery that executes this function — the ribosome, aminoacyl-tRNA synthetases, elongation factors — is molecular biology, not mathematics. The kinetic-proofreading error-rate calculation (Hopfield 1974) involves a two-step Markov process that could be formalised with Mathlib's probability infrastructure, but the biological context that motivates the calculation is outside Mathlib's scope. This unit ships without a lean_module, reviewer-attested per BIOLOGY_PLAN.md §6.

Advanced results [Master]

Ribosome architecture and the ribozyme core

The ribosome is one of the largest and most conserved molecular machines in biology. The bacterial 70S ribosome has a molecular weight of approximately 2.5 MDa and consists of roughly two-thirds RNA by mass. The eukaryotic 80S ribosome is larger still at ~4.3 MDa, with additional rRNA expansion segments and extra ribosomal proteins. Despite the increase in size, the catalytic core — the peptidyl transferase center and the decoding center — is conserved across all life, with rRNA sequences in these regions showing >90% identity between bacteria and eukaryotes ^{[Noller 2004]}.

The peptidyl transferase center resides in domain V of 23S rRNA (bacteria) or 28S rRNA (eukaryotes). The key nucleotide is A2451 (E. coli numbering), which helps position the alpha-amino group of the A-site aminoacyl-tRNA for nucleophilic attack on the ester bond between the peptidyl chain and the P-site tRNA. The high-resolution crystal structure of the large subunit (2.4 Angstroms, from the Steitz and Moore laboratories in 2000) showed that no protein side chain lies within 18 Angstroms of the peptide bond being formed ^{[Steitz 2008]}. The reaction proceeds through a proton shuttle mechanism involving the 2'-OH of the P-site tRNA's terminal adenosine (A76), with A2451 facilitating substrate positioning rather than direct chemical catalysis. The transition state is stabilised by the precise geometry of the rRNA fold.

The decoding center sits in the body of the small subunit (16S rRNA in bacteria, 18S rRNA in eukaryotes). Nucleotides A1492 and A1493 (E. coli numbering) flip out from helix 44, and G530 in the 530 loop changes conformation, so that together they inspect the minor groove of the codon-anticodon helix at the first two positions. Correct Watson-Crick geometry at positions 1 and 2 induces a conformational closure of the shoulder domain of the 30S subunit, which is communicated to EF-Tu (or eEF1A) as a signal to hydrolyse GTP. Position 3 (the wobble position) is monitored less stringently: G-U pairs are tolerated, and inosine-containing tRNAs exploit the relaxed geometry. Ramakrishnan's 2.8-Angstrom structure of the 30S subunit with cognate and near-cognate tRNA fragments in 2001 directly visualised this discrimination mechanism ^{[Ramakrishnan 2002]}.

The three tRNA binding sites — A (aminoacyl), P (peptidyl), and E (exit) — span both subunits. The anticodon end of each tRNA contacts the small subunit (decoding), while the acceptor end contacts the large subunit (peptide bond formation). During translocation, the tRNAs move through hybrid states (A/P then P/E) before reaching their final positions. Cryo-EM structures captured by Frank and Agrawal in 2000 revealed that translocation involves a large-scale intersubunit rotation (the ratchet motion): the small subunit rotates approximately 6-10 degrees relative to the large subunit, driven by EF-G-GTP hydrolysis. This rotation physically shifts the tRNA-mRNA complex by one codon.

The ribosome's antibiotic binding sites map directly onto its functional centers, which is clinically significant. Chloramphenicol binds in the peptidyl transferase center, blocking the A-site and preventing peptide bond formation. Tetracycline binds near the A site of the 30S subunit, physically blocking aminoacyl-tRNA entry. Erythromycin (a macrolide) binds in the peptide exit tunnel of the 50S subunit, sterically blocking the growing polypeptide chain from exiting — the ribosome can form a few peptide bonds before the chain hits the antibiotic wall and stalls. Clindamycin and linezolid target overlapping but distinct sites within the peptidyl transferase center. The structural basis for antibiotic action was established by the same crystallographic programs that solved the ribosome, and rational drug design against antibiotic-resistant mutants now proceeds from these structures.

Initiation, scanning, and alternative start mechanisms

Eukaryotic translation initiation is the most complex stage of the translation cycle, requiring at least 12 initiation factors (eIF1, eIF1A, eIF2, eIF3, eIF4A, eIF4B, eIF4E, eIF4G, eIF5, eIF5B, eIF6, and PABP) compared to just three in bacteria (IF1, IF2, IF3). The complexity reflects the additional regulatory layers that eukaryotes have evolved to control when and where translation begins.

The canonical pathway begins with the assembly of the 43S pre-initiation complex: the 40S subunit, eIF1, eIF1A, eIF3, and the ternary complex (eIF2 bound to GTP and the initiator Met-tRNAi). This complex is recruited to the 5' end of the mRNA by the eIF4F cap-binding complex, which consists of eIF4E (the cap-binding protein), eIF4G (a scaffold), and eIF4A (an RNA helicase). eIF4G simultaneously binds eIF4E at the cap and PABP (poly-A binding protein) at the 3' end, circularising the mRNA and enabling efficient reinitiation of ribosomes that have completed one round of translation.

The 43S complex then scans the 5' untranslated region (5' UTR) in the 5'-to-3' direction, unwinding secondary structure with eIF4A (stimulated by eIF4B). Scanning continues until the complex encounters an AUG codon in an optimal sequence context — the Kozak consensus (GCCACCAUGG in vertebrates, determined by Marilyn Kozak in 1986 through systematic mutagenesis of the start-codon flanking sequences) ^{[Kozak 1986]}. eIF1 maintains the scanning-competent open conformation of the 40S subunit; upon start-codon recognition, eIF1 is displaced, the 40S subunit closes around the codon-anticodon helix, and eIF2 hydrolyses its bound GTP (stimulated by eIF5). This conformational switch from scanning to locked is irreversible. The 60S subunit then joins, catalysed by eIF5B-GTP, and GTP hydrolysis by eIF5B releases the remaining initiation factors, producing the 80S initiation complex ready for elongation.

Leaky scanning occurs when the first AUG is in a weak Kozak context (e.g., UUAUGU rather than ACCAUGG). The 43S complex fails to recognise it as the start codon and continues scanning downstream. This mechanism allows a single mRNA to produce multiple protein isoforms from different start codons — the shorter isoform initiated from the downstream AUG often lacks an N-terminal domain present in the longer isoform. Roughly 40% of human mRNAs have at least one upstream AUG (uAUG) that could in principle be used, but leaky scanning bypasses many of them.
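Leaky scanning can be illustrated with a toy scanner. Everything in this sketch is a labelled simplification: the scoring rule checks only a purine at position -3 and a G at +4 (an assumed stand-in for "strong Kozak context"), the decision is deterministic whereas real bypass is probabilistic, and the function names and example sequence are hypothetical.

```python
def kozak_score(mrna, i):
    """Toy score (0-2) for the AUG starting at index i.

    Assumption: only two flanking positions are checked,
    a purine at -3 and a G at +4 (the A of AUG is +1).
    """
    minus3 = mrna[i - 3] if i >= 3 else None
    plus4 = mrna[i + 3] if i + 3 < len(mrna) else None
    return int(minus3 in ("A", "G")) + int(plus4 == "G")

def scan_for_start(mrna, min_score=2):
    """Scan 5'->3'; return the index of the first AUG whose context
    passes the threshold. Weaker AUGs are skipped (leaky scanning)."""
    for i in range(len(mrna) - 2):
        if mrna[i:i + 3] == "AUG" and kozak_score(mrna, i) >= min_score:
            return i
    return None

# Hypothetical 5' region: a weak UUAUGU site followed by GCCACCAUGG.
mrna = "GCUUAUGUCCGCCACCAUGGCA"
print(mrna.find("AUG"), scan_for_start(mrna))  # 4 16
```

Lowering `min_score` to 0 makes the scanner stop at the first AUG, which is the "always the first AUG" behaviour that leaky scanning departs from.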

Internal ribosome entry sites (IRES) provide a cap-independent mechanism for translation initiation. First discovered in picornaviruses (poliovirus and EMCV) by Pelletier and Sonenberg in 1988 ^{[Pelletier and Sonenberg 1988]}, IRES elements are structured RNA sequences (typically 200-500 nucleotides) that directly recruit the 40S subunit to an internal position on the mRNA, bypassing the need for 5' cap recognition and scanning. Cellular IRES elements exist in mRNAs encoding proteins needed under stress conditions (e.g., HIF-1alpha, Apaf-1, XIAP, VEGF, Myc) when cap-dependent translation is suppressed. The molecular mechanisms of cellular IRES function are diverse: some mimic tRNA structure, others recruit specific IRES trans-acting factors (ITAFs), and viral IRES elements (like the hepatitis C virus IRES) can bind the 40S subunit directly without any initiation factors.

Upstream open reading frames (uORFs) are short coding sequences in the 5' UTR, upstream of the main coding sequence. Roughly 50% of human mRNAs contain at least one uORF. Most uORFs reduce translation of the downstream main ORF because ribosomes that translate the uORF frequently dissociate at its stop codon rather than reinitiating. The key regulatory example is ATF4, a transcription factor that activates stress-response genes. Under normal conditions, ribosomes translate uORF1, reinitiate at uORF2 (which overlaps the ATF4 start codon in a different reading frame), and terminate — ATF4 is not produced. Under stress, eIF2alpha phosphorylation reduces ternary complex availability. Ribosomes translating uORF1 now scan through uORF2 without reinitiating (because they lack Met-tRNAi) and reinitiate at the ATF4 start codon instead. This elegant mechanism, characterised by Hinnebusch and colleagues in yeast GCN4 and conserved in mammalian ATF4, converts a global reduction in translation into a selective increase in specific stress-response proteins ^{[Hinnebusch 2005]}.

Elongation fidelity and kinetic proofreading

The ribosome achieves a remarkably low error rate of approximately $10^{-4}$ per codon (one misincorporation per 10,000 codons), far below the thermodynamic discrimination ratio of $\sim 10^{-2}$ that single-step equilibrium selection could provide given the small free-energy differences between cognate and near-cognate codon-anticodon pairs. The gap is closed by kinetic proofreading, a mechanism proposed by Hopfield in 1974 and independently by Ninio in 1975 ^{[Hopfield 1974]}.

The key insight is that EF-Tu (in bacteria; eEF1A in eukaryotes) introduces an irreversible step, GTP hydrolysis, between two independent selection events. In the first selection step (initial selection), aminoacyl-tRNA arrives at the A site as a ternary complex (EF-Tu-GTP-aa-tRNA). Cognate tRNA induces the correct codon-anticodon geometry, triggering domain closure of the 30S shoulder and accelerating GTPase activation of EF-Tu by approximately 650-fold relative to near-cognate tRNA. Near-cognate tRNA has a higher probability of dissociating before GTP hydrolysis. After GTP hydrolysis and EF-Tu release, the aminoacyl-tRNA must accommodate into the peptidyl transferase center; this is the second selection step (proofreading). Near-cognate tRNA that survived initial selection now has a second chance to dissociate, with the probability of rejection at this step again depending on the codon-anticodon geometry. The two-step selection multiplies the discrimination ratios: if each step provides a factor of $F \approx 100$, the combined selectivity is $F^{2} \approx 10{,}000$, matching the observed fidelity.
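The multiplication of discrimination factors can be made concrete with a few lines of arithmetic. This is an idealised sketch, assuming each selection step rejects near-cognate tRNA by the same factor $F$ and that the steps are fully independent; the function name and the 300-residue example are illustrative choices.

```python
def error_rate(per_step_discrimination, n_steps):
    """Error frequency when n independent selection steps each discriminate
    against near-cognate tRNA by the same factor F (idealised model)."""
    return per_step_discrimination ** (-n_steps)

single = error_rate(100, 1)     # equilibrium selection alone
proofread = error_rate(100, 2)  # initial selection followed by proofreading

# Chance that a 300-residue protein contains no misincorporation at all.
p_error_free_single = (1 - single) ** 300
p_error_free_proofread = (1 - proofread) ** 300

print(single, proofread)
print(round(p_error_free_single, 3), round(p_error_free_proofread, 3))
```

Under these assumptions, a single selection step would leave most 300-residue chains with at least one error, whereas two steps leave the large majority error-free.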

The thermodynamic cost is one GTP molecule per proofreading event: GTP hydrolysis is the irreversible timing step that separates the two selections. This is the fundamental trade-off: accuracy requires energy. Translating a protein of 300 amino acids at an error rate of $10^{-4}$ consumes approximately 300 GTP molecules just on EF-Tu delivery (one per elongation cycle), with the proofreading expenditure implicit in the GTP hydrolysis that occurs whether the tRNA is correct or not. The total energy budget per amino acid (including charging and translocation) is approximately 4 high-energy phosphate bonds.

Translocation moves the mRNA-tRNA complex by one codon through the ribosome. In bacteria, EF-G-GTP drives this process; in eukaryotes, eEF2-GTP. The mechanism involves the ratchet-like intersubunit rotation described above. After peptide bond formation, the tRNAs occupy hybrid states (A/P and P/E). EF-G-GTP binding stabilises the rotated state and, upon GTP hydrolysis, drives the small subunit back to its non-rotated conformation, physically shifting the mRNA-tRNA complex by exactly one codon. Cryo-EM structures at sub-nanometer resolution by Frank, Ramakrishnan, and others between 2000 and 2010 visualised the ratchet motion and the role of EF-G's domain IV in mimicking tRNA anticodon stem-loop to maintain the reading frame during translocation.

The ribosome incorporates amino acids at approximately 20 residues per second in bacteria and 6 per second in eukaryotes. Codon usage bias — the non-uniform usage of synonymous codons — correlates with tRNA abundance in many organisms. Highly expressed genes use codons that match abundant tRNA species, optimising elongation speed and accuracy. Rare codons, matched to low-abundance tRNAs, cause translational pausing, which can affect co-translational protein folding by providing time windows for nascent-chain domains to fold before emerging fully from the ribosome exit tunnel. The relationship between codon usage, tRNA abundance, translation speed, and protein folding is an active area of research, with implications for heterologous protein expression (where host tRNA pools may not match the codon bias of the expressed gene) and for understanding the fitness consequences of synonymous mutations.
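The elongation rates quoted above translate directly into synthesis times. The sketch below assumes a constant average rate with no pausing, which the paragraph itself notes is not true at rare codons, so it gives only a rough lower-bound estimate; the dictionary and function names are hypothetical.

```python
# Approximate average elongation rates from the text (residues per second).
RATE_AA_PER_S = {"bacteria": 20, "eukaryotes": 6}

def elongation_time(n_residues, organism):
    """Seconds to elongate a chain at a constant average rate (no pausing)."""
    return n_residues / RATE_AA_PER_S[organism]

print(elongation_time(300, "bacteria"))    # 15.0
print(elongation_time(300, "eukaryotes"))  # 50.0
```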

Translational regulation and quality control

Translation is regulated at multiple levels, with initiation being the primary control point. The mTOR (mechanistic target of rapamycin) pathway is the central signalling hub that couples nutrient availability, growth factor signalling, and energy status to translation rates. mTOR complex 1 (mTORC1) phosphorylates two key substrates: 4E-BP (eIF4E binding protein) and S6K (ribosomal protein S6 kinase).

4E-BP is the more consequential target. In its hypophosphorylated state, 4E-BP binds tightly to eIF4E (the cap-binding protein), sequestering it and preventing eIF4F assembly. mTORC1 phosphorylates 4E-BP on multiple residues (Thr37, Thr46, Ser65, Thr70 in human 4E-BP1), causing its release from eIF4E and allowing cap-dependent translation to proceed. Rapamycin (and its analogues, "rapalogs") partially inhibits mTORC1, reducing 4E-BP phosphorylation and selectively suppressing the translation of mRNAs with complex 5' UTR secondary structures, typically those encoding growth factors, proto-oncogenes, and cell-cycle regulators. A second mTORC1-sensitive class, the "TOP mRNAs" (named for their 5' terminal oligopyrimidine tract), mostly encodes ribosomal proteins and translation factors. Because these transcripts are disproportionately dependent on high eIF4E activity, cancer cells with dysregulated translation can be especially sensitive to mTOR inhibition.

MicroRNA-mediated repression operates through the RISC (RNA-induced silencing complex), which contains an Argonaute protein loaded with a ~22-nucleotide miRNA. The miRNA base-pairs (imperfectly, in most animal cases) to complementary sites in the 3' UTR of target mRNAs. RISC represses translation through two mechanisms: (1) inhibition of initiation, by interfering with eIF4F assembly or promoting eIF4E sequestration, and (2) mRNA destabilisation, by recruiting deadenylase complexes that shorten the poly-A tail and trigger exonucleolytic decay. The balance between these mechanisms is context-dependent. More than 60% of human mRNAs are predicted targets of at least one miRNA, making miRNA-mediated regulation one of the most widespread post-transcriptional control mechanisms.

Nonsense-mediated mRNA decay (NMD) is the primary quality-control pathway for detecting and degrading mRNAs containing premature termination codons (PTCs). The molecular signal for NMD is the presence of an exon junction complex (EJC) downstream of a stop codon. During the pioneer round of translation, the ribosome displaces EJCs as it traverses the mRNA. If translation terminates at a normal stop codon (typically in the last exon), all EJCs have been removed. If termination occurs prematurely (at a PTC), one or more EJCs remain bound downstream. The SURF complex (SMG1 kinase, UPF1, eRF1, eRF3) assembles at the termination site, and UPF1 interacts with the EJC-bound proteins UPF2 and UPF3. This interaction triggers SMG1-mediated phosphorylation of UPF1, which recruits SMG5, SMG6, and SMG7, leading to mRNA degradation by both endonucleolytic cleavage (SMG6) and exonucleolytic decay. NMD prevents the production of truncated proteins that could have dominant-negative effects. Clinically, approximately 12% of disease-causing mutations introduce PTCs, and NMD modulates their phenotypic severity — if NMD degrades the mRNA efficiently, no truncated protein is made (often a milder phenotype than if the truncated protein accumulates).
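The EJC rule described above reduces to a simple positional test. The sketch below is a deliberate simplification: exon-exon junction coordinates stand in for bound EJCs, any junction downstream of the stop counts regardless of distance, and the function name and coordinates are hypothetical.

```python
def predicts_nmd(stop_position, exon_junctions):
    """Apply the EJC rule from the text: NMD is predicted when at least one
    exon-exon junction (a proxy for a bound EJC) lies downstream of the stop.

    Positions are nucleotide coordinates along the spliced mRNA.
    """
    return any(junction > stop_position for junction in exon_junctions)

junctions = [300, 650, 900]  # hypothetical junction coordinates

print(predicts_nmd(1100, junctions))  # False: stop in the last exon
print(predicts_nmd(400, junctions))   # True: premature stop, EJCs remain downstream
```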

No-go decay (NGD) targets mRNAs on which ribosomes have stalled during elongation, for example at strong secondary structures, rare codon clusters, or damaged bases. Stalled ribosomes are detected by an E3 ubiquitin ligase (Hel2 in yeast, ZNF598 in mammals), which ubiquitinates ribosomal proteins. This leads to ribosome splitting by the ribosome quality control trigger (RQT) complex (the ASC-1 complex in mammals) and endonucleolytic cleavage of the mRNA near the stall site. The resulting mRNA fragments are degraded by the exosome (3'-to-5') and Xrn1 (5'-to-3').

Ribosome-associated quality control (RQC) handles the protein products of stalled translation. After ribosome splitting, the nascent polypeptide remains attached to the P-site tRNA as a peptidyl-tRNA. The RQC complex, anchored by Rqc2/NEMF in the yeast/mammalian systems, recruits the E3 ubiquitin ligase Ltn1/Listerin, which ubiquitinates the stalled nascent chain for proteasomal degradation. In a striking non-canonical elongation reaction, Rqc2 recruits tRNAs to the 60S subunit and adds CAT tails (C-terminal Ala and Thr extensions) to the stalled nascent chain without mRNA or the 40S subunit: a translation-like reaction that occurs entirely on the 60S subunit. CAT tails serve as a degradation signal and may also promote the folding of stalled nascent chains into structures that are more accessible to the proteasome.

The integrated stress response (ISR) provides a unified mechanism by which four distinct stress sensors converge on a single target: phosphorylation of eIF2alpha on Ser51. The four kinases are PERK (activated by unfolded proteins in the ER, linking to the unfolded protein response), GCN2 (activated by uncharged tRNAs during amino acid starvation), PKR (activated by double-stranded RNA during viral infection), and HRI (activated by heme deficiency in erythroid cells). Phosphorylated eIF2alpha is a competitive inhibitor of eIF2B (the GEF that recycles eIF2-GDP to eIF2-GTP), reducing ternary complex availability and globally suppressing translation initiation. Paradoxically, certain mRNAs — most notably ATF4 — are translationally upregulated under eIF2alpha phosphorylation through the uORF mechanism described above. The ISR thus performs a wholesale reprogramming of the translatome: most protein synthesis is shut down (conserving resources), while stress-response proteins are selectively synthesised.

Synthesis. The foundational reason translation works as a reliable molecular machine is that the ribosome is a ribozyme (RNA catalysing peptide bond formation), and this is exactly the molecular fossil of the RNA world that predates the last universal common ancestor. Putting these together with the kinetic proofreading machinery of EF-Tu, the fidelity of translation (~1 error per $10^{4}$ codons) emerges not from thermodynamic equilibrium but from energy-driven kinetic barriers that exploit GTP hydrolysis as a timing mechanism. The central insight is that accuracy costs energy: the bridge is between the information-theoretic channel capacity of the genetic code and the thermodynamic cost of maintaining it. This pattern recurs in 15.14.01 pending enzyme mechanism, where catalytic precision also trades chemical energy for specificity, and generalises to 17.07.01 pending cell signalling, where phosphorylation cascades amplify weak signals through similar energy-dissipating proofreading steps.

Full proof set [Master]

Proposition 1 (Hopfield kinetic proofreading). Let $F$ be the thermodynamic discrimination ratio: the equilibrium binding constant for correct (cognate) tRNA divided by that for near-cognate tRNA at a given codon. In a single-step equilibrium selection, the misincorporation probability is $\epsilon_{0} = 1/F$. In a two-step kinetic proofreading scheme with an irreversible GTP hydrolysis step separating the two selections, the misincorporation probability is reduced to $\epsilon \approx 1/F^{2}$, at an energy cost of one GTP molecule per selection cycle.

Proof. Consider the selection pathway for aminoacyl-tRNA at the ribosomal A site. At initial selection, the ternary complex (EF-Tu-GTP-aa-tRNA) binds the A site. The cognate tRNA has binding constant $K_{C}$ and the near-cognate has $K_{NC}$, with $F = K_{C}/K_{NC}$. The rejection probability at initial selection is $p_{1} \approx 1 - 1/F$ for near-cognate (close to 1: most dissociate before GTP hydrolysis) and negligible for cognate (most proceed). GTP hydrolysis is irreversible and commits the tRNA to the second selection step (accommodation/proofreading). At proofreading, the same discrimination ratio $F$ applies because the codon-anticodon geometry is re-checked independently. The rejection probability at proofreading is again $p_{2} \approx 1 - 1/F$ for near-cognate and negligible for cognate.

The overall misincorporation probability for near-cognate tRNA is the probability of surviving both selections: $\epsilon = (1 - p_{1})(1 - p_{2}) \approx (1/F) \cdot (1/F) = 1/F^{2}$. With $F \approx 100$ (the typical thermodynamic discrimination for a single mismatch at one codon position), $\epsilon \approx 10^{-4}$, matching the observed ribosomal error rate. The cost is one GTP per delivery cycle regardless of whether the tRNA is correct (the GTP is hydrolysed in both cases upon EF-Tu release). $□$
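
The arithmetic of Proposition 1 can be checked with a short numerical sketch. It assumes the idealised scheme of the proof: each selection step passes a near-cognate tRNA with probability $1/F$ relative to cognate, and the steps are independent. The function name `misincorporation_probability` and the `steps` parameter are illustrative, not taken from any library.

```python
def misincorporation_probability(F: float, steps: int = 2) -> float:
    """Idealised error fraction after `steps` independent selections.

    Assumption: every selection passes a near-cognate tRNA with
    probability 1/F relative to the cognate tRNA, and an irreversible
    step (GTP hydrolysis) separates consecutive selections.
    """
    if F <= 1:
        raise ValueError("discrimination ratio F must exceed 1")
    if steps < 1:
        raise ValueError("need at least one selection step")
    return (1.0 / F) ** steps


F = 100.0  # single-mismatch discrimination ratio used in the text
single = misincorporation_probability(F, steps=1)
double = misincorporation_probability(F, steps=2)
print(f"single-step selection: {single:.0e}")  # 1e-02
print(f"with proofreading:     {double:.0e}")  # 1e-04
```

Under these assumptions, each additional proofreading step would multiply the error fraction by another factor of $1/F$, at the price of another irreversible, energy-consuming step.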

Proposition 2 (Energy cost of translation). The synthesis of a polypeptide of $n$ amino acids in eukaryotes requires $4 n + 1$ high-energy phosphate bonds (ATP and GTP equivalents) at minimum: $2 n$ for aminoacyl-tRNA charging, $2 (n - 1)$ for elongation factor cycles, $2$ for initiation, and $1$ for termination.

Proof. Count the high-energy bonds consumed at each stage.

Charging: Each amino acid is attached to its cognate tRNA by an aminoacyl-tRNA synthetase using ATP $\to$ AMP + PPi. The pyrophosphate is subsequently hydrolysed by pyrophosphatase (PPi $\to$ 2 Pi), making the overall reaction irreversible. This consumes 2 high-energy phosphate bonds per amino acid (both phosphoanhydride bonds of ATP: one cleaved in ATP $\to$ AMP + PPi, one in PPi hydrolysis). Total: $2 n$ .

Initiation: The 43S pre-initiation complex requires eIF2-GTP (1 GTP) and 60S subunit joining requires eIF5B-GTP (1 GTP). Total: 2.

Elongation: Each elongation cycle adds one amino acid and consumes 2 GTP molecules: one for eEF1A-GTP delivery of the aminoacyl-tRNA (GTP hydrolysis upon correct codon-anticodon matching), and one for eEF2-GTP-driven translocation. The first amino acid (Met-tRNAi) is positioned during initiation, so there are $n - 1$ elongation cycles. Total: $2 (n - 1)$ .

Termination: eRF3-GTP drives polypeptide release. Total: 1.

Grand total: $2 n + 2 + 2 (n - 1) + 1 = 2 n + 2 + 2 n - 2 + 1 = 4 n + 1$ .

For a protein of 300 amino acids: $4 (300) + 1 = 1201$ high-energy phosphate bonds. At approximately 50 kJ/mol per phosphate bond, the total energy input is roughly 60 MJ/mol of protein, on the order of 1000 times the free energy of folding of a typical globular protein (tens of kJ/mol). $□$
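
The bookkeeping in Proposition 2 is easy to reproduce as a minimal sketch. The names `EnergyCost` and `translation_energy_cost` and the constant `KJ_PER_BOND` are illustrative. The bond counts are exactly those tallied in the proof, and the 50 kJ/mol figure is the approximation used above.

```python
from dataclasses import dataclass

KJ_PER_BOND = 50.0  # approximate value used in the text (assumption)


@dataclass(frozen=True)
class EnergyCost:
    charging: int     # 2 bonds per aminoacyl-tRNA (ATP -> AMP + 2 Pi)
    initiation: int   # eIF2-GTP + eIF5B-GTP
    elongation: int   # 2 GTP per cycle, n - 1 cycles
    termination: int  # eRF3-GTP

    @property
    def total_bonds(self) -> int:
        return (self.charging + self.initiation
                + self.elongation + self.termination)


def translation_energy_cost(n: int) -> EnergyCost:
    """Minimum high-energy bonds to synthesise an n-residue polypeptide."""
    if n < 1:
        raise ValueError("polypeptide length n must be at least 1")
    return EnergyCost(
        charging=2 * n,
        initiation=2,
        elongation=2 * (n - 1),
        termination=1,
    )


cost = translation_energy_cost(300)
print(cost.total_bonds)                       # 1201
print(cost.total_bonds * KJ_PER_BOND / 1000)  # 60.05 (MJ/mol)
```

Keeping the four stages as separate fields makes it easy to see that charging and elongation dominate for any realistic $n$, while initiation and termination contribute a fixed overhead of three bonds.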

Connections [Master]

Transcription 17.05.02 pending. Transcription produces the mRNA template that the ribosome reads, and mRNA processing events (5' capping, splicing, polyadenylation) directly determine translational efficiency. The 5' cap is the binding site for eIF4E; the poly-A tail recruits PABP; and the splicing history of the mRNA deposits exon junction complexes that serve as markers for nonsense-mediated decay. Transcription and translation are thus mechanistically coupled through the mRNA itself.
Amino acids and protein chemistry 15.12.01 pending. Translation polymerises amino acids into peptide chains through amide bond formation. The ester bond between the amino acid and the 3' OH of the terminal adenosine (A76) on tRNA is the activated intermediate for peptide bond formation — a nucleophilic acyl substitution reaction whose chemistry is the same as laboratory amide synthesis but catalysed by rRNA rather than by a chemical coupling reagent. The side-chain chemistry of each amino acid (hydrophobic, charged, polar, aromatic) determines the protein's folding and function.
Enzyme mechanism 15.14.01 pending. Kinetic proofreading by EF-Tu shares its operating principle with enzyme catalytic discrimination: both use irreversible energy-consuming steps to amplify small thermodynamic differences into large selectivity ratios. The Hopfield mechanism (GTP hydrolysis separating two independent selections) is the same principle that DNA polymerases use for replication fidelity (nucleotide selection followed by 3'-to-5' exonucleolytic proofreading), and that has been proposed for antigen discrimination by the T-cell receptor. The pattern recurs across molecular biology: specificity is bought with ATP or GTP.
Cell signalling 17.07.01 pending. mTOR signalling directly regulates translation initiation through 4E-BP phosphorylation, and the integrated stress response modulates global translation through eIF2alpha phosphorylation. Signal transduction pathways thus control the cell's protein synthesis capacity, coupling extracellular conditions (nutrient availability, growth factors, stress) to the rate of translation. The unfolded protein response, triggered by ER stress, feeds back on translation through PERK-mediated eIF2alpha phosphorylation — a direct signalling connection between translation quality and translation rate.
Action potential — ionic basis 17.09.02 pending. Voltage-gated ion channels are membrane proteins produced by translation; their density at the axon membrane depends on translation rates and trafficking efficiency. Conversely, action potentials consume ATP (for Na+/K+ pump activity), and the same cellular energy budget must also cover the cost of translation to replace ion channels and synaptic proteins. The action potential and translation are thus linked through the cell's energy budget: each spike has a downstream translational cost.

Historical & philosophical context [Master]

The genetic code was cracked between 1961 and 1966 in a series of experiments that rank among the most elegant in twentieth-century biology. Nirenberg and Matthaei's poly-U experiment (27 May 1961, reported that August at the International Congress of Biochemistry in Moscow) showed that a synthetic RNA consisting entirely of uracil (poly-U) directed the synthesis of polyphenylalanine in a cell-free system, establishing that UUU codes for phenylalanine, the first codon assignment ^{[Nirenberg and Matthaei 1961]}. Khorana's laboratory synthesised defined-sequence polynucleotides (repeating dinucleotides, trinucleotides, and tetranucleotides) to map the remaining codons. By 1966 the complete code was known; Nirenberg, Khorana, and Holley shared the 1968 Nobel Prize in physiology or medicine.

Crick's 1966 wobble hypothesis ^{[Crick 1966]} explained why fewer than 61 tRNAs could decode all 61 sense codons, predicting non-standard base pairing at the third codon position (including pairing by inosine in the anticodon) long before codon-anticodon pairing could be observed structurally. His 1968 paper "The origin of the genetic code" ^{[Crick 1968]} introduced the phrase "frozen accident": the idea that the code was an arbitrary assignment established once in the earliest self-replicating systems and conserved because any change would be lethal across all downstream gene products. The frozen-accident hypothesis predicts that variant codes should be rare and phylogenetically restricted (derived rather than independent), which is borne out by the data: mitochondrial and ciliate code variants differ from the standard code at only a handful of codon assignments and are found in specific lineages.

The atomic structures of the ribosomal subunits, determined by Yonath, Steitz, and Ramakrishnan using X-ray crystallography and published in a cluster of landmark papers in 2000, reached resolutions as high as 2.4 Angstrom for the large subunit ^{[Steitz 2008]}. Yonath had begun crystallising ribosomal subunits in 1980 and persisted through two decades of technical obstacles. The structures showed that the peptidyl transferase centre is composed entirely of rRNA with no protein side chains near the catalytic site, the most dramatic structural confirmation of the ribozyme concept and, by extension, of the RNA world hypothesis. Ramakrishnan's subsequent structures of the 30S subunit with tRNA fragments visualised the decoding mechanism at atomic resolution ^{[Ramakrishnan 2002]}. The three shared the 2009 Nobel Prize in chemistry.

Bibliography [Master]

@article{NirenbergMatthaei1961,
  author = {Nirenberg, M. W. and Matthaei, J. H.},
  title = {The dependence of cell-free protein synthesis in {\it E. coli} upon naturally occurring or synthetic polyribonucleotides},
  journal = {Proc. Natl. Acad. Sci. USA},
  volume = {47},
  year = {1961},
  pages = {1588--1602},
}

@article{Crick1966wobble,
  author = {Crick, F. H. C.},
  title = {Codon--anticodon pairing: the wobble hypothesis},
  journal = {J. Mol. Biol.},
  volume = {19},
  year = {1966},
  pages = {548--555},
}

@article{Crick1968,
  author = {Crick, F. H. C.},
  title = {The origin of the genetic code},
  journal = {J. Mol. Biol.},
  volume = {38},
  year = {1968},
  pages = {367--379},
}

@article{Hopfield1974,
  author = {Hopfield, J. J.},
  title = {Kinetic proofreading: a new mechanism for reducing errors in biosynthetic processes requiring high specificity},
  journal = {Proc. Natl. Acad. Sci. USA},
  volume = {71},
  year = {1974},
  pages = {4135--4139},
}

@article{Ninio1975,
  author = {Ninio, J.},
  title = {Kinetic amplification of enzyme discrimination},
  journal = {Biochimie},
  volume = {57},
  year = {1975},
  pages = {587--595},
}

@article{Ramakrishnan2002,
  author = {Ramakrishnan, V.},
  title = {Ribosome structure and the mechanism of translation},
  journal = {Cell},
  volume = {108},
  year = {2002},
  pages = {557--572},
}

@article{Steitz2008,
  author = {Steitz, T. A.},
  title = {A structural understanding of the dynamic ribosome machine},
  journal = {Nat. Rev. Mol. Cell Biol.},
  volume = {9},
  year = {2008},
  pages = {242--253},
}

@article{Noller2004,
  author = {Noller, H. F.},
  title = {Evolution of protein synthesis from an {RNA} world},
  journal = {Cold Spring Harb. Symp. Quant. Biol.},
  volume = {69},
  year = {2004},
  pages = {33--44},
}

@article{Kozak1986,
  author = {Kozak, M.},
  title = {Point mutations define a sequence flanking the {AUG} initiator codon that modulates translation by eukaryotic ribosomes},
  journal = {Cell},
  volume = {44},
  year = {1986},
  pages = {283--292},
}

@article{Pelletier1988,
  author = {Pelletier, J. and Sonenberg, N.},
  title = {Internal initiation of translation of eukaryotic mRNA directed by a sequence derived from poliovirus {RNA}},
  journal = {Nature},
  volume = {334},
  year = {1988},
  pages = {320--325},
}

@article{Hinnebusch2005,
  author = {Hinnebusch, A. G.},
  title = {Translational regulation of {GCN4} and the general amino acid control of yeast},
  journal = {Annu. Rev. Microbiol.},
  volume = {59},
  year = {2005},
  pages = {407--450},
}

@book{Alberts2014,
  author = {Alberts, B. and Johnson, A. and Lewis, J. and Morgan, D. and Raff, M. and Roberts, K. and Walter, P.},
  title = {Molecular Biology of the Cell},
  edition = {6th},
  publisher = {Garland Science},
  year = {2014},
}

@book{Lodish2016,
  author = {Lodish, H. and Berk, A. and Kaiser, C. A. and Krieger, M. and Bretscher, A. and Ploegh, H. and Amon, A. and Martin, K. C.},
  title = {Molecular Cell Biology},
  edition = {8th},
  publisher = {W. H. Freeman},
  year = {2016},
}

Prerequisites

17.05.02 pending

Used in

15.12.01
17.10.01

Tier anchors

beginner: Khan Academy (translation); Amoeba Sisters — Protein Synthesis
intermediate: Alberts et al., *Molecular Biology of the Cell* (6th ed., Garland 2014), Ch. 6; Lodish et al., *Molecular Cell Biology* (8th ed., W. H. Freeman 2016), Ch. 5
master: Steitz, *A structural understanding of the dynamic ribosome machine* (2008 Nobel Lecture); Noller, *Evolution of protein synthesis from an RNA world* (2004); Ramakrishnan, *Ribosome Structure and the Mechanism of Translation* (2002)

References

NirenbergMatthaei1961 pending
Nirenberg, M. W. & Matthaei, J. H. — The dependence of cell-free protein synthesis in E. coli upon naturally occurring or synthetic polyribonucleotides · Proc. Natl. Acad. Sci. USA 47 (1961) 1588-1602 · see docs/catalogs/NEED_TO_SOURCE.md#bio-wave1-nirenberg-matthaei-1961
tong
raw/pdfs/statphys/statphys.pdf · §1.3 The Canonical Ensemble — the free-energy and partition-function framework that underpins the thermodynamic cost of kinetic proofreading in translation
TODO_REF pending
Alberts et al. — Molecular Biology of the Cell (6th ed., Garland 2014) · Ch. 6 — How Cells Read the Genome: From DNA to Protein; §§ From RNA to Protein · see docs/catalogs/NEED_TO_SOURCE.md#bio-wave1-mboc-ch6
TODO_REF pending
Steitz — A structural understanding of the dynamic ribosome machine (Nobel Lecture) · Angew. Chem. Int. Ed. 49 (2010) 4381-4398; Nobel Prize in Chemistry 2009 · see docs/catalogs/NEED_TO_SOURCE.md#bio-wave1-steitz-nobel
TODO_REF pending
Ramakrishnan — Ribosome Structure and the Mechanism of Translation · Cell 108 (2002) 557-572 · see docs/catalogs/NEED_TO_SOURCE.md#bio-wave1-ramakrishnan2002
TODO_REF pending
Noller — Evolution of protein synthesis from an RNA world · Cold Spring Harb. Symp. Quant. Biol. 69 (2004) 33-44 · see docs/catalogs/NEED_TO_SOURCE.md#bio-wave1-noller2004

Reviewer

Tyler (pending external biology reviewer per BIOLOGY_PLAN §6)

Estimated time

beginner: 15m
intermediate: 35m
master: 70m