15.13.01 · orgchem / biomolecules-na

Nucleic acid chemistry

draft · 3 tiers · Lean: none

Anchor (Master): Blackburn — Nucleic Acids: Structures, Properties, and Functions; Saenger — Principles of Nucleic Acid Structure; Carey & Sundberg — Advanced Organic Chemistry Part A

Intuition [Beginner]

DNA and RNA carry genetic information in the sequence of their nucleotide building blocks. Each nucleotide has three parts: a sugar (ribose in RNA, deoxyribose in DNA), a phosphate group, and a nitrogen-containing base. The bases are the information-carrying part — there are five of them, abbreviated A, T, G, C, and U.

A nucleoside is the sugar-plus-base unit without the phosphate. A nucleotide is the nucleoside plus one or more phosphates. When nucleotides link together through phosphodiester bonds between the sugar of one and the phosphate of the next, they form a polynucleotide strand. DNA is a double-stranded polynucleotide; RNA is usually single-stranded.

The bases come in two families. Purines (adenine, A; guanine, G) have a fused two-ring structure. Pyrimidines (cytosine, C; thymine, T; uracil, U) have a single six-membered ring. DNA uses A, T, G, C. RNA uses A, U, G, C — uracil replaces thymine. The difference between thymine and uracil is a single methyl group on thymine.

The critical chemical insight is Watson-Crick base pairing. Adenine pairs with thymine (in DNA) or uracil (in RNA) through two hydrogen bonds. Guanine pairs with cytosine through three hydrogen bonds. This pairing is the molecular basis for the double helix: two DNA strands run in opposite directions (antiparallel), held together by A-T and G-C hydrogen bonds between the bases. The sequence of one strand determines the sequence of the other.
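The claim that one strand determines the other can be made concrete with a few lines of Python. This is a minimal sketch, assuming sequences are written as plain strings in the 5' to 3' direction; the names `DNA_PAIRS` and `complementary_strand` are illustrative and not part of any library.

```python
# Watson-Crick partners for DNA, as stated in the text: A pairs with T, G with C.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand_5_to_3: str) -> str:
    """Return the partner strand, also written 5' to 3'.

    Each base is swapped for its pairing partner, and the result is reversed
    because the two strands of a duplex run antiparallel.
    """
    paired = [DNA_PAIRS[base] for base in strand_5_to_3.upper()]
    return "".join(reversed(paired))


if __name__ == "__main__":
    print(complementary_strand("ATGC"))
```

Applying the function twice returns the original sequence, which is the string-level counterpart of each strand serving as a template for the other.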

Visual [Beginner]

The DNA double helix looks like a twisted ladder. The sides of the ladder are the sugar-phosphate backbones (identical repeating units). The rungs are the base pairs: A pairs with T (two hydrogen bonds), G pairs with C (three hydrogen bonds).

Schematic of the DNA double helix showing two antiparallel strands. Left: a flat representation showing A-T and G-C base pairs with hydrogen bonds indicated. Right: the double-helix structure with major and minor grooves labelled. Below: the structure of adenosine monophosphate (AMP) with the ribose sugar, adenine base, and phosphate group identified.

You can see that G-C pairs, with three hydrogen bonds, hold the two strands together more tightly than A-T pairs. Regions of DNA with many G-C pairs require more heat to separate — a property used in PCR (polymerase chain reaction) to control when DNA strands come apart and re-anneal.
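To put rough numbers on the link between G-C content and strand separation, the sketch below computes the G-C fraction of a sequence and applies the Wallace rule, a common rule of thumb for short primers (about 2 degrees Celsius per A-T pair and 4 per G-C pair). The Wallace rule is not part of this unit and is only a crude estimate; it is used here as an assumption to illustrate the trend.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm_celsius(seq: str) -> int:
    """Rule-of-thumb melting temperature for a short oligonucleotide.

    Assumption: Wallace rule, Tm = 2*(A+T) + 4*(G+C), reasonable only for
    short primers; real melting temperatures also depend on salt and
    strand concentration.
    """
    seq = seq.upper()
    at_pairs = seq.count("A") + seq.count("T")
    gc_pairs = seq.count("G") + seq.count("C")
    return 2 * at_pairs + 4 * gc_pairs


if __name__ == "__main__":
    for primer in ("ATATATATAT", "ATGCATGCAT", "GCGCGCGCGC"):
        print(primer, gc_fraction(primer), wallace_tm_celsius(primer))
```

Three primers of equal length but increasing G-C content give increasing estimates, which is the qualitative point made above.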

Worked example [Beginner]

Draw the structure of adenosine monophosphate (AMP) and show Watson-Crick base pairing for A-T.

AMP consists of three parts:

  1. Ribose: a five-carbon sugar with the formula $\mathrm{C_5H_{10}O_5}$ in its open-chain form, cyclised to a furanose ring. In RNA nucleotides the 2'-position has an OH group; in DNA (deoxy) it has only H.

  2. Adenine: a purine base attached to the ribose via a beta-N-glycosidic bond from N9 of adenine to C1' of the ribose.

  3. Phosphate: one phosphate group esterified to the 5'-OH of the ribose, forming the 5'-monophosphate.

The full structure: adenine-N9-C1'-ribose-O5'-phosphate.

A-T base pairing. Adenine (A) in DNA pairs with thymine (T) through two hydrogen bonds:

  • The N6 amino group of adenine ($\mathrm{NH_2}$) donates a hydrogen bond to the O4 carbonyl oxygen of thymine.
  • The N1 nitrogen of adenine accepts a hydrogen bond from the N3 hydrogen of thymine.

The geometry places the two bases in a coplanar arrangement with the hydrogen bonds holding them at a fixed distance, maintaining the uniform width of the double helix regardless of which base pair occupies a given position.

What this tells us: the two hydrogen bonds in A-T (and three in G-C) are the chemical basis for genetic information transfer. Every time a cell copies its DNA, these specific pairings ensure that the sequence is faithfully reproduced.

Check your understanding [Beginner]

Formal definition [Intermediate+]

A nucleoside is an $N$-glycoside formed by the attachment of a purine or pyrimidine base to the anomeric carbon (C1') of a pentose sugar through a beta-glycosidic linkage. For purines the glycosidic bond is N9-C1'; for pyrimidines it is N1-C1'.

A nucleotide is a nucleoside phosphate, that is, a phosphoric acid ester of a nucleoside. Monophosphates (AMP, GMP, CMP, UMP, TMP) have one phosphate; diphosphates (ADP, GDP) have two; triphosphates (ATP, GTP) have three in an anhydride chain. The phosphoanhydride bonds of ATP ($\Delta G^{\circ\prime} \approx -30.5\ \mathrm{kJ/mol}$ for hydrolysis to ADP and inorganic phosphate) make ATP the universal energy currency of the cell.

Phosphodiester bond. In polynucleotides, the phosphate links the 3'-OH of one nucleotide's sugar to the 5'-OH of the next, forming a 3',5'-phosphodiester bond. The backbone has a repeating charge of -1 per phosphate at physiological pH, making nucleic acids polyanions. The directionality is always specified: strand polarity runs 5' to 3'.

Watson-Crick base pairing. In the B-form DNA double helix (the most common form under physiological conditions):

  • A-T: two hydrogen bonds. Donor-acceptor pairs: (N6-H of A) to (O4 of T), and (N3-H of T) to (N1 of A).
  • G-C: three hydrogen bonds. Donor-acceptor pairs: (N1-H of G) to (N3 of C), (N2-H of G) to (O2 of C), and (N4-H of C) to (O6 of G).

Double-helix geometry (B-form). About 10 base pairs per turn (the fibre value; closer to 10.5 in solution), pitch 3.4 nm (a rise of 0.34 nm per base pair), helix diameter 2.0 nm. Major groove (2.2 nm wide) and minor groove (1.2 nm wide) provide distinct chemical environments for protein-DNA recognition. The backbone follows a right-handed helical path with the base pairs stacked approximately perpendicular to the helix axis.
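The helical parameters above fix how long a duplex of a given size is and how many turns it makes. The sketch below is illustrative only: it takes the quoted B-form values (10 base pairs per turn, 3.4 nm pitch) as inputs, and the class name `HelixGeometry` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HelixGeometry:
    """Idealised helix described by base pairs per turn and pitch."""

    bp_per_turn: float
    pitch_nm: float

    @property
    def rise_per_bp_nm(self) -> float:
        # Axial distance covered by one base pair.
        return self.pitch_nm / self.bp_per_turn

    def length_nm(self, n_bp: int) -> float:
        # Contour length along the helix axis for n_bp base pairs.
        return n_bp * self.rise_per_bp_nm

    def turns(self, n_bp: int) -> float:
        # Number of complete helical turns spanned by n_bp base pairs.
        return n_bp / self.bp_per_turn


# Values quoted in the text for B-form DNA.
B_FORM = HelixGeometry(bp_per_turn=10, pitch_nm=3.4)

if __name__ == "__main__":
    print(B_FORM.rise_per_bp_nm, B_FORM.length_nm(100), B_FORM.turns(100))
```

On these idealised values a 100-base-pair duplex spans ten turns and about 34 nm along the helix axis.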

Chemical stability. RNA is susceptible to base-catalysed hydrolysis at the phosphodiester bond because the 2'-OH can act as an intramolecular nucleophile, attacking the adjacent phosphate and cleaving the backbone. DNA lacks the 2'-OH and is resistant to this mechanism, which is why DNA is the stable long-term genetic storage molecule and RNA serves shorter-lived roles (messenger, catalytic, regulatory).

Counterexamples to common slips

  • RNA is not just single-stranded. RNA molecules fold into complex secondary and tertiary structures through intramolecular base pairing, forming stem-loops, hairpins, pseudoknots, and catalytic active sites (ribozymes).

  • Not all base pairing is Watson-Crick. Wobble base pairs (G-U in RNA), Hoogsteen pairs, and base triples occur in RNA structures and in DNA under certain conditions. The genetic code's degeneracy at the third codon position is mediated by wobble pairing.

  • DNA has three common helical forms. B-DNA (right-handed, physiological conditions), A-DNA (right-handed, dehydrated conditions, also adopted by RNA duplexes), and Z-DNA (left-handed, GC-rich sequences under high salt).

  • Base stacking contributes more to duplex stability than hydrogen bonding. The vertical stacking of aromatic bases on top of one another — driven by pi-pi interactions, dipole-dipole stabilisation, and van der Waals contacts — provides the dominant thermodynamic contribution to double-helix stability. The hydrogen bonds between base pairs are essential for specificity (selecting the correct partner), but the overall free energy that holds the helix together comes mainly from stacking.

Key theorem with proof [Intermediate+]

Proposition (Chargaff's rules and complementarity). In double-stranded DNA, the molar amount of adenine equals that of thymine ($[\mathrm{A}] = [\mathrm{T}]$), and the molar amount of guanine equals that of cytosine ($[\mathrm{G}] = [\mathrm{C}]$). This follows from Watson-Crick base pairing.

Proof. Watson-Crick pairing pairs every A with a T and every G with a C. In a double-stranded DNA molecule, every base on one strand is paired with a complementary base on the other strand. Therefore the total number of A residues equals the total number of T residues (each A-T pair contributes one A and one T), and the total number of G residues equals the total number of C residues (each G-C pair contributes one G and one C). This gives $[\mathrm{A}] = [\mathrm{T}]$ and $[\mathrm{G}] = [\mathrm{C}]$. A further consequence: $[\mathrm{A}] + [\mathrm{G}] = [\mathrm{T}] + [\mathrm{C}]$, i.e., total purines equal total pyrimidines.
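The counting argument in the proof can be checked numerically. The sketch below is an illustration under simple assumptions: it builds the partner of an arbitrary strand from the pairing rules and tallies bases over both strands. The function names are hypothetical. Note that a single strand on its own need not satisfy the rules; only the duplex does.

```python
import random
from collections import Counter

PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def duplex_base_counts(strand: str) -> Counter:
    """Count bases over both strands of the duplex formed by `strand`."""
    # Strand orientation does not matter for counting, so no reversal is needed.
    partner = "".join(PAIRS[base] for base in strand)
    return Counter(strand) + Counter(partner)


def obeys_chargaff(strand: str) -> bool:
    """True if the duplex built from `strand` has A == T and G == C."""
    counts = duplex_base_counts(strand)
    return counts["A"] == counts["T"] and counts["G"] == counts["C"]


if __name__ == "__main__":
    rng = random.Random(0)
    strand = "".join(rng.choice("ACGT") for _ in range(1000))
    print("single strand:", dict(Counter(strand)))
    print("duplex:       ", dict(duplex_base_counts(strand)))
    print("Chargaff holds:", obeys_chargaff(strand))
```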

Chargaff discovered these ratios experimentally in 1950 [Chargaff 1950]; Watson and Crick explained them structurally in 1953 [Watson-Crick 1953].

Bridge. Chargaff's complementarity builds toward 17.05.01 pending DNA replication, where semiconservative copying requires each daughter strand to inherit one parent strand, and toward 17.05.02 pending transcription, where the template strand directs mRNA synthesis through the same A-T/U and G-C pairing chemistry. The foundational reason this system works is that hydrogen-bond geometry fixes the allowable partners — adenine's donor-acceptor pattern matches only thymine or uracil, guanine's matches only cytosine. This is exactly the chemical constraint that makes the genetic code a digital information system: four symbols, two complementary pairs, unambiguous reading. The bridge from organic-chemistry base pairing to molecular-biology information flow appears again in 15.14.01 pending enzyme mechanism, where polymerases read and write this complementarity at the catalytic level.

Exercises [Intermediate+]

DNA structure beyond the B-form helix [Master]

The B-form double helix (10 bp/turn, right-handed, C2'-endo sugar pucker) is the dominant conformation of DNA at physiological water activity and moderate salt concentration. But the polynucleotide backbone has six rotatable torsion angles per nucleotide ($\alpha$, $\beta$, $\gamma$, $\delta$, $\epsilon$, $\zeta$) and the glycosidic bond between base and sugar has an additional torsion angle ($\chi$). The combination of these seven degrees of freedom, modulated by base sequence, counterion identity, water activity, and superhelical stress, produces several distinct helical families.

A-form DNA. Under dehydrating conditions (ethanol precipitation, high salt, low humidity), DNA undergoes a cooperative transition to the A-form. A-DNA is right-handed with 11 bp/turn, a pitch of 2.8 nm, and a helix diameter of 2.3 nm. The sugar pucker shifts from C2'-endo to C3'-endo, which tilts the base pairs approximately 20 degrees from perpendicular to the helix axis and pushes them away from the axis, creating a hollow central core. The major groove becomes deep and narrow (unfavourable for protein access), while the minor groove becomes wide and shallow. RNA duplexes adopt A-form geometry by default because the 2'-OH sterically prevents the C2'-endo sugar pucker required for B-form. Protein-RNA recognition in ribosomes and spliceosomes operates on A-form geometry, not B-form. The B-to-A transition was characterised by fibre diffraction by Fuller, Wilkins, and collaborators in 1965, and the structural parameters were refined by Dickerson and colleagues using single-crystal X-ray crystallography in the 1970s and 1980s.

Z-form DNA. Z-DNA is a left-handed helical conformation adopted by alternating purine-pyrimidine sequences (notably $(\mathrm{CG})_n$ repeats) under conditions of high salt or negative supercoiling. The backbone follows a zigzag path (hence "Z") because the glycosidic torsion angle alternates between anti at pyrimidines and syn at purines, a stark departure from the all-anti conformation of B-DNA. Z-DNA has 12 bp/turn (6 dinucleotide repeats), a diameter of 1.8 nm, and a single deep minor groove with no discernible major groove. The biological significance of Z-DNA remained controversial for decades after its discovery by Wang, Quigley, and Rich in 1979, but the identification of Z-DNA-binding proteins (notably ADAR1 and ZBP1) has established that Z-DNA forms transiently during transcription at regions of negative superhelical stress and serves as a signalling scaffold for innate immune responses and RNA editing. The B-to-Z transition carries a free-energy cost per base pair relative to B-form in physiological salt, so only sequences with favourable stacking geometry (alternating purine-pyrimidine) can access Z-form under physiological conditions.

Triplex DNA (H-DNA). Polypurine-polypyrimidine tracts can form three-stranded structures in which a third strand binds in the major groove of a Watson-Crick duplex through Hoogsteen or reverse-Hoogsteen hydrogen bonds to the purine strand. The canonical triples are T-AT and C+-GC in the pyrimidine motif (the latter requiring slightly acidic pH to protonate the third-strand cytosine) and G-GC in the purine motif (pH-independent). Intramolecular triplexes (H-DNA) form when one half of a mirror-repeat polypurine-polypyrimidine tract folds back and inserts itself into the major groove of the other half, releasing the complementary strand as single-stranded DNA. H-DNA was identified by Mirkin, Lyamichev, and Frank-Kamenetskii in 1987. Triplex-forming oligonucleotides have been explored as sequence-specific gene-silencing agents and as tools for site-directed DNA modification.

G-quadruplexes. Guanine-rich sequences, particularly those found at telomeres (the human telomeric repeat is TTAGGG) and in promoter regions of oncogenes (c-MYC, KRAS, VEGF), can form four-stranded structures built from stacked G-quartets. A G-quartet is a planar cyclic arrangement of four guanines connected by eight Hoogsteen hydrogen bonds, stabilised by a monovalent cation ($\mathrm{K^+}$ or $\mathrm{Na^+}$) coordinated in the central channel. The cation sits between successive G-quartets, offsetting the electrostatic repulsion of the O6 carbonyl oxygens pointing into the channel. G-quadruplex topologies vary: the strands can be parallel, antiparallel, or hybrid, depending on the loop geometry and the cation. The human telomeric sequence forms a basket-type antiparallel quadruplex in $\mathrm{Na^+}$ solution, as established by NMR (Wang and Patel 1993). The crystal structure grown in the presence of $\mathrm{K^+}$ (Parkinson, Lee, and Neidle 2002) shows an all-parallel propeller fold, while hybrid-1 and hybrid-2 parallel-antiparallel forms predominate in $\mathrm{K^+}$ solution. The biological relevance of G-quadruplexes is now supported by the identification of quadruplex-specific helicases (BLM, WRN, FANCJ), by G4-specific antibodies, and by sequencing-based genome-wide mapping (Chambers et al. 2015, Nature Biotechnology). G-quadruplex-stabilising ligands, which can block telomerase action at telomeres, have entered clinical trials as anticancer agents.
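Candidate quadruplex-forming sequences are often located with a simple pattern search. The sketch below assumes a widely used heuristic that is not part of this unit: four runs of at least three guanines separated by loops of one to seven nucleotides. A match means only that the sequence could fold into a quadruplex, not that it does so in a cell. The function name is hypothetical.

```python
import re

# Heuristic (assumption, not from the text): G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+
G4_PATTERN = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")


def find_g4_candidates(dna: str) -> list[tuple[int, str]]:
    """Return (start index, matched text) for non-overlapping candidate motifs."""
    return [(m.start(), m.group(0)) for m in G4_PATTERN.finditer(dna.upper())]


if __name__ == "__main__":
    telomere = "TTAGGG" * 4  # four copies of the human telomeric repeat
    print(find_g4_candidates(telomere))
```

Four copies of the telomeric repeat supply exactly the four guanine runs the pattern requires; three copies do not match.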

i-motif. The i-motif is the counterpart of the G-quadruplex on the complementary strand, formed by cytosine-rich sequences under mildly acidic conditions (pH 4.5–6.5). It consists of intercalated, hemiprotonated C-C+ base pairs: two parallel-stranded cytosine-rich duplexes intercalate in an antiparallel orientation, with each pair held by three hydrogen bonds between one protonated and one neutral cytosine. The requirement for cytosine protonation restricts i-motif formation to acidic pH, though stabilising sequences and molecular crowding can shift the apparent $\mathrm{p}K_\mathrm{a}$ upward. The i-motif was discovered by Gehring, Leroy, and Gueron in 1993. Its biological existence in cells was confirmed in 2018 by Zeraati et al. using iMab antibodies, and it appears to form preferentially in the regulatory regions of actively transcribed genes.

Chemical modifications of nucleic acids [Master]

The four canonical bases (A, T, G, C in DNA; A, U, G, C in RNA) are only the starting point. Cells chemically modify both DNA and RNA at specific positions, and these modifications carry regulatory information without altering the underlying genetic sequence.

DNA methylation: 5-methylcytosine and the epigenetic mark. The most prevalent DNA modification in vertebrates is methylation at the 5-position of cytosine, producing 5-methylcytosine (5mC). This reaction is catalysed by DNA methyltransferases (DNMT1 for maintenance methylation during replication; DNMT3A/3B for de novo methylation) using S-adenosylmethionine (SAM) as the methyl donor. The methyl group projects into the major groove, where it is read by methyl-CpG-binding domain (MBD) proteins that recruit chromatin-remodelling complexes to silence gene expression. In mammals, 5mC occurs predominantly at CpG dinucleotides (60–80% of CpGs are methylated in somatic cells). The unmethylated CpG islands near gene promoters are characteristic of actively transcribed genes.
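CpG density is usually summarised by counting CpG dinucleotides and comparing the count with what base composition alone would predict. The sketch below uses the common observed-over-expected ratio, (number of CpG × sequence length) / (number of C × number of G). That formula is an assumption brought in for illustration and is not defined in this unit; the function name is hypothetical.

```python
def cpg_stats(seq: str) -> dict:
    """Summarise CpG content of one strand written 5' to 3'."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return {"cpg": 0, "gc_fraction": 0.0, "obs_over_exp": 0.0}
    c_count = seq.count("C")
    g_count = seq.count("G")
    # "CG" cannot overlap with itself, so str.count finds every CpG.
    cpg = seq.count("CG")
    # Expected CpG count if C and G were placed independently.
    expected = c_count * g_count / n
    ratio = cpg / expected if expected else 0.0
    return {
        "cpg": cpg,
        "gc_fraction": (c_count + g_count) / n,
        "obs_over_exp": ratio,
    }


if __name__ == "__main__":
    print(cpg_stats("CGCGCGCG"))
    print(cpg_stats("CCCCGGGG"))
```

The two example sequences have identical base composition but very different CpG counts, which is why the ratio is more informative than G-C content alone.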

The mechanism of DNMT1 involves flipping the target cytosine out of the DNA helix into the enzyme active site (base flipping, first observed by Klimasauskas et al. 1994 for the bacterial methyltransferase M.HhaI), where a conserved cysteine attacks C6 of the cytosine ring. This makes C5 nucleophilic, so that it attacks the electrophilic methyl group of SAM. The covalent enzyme-DNA intermediate is then resolved by beta-elimination, regenerating the enzyme and releasing the methylated product. The 5mC mark is heritable through cell division: after replication, the hemi-methylated DNA (one methylated parent strand, one unmethylated daughter strand) is recognised by DNMT1's replication foci targeting sequence, which directs the enzyme to restore full methylation on the daughter strand.

Active demethylation proceeds through oxidation of 5mC by ten-eleven translocation (TET) enzymes, producing 5-hydroxymethylcytosine (5hmC), then 5-formylcytosine (5fC), then 5-carboxylcytosine (5caC). The higher oxidation products 5fC and 5caC are recognised and excised by thymine DNA glycosylase (TDG), and the resulting abasic site is repaired by the base-excision repair pathway, restoring an unmodified cytosine. This oxidation-excision cycle, which builds on the discovery of TET1-catalysed 5mC oxidation by Tahiliani et al. in 2009, provides an enzymatic route for dynamic DNA demethylation.

N6-methyladenine in DNA. 6mA was long considered restricted to bacterial restriction-modification systems, where it marks host DNA for self-recognition (the corresponding restriction endonuclease cleaves unmethylated foreign DNA). In 2015, high-sensitivity sequencing revealed that 6mA is also present in eukaryotic DNA (Greer et al. in C. elegans; Zhang et al. in Drosophila; Fu et al. in Chlamydomonas), where in several systems it is enriched at transposon sequences and appears to activate rather than repress gene expression. The enzyme N6AMT1 has been reported to catalyse 6mA deposition in mammalian cells. The demethylase ALKBH1 (a member of the AlkB family of Fe(II)/alpha-ketoglutarate-dependent dioxygenases) removes 6mA by oxidative demethylation, analogous to the TET-mediated demethylation of 5mC.

RNA modifications: the epitranscriptome. RNA carries over 170 distinct chemical modifications, far exceeding the diversity of DNA modifications. The most abundant internal mRNA modification is N6-methyladenosine (m6A), deposited by the METTL3-METTL14 methyltransferase complex at the consensus motif DRACH (D = A/G/U, R = A/G, H = A/C/U). m6A affects mRNA stability, splicing, nuclear export, and translation efficiency. The "writer-reader-eraser" framework parallels DNA epigenetics: METTL3/14 write, YTH-domain proteins read, and FTO and ALKBH5 erase m6A. Structural studies show that the N6-methyl group prefers a syn orientation about the C6-N6 bond in unpaired adenosine but must rotate to the less favourable anti orientation to form a Watson-Crick pair. This destabilises base pairing and promotes local RNA unfolding, which is the mechanistic basis for the reader proteins' ability to recognise m6A in single-stranded contexts.
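The DRACH consensus translates directly into a pattern search. The sketch below is a minimal illustration using the base sets given above (D = A/G/U, R = A/G, H = A/C/U); a match marks only a candidate site, since most DRACH motifs in a transcript are not methylated. The function name is hypothetical, and DNA-alphabet input is accepted by converting T to U.

```python
import re

# Lookahead so that overlapping motifs are all reported.
DRACH = re.compile(r"(?=([AGU][AG]AC[ACU]))")


def find_drach_sites(rna: str) -> list[tuple[int, str]]:
    """Return (0-based index of the central A, motif) for each DRACH match."""
    rna = rna.upper().replace("T", "U")
    # The candidate m6A position is the third base of the five-base motif.
    return [(m.start() + 2, m.group(1)) for m in DRACH.finditer(rna)]


if __name__ == "__main__":
    print(find_drach_sites("AAACAGGACU"))
```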

tRNA is the most heavily modified RNA species: over 90 distinct modifications have been catalogued, including pseudouridine (psi, the C-C glycosidic isomer of uridine that provides an additional hydrogen-bond donor), dihydrouridine (D, which saturates the C5-C6 double bond and increases ring flexibility), queuosine (Q, a 7-deazaguanine derivative with an appended cyclopentenediol), and wybutosine (Y, a tricyclic modified guanosine in position 37 of tRNA-Phe that prevents frameshifting by stabilising codon-anticodon interaction). These modifications are introduced by dedicated enzymes that recognise specific tRNA structural features and carry out chemically diverse reactions: isomerisation, reduction, transglycosylation, and multistep scaffold construction.

Phosphorothioate backbone modification. The sulfur-substituted phosphate backbone (one non-bridging oxygen replaced by sulfur) is found naturally in bacterial DNA as a sequence-specific modification introduced by the dnd gene cluster. The phosphorothioate linkage is more resistant to nuclease degradation and more lipophilic than the native phosphodiester. Synthetic phosphorothioate oligonucleotides provide the backbone chemistry of many clinically used antisense drugs (nusinersen, inotersen, volanesorsen, and the patient-customised milasen), where the modification increases plasma half-life from minutes to weeks and enables protein binding that facilitates tissue uptake. The stereochemistry at the phosphorus is racemic in standard synthesis (Rp and Sp diastereomers), and the two diastereomers have different properties: the Sp form is more nuclease-resistant but supports RNase H cleavage less efficiently. Stereopure phosphorothioate synthesis is an active area of medicinal chemistry development.

Solid-phase oligonucleotide synthesis [Master]

The chemical synthesis of defined-sequence DNA and RNA strands is the enabling technology for molecular biology, diagnostics, and nucleic acid therapeutics. Modern synthesis uses the phosphoramidite method, developed by Beaucage and Caruthers in 1981, which achieves coupling efficiencies of roughly 99% or better per step, sufficient to produce oligonucleotides of 100–200 nucleotides in useful yield.

The phosphoramidite building block. Each nucleotide is supplied as a protected phosphoramidite derivative. Three classes of protecting groups are used. The 5'-hydroxyl of the sugar is blocked with a dimethoxytrityl (DMT) group, a bulky acid-labile protecting group whose removal (detritylation with 3% dichloroacetic acid in dichloromethane) is the first step of each synthesis cycle. The phosphate is protected as a 2-cyanoethyl phosphoramidite; this group is removed in the final ammonia deprotection by beta-elimination of acrylonitrile. The exocyclic amines of A, C, and G are protected with acyl groups (benzoyl for A and C, isobutyryl for G) that are cleaved during the final ammonia treatment at 55 degrees Celsius. Thymine requires no exocyclic amine protection.

The four-step synthesis cycle. The synthesis proceeds on a solid support (controlled-pore glass, CPG) in the 3'-to-5' direction — the opposite of enzymatic synthesis, which proceeds 5'-to-3'. Each cycle adds one nucleotide:

  1. Detritylation. The DMT group is removed from the 5'-OH of the support-bound growing strand by treatment with dichloroacetic acid (DCA) in dichloromethane. The released orange DMT cation is monitored spectrophotometrically to assess coupling efficiency. A decrease in DMT release relative to the previous cycle indicates incomplete coupling.

  2. Coupling. The incoming phosphoramidite (dissolved in acetonitrile) is activated by a weak acid (tetrazole or 5-ethylthio-1H-tetrazole), which protonates the diisopropylamino nitrogen of the phosphoramidite, turning it into a good leaving group and making the phosphorus electrophilic. The 5'-OH of the support-bound strand attacks the phosphorus, forming a phosphite triester linkage. The coupling step is complete in 30–60 seconds with activated phosphoramidites. Typical coupling efficiencies are 98.5–99.5% per step.

  3. Capping. Any unreacted 5'-OH groups (from failed couplings) are acetylated with acetic anhydride and N-methylimidazole. This prevents deletion sequences (strands missing one or more internal nucleotides) from forming in later cycles. The capping step does not affect successfully coupled material because its 5'-hydroxyl is still protected by the DMT group of the newly added nucleotide.

  4. Oxidation. The phosphite triester (P(III), unstable) is oxidised to phosphotriester (P(V), stable) by treatment with iodine in a pyridine-water mixture. This converts the labile phosphite linkage into the stable phosphate that will become the phosphodiester bond after deprotection.

After all nucleotides have been added, the oligonucleotide is cleaved from the support with concentrated aqueous ammonia (which also initiates removal of the cyanoethyl and acyl protecting groups), and the DMT group may be retained or removed depending on the purification strategy.
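Because chemical synthesis runs 3' to 5', the order in which phosphoramidites are delivered is the reverse of the sequence as normally written. The sketch below lays out the cycle described above as a plain list of steps. It is a bookkeeping illustration only, not instrument control code, and the function names are hypothetical.

```python
CYCLE_STEPS = ("detritylation", "coupling", "capping", "oxidation")


def coupling_order(target_5_to_3: str) -> tuple[str, list[str]]:
    """Return (support-bound 3'-terminal nucleoside, bases in order of addition)."""
    seq = target_5_to_3.upper()
    if not seq:
        raise ValueError("target sequence must not be empty")
    # Synthesis starts at the 3' end, so read the sequence backwards.
    support_bound, *additions = reversed(seq)
    return support_bound, additions


def synthesis_plan(target_5_to_3: str) -> list[str]:
    """List every step needed to build the target, one cycle per added base."""
    support_bound, additions = coupling_order(target_5_to_3)
    plan = [f"start: {support_bound} attached to support via its 3' end"]
    for cycle_number, base in enumerate(additions, start=1):
        for step in CYCLE_STEPS:
            detail = f" {base} phosphoramidite" if step == "coupling" else ""
            plan.append(f"cycle {cycle_number}: {step}{detail}")
    plan.append("cleave from support and deprotect with aqueous ammonia")
    return plan


if __name__ == "__main__":
    for line in synthesis_plan("ATGC"):
        print(line)
```

A four-base target needs three cycles, because the first nucleoside is already attached to the support.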

Coupling efficiency and yield. The overall yield of a full-length oligonucleotide is the coupling efficiency raised to the power of the number of coupling steps. For a 20-mer (19 couplings) at 99% efficiency: $0.99^{19} \approx 0.826$, giving 82.6% full-length product. For a 100-mer (99 couplings) at 99%: $0.99^{99} \approx 0.37$, only 37% full-length. At 99.5%: $0.995^{99} \approx 0.61$. The practical upper limit of current phosphoramidite chemistry is approximately 200 nucleotides, beyond which the yield of full-length product becomes too low for practical use without specialised purification.
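The yield arithmetic is easy to reproduce and to extend to other lengths. The sketch below assumes, as in the 20-mer example, that an n-mer requires n - 1 couplings because the first nucleoside is pre-attached to the support, and that every coupling has the same efficiency. The function name is hypothetical.

```python
def full_length_yield(n_nucleotides: int, coupling_efficiency: float) -> float:
    """Fraction of strands that reach full length.

    Assumes n - 1 couplings for an n-mer and a constant per-step efficiency.
    """
    if n_nucleotides < 1:
        raise ValueError("oligonucleotide must contain at least one nucleotide")
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling efficiency must lie between 0 and 1")
    return coupling_efficiency ** (n_nucleotides - 1)


if __name__ == "__main__":
    for length in (20, 100, 200):
        for efficiency in (0.99, 0.995):
            percent = 100 * full_length_yield(length, efficiency)
            print(f"{length}-mer at {efficiency:.1%}: {percent:.1f}% full length")
```

Running the loop makes the sensitivity clear: half a percentage point of coupling efficiency changes the full-length yield of a long oligonucleotide by a large factor.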

RNA synthesis. RNA phosphoramidites require an additional protecting group on the 2'-OH (typically 2'-O-tert-butyldimethylsilyl, TBDMS, or 2'-O-TOM), which must be orthogonal to the 5'-DMT and the base-protecting groups. The 2'-protecting group is removed after synthesis with fluoride ion (TBAF or triethylamine trihydrofluoride), separate from the ammonia deprotection. RNA coupling efficiencies are slightly lower than DNA (98–99%) due to the additional steric bulk of the 2'-protecting group.

Connection to solid-phase peptide synthesis. The phosphoramidite cycle is the nucleic acid analogue of Merrifield solid-phase peptide synthesis (SPPS, treated in 15.12.01 pending). Both methods use a temporary protecting group that is removed at the start of every cycle (acid-labile DMT for DNA; acid-labile Boc in Merrifield's original scheme or base-labile Fmoc in modern SPPS), permanent protecting groups that survive until the final deprotection, and sequential coupling on a solid support. The parallel extends to purification: reversed-phase HPLC separates full-length from failure sequences in both cases. The conceptual connection between SPPS and phosphoramidite chemistry is made explicit in 15.10.01 retrosynthetic analysis, which treats both as instances of solid-phase sequential coupling.

Nucleic acid chemistry in therapeutics [Master]

The chemical properties of nucleic acids — base-pairing complementarity, enzymatic processability, and the synthetic accessibility of modified oligonucleotides — have made nucleic acid therapeutics one of the fastest-growing sectors of pharmaceutical chemistry.

Nucleoside analogue drugs. The earliest nucleic acid therapeutics are nucleoside analogues: structurally modified nucleosides that mimic natural substrates and are incorporated into DNA or RNA by viral polymerases, causing chain termination or lethal mutagenesis. Acyclovir (9-[(2-hydroxyethoxy)methyl]guanine), approved in 1982 for herpes simplex virus, lacks the 3'-OH of the ribose ring entirely — its "sugar" is an acyclic chain. Viral thymidine kinase phosphorylates acyclovir to the monophosphate; cellular kinases convert it to the triphosphate, which is incorporated by viral DNA polymerase. The absence of the 3'-OH prevents further chain extension, terminating replication. The selectivity arises because viral thymidine kinase has 200-fold higher affinity for acyclovir than cellular thymidine kinase.

Azidothymidine (AZT, zidovudine), approved in 1987 as the first anti-HIV drug, replaces the 3'-OH of thymidine with a 3'-azido group ($\mathrm{N_3}$). HIV reverse transcriptase incorporates AZT-triphosphate into the growing DNA chain; the azido group cannot act as a nucleophile for the next phosphodiester bond formation, terminating the chain. The toxicity of AZT (bone marrow suppression, mitochondrial DNA depletion) reflects the fact that cellular DNA polymerase gamma can also incorporate AZT-TP at low frequency.

Remdesivir, developed for Ebola and repurposed for COVID-19, is a 1'-cyano-modified adenosine analogue. Unlike acyclovir and AZT, remdesivir does not cause immediate chain termination. Instead, the RNA-dependent RNA polymerase of SARS-CoV-2 incorporates remdesivir triphosphate and continues extension for three additional nucleotides before stalling, a mechanism termed "delayed chain termination." The 1'-cyano group is tolerated while those three nucleotides are added, but it then clashes sterically with the polymerase and blocks further translocation of the RNA product. This delayed termination helps the analogue evade the excision mechanisms that some viruses use to remove chain-terminating nucleotides.

mRNA vaccine chemistry. The COVID-19 mRNA vaccines (BNT162b2 by Pfizer-BioNTech, mRNA-1273 by Moderna) represent the translation of decades of nucleic acid chemistry research into a pharmaceutical product. The key chemical innovations are: (1) the use of N1-methylpseudouridine (m1psi) in place of every uridine in the mRNA, which strongly reduces recognition by innate immune sensors (TLR7/8, RIG-I, MDA5) and increases translational output by 5–10-fold relative to unmodified mRNA; (2) the optimisation of 5' cap structure (Cap 1: m7GpppNm, where the first transcribed nucleotide carries a 2'-O-methyl group) to enhance ribosome recruitment and avoid activation of interferon-induced proteins with tetratricopeptide repeats (IFITs); (3) the replacement of the 3' poly(A) tail with a segmented poly(A) sequence (A30-linker-A70, as used in BNT162b2) that resists deadenylation; and (4) the incorporation of modified UTR sequences that stabilise the transcript and enhance translation.

The delivery vehicle is a lipid nanoparticle (LNP) consisting of four components: an ionisable lipid (ALC-0315 for Pfizer, SM-102 for Moderna) that is neutral at physiological pH but becomes cationic at endosomal pH (pH 5–6), facilitating endosomal escape; a helper phospholipid (DSPC); cholesterol (which stabilises the bilayer); and a PEGylated lipid (ALC-0159 for Pfizer, PEG2000-DMG for Moderna) that prevents aggregation and opsonisation. The mRNA is encapsulated at 30–100 molecules per particle, with the ionisable lipid forming an inverted hexagonal phase at low pH that destabilises the endosomal membrane and releases the mRNA into the cytoplasm. The work of Kariko and Weissman on modified nucleosides in mRNA (2005, Immunity; 2008, Molecular Therapy) established the chemical foundation for these vaccines.
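The pH switch of the ionisable lipid follows from simple acid-base equilibrium. The sketch below applies the Henderson-Hasselbalch relation to a single titratable amine. The pKa of 6.5 is a hypothetical round value chosen for illustration, not a measured property of ALC-0315 or SM-102, and treating the lipid as an isolated monoprotic base ignores the influence of the surrounding particle.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of a monoprotic base that carries a proton at a given pH.

    Henderson-Hasselbalch: [BH+]/([B] + [BH+]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


ASSUMED_PKA = 6.5  # hypothetical value for illustration only

if __name__ == "__main__":
    for label, ph in (("blood plasma", 7.4), ("endosome", 5.5)):
        share = fraction_protonated(ph, ASSUMED_PKA)
        print(f"{label} (pH {ph}): {share:.0%} protonated")
```

Under this assumption the lipid is mostly neutral in circulation and mostly cationic in the endosome, which is the behaviour the formulation relies on.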

Antisense oligonucleotides and siRNA. Antisense oligonucleotides (ASOs) are short (15–25 nt) single-stranded DNA analogues that bind complementary RNA by Watson-Crick pairing. Many induce degradation of the target by RNase H (a cellular enzyme that cleaves the RNA strand of a DNA-RNA duplex); others act by steric blocking, for example to redirect splicing. The phosphorothioate backbone modification (described above) provides nuclease resistance and promotes binding to serum proteins that facilitate tissue distribution. Gapmer ASOs have a central DNA region (8–10 nucleotides, required for RNase H recognition) flanked by modified wings (2'-O-methoxyethyl or constrained ethyl, which increase binding affinity and nuclease resistance). Nusinersen (Spinraza), approved in 2016 for spinal muscular atrophy, is not a gapmer: it is a uniformly 2'-O-methoxyethyl-modified phosphorothioate ASO that acts by steric blocking, binding SMN2 pre-mRNA to modulate splicing and increase production of functional SMN protein.
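The sequence logic of a gapmer can be sketched in a few lines: take the reverse complement of the target RNA site as DNA, then mark the modified wings and the central DNA gap. The wing length of 5 is a hypothetical default, and the check that the gap is at least 8 nucleotides mirrors the range quoted above. The sketch covers the sequence only; it says nothing about which chemistry or target site works in practice.

```python
# Partner (DNA alphabet) for each base of the target RNA.
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def antisense_dna(target_rna_5_to_3: str) -> str:
    """Return the ASO sequence (5' to 3') complementary to the RNA target."""
    target = target_rna_5_to_3.upper()
    return "".join(RNA_TO_DNA_PARTNER[base] for base in reversed(target))


def gapmer_layout(aso: str, wing: int = 5, min_gap: int = 8) -> str:
    """Bracket the two modified wings, leaving the DNA gap unbracketed."""
    gap = len(aso) - 2 * wing
    if wing < 1 or gap < min_gap:
        raise ValueError("ASO too short for the requested wing and gap sizes")
    return f"[{aso[:wing]}]{aso[wing:-wing]}[{aso[-wing:]}]"


if __name__ == "__main__":
    target = "AUGGCUAGCUAGGCAUCGAU"  # made-up 20-nt target site
    aso = antisense_dna(target)
    print(aso)
    print(gapmer_layout(aso))
```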

Small interfering RNAs (siRNAs) are 21–23 bp double-stranded RNA molecules that load into the RNA-induced silencing complex (RISC), where the passenger strand is discarded and the guide strand directs Argonaute 2 to cleave complementary mRNA. Therapeutic siRNAs require chemical modifications for stability: 2'-O-methyl and 2'-fluoro substitutions on the ribose, phosphorothioate linkages at the termini, and a 3' dTdT overhang. Patisiran (Onpattro), approved in 2018 for hereditary transthyretin amyloidosis, was the first siRNA drug. It is delivered in a lipid nanoparticle (similar to the mRNA vaccine LNPs) because siRNA does not benefit from the protein-binding properties of phosphorothioate backbones.

Aptamers and SELEX. Aptamers are single-stranded DNA or RNA molecules (40–80 nt) that fold into defined three-dimensional structures capable of binding specific target molecules with antibody-like affinity and specificity ($K_\mathrm{d}$ in the nM–pM range). They are selected by systematic evolution of ligands by exponential enrichment (SELEX), developed independently by Tuerk and Gold (1990, Science) and by Ellington and Szostak (1990, Nature). The SELEX process begins with a large random-sequence library (typically $10^{13}$–$10^{15}$ variants), applies a selection pressure (binding to immobilised target), partitions bound from unbound sequences, amplifies the bound pool by PCR (for DNA SELEX) or RT-PCR plus in vitro transcription (for RNA SELEX), and repeats for 8–15 rounds. Chemical modifications introduced during SELEX (2'-fluoro-pyrimidines, 2'-O-methyl purines) produce aptamers that are nuclease-resistant from the point of selection, avoiding the post-selection optimisation required for unmodified aptamers. Pegaptanib (Macugen), approved in 2004 for age-related macular degeneration, was the first aptamer drug: a 2'-fluoro, 2'-O-methyl RNA aptamer targeting VEGF165.
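It helps to compare the size of a SELEX library with the number of sequences that are possible in principle. The sketch below counts the sequence space for a given number of randomised positions and the fraction of it that a library could cover at best. The library size of $10^{15}$ molecules and the 40 randomised positions are assumed round figures for illustration, and the function names are hypothetical.

```python
def sequence_space(random_positions: int) -> int:
    """Number of distinct sequences with four possible bases per position."""
    return 4 ** random_positions


def library_coverage(random_positions: int, library_size: float) -> float:
    """Upper bound on the fraction of sequence space present in the library."""
    return min(1.0, library_size / sequence_space(random_positions))


ASSUMED_LIBRARY_SIZE = 1e15  # hypothetical round figure

if __name__ == "__main__":
    for n in (20, 25, 40):
        coverage = library_coverage(n, ASSUMED_LIBRARY_SIZE)
        print(f"N{n}: {sequence_space(n):.2e} sequences, coverage <= {coverage:.2e}")
```

With 40 randomised positions even a very large library samples only a minute fraction of the possible sequences, which is why iterated selection and amplification are needed rather than exhaustive screening.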

Synthesis. The therapeutic oligonucleotide field depends on the phosphoramidite chemistry detailed in the previous section, scaled to multi-kilogram production with automated synthesisers holding up to 10 mmol of support. The process chemistry challenges include: maintaining coupling efficiency above 99% at scale; managing the large volumes of acetonitrile solvent; achieving stereoselective phosphorothioate incorporation (for next-generation Sp-pure products); and developing purification methods (ion-exchange HPLC, tangential flow filtration) that remove failure sequences to pharmacopoeia standards.
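The reason coupling efficiency dominates process economics is that stepwise yields multiply. The sketch below assumes the simplest model: every coupling has the same efficiency, the first nucleoside is pre-loaded on the support (so an n-mer needs n − 1 couplings), and losses in detritylation, oxidation, cleavage, and purification are ignored. The function name `full_length_yield` is hypothetical.

```python
def full_length_yield(coupling_efficiency: float, length: int) -> float:
    """Fraction of chains that reach full length after (length - 1) couplings.

    Assumes a constant per-step efficiency and a support pre-loaded with the
    first nucleoside; all other losses are ignored.
    """
    if not 0.0 < coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be in (0, 1]")
    if length < 1:
        raise ValueError("length must be at least 1")
    return coupling_efficiency ** (length - 1)


for eff in (0.98, 0.99, 0.995):
    for n in (20, 50, 100):
        y = full_length_yield(eff, n)
        print(f"efficiency {eff:.3f}, {n:>3}-mer: full-length fraction {y:.3f}")
```

Under this model a 20-mer made at 99% per-step efficiency gives roughly 83% full-length product, while the same efficiency on a 100-mer gives well under half, which is why the failure-sequence burden on purification grows so quickly with length.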

Synthesis. Putting these together, the four Master-tier sub-sections form a coherent arc: alternative DNA structures reveal that the double helix is not a rigid scaffold but a conformationally responsive polymer whose shape encodes regulatory information beyond the base sequence; chemical modifications layer additional information on both DNA and RNA without changing the primary sequence; solid-phase synthesis makes any desired sequence accessible at laboratory and industrial scale; and therapeutic applications translate these chemical properties into drugs that read, write, edit, and destroy specific nucleic acid sequences inside living cells. The foundational reason nucleic acid chemistry supports all four of these domains is the complementarity between base-pairing specificity (information) and backbone modularity (synthetic accessibility). This dual nature is what makes nucleic acids both information molecules and drug molecules, bridging chemistry and information science. The pattern generalises from small-molecule nucleoside analogues to large-molecule mRNA therapeutics, with the same Watson-Crick chemistry 15.12.01 pending and phosphodiester bonding providing the molecular substrate throughout. The same base-pairing and backbone chemistry appears again, under enzymatic control, in 17.05.01 pending DNA replication and 17.05.02 pending transcription.

Connections [Master]

  • Amino acids and protein chemistry 15.12.01 pending. Nucleic acid-protein interactions — transcription factors binding DNA grooves, histones packaging DNA into chromatin, ribosomes decoding mRNA into polypeptide chains — depend on the hydrogen-bonding, electrostatic, and hydrophobic properties of both nucleotide bases and amino acid side chains. The protein-nucleic acid recognition code (direct readout via major-groove hydrogen bonds, indirect readout via backbone shape) requires understanding the chemistry of both classes of biomolecule.

  • Enzyme mechanism 15.14.01 pending. DNA polymerases, RNA polymerases, ligases, topoisomerases, and ribozymes are protein or RNA catalysts that operate on nucleic acid substrates. Phosphodiester bond formation by polymerases proceeds through a two-metal-ion mechanism analogous to the phosphoryl-transfer chemistry in kinase-catalysed reactions. The polymerase active site enforces Watson-Crick selectivity through geometric complementarity, not through the thermodynamics of base pairing alone.

  • Retrosynthetic analysis 15.10.01. Automated DNA synthesis by phosphoramidite chemistry is a solid-phase synthetic method directly analogous to SPPS (Merrifield solid-phase peptide synthesis). Both use acid-labile temporary protecting groups, orthogonal permanent protecting groups, sequential coupling cycles on a solid support, and HPLC purification of full-length product from failure sequences.

  • NMR spectroscopy 15.11.01. Nucleic acid NMR exploits the imino proton region (10–15 ppm) to characterise base-pairing status, hydrogen-bond geometry, and secondary structure in solution. Exchangeable imino protons are protected from solvent exchange when engaged in Watson-Crick hydrogen bonds, and their chemical shifts report on base-pair type (A-T vs G-C) and stacking context.

  • DNA replication 17.05.01 pending. The biochemistry of DNA replication depends entirely on the base-pairing chemistry established here: semiconservative replication, primer-template junctions, leading and lagging strand synthesis, and the fidelity mechanisms that polymerases use to enforce Watson-Crick selectivity. The phosphodiester bond formation reaction catalysed by DNA polymerase is the enzymatic analogue of the phosphoramidite coupling reaction used in solid-phase synthesis.

Historical & philosophical context [Master]

Friedrich Miescher isolated "nuclein" from pus cells in 1869, identifying a phosphorus-rich substance distinct from proteins. The chemical distinction between the two types of nucleic acid (DNA and RNA) rests on the work of Levene and co-workers: Levene and Jacobs identified ribose as the sugar in yeast-derived nucleic acid in 1909, and Levene later identified deoxyribose as the sugar in thymus-derived nucleic acid in 1929. Chargaff's 1950 paper established the base-composition regularities ($A = T$, $G = C$ in molar amounts) that would become the cornerstone of the double-helix model [Chargaff 1950].

Watson and Crick proposed the antiparallel double-helix structure in 1953, drawing on Franklin's X-ray diffraction photographs (Photograph 51, taken by Franklin and Gosling, showing the characteristic X-pattern of a helical structure with a 3.4 Å spacing between stacked bases, a 34 Å helical repeat, and a diameter of about 2.0 nm) and on Chargaff's rules [Watson-Crick 1953]. Franklin's own structural analysis appeared in the same issue of Nature [Franklin 1953]. The Watson-Crick model made two predictions that galvanised molecular biology: that each strand serves as a template for its complement (semiconservative replication, confirmed by Meselson and Stahl in 1958), and that the sequence of bases carries genetic information in a linear code (confirmed by Nirenberg and Matthaei in 1961, who showed that poly-U mRNA encodes polyphenylalanine).

Khorana's total chemical synthesis of a tRNA gene (1965–1970), using phosphodiester chemistry to assemble 77-base-pair duplex DNA from synthetic oligonucleotides, demonstrated that defined-sequence DNA could be produced entirely by chemical means [Khorana 1970]. Khorana had already shared the 1968 Nobel Prize (with Nirenberg and Holley) for the interpretation of the genetic code, work that relied on his chemically synthesised polynucleotides of defined sequence; the gene synthesis extended that chemistry and established the conceptual foundation for the phosphoramidite synthesis methods that followed. Beaucage and Caruthers introduced the phosphoramidite approach in 1981 [Beaucage-Caruthers 1981], superseding the earlier phosphodiester and phosphotriester methods and achieving the coupling efficiencies that made automated oligonucleotide synthesis practical.

The development of PCR by Mullis in 1983 (Nobel Prize 1993) exploited the temperature dependence of DNA duplex melting and reannealing, a direct application of the thermodynamic consequences of base-pairing and base-stacking chemistry. CRISPR-Cas9 gene editing, developed by Doudna and Charpentier (Nobel Prize 2020), exploits guide-RNA–DNA base pairing for target recognition. The mRNA vaccines for COVID-19, deploying Karikó and Weissman's modified-nucleoside chemistry at industrial scale, represent the most recent translation of nucleic acid chemistry into therapeutic practice.

Bibliography [Master]

@article{WatsonCrick1953,
  author = {Watson, J. D. and Crick, F. H. C.},
  title = {Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid},
  journal = {Nature},
  volume = {171},
  year = {1953},
  pages = {737--738},
}

@article{Franklin1953,
  author = {Franklin, R. E. and Gosling, R. G.},
  title = {Molecular configuration in sodium thymonucleate},
  journal = {Nature},
  volume = {171},
  year = {1953},
  pages = {740--741},
}

@article{Chargaff1950,
  author = {Chargaff, E.},
  title = {Chemical specificity of nucleic acids and mechanism of their enzymatic degradation},
  journal = {Experientia},
  volume = {6},
  year = {1950},
  pages = {201--209},
}

@article{BeaucageCaruthers1981,
  author = {Beaucage, S. L. and Caruthers, M. H.},
  title = {Deoxynucleoside phosphoramidites --- a new class of key intermediates for deoxypolynucleotide synthesis},
  journal = {Tetrahedron Lett.},
  volume = {22},
  year = {1981},
  pages = {1859--1862},
}

@article{WangRich1979,
  author = {Wang, A. H.-J. and Quigley, G. J. and Kolpak, F. J. and Crawford, J. L. and van Boom, J. H. and van der Marel, G. and Rich, A.},
  title = {Molecular structure of a left-handed double helical {DNA} fragment at atomic resolution},
  journal = {Nature},
  volume = {282},
  year = {1979},
  pages = {680--686},
}

@article{Dickerson1982,
  author = {Dickerson, R. E. and Drew, H. R. and Conner, B. N. and Wing, R. M. and Fratini, A. V. and Kopka, M. L.},
  title = {The anatomy of {A-, B-,} and {Z-DNA}},
  journal = {Science},
  volume = {216},
  year = {1982},
  pages = {475--485},
}

@article{TuerkGold1990,
  author = {Tuerk, C. and Gold, L.},
  title = {Systematic evolution of ligands by exponential enrichment: {RNA} ligands to bacteriophage {T4 DNA} polymerase},
  journal = {Science},
  volume = {249},
  year = {1990},
  pages = {505--510},
}

@article{KarikoWeissman2005,
  author = {Karik\'{o}, K. and Buckstein, M. and Ni, H. and Weissman, D.},
  title = {Suppression of {RNA} recognition by {Toll-like} receptors: the impact of nucleoside modification and the evolutionary origin of {RNA}},
  journal = {Immunity},
  volume = {23},
  year = {2005},
  pages = {165--175},
}

@article{Tahiliani2009,
  author = {Tahiliani, M. and Koh, K. P. and Shen, Y. and Pastor, W. A. and Bandukwala, H. and Brudno, Y. and Agarwal, S. and Iyer, L. M. and Liu, D. R. and Aravind, L. and Rao, A.},
  title = {Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian {DNA} by {MLL} partner {TET1}},
  journal = {Science},
  volume = {324},
  year = {2009},
  pages = {930--935},
}

@book{Saenger1984,
  author = {Saenger, W.},
  title = {Principles of Nucleic Acid Structure},
  publisher = {Springer-Verlag},
  year = {1984},
}

@book{Blackburn2006,
  author = {Blackburn, G. M. and Gait, M. J. and Loakes, D. and Williams, D. M.},
  title = {Nucleic Acids in Chemistry and Biology},
  publisher = {RSC Publishing},
  year = {2006},
}

@book{VoetVoet2016,
  author = {Voet, D. and Voet, J. G.},
  title = {Biochemistry},
  edition = {5th},
  publisher = {Wiley},
  year = {2016},
}