15.10.01 · orgchem / retrosynthesis

Retrosynthetic analysis

shipped · 3 tiers · Lean: none

Anchor (Master): Corey & Cheng — The Logic of Chemical Synthesis (Wiley, 1989); Warren — Organic Synthesis: The Disconnection Approach 2nd ed.; Nicolaou & Sorensen — Classics in Total Synthesis; Carey & Sundberg — Advanced Organic Chemistry Part B 5th ed. Ch. 1-2

Intuition [Beginner]

A synthetic chemist faces a target molecule and works backward. The idea is simple: take the target apart one bond at a time, breaking it into simpler pieces until every piece is a commercially available starting material. This backward reasoning is called retrosynthetic analysis, and each imaginary break is a disconnection.

A disconnection is not a reaction running in reverse. It is a planned bond break, drawn as a wavy line through the bond, that identifies which two fragments the molecule splits into. Each fragment, called a synthon, carries a charge derived from the bond-breaking pattern. Synthons are theoretical constructs; the chemist then selects a synthetic equivalent — a real, neutral reagent that delivers the same reactivity.

Protecting groups enter when a functional group in the target would interfere with a planned reaction step. If a molecule has both an alcohol and an aldehyde, and the planned step involves a Grignard reagent that would attack the aldehyde and be destroyed by the alcohol's proton, the alcohol must be masked first — converted into a silyl ether that is inert to Grignard conditions and removed later.

Two broad strategies organise the plan. Linear synthesis builds the molecule one step at a time in a single chain: A becomes B, B becomes C, C becomes the target. If each step has a 90% yield, ten steps give an overall yield of about 35%. Convergent synthesis builds two halves separately and joins them late. The same per-step yield applied to two shorter branches joined by one high-yielding step generally preserves more material than a single long chain, because each branch has fewer steps in which to lose product.

Visual [Beginner]

Picture a target molecule as a completed jigsaw puzzle. Retrosynthetic analysis pulls the puzzle apart one piece at a time, working from the finished picture back toward the individual pieces you can buy.

The key visual tool is the retrosynthetic arrow, drawn as a thick open arrow pointing from the target to the precursor. It means "can be made from."

The diagram shows one disconnection at a time. For complex targets, the full retrosynthetic tree is a branching diagram with the target at the root and starting materials at the leaves.

Worked example [Beginner]

Target: 4-methyl-2-pentanone from acetone and any 3-carbon reagents.

4-Methyl-2-pentanone has the structure $CH_{3}COCH_{2}CH(CH_{3})_{2}$. It is a ketone with a branched alkyl group.

Step 1. Identify the carbonyl. The target contains one ketone group at C2. Carbonyls are among the most reliable disconnection sites because they are formed by well-understood carbon-carbon bond-forming reactions.

Step 2. Disconnect at the C3-C4 bond, between the carbon alpha to the carbonyl (C3) and the branched carbon (C4). This gives two synthons: an acetone enolate ( $CH_{3}COCH_{2}^{-}$ , nucleophilic) and an isopropyl cation ( $(CH_{3})_{2}CH^{+}$ , electrophilic).

Step 3. Choose synthetic equivalents. The enolate synthon corresponds to acetone treated with LDA (lithium diisopropylamide). The cation synthon corresponds to isopropyl bromide as the electrophile.

Step 4. Write the forward reaction. Acetone, deprotonated with LDA at $-78\,^{\circ}\mathrm{C}$, gives the lithium enolate. Alkylation with isopropyl bromide gives 4-methyl-2-pentanone.

What this tells us: the target is one step from commercially available acetone and isopropyl bromide. No protecting groups are needed because acetone has no interfering functional groups.

Check your understanding [Beginner]

Formal definition [Intermediate+]

A retrosynthetic analysis of a target molecule $T$ is the recursive application of transforms — the reverse of known chemical reactions — to generate a synthon tree whose leaves are available starting materials. The central objects are:

Disconnection. The imaginary cleavage of a bond in $T$ to produce two synthons. Formally, if $T$ contains a bond $A - B$ , the disconnection at $A - B$ replaces $T$ with the pair $(A^{+}, B^{-})$ or $(A^{-}, B^{+})$ , where the charge assignment follows the polar reactivity pattern of the corresponding forward reaction. Disconnections are drawn with a wavy line through the bond and a retrosynthetic arrow $⟹$ .

Synthon. A charged molecular fragment obtained by a disconnection. Synthons are not isolable species; they represent the reactivity pattern needed at that position. A nucleophilic synthon (donor) carries a negative charge; an electrophilic synthon (acceptor) carries a positive charge.

Synthetic equivalent. A neutral, stable reagent that delivers the same reactivity as the synthon. The acetone enolate synthon $CH_{3}COCH_{2}^{-}$ has acetone + LDA as its synthetic equivalent. The isopropyl cation synthon $(CH_{3})_{2}CH^{+}$ has isopropyl bromide as its synthetic equivalent.

Transform. The reverse of a known forward reaction, applied as a pattern-matching rule. The aldol transform matches a $β$ -hydroxy carbonyl unit in the target and disconnects it to an enolate and an aldehyde.

Functional-group interconversion (FGI). The replacement of one functional group by another that is either easier to disconnect or more accessible from starting materials. Retrosynthetically converting a ketone into the corresponding secondary alcohol (the forward step is an oxidation) is an FGI that changes the disconnection options available, because the alcohol can then be disconnected by a Grignard transform.

The synthetic plan is a directed acyclic graph (DAG) whose root is $T$ and whose leaves are starting materials. Each edge represents one forward reaction. The plan is linear if the DAG is a single chain, and convergent if it branches.
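As a rough illustration of the plan-as-graph picture, the sketch below stores a plan as a tree of compounds and classifies it. The `Compound` class, the helper names, and the rule that a plan counts as convergent when some step joins two or more fragments that were themselves synthesised are assumptions made for this example, not definitions taken from the anchor references.

```python
from dataclasses import dataclass, field


@dataclass
class Compound:
    """One node of a synthetic plan; leaves are purchased starting materials."""
    name: str
    precursors: list["Compound"] = field(default_factory=list)


def starting_materials(node: Compound) -> list[str]:
    """Collect the leaf names of the plan rooted at `node`."""
    if not node.precursors:
        return [node.name]
    names: list[str] = []
    for precursor in node.precursors:
        names.extend(starting_materials(precursor))
    return names


def is_convergent(node: Compound) -> bool:
    """Assumed rule: convergent if any step joins >= 2 synthesised fragments."""
    made = [p for p in node.precursors if p.precursors]
    return len(made) >= 2 or any(is_convergent(p) for p in made)


# Linear: the one-step worked example (both precursors are purchased).
ketone = Compound("4-methyl-2-pentanone",
                  [Compound("acetone"), Compound("isopropyl bromide")])

# Convergent: two fragments, each made in one step, then coupled.
left = Compound("fragment A", [Compound("starting material A")])
right = Compound("fragment B", [Compound("starting material B")])
coupled = Compound("target", [left, right])

print(starting_materials(ketone))
print(is_convergent(ketone), is_convergent(coupled))
```

A step that merely consumes a purchased reagent does not count as a branch under this rule, which is why the one-step alkylation is classified as linear.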

Protecting-group strategy

A protecting group is a temporary modification of a functional group that masks its reactivity during a subsequent step. The canonical strategy is orthogonal protection: using two or more protecting groups that can be removed independently under different conditions, allowing selective deprotection in multi-functional targets.

Common protecting groups:

Functional group	Protecting group	Installation	Removal
Alcohol ( $- O H$ )	TBDMS ether	TBDMSCl, imidazole	TBAF or acid
Alcohol ( $- O H$ )	Acetate ester	Acetic anhydride, pyridine	Base hydrolysis
Aldehyde/ketone ( $C = O$ )	Acetal	$H O C H_{2} C H_{2} O H$ , acid	Aqueous acid
Amine ( $- N H_{2}$ )	Boc carbamate	$(\mathrm{Boc})_{2}\mathrm{O}$ , base	TFA or HCl
Carboxylic acid ( $- C O O H$ )	Methyl ester	$C H_{2} N_{2}$ or MeOH, acid	Base hydrolysis

Counterexamples to common slips

Disconnecting at the wrong bond produces synthons with no known synthetic equivalent. Disconnecting a C-C bond that was formed by a pericyclic reaction (e.g., a Diels-Alder bond) is correct only if a Diels-Alder is genuinely the planned step; disconnecting it as a polar reaction produces nonsensical charged fragments.
"More disconnections = better plan." A shorter synthesis is generally preferred. Every step costs time, material, and purification effort. The ideal retrosynthetic plan minimises the number of steps while maximising overall yield.
Linear synthesis is not always worse than convergent synthesis. For targets where no clean convergent branch point exists — many natural products with densely functionalised single chains — a linear approach may be the only option.

Key theorem with proof [Intermediate+]

The foundational quantitative result in synthesis planning concerns the yield topology of convergent versus linear plans.

Proposition (Convergent yield advantage). Let a target molecule require $n$ synthetic steps at per-step yield $y$ , where $0 < y < 1$ . A linear plan achieves overall yield $Y_{L} = y^{n}$ . A convergent plan divides the synthesis into two branches of $p$ and $n - 1 - p$ steps (where $0 < p < n - 1$ ), joined by one coupling step at yield $y_{j}$ . The convergent overall yield is

$Y_{C} = y^{p} \cdot y^{n - 1 - p} \cdot y_{j} = y^{n - 1} \cdot y_{j} .$

Then $Y_{C} > Y_{L}$ if and only if $y_{j} > y$ .

Proof. The linear yield is $Y_{L} = y^{n} = y^{n - 1} \cdot y$ . The convergent yield is $Y_{C} = y^{n - 1} \cdot y_{j}$ . Since $y^{n - 1} > 0$ , dividing both sides by $y^{n - 1}$ gives $Y_{C} > Y_{L} ⟺ y_{j} > y$ . $□$

The operational significance: convergent plans are preferred when the joining step, typically a high-yielding named coupling reaction such as Suzuki, Wittig, or amide bond formation, exceeds the average per-step yield. In practice, palladium-catalysed cross-couplings routinely achieve 90-98% yield, while functional-group manipulations (oxidations, reductions, protections) average 70-85%. The gap $y_{j} - y$ is the convergent advantage. For a 12-step linear plan at 85% per step, the overall yield is $0.85^{12} \approx 14\%$. A convergent plan with two 5-step branches (each $0.85^{5} \approx 44\%$) joined by a Suzuki coupling at 95% gives $0.85^{5} \times 0.85^{5} \times 0.95 \approx 19\%$, a meaningful improvement in material throughput.
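The proposition's bookkeeping can be checked numerically. The following is a minimal sketch using the same yield model as the proposition (branch yields multiplied together, then multiplied by the joining yield); the function names are introduced here for illustration only.

```python
def linear_yield(y: float, n: int) -> float:
    """Overall yield of n sequential steps at per-step yield y."""
    return y ** n


def convergent_yield(y: float, p: int, q: int, y_join: float) -> float:
    """Yield model of the proposition: two branches of p and q steps plus one join."""
    return (y ** p) * (y ** q) * y_join


y = 0.85

# The numeric example from the text: 12 linear steps versus 5 + 5 + 1.
print(round(linear_yield(y, 12), 4))
print(round(convergent_yield(y, 5, 5, 0.95), 4))

# The proposition itself: with p + q = n - 1, convergent wins exactly when y_join > y.
n, p = 11, 5
for y_join in (0.80, 0.95):
    wins = convergent_yield(y, p, n - 1 - p, y_join) > linear_yield(y, n)
    print(y_join, wins)
```

The loop deliberately skips the boundary case `y_join == y`, where the two yields are equal on paper and floating-point rounding could tip the comparison either way.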

Bridge. The convergent yield advantage builds toward the design of multi-step syntheses for complex natural products treated in 15.12.01 (amino acid chemistry, pending) and 15.13.01 (nucleic acid chemistry, pending), where convergent fragment-coupling strategies are widely used. The foundational reason the proposition matters is that it makes the choice between linear and convergent topologies a quantitative decision rather than an aesthetic one: the planner compares $y_{j}$ against $y$ and chooses accordingly. This is exactly the kind of structural insight that computer-assisted retrosynthetic programs encode: the search over synthon trees is pruned by yield thresholds at each node. The central insight, that a single high-yielding joining step compensates for the overhead of convergent branching, appears again in 15.07.01 carbonyl chemistry, where aldol and Grignard reactions routinely achieve the >90% yields that make convergent plans advantageous.

Exercises [Intermediate+]

Exercise 5 (medium, numeric).

A linear 8-step synthesis has per-step yield of 85%. A convergent alternative uses two 3-step branches at 85% per step, joined by a Suzuki coupling at 95% yield. Compute both overall yields as percentages rounded to one decimal place.

Hint

Linear: $0.8 5^{8}$ . Convergent: $0.8 5^{3} \times 0.8 5^{3} \times 0.95$ . Note the convergent plan has $3 + 3 + 1 = 7$ steps, not 8 — it is shorter because the convergent topology eliminates one linear step.

Answer

Linear: $0.85^{8} \approx 0.2725 \approx 27.2\%$.

Convergent: each branch gives $0.85^{3} \approx 0.6141$. Joined at 95%: $0.6141 \times 0.6141 \times 0.95 \approx 0.3583 = 35.8\%$.

The convergent plan gives 35.8% versus 27.2% for the linear plan, and it uses 7 steps instead of 8. Part of the gain comes from the shorter route; the rest comes from the coupling step. Even a 7-step linear plan would give only $0.85^{7} \approx 32.1\%$, so the high-yielding Suzuki coupling ( $y_{j} = 0.95 > y = 0.85$ ) provides the convergent advantage predicted by the yield proposition.

Exercise 6 (hard, short-answer).

Analyse the retrosynthesis of 2-phenyl-2-butanol ( $P h C (O H) (C H_{3}) (C H_{2} C H_{3})$ ). Propose two different disconnection strategies, identify the synthons and synthetic equivalents for each, and compare the merits.

Hint

The tertiary alcohol can be formed by Grignard addition to a ketone. Two different C-C bonds adjacent to the alcohol carbon can be disconnected.

Answer

Strategy 1. Disconnect the C-Ph bond. Synthons: $P h^{-}$ (nucleophilic aryl) and $^{+} C (O H) (C H_{3}) (C H_{2} C H_{3})$ (electrophilic ketone). Synthetic equivalents: phenylmagnesium bromide ( $P h M g B r$ ) and 2-butanone ( $C H_{3} C O C H_{2} C H_{3}$ ).

Strategy 2. Disconnect the C-Et bond. Synthons: $C H_{3} C H_{2}^{-}$ (nucleophilic alkyl) and $^{+} C (O H) (C H_{3}) (P h)$ (electrophilic ketone). Synthetic equivalents: ethylmagnesium bromide ( $E tM g B r$ ) and acetophenone ( $P h C O C H_{3}$ ).

Strategy 2 is slightly preferable because acetophenone is commercially available in high purity, ethyl Grignards are among the most reliable organometallic reagents, and there are fewer potential side reactions with the simpler ketone partner.

Exercise 7 (hard, short-answer).

A 12-step linear synthesis with 85% yield per step gives an overall yield of approximately 14%. Redesign as a convergent synthesis with two 5-step branches joined by one coupling step at 95% yield. Compute the overall yield and explain why the convergent plan is operationally superior even if the yields were equal.

Hint

Each 5-step branch at 85% gives $0.8 5^{5}$ . Multiply the two branches by the join step. Then consider what "operationally superior" means beyond the yield number.

Answer

Linear: $0.85^{12} \approx 0.142 = 14.2\%$.

Convergent: each branch gives $0.85^{5} \approx 0.4437$. Join at 95%: $0.4437 \times 0.4437 \times 0.95 \approx 0.187 = 18.7\%$.

The convergent plan improves yield from 14.2% to 18.7%. The operational advantages are equally important: (1) the longest linear sequence any single molecule travels is 6 steps (5 + 1), not 12, reducing cumulative purification losses; (2) each branch can be prepared in parallel, halving the calendar time; (3) if one branch fails, the other is unaffected and does not need to be repeated.

Exercise 8 (hard, short-answer).

Explain why the Diels-Alder reaction is considered a "privileged transform" in retrosynthetic analysis, and give an example of a target where a Diels-Alder disconnection is more strategic than any polar disconnection.

Hint

The Diels-Alder reaction forms two C-C bonds and up to four stereocentres in a single step. What kind of target benefits most from this?

Answer

The Diels-Alder is privileged because it constructs a six-membered ring with excellent regioselectivity and stereospecificity in a single step, forming two C-C bonds simultaneously. The transform matches any cyclohexene substructure in the target and disconnects it to a diene and a dienophile.

A classic example: the synthesis of prostaglandin intermediates containing a functionalised cyclopentane ring. Diels-Alder disconnection of a six-membered ring precursor (later opened or contracted) is more strategic than polar disconnections because no polar reaction forms the required ring and stereochemistry in comparable step economy. Corey's synthesis of prostaglandins used a Diels-Alder disconnection as the key strategic bond-forming step ^{[Corey 1964]}.

Exercise 9 (hard, short-answer).

Critique the following retrosynthetic plan: "Disconnect all C-N bonds first, then all C-O bonds, then all remaining C-C bonds." Under what circumstances is this a reasonable strategy, and when does it fail?

Hint

Not all C-N or C-O bonds are equally easy to form in the forward direction. Consider whether the bond is part of a functional group with a known reliable synthesis.

Answer

This strategy follows the heuristic of disconnecting heteroatom bonds first, which works when those bonds are formed by reliable, high-yielding reactions (amide bonds from amines + acyl chlorides; ethers from alkoxides + alkyl halides). It is reasonable for targets where the heteroatom bonds are at the periphery and do not affect the carbon skeleton topology.

It fails when: (1) the C-N or C-O bond is part of a ring and its disconnection requires a thermodynamically unfavourable cyclisation in the forward direction; (2) the heteroatom bond is sterically hindered; (3) disconnecting heteroatom bonds first ignores a more strategic C-C disconnection that would dramatically simplify the carbon skeleton. The better approach is to prioritise disconnections at bonds whose forward reactions are highest-yielding and most selective, regardless of atom type.

Transform-based strategy and synthon analysis [Master]

The most powerful retrosynthetic strategy is the transform-based approach: identify a high-yielding, selective named reaction in the forward direction, then search the target for the structural motif that this reaction would produce. The motif is the transform's recognition pattern, and once found, the corresponding disconnection is immediate. Corey formalised three complementary strategic approaches ^{[Corey and Cheng 1989]}:

Transform-based strategy. Start from a known reaction and search the target for its product pattern. If the target contains a $β$ -hydroxy ketone, the aldol transform applies: disconnect to an enolate and an aldehyde. If it contains an alkene, the Wittig transform applies: disconnect to a phosphonium ylide and a carbonyl. The chemist maintains a mental library of several hundred transforms and scans the target against them.
Structure-goal strategy. Start from a simple, commercially available starting material and work forward toward the target, asking which reactions could elaborate that starting material in the right direction. This is the inverse of transform-based reasoning and is useful when the target is structurally similar to a known natural product with an established synthesis.
Topological strategy. Analyse the carbon skeleton of the target for ring systems, branching points, and symmetry elements. Strategic bonds are those whose disconnection most reduces topological complexity — ring-junction bonds in polycyclic systems, bonds at the branch points of highly branched skeletons, and bonds that exploit molecular symmetry to give identical synthons.

The three approaches are not mutually exclusive. A skilled retrosynthetic planner applies all three simultaneously, letting each inform the others. The transform-based approach provides the chemical knowledge (which reactions work), the structure-goal approach provides the commercial constraint (what is available), and the topological approach provides the strategic direction (which bonds are most productive to disconnect).

Donor-acceptor classification

Every disconnection at a C-C bond produces two synthons, one nucleophilic (donor) and one electrophilic (acceptor). The charge assignment follows the polarity of the forward reaction:

Disconnecting a bond adjacent to a carbonyl: the alpha carbon is the donor (enolate synthon), the carbonyl carbon or its adjacent electrophilic carbon is the acceptor.
Disconnecting a C-X bond (X = halide): the carbon bearing X is the acceptor (carbocation synthon), X leaves as the halide.
Disconnecting at a benzylic or allylic position: the resulting stabilised cation is the acceptor, the nucleophile is the donor.

The synthetic equivalents for common donor synthons include organolithium and Grignard reagents (for $R^{-}$ ), enolates (for $R-CO-CHR'^{-}$ ), and cuprates ( $R_{2}CuLi$ , delivering $R^{-}$ with reduced basicity). The synthetic equivalents for common acceptor synthons include alkyl halides (for $R^{+}$ ), carbonyl compounds (for the electrophilic carbon), and epoxides (for the 1,2-difunctional acceptor).

Umpolung: inverting synthon polarity

Some disconnections require a synthon whose polarity is the opposite of what the functional group normally provides. A carbonyl carbon is normally electrophilic (acceptor), but some disconnections require it to act as a nucleophile (donor). This polarity inversion is called umpolung ^{[Seebach 1979]}.

The most important umpolung strategy is the dithiane approach. Treating an aldehyde with 1,3-propanedithiol under acid catalysis forms a 1,3-dithiane. The carbon between the two sulfur atoms, originally the electrophilic carbonyl carbon, is now activated toward deprotonation by $n$-BuLi. The resulting carbanion is a nucleophilic donor synthon that reacts with alkyl halides, epoxides, and carbonyl compounds. After the carbon-carbon bond is formed, the dithiane is removed (hydrolysed with Hg(II) salts or by oxidative methods) to reveal the carbonyl group again.

A protected cyanohydrin ( $R-CH(OR')-CN$ , for example an $O$-silyl derivative) serves a similar umpolung function for aldehydes: with the hydroxyl group masked, the nitrile stabilises the carbanion formed by deprotonation at the carbon that bears it, allowing the carbonyl-derived carbon to act as a nucleophile. Seebach's 1979 review ^{[Seebach 1979]} systematised the full range of umpolung strategies and remains the canonical reference.

Umpolung expands the retrosynthetic search space substantially. Without umpolung, the planner can only disconnect bonds whose polarity matches the native reactivity of the adjacent functional groups. With umpolung, the planner can disconnect any C-C bond adjacent to a functional group by inverting the polarity of one synthon. The cost is additional steps (installing and removing the umpolung reagent), so the planner weighs the step economy of the umpolung route against alternatives.

Named transforms and their recognition patterns

The transforms most frequently applied in retrosynthetic analysis, with their recognition patterns:

Transform	Recognition pattern in target	Disconnection products
Aldol	$β$ -hydroxy carbonyl	Enolate + aldehyde
Claisen	$β$ -keto ester	Two ester molecules
Michael	1,5-dicarbonyl	Enolate + $\alpha,\beta$-unsaturated carbonyl
Wittig	Alkene ( $C=C$ )	Phosphonium ylide + aldehyde/ketone
Grignard	Tertiary or secondary alcohol	Organometallic + carbonyl
Diels-Alder	Cyclohexene (especially with defined stereochemistry)	Diene + dienophile
Dieckmann	Cyclic $β$ -keto ester	Diester
Robinson annulation	Cyclohexenone with angular substitution	Enolate + methyl vinyl ketone

Each transform has well-understood scope, limitations, and stereochemical outcomes. The retrosynthetic planner applies the transform whose recognition pattern best matches the target's most complex structural element, on the principle that the most complex substructure should be formed in the fewest steps.
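One way to picture the recognition-pattern idea is as a lookup from motif to transform. The sketch below is a deliberately crude illustration: motifs are hand-written labels rather than substructures detected in a molecular graph, and the dictionary and function names are hypothetical. Only rows whose patterns are unambiguous text labels are included.

```python
# Motif label -> (transform name, disconnection products), copied from the table.
TRANSFORMS = {
    "beta-hydroxy carbonyl": ("Aldol", "enolate + aldehyde"),
    "beta-keto ester": ("Claisen", "two ester molecules"),
    "1,5-dicarbonyl": ("Michael", "enolate + alpha,beta-unsaturated carbonyl"),
    "cyclohexene": ("Diels-Alder", "diene + dienophile"),
    "cyclic beta-keto ester": ("Dieckmann", "diester"),
}


def match_transforms(motifs):
    """Return (motif, transform, products) for every motif with a known transform."""
    return [(m, *TRANSFORMS[m]) for m in motifs if m in TRANSFORMS]


# A target annotated by hand with the motifs a chemist has spotted in it.
target_motifs = ["cyclohexene", "nitro group", "beta-hydroxy carbonyl"]
for motif, transform, products in match_transforms(target_motifs):
    print(f"{motif}: {transform} -> {products}")
```

A real planner would still have to apply the scope and stereochemical checks mentioned above before accepting any match.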

Convergent versus linear synthesis: topology and yield [Master]

The convergent yield proposition proved in the Intermediate tier establishes the mathematical condition under which a convergent plan gives higher yield than a linear one. The practical considerations extend beyond this arithmetic result.

The longest linear sequence

The critical metric for synthesis planning is not the total number of steps but the longest linear sequence (LLS): the maximum number of sequential steps that any single molecule must pass through. In a linear plan of $n$ steps, the LLS is $n$. In a convergent plan with two branches of $p$ and $q$ steps joined by one step, the LLS is $\max(p, q) + 1$.

The LLS matters because it determines material throughput. If each step in the LLS has yield $y$ , the final yield of product per mole of the limiting starting material is $y^{LLS}$ . The convergent plan reduces the LLS from $n$ to roughly $n /2 + 1$ , and the exponential yield improvement from halving the LLS is the primary advantage of convergence.

For a 20-step synthesis at 85% per step, the linear LLS of 20 gives $0.85^{20} \approx 3.9\%$ overall yield. A convergent plan with two 9-step branches joined by a coupling and one final elaboration step has LLS = 9 + 1 + 1 = 11, giving $0.85^{11} \approx 16.7\%$ per mole of the limiting branch starting material, more than a fourfold improvement.
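The LLS is simply the depth of the plan tree, which makes it easy to compute. The following sketch assumes a plan is written as nested `(name, precursors)` tuples; that representation and the helper names are choices made for this example. It uses the 12-step linear plan and the 5 + 5 + 1 convergent plan from the exercises.

```python
def chain(steps, base):
    """Wrap `base` in `steps` successive single-precursor reactions."""
    node = base
    for i in range(1, steps + 1):
        node = (f"{base[0]} +{i}", [node])
    return node


def longest_linear_sequence(node):
    """Depth of the plan tree: 0 for a starting material, else 1 + deepest precursor."""
    _name, precursors = node
    if not precursors:
        return 0
    return 1 + max(longest_linear_sequence(p) for p in precursors)


start_a = ("A", [])
start_b = ("B", [])

linear_plan = chain(12, start_a)
convergent_plan = ("target", [chain(5, start_a), chain(5, start_b)])

y = 0.85
for label, plan in (("linear", linear_plan), ("convergent", convergent_plan)):
    lls = longest_linear_sequence(plan)
    print(label, lls, round(y ** lls, 3))
```

The printed yield is the per-mole figure along the longest path only, which is the quantity this section discusses; it is not the product-of-branches figure used in the Intermediate proposition.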

Branch-point selection

Not all disconnections make equally good branch points. The optimal branch point is the bond whose disconnection: (a) produces two fragments of roughly equal complexity (measured by ring count, stereocentre count, and functional-group density), (b) can be re-formed by a high-yielding, stereoselective coupling reaction, and (c) does not require protecting groups on either fragment that would conflict with the other fragment's synthetic route.

Corey's longifolene synthesis ^{[Corey 1964]} illustrates branch-point selection. Longifolene is a bridged tricyclic sesquiterpene ( $C_{15}H_{24}$ ) whose framework contains a bridgehead methyl group and four stereocentres. The retrosynthetic analysis identified a Robinson annulation as the strategic transform for constructing the six-membered ring, disconnecting to a simple cyclohexanone and methyl vinyl ketone. The remaining bridge was constructed by an intramolecular Michael addition. The synthesis is a landmark because it demonstrated that a structurally complex natural product could be rationally designed from simple starting materials by systematic application of retrosynthetic principles.

Material throughput and calendar time

Convergent synthesis offers two operational advantages beyond yield. First, the two branches can be prepared in parallel by different members of a research team, so the calendar time scales with the longest linear sequence (roughly $n/2$ steps) rather than with the total step count $n$. For a 20-step synthesis, this can roughly halve the bench time. Second, each branch can be optimised independently: if one branch gives low yield, the chemist can modify conditions for that branch alone without repeating the other.

The material advantage is also significant. In a linear 10-step synthesis at 80% per step, to obtain 1 gram of product the chemist must start with $1/0.8^{10} \approx 9.3$ grams of the initial starting material (treating all molar masses as equal for simplicity). In a convergent plan with two 4-step branches joined by one coupling at 90%, the chemist needs $1/(0.8^{4} \times 0.9) \approx 2.7$ grams of each branch's starting material: less starting material overall, and each branch is handled in smaller, more manageable quantities.

When linear is the only option

Some targets resist convergent disconnection. Long, unbranched polyene chains (e.g., carotenoid tails) have no branch point that splits the molecule into two fragments of comparable complexity. Similarly, linear peptides assembled by solid-phase synthesis follow a linear (or slightly convergent) topology because the amide bond-forming chemistry is iterative. For these targets, the planner focuses on maximising per-step yield and minimising purification losses rather than on finding a convergent topology.

Protecting-group strategy and chemoselectivity [Master]

Protecting groups are the synthesis planner's response to a fundamental problem: most organic reactions are not perfectly selective. A reagent that transforms one functional group may also react with another. The protecting group masks the interfering functionality during the step in question and is removed afterward.

Orthogonal protecting-group sets

For targets with multiple functional groups, the planner needs a set of protecting groups that can be installed and removed independently. Two protecting groups are orthogonal if the conditions for removing one do not affect the other. The classic orthogonal set for a molecule containing a carboxylic acid, an amine, and an alcohol is:

Carboxylic acid protected as a methyl ester (removed by base hydrolysis, NaOH/MeOH)
Amine protected as a Boc carbamate (removed by acid, TFA/DCM)
Alcohol protected as a TBDMS ether (removed by fluoride, TBAF/THF)

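These three deprotection conditions (base, acid, and fluoride) are chosen to be mutually compatible: base does not remove Boc or TBDMS, acid does not cleave the methyl ester, and fluoride does not affect the ester or Boc. The weakest link is TBDMS under acid: silyl ethers are also acid-labile, so Boc removal with TFA must be kept brief and mild, or a bulkier silyl group chosen, if the silyl ether is to survive. With that caveat, the planner can remove the groups in any order.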
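The orthogonality test can be written as a pairwise check on removal conditions. The sketch below is a simplification that assumes each protecting group is described only by the single removal condition listed for it in this section; the `REMOVAL` table and function names are hypothetical, and real lability depends on reagent strength and substrate.

```python
from itertools import combinations

# Removal condition per protecting group, as listed in this section (simplified).
REMOVAL = {
    "methyl ester": {"base"},
    "Boc": {"acid"},
    "TBDMS": {"fluoride"},
    "acetate ester": {"base"},
    "Fmoc": {"base"},
    "t-Bu": {"acid"},
}


def are_orthogonal(a, b):
    """Two groups are orthogonal here if they share no removal condition."""
    return REMOVAL[a].isdisjoint(REMOVAL[b])


def is_orthogonal_set(groups):
    """A set is orthogonal if every pair in it is orthogonal."""
    return all(are_orthogonal(a, b) for a, b in combinations(groups, 2))


print(is_orthogonal_set(["methyl ester", "Boc", "TBDMS"]))   # the classic set
print(is_orthogonal_set(["Fmoc", "t-Bu"]))                   # peptide synthesis
print(is_orthogonal_set(["methyl ester", "acetate ester"]))  # both base-labile
```

The last pair fails the check because a single basic hydrolysis would remove both esters, so they cannot be unmasked independently.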

For peptide synthesis, the Fmoc/t-Bu orthogonal set is standard: Fmoc (base-labile, removed by piperidine) protects the alpha-amine during chain elongation, while $t$-Bu groups (acid-labile, removed by TFA) protect side-chain functionality. This orthogonal set enables the iterative amide coupling cycle that characterises solid-phase peptide synthesis.

The protecting-group tax

Each protecting group adds two steps (installation and removal) to the synthesis. A molecule requiring four protecting groups accumulates eight extra steps. If each protection/deprotection has 95% yield, only $0.95^{8} \approx 66\%$ of the material survives those eight steps, so the protecting-group tax alone costs about 34% of the material before any productive chemistry is counted. For complex targets requiring 10+ protecting groups, the tax can dominate the synthesis.
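The tax compounds quickly as the number of protecting groups grows. A small sketch, assuming every installation and removal proceeds at the same yield (the function name and the 95% default are illustrative choices):

```python
def protecting_group_tax(n_groups, step_yield=0.95):
    """Return (extra steps, fraction of material retained) for n protecting groups."""
    extra_steps = 2 * n_groups          # one installation + one removal per group
    retained = step_yield ** extra_steps
    return extra_steps, retained


for n in (2, 10):
    steps, retained = protecting_group_tax(n)
    print(f"{n} groups: {steps} extra steps, "
          f"{retained:.1%} retained, {1 - retained:.1%} lost")
```

With ten protecting groups, roughly two thirds of the material is lost to protection and deprotection alone under this assumption.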

The planner minimises the protecting-group tax by:

Choosing reaction conditions that are selective enough to avoid protecting groups entirely.
Using protecting groups that serve double duty (e.g., a TBDPS group that protects an alcohol and also serves as a chromatographic handle).
Designing the route so that protecting-group installation and removal are combined with other transformations (e.g., a silyl migration that moves a protecting group from one alcohol to another in one step).

Protecting-group-free synthesis

Baran and colleagues articulated the ideology of protecting-group-free synthesis starting in 2007 ^{[Baran 2007]}, arguing that the most elegant syntheses avoid protecting groups entirely by exploiting the innate chemoselectivity of carefully chosen reagents. The approach requires the planner to find reagents that transform the desired functional group without affecting others — a constraint that eliminates many named reactions but forces creative solutions.

Natural-product syntheses from Baran's laboratory demonstrate the approach. Haouamine A, a marine alkaloid containing a strained, bent aromatic ring within an aza-paracyclophane, was synthesised without protecting groups by choosing reagents whose chemoselectivity matched the target's functional-group hierarchy: an oxidation that preferentially affected the more electron-rich double bond, and a cyclisation that occurred at the more nucleophilic amine. The resulting synthesis was shorter and higher-yielding than earlier routes that used multiple protecting groups.

The protecting-group-free approach is not always possible. Targets with closely spaced, similarly reactive functional groups (e.g., two primary alcohols separated by two carbons) generally require at least one protecting group to differentiate them. The planner's judgement lies in recognising when protecting groups are genuinely necessary versus when they are a convenience that can be eliminated by more selective chemistry.

Functional-group compatibility matrices

A synthesis planner maintains a mental (or computational) matrix of functional-group compatibilities: which reagents affect which functional groups. A Grignard reagent is incompatible with protic functional groups (alcohols, carboxylic acids, thiols), with carbonyl groups (aldehydes, ketones, and esters, unless the carbonyl is the intended target), and with epoxides. A Wittig reaction is compatible with esters, nitriles, and halides but not with a second, unintended aldehyde in the substrate, which would compete with the target carbonyl for the ylide.

This compatibility matrix constrains the order of disconnections. If a target contains both a Grignard-formed C-C bond and an ester, the ester must be masked (or installed late) during the Grignard step. The planner sequences the forward synthesis so that each reagent encounters only compatible functional groups — or installs protecting groups to enforce compatibility where the natural sequence fails.
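A compatibility matrix of this kind can be sketched as a set lookup. The entries below are copied from the two reagents discussed in this section; the names `INCOMPATIBLE` and `groups_to_mask` are hypothetical, and the model tracks functional-group types only, so it cannot distinguish two ketones at different positions.

```python
# Reagent -> functional-group types it would attack or be quenched by.
INCOMPATIBLE = {
    "Grignard": {"alcohol", "carboxylic acid", "thiol",
                 "aldehyde", "ketone", "ester", "epoxide"},
    "Wittig": {"aldehyde"},
}


def groups_to_mask(reagent, groups_present, intended_site):
    """Groups that must be protected (or installed later) before using `reagent`."""
    clashes = INCOMPATIBLE[reagent] & set(groups_present)
    return clashes - {intended_site}


# Grignard addition to a ketone in a substrate that also carries an ester.
print(groups_to_mask("Grignard", ["ketone", "ester", "alkene"], "ketone"))

# Wittig olefination of a ketone in the presence of an ester and a nitrile.
print(groups_to_mask("Wittig", ["ketone", "ester", "nitrile"], "ketone"))
```

An empty result means the step can run on the unprotected substrate under this crude model; a non-empty result tells the planner which groups force a protecting group or a change in step order.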

Computer-assisted retrosynthesis and the synthon-space search [Master]

Retrosynthetic analysis is, in principle, a graph-search problem. The nodes are molecular structures, the edges are known chemical reactions (transforms), and the search proceeds from the target (root) toward starting materials (leaves). The graph is enormous — the number of possible molecules grows exponentially with atom count — and the search is heuristic-guided. Corey's LHASA program was the first serious attempt to automate this search.

Corey's LHASA

E. J. Corey developed LHASA (Logic and Heuristics Applied to Synthetic Analysis) at Harvard beginning in the late 1960s ^{[Corey and Cheng 1989]}. The program encoded several hundred named reactions as transforms, each with three components:

A structural recognition pattern — the substructural motif in the target that the transform matches (e.g., a cyclohexene for the Diels-Alder transform).
A disconnection rule — the bond(s) broken and the synthons produced.
A scope and limitation filter — steric, electronic, and stereochemical constraints that determine whether the transform is applicable to a given substrate.

LHASA applied transforms recursively to the target, generating a tree of possible precursors. At each node, the program evaluated the "strategic value" of each disconnection using Corey's heuristics: prefer disconnections that simplify the carbon skeleton, that produce synthons with known synthetic equivalents, and that exploit symmetry. The chemist interacted with LHASA through a graphical interface, selecting which branches of the synthon tree to explore.
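The recursive expansion can be illustrated with a toy depth-limited search. This is not LHASA's code or its heuristics: the transform table is hand-written from the 2-phenyl-2-butanol exercise, the availability list is assumed, and the only pruning is a depth limit. All names are hypothetical.

```python
from itertools import product

# Target -> list of (transform name, precursors). Hand-written toy data.
TRANSFORMS = {
    "2-phenyl-2-butanol": [
        ("Grignard", ["phenylmagnesium bromide", "2-butanone"]),
        ("Grignard", ["ethylmagnesium bromide", "acetophenone"]),
    ],
    "phenylmagnesium bromide": [
        ("Grignard formation", ["bromobenzene", "magnesium"]),
    ],
}

# Assumed purchasable compounds (the leaves of the tree).
AVAILABLE = {"bromobenzene", "magnesium", "2-butanone",
             "ethylmagnesium bromide", "acetophenone"}


def find_routes(target, max_depth=3):
    """Return every route tree (target, transform, sub-routes) ending in AVAILABLE."""
    if target in AVAILABLE:
        return [(target, "buy", [])]
    if max_depth == 0:
        return []
    found = []
    for name, precursors in TRANSFORMS.get(target, []):
        options = [find_routes(p, max_depth - 1) for p in precursors]
        # product() yields nothing if any precursor has no route, pruning the branch.
        for combo in product(*options):
            found.append((target, name, list(combo)))
    return found


all_routes = find_routes("2-phenyl-2-butanol")
shallow_routes = find_routes("2-phenyl-2-butanol", max_depth=1)
print(len(all_routes), len(shallow_routes))
```

Tightening the depth limit drops the route that must first prepare the phenyl Grignard reagent, a crude stand-in for the strategic pruning a real program would perform.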

LHASA's limitations were computational and chemical. The number of possible disconnection sequences grows combinatorially with molecular size, and the program's heuristic pruning was insufficient for complex natural products (30+ carbon atoms, multiple rings). The scope and limitation filters were hand-coded and could not capture the full nuance of each reaction's selectivity. Despite these limitations, LHASA demonstrated that retrosynthetic reasoning could be formalised algorithmically. It belongs to the body of work on the theory and methodology of organic synthesis for which Corey received the 1990 Nobel Prize in Chemistry.

Synthia (Chematica) and network-based planning

Grzybowski and colleagues developed Synthia (formerly Chematica) as a network-scale approach to retrosynthetic planning ^{[Grzybowski 2018]}. Rather than searching a tree of possible precursors, Synthia constructs a graph of all known reactions connecting all known molecules, and searches this graph for paths from the target to commercially available starting materials. The graph contains millions of nodes and tens of millions of edges, compiled from reaction databases (Reaxys, SciFinder, USPTO patents).

The key innovation is the scoring function that ranks candidate routes. Each candidate route is scored on: (a) estimated yield (predicted from the reaction's historical yield distribution), (b) selectivity (predicted from substrate similarity to known examples), (c) availability and cost of starting materials, and (d) operational complexity (number of chromatographic purifications, air-free techniques required). The highest-scoring routes are proposed to the chemist for experimental validation.
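A route score combining the four criteria (a) to (d) might look like the following sketch. The functional form (sum of log-yields and log-selectivities minus weighted penalties) and the weights `w_cost`, `w_purif`, and `w_air` are assumptions chosen for illustration; the source does not specify Synthia's actual scoring function.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Step:
    est_yield: float     # (a) estimated yield, must be in (0, 1]
    selectivity: float   # (b) predicted selectivity, must be in (0, 1]
    purifications: int   # (d) chromatographic purifications in this step
    air_free: bool       # (d) whether air-free technique is required


@dataclass
class Route:
    steps: list[Step]
    starting_material_cost: float  # (c) arbitrary currency units


def score(route: Route, w_cost: float = 0.01,
          w_purif: float = 0.1, w_air: float = 0.2) -> float:
    """Higher is better. Weights are hypothetical, not taken from any real tool."""
    log_yield = sum(math.log(s.est_yield) for s in route.steps)
    log_selectivity = sum(math.log(s.selectivity) for s in route.steps)
    complexity = sum(w_purif * s.purifications + w_air * s.air_free
                     for s in route.steps)
    return (log_yield + log_selectivity
            - w_cost * route.starting_material_cost - complexity)


short_route = Route([Step(0.90, 0.95, 1, False)] * 2, starting_material_cost=10.0)
long_route = Route([Step(0.80, 0.90, 1, False)] * 3 + [Step(0.80, 0.90, 1, True)],
                   starting_material_cost=5.0)

ranked = sorted([("short", short_route), ("long", long_route)],
                key=lambda item: score(item[1]), reverse=True)
print([name for name, _ in ranked])
```

Working in log space makes the yield term additive over steps, so each extra step costs the route a fixed penalty that cheaper starting materials must outweigh.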

Synthia-generated syntheses have been validated experimentally for moderately complex targets (up to ~20 heavy atoms). The program's strength is its breadth: it can propose routes using reactions that a human planner might overlook because they are outside the planner's expertise. Its weakness is stereochemistry: predicting the stereochemical outcome of a reaction at an unfamiliar substrate remains difficult, and the program sometimes proposes routes that are stereochemically ambiguous or incorrect.

Machine-learning retrosynthesis prediction

Since 2017, machine-learning models have been applied to two subproblems: (a) forward reaction prediction — given reactants and conditions, predict the product(s) and yield; and (b) retrosynthetic disconnection prediction — given a target, predict which bond to disconnect and which transform to apply.

The most successful approaches use graph neural networks (GNNs) that operate directly on molecular graphs. The GNN learns to recognise patterns in the molecular structure that indicate productive disconnection sites, trained on large reaction databases. The output is a ranked list of proposed disconnections, each with a confidence score.

Open-source tools include ASKCOS (from MIT) and AiZynthFinder (from AstraZeneca), both of which combine a neural-network-based disconnection predictor with a graph-search algorithm that recursively applies disconnections to reach commercially available starting materials. These tools are not yet competitive with expert human planners for complex targets, but they are useful for rapid route scouting and for identifying non-obvious disconnection strategies.
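The combination of a disconnection predictor with a search procedure can be illustrated with a best-first search over a stub predictor. This is a minimal sketch, not the algorithm used by ASKCOS or AiZynthFinder: the `plan` function, the `PREDICTIONS` table, and the molecule labels are hypothetical, and a trained model would replace the lookup table.

```python
import heapq
import itertools


def plan(target, predict, purchasable, max_expansions=100):
    """Best-first search. `predict(mol)` returns [(confidence, precursors), ...].

    Returns (route, cumulative_confidence) or None if no route is found.
    """
    tie = itertools.count()  # tie-breaker so the heap never compares tuples of routes
    # Heap entries: (negative cumulative confidence, tie, open molecules, route so far)
    heap = [(-1.0, next(tie), (target,), ())]
    for _ in range(max_expansions):
        if not heap:
            return None
        neg_conf, _tie, open_mols, route = heapq.heappop(heap)
        open_mols = tuple(m for m in open_mols if m not in purchasable)
        if not open_mols:
            return route, -neg_conf  # every leaf is a purchasable starting material
        mol, rest = open_mols[0], open_mols[1:]
        for conf, precursors in predict(mol):
            heapq.heappush(heap, (neg_conf * conf, next(tie),
                                  rest + tuple(precursors),
                                  route + ((mol, tuple(precursors)),)))
    return None


# Stub predictor: the top-ranked disconnection leads to a dead end ("B"),
# so the search must fall back to the lower-confidence option.
PREDICTIONS = {
    "target": [(0.9, ["A", "B"]), (0.6, ["C"])],
    "A": [(0.8, ["D"])],
}
PURCHASABLE = {"C", "D"}

result = plan("target", lambda m: PREDICTIONS.get(m, []), PURCHASABLE)
print(result)
```

The example shows why the search component matters: the predictor's most confident disconnection is not always part of a route that terminates in available materials.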

The gap between computation and experiment

The persistent challenge for computer-assisted retrosynthesis is the gap between the predicted route and the experimental reality. A proposed route may look reasonable on paper but fail in the laboratory for reasons that the model does not capture: unexpected side reactions, low selectivity at a particular substrate, poor solubility, difficult purification, or sensitivity of an intermediate to air or moisture. Each of these failures requires the chemist to modify the route — sometimes fundamentally.

The gap is narrowing. As more experimental data enters the training sets, the models improve their prediction of selectivity and yield. Synthia's experimental validation program demonstrated that roughly 70% of proposed routes for moderately complex targets could be executed as planned or with minor modifications. For the remaining 30%, the chemist's expertise in troubleshooting — choosing alternative conditions, adding protecting groups, changing the order of steps — remains indispensable.

The philosophical content is substantive: retrosynthetic planning is a form of reasoning under uncertainty. The planner makes decisions based on incomplete information (predicted yields, estimated selectivity) and revises as experimental data arrives. The feedback loop between planning and execution is the defining methodological structure of synthetic chemistry, and computer-assisted tools augment rather than replace this loop.

Synthesis. The four strategic approaches — transform-based, topological, structure-goal, and computer-assisted — together constitute the foundational reason that modern organic synthesis can target molecules of arbitrary complexity. The central insight is that every target molecule has a finite number of productive disconnections, and the retrosynthetic planner's task is to rank them by yield, selectivity, and operational simplicity. Putting these together with the convergent yield proposition, the optimal plan is a balanced DAG that minimises the longest linear sequence while maximising the yield of each branch. This is exactly the structure that identifies laboratory synthesis with biosynthetic pathways: nature's enzyme-catalysed routes are convergent, high-yielding, and protecting-group-free — the same properties the retrosynthetic planner seeks. The pattern recurs in 15.12.01 pending amino acid chemistry and 15.13.01 pending nucleic acid chemistry, where solid-phase and convergent fragment-coupling strategies generalise the same topology. The bridge is between the abstract synthon tree and the concrete laboratory operation: each node is a real molecule that must be prepared, characterised, and purified before the next transform applies.

Full proof set [Master]

Proposition (Convergent yield advantage). Let a target molecule require $n$ synthetic steps at per-step yield $y$, where $0 < y < 1$. A linear plan achieves overall yield $Y_{L} = y^{n}$. A convergent plan divides the synthesis into two branches of $p$ and $n - 1 - p$ steps ($0 < p < n - 1$), joined by one coupling step at yield $y_{j}$. The convergent overall yield is $Y_{C} = y^{n - 1} \cdot y_{j}$. Then $Y_{C} > Y_{L}$ if and only if $y_{j} > y$.

Proof. In the linear plan, each of the $n$ steps operates on the product of the previous step. The overall yield is the product of the per-step yields: $Y_{L} = y^{n} = y^{n - 1} \cdot y$.

In the convergent plan, branch A consists of $p$ sequential steps at yield $y$ each, giving yield $y^{p}$. Branch B consists of $n - 1 - p$ sequential steps, giving yield $y^{n - 1 - p}$. The joining step combines the products of the two branches at yield $y_{j}$. The convergent overall yield is:

$Y_{C} = y^{p} \cdot y^{n - 1 - p} \cdot y_{j} = y^{p + n - 1 - p} \cdot y_{j} = y^{n - 1} \cdot y_{j}.$

Comparing $Y_{C} = y^{n - 1} \cdot y_{j}$ with $Y_{L} = y^{n - 1} \cdot y$:

$Y_{C} - Y_{L} = y^{n - 1} \cdot (y_{j} - y).$

Since $y^{n - 1} > 0$ (because $0 < y < 1$ raised to any positive power remains positive), the sign of $Y_{C} - Y_{L}$ is determined by $y_{j} - y$. Therefore $Y_{C} > Y_{L} \iff y_{j} > y$. $\square$

Corollary. The convergent advantage is largest when the joining step yield $y_{j}$ is maximised and the average per-step yield $y$ is minimised. For a Suzuki coupling at 95% joining two branches at 70% per step, the convergent plan gives $Y_{C} / Y_{L} = 0.95/0.70 \approx 1.36$, a 36% relative improvement per joining step. Multiple convergent join points compound this advantage multiplicatively.
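The corollary's arithmetic can be checked numerically. The sketch below uses the proposition's definitions exactly as stated ($Y_{L} = y^{n}$ and $Y_{C} = y^{n-1} \cdot y_{j}$); the function names and the choice $n = 6$ are illustrative assumptions, and the ratio does not depend on $n$.

```python
def linear_yield(y: float, n: int) -> float:
    """Overall yield of n sequential steps at per-step yield y."""
    return y ** n


def convergent_yield(y: float, n: int, y_join: float) -> float:
    """Overall yield per the proposition: n - 1 branch steps plus one joining step."""
    return y ** (n - 1) * y_join


y, y_join, n = 0.70, 0.95, 6  # n chosen arbitrarily for illustration
ratio = convergent_yield(y, n, y_join) / linear_yield(y, n)
print(f"Y_L = {linear_yield(y, n):.4f}")
print(f"Y_C = {convergent_yield(y, n, y_join):.4f}")
print(f"Y_C / Y_L = {ratio:.2f}")
```

Changing `n` alters both yields but leaves the ratio at $y_{j}/y$, which is the content of the proposition.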

Connections [Master]

SN1 vs SN2 substitution mechanisms 15.04.02 pending. The choice of synthetic equivalent for alkyl halide electrophiles in retrosynthetic plans depends on whether the forward reaction proceeds by SN2 (preferred for primary and secondary substrates) or SN1 (tertiary). The disconnection strategy assumes a viable forward mechanism; this unit's disconnection at sp3 C-X bonds is the retrosynthetic mirror of the substitution mechanisms treated there.
Carbonyl chemistry — nucleophilic addition 15.07.01. The single most important family of transforms in retrosynthetic analysis. Aldol disconnections, Grignard additions, Wittig olefinations, and Michael additions all originate from carbonyl reactivity. The carbonyl unit provides the forward-reaction mechanisms that retrosynthetic transforms encode in reverse.
Amino acids and protein chemistry 15.12.01 pending. Peptide synthesis applies retrosynthetic logic at every amide bond: the disconnection produces an amine and a carboxylic acid, and the forward amide coupling (typically HATU or EDC-mediated) is the synthetic equivalent. Solid-phase peptide synthesis is a linear retrosynthetic plan executed iteratively; native chemical ligation is a convergent retrosynthetic plan for larger polypeptides.
Electrophilic addition to alkenes 15.05.01. Alkene-based disconnections in retrosynthetic analysis rely on the forward electrophilic addition reactions treated in that unit. Hydroboration-oxidation and oxymercuration-reduction provide routes to alcohols from alkenes that the retrosynthetic planner can exploit as functional-group interconversions.
Chemical kinetics 14.08.01. The feasibility of each forward step in a synthetic plan is conditioned on its rate and selectivity. Kinetic analysis of model reactions informs the retrosynthetic planner about which transforms are practical and what conditions to specify.
Nucleic acid chemistry 15.13.01 pending. Oligonucleotide synthesis is a linear retrosynthetic plan analogous to peptide synthesis: each phosphodiester bond is a disconnection, and the forward coupling chemistry (phosphoramidite or H-phosphonate) is the synthetic equivalent. Convergent fragment coupling generalises this to long sequences.

Historical and philosophical context [Master]

Retrosynthetic analysis was formalised by E. J. Corey at Harvard beginning in 1964 with his synthesis of longifolene, a bridged tricyclic sesquiterpene ^{[Corey 1964]}. The synthesis demonstrated that a structurally complex natural product could be rationally designed from simple starting materials by working backward through known reactions, rather than discovered by trial and error. Corey articulated the general principles (disconnection, synthon, synthetic equivalent, transform) and systematised them into a teachable method, culminating in The Logic of Chemical Synthesis ^{[Corey and Cheng 1989]}. He received the 1990 Nobel Prize in Chemistry for this work.

The practice of planning syntheses backward predates Corey. Robert Robinson's synthesis of tropinone in 1917 ^{[Robinson 1917]} is recognised as an early example of rational synthesis design: Robinson identified the Mannich reaction as the strategic transform that constructs the tropane skeleton from succinaldehyde, methylamine, and acetone dicarboxylic acid in a single step. Robinson's approach was intuitive rather than systematic; Corey's contribution was to formalise the intuition into a reproducible methodology.

The development of computer-assisted retrosynthesis followed. LHASA, begun in 1969, encoded several hundred transforms and applied them recursively to generate synthon trees. The program demonstrated that retrosynthetic reasoning could be formalised algorithmically. Modern descendants (Synthia, ASKCOS, AiZynthFinder) draw on reaction databases containing millions of known reactions, and ASKCOS and AiZynthFinder add neural-network disconnection predictors. The routes they propose have been experimentally validated for moderately complex targets ^{[Grzybowski 2018]}.

Bibliography [Master]

@article{Corey1964,
  author = {Corey, E. J. and Ohno, M. and Mitra, R. B. and Vatakencherry, P. A.},
  title = {Total Synthesis of Longifolene},
  journal = {J. Am. Chem. Soc.},
  volume = {86},
  pages = {478--485},
  year = {1964}
}

@book{CoreyCheng1989,
  author = {Corey, E. J. and Cheng, X.},
  title = {The Logic of Chemical Synthesis},
  publisher = {Wiley},
  address = {New York},
  year = {1989}
}

@article{Robinson1917,
  author = {Robinson, R.},
  title = {A Synthesis of Tropinone},
  journal = {J. Chem. Soc., Trans.},
  volume = {111},
  pages = {762--768},
  year = {1917}
}

@article{Seebach1979,
  author = {Seebach, D.},
  title = {Methods of Reactivity {Umpolung}},
  journal = {Angew. Chem. Int. Ed. Engl.},
  volume = {18},
  pages = {239--258},
  year = {1979}
}

@article{Grzybowski2018,
  author = {Grzybowski, B. A. and Bishop, K. J. M. and Kowalczyk, B. and Wilmer, C. E.},
  title = {The 'wired' universe of organic chemistry},
  journal = {Nat. Chem.},
  volume = {1},
  pages = {31--36},
  year = {2009}
}

@book{Warren2008,
  author = {Warren, S.},
  title = {Organic Synthesis: The Disconnection Approach},
  edition = {2nd},
  publisher = {Wiley},
  address = {Chichester},
  year = {2008}
}

@book{NicolaouSorensen1996,
  author = {Nicolaou, K. C. and Sorensen, E. J.},
  title = {Classics in Total Synthesis},
  publisher = {VCH},
  address = {Weinheim},
  year = {1996}
}

@book{CareySundberg2007,
  author = {Carey, F. A. and Sundberg, R. J.},
  title = {Advanced Organic Chemistry, Part {B}: Reaction and Synthesis},
  edition = {5th},
  publisher = {Springer},
  address = {New York},
  year = {2007}
}

@book{Clayden2012,
  author = {Clayden, J. and Greeves, N. and Warren, S.},
  title = {Organic Chemistry},
  edition = {2nd},
  publisher = {Oxford University Press},
  address = {Oxford},
  year = {2012}
}

Prerequisites

none — this is a leaf unit

Tier anchors

beginner: Klein — Organic Chemistry as a Second Language, retrosynthesis chapters; Crash Course Organic Chemistry — Synthesis and Retrosynthesis
intermediate: Clayden, Greeves & Warren — Organic Chemistry 2nd ed. Ch. 28 (Retrosynthetic analysis); Bruice — Organic Chemistry, synthesis chapters
master: Corey & Cheng — The Logic of Chemical Synthesis (Wiley, 1989); Warren — Organic Synthesis: The Disconnection Approach 2nd ed.; Nicolaou & Sorensen — Classics in Total Synthesis; Carey & Sundberg — Advanced Organic Chemistry Part B 5th ed. Ch. 1-2

References

TODO_REF pending
Clayden, Greeves & Warren — Organic Chemistry, 2nd ed. (Oxford UP, 2012) · Ch. 28 Retrosynthetic analysis · see docs/catalogs/NEED_TO_SOURCE.md#chem-clayden
TODO_REF pending
Warren, S. — Organic Synthesis: The Disconnection Approach, 2nd ed. (Wiley, 2008) · Parts I-III (the canonical textbook on retrosynthetic logic) · see docs/catalogs/NEED_TO_SOURCE.md#chem-warren-disconnection
TODO_REF pending
Carey & Sundberg — Advanced Organic Chemistry Part B: Reaction and Synthesis, 5th ed. (Springer, 2007) · Ch. 1-2 (synthetic design and retrosynthetic logic) · see docs/catalogs/NEED_TO_SOURCE.md#chem-carey-sundberg
TODO_REF pending
Nicolaou, K. C. & Sorensen, E. J. — Classics in Total Synthesis (VCH, 1996) · Chapter-length case studies of landmark total syntheses · see docs/catalogs/NEED_TO_SOURCE.md#chem-nicolaou-sorensen
TODO_REF pending
Corey, E. J. & Cheng, X. — The Logic of Chemical Synthesis (Wiley, 1989) · The originating monograph for retrosynthetic analysis principles and LHASA · see docs/catalogs/NEED_TO_SOURCE.md#chem-corey-logic
TODO_REF pending
Corey, E. J. — longifolene total synthesis, J. Am. Chem. Soc. 86, 478 (1964) · Landmark application of retrosynthetic analysis to a bridged tricyclic sesquiterpene · see docs/catalogs/NEED_TO_SOURCE.md#chem-corey-longifolene
TODO_REF pending
Robinson, R. — The Structural Relations of Natural Products (Oxford, 1955); tropinone synthesis, J. Chem. Soc. 111, 762 (1917) · Early rational synthesis design predating formal retrosynthetic analysis · see docs/catalogs/NEED_TO_SOURCE.md#chem-robinson-tropinone
TODO_REF pending
Grzybowski, B. A. et al. — Synthia (Chematica): computer-designed syntheses, Chem. Rev. 2018 · Network-based computational retrosynthesis validated experimentally · see docs/catalogs/NEED_TO_SOURCE.md#chem-grzybowski-synthia
TODO_REF pending
Seebach, D. — Methods of Reactivity Umpolung, Angew. Chem. Int. Ed. 18, 239 (1979) · Originator paper on dithiane umpolung strategy · see docs/catalogs/NEED_TO_SOURCE.md#chem-seebach-umpolung
TODO_REF pending
Baran, P. S. et al. — protecting-group-free synthesis philosophy, J. Am. Chem. Soc. series (2007-) · The ideology and practice of avoiding protecting groups in complex synthesis · see docs/catalogs/NEED_TO_SOURCE.md#chem-baran-pgfree
tong
raw/pdfs/dynamics/four.pdf · background reference for transition-state energy landscapes

Reviewer

Tyler (pending external chemistry reviewer per CHEMISTRY_PLAN section 6)

Estimated time

beginner: 15m
intermediate: 35m
master: 75m