25.03.01 · computer-science / complexity

Algorithmic complexity and Big-O notation


Anchor (Master): Arora and Barak, Computational Complexity: A Modern Approach, Ch. 1; Papadimitriou, Computational Complexity, Ch. 1-2

Intuition Beginner

Imagine you are searching for a word in a book. You could start at page 1 and read every word until you find it. Or you could open to the middle, check whether the word comes before or after that point, and repeat with the relevant half. The first approach scans one page at a time. The second halves the remaining pages at each step. For a 1,000-page book, the first approach might take 1,000 steps. The second takes at most 10. For a million-page book, the first takes a million steps; the second takes 20.

This dramatic difference illustrates the power of algorithmic thinking. The two approaches solve the same problem, but one is exponentially faster than the other. The key insight is that the structure of the problem (the book is sorted alphabetically) enables a more efficient strategy (binary search) than the naive approach (linear search). Recognizing and exploiting problem structure is the essence of algorithm design.

This difference matters because it tells you whether your algorithm will finish in seconds, hours, or lifetimes. Algorithmic complexity is the study of how the running time (or memory usage) of an algorithm grows as the size of the input grows. It gives you a way to predict, before you run the code, whether your solution will be practical for the problem you are trying to solve.

Big-O notation is the language used to describe this growth. When we say an algorithm runs in $O(n)$ time, we mean its running time grows proportionally with the input size $n$. Double the input, and the running time roughly doubles. When we say $O(n^{2})$, we mean the running time grows with the square of the input. Double the input, and the running time roughly quadruples. When we say $O(\log n)$, the running time grows logarithmically. Double the input, and the running time increases by just one step.

The difference between these growth rates becomes dramatic for large inputs. An $O(n)$ algorithm processing a billion items takes a billion steps. An $O(n^{2})$ algorithm takes a quintillion steps. An $O(\log n)$ algorithm takes about 30 steps. The growth rate of the algorithm matters far more than the constant factors (how fast your computer is, how efficient your code is) when the input is large enough.

Think of it this way. Suppose you have a slow computer that executes one million operations per second and a fast computer that executes one billion. Your slow computer runs an $O(n \log n)$ sorting algorithm on a million items, requiring about 20 million operations and finishing in 20 seconds. Your fast computer runs an $O(n^{2})$ sorting algorithm on the same data, requiring about one trillion operations and finishing in 1,000 seconds. The fast computer with the worse algorithm is 50 times slower than the slow computer with the better algorithm. For ten million items, the gap grows to roughly 430 times (about 233 seconds versus 100,000 seconds). The algorithm dominates the hardware.
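The arithmetic in this comparison is easy to recompute. A minimal sketch, assuming the two machine speeds stated above, base-2 logarithms, and exactly $n \log_2 n$ and $n^2$ operations (no constant factors):

```python
import math

SLOW_OPS_PER_SEC = 1e6   # slow machine: one million operations per second
FAST_OPS_PER_SEC = 1e9   # fast machine: one billion operations per second

def compare(n):
    """Seconds for an n log n sort on the slow machine vs an n^2 sort on the fast one."""
    slow_nlogn = n * math.log2(n) / SLOW_OPS_PER_SEC
    fast_n2 = n * n / FAST_OPS_PER_SEC
    return slow_nlogn, fast_n2, fast_n2 / slow_nlogn

for n in (10**6, 10**7):
    slow, fast, ratio = compare(n)
    print(f"n = {n:,}: {slow:.0f} s (slow, n log n) vs {fast:.0f} s (fast, n^2), ratio {ratio:.0f}x")
```

The ratio keeps growing with $n$ because it is proportional to $n / \log n$.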

Big-O notation focuses on the dominant term and ignores constants. If an algorithm takes $3n^{2} + 5n + 100$ operations, we say it is $O(n^{2})$ because for large $n$, the $n^{2}$ term dominates the others. The constants 3, 5, and 100 do not change the growth rate. This simplification is deliberate. It allows us to compare algorithms at a high level without getting bogged down in implementation details.

There are several common complexity classes, each describing a different growth rate. $O(1)$ means constant time: the operation takes the same amount of time regardless of input size. Accessing an array element by index is $O(1)$. $O(\log n)$ means logarithmic time: the running time grows slowly, adding one step each time the input doubles. Binary search is $O(\log n)$.

$O(n)$ means linear time: the running time grows proportionally with input size. Scanning an array from start to end is $O(n)$. $O(n \log n)$ is called linearithmic time, and it is the growth rate of the best general-purpose sorting algorithms like mergesort and heapsort. $O(n^{2})$ means quadratic time: the running time grows with the square of the input. Comparing every element to every other element, as in simple sorting algorithms like bubble sort, is $O(n^{2})$. $O(2^{n})$ means exponential time: the running time doubles with each additional element. Brute-force solutions to many hard problems have this growth rate.

Amortized analysis provides a way to analyze sequences of operations. Some data structures have individual operations that are expensive but rare, interspersed with many cheap operations. A dynamic array (like Python's list) grows its capacity by a constant factor when it runs out of space; the textbook version doubles it. The resizing operation copies all $n$ elements, taking $O(n)$ time. But with doubling, a resize that copies $n$ elements happens only after $n$ cheap $O(1)$ insertions in total, and all the resizes together copy fewer than $2n$ elements. Averaging the cost over all operations, each insertion costs $O(1)$ amortized. For data structures with occasional expensive operations, amortized analysis gives a more accurate picture of performance than bounding every operation by its individual worst case.
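A minimal sketch of the doubling scheme, counting how many element copies all the resizes cost together. `DynamicArray` is a hypothetical teaching class, not how CPython implements `list` (which over-allocates by a smaller factor); it starts at capacity 1 and doubles whenever it is full:

```python
class DynamicArray:
    """Hypothetical teaching class: a dynamic array that doubles when full and counts copies."""

    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.slots = [None] * self.capacity
        self.copies = 0                      # elements copied by all resizes so far

    def append(self, value):
        if self.size == self.capacity:
            # Expensive but rare: allocate double the space and copy every element, O(n).
            new_slots = [None] * (2 * self.capacity)
            for i in range(self.size):
                new_slots[i] = self.slots[i]
            self.copies += self.size
            self.slots = new_slots
            self.capacity *= 2
        self.slots[self.size] = value        # cheap O(1) write into a free slot
        self.size += 1

arr = DynamicArray()
n = 100_000
for x in range(n):
    arr.append(x)
print(arr.copies, arr.capacity)              # 131071 131072
```

With 100,000 appends the resizes copy $1 + 2 + \dots + 65{,}536 = 131{,}071$ elements in total, about 1.3 copies per append, which stays below the $2n$ bound.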

Recurrence relations describe the running time of recursive algorithms. A recurrence expresses $T(n)$ in terms of $T$ evaluated at smaller inputs. Binary search has the recurrence $T(n) = T(n/2) + O(1)$, capturing the fact that it makes one recursive call on half the input and spends constant time on the comparison. Mergesort has $T(n) = 2T(n/2) + O(n)$, reflecting two recursive calls on halves of the input plus linear-time merging. Solving recurrences (using the master theorem, substitution, or recursion trees) is a core skill in algorithm analysis.
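Recurrences can be checked numerically by evaluating them directly. This sketch assumes $T(1) = 1$ and replaces the $O(1)$ and $O(n)$ terms by exactly $1$ and $n$, which changes only constant factors; for powers of two the values match the closed forms $n(\log_2 n + 1)$ and $\log_2 n + 1$:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def t_merge(n):
    """Mergesort-style recurrence T(n) = 2T(n/2) + n with T(1) = 1 (n a power of 2)."""
    if n == 1:
        return 1
    return 2 * t_merge(n // 2) + n

@lru_cache(maxsize=None)
def t_binary(n):
    """Binary-search-style recurrence T(n) = T(n/2) + 1 with T(1) = 1 (n a power of 2)."""
    if n == 1:
        return 1
    return t_binary(n // 2) + 1

for k in (1, 5, 10, 20):
    n = 2 ** k
    # Each pair shows the recurrence value next to its closed form.
    print(n, t_merge(n), n * (k + 1), t_binary(n), k + 1)
```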

Space complexity follows the same notation but measures memory usage instead of time. An algorithm that uses an auxiliary array of size $n$ has $O (n)$ space complexity. An algorithm that uses only a few variables regardless of input size has $O (1)$ space complexity. Time and space are often in tension: you can sometimes reduce running time by using more memory (caching, memoization) or reduce memory usage at the cost of recomputing values.

The concept of worst-case versus average-case complexity adds nuance. An algorithm's worst-case complexity describes its performance on the hardest possible input. Its average-case complexity describes the expected performance over all inputs under some probability distribution (for sorting, typically all input orderings being equally likely). Quicksort has $O(n^{2})$ worst-case time but $O(n \log n)$ average-case time, and choosing pivots at random gives $O(n \log n)$ expected time on every input. In practice quicksort is often at least as fast as mergesort, which has the better worst-case guarantee; this is why it is widely used despite its worse worst case.
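The gap between worst case and typical behavior shows up clearly when counting comparisons. This sketch uses a simple, not-in-place quicksort (not a production implementation) and counts one comparison per element per partition step. On already-sorted input, always pivoting on the first element hits the quadratic worst case, while a random pivot stays close to $n \log n$:

```python
import random

def quicksort(arr, pick_pivot):
    """Simple (not in-place) quicksort. Returns (sorted list, comparison count)."""
    if len(arr) <= 1:
        return list(arr), 0
    p = pick_pivot(arr)
    pivot = arr[p]
    rest = arr[:p] + arr[p + 1:]
    less = [x for x in rest if x < pivot]
    greater = [x for x in rest if x >= pivot]
    sorted_less, c_less = quicksort(less, pick_pivot)
    sorted_greater, c_greater = quicksort(greater, pick_pivot)
    # Count one comparison per element of `rest` against the pivot, as in a partition step.
    return sorted_less + [pivot] + sorted_greater, len(rest) + c_less + c_greater

def first_pivot(arr):
    return 0                                  # naive rule: always the first element

def random_pivot(arr):
    return random.randrange(len(arr))         # randomized rule

random.seed(1)
data = list(range(500))                       # already sorted: worst case for first_pivot
out_naive, naive_count = quicksort(data, first_pivot)
out_random, random_count = quicksort(data, random_pivot)
print(naive_count, random_count)              # naive count is 124750 (= 500*499/2)
```

The randomized count depends on the seed; in expectation it is about $1.39\, n \log_2 n$, which is only a few thousand here.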

Best-case complexity describes the fastest the algorithm can run. For linear search, the best case is $O(1)$: the target is the first element. But best-case analysis is rarely useful, because it tells you nothing about typical or difficult inputs. Worst-case analysis provides a guarantee: no matter what input you encounter, the algorithm will not exceed this bound.

The power of logarithmic algorithms cannot be overstated. Logarithmic growth is so slow that for all practical input sizes, an $O(\log n)$ algorithm behaves almost like a constant-time one. For $n = 10^{100}$ (a googol, far larger than the number of atoms in the observable universe), $\log_{2} n \approx 332$. This means that binary search on a sorted array of a googol elements requires at most 333 comparisons ($\lfloor \log_{2} n \rfloor + 1$). Logarithmic algorithms are the backbone of efficient data structures: balanced binary search trees (AVL, red-black), B-trees (used in databases), and skip lists (in expectation) all achieve $O(\log n)$ operations.
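Because binary search only ever computes a middle index, it can run over an implicit sorted array $a[i] = i$ that is never stored, which makes a googol-sized input easy to try. A minimal sketch (the function name is illustrative); it counts one three-way comparison per probe, and the worst case works out to $\lfloor \log_2 n \rfloor + 1$ probes, which is `n.bit_length()` in Python:

```python
def binary_search_probes(n, target):
    """Binary search the implicit sorted array a[i] = i for 0 <= i < n.

    Returns (found, probes). The array is never materialized, so n can be a googol.
    """
    lo, hi = 0, n - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1                  # one three-way comparison against a[mid] == mid
        if mid == target:
            return True, probes
        if mid < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, probes

googol = 10 ** 100
print(binary_search_probes(googol, googol))   # (False, 333): absent, larger than every element
print(googol.bit_length())                    # 333 = floor(log2 n) + 1
```

Searching for a value larger than every element forces the search to the right at every step, which hits the worst case exactly.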

Polynomial versus exponential is the most important distinction in complexity theory. Polynomial-time algorithms ($O(n)$, $O(n^{2})$, $O(n^{3})$) are considered tractable because the running time grows at a manageable rate. Exponential-time (or worse) algorithms ($O(2^{n})$, $O(n!)$) are considered intractable because the running time becomes astronomical even for modest input sizes. The transition from polynomial to exponential is sharp: a polynomial algorithm for any single NP-complete problem would, in principle, render thousands of apparently intractable problems tractable.

Reductions are the primary tool for relating the complexity of different problems. A polynomial-time reduction from problem A to problem B shows that B is at least as hard as A: if we could solve B efficiently, we could solve A efficiently by first reducing it to B and then solving B. NP-completeness proofs use this technique: to show that a new problem is NP-complete, show that it is in NP and reduce a known NP-complete problem to it in polynomial time. This chain of reductions, beginning with Cook's proof for SAT, has established NP-completeness for thousands of problems.

Visual Beginner

The table below shows approximate step counts for common complexity classes (logarithms are base 2 and rounded; the exponential entries are orders of magnitude).

Complexity	Name	$n = 10$	$n = 100$	$n = 1{,}000$	$n = 1{,}000{,}000$
$O(1)$	Constant	1	1	1	1
$O(\log n)$	Logarithmic	3	7	10	20
$O(n)$	Linear	10	100	1,000	1,000,000
$O(n \log n)$	Linearithmic	33	664	9,966	19,931,569
$O(n^{2})$	Quadratic	100	10,000	1,000,000	$10^{12}$
$O(2^{n})$	Exponential	1,024	$\approx 10^{30}$	$\approx 10^{301}$	$\approx 10^{301030}$
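The entries above can be regenerated with a few lines of code, assuming base-2 logarithms rounded to the nearest integer. Because $2^n$ is far too large for floating point at $n = 1{,}000{,}000$, the sketch reports it as an exponent of 10:

```python
import math

def growth_row(n):
    """Approximate step counts for input size n under each growth rate."""
    return {
        "1": 1,
        "log n": round(math.log2(n)),
        "n": n,
        "n log n": round(n * math.log2(n)),
        "n^2": n * n,
        # 2^n = 10^(n * log10 2); report only the exponent of 10.
        "2^n (power of 10)": round(n * math.log10(2)),
    }

for n in (10, 100, 1_000, 1_000_000):
    print(n, growth_row(n))
```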

Worked example Beginner

Consider the problem of finding whether a number exists in a sorted array of $n$ numbers. Two algorithms can solve this.

Linear search examines each element in order from the first to the last. In the worst case (the target is not in the array or is the last element), it examines all $n$ elements. The worst-case complexity is $O(n)$.

Binary search examines the middle element, determines which half could contain the target, and recurses on that half. After each comparison, the remaining search space is halved. After $k$ comparisons, at most $n/2^{k}$ elements remain. The search ends when $n/2^{k} = 1$, so $k = \log_{2} n$. The worst-case complexity is $O(\log n)$.

For $n = 1{,}000{,}000$, linear search performs up to 1,000,000 comparisons. Binary search performs at most 20. In the worst case, binary search needs 50,000 times fewer comparisons.

Now consider sorting. Two algorithms solve this problem with different complexities.

Bubble sort repeatedly passes through the array, swapping adjacent elements that are out of order. After $k$ passes, the $k$ largest elements are in their final positions. The total number of comparisons is $(n-1) + (n-2) + \dots + 1 = n(n-1)/2$. This is $O(n^{2})$.
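A sketch of plain bubble sort (without the common early-exit optimization) that counts comparisons, so the $n(n-1)/2$ formula can be checked directly:

```python
def bubble_sort(arr):
    """Plain bubble sort without early exit. Returns (sorted copy, comparison count)."""
    a = list(arr)
    n = len(a)
    comparisons = 0
    for k in range(n - 1):              # after pass k, the k + 1 largest are in place
        for j in range(n - 1 - k):      # so each pass stops one position earlier
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a, comparisons

result, count = bubble_sort([5, 1, 4, 2, 8, 3])
print(result, count)                    # [1, 2, 3, 4, 5, 8] 15  (= 6 * 5 / 2)
```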

Mergesort divides the array in half, recursively sorts each half, and merges the two sorted halves. Merging two sorted lists of total length $n$ takes $O(n)$ time. The recursion has depth $\log_{2} n$ (each level halves the problem), and the merges at each level touch every element once, for $O(n)$ work per level. The total time is $O(n \log n)$. This can be expressed as the recurrence $T(n) = 2T(n/2) + O(n)$, which resolves to $O(n \log n)$.
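A top-down mergesort sketch that also counts element comparisons. For $n$ a power of two, the count never exceeds $n \log_2 n - n + 1$, since merging two lists of total length $s$ takes at most $s - 1$ comparisons:

```python
import random

def merge_sort(arr):
    """Top-down mergesort. Returns (sorted list, number of element comparisons)."""
    if len(arr) <= 1:
        return list(arr), 0
    mid = len(arr) // 2
    left, c_left = merge_sort(arr[:mid])
    right, c_right = merge_sort(arr[mid:])
    merged, i, j, c = [], 0, 0, 0
    while i < len(left) and j < len(right):    # linear-time merge
        c += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])                    # at most one of these is non-empty
    merged.extend(right[j:])
    return merged, c_left + c_right + c

random.seed(0)
n = 1 << 14                                    # 16,384 elements, a power of two
data = random.sample(range(n), n)              # a random permutation
result, comparisons = merge_sort(data)
print(comparisons, n * 14 - n + 1)             # actual count vs worst-case bound
```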

For $n = 1{,}000{,}000$, bubble sort performs about $5 \times 10^{11}$ comparisons. Mergesort performs about $2 \times 10^{7}$ comparisons, about 25,000 times fewer. For $n = 10{,}000{,}000$, the gap grows to roughly 215,000 times ($5 \times 10^{13}$ versus about $2.3 \times 10^{8}$).

These numbers explain why algorithm choice matters more than hardware speed. A faster computer might execute operations 10 times faster. A better algorithm might execute 25,000 times fewer operations. The algorithmic improvement dwarfs the hardware improvement.

Worked example: analyzing a recursive algorithm. Consider computing the Fibonacci sequence. The naive recursive implementation makes two recursive calls for each input: $F(n) = F(n-1) + F(n-2)$. This produces a recursion tree in which the number of calls nearly doubles at each level: the total grows like $\varphi^{n}$ with $\varphi \approx 1.618$, which is exponential and bounded by $O(2^{n})$. For $n = 50$, the naive implementation makes about $4 \times 10^{10}$ calls.

A memoized version stores previously computed values, avoiding redundant computation. Each Fibonacci number from 0 to $n$ is computed exactly once, giving $O(n)$ time and $O(n)$ space. An iterative version uses two variables and computes $F(n)$ in $O(n)$ time and $O(1)$ space. The matrix exponentiation method computes $F(n)$ with $O(\log n)$ matrix multiplications by raising the matrix $\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$ to the $n$-th power using repeated squaring. (These bounds count arithmetic operations on numbers that themselves grow with $n$.)
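The four Fibonacci strategies can be compared side by side. This is a sketch in Python, whose integers are unbounded; the helper names are illustrative, and `fib_naive` takes a one-element list as a hypothetical call counter. The matrix version relies on the identity $\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^{n} = \begin{pmatrix} F(n+1) & F(n) \\ F(n) & F(n-1) \end{pmatrix}$:

```python
from functools import lru_cache

def fib_naive(n, counter):
    """Exponential-time recursion; counter[0] tracks the number of calls."""
    counter[0] += 1
    if n < 2:
        return n
    return fib_naive(n - 1, counter) + fib_naive(n - 2, counter)

@lru_cache(maxsize=None)
def fib_memo(n):
    """Memoized recursion: each F(i) is computed once, O(n) time and O(n) space."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

def fib_iter(n):
    """Iterative version: O(n) additions and two variables."""
    a, b = 0, 1                      # invariant: a = F(i), b = F(i + 1)
    for _ in range(n):
        a, b = b, a + b
    return a

def mat_mult(x, y):
    """Multiply two 2x2 matrices given as ((a, b), (c, d))."""
    return ((x[0][0] * y[0][0] + x[0][1] * y[1][0], x[0][0] * y[0][1] + x[0][1] * y[1][1]),
            (x[1][0] * y[0][0] + x[1][1] * y[1][0], x[1][0] * y[0][1] + x[1][1] * y[1][1]))

def fib_matrix(n):
    """O(log n) matrix multiplications via repeated squaring over the bits of n."""
    result = ((1, 0), (0, 1))        # identity matrix
    base = ((1, 1), (1, 0))
    while n > 0:
        if n & 1:
            result = mat_mult(result, base)
        base = mat_mult(base, base)
        n >>= 1
    return result[0][1]              # the F(n) entry of the matrix power

calls = [0]
print(fib_naive(25, calls), calls[0])                    # 75025 242785
print(fib_memo(90) == fib_iter(90) == fib_matrix(90))    # True
```

The naive version makes exactly $2F(n+1) - 1$ calls, so $n = 25$ already needs 242,785 calls, while `fib_matrix(25)` does only eight $2 \times 2$ matrix multiplications.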

The progression from exponential to logarithmic time, all solving the same problem, illustrates the importance of algorithm design. The exponential algorithm becomes impractical beyond roughly $n = 40$. The linear algorithm handles millions of values. The logarithmic algorithm needs only $O(\log n)$ matrix multiplications even for astronomically large $n$.

Worked example: analyzing a nested loop. Consider this Python function, which prints all pairs of elements that sum to a target value:

def pairs_with_sum(arr, target):
    n = len(arr)
    for i in range(n):
        for j in range(i + 1, n):  # j > i, so each pair is checked once
            if arr[i] + arr[j] == target:
                print(arr[i], arr[j])

The outer loop runs $n$ times. For each $i$, the inner loop runs $n - i - 1$ times. The total number of iterations is $(n-1) + (n-2) + \dots + 1 = n(n-1)/2$. This is $O(n^{2})$ time. The space complexity is $O(1)$ since only loop variables are used.

A more efficient approach uses a hash set. For each element $x$, check whether $\text{target} - x$ is in the set. If yes, we found a pair. Add $x$ to the set. This runs in $O(n)$ expected time and $O(n)$ space. The trade-off is additional memory for faster execution, a common pattern in algorithm design: trading space for time.
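A minimal sketch of the hash-set approach (the function name is illustrative). It returns pairs instead of printing them; with distinct values it finds the same pairs as the nested loop, while with repeated values it can report fewer, because the set stores each value once:

```python
def pairs_with_sum_hashed(arr, target):
    """One pass with a hash set: O(n) expected time, O(n) extra space."""
    seen = set()
    pairs = []
    for x in arr:
        if target - x in seen:          # expected O(1) membership test
            pairs.append((target - x, x))
        seen.add(x)
    return pairs

print(pairs_with_sum_hashed([2, 9, 4, 7, 5, 6], 11))   # [(2, 9), (4, 7), (5, 6)]
```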


Formal definition Intermediate+

Big-O notation. Let $f, g : \mathbb{N} \to \mathbb{R}^{+}$. We write $f(n) = O(g(n))$ if there exist positive constants $c$ and $n_{0}$ such that $f(n) \leq c \cdot g(n)$ for all $n \geq n_{0}$. Informally, $f$ grows no faster than a constant multiple of $g$.
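The definition can be made concrete with the example $f(n) = 3n^{2} + 5n + 100$. Choosing $c = 4$ requires $3n^{2} + 5n + 100 \leq 4n^{2}$, that is, $n^{2} - 5n - 100 \geq 0$, which holds from $n_{0} = 13$ onward. The sketch below checks that witness over a finite range; a finite check only illustrates the definition and is not a proof:

```python
def f_example(n):
    return 3 * n**2 + 5 * n + 100

def g_example(n):
    return n**2

def check_big_o_witness(f, g, c, n0, n_max):
    """Check f(n) <= c * g(n) for every n0 <= n <= n_max (illustration, not proof)."""
    return all(f(n) <= c * g(n) for n in range(n0, n_max + 1))

print(check_big_o_witness(f_example, g_example, c=4, n0=13, n_max=10_000))  # True
print(check_big_o_witness(f_example, g_example, c=4, n0=12, n_max=10_000))  # False: 592 > 576 at n = 12
```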

Big-Omega notation. We write $f(n) = \Omega(g(n))$ if there exist positive constants $c$ and $n_{0}$ such that $f(n) \geq c \cdot g(n)$ for all $n \geq n_{0}$. This is the lower-bound analogue: $f$ grows at least as fast as a constant multiple of $g$.

Big-Theta notation. We write $f(n) = \Theta(g(n))$ if $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$. This means $f$ and $g$ grow at the same rate, up to constant factors.

Little-o notation. We write $f(n) = o(g(n))$ if for every constant $c > 0$, there exists $n_{0}$ such that $f(n) < c \cdot g(n)$ for all $n \geq n_{0}$. This is a strict upper bound: $f$ grows strictly slower than $g$.

Little-omega notation. We write $f(n) = \omega(g(n))$ if for every constant $c > 0$, there exists $n_{0}$ such that $f(n) > c \cdot g(n)$ for all $n \geq n_{0}$. This is a strict lower bound: $f$ grows strictly faster than $g$.

Properties of asymptotic notation

Transitivity. If $f = O(g)$ and $g = O(h)$, then $f = O(h)$. The same holds for $\Omega$, $\Theta$, $o$, and $\omega$.

Reflexivity. $f = O(f)$, $f = \Omega(f)$, $f = \Theta(f)$.

Symmetry. $f = \Theta(g)$ if and only if $g = \Theta(f)$.

Transpose symmetry. $f = O(g)$ if and only if $g = \Omega(f)$. $f = o(g)$ if and only if $g = \omega(f)$.

Common complexity classes

$O(1) \subset O(\log n) \subset O(n) \subset O(n \log n) \subset O(n^{2}) \subset O(n^{3}) \subset O(2^{n}) \subset O(n!)$

Each inclusion is strict: each class contains functions that grow strictly faster than every function in the previous class.

The master theorem

The master theorem provides a general solution for recurrences of the form $T(n) = aT(n/b) + f(n)$, where $a \geq 1$ and $b > 1$. This recurrence describes divide-and-conquer algorithms that split the problem into $a$ subproblems of size $n/b$, spending $f(n)$ time on the divide and combine steps.

Case 1. If $f(n) = O(n^{\log_{b} a - \epsilon})$ for some $\epsilon > 0$, then $T(n) = \Theta(n^{\log_{b} a})$.

Case 2. If $f(n) = \Theta(n^{\log_{b} a} \log^{k} n)$ for some $k \geq 0$, then $T(n) = \Theta(n^{\log_{b} a} \log^{k+1} n)$.

Case 3. If $f(n) = \Omega(n^{\log_{b} a + \epsilon})$ for some $\epsilon > 0$, and if $a f(n/b) \leq c f(n)$ for some constant $c < 1$ and all sufficiently large $n$, then $T(n) = \Theta(f(n))$.

For mergesort: $T(n) = 2T(n/2) + O(n)$. Here $a = 2$, $b = 2$, $f(n) = n$. We have $n^{\log_{b} a} = n^{\log_{2} 2} = n^{1} = n$. This is Case 2 with $k = 0$: $T(n) = \Theta(n \log n)$.

For binary search: $T(n) = T(n/2) + O(1)$. Here $a = 1$, $b = 2$, $f(n) = 1$. $n^{\log_{2} 1} = n^{0} = 1$. Case 2 with $k = 0$: $T(n) = \Theta(\log n)$.

For Strassen's matrix multiplication: $T(n) = 7T(n/2) + O(n^{2})$. Here $a = 7$, $b = 2$. $n^{\log_{2} 7} \approx n^{2.807}$. Since $n^{2} = O(n^{\log_{2} 7 - \epsilon})$ for any $0 < \epsilon \leq \log_{2} 7 - 2 \approx 0.807$ (for example $\epsilon = 0.8$), this is Case 1: $T(n) = \Theta(n^{\log_{2} 7}) \approx \Theta(n^{2.81})$. Strassen's algorithm multiplies matrices faster than the naive $O(n^{3})$ method by reducing the number of recursive multiplications from 8 to 7 at the cost of more additions.
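The three examples can be classified mechanically when the driving function is a plain polynomial $f(n) = \Theta(n^{d})$. This sketch handles only that special case (so Case 2 appears only with $k = 0$). For such $f$ with $d > \log_b a$, the Case 3 regularity condition holds automatically with $c = a/b^{d} < 1$:

```python
import math

def master_theorem(a, b, d):
    """Solve T(n) = a T(n/b) + Theta(n^d) for a >= 1, b > 1, d >= 0 (polynomial f only)."""
    crit = math.log(a, b)                  # critical exponent log_b a
    if math.isclose(d, crit):              # tolerate float error, e.g. log(125, 5)
        return f"Case 2: Theta(n^{d:g} log n)"
    if d < crit:
        return f"Case 1: Theta(n^{crit:.3f})"
    return f"Case 3: Theta(n^{d:g})"

print(master_theorem(2, 2, 1))   # mergesort:     Case 2: Theta(n^1 log n)
print(master_theorem(1, 2, 0))   # binary search: Case 2: Theta(n^0 log n)
print(master_theorem(7, 2, 2))   # Strassen:      Case 1: Theta(n^2.807)
```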

Amortized analysis techniques

Aggregate method. Compute the total cost of $n$ operations and divide by $n$. For a dynamic array that doubles its capacity, $n$ insertions include $O(\log n)$ resizing operations. The resizes copy at most $1 + 2 + 4 + \dots + n/2 + n < 2n$ elements in total; adding $n$ for the insertions themselves gives a total cost below $3n$, so the amortized cost per insertion is $O(1)$.

Accounting method. Assign different "charges" to different operations. A cheap operation is charged more than its actual cost, and the surplus is stored as "credit" that pays for expensive operations later. For a dynamic array, charge each insertion 3 units (actual cost 1 plus 2 units of credit). When a resize occurs, the 2 units of credit stored by each element inserted since the previous resize pay for copying all the elements.

Potential method. Define a potential function $\Phi$ that maps the data structure state to a non-negative real number. The amortized cost of an operation is its actual cost plus the change in potential: $\hat{c} = c + \Phi(D') - \Phi(D)$, where $D$ and $D'$ are the states before and after the operation. If the potential increases during cheap operations and decreases during expensive ones, the amortized costs are bounded. For a dynamic array holding $n$ elements, $\Phi = 2n - \text{capacity}$ builds up during cheap insertions the credit that pays for the next resize; with it, every insertion has amortized cost at most 3.
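The potential argument can be checked by simulation. This sketch assumes the array starts empty with capacity 0, grows to capacity 1 on the first insertion, and doubles afterwards, with $\Phi = 2 \cdot \text{size} - \text{capacity}$. Every amortized cost comes out to at most 3 (the very first insertion costs 2):

```python
def amortized_costs(num_inserts):
    """Amortized cost of each insertion into a doubling array under Phi = 2*size - capacity."""
    size, capacity = 0, 0
    phi = 0                                   # Phi of the empty table: 2*0 - 0
    costs = []
    for _ in range(num_inserts):
        if size == capacity:                  # full: grow to double (or to 1 if empty)
            actual = size + 1                 # copy `size` elements, then write the new one
            capacity = max(1, 2 * capacity)
        else:
            actual = 1                        # cheap write into a free slot
        size += 1
        new_phi = 2 * size - capacity
        costs.append(actual + new_phi - phi)  # amortized = actual + change in potential
        phi = new_phi
    return costs

print(set(amortized_costs(1000)))             # {2, 3}
```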

Key result: the sorting lower bound Intermediate+

Theorem. Any comparison-based sorting algorithm requires $\Omega(n \log n)$ comparisons in the worst case to sort $n$ elements.

Proof. A comparison-based sorting algorithm can be modeled as a binary decision tree. Each internal node represents a comparison between two elements. Each leaf represents a permutation that the algorithm outputs. The algorithm must distinguish among all $n!$ possible permutations of the input, because any permutation could be the correct sorted order. Therefore, the decision tree must have at least $n!$ leaves.

A binary tree of height $h$ has at most $2^{h}$ leaves. Therefore:

$2^{h} \geq n!$

Taking logarithms:

$h \geq \log_{2}(n!)$

By Stirling's approximation, $n! \approx \sqrt{2\pi n}\,(n/e)^{n}$, so $\log_{2}(n!) = \Omega(n \log n)$. More precisely:

$\log_{2}(n!) = \sum_{i=1}^{n} \log_{2} i \geq \sum_{i=\lceil n/2 \rceil}^{n} \log_{2} i \geq \frac{n}{2} \log_{2} \frac{n}{2} = \Omega(n \log n)$

Therefore $h = \Omega(n \log n)$, and any comparison-based sorting algorithm requires $\Omega(n \log n)$ comparisons in the worst case. $\square$
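The bound is easy to evaluate numerically. This sketch computes $\log_2(n!)$ through `math.lgamma` (since $\ln(n!) = \ln \Gamma(n+1)$) and compares it with the weaker bound $\frac{n}{2} \log_2 \frac{n}{2}$ used in the proof. For $n = 10$, any comparison sort needs at least $\lceil \log_2 10! \rceil = 22$ comparisons in the worst case:

```python
import math

def log2_factorial(n):
    """log2(n!) via the log-gamma function, avoiding the huge integer n!."""
    return math.lgamma(n + 1) / math.log(2)

for n in (10, 1_000, 1_000_000):
    exact = math.ceil(log2_factorial(n))          # decision-tree height lower bound
    proof_bound = (n / 2) * math.log2(n / 2)      # the simpler bound from the proof
    print(n, exact, round(proof_bound), round(n * math.log2(n)))
```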

This result establishes a fundamental limit: no comparison-based sorting algorithm can use asymptotically fewer than $n \log n$ comparisons in the worst case. Mergesort and heapsort achieve this bound, making them asymptotically optimal among comparison sorts.

Non-comparison sorts

Sorting algorithms that are not comparison-based can beat the $\Omega(n \log n)$ bound under certain conditions. Counting sort assumes the input elements are integers in a known range $\{0, 1, \dots, k\}$ and runs in $O(n + k)$ time. Radix sort processes digits from least to most significant, using a stable sort such as counting sort as a subroutine, and runs in $O(d(n + k))$ time, where $d$ is the number of digits and $k$ is the number of possible digit values. For $d$-digit decimal numbers ($k = 10$), radix sort runs in $O(dn)$, which is $O(n)$ when $d$ is constant.
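A minimal sketch of both algorithms for non-negative integers. `counting_sort_by` is stable (equal keys keep their order), which is what makes the digit-by-digit passes of least-significant-digit radix sort correct; the names and the default base of 10 are illustrative choices:

```python
def counting_sort_by(items, key, k):
    """Stable counting sort of items by key(item), assuming 0 <= key(item) <= k. O(n + k)."""
    counts = [0] * (k + 1)
    for item in items:
        counts[key(item)] += 1
    for v in range(1, k + 1):              # prefix sums: counts[v] = number of keys <= v
        counts[v] += counts[v - 1]
    out = [None] * len(items)
    for item in reversed(items):           # backwards pass keeps equal keys in order
        counts[key(item)] -= 1
        out[counts[key(item)]] = item
    return out

def radix_sort(nums, base=10):
    """LSD radix sort of non-negative integers: one stable pass per digit, O(d (n + base))."""
    result = list(nums)
    largest = max(result, default=0)
    place = 1
    while place <= largest:                # d passes, d = number of digits of the largest key
        result = counting_sort_by(result, lambda x: (x // place) % base, base - 1)
        place *= base
    return result

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))   # [2, 24, 45, 66, 75, 90, 170, 802]
```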

These algorithms exploit additional structure in the input (bounded integer range) that comparison sorts cannot use. The trade-off is generality: comparison sorts work on any ordered data type, while counting and radix sorts require integer keys.

Bucket sort assumes that the input is uniformly distributed over a range. It divides the range into $n$ equal-sized buckets, places each element in its bucket, sorts each bucket (using insertion sort or another algorithm), and concatenates the buckets. For uniformly distributed input, each bucket contains $O(1)$ elements on average, and the expected total time is $O(n)$. In the worst case (all elements in one bucket), bucket sort with insertion sort degrades to $O(n^{2})$.
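A sketch of bucket sort, assuming the inputs are floats in $[0, 1)$ and using $n$ buckets with insertion sort inside each bucket, as described above:

```python
def bucket_sort(values):
    """Bucket sort for floats assumed to lie in [0, 1). Expected O(n) for uniform input."""
    n = len(values)
    buckets = [[] for _ in range(n)]
    for x in values:
        # Bucket i covers [i/n, (i+1)/n); min() guards against float rounding at the top.
        buckets[min(int(x * n), n - 1)].append(x)
    result = []
    for bucket in buckets:
        # Insertion sort inside each bucket; buckets hold O(1) elements on average.
        for i in range(1, len(bucket)):
            item, j = bucket[i], i - 1
            while j >= 0 and bucket[j] > item:
                bucket[j + 1] = bucket[j]
                j -= 1
            bucket[j + 1] = item
        result.extend(bucket)
    return result

print(bucket_sort([0.42, 0.32, 0.23, 0.52, 0.25, 0.47, 0.51]))
```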

The practical lesson is that domain knowledge can enable more efficient algorithms. If you know your data has special structure (bounded integers, uniform distribution, nearly sorted), you can exploit it for better performance. Generic algorithms (like comparison sorts) provide guarantees for arbitrary input but may be suboptimal for specific cases.

Exercises Intermediate+

Domain evidence Master

Sorting in practice. The choice of sorting algorithm has enormous practical consequences. Python's built-in sort uses Timsort, a hybrid of mergesort and insertion sort optimized for real-world data that often contains partially ordered runs. Timsort achieves $O(n)$ best case on already-sorted data and $O(n \log n)$ worst case. Java's Arrays.sort uses dual-pivot quicksort for primitives and Timsort for objects. The standard library implementations reflect decades of engineering optimization informed by complexity analysis.

Graph algorithms at scale. Google's PageRank algorithm processes the web graph (billions of nodes) using power iteration. Each iteration computes a sparse matrix-vector multiplication in $O(n + m)$ time, where $n$ is the number of pages and $m$ is the number of links. The $O((n + m) \log n)$ complexity of Dijkstra's shortest path algorithm (using a binary heap, on a graph with $n$ nodes and $m$ edges) makes it practical for road networks with millions of nodes. For social network analysis, the $O(n^{3})$ Floyd-Warshall all-pairs shortest path algorithm is impractical for graphs with millions of nodes, necessitating approximate algorithms or decomposition strategies.
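A compact sketch of Dijkstra's algorithm with Python's `heapq` binary heap. Instead of a decrease-key operation it pushes duplicate entries and skips stale ones, which keeps the heap at $O(m)$ entries and the running time at $O((n + m) \log n)$. The adjacency-dict format and the small sample graph are assumptions for illustration:

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source using a binary heap.

    graph: dict mapping each node to a list of (neighbor, nonnegative weight) pairs.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                              # stale entry: u was already settled
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))     # lazy decrease-key: push a new entry
    return dist

roads = {"A": [("B", 4), ("C", 1)], "C": [("B", 2), ("D", 7)], "B": [("D", 1)]}
print(dijkstra(roads, "A"))   # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```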

Database query optimization. Relational database query planners use complexity analysis to choose between execution plans. A nested-loop join of two tables with $n$ and $m$ rows takes $O (nm)$ time. A hash join takes $O (n + m)$ expected time. A merge join on sorted inputs takes $O (n + m)$ . For large tables, the difference between $O (nm)$ and $O (n + m)$ can be the difference between hours and seconds. Query planners estimate the cost of each plan and choose the cheapest, applying the principles of algorithmic complexity to real-world data processing.
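The two join strategies can be sketched on in-memory lists of rows (dicts). This only illustrates the complexity difference; it is not how any particular database engine implements joins, and the table contents are hypothetical:

```python
from collections import defaultdict

def nested_loop_join(left, right, key):
    """Compare every pair of rows: O(n * m) time."""
    return [(l, r) for l in left for r in right if l[key] == r[key]]

def hash_join(left, right, key):
    """Build a hash table on one input, then probe it: O(n + m) expected, plus output size."""
    index = defaultdict(list)
    for r in right:                                  # build phase
        index[r[key]].append(r)
    return [(l, r) for l in left for r in index.get(l[key], [])]   # probe phase

customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Alan"}]
orders = [{"id": 1, "item": "book"}, {"id": 1, "item": "pen"}, {"id": 3, "item": "lamp"}]
print(hash_join(customers, orders, "id") == nested_loop_join(customers, orders, "id"))   # True
```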

Cryptographic key sizes. The complexity of factoring algorithms determines the minimum key sizes for RSA security. The general number field sieve factors an integer $N$ in heuristic time $\exp\big(((64/9)^{1/3} + o(1)) (\ln N)^{1/3} (\ln \ln N)^{2/3}\big)$, which is subexponential in the bit length $n \approx \log_{2} N$. This complexity estimate drives NIST's recommendations: RSA keys should be at least 2048 bits (3072 bits recommended), corresponding to the estimated difficulty of factoring with current algorithms and hardware. If a polynomial-time factoring algorithm were discovered, all RSA keys would become insecure overnight.
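To get a feel for the growth, the heuristic GNFS cost can be evaluated for common RSA key sizes. This is a rough sketch: it drops the $o(1)$ term and all constant factors, so the resulting "bits of work" figures are only indicative and do not reproduce official security-strength estimates:

```python
import math

def gnfs_work_bits(bits):
    """log2 of exp((64/9)^(1/3) (ln N)^(1/3) (ln ln N)^(2/3)) for a `bits`-bit N.

    Rough heuristic only: the o(1) term and all constant factors are dropped.
    """
    ln_n = bits * math.log(2)                          # ln N for N close to 2**bits
    nats = (64 / 9) ** (1 / 3) * ln_n ** (1 / 3) * math.log(ln_n) ** (2 / 3)
    return nats / math.log(2)                          # convert natural-log units to bits

for bits in (1024, 2048, 3072):
    print(bits, round(gnfs_work_bits(bits)))
```

Tripling the key from 1024 to 3072 bits adds only about 50 "bits of work" here, which is what subexponential growth means in practice.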

Advanced results Master

The complexity hierarchy theorem

The deterministic time hierarchy theorem (Hartmanis and Stearns, 1965) states that given a time-constructible function $t(n)$, there exist problems solvable in $O(t(n))$ time that are not solvable in $o(t(n)/\log t(n))$ time on a deterministic Turing machine. This means there are problems that require more time than others: more time lets you solve more problems.

The space hierarchy theorem provides an analogous result for space. Together, these theorems establish an infinite hierarchy of complexity classes, each strictly more powerful than the previous one.

P, NP, and NP-completeness

The class P consists of all decision problems solvable in polynomial time by a deterministic Turing machine. The class NP consists of all decision problems whose yes-instances can be verified in polynomial time given a certificate. Equivalently, NP is the class of problems solvable in polynomial time by a nondeterministic Turing machine.
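Verification is what places SAT in NP: finding a satisfying assignment may require searching up to $2^{n}$ candidates, but checking a proposed one is fast. A minimal sketch, assuming clauses are written with DIMACS-style integer literals ($k$ for $x_k$, $-k$ for its negation):

```python
def verify_cnf(clauses, assignment):
    """Polynomial-time verifier for CNF-SAT.

    clauses: list of clauses; each clause is a list of nonzero ints (k = x_k, -k = not x_k).
    assignment: dict mapping variable number -> bool (the certificate).
    Runs in time linear in the total number of literals.
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
formula = [[1, -2], [2, 3], [-1, -3]]
print(verify_cnf(formula, {1: True, 2: False, 3: False}))   # False: (x2 or x3) fails
print(verify_cnf(formula, {1: True, 2: True, 3: False}))    # True
```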

NP-completeness identifies the hardest problems in NP. A problem $L$ is NP-complete if (1) $L \in \mathrm{NP}$, and (2) every problem in NP is polynomial-time reducible to $L$. Cook's 1971 theorem established that SAT (Boolean satisfiability) is NP-complete. Karp's 1972 paper showed that 21 combinatorial problems are NP-complete through chains of reductions starting from SAT.

If any NP-complete problem has a polynomial-time algorithm, then P = NP. Whether P equals NP remains the most important open question in theoretical computer science. Most researchers believe P != NP, but a proof has remained elusive for over fifty years.

Beyond NP: PSPACE and EXPTIME

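PSPACE is the class of problems solvable in polynomial space. PSPACE contains NP (an NP problem can be decided in polynomial space by trying every certificate in turn and reusing the same space) and is believed to strictly contain it. PSPACE-complete problems, like TQBF (deciding the truth of quantified Boolean formulas) and generalized Hex, can be solved in polynomial space but are believed to require exponential time.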

EXPTIME is the class of problems solvable in exponential time. By the time hierarchy theorem, EXPTIME strictly contains P. Problems like deciding the winner in a generalized game of chess on an $n \times n$ board are EXPTIME-complete.

Circuit complexity

Boolean circuits provide an alternative model of computation. A Boolean circuit with $n$ inputs and one output computes a Boolean function $f : \{0, 1\}^{n} \to \{0, 1\}$. The circuit complexity of $f$ is the size (number of gates) of the smallest circuit computing $f$.

A long-standing open problem is whether there exist functions in NP that require super-polynomial circuit size. If such functions exist, then P != NP. Shannon's counting argument shows that most Boolean functions require exponential circuit size (on the order of $2^{n}/n$ gates), but explicit constructions of hard functions remain elusive.

Communication complexity

Communication complexity, introduced by Yao in 1979, studies the minimum number of bits two parties must exchange to compute a function of their combined inputs. Each party holds part of the input, and they communicate to compute the output.

Communication complexity lower bounds have been used to prove lower bounds in many other models, including circuit depth, data structure query time, streaming algorithms, and extended formulations for linear programming. The method of "lifting" converts communication lower bounds into lower bounds for other models, providing a unified framework for proving hardness results.

Fine-grained complexity

Fine-grained complexity studies the exact complexity of problems within P. Rather than asking whether a problem is in P or not, it asks whether a problem can be solved in, say, $O(n^{2-\epsilon})$ time for some $\epsilon > 0$, or whether the known $O(n^{2})$ algorithm is optimal.

The Strong Exponential Time Hypothesis (SETH), formulated by Impagliazzo and Paturi in 2001, conjectures that for every $\epsilon > 0$ there is a $k$ such that $k$-SAT on $n$ variables cannot be solved in $O(2^{(1-\epsilon)n})$ time; informally, SAT cannot be solved substantially faster than brute force. SETH has been used to show tight conditional lower bounds for many problems in P: edit distance cannot be solved in $O(n^{2-\epsilon})$ time, longest common subsequence cannot be solved in $O(n^{2-\epsilon})$ time, and many other problems have similar conditional lower bounds.
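For context, the quadratic algorithm that these conditional lower bounds address for edit distance is the classic dynamic program. A minimal sketch (Levenshtein distance with unit costs, keeping only one previous row):

```python
def edit_distance(s, t):
    """Levenshtein distance: O(len(s) * len(t)) time, O(len(t)) extra space."""
    prev = list(range(len(t) + 1))               # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i] + [0] * len(t)
        for j, ct in enumerate(t, start=1):
            curr[j] = min(prev[j] + 1,               # delete cs
                          curr[j - 1] + 1,           # insert ct
                          prev[j - 1] + (cs != ct))  # substitute (free if equal)
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))   # 3
```

Every cell of the $|s| \times |t|$ table is filled once, which is exactly the $n^{2}$ behavior that SETH suggests cannot be improved to $n^{2-\epsilon}$.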

Parameterized complexity

Parameterized complexity studies the complexity of problems relative to a parameter $k$ in addition to the input size $n$ . A problem is fixed-parameter tractable (FPT) if it can be solved in $O (f (k) \cdot n^{c})$ time for some function $f$ and constant $c$ . The key insight is that for small values of $k$ , FPT algorithms are efficient even for large $n$ .

The W-hierarchy (W[1], W[2], ...) captures problems believed not to be FPT. Showing that a problem is W[1]-hard provides evidence that it is not fixed-parameter tractable, analogous to how NP-hardness provides evidence that a problem is not polynomial-time solvable.

A classic FPT problem is vertex cover: given a graph $G$ and parameter $k$, does $G$ have a vertex cover of size at most $k$? A bounded search tree algorithm solves this in $O(2^{k} \cdot (n + m))$ time on a graph with $n$ vertices and $m$ edges by branching on an uncovered edge: at least one endpoint must be in the cover, so we try both and recurse with budget $k - 1$. For $k = 10$ and a graph with about a million vertices and edges, this is on the order of $10^{3} \cdot 10^{6} = 10^{9}$ operations, which is feasible. The brute-force approach (checking all subsets of size $k$) examines $\binom{n}{k} \approx n^{k}/k!$ subsets, which for $k = 10$ and $n = 10^{6}$ is about $2.8 \times 10^{53}$, completely infeasible. The FPT algorithm's running time depends polynomially on $n$ and exponentially only on $k$, making it practical for small $k$ even when $n$ is very large.
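A minimal sketch of the bounded search tree, assuming the graph is given as a list of edges; it answers yes or no and does not construct the cover:

```python
def has_vertex_cover(edges, k):
    """Is there a vertex cover of size <= k? Bounded search tree, O(2^k * m) edge work.

    edges: list of (u, v) pairs. Each level branches on one uncovered edge (u, v):
    u or v must be in the cover, so the tree has depth <= k and at most 2^k leaves.
    """
    if not edges:
        return True                     # every edge is covered
    if k == 0:
        return False                    # edges remain but the budget is used up
    u, v = edges[0]
    # Branch 1: put u in the cover and drop every edge it touches.
    if has_vertex_cover([e for e in edges if u not in e], k - 1):
        return True
    # Branch 2: put v in the cover instead.
    return has_vertex_cover([e for e in edges if v not in e], k - 1)

triangle = [(1, 2), (2, 3), (1, 3)]
print(has_vertex_cover(triangle, 1), has_vertex_cover(triangle, 2))   # False True
```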

Average-case complexity

Average-case complexity studies the expected running time of algorithms over a distribution of inputs. Leonid Levin defined the notion of average-case completeness: problems that are hard on average (not just in the worst case) under specific distributions. This framework is important for cryptography, where security requires that problems be hard not just for some inputs but for randomly chosen inputs.

The smoothed analysis framework (Spielman and Teng, 2001) bridges worst-case and average-case analysis. Instead of asking about the worst input or the average input, smoothed analysis asks about the worst input after a small random perturbation. Formally, the smoothed complexity of an algorithm is $\max_{x} \mathbb{E}_{g}[T(x + g)]$, where $g$ is Gaussian noise with standard deviation $\sigma$. For the simplex method with the shadow-vertex pivot rule, Spielman and Teng showed that the smoothed complexity is polynomial in the input size and $1/\sigma$, explaining why simplex is fast in practice despite its exponential worst case.

Online algorithms and competitive analysis

Online algorithms must make decisions without knowing future inputs. A caching algorithm must decide which item to evict without knowing which items will be requested next. A load balancing algorithm must assign jobs to servers without knowing future job arrivals.

Competitive analysis measures the performance of an online algorithm against the optimal offline algorithm that knows the entire input in advance. An online algorithm is $c$ -competitive if its cost is at most $c$ times the optimal cost for every input sequence. The LRU (Least Recently Used) caching algorithm, which evicts the least recently accessed item, is $k$ -competitive where $k$ is the cache size. No deterministic online caching algorithm can do better than $k$ -competitive, so LRU is optimal among deterministic strategies.
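A small sketch of LRU using `collections.OrderedDict`, counting misses. A request sequence that cycles through $k + 1$ distinct pages makes LRU miss on every request, a sequence of the kind used in the lower-bound argument:

```python
from collections import OrderedDict

def lru_misses(requests, k):
    """Count cache misses for LRU with capacity k on a sequence of page requests."""
    cache = OrderedDict()                  # keys ordered from least to most recently used
    misses = 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)        # hit: mark as most recently used
        else:
            misses += 1
            if len(cache) == k:
                cache.popitem(last=False)  # evict the least recently used page
            cache[page] = True
    return misses

print(lru_misses([1, 2, 3, 1, 4, 1, 2], k=3))   # 5
print(lru_misses([1, 2, 3, 4] * 5, k=3))        # 20: every request misses
```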

Exercise 5 (hard, short answer).

Explain why the $\Omega(n \log n)$ lower bound for comparison-based sorting does not apply to radix sort. What additional assumption does radix sort make?

Hint

The lower bound applies specifically to algorithms that determine order through pairwise comparisons.

Answer

The $\Omega(n \log n)$ lower bound applies only to comparison-based sorting algorithms, which determine the sorted order by comparing pairs of elements. Radix sort is not comparison-based: it exploits the structure of the keys (they are integers with a fixed number of digits) by sorting digit by digit using a stable counting sort as a subroutine. By avoiding comparisons and using the integer representation directly, radix sort achieves $O(d(n + k))$ time, where $d$ is the number of digits and $k$ the number of possible digit values, which is $O(dn)$ for a fixed base. The additional assumption is that keys are integers (or can be mapped to integers) within a bounded range.

Exercise 6 (hard, short answer).

State the Strong Exponential Time Hypothesis (SETH) and explain one consequence it has for problems in P.

Hint

SETH is a stronger version of the P != NP conjecture, stating that SAT cannot be solved much faster than brute force.

Answer

SETH conjectures that for every $\epsilon > 0$, there exists an integer $k$ such that $k$-SAT on $n$ variables cannot be solved in $O(2^{(1-\epsilon)n})$ time. This is stronger than P != NP because it makes a quantitative claim about the best possible exponent. One consequence is that many problems in P have conditional lower bounds: under SETH, edit distance and longest common subsequence cannot be solved in $O(n^{2-\epsilon})$ time for any $\epsilon > 0$. Related hypotheses, such as the conjecture that all-pairs shortest paths needs essentially cubic time, give analogous $O(n^{3-\epsilon})$ barriers for many graph problems. These results help explain why decades of attempts to improve these algorithms beyond their known $O(n^{2})$ or $O(n^{3})$ bounds have failed.

Connections Master

Connections to mathematics and number theory

Complexity theory has deep connections to number theory. The AKS primality test (2002) placed primality testing in P, resolving a long-standing question. Integer factorization, the basis of RSA encryption, is in NP (in its decision version) but is not known to be NP-complete. Its suspected intermediate status (neither in P nor NP-complete) connects to the study of NP-intermediate problems, which Ladner's theorem shows must exist if P != NP.

Complexity theory also connects to algebraic geometry through the study of arithmetic circuits and the VP versus VNP question, an algebraic analogue of P versus NP. The permanent of a matrix, whose computation is #P-complete (and therefore at least as hard as every problem in NP), connects to combinatorics and representation theory: for a 0-1 matrix, it counts the perfect matchings of the corresponding bipartite graph.
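As a rough illustration of why the permanent is expensive, here is a sketch of Ryser's inclusion-exclusion formula, a classical exact method whose cost is exponential in $n$ (unlike the determinant, which Gaussian elimination computes in polynomial time). The function name is illustrative, and the sketch skips the Gray-code ordering that brings the cost down to $O(2^{n} n)$.

```python
from itertools import combinations


def permanent_ryser(matrix):
    """Permanent of an n x n matrix via Ryser's formula:

    perm(A) = (-1)^n * sum over nonempty column sets S of
              (-1)^|S| * prod_i (sum_{j in S} A[i][j]).
    This plain version does O(2^n * n^2) arithmetic operations.
    """
    n = len(matrix)
    if n == 0:
        return 1  # the empty product convention: perm of a 0 x 0 matrix is 1
    total = 0
    for size in range(1, n + 1):
        for cols in combinations(range(n), size):
            product = 1
            for row in matrix:
                product *= sum(row[j] for j in cols)
            total += (-1) ** size * product
    return (-1) ** n * total
```

For the all-ones $3 \times 3$ matrix, the biadjacency matrix of the complete bipartite graph $K_{3,3}$, the result is $6 = 3!$, the number of its perfect matchings.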

Connections to cryptography

The security of cryptographic systems depends on complexity assumptions. RSA assumes integer factorization is hard. Diffie-Hellman assumes the discrete logarithm problem is hard. Post-quantum cryptography (lattice-based, code-based, hash-based) assumes certain problems remain hard even for quantum computers.

The relationship between complexity and cryptography is fundamental: cryptographic security requires that certain problems be computationally hard, while efficiency requires that related problems be easy. Breaking RSA (for example, by factoring the modulus) should be hard; using RSA (modular exponentiation) should be easy. This asymmetry is the foundation of public-key cryptography.
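The "easy" direction can be sketched with square-and-multiply modular exponentiation, which needs only $O(\log e)$ modular multiplications for exponent $e$. Python's built-in three-argument `pow` already does this; the hand-written `mod_pow` below is an illustrative re-implementation, and the toy RSA numbers in the note that follows are insecure textbook-sized values.

```python
def mod_pow(base, exponent, modulus):
    """Right-to-left square-and-multiply: O(log exponent) modular multiplications."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:  # lowest remaining bit is set: multiply this power in
            result = (result * base) % modulus
        base = (base * base) % modulus  # square to get the power for the next bit
        exponent >>= 1
    return result
```

With $n = 61 \cdot 53 = 3233$, $e = 17$, and $d = 2753$ (so that $ed \equiv 1 \pmod{60 \cdot 52}$), encrypting with `mod_pow(m, 17, 3233)` and decrypting with `mod_pow(c, 2753, 3233)` recovers `m`. Factoring 3233 is trivial, which is why real moduli are thousands of bits long.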

One-way functions, which are easy to compute but hard to invert, are the theoretical foundation of cryptography. A function $f$ is one-way if $f(x)$ can be computed in polynomial time, but when $x$ is chosen uniformly at random, no probabilistic polynomial-time algorithm given $y = f(x)$ can find any $x'$ with $f(x') = y$, except with negligible probability. The existence of one-way functions implies P != NP, but the converse is not known. Candidate one-way functions include integer multiplication (factoring is believed hard), discrete exponentiation (discrete logarithm is believed hard), and cryptographic hash functions.
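The asymmetry of discrete exponentiation can be sketched with a toy prime. Computing $g^{x} \bmod p$ is fast (Python's three-argument `pow`), while the naive inversion below tries every exponent and so takes up to $p - 1$ steps, exponential in the bit length of $p$. The name `discrete_log_brute_force` and the parameters $p = 101$, $g = 2$ are illustrative; better generic algorithms such as baby-step giant-step take about $\sqrt{p}$ steps, which is still exponential in the bit length.

```python
def discrete_log_brute_force(g, y, p):
    """Find x in [0, p - 2] with pow(g, x, p) == y by trying every exponent.

    Takes up to p - 1 multiplications; returns None if y is not a power of g.
    """
    value = 1  # invariant: value == pow(g, x, p)
    for x in range(p - 1):
        if value == y:
            return x
        value = (value * g) % p
    return None


# Forward direction: fast even for huge exponents.
y = pow(2, 57, 101)
# Inverse direction: brute force over all exponents.
x = discrete_log_brute_force(2, y, 101)
```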

Connections to physics

The relationship between computational complexity and physics has deepened considerably. Quantum computing challenges the classical complexity hierarchy: Shor's algorithm factors integers in polynomial time on a quantum computer, and Grover's algorithm searches an unsorted database of $n$ items with $O(\sqrt{n})$ queries, compared with $\Theta(n)$ queries for classical search. The class BQP (bounded-error quantum polynomial time) contains the problems solvable efficiently by quantum computers. P is known to be contained in BQP; whether the containment is strict is an open question.
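Because this is a statement about query counts, it can be illustrated with arithmetic alone. The sketch below compares the standard near-optimal Grover iteration count, $\lfloor \frac{\pi}{4}\sqrt{N} \rfloor$ for a single marked item among $N$, with the expected number of queries for classical search in random order. It does not simulate a quantum computer, and the function names are illustrative.

```python
import math


def grover_iterations(n_items):
    """Near-optimal Grover iteration count for one marked item: floor(pi/4 * sqrt(N))."""
    return math.floor(math.pi / 4 * math.sqrt(n_items))


def expected_classical_queries(n_items):
    """Expected queries for classical search that checks items in random order,
    without repeats, until it finds the single marked item."""
    return (n_items + 1) / 2
```

For $N = 10^{6}$ this gives 785 Grover iterations versus about 500,000 expected classical queries, and the gap widens as $\sqrt{N}$.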

The connections go deeper. Black hole physics (the firewall paradox) has been related to computational complexity, and in holography the complexity=action and complexity=volume conjectures propose links between the complexity of quantum states and geometric properties of spacetime.

The relationship between thermodynamics and computation is also fundamental. Landauer's principle (1961) establishes a physical lower bound on the energy that must be dissipated as heat to erase one bit of information: $E \geq k_B T \ln 2$, where $k_B$ is Boltzmann's constant and $T$ is the absolute temperature. This connects information theory, computation, and physics: there are fundamental physical limits to how efficiently computation can be performed. Reversible computation (which does not erase information) can in principle dissipate arbitrarily little energy, but irreversible operations have a minimum energy cost dictated by thermodynamics.
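Plugging in numbers shows how small this bound is at room temperature. A minimal Python sketch, assuming $T = 300$ K as the example temperature; `landauer_limit_joules` is an illustrative name, and the constant is the exact SI value of $k_B$.

```python
import math

# Boltzmann constant in J/K (exact in the SI since the 2019 redefinition).
BOLTZMANN_J_PER_K = 1.380649e-23


def landauer_limit_joules(temperature_kelvin, bits=1):
    """Minimum heat dissipated when erasing `bits` bits: bits * k_B * T * ln 2."""
    return bits * BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)
```

At 300 K the bound is about $2.87 \times 10^{-21}$ J per erased bit, and it scales linearly with both temperature and the number of bits erased.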

Connections to optimization

Many optimization problems in operations research, logistics, and engineering are NP-hard. Integer programming, the traveling salesman problem, scheduling, and facility location all lack known polynomial-time algorithms. Approximation algorithms and heuristics provide practical solutions with provable or empirical quality guarantees.

Linear programming, by contrast, is solvable in polynomial time (via the ellipsoid method or interior point methods). This dichotomy between linear and integer programming illustrates the sharp boundary between tractable and intractable problems.

The simplex method, developed by George Dantzig in 1947, solves linear programs by walking along vertices of the feasible polyhedron. In the worst case it can visit an exponential number of vertices (the Klee-Minty cube is the classic example for Dantzig's pivot rule), but in practice it is remarkably fast. Interior point methods, developed by Karmarkar in 1984, provably solve linear programs in polynomial time and are competitive with simplex on many practical instances. The contrast between simplex (exponential worst case, fast in practice) and interior point (polynomial worst case) illustrates why worst-case complexity is only part of the story.

Connections to software engineering

Algorithmic complexity directly affects software performance. A developer who uses bubble sort ($O(n^{2})$) instead of a built-in sort ($O(n \log n)$) on a million-element list performs on the order of $10^{12}$ rather than $2 \times 10^{7}$ basic steps, making the code roughly 50,000 times slower if constant factors are ignored. Understanding complexity helps developers choose appropriate data structures (hash tables for expected $O(1)$ lookups vs. sorted arrays for $O(\log n)$ binary search), design efficient algorithms, and predict whether their code will scale.
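The gap can be observed by counting comparisons rather than timing. This sketch uses a plain bubble sort and a top-down merge sort as a stand-in for a built-in $O(n \log n)$ sort (Python's own `sorted` uses a different algorithm, Timsort); both function names are illustrative.

```python
def bubble_sort_count(values):
    """Bubble sort without early exit; returns (sorted list, number of comparisons)."""
    items = list(values)
    comparisons = 0
    n = len(items)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            comparisons += 1
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
    return items, comparisons


def merge_sort_count(values):
    """Top-down merge sort; returns (sorted list, number of comparisons)."""
    if len(values) <= 1:
        return list(values), 0
    mid = len(values) // 2
    left, left_count = merge_sort_count(values[:mid])
    right, right_count = merge_sort_count(values[mid:])
    merged = []
    comparisons = left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two tails is non-empty
    merged.extend(right[j:])
    return merged, comparisons
```

On 1,000 items in reverse order, the bubble sort performs exactly $n(n-1)/2 = 499{,}500$ comparisons, while the merge sort stays below $n \lceil \log_2 n \rceil = 10{,}000$.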

Profiling tools measure where programs spend their time, but complexity analysis predicts whether a program will scale before it is written. A startup building a social network with 1,000 users might not notice that its friend recommendation algorithm is $O(n^{3})$. At 1 million users, the input is 1,000 times larger, so the algorithm becomes $1000^{3} = 10^{9}$ times slower, turning a 1-second computation into a roughly 30-year computation. Understanding complexity prevents these scaling disasters.
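The estimate is a one-line calculation. A sketch, assuming the leading $n^{3}$ term dominates at both sizes and that a single measurement at the small size is representative; the function name is illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def scaled_runtime(base_seconds, base_n, new_n, exponent):
    """Predict the runtime of an O(n**exponent) algorithm at size new_n from one
    measurement at base_n, assuming the leading term dominates at both sizes."""
    return base_seconds * (new_n / base_n) ** exponent


years = scaled_runtime(1.0, 1_000, 1_000_000, 3) / SECONDS_PER_YEAR
```

Here `scaled_runtime(1.0, 1_000, 1_000_000, 3)` is about $10^{9}$ seconds, which is roughly 31.7 years.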

Historical and philosophical context Master

The origins of asymptotic analysis

The formal study of algorithm efficiency began with the analysis of sorting algorithms in the 1950s and 1960s. Before Big-O notation, algorithms were compared empirically: run them and measure the time. This approach is unreliable because results depend on the hardware, the implementation, and the specific inputs chosen.

Paul Bachmann introduced Big-O notation in 1894 in the context of analytic number theory, and Edmund Landau adopted and popularized it. Donald Knuth's 1976 article "Big Omicron and Big Omega and Big Theta" proposed the definitions of $\Omega$ (lower bounds) and $\Theta$ (tight bounds) now used in the analysis of algorithms, standardizing today's terminology. Knuth suggested reading the $O$ as the Greek capital letter Omicron, which matches the traditional mathematical symbol.

The Hartmanis-Stearns hierarchy

Juris Hartmanis and Richard Stearns published their hierarchy theorems in 1965, establishing that giving a Turing machine more time (or space) strictly increases the class of problems it can solve. This was one of the first results in what became the field of computational complexity theory, and it earned them the 1993 Turing Award.

Their work showed that the intuitive notion that "more resources let you solve more problems" could be made mathematically precise. The time hierarchy theorem shows, for example, that there exist problems solvable in $O(n^{3})$ time that are not solvable in $O(n^{2})$ time. More generally, for any time-constructible function $t(n)$, there are problems solvable in $O(t(n))$ time that cannot be solved in $o(t(n)/\log t(n))$ time.

Cook, Karp, and the P versus NP question

Stephen Cook's 1971 paper "The Complexity of Theorem-Proving Procedures" introduced the concept of NP-completeness and showed that SAT is NP-complete. Richard Karp's 1972 paper "Reducibility among Combinatorial Problems" showed that 21 diverse problems are all NP-complete, establishing the breadth and importance of the concept.

Leonid Levin, working independently in the Soviet Union, discovered similar results around the same time. The Cook-Levin theorem is thus named for both researchers.

The P versus NP question was formally posed in these papers, though it was foreshadowed by Kurt Gödel in a 1956 letter to John von Neumann, in which Gödel asked whether deciding if a statement has a proof of length $n$ could be done in time linear or quadratic in $n$. Von Neumann, already ill with cancer, did not reply.

The philosophical significance of complexity

Computational complexity raises philosophical questions about the nature of mathematical knowledge. A proof that P != NP would establish that there are problems where verifying a solution is fundamentally easier than finding one. This would formalize the intuition that creativity (finding solutions) is harder than appreciation (recognizing correct solutions).

The question also touches on the nature of creativity itself. If P = NP, then any problem whose solutions can be efficiently verified can also be efficiently solved. Finding mathematical proofs of reasonable length could then be automated, and arguably so could composing music, writing novels, and designing engineering solutions, to the extent that good results in those domains can be efficiently recognized. The societal implications would be transformative and, for many, deeply unsettling.

Scott Aaronson has argued that P versus NP is not merely a technical question but a question about the nature of mathematical reality. A proof that P != NP would confirm that the universe contains intrinsic computational barriers that no amount of cleverness can overcome.

The practical impact of complexity theory

Complexity theory has had enormous practical impact beyond theoretical computer science. The theory of NP-completeness gives engineers a precise language for explaining why certain problems are hard: when a problem is NP-hard, no efficient algorithm is known, and finding one would resolve the P versus NP question. This shifts the focus from "find a polynomial algorithm" to "find a good approximation" or "find an algorithm that works well in practice."

Approximation algorithms provide provably near-optimal solutions to NP-hard optimization problems. The vertex cover problem, while NP-hard, has a simple 2-approximation: find a maximal matching and take both endpoints of each matched edge. This algorithm runs in polynomial time and produces a solution at most twice the optimal size. The PCP theorem (Arora et al., 1998) showed that many NP-hard problems cannot be approximated beyond certain thresholds, establishing limits on what polynomial-time algorithms can achieve.
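The matching-based 2-approximation is short enough to write out. In this sketch (the function name is illustrative), scanning the edges once and keeping every edge whose endpoints are both still uncovered builds a maximal matching $M$; the cover is the set of matched endpoints.

```python
def vertex_cover_2approx(edges):
    """2-approximate vertex cover: greedily build a maximal matching and
    return both endpoints of every matched edge."""
    cover = set()
    for u, v in edges:
        # An edge with neither endpoint in the cover is not yet matched or covered:
        # add it to the matching by taking both endpoints.
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover
```

The factor of 2 follows because the matched edges share no endpoints, so any vertex cover needs at least $|M|$ vertices, while this one has exactly $2|M|$. On the path 1-2-3-4 it returns all four vertices, while the optimum $\{2, 3\}$ has two.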

Randomized algorithms use randomness to achieve efficiency that deterministic algorithms cannot match (or are not known to match). The Miller-Rabin primality test runs in $O(k \log^{3} n)$ time for $k$ iterations, with error probability at most $4^{-k}$. For practical purposes (say $k = 40$, for an error bound of $4^{-40} = 2^{-80}$), this gives an extremely reliable primality test with a fixed running time that is much faster than the deterministic AKS algorithm. The probabilistic method in combinatorics, pioneered by Paul Erdős, uses randomness to prove the existence of combinatorial objects with desired properties, even when constructing them deterministically is difficult.
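A compact Miller-Rabin sketch in Python, assuming random bases drawn with the standard `random` module, which is adequate for illustration. The small-prime filter and the default $k = 40$ are illustrative choices, not part of the test's definition.

```python
import random

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_probable_prime(n, k=40):
    """Miller-Rabin test. Never rejects a prime; wrongly accepts a composite
    with probability at most 4**-k."""
    if n < 2:
        return False
    for p in SMALL_PRIMES:  # cheap filter that also settles small n
        if n % p == 0:
            return n == p
    # Write n - 1 = 2**s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(k):
        a = random.randrange(2, n - 1)  # n >= 41 here, so the range is non-empty
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is certainly composite
    return True  # probably prime
```

Each round costs a few modular exponentiations, which is where the $O(k \log^{3} n)$ bound comes from when schoolbook multiplication is used.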

Las Vegas algorithms always produce correct results but have random running time. Quicksort with random pivot selection is a Las Vegas algorithm: it always sorts correctly, but its running time varies. Monte Carlo algorithms have bounded running time but may produce incorrect results with bounded probability. The Miller-Rabin primality test is a Monte Carlo algorithm: its running time is fixed, but it has a small probability of incorrectly reporting a composite number as prime. The choice between Las Vegas and Monte Carlo depends on the application: when a wrong answer is unacceptable, Las Vegas is preferred. When speed is critical and approximate answers or a small error probability are acceptable, Monte Carlo is preferred, as in Monte Carlo integration or in Miller-Rabin testing during cryptographic key generation, where repeated rounds make the error probability negligible.
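A short randomized quicksort illustrates the Las Vegas property: the random pivot choice changes how long the recursion takes, but never the output. This sketch builds new lists rather than partitioning in place, trading memory for clarity; the function name is illustrative.

```python
import random


def randomized_quicksort(values):
    """Las Vegas quicksort: always correct, expected O(n log n) time on every input."""
    if len(values) <= 1:
        return list(values)
    pivot = random.choice(values)  # the only source of randomness
    less = [x for x in values if x < pivot]
    equal = [x for x in values if x == pivot]
    greater = [x for x in values if x > pivot]
    return randomized_quicksort(less) + equal + randomized_quicksort(greater)
```

Because the pivot is chosen at random rather than by position, no fixed input (such as an already sorted list) reliably triggers the $O(n^{2})$ worst case.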

Complexity and machine learning

Machine learning raises new questions for complexity theory. Training neural networks is NP-hard in the worst case (Blum and Rivest, 1992), yet gradient descent finds good solutions in practice. This gap between worst-case theory and empirical practice is an active area of research. Smoothed analysis (Spielman and Teng, 2001), which won the 2008 Gödel Prize, provides a partial explanation: it shows that for many problems, worst-case instances are fragile and small random perturbations make them tractable. This framework helps explain why algorithms that are slow in theory (like the simplex method for linear programming) are fast in practice.

Bibliography Master

Primary sources

Cook, S.A. (1971). "The complexity of theorem-proving procedures." Proceedings of the 3rd ACM Symposium on Theory of Computing, 151-158.
Karp, R.M. (1972). "Reducibility among combinatorial problems." In Complexity of Computer Computations, 85-103.
Hartmanis, J. and Stearns, R.E. (1965). "On the computational complexity of algorithms." Transactions of the American Mathematical Society, 117, 285-306.
Knuth, D.E. (1976). "Big omicron and big omega and big theta." ACM SIGACT News, 8(2), 18-24.

Secondary sources

Arora, S. and Barak, B. (2009). Computational Complexity: A Modern Approach. Cambridge University Press.
Papadimitriou, C.H. (1994). Computational Complexity. Addison-Wesley.
Cormen, T.H., Leiserson, C.E., Rivest, R.L., and Stein, C. (2009). Introduction to Algorithms (3rd ed.). MIT Press.
Sipser, M. (2012). Introduction to the Theory of Computation (3rd ed.). Cengage Learning.
Aaronson, S. (2016). "P ?= NP." In Open Problems in Mathematics, Springer.
Impagliazzo, R. and Paturi, R. (2001). "On the complexity of k-SAT." Journal of Computer and System Sciences, 62(2), 367-375.
Downey, R.G. and Fellows, M.R. (2013). Fundamentals of Parameterized Complexity. Springer.
Kushilevitz, E. and Nisan, N. (1997). Communication Complexity. Cambridge University Press.
Aaronson, S. (2013). Quantum Computing since Democritus. Cambridge University Press.

Prerequisites

25.01.01

Tier anchors

beginner: Cormen et al., Introduction to Algorithms (3e), Ch. 2-3; Sedgewick and Wayne, Algorithms (4e), Ch. 1
intermediate: Cormen et al., Introduction to Algorithms (3e), Ch. 3-4; Kleinberg and Tardos, Algorithm Design, Ch. 2
master: Arora and Barak, Computational Complexity: A Modern Approach, Ch. 1; Papadimitriou, Computational Complexity, Ch. 1-2

References

computer-science · Ch. 7-9
Cormen, T.H., Leiserson, C.E., Rivest, R.L., and Stein, C., Introduction to Algorithms (3e, MIT Press, 2009) · Ch. 2-4
Arora, S. and Barak, B., Computational Complexity: A Modern Approach (Cambridge University Press, 2009) · Ch. 1-3
Papadimitriou, C.H., Computational Complexity (Addison-Wesley, 1994) · Ch. 1-2

Estimated time

beginner: 25m
intermediate: 50m
master: 75m