29.05.01 · psychology / cognition-intelligence

Cognition and intelligence: thinking, reasoning, and the measurement of mind

Anchor (Master): primary sources: Binet 1905, Terman 1916, Spearman 1904, Thurstone 1938, Kahneman & Tversky 1974/1979/1982, Gardner 1983, Sternberg 1985, Steele & Aronson 1995, Nisbett et al. 2012

Intuition Beginner

You make thousands of decisions every day. Most take less than a second. Should you bring an umbrella? Is that person friendly or hostile? Is this a good time to cross the street? Your brain handles these questions without breaking stride, and most of the time the answers are good enough.

But sometimes they are not. You underestimate how long a project will take. You stick with a bad restaurant because you already waited thirty minutes for a table. You assume an articulate person must also be knowledgeable. You remember dramatic plane crashes and forget the routine car accidents, so you fear flying more than driving even though driving is statistically far more dangerous.

These errors are not random. They are systematic, predictable, and surprisingly universal. In the 1970s, two Israeli psychologists — Amos Tversky and Daniel Kahneman — began documenting them. They showed that human thinking relies on mental shortcuts, called heuristics, that work well most of the time but produce characteristic mistakes, called biases, when they misfire. This discovery reshaped psychology, economics, medicine, and public policy. Kahneman won the Nobel Prize in Economics in 2002 (Tversky had died in 1996 and could not share it).

This unit covers two large territories. Cognition is the study of how people think: how they reason, decide, solve problems, use language, and create new ideas. Intelligence is the study of how cognitive ability varies between people, how it is measured, and what those measurements mean — and what they do not mean.

Both territories are contested. What counts as "intelligent" depends on who is asking and why. The history of intelligence testing is tangled with eugenics, scientific racism, and the misuse of science to justify inequality. Untangling it requires looking at the evidence honestly — which this unit does.

Start with a basic distinction. Kahneman describes two modes of thinking. System 1 is fast, automatic, and intuitive. It produces answers without conscious effort: you see a face and instantly know it is angry; you hear "two plus two" and "four" pops into your head. System 2 is slow, deliberate, and effortful. It activates when you multiply 27 by 43, compare two mortgage offers, or follow a complex argument.

The interaction between these two systems drives most of the phenomena in this unit. System 1 generates quick answers. System 2 can override them, but often does not — because System 2 is lazy, or because System 1's answer feels right.

Visual Beginner

The landscape of cognition and intelligence can be mapped as a tree with three major branches, each feeding into the central question of what it means to be a thinking being.

The diagram simplifies a complex field. In reality, the branches overlap: language affects reasoning, intelligence affects problem-solving, and the tools used to measure intelligence are themselves products of particular cultural and linguistic traditions.

Worked example Beginner

Consider the following problem, adapted from Tversky and Kahneman's experiments.

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

Which is more probable?

A. Linda is a bank teller. B. Linda is a bank teller and is active in the feminist movement.

Most people choose B. But B cannot be more probable than A, because the set of "bank tellers who are feminists" is a subset of the set of "bank tellers." Adding a detail can only reduce the probability, never increase it.

This is the conjunction fallacy. It occurs because System 1 matches Linda's description to a stereotype — feminist activist — and evaluates how well each option fits that stereotype, rather than computing the actual probabilities. Option B fits the description better, but it is less probable.

Tversky and Kahneman found that even people trained in statistics commit this error. The conjunction fallacy is not a sign of stupidity. It is a signature of how human cognition works: we judge by resemblance, not by calculation.
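The subset argument in the worked example can be written as a one-line inequality. In this sketch, T stands for "Linda is a bank teller" and F for "Linda is active in the feminist movement"; both symbols are labels introduced here for illustration, not notation taken from the original experiments.

```latex
P(T \cap F) \;=\; P(T)\,P(F \mid T) \;\le\; P(T),
\qquad \text{because } 0 \le P(F \mid T) \le 1 .
```

Equality holds only if every bank teller fitting Linda's description is also a feminist activist (or if P(T) is zero). In every other case, option B is strictly less probable than option A.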

Check your understanding Beginner

Exercise (easy, multiple choice).

Which of the following best describes the relationship between System 1 and System 2 thinking?

A. System 1 handles emotions; System 2 handles logic. B. System 1 is fast and automatic; System 2 is slow and deliberate. C. System 1 is used by children; System 2 is used by adults. D. System 1 is accurate; System 2 is error-prone.

Hint

Think about the speed and effort involved, not the content of the thinking.

Answer

Option B. System 1 operates quickly and automatically, generating intuitive responses without conscious effort. System 2 is slow, deliberate, and requires mental effort. Both systems handle a wide range of content — System 1 is not limited to emotions, and System 2 is not limited to logic. Both children and adults use both systems. System 1 is usually accurate enough for everyday life, and System 2 can make errors too (especially when tired or overloaded).

Exercise (medium, short answer).

You read that a city has two hospitals. The large hospital births about 45 babies per day; the small hospital births about 15. On average, each hospital has days where more than 60% of the babies born are boys. Which hospital has more such days, or are they about the same? Explain your reasoning using the concept of sample size.

Hint

Think about coin flips. If you flip a fair coin 10 times, how often do you get 6 or more heads? What if you flip it 100 times?

Answer

The small hospital has more such days. This is an application of the law of large numbers. With smaller samples, outcomes deviate more from the expected 50-50 split. With only 15 births per day, random variation can easily push the proportion above 60%. With 45 births, the larger sample size makes extreme proportions much less likely. Most people answer "about the same" because they judge by the representativeness of the description (both hospitals are in the same city, so they should be similar) rather than by the statistical principle of sample size — a classic demonstration of the representativeness heuristic.
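The sample-size argument can be checked with an exact binomial calculation. The sketch below assumes each birth is an independent event with a 50% chance of being a boy, which is a simplification; the function name is made up for this illustration.

```python
from math import comb


def prob_more_than_60_percent_boys(births: int, p_boy: float = 0.5) -> float:
    """Exact binomial probability that strictly more than 60% of births are boys."""
    # Smallest boy count that is strictly greater than 60% of the births
    # (integer arithmetic avoids floating-point rounding at the boundary).
    threshold = (6 * births) // 10 + 1
    return sum(
        comb(births, k) * p_boy**k * (1 - p_boy) ** (births - k)
        for k in range(threshold, births + 1)
    )


small = prob_more_than_60_percent_boys(15)  # needs 10 or more boys out of 15
large = prob_more_than_60_percent_boys(45)  # needs 28 or more boys out of 45

print(f"small hospital (15 births/day): {small:.3f}")
print(f"large hospital (45 births/day): {large:.3f}")
```

Under these assumptions the small hospital crosses the 60% line on roughly 15% of days, and the large hospital does so noticeably less often. The same function can be called with other daily birth counts to see how quickly extreme days become rare as the sample grows.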

Formal definition Intermediate

Cognition refers to the mental processes involved in acquiring knowledge and understanding, including perception, attention, memory, reasoning, judgment, problem-solving, decision-making, and language production and comprehension. The term comes from the Latin cognoscere, "to know."

Intelligence is a more contested term. A consensus definition endorsed by 52 leading researchers in a 1994 editorial in the Wall Street Journal states:

Intelligence is a very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience. It is not merely book learning, a narrow academic skill, or test-taking smarts. Rather, it reflects a broader and deeper capability for comprehending our surroundings — "catching on," "making sense" of things, or "figuring out" what to do.

This definition is useful but not uncontroversial. Different cultures emphasize different cognitive abilities. The psychologist Robert Sternberg has argued that Western psychology defines intelligence too narrowly, focusing on analytical and academic skills while neglecting practical and creative intelligence. Howard Gardner has argued that intelligence is not one thing but several, each relatively independent.

The key methodological distinction in intelligence research is between psychometric approaches (which measure performance on standardized tests and analyze the statistical structure of test scores) and cognitive approaches (which study the mental processes underlying test performance). Psychometric approaches ask: how much intelligence does a person have, and how is it structured? Cognitive approaches ask: what mental processes produce intelligent behavior?

Heuristics are mental shortcuts — simple rules or strategies that reduce the cognitive load of decision-making. They operate by ignoring some available information, trading optimality for speed. The psychologist Gerd Gigerenzer has argued that heuristics are not irrational shortcuts but adaptive tools — "fast and frugal" strategies that exploit the structure of natural environments. Tversky and Kahneman emphasized the conditions under which heuristics lead to systematic errors. Both perspectives are supported by evidence; they differ in emphasis.

Cognitive biases are systematic deviations from normative standards of reasoning. A cognitive bias is not a random error — it is a predictable tendency to err in a particular direction. The word "bias" does not imply prejudice in the social sense (though cognitive biases can contribute to social prejudices); it refers to a statistical bias in judgment.

Stereotype threat, defined by Claude Steele and Joshua Aronson in 1995, is the risk of confirming a negative stereotype about one's group, which creates anxiety and cognitive load that impair performance on the stereotyped domain.

Key concepts: heuristics and biases Intermediate

Tversky and Kahneman identified three classic heuristics in their 1974 paper "Judgment under Uncertainty: Heuristics and Biases," published in Science. Each heuristic produces a characteristic family of biases.

The availability heuristic

People estimate the frequency or probability of an event by how easily examples come to mind. Events that are vivid, recent, or heavily covered by media are easier to recall, so they seem more common than they actually are.

After the September 11 attacks, many Americans chose to drive rather than fly. Driving is statistically far more dangerous than flying, but terrorist attacks are vivid and memorable, while car accidents are routine and invisible. The result: an estimated 1,595 additional road deaths in the year following the attacks — six times the number of passengers killed on the four hijacked planes, according to an analysis by Gerd Gigerenzer.

The availability heuristic also explains why people overestimate the prevalence of dramatic causes of death (homicide, tornadoes, shark attacks) and underestimate common ones (heart disease, stroke, diabetes). Newspaper coverage is not proportional to actual risk.

The representativeness heuristic

People judge the probability that A belongs to category B by how much A resembles the typical or stereotypical member of B. This shortcut works when representativeness correlates with actual probability, but it leads to systematic errors when it does not.

The base rate fallacy is one consequence. Suppose a disease affects 1 in 1,000 people, and a test for it is 99% accurate (both sensitivity and specificity). You test positive. What is the probability you actually have the disease?

Most people say around 99%. The actual answer is about 9%. Out of 1,000 people tested, roughly 1 has the disease and tests positive. But about 10 healthy people (1% of 999) will also test positive (false positives). So you have 11 positive tests, only 1 of which is a true positive: 1/11 is approximately 9%. The base rate — how rare the disease is — matters enormously, but representativeness (the test is "99% accurate") makes people ignore it.
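The head-count reasoning above is Bayes' rule in disguise. The short sketch below recomputes it from the three numbers in the example (prevalence 1 in 1,000, sensitivity 99%, specificity 99%); the function and parameter names are chosen for this illustration only.

```python
def posterior_given_positive(prevalence: float, sensitivity: float, specificity: float) -> float:
    """P(disease | positive test) via Bayes' rule."""
    true_positive = prevalence * sensitivity               # sick and flagged
    false_positive = (1 - prevalence) * (1 - specificity)  # healthy but flagged
    return true_positive / (true_positive + false_positive)


p = posterior_given_positive(prevalence=0.001, sensitivity=0.99, specificity=0.99)
print(f"P(disease | positive test) = {p:.3f}")  # about 0.090, i.e. roughly 9%
```

Raising the prevalence argument (for example, to 0.1 for a high-risk group) shows how strongly the answer depends on the base rate even though the test itself is unchanged.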

The gambler's fallacy is another consequence. After observing a run of red on a roulette wheel, people expect black to "be due." Each spin is independent, but the representativeness heuristic expects outcomes to look random in the short run, not just in the long run.

Anchoring and adjustment

People make estimates by starting from an initial value (the anchor) and adjusting up or down. The adjustment is typically insufficient — people are "dragged" toward the anchor even when the anchor is plainly arbitrary.

In one famous experiment, Tversky and Kahneman asked participants to estimate the percentage of African countries in the United Nations. Those who first saw an apparently random spin of a wheel land on 10 gave a median estimate of 25%. Those who saw the wheel land on 65 gave a median estimate of 45%. The arbitrary anchor shifted estimates by 20 percentage points.

Anchoring affects real-world negotiations, pricing, sentencing in courts, and medical diagnoses. It works even when people know about it. Knowing about a bias provides some protection, but not immunity.

Prospect theory

Kahneman and Tversky's 1979 paper "Prospect Theory: An Analysis of Decision under Risk" challenged the standard economic assumption that people are rational utility maximizers. Prospect theory describes how people actually evaluate risky choices, using two key ideas.

Reference dependence. People evaluate outcomes as gains or losses relative to a reference point (usually the status quo), not in absolute terms. A salary increase from $50,000 to $55,000 feels like a gain. A salary decrease from $60,000 to $55,000 feels like a loss, even though the final amount is the same.

Loss aversion. Losses loom larger than equivalent gains. Losing $100 feels roughly twice as painful as gaining $100 feels good. This asymmetry explains the endowment effect: people demand more money to give up an object they own than they would pay to acquire the same object. Ownership shifts the reference point, and giving something up feels like a loss.

Prospect theory also captures the certainty effect: people overweight outcomes that are certain relative to outcomes that are merely probable. Given a choice between a certain $500 and a 50% chance of $1,000 (expected value: $500), most people choose the certain gain. But given a choice between a certain loss of $500 and a 50% chance of losing $1,000, most people choose the gamble: they are risk-seeking in the domain of losses.
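A small numerical sketch can make the $500 versus $1,000 choices concrete. It assumes the power-law value function and the parameter estimates (curvature 0.88, loss-aversion coefficient 2.25) from Tversky and Kahneman's later 1992 work on cumulative prospect theory; the text above does not give a formula, so this functional form is an assumption. Probabilities are used unweighted, which is a further simplification.

```python
ALPHA = 0.88   # curvature of the value function (assumed, 1992 estimate)
LAMBDA = 2.25  # loss-aversion coefficient (assumed, 1992 estimate)


def value(x: float) -> float:
    """Subjective value of a gain or loss x, measured from the reference point."""
    if x >= 0:
        return x ** ALPHA
    return -LAMBDA * (-x) ** ALPHA


def prospect_value(outcomes) -> float:
    """Sum of probability * subjective value over (probability, amount) pairs."""
    return sum(p * value(x) for p, x in outcomes)


sure_gain = prospect_value([(1.0, 500)])
risky_gain = prospect_value([(0.5, 1000), (0.5, 0)])
sure_loss = prospect_value([(1.0, -500)])
risky_loss = prospect_value([(0.5, -1000), (0.5, 0)])

print("gains: sure $500 preferred to the gamble?", sure_gain > risky_gain)
print("losses: gamble preferred to the sure -$500?", risky_loss > sure_loss)
print("pain of losing $100 relative to joy of gaining $100:", abs(value(-100)) / value(100))
```

With these assumed parameters both comparisons come out True: the concave curve for gains favors the sure thing, and the convex curve for losses favors the gamble. The last line returns the loss-aversion coefficient itself, matching the "roughly twice as painful" description.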

Key concepts: problem-solving and creativity Intermediate

Problem-solving

Problem-solving involves moving from a current state to a goal state when the path is not immediately obvious. Psychologists distinguish several strategies.

Algorithms are step-by-step procedures that guarantee a solution if one exists. Searching systematically through every possible combination lock code is an algorithm. The weakness: algorithms can be prohibitively slow for large problem spaces.
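The combination-lock example can be sketched as an exhaustive search. The secret code and the `opens` callback below are hypothetical stand-ins for a physical lock; the point is only that trying every code in order is guaranteed to succeed, at a cost that grows tenfold with each added digit.

```python
from itertools import product


def crack_lock(opens, digits: int = 3):
    """Try every code in numerical order; return (code, attempts) or (None, attempts)."""
    attempts = 0
    for combo in product("0123456789", repeat=digits):
        attempts += 1
        code = "".join(combo)
        if opens(code):
            return code, attempts
    return None, attempts


SECRET = "742"  # hypothetical code used only for this demonstration
code, attempts = crack_lock(lambda guess: guess == SECRET)
print(code, attempts)  # 742 743

# Worst-case attempts grow as 10 ** digits.
for n in (3, 4, 6, 8):
    print(n, "digits ->", 10 ** n, "codes")
```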

Heuristics (in the problem-solving sense, distinct from judgment heuristics above) include means-end analysis (identify the difference between the current state and the goal, then find an operation that reduces that difference) and working backward (start from the goal and work back to the current state). Newell and Simon's General Problem Solver (1958) modeled human problem-solving using means-end analysis.

Insight is the sudden, unexpected solution to a problem — the "aha" moment. Classic insight problems include the nine-dot problem (connect a 3x3 grid of dots with four straight lines without lifting the pen) and the candle problem (attach a candle to a wall using only a box of thumbtacks and a book of matches, where the box must be recognized as a platform rather than merely a container). Insight problems often require restructuring — seeing the problem in a fundamentally new way.

Functional fixedness is the tendency to see objects only in terms of their typical function. In the candle problem, the box is seen as a container for thumbtacks, not as a potential shelf for the candle. Overcoming functional fixedness requires suppressing the automatic association between object and function.

Creativity

Creativity is the production of ideas or products that are both novel and appropriate to the goal. It is not limited to the arts. Scientific theories, engineering solutions, and social innovations can all be creative.

J. P. Guilford distinguished divergent thinking (generating many possible solutions) from convergent thinking (narrowing down to the best solution). Standard intelligence tests primarily measure convergent thinking. Divergent thinking is assessed through tasks like "list as many uses for a brick as you can" — where fluency (number of ideas), flexibility (variety of categories), and originality (unusualness of ideas) are all scored.

Graham Wallas's four-stage model of creativity (1926) describes preparation (conscious work on the problem), incubation (setting the problem aside while unconscious processing continues), illumination (the sudden insight), and verification (checking that the insight actually works). This model is descriptive rather than explanatory, but it captures a widely reported phenomenological pattern.

The relationship between creativity and intelligence is debated. Threshold theory holds that above an IQ of about 120, intelligence and creativity become largely independent — suggesting that a minimum level of cognitive ability is necessary for creativity, but additional intelligence does not necessarily make you more creative.

Key concepts: language and thought Intermediate

The Sapir-Whorf hypothesis

The idea that the language you speak shapes the way you think is associated with Edward Sapir and his student Benjamin Lee Whorf, though neither used the term "Sapir-Whorf hypothesis" themselves. The hypothesis exists in two forms.

The strong form (linguistic determinism) holds that language determines thought — that you literally cannot think thoughts that your language does not permit. This version is largely rejected by modern linguists and psychologists. It is contradicted by the existence of bilingual individuals who report thinking differently in each language (implying they can entertain thoughts from both linguistic systems) and by the fact that infants and non-human animals show cognitive abilities before or without language.

The weak form (linguistic relativity) holds that language influences thought — that the habitual use of a particular language makes some ways of thinking easier or more natural, without making others impossible. This version has substantial empirical support.

Evidence for linguistic relativity

Spatial reasoning. Some Australian Aboriginal languages, such as Guugu Yimithirr and Kuuk Thaayorre, use absolute cardinal directions (north, south, east, west) rather than egocentric relative directions (left, right, front, back). Speakers of these languages maintain an extraordinary sense of orientation — they can point to north accurately even in an unfamiliar enclosed room. The language requires constant spatial tracking, and speakers develop the cognitive ability to do it.

Color perception. Languages differ in how they partition the color spectrum. Russian has separate basic terms for light blue (goluboy) and dark blue (siniy), where English uses a single term "blue." Russian speakers are faster at discriminating shades of blue that cross the goluboy/siniy boundary than shades that fall within one category, suggesting that the linguistic distinction sharpens perceptual discrimination at the category boundary.

Time. English speakers tend to gesture leftward when talking about the past and rightward when talking about the future. Aymara speakers (in the Andes) do the opposite: the past is in front (because it has been seen) and the future is behind (because it has not). Mandarin speakers sometimes gesture upward for the past and downward for the future, consistent with Mandarin's vertical time metaphors, in which earlier events are "up" (shàng) and later events are "down" (xià). These patterns suggest that spatial metaphors for time, encoded in language, shape temporal reasoning.

Counterfactual reasoning. Chinese languages mark counterfactuals less explicitly than English (Chinese does not have a grammatical subjunctive mood). In a classic but contested study, Alfred Bloom (1981) found that Chinese speakers had more difficulty reasoning with counterfactual scenarios presented in Chinese than English speakers had with the same scenarios in English. The study sparked a long debate; some replication attempts supported it, others did not. The current consensus is that language may facilitate certain types of reasoning without making them impossible.

Limits of linguistic relativity

Linguistic relativity is real but modest in its effects. Language is one of many factors that shape cognition, alongside culture, experience, biology, and individual variation. The strongest claims of linguistic determinism are not supported, but the weakest claims — that language influences habitual patterns of thought — are well-established.

Key concepts: intelligence testing — history Intermediate

Binet and the origins of intelligence testing

Alfred Binet developed the first practical intelligence test in France in 1905, commissioned by the French Ministry of Education to identify children who needed educational support. Binet's test assessed a range of cognitive skills — memory, attention, reasoning — through age-graded tasks. A child's performance was expressed as a mental age: the age at which the average child would perform comparably.

Binet was explicit about the limitations of his test. He believed intelligence was multifaceted and influenced by environment, not fixed at birth. He warned against using test scores to label children permanently. He intended the test as a practical tool for identifying children who needed help, not as a measure of innate worth.

Terman and the Stanford-Binet

Lewis Terman at Stanford adapted Binet's test for American use, publishing the Stanford-Binet Intelligence Scales in 1916. Terman popularized the Intelligence Quotient (IQ), building on a ratio proposed by the German psychologist William Stern in 1912: mental age divided by chronological age, multiplied by 100. A child with a mental age of 10 and a chronological age of 8 would have an IQ of 125. Modern IQ tests no longer use this ratio; instead, they set the mean at 100 and the standard deviation at 15, so IQ is a relative ranking within an age group.
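The two scoring schemes can be compared side by side. In the sketch below, `ratio_iq` follows the historical formula stated above, while `deviation_iq` rescales a raw score against the mean and standard deviation of a norm group; the raw-score numbers are invented, and the percentile helper assumes scores are normally distributed, which real tests only approximate.

```python
from statistics import NormalDist


def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Historical ratio IQ: mental age / chronological age * 100."""
    return mental_age / chronological_age * 100


def deviation_iq(raw_score: float, norm_mean: float, norm_sd: float) -> float:
    """Modern deviation IQ: position within the age group, rescaled to mean 100, SD 15."""
    z = (raw_score - norm_mean) / norm_sd
    return 100 + 15 * z


def percentile(iq: float) -> float:
    """Share of the norm group scoring below this IQ, assuming a normal distribution."""
    return NormalDist(mu=100, sigma=15).cdf(iq) * 100


print(ratio_iq(mental_age=10, chronological_age=8))  # 125.0, the example in the text

# Hypothetical norm group: raw scores with mean 50 and standard deviation 10.
iq = deviation_iq(raw_score=60, norm_mean=50, norm_sd=10)
print(iq, f"{percentile(iq):.1f}th percentile")
```

A deviation score of 115 sits one standard deviation above the mean, which under the normality assumption is about the 84th percentile. The ratio formula has no such interpretation, and it breaks down for adults because mental age stops rising while chronological age does not.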

Terman's views differed sharply from Binet's. Terman believed intelligence was largely innate and fixed. He launched a massive longitudinal study of gifted children (the "Termites"), tracking over 1,500 California children with IQs above 135 throughout their lives. The study produced valuable data but also reflected Terman's hereditarian assumptions.

Army testing and the misuse of IQ

During World War I, the U.S. Army administered intelligence tests to nearly two million draftees. The results appeared to show that recent immigrants from southern and eastern Europe scored lower than older immigrant groups from northern and western Europe, and that Black Americans scored lower than white Americans.

These results were used to argue for immigration restriction (supporters of the Immigration Act of 1924 invoked them) and to support eugenic policies. The tests, however, were deeply flawed. They were written in English, penalizing non-native speakers. They relied on cultural knowledge specific to white, native-born, middle-class Americans, asking about tennis, Wild West shows, and products available in particular stores. People who had grown up in rural Poland or Sicily or the Jim Crow South were being tested on their familiarity with suburban American life, and the results were interpreted as measures of innate intelligence.

The paleontologist and historian of science Stephen Jay Gould documented this history in detail in The Mismeasure of Man (1981), showing how the testing reflected and reinforced social prejudices rather than measuring any biological reality.

Spearman and the g-factor

Charles Spearman observed in 1904 that scores on different mental ability tests tend to be positively correlated: people who do well on one test tend to do well on others. He called this general correlation g (for "general intelligence") and argued that it reflected a single underlying mental capacity.

Spearman used an early form of factor analysis, a statistical technique that identifies latent variables accounting for patterns of correlation among observed variables. A single factor (g) explained much of the shared variance across different tests. But the American psychologist L. L. Thurstone later argued that intelligence was better described by multiple primary mental abilities (verbal comprehension, numerical ability, spatial reasoning, word fluency, memory, perceptual speed, and reasoning) that were partially independent.

The g-vs-multiple-abilities debate continues. Most modern psychometricians accept that g is a real statistical phenomenon — test scores do intercorrelate — while disagreeing about what g means. Some interpret g as a unitary biological property of the brain (general processing speed or neural efficiency). Others see it as a statistical artifact that emerges whenever you test diverse cognitive abilities and look for common variance: people who are good at many things tend to be good at many things, but this does not prove there is a single underlying "intelligence organ."
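The statistical side of this debate can be illustrated with a toy correlation matrix. The sketch below is not Spearman's own procedure: it extracts the first principal component by power iteration, a simpler stand-in for factor analysis, and the four-test correlation matrix is entirely hypothetical.

```python
# Hypothetical correlations among four ability tests. Every entry is positive,
# the pattern sometimes called the "positive manifold".
R = [
    [1.0, 0.6, 0.5, 0.4],
    [0.6, 1.0, 0.5, 0.4],
    [0.5, 0.5, 1.0, 0.3],
    [0.4, 0.4, 0.3, 1.0],
]


def mat_vec(matrix, vector):
    """Multiply a square matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def first_component(matrix, iterations: int = 200):
    """Power iteration: returns (largest eigenvalue, unit-length eigenvector)."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v


eigenvalue, weights = first_component(R)
# The total variance of standardized tests equals the number of tests.
share = eigenvalue / len(R)
print(f"first component accounts for {share:.0%} of total variance")
print("weights on each test:", [round(x, 2) for x in weights])
```

With this made-up matrix a single component captures well over half of the variance and every test has a positive weight on it. That is the statistical fact both camps accept. The calculation says nothing about why the correlations are positive, which is the point at which the biological and the artifact interpretations part ways.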

Key concepts: theories of intelligence Intermediate

Gardner's multiple intelligences

Howard Gardner proposed in Frames of Mind (1983) that intelligence is not one thing but several relatively independent faculties. He originally named seven and later added an eighth:

Linguistic intelligence — sensitivity to spoken and written language, the ability to learn languages. Writers, poets, lawyers.
Logical-mathematical intelligence — capacity for deductive reasoning, detecting patterns, scientific thinking. Mathematicians, scientists.
Spatial intelligence — ability to visualize and manipulate objects in space. Architects, surgeons, pilots.
Musical intelligence — skill in performing, composing, and appreciating musical patterns. Musicians, composers.
Bodily-kinesthetic intelligence — capacity to use the body to solve problems or create products. Athletes, dancers, surgeons.
Interpersonal intelligence — capacity to understand the intentions, motivations, and desires of others. Teachers, therapists, politicians.
Intrapersonal intelligence — capacity to understand oneself, one's feelings, fears, and motivations. Philosophers, psychologists.
Naturalistic intelligence — ability to recognize and classify plants, animals, and other natural objects. Biologists, farmers. (Added later.)

Gardner's theory is influential in education. It supports the idea that different students have different cognitive strengths and that curricula should address multiple modalities. It is also more inclusive than g-based theories: it recognizes cognitive abilities that are valued across cultures, not just the academic abilities measured by Western intelligence tests.

Critics note that Gardner's intelligences are difficult to test empirically. Gardner set criteria for what counts as an intelligence (evidence from brain damage, evolutionary history, developmental trajectories, psychometric evidence, and more), but these criteria are somewhat flexible. Some psychometricians argue that Gardner's intelligences are better understood as talents or cognitive styles than as separate intelligences, and that they actually correlate with each other — consistent with g. Gardner counters that the correlation is an artifact of the narrow range of abilities tested by standard IQ tests.

Sternberg's triarchic theory

Robert Sternberg proposed in Beyond IQ (1985) that intelligence has three components:

Analytical intelligence — the ability to analyze, compare, and evaluate. This is what standard IQ tests measure: solving well-defined problems with known correct answers.

Creative intelligence — the ability to generate novel ideas, deal with novel situations, and think outside established frameworks. This involves coping with new problems that do not resemble ones you have seen before.

Practical intelligence — the ability to adapt to, shape, and select environments. "Street smarts." This includes tacit knowledge — the unspoken rules of how things work that are not taught in school but are essential for success in real-world contexts.

Sternberg found that practical intelligence predicts job performance at least as well as IQ, and that the two are relatively independent. A person can be analytically brilliant but practically inept, or practically shrewd but academically mediocre.

Sternberg's theory, like Gardner's, broadens the concept of intelligence beyond what IQ tests capture. Both theories are compatible with the statistical reality of g while arguing that g is only part of the story.

The g-factor debate: what does IQ actually measure?

IQ tests reliably measure something. Test-retest reliability is high. Scores predict academic performance, job performance (in complex jobs), income, health outcomes, and longevity. The question is not whether IQ tests measure anything real — they do — but what that something is, and what it is not.

The psychometrician Arthur Jensen argued in a controversial 1969 article that IQ differences between Black and white Americans were substantially genetic in origin. This claim is addressed in detail in the next section. For now, note that the g-factor debate is not purely empirical — it involves questions about what intelligence is, how it should be defined, and whether a single number can capture cognitive ability.

The psychologist James Flynn documented a striking phenomenon: raw IQ scores increased steadily throughout the 20th century in every country where they were measured — about 3 points per decade. This Flynn effect is too rapid to be explained by genetic changes. It demonstrates that IQ scores are sensitive to environmental factors: nutrition, education, exposure to abstract reasoning tasks, and the cognitive demands of modern life all push scores upward. If IQ were a pure measure of innate ability, the Flynn effect would not exist.

Key experiment: race, IQ, and the evidence Intermediate

This section addresses the most contested topic in intelligence research directly and honestly. The goal is to present the evidence, distinguish what is known from what is not, and explain the consensus among researchers.

The observed gap

In the United States, average IQ scores differ between racial groups. The most commonly cited figure is that Black Americans score, on average, about 15 points lower than white Americans (approximately one standard deviation). This is a statistical fact about test scores. It is not in dispute.

What is in dispute is what this gap means — specifically, whether it reflects genetic differences between racial groups.

The hereditarian argument

A small number of researchers — most prominently Arthur Jensen (1969), Hans Eysenck, and later Philippe Rushton and Charles Murray (co-author of The Bell Curve, 1994) — have argued that the Black-white IQ gap is substantially genetic in origin. They cite several lines of evidence: the heritability of IQ within groups (twin studies show that IQ is highly heritable within white populations), the persistence of the gap across decades, and the fact that the gap is not fully explained by socioeconomic status.

Murray and Richard Herrnstein's The Bell Curve (1994) argued that IQ is an important predictor of social outcomes and that the Black-white IQ gap partly reflects genetic differences. The book was widely criticized for its methodology, its interpretation of the evidence, and its social policy implications.

The environmental argument and the scientific consensus

The overwhelming consensus among psychologists and geneticists is that there is no evidence for genetic racial differences in intelligence. The American Psychological Association's official task force on intelligence, convened in response to The Bell Curve, concluded in its 1996 report (Intelligence: Knowns and Unknowns):

"The differential between the mean intelligence test scores of Blacks and Whites [...] does not result from any obvious biases in test construction and administration, nor does it simply reflect differences in socio-economic status. Explanations based on factors of caste and culture may be appropriate, but so far have little direct empirical support. There is certainly no such support for a genetic interpretation. At present, no one knows what causes this differential."

The 2012 review by Nisbett, Aronson, Blair, Dickens, Flynn, Halpern, and Turkheimer — representing a broad range of leading researchers — stated the consensus more directly: "There is no evidence that the IQ difference between Blacks and Whites is genetic in origin." This review was published in American Psychologist, the flagship journal of the APA.

Several lines of evidence support the environmental explanation.

The Flynn effect. IQ scores have risen by roughly 3 points per decade for as long as they have been measured. If IQ scores reflected only fixed genetic capacity, a rise this fast could not occur, because the gene pool of a population does not change appreciably in a few decades. The Flynn effect demonstrates that IQ scores are highly sensitive to environmental changes.
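The arithmetic behind this rate can be sketched in a few lines. The function and constant names are hypothetical, and the constant linear rate is a simplifying assumption built on the rough 3-points-per-decade figure quoted above, not a precise estimate.

```python
# Illustrative arithmetic only: assumes a constant linear gain, which is a
# simplification of the roughly 3-points-per-decade figure quoted above.
POINTS_PER_DECADE = 3.0    # approximate Flynn-effect rate from the text
STANDARD_DEVIATION = 15.0  # conventional IQ standard deviation


def cumulative_gain(years: float, rate_per_decade: float = POINTS_PER_DECADE) -> float:
    """IQ points gained over `years` at a constant rate per decade."""
    return rate_per_decade * years / 10.0


def years_to_gain(points: float, rate_per_decade: float = POINTS_PER_DECADE) -> float:
    """Years needed to accumulate `points` at a constant rate per decade."""
    return points * 10.0 / rate_per_decade


gain_30_years = cumulative_gain(30)                   # one 30-year generation
years_for_one_sd = years_to_gain(STANDARD_DEVIATION)  # time to gain 15 points

print(f"Gain over 30 years: {gain_30_years:.0f} points")
print(f"Years to gain one standard deviation: {years_for_one_sd:.0f}")
```

Under this assumption a full standard deviation accumulates in about fifty years, which is far shorter than the timescale on which a population's genetic makeup changes.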

The narrowing of the gap. The Black-white IQ gap has narrowed substantially since the 1970s, from about 15 points to roughly 9-10 points, consistent with improvements in educational access, socioeconomic conditions, and reduced discrimination. A genetic explanation cannot account for a closing gap over a few decades.

Adoption studies. Children adopted from disadvantaged backgrounds into more affluent families show IQ gains of 12-18 points — roughly the size of the Black-white gap. A well-known study by Moore (1986) found that Black children adopted by white middle-class families had IQ scores comparable to the white average, not the Black average.

Stereotype threat. Claude Steele and Joshua Aronson demonstrated in 1995 that reminding Black students of their race before taking a test (by asking them to indicate their race on the test form) significantly reduced their performance. When the same test was presented without racial cues, the performance gap narrowed. Stereotype threat works by creating anxiety about confirming a negative stereotype, which consumes cognitive resources that would otherwise be available for the test. This finding has been replicated hundreds of times across many stereotyped groups: women performing worse on math tests when reminded of gender stereotypes, older adults performing worse on memory tests when reminded of age stereotypes, and white athletes performing worse on sports tasks when told the task measures "natural athletic ability" (activating the stereotype that Black athletes are naturally superior).

Stereotype threat is direct evidence that IQ test scores reflect social context, not just cognitive ability. It demonstrates that a significant portion of the observed gap can be produced purely by the testing situation itself.

Test bias. IQ tests were developed and normed on white, middle-class, English-speaking populations. Many test items draw on culturally specific knowledge and reasoning styles. The psychologist Jelte Wicherts and colleagues have shown that test items often function differently across cultural groups: an item may measure something different in one cultural context than in another.

Heritability within groups does not imply heritability between groups

This is the most common logical error in the race-and-IQ debate, and it requires careful explanation.

Heritability is a population statistic. It measures the proportion of variance in a trait within a defined population that is attributable to genetic variation within that population. If heritability of IQ within white Americans is 0.6, that means 60% of the variation in IQ scores among white Americans is associated with genetic differences among those individuals.

Heritability within a group says nothing about the cause of differences between groups. The classic example: within a population, height is highly heritable — tall parents tend to have tall children. But the height difference between Americans today and Americans in 1800 is almost entirely environmental (better nutrition). High within-group heritability does not prevent between-group differences from being environmental.

This logical point was made forcefully by the geneticist Richard Lewontin in 1970 and remains a cornerstone of the scientific argument: even if IQ is highly heritable within racial groups, the IQ difference between racial groups could be entirely environmental.
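The within-group versus between-group distinction can be made concrete with a toy calculation. The sketch below is a constructed illustration, not a model of real data: the effect sizes, the baseline of 100, and the function names are all hypothetical, and genes and environment are assumed to be additive and uncorrelated. Both groups draw on the same genetic values, and the second group differs only by a uniform environmental shift.

```python
from itertools import product
from statistics import mean, pvariance

# Hypothetical toy values, chosen only so the variances come out round.
GENETIC_EFFECTS = [-3, 3]         # variance 9
ENVIRONMENT_EFFECTS = [-3, 0, 3]  # variance 6


def make_group(env_shift):
    """Build (genetic_effect, score) pairs for every gene/environment pairing.

    Crossing every genetic value with every environmental value makes the two
    exactly uncorrelated, so their variances simply add.
    """
    return [
        (g, 100 + g + e + env_shift)
        for g, e in product(GENETIC_EFFECTS, ENVIRONMENT_EFFECTS)
    ]


def heritability(group):
    """Share of score variance within one group that is due to genetic effects."""
    genetic = [g for g, _ in group]
    scores = [s for _, s in group]
    return pvariance(genetic) / pvariance(scores)


def mean_score(group):
    """Average score in one group."""
    return mean([s for _, s in group])


group_a = make_group(env_shift=0)
group_b = make_group(env_shift=-15)  # same genes, uniformly worse environment

h2_a = heritability(group_a)
h2_b = heritability(group_b)
gap = mean_score(group_a) - mean_score(group_b)

print(f"Heritability within A: {h2_a:.2f}")
print(f"Heritability within B: {h2_b:.2f}")
print(f"Mean gap between groups: {gap:.0f} points")
```

In this construction heritability is 0.6 inside each group, yet the 15-point gap between the groups is environmental by design, because the genetic values are identical in both.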

Intelligence as a cultural construct

What counts as "intelligent" varies across cultures. Western psychology emphasizes analytical reasoning, abstract problem-solving, and speed of cognitive processing — the skills measured by IQ tests. Many cultures value other cognitive abilities more highly: social intelligence, practical problem-solving, oral tradition and memorization, ecological knowledge, or artistic and musical skill.

The anthropologist Adrian C. "Enrique" Garcia has documented that many Indigenous American communities understand intelligence as a social and practical capacity — the ability to contribute to the community, solve practical problems collaboratively, and maintain harmonious relationships — rather than as an individual, abstract, analytical capacity. Similar observations have been made across African, Asian, and Pacific Islander cultures.

IQ tests measure a particular kind of cognitive ability — the kind most valued in Western, industrialized, literate societies. This is not a criticism of IQ tests per se; the abilities they measure are genuinely useful in those contexts. But it is a reason to be cautious about generalizing from IQ scores to "intelligence" in any universal sense.

Why this matters

The misuse of intelligence testing has caused real harm. Eugenics programs in the United States forcibly sterilized tens of thousands of people classified as "feeble-minded" on the basis of IQ tests, disproportionately targeting poor people, people of color, and immigrants. The Immigration Act of 1924 was justified in part with intelligence testing data, and its quotas later kept out many Jewish refugees fleeing Nazi Germany. In education, IQ testing has been used to track children into dead-end programs, limiting their opportunities based on a single score.

Honest coverage of this topic requires acknowledging both what IQ tests do well (they predict academic and occupational outcomes within populations, they identify cognitive disabilities, they provide useful data for research) and what they do not do (they do not measure innate biological intelligence, they do not compare the genetic potential of racial groups, they do not capture the full range of human cognitive abilities).

Exercises Intermediate

Exercise (medium, short answer).

Explain the difference between the strong form and the weak form of the Sapir-Whorf hypothesis. Give one piece of empirical evidence that supports the weak form.

Hint

The strong form makes a claim about what is possible to think. The weak form makes a claim about what is easy or natural to think.

Answer

The strong form (linguistic determinism) claims that language determines thought — that people cannot think thoughts their language does not permit. The weak form (linguistic relativity) claims that language influences thought — that habitual linguistic patterns make some ways of thinking easier or more natural without making others impossible.

Evidence for the weak form includes: Russian speakers' faster discrimination of blue shades across the goluboy/siniy boundary (Winawer et al., 2007); Kuuk Thaayorre speakers' superior spatial orientation using cardinal directions (Boroditsky, 2011); and cross-linguistic differences in time metaphors shaping temporal reasoning (Boroditsky, 2001). The strong form is contradicted by bilingual thinking, pre-linguistic cognition in infants, and non-human animal cognition.

Exercise (medium, short answer).

A researcher finds that IQ is 0.6 heritable within Population A and 0.6 heritable within Population B. The mean IQ of Population A is 105 and the mean IQ of Population B is 90. Explain why this finding does not establish that the difference between the two populations is genetic.

Hint

Think about the definition of heritability. What population does the 0.6 refer to? Does it say anything about between-group differences?

Answer

Heritability is a within-group statistic. A heritability of 0.6 within Population A means that 60% of the variation among individuals within Population A is associated with genetic differences among those individuals. It says nothing about the cause of the between-group difference.

The classic counterexample: height is highly heritable within both 1800s Americans and modern Americans, yet the height difference between these groups is almost entirely environmental (better nutrition). High within-group heritability is compatible with entirely environmental between-group differences. To establish a genetic cause for the between-group difference, you would need direct genetic evidence — which does not exist for the IQ gap between any racial groups.

Exercise (hard, essay).

Describe the phenomenon of stereotype threat (Steele & Aronson, 1995). Explain how it provides evidence that the observed IQ gap between racial groups in the United States is partly an artifact of the testing situation rather than a pure measure of cognitive ability. What are the implications for how intelligence tests should be administered and interpreted?

Hint

Consider what stereotype threat reveals about the relationship between social context and cognitive performance. If simply reminding someone of their group identity changes their test score, what does that tell us about what the test is measuring?

Answer

Stereotype threat occurs when individuals at risk of confirming a negative stereotype about their group experience anxiety and cognitive load that impair performance in the stereotyped domain. Steele and Aronson (1995) showed that Black students performed worse on a verbal test when it was presented as a measure of intellectual ability (activating the stereotype) compared to when the same test was presented as a problem-solving task unrelated to ability. The performance gap narrowed or disappeared when stereotype threat was removed.

This demonstrates that a significant portion of the observed Black-white IQ gap may be an artifact of the testing situation itself. When Black test-takers sit for an intelligence test, the social context — a history of stereotypes about Black intelligence, awareness of the racial IQ gap, fear of confirming the stereotype — creates a cognitive burden that reduces performance. This is not a measure of actual cognitive ability but of cognitive ability under social pressure.

Implications: (1) IQ test scores should not be interpreted as pure measures of innate intelligence, especially across racial groups. (2) Test administration can be modified to reduce stereotype threat — for example, by presenting tests as diagnostic of current skill rather than innate ability, or by having test-takers write about personal values before the test (a "self-affirmation" intervention that has been shown to reduce the gap). (3) The existence of stereotype threat strengthens the environmental explanation for racial IQ gaps and undermines claims that the gap is primarily genetic.

Emotional intelligence Master

Peter Salovey and John Mayer introduced the concept of emotional intelligence (EI) in 1990, defining it as the ability to perceive, use, understand, and manage emotions. Daniel Goleman's 1995 bestselling book popularized the concept and made broad claims about its importance for life success.

The Mayer-Salovey-Caruso model identifies four branches, arranged from basic to advanced:

Perceiving emotions — detecting emotions in oneself and others through facial expressions, voice tone, and body language.
Using emotions to facilitate thought — harnessing emotions to support reasoning. A positive mood can enhance creative thinking; a neutral or slightly negative mood can sharpen analytical thinking.
Understanding emotions — recognizing how emotions combine, change over time, and transition from one to another. Understanding that irritation can escalate to anger, that relief often follows anxiety.
Managing emotions — regulating emotions in oneself and others. The ability to calm down when angry, cheer up when discouraged, or help another person manage their emotional state.

Emotional intelligence is measured through ability-based tests (such as the MSCEIT, which presents emotion-related problems with objectively scored answers) and self-report questionnaires (such as the EQ-i). The two approaches measure somewhat different things — ability-based EI shows modest correlations with cognitive intelligence, while self-report EI correlates more strongly with personality traits like agreeableness and emotional stability.

The predictive validity of EI is debated. A meta-analysis by O'Boyle, Humphrey, Pollack, Hawver, and Story (2011) found that EI predicted job performance modestly, above and beyond cognitive ability and personality, but the effect sizes were smaller than claimed in popular accounts. EI appears to be most important in jobs that require substantial interpersonal interaction — management, therapy, teaching, sales.

The broader significance of emotional intelligence is conceptual: it extends the domain of "intelligence" beyond the cognitive abilities measured by IQ tests, consistent with Gardner's interpersonal and intrapersonal intelligences and Sternberg's practical intelligence. Whether EI is truly a form of intelligence or a set of personality traits and social skills remains debated.

The heritability of intelligence: quantitative details Master

The heritability of IQ is not a fixed number. It increases across the lifespan, from roughly 0.20 in early childhood to roughly 0.60 in adulthood and as high as 0.80 in late adulthood. This counterintuitive finding — that genes matter more as people age — is explained by the fact that people increasingly select environments that match their genetic propensities. A child with a genetic predisposition toward intellectual curiosity may be exposed to books by parents, but as an adult, that same person actively seeks out intellectual environments. The genetic influence is amplified by self-selected environments, a phenomenon called gene-environment correlation.

Twin studies compare the IQ correlation between monozygotic (identical) twins, who share nearly 100% of their genes, and dizygotic (fraternal) twins, who share roughly 50%. If IQ were entirely genetic, monozygotic twins would correlate at 1.0 and dizygotic twins at 0.5. The actual correlations are approximately 0.85 for monozygotic twins raised together, 0.75 for monozygotic twins raised apart, and 0.60 for dizygotic twins raised together. These figures support substantial heritability but also substantial environmental influence.
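One common way to turn such correlations into rough variance shares is Falconer's formula, a standard textbook estimator that is brought in here for illustration and is not named in the text above. The sketch assumes purely additive genetic effects and equal environmental similarity for both twin types, and the function name is hypothetical.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict:
    """Rough variance shares from twin correlations (Falconer's formula).

    Assumes additive genetic effects and that identical and fraternal twins
    raised together share their environments to the same degree.
    """
    heritability = 2 * (r_mz - r_dz)   # additive genetic share
    shared_env = 2 * r_dz - r_mz       # shared (family) environment share
    nonshared_env = 1 - r_mz           # unique environment plus measurement error
    return {
        "heritability": heritability,
        "shared_environment": shared_env,
        "nonshared_environment": nonshared_env,
    }


# Correlations for twins raised together, as quoted in the paragraph above.
estimates = falconer_estimates(r_mz=0.85, r_dz=0.60)
for name, value in estimates.items():
    print(f"{name}: {value:.2f}")
```

With the figures above this gives roughly 0.50 for heritability, 0.35 for shared environment, and 0.15 for nonshared environment. The correlation of 0.75 for monozygotic twins raised apart is sometimes read as a more direct and higher estimate of heritability, which shows how much the number depends on the method and its assumptions.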

Adoption studies complicate the picture further. The Texas Adoption Study found that adopted children's IQs correlated more strongly with their biological mothers (whom they had never met) than with their adoptive mothers. The Minnesota Study of Twins Reared Apart likewise found that monozygotic twins raised in different families still had highly correlated IQs, while also showing the influence of the adoptive family environment on specific cognitive abilities.

The key insight from behavioral genetics is that heritability is not destiny. Even for a trait with heritability of 0.80, 20% of the variance is environmental, and environmental interventions can produce large changes. The Flynn effect, with gains of roughly 3 points per decade and therefore about 15 points over fifty years, demonstrates that environmental changes can produce shifts comparable in size to the Black-white gap within a few decades, far faster than any genetic change could occur.

Eric Turkheimer and colleagues (2003) demonstrated that heritability of IQ depends on socioeconomic status. In impoverished families, heritability of IQ was near zero and most of the variance (about 60%) was attributable to shared environment. In affluent families, the pattern was almost exactly reversed and heritability was substantial. This means that the "nature vs. nurture" question has no single answer: the relative contribution of genes and environment depends on the range of environments in the population being studied. In environments where basic needs are unmet, environment dominates. In environments where basic needs are met, genetic differences have more room to express themselves.
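The dependence of heritability on the range of environments follows from its definition as a ratio of variances. The sketch below is a toy illustration with hypothetical numbers, not the moderated twin model that Turkheimer and colleagues fitted; it assumes additive, uncorrelated genetic and environmental variance.

```python
def heritability(genetic_variance: float, environmental_variance: float) -> float:
    """Genetic share of total variance, assuming the two sources simply add."""
    return genetic_variance / (genetic_variance + environmental_variance)


GENETIC_VARIANCE = 100.0  # hypothetical, identical in both populations

# Population where environments are similar for everyone (narrow range).
narrow_env = heritability(GENETIC_VARIANCE, environmental_variance=25.0)

# Population where environments differ widely, including severe deprivation.
wide_env = heritability(GENETIC_VARIANCE, environmental_variance=400.0)

print(f"Heritability with a narrow range of environments: {narrow_env:.2f}")
print(f"Heritability with a wide range of environments: {wide_env:.2f}")
```

The genetic variance is the same in both cases. Only the spread of environments changes, and that alone moves the heritability estimate from 0.80 to 0.20.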

Advanced topics: AI and intelligence Master

The relationship between artificial intelligence and human intelligence raises fundamental questions about what intelligence is, whether it is uniquely biological, and what it means to "measure" intelligence in a machine.

The Turing Test and its limitations

Alan Turing proposed in 1950 that the question "can machines think?" should be replaced by a practical test: if a human judge, conversing by text with a machine and a human, cannot reliably tell which is which, the machine should be considered intelligent. The Turing Test has been influential but heavily criticized.

The philosopher John Searle argued with his Chinese Room thought experiment (1980) that passing the Turing Test does not demonstrate understanding. A person in a room who follows instructions to manipulate Chinese symbols can produce convincing Chinese responses without understanding a word of Chinese. The room passes the Turing Test for Chinese, but there is no understanding — only symbol manipulation. Searle argued that computers, which manipulate symbols syntactically without semantics, are in the same position.

The Turing Test also reflects a culturally specific, language-centric view of intelligence. It tests the ability to produce linguistically appropriate responses — a subset of what humans consider intelligent behavior.

What AI systems actually do

Modern AI systems (large language models, image recognition systems, game-playing programs) demonstrate impressive performance on specific tasks. OpenAI reported that GPT-4 scored around the 90th percentile on a simulated bar exam and around the 99th percentile on the verbal GRE, and it performs well on many standardized intelligence tests. AlphaGo and its successors defeated the world's best Go players. DeepMind's AlphaFold predicts many protein structures with near-experimental accuracy, a result widely described as solving the protein structure prediction problem.

But these systems lack many features that psychologists consider central to intelligence: they do not have goals, desires, or motivations; they do not generalize flexibly across domains without retraining; they do not learn efficiently from small amounts of data the way humans do; and they do not have the embodied, social, and emotional intelligence that Gardner and Sternberg emphasize.

The gap between narrow AI performance on specific tasks and general intelligence (sometimes called AGI) remains large. Whether this gap will be closed by scaling current approaches, by new architectures, or not at all is an open question.

Implications for intelligence theory

AI research has practical implications for the study of human intelligence. If machines can achieve high performance on IQ tests without possessing anything that most researchers would call "general intelligence," this raises questions about what IQ tests actually measure. It suggests that the skills assessed by IQ tests — pattern recognition, logical reasoning, vocabulary knowledge — can be performed by systems that lack consciousness, understanding, or genuine cognitive flexibility.

Conversely, the things humans find easy — recognizing a face, walking across a cluttered room, understanding a joke, reading social cues — have proven far more difficult for AI than the abstract reasoning tasks that IQ tests emphasize. This supports the arguments of Gardner, Sternberg, and others that intelligence is broader and more diverse than what IQ tests capture.

Connections to other fields Master

Philosophy of mind

The study of cognition connects directly to the philosophy of mind. The question of what thought is — a computational process, a biological phenomenon, an embodied and embedded activity — has been debated since Descartes. Functionalism (the dominant position in philosophy of mind) holds that mental states are defined by their functional role, not by their physical substrate — which is compatible with the idea that machines could, in principle, think. Embodied cognition challenges this, arguing that cognition depends on having a body interacting with a physical environment, not just on symbol manipulation.

The heuristics-and-biases program has philosophical implications for epistemology. If human reasoning is systematically biased, this raises questions about the reliability of human knowledge and the nature of rationality. Some researchers (notably the psychologist Keith Stanovich) distinguish between rationality (the ability to form true beliefs and make good decisions) and intelligence (the cognitive capacity measured by IQ tests). On this view, high intelligence does not guarantee rationality, because System 2 can be used to rationalize biases rather than overcome them.

Economics and decision science

Prospect theory transformed economics by providing a descriptively accurate model of human decision-making under risk. Behavioral economics — built on the foundations laid by Tversky and Kahneman — studies how actual human economic behavior deviates from the predictions of rational-choice models. Richard Thaler and Cass Sunstein's Nudge (2008) applied behavioral economics to policy, arguing that governments can improve outcomes by designing choice architectures that account for cognitive biases.

Neuroscience

Cognitive neuroscience uses brain imaging (fMRI, EEG, PET) to study the neural correlates of cognitive processes. Research on working memory (Baddeley's model), attentional networks (Posner), and decision-making (the role of the prefrontal cortex and the dopamine system) connects psychological theories of cognition to underlying brain mechanisms. The discovery that the brain's default mode network is active during rest and mind-wandering has reshaped understanding of creativity and insight.

Education

Gardner's multiple intelligences, Sternberg's triarchic theory, and research on stereotype threat have all influenced educational practice. Understanding that intelligence is multifaceted supports differentiated instruction. Understanding stereotype threat supports interventions to close achievement gaps. Understanding heuristics and biases supports critical thinking education.

Historical and philosophical context Master

The prehistory of intelligence testing

The desire to measure mental ability predates psychology. Imperial China used civil service examinations for over a thousand years, testing knowledge of Confucian classics, poetry composition, and policy analysis. These examinations were meritocratic in principle but reflected the cultural and educational capital of the scholar-gentry class.

In the nineteenth century, Francis Galton — Charles Darwin's cousin — established the eugenics movement and attempted to measure intelligence through sensory acuity (reaction time, visual discrimination, hearing sensitivity). Galton's approach failed: sensory measures did not correlate well with mental ability. But his statistical innovations (correlation, regression to the mean) became foundational for the field.

Galton's eugenics program — improving the human species through selective breeding — directly influenced the development of intelligence testing. Binet's test was a practical educational tool, but its American adapters (Terman, Henry Goddard, Robert Yerkes) brought strong hereditarian and eugenic convictions to its use. Goddard testified before Congress advocating immigration restrictions based on IQ testing of arrivals at Ellis Island. Yerkes oversaw the Army testing program that produced the data used to justify the Immigration Act of 1924.

The cognitive revolution

The study of cognition was suppressed in American psychology during the behaviorist era (roughly 1913-1956), when John B. Watson and B. F. Skinner argued that psychology should study only observable behavior, not internal mental processes. The "cognitive revolution" of the 1950s and 1960s — driven by George Miller, Jerome Bruner, Noam Chomsky, Allen Newell, and Herbert Simon — restored the study of mental processes to scientific respectability.

Chomsky's critique of Skinner's Verbal Behavior (1959) was a turning point. Chomsky argued that language could not be explained by stimulus-response associations alone — the creativity and generativity of language required innate cognitive structures. This argument opened the door to studying internal mental representations, which became the foundation of cognitive psychology.

The culture wars around intelligence

The intelligence debate has been repeatedly weaponized in political contexts. Jensen's 1969 article in the Harvard Educational Review was used to argue against compensatory education programs (like Head Start) on the grounds that IQ differences were genetic and therefore not remediable by environmental intervention. The Bell Curve (1994) was used to argue against affirmative action and social welfare programs.

These political uses of intelligence research have been criticized on multiple grounds. First, the scientific evidence does not support the conclusions being drawn (as documented above). Second, even if IQ differences were partly genetic, this would not justify policies that deny people opportunity — heritability within groups says nothing about what interventions can accomplish. Third, the framing of the debate in terms of "racial differences in intelligence" treats race as a biological category, when modern genetics shows that racial categories are social constructs that do not map neatly onto genetic variation.

The Human Genome Project confirmed that there is more genetic variation within any racial group than between racial groups. The concept of "race" used in the IQ debate — Black, white, Hispanic, Asian — captures a tiny fraction of human genetic diversity. These categories reflect social and historical processes (slavery, colonialism, immigration patterns), not biological boundaries.

Bibliography Master

Primary sources

Binet, A. & Simon, T. (1905). Méthodes nouvelles pour le diagnostic du niveau intellectuel des anormaux. L'Année Psychologique, 11, 191-244.
Spearman, C. (1904). "General Intelligence," Objectively Determined and Measured. American Journal of Psychology, 15, 201-292.
Terman, L. M. (1916). The Measurement of Intelligence. Boston: Houghton Mifflin.
Thurstone, L. L. (1938). Primary Mental Abilities. Chicago: University of Chicago Press.
Tversky, A. & Kahneman, D. (1974). Judgment under Uncertainty: Heuristics and Biases. Science, 185, 1124-1131.
Kahneman, D. & Tversky, A. (1979). Prospect Theory: An Analysis of Decision under Risk. Econometrica, 47, 263-291.
Gardner, H. (1983). Frames of Mind: The Theory of Multiple Intelligences. New York: Basic Books.
Sternberg, R. J. (1985). Beyond IQ: A Triarchic Theory of Human Intelligence. Cambridge: Cambridge University Press.
Steele, C. M. & Aronson, J. (1995). Stereotype Threat and the Intellectual Test Performance of African Americans. Journal of Personality and Social Psychology, 69, 797-811.
Salovey, P. & Mayer, J. D. (1990). Emotional Intelligence. Imagination, Cognition and Personality, 9, 185-211.

Secondary sources and reviews

Kahneman, D. (2011). Thinking, Fast and Slow. New York: Farrar, Straus and Giroux.
Neisser, U. et al. (1996). Intelligence: Knowns and Unknowns. American Psychologist, 51, 77-101.
Nisbett, R. E. et al. (2012). Intelligence: New Findings and Theoretical Developments. American Psychologist, 67, 130-159.
Gould, S. J. (1981). The Mismeasure of Man. New York: W. W. Norton.
Turkheimer, E. et al. (2003). Socioeconomic Status Modifies Heritability of IQ in Young Children. Psychological Science, 14, 623-628.
Flynn, J. R. (2007). What Is Intelligence? Cambridge: Cambridge University Press.
Gigerenzer, G. (2007). Gut Feelings: The Intelligence of the Unconscious. New York: Viking.
Boroditsky, L. (2011). How Language Shapes Thought. Scientific American, 304, 62-65.
Winawer, J. et al. (2007). Russian Blues Reveal Effects of Language on Color Discrimination. Proceedings of the National Academy of Sciences, 104, 7780-7785.
Stanovich, K. E. (2010). Rationality and the Reflective Mind. Oxford: Oxford University Press.
Whorf, B. L. (1956). Language, Thought, and Reality. Cambridge: MIT Press.

Prerequisites

29.04.01 pending

Tier anchors

beginner: Kahneman, Thinking, Fast and Slow; any intro cognitive psychology textbook
intermediate: Kahneman & Tversky collected works; Sternberg, Handbook of Intelligence; Gardner, Frames of Mind
master: primary sources: Binet 1905, Terman 1916, Spearman 1904, Thurstone 1938, Kahneman & Tversky 1974/1979/1982, Gardner 1983, Sternberg 1985, Steele & Aronson 1995, Nisbett et al. 2012

References

Kahneman, D. — Thinking, Fast and Slow (Farrar, Straus and Giroux, 2011) · Part I: Systems 1 and 2; Part III: Overconfidence
Tversky, A. & Kahneman, D. — Judgment under Uncertainty: Heuristics and Biases, Science 185, 1124-1131 (1974) · pp. 1124-1131
Kahneman, D. & Tversky, A. — Prospect Theory: An Analysis of Decision under Risk, Econometrica 47, 263-291 (1979) · pp. 263-291
Gardner, H. — Frames of Mind: The Theory of Multiple Intelligences (Basic Books, 1983; 10th anniversary ed. 1993) · Ch. 1-4, 10-11
Sternberg, R. J. — Beyond IQ: A Triarchic Theory of Human Intelligence (Cambridge University Press, 1985) · Ch. 1-5
Steele, C. M. & Aronson, J. — Stereotype Threat and the Intellectual Test Performance of African Americans, Journal of Personality and Social Psychology 69, 797-811 (1995) · pp. 797-811
Nisbett, R. E. et al. — Intelligence: New Findings and Theoretical Developments, American Psychologist 67(2), 130-159 (2012) · full article
Spearman, C. — 'General Intelligence,' Objectively Determined and Measured, American Journal of Psychology 15, 201-292 (1904) · pp. 201-292
Binet, A. & Simon, T. — Méthodes nouvelles pour le diagnostic du niveau intellectuel des anormaux (1905) · full article
Terman, L. M. — The Measurement of Intelligence (Houghton Mifflin, 1916) · Ch. 1-3
Whorf, B. L. — Language, Thought, and Reality (MIT Press, 1956) · selected essays
Salovey, P. & Mayer, J. D. — Emotional Intelligence, Imagination, Cognition and Personality 9(3), 185-211 (1990) · pp. 185-211

Estimated time

beginner: 25m
intermediate: 50m
master: 75m