Cognition and intelligence: thinking, reasoning, and the measurement of mind
Anchor (Master): primary sources: Binet 1905, Terman 1916, Spearman 1904, Thurstone 1938, Kahneman & Tversky 1974/1979/1982, Gardner 1983, Sternberg 1985, Steele & Aronson 1995, Nisbett et al. 2012
Intuition Beginner
You make thousands of decisions every day. Most take less than a second. Should you bring an umbrella? Is that person friendly or hostile? Is this a good time to cross the street? Your brain handles these questions without breaking stride, and most of the time the answers are good enough.
But sometimes they are not. You underestimate how long a project will take. You stick with a bad restaurant because you already waited thirty minutes for a table. You assume an articulate person must also be knowledgeable. You remember dramatic plane crashes and forget the routine car accidents, so you fear flying more than driving even though driving is statistically far more dangerous.
These errors are not random. They are systematic, predictable, and surprisingly universal. In the 1970s, two Israeli psychologists — Amos Tversky and Daniel Kahneman — began documenting them. They showed that human thinking relies on mental shortcuts, called heuristics, that work well most of the time but produce characteristic mistakes, called biases, when they misfire. This discovery reshaped psychology, economics, medicine, and public policy. Kahneman won the Nobel Prize in Economics in 2002 (Tversky had died in 1996 and could not share it).
This unit covers two large territories. Cognition is the study of how people think: how they reason, decide, solve problems, use language, and create new ideas. Intelligence is the study of how cognitive ability varies between people, how it is measured, and what those measurements mean — and what they do not mean.
Both territories are contested. What counts as "intelligent" depends on who is asking and why. The history of intelligence testing is tangled with eugenics, scientific racism, and the misuse of science to justify inequality. Untangling it requires looking at the evidence honestly — which this unit does.
Start with a basic distinction. Kahneman describes two modes of thinking. System 1 is fast, automatic, and intuitive. It produces answers without conscious effort: you see a face and instantly know it is angry; you hear "two plus two" and "four" pops into your head. System 2 is slow, deliberate, and effortful. It activates when you multiply 27 by 43, compare two mortgage offers, or follow a complex argument.
The interaction between these two systems drives most of the phenomena in this unit. System 1 generates quick answers. System 2 can override them, but often does not — because System 2 is lazy, or because System 1's answer feels right.
Visual Beginner
The landscape of cognition and intelligence can be mapped as a tree with three major branches, each feeding into the central question of what it means to be a thinking being.
The diagram simplifies a complex field. In reality, the branches overlap: language affects reasoning, intelligence affects problem-solving, and the tools used to measure intelligence are themselves products of particular cultural and linguistic traditions.
Worked example Beginner
Consider the following problem, adapted from Tversky and Kahneman's experiments.
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.
Which is more probable?
A. Linda is a bank teller. B. Linda is a bank teller and is active in the feminist movement.
Most people choose B. But B cannot be more probable than A, because the set of "bank tellers who are feminists" is a subset of the set of "bank tellers." Adding a detail can never increase the probability; at best it leaves the probability unchanged.
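The subset argument can be checked mechanically. The sketch below builds a random, entirely hypothetical population in which each person either is or is not a bank teller and either is or is not a feminist, then simply counts. Whatever the made-up rates, the people who satisfy both conditions are a subset of those who satisfy the first, so the conjunction never comes out more probable.

```python
import random

def make_population(n: int, p_teller: float, p_feminist: float, seed: int = 0):
    """Return n hypothetical people as (is_teller, is_feminist) pairs drawn at random."""
    rng = random.Random(seed)
    return [(rng.random() < p_teller, rng.random() < p_feminist) for _ in range(n)]

def probability(population, predicate) -> float:
    """Fraction of the population for which the predicate is true."""
    return sum(1 for person in population if predicate(person)) / len(population)

def is_teller(person) -> bool:
    return person[0]

def is_feminist_teller(person) -> bool:
    return person[0] and person[1]

# Try several made-up mixes of tellers and feminists: the ordering never flips.
for p_teller, p_feminist in [(0.05, 0.9), (0.3, 0.5), (0.8, 0.99)]:
    people = make_population(1_000, p_teller, p_feminist)
    p_a = probability(people, is_teller)           # option A
    p_b = probability(people, is_feminist_teller)  # option B
    print(f"P(A) = {p_a:.3f}   P(A and B) = {p_b:.3f}")
    assert p_b <= p_a  # everyone counted in B was already counted in A
```

Making the "feminist" rate very high (0.99) brings B close to A, but it can only equal A, never exceed it.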
This is the conjunction fallacy. It occurs because System 1 matches Linda's description to a stereotype — feminist activist — and evaluates how well each option fits that stereotype, rather than computing the actual probabilities. Option B fits the description better, but it is less probable.
Tversky and Kahneman found that even people trained in statistics commit this error. The conjunction fallacy is not a sign of stupidity. It is a signature of how human cognition works: we judge by resemblance, not by calculation.
Formal definition Intermediate
Cognition refers to the mental processes involved in acquiring knowledge and understanding, including perception, attention, memory, reasoning, judgment, problem-solving, decision-making, and language production and comprehension. The term comes from the Latin cognoscere, "to know."
Intelligence is a more contested term. A consensus definition endorsed by 52 leading researchers in "Mainstream Science on Intelligence," a 1994 public statement published in the Wall Street Journal, states:
Intelligence is a very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience. It is not merely book learning, a narrow academic skill, or test-taking smarts. Rather, it reflects a broader and deeper capability for comprehending our surroundings — "catching on," "making sense" of things, or "figuring out" what to do.
This definition is useful but not uncontroversial. Different cultures emphasize different cognitive abilities. The psychologist Robert Sternberg has argued that Western psychology defines intelligence too narrowly, focusing on analytical and academic skills while neglecting practical and creative intelligence. Howard Gardner has argued that intelligence is not one thing but several, each relatively independent.
The key methodological distinction in intelligence research is between psychometric approaches (which measure performance on standardized tests and analyze the statistical structure of test scores) and cognitive approaches (which study the mental processes underlying test performance). Psychometric approaches ask: how much intelligence does a person have, and how is it structured? Cognitive approaches ask: what mental processes produce intelligent behavior?
Heuristics are mental shortcuts — simple rules or strategies that reduce the cognitive load of decision-making. They operate by ignoring some available information, trading optimality for speed. The psychologist Gerd Gigerenzer has argued that heuristics are not irrational shortcuts but adaptive tools — "fast and frugal" strategies that exploit the structure of natural environments. Tversky and Kahneman emphasized the conditions under which heuristics lead to systematic errors. Both perspectives are supported by evidence; they differ in emphasis.
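To make "ignoring some available information" concrete, here is a minimal sketch in the spirit of Gigerenzer's take-the-best heuristic: to judge which of two cities is larger, check cues one at a time in order of validity and stop at the first cue that tells them apart. A tallying rule that weighs every cue equally is included for comparison. The cities, the cues, and the cue ordering are all hypothetical.

```python
# Hypothetical cue profiles for two cities (1 = cue present, 0 = absent).
# Cues are listed from most to least valid; this ordering is an assumption.
CUE_ORDER = ["has_major_airport", "is_state_capital", "has_university"]

city_a = {"has_major_airport": 1, "is_state_capital": 0, "has_university": 0}
city_b = {"has_major_airport": 0, "is_state_capital": 1, "has_university": 1}

def take_the_best(x: dict, y: dict, cue_order: list) -> str:
    """Check cues in order of validity and stop at the first one that discriminates."""
    for cue in cue_order:
        if x[cue] != y[cue]:
            return "first" if x[cue] > y[cue] else "second"
    return "guess"  # no cue discriminates

def tally(x: dict, y: dict, cue_order: list) -> str:
    """Count every cue with equal weight (a simple rule that uses all the information)."""
    score_x = sum(x[c] for c in cue_order)
    score_y = sum(y[c] for c in cue_order)
    if score_x == score_y:
        return "guess"
    return "first" if score_x > score_y else "second"

print(take_the_best(city_a, city_b, CUE_ORDER))  # first: decided by the airport cue alone
print(tally(city_a, city_b, CUE_ORDER))          # second: one cue present versus two
```

The two rules disagree here, which is the point: take-the-best looks at a single cue and stops. Whether stopping early costs accuracy depends on the structure of the environment, not on the rule alone.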
Cognitive biases are systematic deviations from normative standards of reasoning. A cognitive bias is not a random error — it is a predictable tendency to err in a particular direction. The word "bias" does not imply prejudice in the social sense (though cognitive biases can contribute to social prejudices); it refers to a statistical bias in judgment.
Stereotype threat, defined by Claude Steele and Joshua Aronson in 1995, is the risk of confirming a negative stereotype about one's group, which creates anxiety and cognitive load that impair performance on the stereotyped domain.
Key concepts: heuristics and biases Intermediate
Tversky and Kahneman identified three classic heuristics in their 1974 paper "Judgment under Uncertainty: Heuristics and Biases," published in Science. Each heuristic produces a characteristic family of biases.
The availability heuristic
People estimate the frequency or probability of an event by how easily examples come to mind. Events that are vivid, recent, or heavily covered by media are easier to recall, so they seem more common than they actually are.
After the September 11 attacks, many Americans chose to drive rather than fly. Driving is statistically far more dangerous than flying, but terrorist attacks are vivid and memorable, while car accidents are routine and invisible. The result: an estimated 1,595 additional road deaths in the year following the attacks — six times the number of passengers killed on the four hijacked planes, according to an analysis by Gerd Gigerenzer.
The availability heuristic also explains why people overestimate the prevalence of dramatic causes of death (homicide, tornadoes, shark attacks) and underestimate common ones (heart disease, stroke, diabetes). Newspaper coverage is not proportional to actual risk.
The representativeness heuristic
People judge the probability that A belongs to category B by how much A resembles the typical or stereotypical member of B. This shortcut works when representativeness correlates with actual probability, but it leads to systematic errors when it does not.
The base rate fallacy is one consequence. Suppose a disease affects 1 in 1,000 people, and a test for it is 99% accurate (both sensitivity and specificity). You test positive. What is the probability you actually have the disease?
Most people say around 99%. The actual answer is about 9%. Out of 1,000 people tested, roughly 1 has the disease and tests positive. But about 10 healthy people (1% of 999) will also test positive (false positives). So there are about 11 positive tests, only 1 of which is a true positive: 1/11 is approximately 9%. The base rate (how rare the disease is) matters enormously, but a positive result from a "99% accurate" test looks so representative of having the disease that people ignore it.
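The same calculation, written out with Bayes' theorem, is short enough to check in code. The function name is illustrative; the inputs are the prevalence, sensitivity, and specificity from the example above.

```python
def posterior_given_positive(prevalence: float, sensitivity: float, specificity: float) -> float:
    """P(disease | positive test) by Bayes' theorem."""
    true_positive = sensitivity * prevalence               # sick and test positive
    false_positive = (1 - specificity) * (1 - prevalence)  # healthy but test positive
    return true_positive / (true_positive + false_positive)

# Numbers from the example: 1 in 1,000 prevalence, 99% sensitivity and specificity.
p = posterior_given_positive(prevalence=0.001, sensitivity=0.99, specificity=0.99)
print(f"{p:.1%}")  # prints 9.0%
```

Raising the prevalence to 1 in 10 (`prevalence=0.1`) pushes the answer above 90% with the same test, which is another way of seeing that the base rate, not the test's accuracy alone, drives the result.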
The gambler's fallacy is another consequence. After observing a run of red on a roulette wheel, people expect black to "be due." Each spin is independent, but the representativeness heuristic expects outcomes to look random in the short run, not just in the long run.
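A short simulation makes the independence point concrete. The sketch assumes a European wheel (18 red, 18 black, 1 green pocket) and a fixed random seed, and records what happens on the spin immediately after five reds in a row.

```python
import random

# European roulette: 18 red, 18 black, 1 green pocket. Each spin is independent.
POCKETS = ["red"] * 18 + ["black"] * 18 + ["green"]

def black_after_red_streak(n_spins: int, streak: int, seed: int = 1) -> float:
    """Fraction of spins that land black immediately after `streak` reds in a row."""
    rng = random.Random(seed)
    run_of_red = 0
    after_streak = 0
    black_after = 0
    for _ in range(n_spins):
        outcome = rng.choice(POCKETS)
        if run_of_red >= streak:  # the previous `streak` spins were all red
            after_streak += 1
            black_after += outcome == "black"
        run_of_red = run_of_red + 1 if outcome == "red" else 0
    return black_after / after_streak

print(round(black_after_red_streak(500_000, streak=5), 3))
print(round(18 / 37, 3))  # unconditional chance of black on any spin
```

The first number hovers around the second: a streak of red does not make black "due."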
Anchoring and adjustment
People make estimates by starting from an initial value (the anchor) and adjusting up or down. The adjustment is typically insufficient — people are "dragged" toward the anchor even when the anchor is plainly arbitrary.
In one famous experiment, Tversky and Kahneman asked participants to estimate the percentage of African countries in the United Nations. Before answering, each participant watched a wheel of fortune spin and stop on a number that appeared to be random. For those who saw it stop at 10, the median estimate was 25%; for those who saw it stop at 65, the median estimate was 45%. An arbitrary anchor shifted estimates by 20 percentage points.
Anchoring affects real-world negotiations, pricing, sentencing in courts, and medical diagnoses. It works even when people know about it. Knowing about a bias provides some protection, but not immunity.
Prospect theory
Tversky and Kahneman's 1979 paper "Prospect Theory: An Analysis of Decision under Risk" challenged the standard economic assumption that people are rational utility maximizers. Prospect theory describes how people actually evaluate risky choices, using two key ideas.
Reference dependence. People evaluate outcomes as gains or losses relative to a reference point (usually the status quo), not in absolute terms. A raise that brings your salary to $55,000 feels like a gain. A pay cut that brings it to $55,000 feels like a loss, even though the final amount is the same.
Loss aversion. Losses loom larger than equivalent gains: losing $100 feels worse than gaining $100 feels good. This asymmetry helps explain the endowment effect: people demand more money to give up an object they own than they would pay to acquire the same object. Ownership shifts the reference point, and giving something up feels like a loss.
Prospect theory also captures the certainty effect: people overweight outcomes that are certain relative to outcomes that are merely probable. Given a choice between a certain $500 and a 50% chance of winning $1,000 (expected value: $500), most people take the certain $500. But given a choice between a certain loss of $500 and a 50% chance of losing $1,000, most people choose the gamble: they are risk-seeking in the domain of losses.
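The shape behind these patterns can be sketched as a value function over gains and losses measured from the reference point: concave for gains, convex and steeper for losses. The parameter values below (alpha = 0.88, lambda = 2.25) are the median estimates Tversky and Kahneman reported in their 1992 follow-up work and are used here only for illustration. The sketch omits prospect theory's probability-weighting function, so it reproduces loss aversion and the flip between the gain and loss choices, but not every feature of the certainty effect.

```python
ALPHA = 0.88   # curvature for gains and losses (illustrative estimate)
LAMBDA = 2.25  # loss-aversion coefficient (illustrative estimate)

def value(x: float) -> float:
    """Prospect-theory value of a gain or loss x, measured from the reference point."""
    if x >= 0:
        return x ** ALPHA
    return -LAMBDA * ((-x) ** ALPHA)

def prospect_value(outcomes) -> float:
    """Sum of probability * value; probabilities are NOT reweighted in this sketch."""
    return sum(p * value(x) for p, x in outcomes)

sure_gain = prospect_value([(1.0, 500)])
risky_gain = prospect_value([(0.5, 1000), (0.5, 0)])
sure_loss = prospect_value([(1.0, -500)])
risky_loss = prospect_value([(0.5, -1000), (0.5, 0)])

print("gains:", "sure thing" if sure_gain > risky_gain else "gamble")     # gains: sure thing
print("losses:", "sure thing" if sure_loss > risky_loss else "gamble")    # losses: gamble
print(value(100) + value(-100) < 0)  # True: a $100 loss outweighs a $100 gain
```

Both options in each pair have the same expected value; only the curvature and the steeper loss side of the value function separate them.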
Key concepts: problem-solving and creativity Intermediate
Problem-solving
Problem-solving involves moving from a current state to a goal state when the path is not immediately obvious. Psychologists distinguish several strategies.
Algorithms are step-by-step procedures that guarantee a solution if one exists. Searching systematically through every possible combination lock code is an algorithm. The weakness: algorithms can be prohibitively slow for large problem spaces.
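A brute-force search on a hypothetical three-digit lock shows both sides of the trade-off: the procedure is guaranteed to succeed, but in the worst case it tries all 1,000 codes, and each extra digit multiplies the work by ten. The `is_correct` check is a hypothetical stand-in for trying a code on the lock.

```python
from itertools import product
from typing import Callable, Optional, Tuple

def crack_lock(is_correct: Callable[[str], bool], digits: int = 3) -> Tuple[Optional[str], int]:
    """Try every code from 00...0 to 99...9 in order; return (code, attempts)."""
    attempts = 0
    for combo in product("0123456789", repeat=digits):
        attempts += 1
        code = "".join(combo)
        if is_correct(code):
            return code, attempts
    return None, attempts  # no code works

secret = "472"  # hypothetical secret code
code, tries = crack_lock(lambda c: c == secret)
print(code, tries)  # 472 473
```

A six-digit lock would need up to 1,000,000 attempts, which is why heuristics that prune the search are attractive even though they give up the guarantee.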
Heuristics (in the problem-solving sense, distinct from judgment heuristics above) include means-end analysis (identify the difference between the current state and the goal, then find an operation that reduces that difference) and working backward (start from the goal and work back to the current state). Newell and Simon's General Problem Solver (1958) modeled human problem-solving using means-end analysis.
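Working backward can be sketched on a hypothetical puzzle: starting from 1, reach a target number using only the moves "double" and "add 1," in as few moves as possible. Searching forward means choosing between two moves at every step. Working backward from the goal removes most of that choice: an odd number can only have been reached by adding 1, and for an even number, undoing a doubling is never worse than undoing an addition.

```python
def work_backward(target: int) -> list:
    """Return the numbers visited from 1 to target, found by reasoning back from the goal."""
    if target < 1:
        raise ValueError("target must be a positive integer")
    path = [target]
    n = target
    while n > 1:
        # Undo the last move: halve if even (undo "double"), otherwise subtract 1 (undo "add 1").
        n = n // 2 if n % 2 == 0 else n - 1
        path.append(n)
    return list(reversed(path))  # read the path forward, from start to goal

print(work_backward(10))  # [1, 2, 4, 5, 10]
```

Read forward, the result is the plan: double, double, add 1, double.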
Insight is the sudden, unexpected solution to a problem — the "aha" moment. Classic insight problems include the nine-dot problem (connect a 3x3 grid of dots with four straight lines without lifting the pen) and the candle problem (attach a candle to a wall using only a box of thumbtacks and a book of matches, where the box must be recognized as a platform rather than merely a container). Insight problems often require restructuring — seeing the problem in a fundamentally new way.
Functional fixedness is the tendency to see objects only in terms of their typical function. In the candle problem, the box is seen as a container for thumbtacks, not as a potential shelf for the candle. Overcoming functional fixedness requires suppressing the automatic association between object and function.
Creativity
Creativity is the production of ideas or products that are both novel and appropriate to the goal. It is not limited to the arts. Scientific theories, engineering solutions, and social innovations can all be creative.
J. P. Guilford distinguished divergent thinking (generating many possible solutions) from convergent thinking (narrowing down to the best solution). Standard intelligence tests primarily measure convergent thinking. Divergent thinking is assessed through tasks like "list as many uses for a brick as you can" — where fluency (number of ideas), flexibility (variety of categories), and originality (unusualness of ideas) are all scored.
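The three scores can be illustrated with a minimal sketch. Everything in it is hypothetical: the responses, the category labels, the reference-sample frequencies, and the 5% rarity cutoff used for originality. Published divergent-thinking tests define their own categories and norms.

```python
# Hypothetical responses to "list as many uses for a brick as you can,"
# each tagged with a category assigned by a (hypothetical) scorer.
responses = [
    ("build a wall", "construction"),
    ("build a chimney", "construction"),
    ("doorstop", "holding things"),
    ("paperweight", "holding things"),
    ("grind into red pigment", "art material"),
]

# Hypothetical norms: share of a reference sample that gave each idea.
norm_share = {
    "build a wall": 0.70,
    "build a chimney": 0.20,
    "doorstop": 0.45,
    "paperweight": 0.30,
    "grind into red pigment": 0.01,
}

RARE_THRESHOLD = 0.05  # assumed cutoff: ideas given by under 5% of the sample count as original

fluency = len(responses)                                     # number of ideas
flexibility = len({category for _, category in responses})  # number of distinct categories
originality = sum(                                           # number of statistically rare ideas
    1 for idea, _ in responses if norm_share.get(idea, 0.0) < RARE_THRESHOLD
)

print(fluency, flexibility, originality)  # 5 3 1
```

Ideas missing from the norms are treated as rare (share 0.0), which is one possible convention among several.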
Graham Wallas's four-stage model of creativity (1926) describes preparation (conscious work on the problem), incubation (setting the problem aside while unconscious processing continues), illumination (the sudden insight), and verification (checking that the insight actually works). This model is descriptive rather than explanatory, but it captures a widely reported phenomenological pattern.
The relationship between creativity and intelligence is debated. Threshold theory holds that above an IQ of about 120, intelligence and creativity become largely independent — suggesting that a minimum level of cognitive ability is necessary for creativity, but additional intelligence does not necessarily make you more creative.
Key concepts: language and thought Intermediate
The Sapir-Whorf hypothesis
The idea that the language you speak shapes the way you think is associated with Edward Sapir and his student Benjamin Lee Whorf, though neither used the term "Sapir-Whorf hypothesis" themselves. The hypothesis exists in two forms.
The strong form (linguistic determinism) holds that language determines thought — that you literally cannot think thoughts that your language does not permit. This version is largely rejected by modern linguists and psychologists. It is contradicted by the existence of bilingual individuals who report thinking differently in each language (implying they can entertain thoughts from both linguistic systems) and by the fact that infants and non-human animals show cognitive abilities before or without language.
The weak form (linguistic relativity) holds that language influences thought — that the habitual use of a particular language makes some ways of thinking easier or more natural, without making others impossible. This version has substantial empirical support.
Evidence for linguistic relativity
Spatial reasoning. Some Australian Aboriginal languages, such as Guugu Yimithirr and Kuuk Thaayorre, use absolute cardinal directions (north, south, east, west) rather than egocentric relative directions (left, right, front, back). Speakers of these languages maintain an extraordinary sense of orientation — they can point to north accurately even in an unfamiliar enclosed room. The language requires constant spatial tracking, and speakers develop the cognitive ability to do it.
Color perception. Languages differ in how they partition the color spectrum. Russian has separate basic terms for light blue (goluboy) and dark blue (siniy), where English uses a single term "blue." Russian speakers are faster at discriminating shades of blue that cross the goluboy/siniy boundary than shades that fall within one category, suggesting that the linguistic distinction sharpens perceptual discrimination at the category boundary.
Time. English speakers tend to gesture leftward when talking about the past and rightward when talking about the future. Aymara speakers (in the Andes) reverse the familiar front-back mapping: the past is in front (because it has been seen) and the future is behind (because it has not). Mandarin speakers sometimes gesture upward for the past and downward for the future, consistent with Mandarin's vertical time metaphors, in which earlier events are "up" (shàng) and later events are "down" (xià). These patterns suggest that spatial metaphors for time, encoded in language, shape temporal reasoning.
Counterfactual reasoning. Chinese languages mark counterfactuals less explicitly than English (Chinese does not have a grammatical subjunctive mood). In a classic but contested study, Alfred Bloom (1981) found that Chinese speakers had more difficulty reasoning with counterfactual scenarios presented in Chinese than English speakers had with the same scenarios in English. The study sparked a long debate; some replication attempts supported it, others did not. The current consensus is that language may facilitate certain types of reasoning without making them impossible.
Limits of linguistic relativity
Linguistic relativity is real but modest in its effects. Language is one of many factors that shape cognition, alongside culture, experience, biology, and individual variation. The strongest claims of linguistic determinism are not supported, but the weakest claims — that language influences habitual patterns of thought — are well-established.
Key concepts: intelligence testing — history Intermediate
Binet and the origins of intelligence testing
Alfred Binet, working with Théodore Simon, developed the first practical intelligence test in France in 1905, commissioned by the French Ministry of Education to identify children who needed educational support. The test assessed a range of cognitive skills (memory, attention, reasoning) through tasks of increasing difficulty; in the 1908 revision, tasks were grouped by age level, and a child's performance was expressed as a mental age: the age at which the average child would perform comparably.
Binet was explicit about the limitations of his test. He believed intelligence was multifaceted and influenced by environment, not fixed at birth. He warned against using test scores to label children permanently. He intended the test as a practical tool for identifying children who needed help, not as a measure of innate worth.
Terman and the Stanford-Binet
Lewis Terman at Stanford adapted Binet's test for American use, publishing the Stanford-Binet Intelligence Scales in 1916. Terman popularized the Intelligence Quotient (IQ), building on a ratio proposed by the German psychologist William Stern in 1912: mental age divided by chronological age, which Terman multiplied by 100. A child with a mental age of 10 and a chronological age of 8 would have an IQ of 125. Modern IQ tests no longer use this ratio; instead, they set the mean at 100 and the standard deviation at 15, so IQ is a relative ranking within an age group.
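Both scoring schemes fit in a few lines. The ratio IQ reproduces the worked example above. The deviation IQ assumes a hypothetical age group whose raw scores have a mean of 40 and a standard deviation of 8, and the percentile line assumes scores are normally distributed, which is an approximation.

```python
from statistics import NormalDist

def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Historical ratio IQ: mental age divided by chronological age, times 100."""
    return mental_age / chronological_age * 100

def deviation_iq(raw_score: float, age_group_mean: float, age_group_sd: float) -> float:
    """Modern deviation IQ: z-score within the age group, rescaled to mean 100 and SD 15."""
    z = (raw_score - age_group_mean) / age_group_sd
    return 100 + 15 * z

print(ratio_iq(10, 8))  # 125.0

# Hypothetical norms: an age group with a raw-score mean of 40 and SD of 8.
iq = deviation_iq(52, 40, 8)
print(iq)  # 122.5
print(f"{NormalDist(100, 15).cdf(iq):.0%} of the age group scores lower")  # 93% ...
```

The deviation score depends only on where a raw score falls within its own age group, which is why it stays meaningful for adults, for whom "mental age" stops increasing.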
Terman's views differed sharply from Binet's. Terman believed intelligence was largely innate and fixed. He launched a massive longitudinal study of gifted children (the "Termites"), tracking over 1,500 California children with IQs above 135 throughout their lives. The study produced valuable data but also reflected Terman's hereditarian assumptions.
Army testing and the misuse of IQ
During World War I, the U.S. Army administered intelligence tests to nearly two million draftees. The results appeared to show that recent immigrants from southern and eastern Europe scored lower than older immigrant groups from northern and western Europe, and that Black Americans scored lower than white Americans.
These results were used to argue for immigration restriction (they were invoked in the debates leading up to the Immigration Act of 1924) and to support eugenic policies. The tests, however, were deeply flawed. They were written in English, penalizing non-native speakers. They relied on cultural knowledge specific to white, native-born, middle-class Americans, asking about tennis, Wild West shows, and products available in particular stores. People who had grown up in rural Poland or Sicily or the Jim Crow South were being tested on their familiarity with mainstream American life, and the results were interpreted as measures of innate intelligence.
The paleontologist and historian of science Stephen Jay Gould documented this history in detail in The Mismeasure of Man (1981), arguing that the testing reflected and reinforced social prejudices rather than measuring any biological reality.
Spearman and the g-factor
Charles Spearman observed in 1904 that scores on different mental ability tests tend to be positively correlated: people who do well on one test tend to do well on others. He called the common factor behind these correlations g (for "general intelligence") and argued that it reflected a single underlying mental capacity.
Spearman used factor analysis, a statistical technique that identifies latent variables accounting for patterns of correlation among observed variables. A single factor (g) explained much of the shared variance across different tests. But the American psychologist L. L. Thurstone later argued (1938) that intelligence was better described by multiple primary mental abilities (verbal comprehension, numerical ability, spatial reasoning, word fluency, memory, perceptual speed, and reasoning) that were partially independent.
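The intuition behind a general factor can be sketched with a hypothetical correlation matrix for four tests. For simplicity, the sketch extracts the first principal component by power iteration instead of fitting a true factor-analysis model (which would also estimate each test's unique variance); the two methods are related but not identical. Because every correlation is positive, every test loads positively on the first component, and that component accounts for the largest single share of the variance.

```python
import math

# Hypothetical correlations among four tests (vocabulary, arithmetic, spatial, memory).
# All off-diagonal entries are positive, the pattern Spearman noticed.
R = [
    [1.0, 0.6, 0.5, 0.4],
    [0.6, 1.0, 0.5, 0.4],
    [0.5, 0.5, 1.0, 0.3],
    [0.4, 0.4, 0.3, 1.0],
]

def first_component(matrix, iterations: int = 500):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix, by power iteration."""
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v.Rv gives the eigenvalue for the (unit-length) converged vector.
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n))
    return eigenvalue, v

eigenvalue, vector = first_component(R)
loadings = [math.sqrt(eigenvalue) * x for x in vector]  # correlation of each test with the component
print([round(x, 2) for x in loadings])
print(f"share of total variance: {eigenvalue / len(R):.0%}")
```

A strong first component shows that the correlations share a common dimension; it does not by itself say what that dimension is.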
The g-vs-multiple-abilities debate continues. Most modern psychometricians accept that g is a real statistical phenomenon (test scores do intercorrelate) while disagreeing about what g means. Some interpret g as a unitary biological property of the brain (general processing speed or neural efficiency). Others see it as a statistical artifact that emerges whenever you test diverse cognitive abilities and look for common variance: people who do well on one kind of task tend to do well on others, but this does not prove there is a single underlying "intelligence organ."
Key concepts: theories of intelligence Intermediate
Gardner's multiple intelligences
Howard Gardner proposed in Frames of Mind (1983) that intelligence is not one thing but a set of relatively independent faculties: seven in the original book, with an eighth (naturalistic intelligence) added later:
- Linguistic intelligence — sensitivity to spoken and written language, the ability to learn languages. Writers, poets, lawyers.
- Logical-mathematical intelligence — capacity for deductive reasoning, detecting patterns, scientific thinking. Mathematicians, scientists.
- Spatial intelligence — ability to visualize and manipulate objects in space. Architects, surgeons, pilots.
- Musical intelligence — skill in performing, composing, and appreciating musical patterns. Musicians, composers.
- Bodily-kinesthetic intelligence — capacity to use the body to solve problems or create products. Athletes, dancers, surgeons.
- Interpersonal intelligence — capacity to understand the intentions, motivations, and desires of others. Teachers, therapists, politicians.
- Intrapersonal intelligence — capacity to understand oneself, one's feelings, fears, and motivations. Philosophers, psychologists.
- Naturalistic intelligence — ability to recognize and classify plants, animals, and other natural objects. Biologists, farmers. (Added later.)
Gardner's theory is influential in education. It supports the idea that different students have different cognitive strengths and that curricula should address multiple modalities. It is also more inclusive than g-based theories: it recognizes cognitive abilities that are valued across cultures, not just the academic abilities measured by Western intelligence tests.
Critics note that Gardner's intelligences are difficult to test empirically. Gardner set criteria for what counts as an intelligence (evidence from brain damage, evolutionary history, developmental trajectories, psychometric evidence, and more), but these criteria are somewhat flexible. Some psychometricians argue that Gardner's intelligences are better understood as talents or cognitive styles than as separate intelligences, and that they actually correlate with each other — consistent with g. Gardner counters that the correlation is an artifact of the narrow range of abilities tested by standard IQ tests.
Sternberg's triarchic theory
Robert Sternberg proposed in Beyond IQ (1985) that intelligence has three components:
Analytical intelligence — the ability to analyze, compare, and evaluate. This is what standard IQ tests measure: solving well-defined problems with known correct answers.
Creative intelligence — the ability to generate novel ideas, deal with novel situations, and think outside established frameworks. This involves coping with new problems that do not resemble ones you have seen before.
Practical intelligence — the ability to adapt to, shape, and select environments. "Street smarts." This includes tacit knowledge — the unspoken rules of how things work that are not taught in school but are essential for success in real-world contexts.
Sternberg reported that practical intelligence predicts job performance at least as well as IQ, and that the two are relatively independent. On this view, a person can be analytically brilliant but practically inept, or practically shrewd but academically mediocre.
Sternberg's theory, like Gardner's, broadens the concept of intelligence beyond what IQ tests capture. Both theories are compatible with the statistical reality of g while arguing that g is only part of the story.
The g-factor debate: what does IQ actually measure?
IQ tests reliably measure something. Test-retest reliability is high. Scores predict academic performance, job performance (in complex jobs), income, health outcomes, and longevity. The question is not whether IQ tests measure anything real — they do — but what that something is, and what it is not.
The psychometrician Arthur Jensen argued in a controversial 1969 article that IQ differences between Black and white Americans were substantially genetic in origin. This claim is addressed in detail in the next section. For now, note that the g-factor debate is not purely empirical — it involves questions about what intelligence is, how it should be defined, and whether a single number can capture cognitive ability.
The psychologist James Flynn documented a striking phenomenon: raw IQ scores rose steadily throughout the 20th century in nearly every country with data, by about 3 points per decade. This Flynn effect is too rapid to be explained by genetic changes. It demonstrates that IQ scores are sensitive to environmental factors; nutrition, education, exposure to abstract reasoning tasks, and the cognitive demands of modern life have all been proposed as contributors. If IQ were a pure measure of innate ability, the Flynn effect would not exist.
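The arithmetic of norm obsolescence is easy to sketch, assuming (as a simplification) a steady gain of 3 points per decade. A performance that earns an IQ of 100 against today's norms would earn a higher score against norms set decades earlier, because the older reference population scored lower. The function name is hypothetical.

```python
POINTS_PER_DECADE = 3.0  # approximate average 20th-century gain; assumed constant here

def score_on_old_norms(iq_on_current_norms: float, years_since_old_norms: float) -> float:
    """Approximate IQ the same performance would earn against norms set years earlier.

    Older norms reflect a lower-scoring reference population, so the same performance
    ranks higher against them. Assumes a steady linear gain, which is a simplification.
    """
    return iq_on_current_norms + POINTS_PER_DECADE * years_since_old_norms / 10

print(score_on_old_norms(100, 30))  # 109.0
print(score_on_old_norms(100, 50))  # 115.0, a full standard deviation
```

Over half a century the drift reaches a full standard deviation on the modern scale, which is far too fast for changes in the gene pool.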
Key experiment: race, IQ, and the evidence Intermediate
This section addresses the most contested topic in intelligence research directly and honestly. The goal is to present the evidence, distinguish what is known from what is not, and explain the consensus among researchers.
The observed gap
In the United States, average IQ scores differ between racial groups. The most commonly cited figure is that Black Americans score, on average, about 15 points lower than white Americans (approximately one standard deviation). This is a statistical fact about test scores. It is not in dispute.
What is in dispute is what this gap means — specifically, whether it reflects genetic differences between racial groups.
The hereditarian argument
A small number of researchers — most prominently Arthur Jensen (1969), Hans Eysenck, and later Philippe Rushton and Charles Murray (co-author of The Bell Curve, 1994) — have argued that the Black-white IQ gap is substantially genetic in origin. They cite several lines of evidence: the heritability of IQ within groups (twin studies show that IQ is highly heritable within white populations), the persistence of the gap across decades, and the fact that the gap is not fully explained by socioeconomic status.
Murray and Richard Herrnstein's The Bell Curve (1994) argued that IQ is an important predictor of social outcomes and that the Black-white IQ gap partly reflects genetic differences. The book was widely criticized for its methodology, its interpretation of the evidence, and its social policy implications.
The environmental argument and the scientific consensus
The overwhelming consensus among psychologists and geneticists is that there is no evidence that the Black-white gap in average test scores is genetic in origin. The American Psychological Association's task force on intelligence, convened in response to The Bell Curve, concluded in its 1996 report (Intelligence: Knowns and Unknowns):
"The differential between the mean intelligence test scores of Blacks and Whites [...] does not result from any obvious biases in test construction and administration, nor does it simply reflect differences in socio-economic status. Explanations based on factors of caste and culture may be appropriate, but so far have little direct empirical support. There is certainly no such support for a genetic interpretation. At present, no one knows what causes this differential."
The 2012 review by Nisbett, Aronson, Blair, Dickens, Flynn, Halpern, and Turkheimer — representing a broad range of leading researchers — stated the consensus more directly: "There is no evidence that the IQ difference between Blacks and Whites is genetic in origin." This review was published in American Psychologist, the flagship journal of the APA.
Several lines of evidence support the environmental explanation.
The Flynn effect. IQ scores rose by roughly 3 points per decade through most of the 20th century. If IQ reflected only innate genetic capacity, this increase could not have occurred. The Flynn effect demonstrates that IQ scores are highly sensitive to environmental changes.
The narrowing of the gap. The Black-white IQ gap has narrowed substantially since the 1970s, from about 15 points to roughly 9-10 points, consistent with improvements in educational access, socioeconomic conditions, and reduced discrimination. A genetic explanation cannot account for a closing gap over a few decades.
Adoption studies. Children adopted from disadvantaged backgrounds into more affluent families show IQ gains of 12-18 points — roughly the size of the Black-white gap. A well-known study by Moore (1986) found that Black children adopted by white middle-class families had IQ scores comparable to the white average, not the Black average.
Stereotype threat. Claude Steele and Joshua Aronson demonstrated in 1995 that Black students performed worse on a difficult verbal test when it was presented as a diagnostic measure of intellectual ability, and also when they were merely asked to indicate their race on a questionnaire before the test. When the same test was presented as non-diagnostic and without racial cues, the performance gap narrowed. Stereotype threat is thought to work by creating anxiety about confirming a negative stereotype, which consumes cognitive resources that would otherwise be available for the test. Hundreds of follow-up studies have reported similar effects across many stereotyped groups: women performing worse on math tests when reminded of gender stereotypes, older adults performing worse on memory tests when reminded of age stereotypes, and white athletes performing worse on sports tasks when told the task measures "natural athletic ability" (activating the stereotype that Black athletes are naturally superior).
Stereotype threat is direct evidence that IQ test scores reflect social context, not just cognitive ability. It demonstrates that part of an observed gap can be produced by the testing situation itself.
Test bias. IQ tests were developed and normed on white, middle-class, English-speaking populations. Many test items draw on culturally specific knowledge and reasoning styles. The psychologist Jelte Wicherts and colleagues have shown that test items often function differently across cultural groups: an item may be measuring something different in one cultural context than in another.
Heritability within groups does not imply heritability between groups
This is the most common logical error in the race-and-IQ debate, and it requires careful explanation.
Heritability is a population statistic. It measures the proportion of variance in a trait within a defined population that is attributable to genetic variation within that population. If heritability of IQ within white Americans is 0.6, that means 60% of the variation in IQ scores among white Americans is associated with genetic differences among those individuals.
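In the simplest textbook decomposition, which assumes that genetic and environmental influences add together and are uncorrelated (an assumption real data often violate), phenotypic variance splits into a genetic and an environmental part, and heritability is the genetic share:

```latex
V_P = V_G + V_E, \qquad h^2 = \frac{V_G}{V_P} = \frac{V_G}{V_G + V_E}
```

For example, IQ scores are scaled to a standard deviation of 15, so $V_P = 15^2 = 225$. A heritability of 0.6 then corresponds to $V_G = 0.6 \times 225 = 135$ and $V_E = 90$. Nothing in this calculation refers to the group's mean, which is why a heritability estimate alone cannot explain a difference in means between groups.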
Heritability within a group says nothing about the cause of differences between groups. The classic example: within a population, height is highly heritable — tall parents tend to have tall children. But the height difference between Americans today and Americans in 1800 is almost entirely environmental (better nutrition). High within-group heritability does not prevent between-group differences from being environmental.
This logical point was made forcefully by the geneticist Richard Lewontin in 1970 and remains a cornerstone of the scientific argument: even if IQ is highly heritable within racial groups, the IQ difference between racial groups could be entirely environmental.
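Lewontin's point can be checked with a small simulation. The sketch below is a hypothetical illustration, not a model of real IQ data: two groups draw genetic values from the same distribution, within each group 60% of the variance is genetic by construction, and the only difference between the groups is an environmental shift of one unit applied to every member of the second group. The function names and parameter values are assumptions chosen for illustration.

```python
import random
import statistics

def simulate_group(n, env_shift, rng, var_g=0.6, var_e=0.4):
    """Simulate one hypothetical group. Genetic values come from the SAME
    distribution in every group; env_shift moves only the environmental mean."""
    genes = [rng.gauss(0.0, var_g ** 0.5) for _ in range(n)]
    phenotypes = [g + rng.gauss(env_shift, var_e ** 0.5) for g in genes]
    return genes, phenotypes

def within_group_h2(genes, phenotypes):
    """Share of phenotypic variance accounted for by genetic variance within one group."""
    return statistics.pvariance(genes) / statistics.pvariance(phenotypes)

rng = random.Random(1970)  # fixed seed so the run is reproducible
genes_a, pheno_a = simulate_group(20_000, env_shift=0.0, rng=rng)
genes_b, pheno_b = simulate_group(20_000, env_shift=-1.0, rng=rng)

print(f"h2 within A: {within_group_h2(genes_a, pheno_a):.2f}")  # near 0.60
print(f"h2 within B: {within_group_h2(genes_b, pheno_b):.2f}")  # near 0.60
print(f"genetic gap: {statistics.mean(genes_a) - statistics.mean(genes_b):.2f}")  # near 0
print(f"phenotypic gap: {statistics.mean(pheno_a) - statistics.mean(pheno_b):.2f}")  # near 1
```

Heritability comes out near 0.6 inside each group, yet the entire difference between the group means is environmental, because it was created by shifting the environment alone.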
Intelligence as a cultural construct
What counts as "intelligent" varies across cultures. Western psychology emphasizes analytical reasoning, abstract problem-solving, and speed of cognitive processing — the skills measured by IQ tests. Many cultures value other cognitive abilities more highly: social intelligence, practical problem-solving, oral tradition and memorization, ecological knowledge, or artistic and musical skill.
Cross-cultural research has documented that many Indigenous American communities understand intelligence as a social and practical capacity (the ability to contribute to the community, solve practical problems collaboratively, and maintain harmonious relationships) rather than as an individual, abstract, analytical capacity. Similar observations have been made across African, Asian, and Pacific Islander cultures.
IQ tests measure a particular kind of cognitive ability — the kind most valued in Western, industrialized, literate societies. This is not a criticism of IQ tests per se; the abilities they measure are genuinely useful in those contexts. But it is a reason to be cautious about generalizing from IQ scores to "intelligence" in any universal sense.
Why this matters
The misuse of intelligence testing has caused real harm. Eugenics programs in the United States forcibly sterilized tens of thousands of people classified as "feeble-minded" on the basis of IQ tests, disproportionately targeting poor people, people of color, and immigrants. Army intelligence-test data were cited in the debates that produced the Immigration Act of 1924, whose national-origin quotas later blocked many Jewish refugees fleeing Nazi Germany. In education, IQ testing has been used to track children into dead-end programs, limiting their opportunities based on a single score.
Honest coverage of this topic requires acknowledging both what IQ tests do well (they predict academic and occupational outcomes within populations, they identify cognitive disabilities, they provide useful data for research) and what they do not do (they do not measure innate biological intelligence, they cannot be used to compare the genetic potential of racial groups, they do not capture the full range of human cognitive abilities).
Emotional intelligence Master
Peter Salovey and John Mayer introduced the concept of emotional intelligence (EI) in 1990, defining it as the ability to perceive, use, understand, and manage emotions. Daniel Goleman's 1995 bestselling book popularized the concept and made broad claims about its importance for life success.
The Mayer-Salovey-Caruso model identifies four branches, arranged from basic to advanced:
- Perceiving emotions — detecting emotions in oneself and others through facial expressions, voice tone, and body language.
- Using emotions to facilitate thought — harnessing emotions to support reasoning. A positive mood can enhance creative thinking; a neutral or slightly negative mood can sharpen analytical thinking.
- Understanding emotions — recognizing how emotions combine, change over time, and transition from one to another: for example, understanding that irritation can escalate into anger, or that relief often follows anxiety.
- Managing emotions — regulating emotions in oneself and others. The ability to calm down when angry, cheer up when discouraged, or help another person manage their emotional state.
Emotional intelligence is measured through ability-based tests (such as the MSCEIT, which presents emotion-related problems with objectively scored answers) and self-report questionnaires (such as the EQ-i). The two approaches measure somewhat different things — ability-based EI shows modest correlations with cognitive intelligence, while self-report EI correlates more strongly with personality traits like agreeableness and emotional stability.
The predictive validity of EI is debated. A meta-analysis by O'Boyle, Humphrey, Pollack, Hawver, and Story (2011) found that EI predicted job performance modestly, above and beyond cognitive ability and personality, but the effect sizes were smaller than claimed in popular accounts. EI appears to be most important in jobs that require substantial interpersonal interaction — management, therapy, teaching, sales.
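"Above and beyond" is usually assessed with hierarchical regression: enter cognitive ability first, then add EI, and ask how much the explained variance (R²) increases. For two standardized predictors, R² can be computed directly from the three pairwise correlations. The sketch below uses that closed-form formula with hypothetical correlations, not the values reported by O'Boyle and colleagues, and it leaves personality out for simplicity.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation of y on two standardized predictors,
    computed from the three pairwise correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

def incremental_validity(r_y1, r_y2, r_12):
    """Delta R^2: variance in y explained by predictor 2 beyond predictor 1."""
    return r_squared_two_predictors(r_y1, r_y2, r_12) - r_y1**2

# Hypothetical correlations: job performance with cognitive ability (0.50),
# job performance with EI (0.30), cognitive ability with EI (0.25)
print(round(incremental_validity(0.50, 0.30, 0.25), 3))  # 0.033
# Same correlations, but EI assumed to overlap more with cognitive ability
print(round(incremental_validity(0.50, 0.30, 0.50), 3))  # 0.003
```

Because EI is assumed to overlap with cognitive ability, its unique contribution (about 0.03 of the variance here) is much smaller than its squared simple correlation with performance (0.09). Raising the assumed overlap shrinks the increment further.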
The broader significance of emotional intelligence is conceptual: it extends the domain of "intelligence" beyond the cognitive abilities measured by IQ tests, consistent with Gardner's interpersonal and intrapersonal intelligences and Sternberg's practical intelligence. Whether EI is truly a form of intelligence or a set of personality traits and social skills remains debated.
The heritability of intelligence: quantitative details Master
The heritability of IQ is not a fixed number. It increases across the lifespan, from roughly 0.20 in infancy to roughly 0.60 in adulthood and as high as 0.80 in late adulthood. This counterintuitive finding, that genes matter more as people age, is usually explained by the fact that people increasingly select environments that match their genetic propensities. A young child's intellectual environment is largely arranged by parents; a child with a predisposition toward intellectual curiosity may simply be given books. As an adult, that same person actively seeks out intellectually stimulating environments. The genetic influence is amplified by these self-selected environments, a phenomenon called gene-environment correlation.
Twin studies compare the IQ correlation between monozygotic (identical) twins, who share nearly 100% of their genes, and dizygotic (fraternal) twins, who share roughly 50%. If IQ were entirely genetic, monozygotic twins would correlate at 1.0 and dizygotic twins at 0.5. The actual correlations are approximately 0.85 for monozygotic twins raised together, 0.75 for monozygotic twins raised apart, and 0.60 for dizygotic twins raised together. These figures support substantial heritability but also substantial environmental influence.
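The logic of comparing MZ and DZ correlations is often summarized by Falconer's formula, a rough estimator that assumes purely additive genetic effects, equally similar environments for identical and fraternal twins, and no assortative mating. Applied to the twins-raised-together correlations above, a minimal sketch (the function name is hypothetical):

```python
def falconer(r_mz, r_dz):
    """Falconer's variance-component estimates from twins reared together."""
    h2 = 2 * (r_mz - r_dz)  # additive genetic share
    c2 = 2 * r_dz - r_mz    # shared (family) environment
    e2 = 1 - r_mz           # non-shared environment plus measurement error
    return h2, c2, e2

h2, c2, e2 = falconer(r_mz=0.85, r_dz=0.60)
print(round(h2, 2), round(c2, 2), round(e2, 2))  # 0.5 0.35 0.15
```

With these figures the estimate is about 50% genetic, 35% shared environment, and 15% non-shared environment. The correlation for identical twins raised apart (about 0.75) gives a different, higher estimate, one reason heritability is usually reported as a range rather than a single number.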
Adoption studies add further evidence. The Texas Adoption Study found that adopted children's IQs correlated more strongly with their biological mothers (whom they had never met) than with their adoptive mothers. Similarly, the Minnesota Study of Twins Reared Apart found that monozygotic twins raised in different families still had highly correlated IQs, while also showing the influence of the adoptive family environment on specific cognitive abilities.
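The key insight from behavioral genetics is that heritability is not destiny. Even with a heritability of 0.80, 20% of the variance in a trait is associated with environmental differences, and heritability describes variation within the current range of environments rather than setting a limit on change, so environmental interventions can still produce large shifts. The Flynn effect, which on some tests (notably Raven's Progressive Matrices) has produced gains of 15-20 points within a single generation in some countries, demonstrates that environmental changes can produce effect sizes comparable to the Black-white gap within a few decades, far faster than any genetic change could occur.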
Eric Turkheimer and colleagues (2003) demonstrated that the heritability of IQ depends on socioeconomic status. In impoverished families, about 60% of the variation in IQ was explained by shared environment and the genetic contribution was close to zero. In affluent families, the pattern was almost exactly reversed, with substantial heritability. This means that the "nature vs. nurture" question has no single answer: the relative contribution of genes and environment depends on the range of environments in the population being studied. In environments where basic needs are unmet, environment dominates. In environments where basic needs are met, genetic differences have more room to express themselves.
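The dependence on the range of environments follows from the definition of heritability as a ratio of variances. The sketch below holds a hypothetical genetic variance fixed and varies only the environmental variance. It is not a reconstruction of Turkheimer's statistical model; it only shows the arithmetic point that the same genes can yield different heritabilities.

```python
def heritability(var_g, var_e):
    """h^2 = V_G / (V_G + V_E), assuming additive, uncorrelated genetic and
    environmental effects (a textbook simplification)."""
    return var_g / (var_g + var_e)

VAR_G = 135.0  # hypothetical genetic variance, in squared IQ points

# Environments fairly uniform: genetic differences account for more of the variance
print(heritability(VAR_G, var_e=90.0))   # 0.6
# Environments ranging from deprived to enriched: same genes, lower heritability
print(heritability(VAR_G, var_e=315.0))  # 0.3
```

Nothing about the genes changed between the two calls; only the spread of environments did.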
Advanced topics: AI and intelligence Master
The relationship between artificial intelligence and human intelligence raises fundamental questions about what intelligence is, whether it is uniquely biological, and what it means to "measure" intelligence in a machine.
The Turing Test and its limitations
Alan Turing proposed in 1950 that the question "can machines think?" should be replaced by a practical test: if a human judge, conversing by text with a machine and a human, cannot reliably tell which is which, the machine should be considered intelligent. The Turing Test has been influential but heavily criticized.
The philosopher John Searle argued with his Chinese Room thought experiment (1980) that passing the Turing Test does not demonstrate understanding. A person in a room who follows instructions to manipulate Chinese symbols can produce convincing Chinese responses without understanding a word of Chinese. The room passes the Turing Test for Chinese, but there is no understanding — only symbol manipulation. Searle argued that computers, which manipulate symbols syntactically without semantics, are in the same position.
The Turing Test also reflects a culturally specific, language-centric view of intelligence. It tests the ability to produce linguistically appropriate responses — a subset of what humans consider intelligent behavior.
What AI systems actually do
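Modern AI systems, including large language models, image recognition systems, and game-playing programs, demonstrate impressive performance on specific tasks. OpenAI reported that GPT-4 scored around the 90th percentile on a simulated bar exam (a figure later disputed) and around the 99th percentile on the verbal section of the GRE, and such models perform well on many standardized intelligence tests. AlphaGo and its successors defeated the world's best Go players. DeepMind's AlphaFold predicted protein structures with accuracy approaching that of experimental methods for many proteins, a major advance on the long-standing protein structure prediction problem.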
But these systems lack many features that psychologists consider central to intelligence: they do not have goals, desires, or motivations; they do not generalize flexibly across domains without retraining; they do not learn efficiently from small amounts of data the way humans do; and they do not have the embodied, social, and emotional intelligence that Gardner and Sternberg emphasize.
The gap between narrow AI performance on specific tasks and general intelligence (sometimes called AGI) remains large. Whether this gap will be closed by scaling current approaches, by new architectures, or not at all is an open question.
Implications for intelligence theory
AI research has practical implications for the study of human intelligence. If machines can achieve high performance on IQ tests without possessing anything that most researchers would call "general intelligence," this raises questions about what IQ tests actually measure. It suggests that the skills assessed by IQ tests — pattern recognition, logical reasoning, vocabulary knowledge — can be performed by systems that lack consciousness, understanding, or genuine cognitive flexibility.
Conversely, the things humans find easy — recognizing a face, walking across a cluttered room, understanding a joke, reading social cues — have proven far more difficult for AI than the abstract reasoning tasks that IQ tests emphasize. This supports the arguments of Gardner, Sternberg, and others that intelligence is broader and more diverse than what IQ tests capture.
Connections to other fields Master
Philosophy of mind
The study of cognition connects directly to the philosophy of mind. The question of what thought is — a computational process, a biological phenomenon, an embodied and embedded activity — has been debated since Descartes. Functionalism (the dominant position in philosophy of mind) holds that mental states are defined by their functional role, not by their physical substrate — which is compatible with the idea that machines could, in principle, think. Embodied cognition challenges this, arguing that cognition depends on having a body interacting with a physical environment, not just on symbol manipulation.
The heuristics-and-biases program has philosophical implications for epistemology. If human reasoning is systematically biased, this raises questions about the reliability of human knowledge and the nature of rationality. Some researchers, notably the psychologist Keith Stanovich, distinguish between rationality (the ability to form true beliefs and make good decisions) and intelligence (the cognitive capacity measured by IQ tests). On this view, high intelligence does not guarantee rationality, because System 2 can be used to rationalize biases rather than overcome them.
Economics and decision science
Prospect theory transformed economics by providing a more descriptively accurate model of decision-making under risk than expected utility theory. Behavioral economics, built on the foundations laid by Tversky and Kahneman, studies how actual human economic behavior deviates from the predictions of rational-choice models. Richard Thaler and Cass Sunstein's Nudge (2008) applied behavioral economics to policy, arguing that governments can improve outcomes by designing choice architectures that account for cognitive biases.
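A central piece of prospect theory is its value function, defined over gains and losses relative to a reference point: concave for gains, convex for losses, and steeper for losses than for gains (loss aversion). The sketch below uses the power form with the median parameter estimates (alpha = beta = 0.88, lambda = 2.25) that Tversky and Kahneman reported for their 1992 cumulative version of the theory. It omits the probability-weighting function, and the function name is hypothetical.

```python
def pt_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Prospect-theory value of an outcome x, coded as a gain (x > 0) or
    a loss (x < 0) relative to the reference point."""
    if x >= 0:
        return x ** alpha          # concave over gains
    return -lam * (-x) ** beta     # convex and steeper over losses

gain = pt_value(100)
loss = pt_value(-100)
print(round(gain, 1), round(loss, 1))
print(round(-loss / gain, 2))  # 2.25: the loss-aversion ratio when alpha == beta
```

Losing 100 carries about 2.25 times the subjective weight of gaining 100, and moving from a gain of 100 to 200 adds less value than moving from 0 to 100, which captures diminishing sensitivity.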
Neuroscience
Cognitive neuroscience uses brain imaging (fMRI, EEG, PET) to study the neural correlates of cognitive processes. Research on working memory (Baddeley's model), attentional networks (Posner), and decision-making (the role of the prefrontal cortex and the dopamine system) connects psychological theories of cognition to underlying brain mechanisms. The discovery that the brain's default mode network is active during rest and mind-wandering has shaped research on creativity and insight.
Education
Gardner's multiple intelligences, Sternberg's triarchic theory, and research on stereotype threat have all influenced educational practice. Understanding that intelligence is multifaceted supports differentiated instruction. Understanding stereotype threat supports interventions to close achievement gaps. Understanding heuristics and biases supports critical thinking education.
Historical and philosophical context Master
The prehistory of intelligence testing
The desire to measure mental ability predates psychology. Imperial China used civil service examinations for over a thousand years, testing knowledge of Confucian classics, poetry composition, and policy analysis. These examinations were meritocratic in principle but reflected the cultural and educational capital of the scholar-gentry class.
In the nineteenth century, Francis Galton — Charles Darwin's cousin — established the eugenics movement and attempted to measure intelligence through sensory acuity (reaction time, visual discrimination, hearing sensitivity). Galton's approach failed: sensory measures did not correlate well with mental ability. But his statistical innovations (correlation, regression to the mean) became foundational for the field.
Galton's eugenics program (improving the human species through selective breeding) directly influenced the development of intelligence testing. Binet's test was a practical educational tool, but its American adapters (Terman, Henry Goddard, Robert Yerkes) brought strong hereditarian and eugenic convictions to its use. Goddard tested immigrants arriving at Ellis Island and reported that large proportions of several immigrant groups were "feeble-minded," findings that were invoked in arguments for immigration restriction. Yerkes oversaw the Army testing program whose data were later cited in support of the Immigration Act of 1924.
The cognitive revolution
The study of cognition was suppressed in American psychology during the behaviorist era (roughly 1913-1956), when John B. Watson and B. F. Skinner argued that psychology should study only observable behavior, not internal mental processes. The "cognitive revolution" of the 1950s and 1960s — driven by George Miller, Jerome Bruner, Noam Chomsky, Allen Newell, and Herbert Simon — restored the study of mental processes to scientific respectability.
Chomsky's 1959 review of Skinner's Verbal Behavior (1957) was a turning point. Chomsky argued that language could not be explained by stimulus-response associations alone: the creativity and generativity of language required innate cognitive structures. This argument opened the door to studying internal mental representations, which became the foundation of cognitive psychology.
The culture wars around intelligence
The intelligence debate has been repeatedly weaponized in political contexts. Jensen's 1969 article in the Harvard Educational Review was used to argue against compensatory education programs (like Head Start) on the grounds that IQ differences were genetic and therefore not remediable by environmental intervention. The Bell Curve (1994) was used to argue against affirmative action and social welfare programs.
These political uses of intelligence research have been criticized on multiple grounds. First, the scientific evidence does not support the conclusions being drawn (as documented above). Second, even if IQ differences were partly genetic, this would not justify policies that deny people opportunity — heritability within groups says nothing about what interventions can accomplish. Third, the framing of the debate in terms of "racial differences in intelligence" treats race as a biological category, when modern genetics shows that racial categories are social constructs that do not map neatly onto genetic variation.
Genomic research in the wake of the Human Genome Project has confirmed that there is more genetic variation within any racial group than between racial groups. The concept of "race" used in the IQ debate (Black, white, Hispanic, Asian) captures only a small fraction of human genetic diversity. These categories reflect social and historical processes (slavery, colonialism, immigration patterns), not biological boundaries.
Bibliography Master
Primary sources
- Binet, A. & Simon, T. (1905). Méthodes nouvelles pour le diagnostic du niveau intellectuel des anormaux. L'Année Psychologique, 11, 191-244.
- Spearman, C. (1904). "General Intelligence," Objectively Determined and Measured. American Journal of Psychology, 15, 201-292.
- Terman, L. M. (1916). The Measurement of Intelligence. Boston: Houghton Mifflin.
- Thurstone, L. L. (1938). Primary Mental Abilities. Chicago: University of Chicago Press.
- Tversky, A. & Kahneman, D. (1974). Judgment under Uncertainty: Heuristics and Biases. Science, 185, 1124-1131.
- Kahneman, D. & Tversky, A. (1979). Prospect Theory: An Analysis of Decision under Risk. Econometrica, 47, 263-291.
- Gardner, H. (1983). Frames of Mind: The Theory of Multiple Intelligences. New York: Basic Books.
- Sternberg, R. J. (1985). Beyond IQ: A Triarchic Theory of Human Intelligence. Cambridge: Cambridge University Press.
- Steele, C. M. & Aronson, J. (1995). Stereotype Threat and the Intellectual Test Performance of African Americans. Journal of Personality and Social Psychology, 69, 797-811.
- Salovey, P. & Mayer, J. D. (1990). Emotional Intelligence. Imagination, Cognition and Personality, 9, 185-211.
Secondary sources and reviews
- Kahneman, D. (2011). Thinking, Fast and Slow. New York: Farrar, Straus and Giroux.
- Neisser, U. et al. (1996). Intelligence: Knowns and Unknowns. American Psychologist, 51, 77-101.
- Nisbett, R. E. et al. (2012). Intelligence: New Findings and Theoretical Developments. American Psychologist, 67, 130-159.
- Gould, S. J. (1981). The Mismeasure of Man. New York: W. W. Norton.
- Turkheimer, E. et al. (2003). Socioeconomic Status Modifies Heritability of IQ in Young Children. Psychological Science, 14, 623-628.
- Flynn, J. R. (2007). What Is Intelligence? Cambridge: Cambridge University Press.
- Gigerenzer, G. (2007). Gut Feelings: The Intelligence of the Unconscious. New York: Viking.
- Boroditsky, L. (2011). How Language Shapes Thought. Scientific American, 304, 62-65.
- Winawer, J. et al. (2007). Russian Blues Reveal Effects of Language on Color Discrimination. Proceedings of the National Academy of Sciences, 104, 7780-7785.
- Stanovich, K. E. (2010). Rationality and the Reflective Mind. Oxford: Oxford University Press.
- Whorf, B. L. (1956). Language, Thought, and Reality. Cambridge: MIT Press.