29.01.01 · psychology / intro-methods

Introduction to psychology and research methods

shipped · 3 tiers · Lean: none

Anchor (Master): primary sources: Wundt 1874, James 1890, Freud 1900, Watson 1913, Skinner 1938, Milgram 1963, Open Science Collaboration 2015, Henrich et al. 2010; secondary: Hergenhahn, Henley, Fancher

Intuition Beginner

Psychology is the scientific study of behavior and mental processes. That definition carries two commitments. First, psychology studies what organisms do — behavior you can observe and measure. Second, it studies what happens inside the organism — thoughts, emotions, memories, motivations — the mental processes that accompany and drive behavior.

Psychology is not mind-reading, and it is not common sense dressed up in lab coats. Most people have strong intuitions about why humans act the way they do. Those intuitions are sometimes right and often wrong. The difference between a psychologist and someone speculating at a dinner party is method: the psychologist tests claims against evidence using methods designed to rule out alternative explanations.

The field grew out of two older traditions. Philosophy asked questions about the nature of mind, knowledge, and free will. Physiology studied the nervous system and the senses. In 1879, Wilhelm Wundt opened the first psychology laboratory in Leipzig, Germany, marking the transition from armchair speculation to empirical investigation. William James published The Principles of Psychology in 1890, giving the field its American foundation.

Around the same period, Sigmund Freud developed psychoanalysis in Vienna, proposing that unconscious drives — especially sexual and aggressive ones — shape behavior in ways the conscious mind does not recognize. Freud's ideas were enormously influential in Western culture but proved difficult to test scientifically. In the early twentieth century, John B. Watson and B. F. Skinner rejected the study of inner experience entirely, launching behaviorism. They argued that psychology should study only observable behavior, and that all complex behavior is learned through conditioning and reinforcement.

By the 1950s and 1960s, the cognitive revolution brought mental processes back. Psychologists began treating the mind as an information processor — encoding, storing, and transforming knowledge. More recently, brain imaging, genetics, and neuroscience have added a biological layer of explanation. Each of these schools captured something real about human nature, and each also missed important things. Contemporary psychology draws on all of them.

Psychology uses the same scientific method as the natural sciences: observe, form a hypothesis, test it with a study, analyze the data, and revise. A good study specifies its variables in advance, uses appropriate controls, and reports its methods in enough detail for others to replicate.

One further limitation shapes the entire discipline. Most published research in psychology comes from a narrow slice of humanity: people who are Western, Educated, Industrialized, Rich, and Democratic — abbreviated WEIRD. One review found that about 68% of participants in published psychology studies come from the United States alone, and most of those are undergraduates taking psychology courses for credit. Findings based on this population are often presented as universal truths about human nature. They may instead be truths about a particular demographic.

Visual Beginner

The table below maps the five major research designs used in psychology. The key distinction is whether the researcher manipulates a variable (experimental) or merely observes what naturally occurs (non-experimental).

Design	What the researcher does	Key strength	Key limitation
Experimental	Manipulates IV, randomly assigns participants to groups	Can establish causation	Lab settings may be artificial
Correlational	Measures two variables and computes their statistical relationship	Can study variables that cannot ethically be manipulated	Cannot establish causation
Observational	Watches and records behavior in natural settings	Captures authentic, real-world behavior	Observer bias; no control over confounds
Case study	Conducts deep investigation of one person or event	Rich detail; useful for rare phenomena	Cannot generalize to populations
Longitudinal	Tracks the same participants across months or years	Shows how individuals change over time	Expensive; participants drop out

Only experimental designs — where the researcher randomly assigns participants to conditions and controls confounding variables — support causal claims. Correlational designs reveal relationships but leave open the possibility that a third variable causes both. Observational, case study, and longitudinal designs each offer unique advantages but share the same fundamental limitation: no causal inference without experimental manipulation.

Worked example Beginner

A researcher wants to know whether sleep deprivation affects memory. She recruits 40 participants and randomly assigns them to two groups. The control group sleeps normally (8 hours). The experimental group stays awake for 24 hours. Both groups then take the same 20-word recall test.

The independent variable (IV) is sleep condition: 8 hours of sleep versus 24 hours awake. The dependent variable (DV) is the number of words correctly recalled. Random assignment distributes individual differences in memory ability roughly equally across both groups, which is what makes causal inference possible.

Results: the control group recalls an average of 14.2 words. The experimental group recalls an average of 9.8 words. The difference is 4.4 words.

Does this prove that sleep deprivation causes worse memory? Not by itself. The researcher needs a statistical test to determine whether the 4.4-word difference is large enough to be unlikely if there were no real effect — if the difference were due to random chance alone.

A t-test comparing the two group means yields $p = 0.003$ . This means: if sleep deprivation had no real effect on memory, there would be only a 0.3% chance of observing a difference this large or larger from random assignment. The researcher concludes that the evidence supports the hypothesis: sleep deprivation impairs memory recall.

Notice what the $p$ -value does NOT say. It does not say there is a 99.7% chance the hypothesis is true. It does not say the effect is large or important. It says: given no real effect, data this extreme would be rare. That is a specific and somewhat counterintuitive claim, and misunderstanding it is one of the most common statistical errors in the entire field.
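The logic of "how often would random assignment alone produce a difference this large?" can be sketched directly as a permutation test. This is an illustration, not the t-test reported above: the individual scores are invented so that the group means equal 14.2 and 9.8, and the function name `permutation_p_value` and its parameters are assumptions. Because the data are made up, the resulting $p$-value will not match the 0.003 in the example.

```python
import random

# Hypothetical recall scores (out of 20), invented for illustration.
# They are chosen only so that the group means equal 14.2 and 9.8.
control = [14, 15, 13, 16, 12, 14, 17, 15, 13, 14,
           16, 12, 15, 14, 13, 18, 11, 14, 15, 13]
deprived = [10, 9, 11, 8, 12, 10, 7, 9, 11, 10,
            13, 8, 10, 9, 12, 6, 11, 10, 9, 11]


def mean_of(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Two-sided p-value for a difference in means under random relabeling."""
    rng = random.Random(seed)
    observed = abs(mean_of(group_a) - mean_of(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Reassign the same 40 scores to two groups at random.
        rng.shuffle(pooled)
        diff = abs(mean_of(pooled[:n_a]) - mean_of(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)


observed_difference = mean_of(control) - mean_of(deprived)
p_value = permutation_p_value(control, deprived)
print(f"observed difference: {observed_difference:.1f} words")
print(f"permutation p-value: {p_value:.4f}")
```

The returned value is the share of random relabelings that produce a difference at least as large as the observed one. That is a statement about the data under the assumption of no effect, not about the probability that the hypothesis is true.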

Check your understanding Beginner

Exercise 1 (easy, multiple choice).

A researcher finds that children who eat breakfast score higher on math tests. Can the researcher conclude that eating breakfast causes better math performance?

A. Yes, because the data show a relationship.
B. No, because this is a correlational finding and other factors could explain both the breakfast habit and the test scores.
C. Yes, as long as the sample size was large enough.
D. No, because breakfast has no effect on cognitive performance.

Hint

Think about whether the researcher manipulated any variable or merely observed two things that tend to co-occur.

Answer

Option B.

This is a correlational observation, not an experiment. Children who eat breakfast may also have more stable home environments, better overall nutrition, or more parental involvement in schoolwork — any of which could explain both the breakfast habit and the higher test scores. Without random assignment, causation cannot be established.

Formal definition Intermediate+

A hypothesis is a testable prediction about the relationship between variables. An independent variable (IV) is the variable the researcher manipulates. A dependent variable (DV) is the variable the researcher measures. An operational definition specifies how a variable will be measured or quantified. "Aggression" is not directly measurable; an operational definition might specify it as the number of times a child strikes a Bobo doll during a 10-minute observation period.

A confounding variable (confound) is a variable that varies systematically with the IV and also affects the DV. If the sleep-deprivation study ran the control group in the morning and the experimental group at night, time of day would be a confound. The memory difference could be caused by fatigue at the end of the day rather than by sleep loss.

Random assignment distributes known and unknown individual differences across conditions, protecting internal validity. Random selection (sampling) gives every member of the target population an equal chance of being included, supporting external validity (generalizability). These are different mechanisms serving different purposes.
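The two mechanisms can be kept apart with a minimal sketch. The population of 1,000 ID numbers, the sample size of 40, and all variable names below are assumptions made for illustration.

```python
import random

rng = random.Random(42)

# Hypothetical target population of 1,000 people, labeled by ID number.
population = list(range(1000))

# Random selection: every member has the same chance of entering the sample.
# This step bears on external validity.
sample = rng.sample(population, 40)

# Random assignment: shuffle the sample, then split it into two conditions.
# This step bears on internal validity.
shuffled = sample[:]
rng.shuffle(shuffled)
control_group = shuffled[:20]
experimental_group = shuffled[20:]

print(len(control_group), len(experimental_group))
```

A study can use either step without the other. A convenience sample of undergraduates can still be randomly assigned to conditions, which protects the causal conclusion but not its generalizability.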

Research designs — formal taxonomy

Experimental design. The researcher (1) manipulates the IV, (2) randomly assigns participants to conditions, (3) measures the DV, and (4) controls known confounds. Only experimental designs support causal inference. Subtypes include between-subjects designs (different participants in each condition) and within-subjects designs (same participants in all conditions, with order counterbalanced to control carryover effects).

Quasi-experimental design. The researcher compares pre-existing groups that differ on the variable of interest but cannot randomly assign participants. Example: comparing depression rates in two cities with different air pollution levels. The lack of random assignment weakens but does not eliminate causal claims; statistical controls (covariates, matching) can partially compensate.

Correlational design. The researcher measures two or more variables without manipulation and quantifies their relationship. The Pearson correlation coefficient $r$ ranges from $-1$ (perfect negative linear relationship) through $0$ (no linear relationship) to $+1$ (perfect positive linear relationship). Correlation does not imply causation for three reasons: a third variable (confound) may cause both measured variables, the causal direction may run opposite to the one assumed, or the relationship may be coincidental.
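As a rough sketch of how $r$ is computed, the function below divides the summed cross-products of deviations by the product of the two spreads. The function name `pearson_r` and the three small datasets are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


# Hypothetical data, invented for illustration.
hours_studied = [1, 2, 3, 4, 5]
quiz_score = [2, 4, 6, 8, 10]    # rises in lockstep with hours
errors_made = [10, 8, 6, 4, 2]   # falls in lockstep with hours

r_positive = pearson_r(hours_studied, quiz_score)
r_negative = pearson_r(hours_studied, errors_made)
print(round(r_positive, 3), round(r_negative, 3))
```

The two perfectly linear datasets give values at the ends of the scale. Real data almost always fall somewhere in between, and a value near the ends still says nothing about which variable, if either, is the cause.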

Observational design. The researcher observes behavior in its natural context without intervention. Variants include naturalistic observation (no involvement), participant observation (researcher joins the group being studied), and structured observation (predefined behavioral coding scheme). Observer bias and reactivity (participants changing behavior when watched) threaten validity.

Case study. An intensive investigation of a single individual, group, or event. Case studies generate hypotheses and document rare or unusual phenomena. Phineas Gage, who survived an iron rod through his frontal lobe in 1848, provided early evidence for localized brain function. Case studies cannot test hypotheses or support generalization.

Longitudinal design. The same participants are assessed repeatedly over an extended period, revealing within-person change and developmental trajectories. Attrition (systematic dropout) is the primary threat: participants who remain in the study may differ from those who leave, biasing results.

Cross-sectional design. Different groups (typically different ages) are compared at a single time point. Faster and cheaper than longitudinal designs, but confounded by cohort effects — differences between age groups may reflect generational experiences rather than developmental change.

Descriptive statistics

Descriptive statistics summarize a dataset without drawing inferences beyond it.

Central tendency. The mean is the arithmetic average: $\bar{x} = \frac{1}{n} \sum_{i = 1}^{n} x_{i}$. The median is the middle value when scores are ordered. The mode is the most frequent value. The mean is sensitive to outliers; the median is robust. For the dataset {2, 3, 3, 5, 100}, the mean is 22.6 (pulled upward by the outlier), the median is 3, and the mode is 3.
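The three measures can be checked with Python's standard `statistics` module. The variable name `scores` is an assumption; the data are the five values from the paragraph above.

```python
from statistics import mean, median, mode

scores = [2, 3, 3, 5, 100]  # the dataset from the paragraph above

print(mean(scores))    # 22.6
print(median(scores))  # 3
print(mode(scores))    # 3
```

Replacing 100 with 5 drops the mean to 3.6 while the median and mode stay at 3, which is what "robust to outliers" means in practice.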

Variability. The range is the maximum minus the minimum. The variance $s^{2}$ is, roughly, the average squared deviation from the mean; for a sample, the sum of squared deviations is divided by $n - 1$ rather than $n$. The standard deviation $s$ is the square root of the variance; it is in the same units as the original data, making it more interpretable. A small $s$ indicates scores cluster tightly around the mean; a large $s$ indicates scores are spread out.
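A short sketch of these measures, reusing the five-score dataset {2, 3, 3, 5, 100} from the central-tendency example. The variable names are assumptions. The standard library separates the sample variance (divides by $n - 1$) from the population variance (divides by $n$).

```python
from statistics import pvariance, stdev, variance

scores = [2, 3, 3, 5, 100]

value_range = max(scores) - min(scores)   # maximum minus minimum
sample_variance = variance(scores)        # sum of squared deviations / (n - 1)
population_variance = pvariance(scores)   # sum of squared deviations / n
sample_sd = stdev(scores)                 # square root of the sample variance

print(value_range, sample_variance, population_variance, round(sample_sd, 2))
```

The single outlier dominates every one of these numbers, so variability measures inherit the mean's sensitivity to extreme scores.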

The normal distribution. A symmetric, bell-shaped probability distribution defined by its mean $μ$ and standard deviation $σ$ . Approximately 68% of values fall within one standard deviation of the mean, approximately 95% within two standard deviations, and approximately 99.7% within three. Many psychological variables are approximately normally distributed in large samples.
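The 68%, 95%, and 99.7% figures can be recovered from the cumulative distribution function of a standard normal distribution. The helper name `share_within` is an assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The same proportions hold for any mean and standard deviation, because the rule is stated in standard-deviation units.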

Inferential statistics

Inferential statistics use sample data to draw conclusions about the populations from which the samples were drawn.

Null hypothesis significance testing (NHST). The researcher states a null hypothesis $H_{0}$ (no effect) and an alternative hypothesis $H_{1}$ (there is an effect). A test statistic is computed from the data, yielding a $p$ -value: the probability of obtaining a test statistic at least as extreme as the observed value, assuming $H_{0}$ is true. If $p < α$ (where $α$ is the significance level, conventionally set at 0.05), the result is deemed "statistically significant" and $H_{0}$ is rejected.

Type I error ($α$): rejecting $H_{0}$ when it is actually true (false positive). Setting $α = 0.05$ means accepting a 5% chance of a false positive whenever $H_{0}$ is true. Type II error ($β$): failing to reject $H_{0}$ when it is false (false negative). Statistical power ($1 - β$): the probability of correctly detecting a real effect. Power depends on three factors: effect size (larger effects are easier to detect), sample size (larger samples increase power), and $α$ (a more lenient threshold increases power but also increases Type I error).
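How the three factors move power can be sketched with a normal approximation for a two-group comparison of means. This is an approximation under stated assumptions (equal group sizes, a two-sided test, effect size expressed as a standardized mean difference $d$), not an exact t-test power calculation. The function name `approx_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    Uses the normal approximation and ignores the negligible chance of a
    significant result in the wrong direction.
    """
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)
    expected_z = d * sqrt(n_per_group / 2)
    return 1 - z.cdf(critical - expected_z)


print(round(approx_power(0.5, 20), 2))              # baseline
print(round(approx_power(0.8, 20), 2))              # larger effect
print(round(approx_power(0.5, 64), 2))              # larger sample
print(round(approx_power(0.5, 20, alpha=0.10), 2))  # more lenient threshold
```

Each of the last three calls changes one factor relative to the baseline, and each should return a higher value than the baseline.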

Effect size. A measure of the magnitude of an effect, independent of sample size. For any nonzero effect, the $p$-value shrinks toward zero as the sample grows; an effect-size estimate does not, it only becomes more precise. Common measures include Cohen's $d$ (the standardized difference between two group means: $d = (\bar{x}_{1} - \bar{x}_{2}) / s_{pooled}$) and Pearson's $r$. Cohen's guidelines: $d = 0.2$ is small, $d = 0.5$ is medium, $d = 0.8$ is large.
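A minimal sketch of Cohen's $d$ follows. The text does not spell out $s_{pooled}$, so the code assumes the usual definition: the square root of the two sample variances averaged with weights $n_{1} - 1$ and $n_{2} - 1$. The function name and the tiny datasets are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_1, group_2):
    """Standardized mean difference using a pooled standard deviation."""
    n1, n2 = len(group_1), len(group_2)
    # Assumed definition of the pooled variance (weights are n - 1).
    pooled_variance = (
        (n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)
    ) / (n1 + n2 - 2)
    return (mean(group_1) - mean(group_2)) / sqrt(pooled_variance)


# Hypothetical scores, invented for illustration.
treated = [2, 4, 6]
untreated = [1, 3, 5]

d = cohens_d(treated, untreated)
print(d)  # 0.5
```

Here the means differ by 1 and the pooled standard deviation is 2, so the groups sit half a standard deviation apart, a "medium" effect by Cohen's guidelines.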

Confidence interval (CI). A range of values that, in repeated sampling, would contain the true population parameter a specified percentage of the time. A 95% CI for a mean indicates that if the study were repeated many times, 95% of the computed intervals would contain the true population mean. CIs convey both the point estimate and its precision.
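The "repeated sampling" reading of a CI can be illustrated by simulation. The sketch below assumes a normally distributed population with a known standard deviation, so the interval is the sample mean plus or minus 1.96 standard errors. The population values (mean 100, SD 15), the sample size, and the function name are all assumptions.

```python
import random
from math import sqrt


def coverage_of_95_ci(true_mean=100.0, sigma=15.0, n=25, n_studies=2000, seed=1):
    """Share of simulated studies whose 95% CI contains the true mean."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / sqrt(n)  # sigma treated as known, for simplicity
    hits = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / n_studies


coverage = coverage_of_95_ci()
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The share should land close to 0.95. Any single interval either contains the true mean or does not; the 95% describes the procedure across many repetitions.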

Ethics in psychological research

Several landmark events shaped modern research ethics.

Milgram's obedience studies (1961-1962). Stanley Milgram recruited participants to administer what they believed were painful electric shocks to another person (a confederate) as part of a "learning experiment." No actual shocks were delivered, but participants did not know this. Milgram found that 65% of participants administered the maximum "shock" level (450 volts) when instructed by an authority figure. The study revealed powerful obedience effects but raised serious ethical concerns about deception, emotional distress, and the adequacy of informed consent.

Zimbardo's Stanford Prison Experiment (1971). Philip Zimbardo randomly assigned college students to play "guards" or "prisoners" in a mock prison in the basement of the Stanford psychology building. The study was planned for two weeks but terminated after six days because guards became aggressively abusive and prisoners showed extreme psychological distress. The study demonstrated how situational roles can override individual dispositions. The ethical problems were severe: participants were not adequately protected from harm, and Zimbardo served simultaneously as researcher and prison superintendent, compromising his ability to monitor participant welfare.

The Belmont Report (1979). Commissioned in response to ethical violations in biomedical and behavioral research — most notoriously the Tuskegee Syphilis Study, in which the U.S. Public Health Service studied the progression of untreated syphilis in 399 Black men without their informed consent for forty years. The Belmont Report established three principles. Respect for persons: individuals have the right to make autonomous decisions; people with diminished autonomy require additional protections. Beneficence: researchers must minimize harm and maximize benefits. Justice: the burdens and benefits of research should be distributed fairly across populations.

Institutional Review Boards (IRBs). All research involving human participants at institutions receiving U.S. federal funding must be reviewed and approved by an IRB before data collection begins. The IRB evaluates the risk-benefit ratio, the adequacy of informed consent procedures, protections for vulnerable populations (children, prisoners, cognitively impaired individuals), and the necessity of any deception used.

The WEIRD problem

Henrich, Heine, and Norenzayan (2010) documented that psychological research relies overwhelmingly on participants from WEIRD (Western, Educated, Industrialized, Rich, Democratic) societies. By their analysis, 96% of the samples in psychological studies come from countries representing just 12% of the world's population. Within the United States, the majority of research participants are undergraduate students: mostly aged 18-22, mostly taking introductory psychology, mostly from middle-class backgrounds.

This matters because WEIRD populations are, in many respects, statistical outliers among the world's peoples. Compared to non-WEIRD populations, WEIRD participants tend to be more individualistic, more analytically oriented (as opposed to holistically), more likely to attribute behavior to disposition rather than situation, more trusting of strangers, and more likely to make fairness judgments based on impartial principles rather than relationships. Findings based on WEIRD participants should not be assumed to generalize to all of humanity without empirical verification.

Key result: reproducibility, statistical power, and the replication crisis Intermediate+

In 2015, the Open Science Collaboration published the results of an unprecedented replication effort. Two hundred seventy researchers across multiple institutions attempted to replicate 100 studies published in three top-tier psychology journals. Only 36% of the replications produced statistically significant results in the same direction as the original studies. Mean effect sizes in the replications were approximately half the magnitude of the original reports.

This finding — that most published findings in a sample of high-profile psychology studies did not replicate — ignited the replication crisis (sometimes called the reproducibility crisis or the credibility revolution). The crisis is not unique to psychology. Similar problems have been documented in medicine (Ioannidis, 2005), economics (Camerer et al., 2016), and other empirical fields. But psychology became the flashpoint because of the scale and transparency of the replication effort and because the field responded with unusual speed and openness.

Causes of the replication crisis

Publication bias. Journals preferentially publish statistically significant results ($p < 0.05$). Studies that find no effect (null results) languish in researchers' file drawers or go unpublished. The resulting literature overstates the frequency and magnitude of real effects. Rosenthal (1979) called this the "file drawer problem." If 20 independent studies test the same false hypothesis, about one is expected to produce $p < 0.05$ by chance alone. If only that one study is published, the literature contains a false positive with no counterevidence visible.
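The arithmetic behind the 20-study example is short enough to write out. The sketch assumes the studies are independent and that the null hypothesis is true in every one of them; the variable names are illustrative.

```python
alpha = 0.05     # significance threshold
n_studies = 20   # independent tests of an effect that does not exist

expected_false_positives = n_studies * alpha
p_at_least_one = 1 - (1 - alpha) ** n_studies

print(expected_false_positives)    # 1.0
print(round(p_at_least_one, 2))    # 0.64
```

One false positive is the long-run average, not a guarantee: in roughly a third of such batches none of the 20 studies would reach significance, and in others two or more would.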

Low statistical power. Many published studies use samples too small to reliably detect the effects they investigate. An underpowered study that happens to produce a significant result will overestimate the true effect size — a phenomenon called the "winner's curse." If the true effect is small and the study is powered at 30%, the significant result is likely inflated, and replication attempts will find a smaller effect or none at all.
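The winner's curse can be made concrete with a small simulation. All settings are assumptions chosen for illustration: a true standardized effect of 0.5, 20 participants per group (roughly one-third power under a normal approximation), a population standard deviation of 1 treated as known, and a z-test in place of a t-test.

```python
import random
from math import sqrt


def simulate_winners_curse(true_d=0.5, n_per_group=20, n_studies=5000, seed=2):
    """Return (share of significant studies, mean estimate among them)."""
    rng = random.Random(seed)
    standard_error = sqrt(2 / n_per_group)  # population SD is 1 in both groups
    significant_estimates = []
    for _ in range(n_studies):
        treatment = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        estimate = sum(treatment) / n_per_group - sum(control) / n_per_group
        if abs(estimate) / standard_error > 1.96:  # z-test with known SD
            significant_estimates.append(estimate)
    share_significant = len(significant_estimates) / n_studies
    mean_significant = sum(significant_estimates) / len(significant_estimates)
    return share_significant, mean_significant


share_significant, mean_significant = simulate_winners_curse()
print(f"share of studies reaching significance: {share_significant:.2f}")
print(f"mean estimate among significant studies: {mean_significant:.2f} (true effect: 0.50)")
```

Only estimates that happen to land well above the true value clear the significance bar, so the average of the "published" estimates overshoots the true effect even though every individual study was run honestly.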

Questionable research practices (QRPs). Researchers have substantial flexibility in data analysis: collecting additional participants until results become significant, selectively reporting dependent variables that "worked," excluding "outlier" data points that weaken the result, and running many statistical tests but reporting only the significant ones. Simmons, Nelson, and Simonsohn (2011) demonstrated that a combination of common QRPs can inflate the false-positive rate from the nominal 5% to over 60%.

Small samples for small effects. Many psychological effects are small-to-medium in magnitude (Cohen's $d < 0.5$ ). Detecting them reliably requires large samples, but many published studies use 20-50 participants per condition — far below what a power analysis would recommend.
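A rough power analysis shows how quickly the required sample grows as effects shrink. The sketch uses a standard normal-approximation formula for a two-sided, two-group comparison, $n \approx 2 (z_{1 - α/2} + z_{power})^{2} / d^{2}$ per group. The 80% power target is a common convention, not something specified in the text, and the function name is hypothetical. Exact t-based calculations give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def approx_n_per_group(d, power=0.80, alpha=0.05):
    """Approximate participants needed per group to detect effect size d."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


for d in (0.8, 0.5, 0.2):
    print(d, approx_n_per_group(d))
```

Under these assumptions a medium effect needs about 63 participants per group and a small effect about 393, which puts the typical 20-50 per condition in perspective.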

The response: reform as self-correction

The response to the replication crisis has been substantial and ongoing.

Preregistration — registering hypotheses, methods, and analysis plans in a public repository before data collection — reduces analytical flexibility and prevents selective reporting. Open data and materials allow independent verification and re-analysis. Registered Reports — a publication format in which peer review occurs before data collection — eliminate publication bias for null results because the decision to publish is made before results are known.

Larger sample sizes and multisite collaborations increase power and generalizability. Many labs now participate in large-scale replication projects where the same protocol is run across dozens of sites with thousands of participants, producing more reliable effect-size estimates.

Presenting the replication crisis solely as a failure misses something important about how science works. The fact that psychologists could organize 270 researchers to independently test published claims, report the results openly, and use those results to reform their methods is a sign of scientific health, not sickness. A field that cannot self-correct — that never questions its published findings — is in much deeper trouble than one that does. The replication crisis, viewed this way, is evidence that psychology's commitment to empirical evidence is genuine: when the evidence said many findings were unreliable, the field reformed its practices.

Effect sizes matter more than p-values

The American Statistical Association released an unprecedented statement on $p$ -values in 2016 (Wasserstein and Lazar, 2016). The key points: $p$ -values do not measure the probability that a hypothesis is true, do not measure the size or importance of an effect, and are not a reliable indicator of whether a result will replicate. A $p$ -value is the probability of observing data at least as extreme as the actual data, under the assumption that the null hypothesis is true. Nothing more.

Effect sizes provide the information that $p$ -values lack. Cohen's $d$ expresses the difference between two group means in standard-deviation units. The guidelines ( $d = 0.2$ small, $d = 0.5$ medium, $d = 0.8$ large) are rough benchmarks, not universal thresholds. An effect classified as "small" by Cohen's conventions could be enormously consequential if it affects millions of people. A cheap educational intervention that improves reading scores by $d = 0.1$ across every schoolchild in a country would shift millions of lives, even though the effect looks small in any single classroom.

Exercises Intermediate+

Exercise 2 (easy, multiple choice).

A correlation of $r = - 0.85$ between hours of screen time and academic performance indicates:

A. No relationship
B. A strong negative association: more screen time is associated with lower grades
C. A strong negative association: more screen time causes lower grades
D. The relationship is invalid because correlations cannot be negative

Hint

The sign of $r$ indicates direction; the magnitude indicates strength. Correlation does not establish causation.

Answer

Option B.

A correlation of $r = - 0.85$ is strong (close to $- 1$ ) and negative, meaning that as screen time increases, academic performance tends to decrease. The association is strong, but causation cannot be inferred. A third variable — socioeconomic status, parenting style, sleep quality — could explain both screen time and grades.

Exercise 3 (medium, conceptual).

A study reports that a new therapy reduces depression scores with $p = 0.03$ , Cohen's $d = 0.15$ . A second study reports that a different therapy reduces depression scores with $p = 0.08$ , Cohen's $d = 0.65$ . Which study provides stronger evidence of a practically meaningful effect? Explain.

Hint

$p$ -values and effect sizes convey different information. Statistical significance and practical importance are separate questions.

Answer

The second study shows stronger evidence of a practically meaningful effect.

The first study achieves statistical significance ( $p = 0.03 < 0.05$ ) but the effect is tiny ( $d = 0.15$ ). The therapy "works" in a narrow statistical sense, but the improvement is too small to matter in most clinical contexts.

The second study does not reach conventional significance ($p = 0.08$), but the effect size is medium-to-large by Cohen's guidelines ($d = 0.65$). The non-significant $p$-value may reflect low statistical power from a small sample, not the absence of a real effect.

This illustrates why relying on $p$-values alone can mislead. The first result is significant but negligible; the second is non-significant but potentially important. Because the second estimate probably comes from a small sample, it is also imprecise. A replication with a larger sample would show whether the effect holds up; if the true effect is anywhere near $d = 0.65$, that replication would very likely reach statistical significance.

Exercise 4 (medium, conceptual).

Identify the independent variable, dependent variable, and at least three potential confounding variables in this study: "A researcher compares the reading comprehension scores of students at a well-funded suburban school and an underfunded urban school."

Hint

The researcher did not assign students to schools. What else differs between the two groups besides funding?

Answer

IV (quasi-independent): School funding level (well-funded suburban vs. underfunded urban)
DV: Reading comprehension scores
Potential confounds: socioeconomic status of families, teacher qualifications, class size, access to books at home, parental education level, English language proficiency, neighborhood resources, nutrition, exposure to environmental lead, community support services, and parental involvement in schoolwork

This is a quasi-experimental design because the researcher cannot randomly assign students to school districts. The many confounds make it difficult to attribute reading differences to school funding alone.

Exercise 5 (medium, conceptual).

Explain why the statement "the study found a significant result with $p = 0.04$ , so there is a 96% chance the hypothesis is true" is incorrect.

Hint

What does the $p$ -value actually represent? It is conditional on the null hypothesis, not on the research hypothesis.

Answer

The $p$ -value is the probability of observing data at least this extreme given that the null hypothesis is true. It is not the probability that the null hypothesis is true, nor the probability that the research hypothesis is true.

$p = 0.04$ means: if there were no real effect, data this extreme would occur only 4% of the time. The 96% figure refers to a probability statement about the data under the null, not about the truth of the hypothesis. Whether the hypothesis is actually true depends on prior evidence, effect size, replication, and theoretical plausibility — none of which are captured by the $p$ -value.

Exercise 6 (hard, conceptual).

A researcher conducts a study with 20 participants per group, finds $p = 0.06$ (not significant), decides to add 10 more participants per group, re-runs the analysis, and obtains $p = 0.04$ (significant). The researcher reports only the second analysis. Identify all the problems with this approach.

Hint

What happens to the false-positive rate when researchers can peek at their data and decide whether to collect more?

Answer

This practice — collecting data, checking significance, and then collecting more data if the result is not yet significant — is a form of $p$ -hacking called "optional stopping." It inflates the Type I error rate above the nominal $α$ level.

The stated $α = 0.05$ assumes a single, pre-specified analysis. When the researcher gets two chances to find significance (one at $n = 40$ , one at $n = 60$ ), the actual probability of a false positive rises above 5%. The more times a researcher peeks, the higher the false positive rate climbs — potentially exceeding 20% with repeated checking.

Additional problems: (1) Reporting only the significant analysis hides the research path, creating a misleading narrative about how the result was obtained. (2) The effect size from a stopped analysis is likely inflated because the stopping rule selected for results that crossed the significance threshold. (3) The practice violates the preregistration principle: the sample size should be determined before data collection based on an a priori power analysis, not adjusted post hoc based on results.
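The inflation from a single extra look can be estimated by simulation. The sketch mirrors the scenario in the exercise (test at 20 per group, add 10 per group if not significant, test again) with no true effect present. It assumes normally distributed scores with a known standard deviation of 1 and uses a z-test in place of a t-test; the function names are hypothetical.

```python
import random
from math import sqrt


def is_significant(group_a, group_b):
    """Two-sided z-test at alpha = 0.05, treating the population SD (1.0) as known."""
    n = len(group_a)
    difference = sum(group_a) / n - sum(group_b) / n
    return abs(difference) / sqrt(2 / n) > 1.96


def false_positive_rate_with_peeking(n_first=20, n_extra=10, n_studies=20_000, seed=3):
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_studies):
        # The null hypothesis is true: both groups come from the same population.
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n_first)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n_first)]
        if is_significant(group_a, group_b):
            false_positives += 1
            continue
        # Not significant yet, so add participants and test again.
        group_a += [rng.gauss(0.0, 1.0) for _ in range(n_extra)]
        group_b += [rng.gauss(0.0, 1.0) for _ in range(n_extra)]
        if is_significant(group_a, group_b):
            false_positives += 1
    return false_positives / n_studies


rate = false_positive_rate_with_peeking()
print(f"false-positive rate with one extra look: {rate:.3f}")
```

Even a single unplanned second look pushes the rate noticeably above the nominal 0.05, and each additional look raises it further.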

Exercise 7 (hard, conceptual).

A study finds that people in Japan and the United States describe themselves differently. Japanese participants emphasize group memberships and social roles; American participants emphasize personal traits and individual achievements. A psychology textbook presents this finding as revealing how "people" think about themselves. Identify the problem with this presentation and explain how it connects to the WEIRD critique.

Hint

The comparison involves two populations. What is the baseline against which "different" is measured? And what does this imply about which group the textbook treats as the default?

Answer

The presentation treats American self-concept (individualistic, trait-based) as the default and Japanese self-concept (collectivist, role-based) as the interesting contrast. But from a global perspective, the American pattern is the outlier. Cross-cultural research shows that the majority of the world's populations emphasize relational and collective self-concepts, not independent ones.

The textbook framing commits two errors. First, it generalizes from two data points to "people" in general, when both Japanese and American participants come from wealthy, industrialized, democratic nations; Japan is not Western, but the two samples share most of the other WEIRD characteristics. Second, it implicitly treats the Western (American) pattern as the universal baseline and the non-Western pattern as the deviation, when the evidence suggests the reverse: independent self-construal is the cross-cultural exception, not the rule.

This connects to the WEIRD critique because much of what psychology textbooks present as "how the mind works" is actually "how the American undergraduate mind works." The appropriate response is not to discard these findings but to test them across diverse populations and present them as culture-specific until cross-cultural evidence accumulates.

Competing perspectives: psychoanalysis, behaviorism, and cognitivism Master

Each major school of psychology captured genuine insights about human behavior and mental life. Each also had limitations that motivated the next generation to develop alternative frameworks. Understanding psychology requires treating these perspectives not as discarded relics but as partially correct accounts that together illuminate different aspects of the same phenomenon.

Psychoanalysis: the unconscious and the limits of self-knowledge

Freud's central insight was that much of mental life operates outside conscious awareness. Thoughts, desires, memories, and conflicts that are too threatening to acknowledge are relegated to the unconscious, where they continue to influence behavior in disguised forms — in dreams, slips of the tongue, physical symptoms, and repeated relational patterns.

Modern cognitive science has partly validated this insight. Research on implicit biases, automatic processing, priming, and the cognitive unconscious confirms that a great deal of mental activity occurs outside awareness. People can hold attitudes they do not consciously endorse, and those attitudes can shape behavior without the person's knowledge.

Freud's specific theoretical mechanisms — the Oedipus complex, psychosexual stages, the structural model of id, ego, and superego — have not held up to empirical scrutiny. The theory's resistance to falsification is the basis of Karl Popper's critique that psychoanalysis is unfalsifiable and therefore does not qualify as scientific. Any outcome can be interpreted post hoc as consistent with the theory: a patient who agrees with the analyst's interpretation is showing insight; a patient who disagrees is showing "resistance," which itself confirms the interpretation.

Psychoanalytic therapy (psychoanalysis and its shorter-term derivatives, psychodynamic therapy) has a smaller evidence base than cognitive-behavioral therapy, but that base is not empty. A 2010 review of meta-analyses by Shedler found that psychodynamic therapy was effective for a range of conditions, with effect sizes comparable to those of other evidence-based treatments. The "common factors" debate, over whether specific therapeutic techniques matter more than the therapeutic relationship, expectations, and other nonspecific factors, complicates any claim that one therapeutic approach is uniquely effective.

Behaviorism: the power of environment

Behaviorism's founding methodological insight was that psychology should be an objective science studying observable behavior rather than subjective experience. Watson's 1913 manifesto and Skinner's subsequent work on operant conditioning established that behavior is shaped by its consequences: reinforcement increases the frequency of a behavior, punishment decreases it, and extinction occurs when reinforcement stops.

Behaviorism produced robust, replicable findings that remain in active use. Operant conditioning principles are applied in education (token economies, programmed instruction), clinical treatment (applied behavior analysis for autism spectrum disorder), animal training, and organizational management. Classical conditioning explains phobia acquisition, taste aversion, and the bodily responses triggered by cues associated with past experiences. The behaviorist emphasis on environmental determinants of behavior remains methodologically important.

The limitations of behaviorism were exposed by several lines of evidence. Edward Tolman's work on cognitive maps showed that rats learn spatial layouts without any reinforcement for specific routes, implying internal representations rather than simple stimulus-response bonds. Noam Chomsky's 1959 review of Skinner's Verbal Behavior argued that language acquisition cannot be explained by reinforcement alone: children produce novel sentences they have never been rewarded for saying, and they make systematic errors (e.g., "I goed") that suggest rule learning rather than imitation. Martin Seligman's concept of "prepared learning" (1970) held that some associations form far more readily than others. Later conditioning studies found, for example, that monkeys easily learn to fear snakes but not flowers, implying biological constraints on learning that pure behaviorism could not explain.

Cognitivism: the mind as information processor

The cognitive revolution of the 1950s and 1960s restored the study of mental processes by treating the mind as an information-processing system that encodes, stores, retrieves, and transforms representations. George Miller's "The Magical Number Seven, Plus or Minus Two" (1956) identified capacity limits in working memory. Atkinson and Shiffrin's multi-store model (1968) distinguished sensory, short-term, and long-term memory systems. Ulric Neisser's Cognitive Psychology (1967) gave the field its name.

Cognitive psychology generated precise, testable models of attention (selective, divided, sustained), memory (encoding, storage, retrieval, forgetting), language (syntax, semantics, pragmatics), decision-making (heuristics and biases), and problem-solving (algorithms, insight, functional fixedness). The heuristics-and-biases program founded by Amos Tversky and Daniel Kahneman demonstrated systematic deviations from rational decision-making — availability, representativeness, anchoring, framing, loss aversion — earning Kahneman the 2002 Nobel Prize in Economics and launching behavioral economics.

Cognitive-behavioral therapy (CBT), which integrates cognitive restructuring with behavioral techniques, has the largest evidence base of any psychotherapy approach. CBT is a first-line treatment for anxiety disorders, depression, insomnia, and many other conditions.

Cognitivism's limitations have become apparent. The mind-as-computer metaphor underestimates the role of emotion, embodiment, and social context. Antonio Damasio's somatic marker hypothesis (1994) demonstrated that emotional processing is integral to rational decision-making, not a source of noise to be filtered out. The "situated cognition" and "embodied cognition" movements argue that cognitive processing depends on the body and the environment, not just on abstract symbol manipulation. Connectionist (neural network) models challenged the assumption that cognition requires explicit symbolic rules.

The biological revolution and its limits

Advances in neuroimaging (fMRI, PET), genetics (twin studies, adoption studies, genome-wide association studies), and pharmacology have added biological explanation to psychology. The biological perspective does not replace cognitive or behavioral accounts; it grounds them in physical mechanisms. Discoveries such as long-term potentiation as a cellular mechanism of learning, the role of the amygdala in fear conditioning, and the involvement of the prefrontal cortex in executive function have advanced the field.

Biological reductionism carries risks. Knowing that a brain region is active during a task does not explain the task. The reverse inference problem — inferring a mental state from brain activity — is statistically precarious. Reducing complex social phenomena (prejudice, political orientation, romantic attachment) to "brain circuits" can obscure the social and cultural processes that are equally causal.
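The weakness of reverse inference can be made concrete with Bayes' rule. The sketch below is illustrative only: the function name and every probability in it are hypothetical values chosen for the example, not estimates from any study. It shows that the same activation supports a confident inference only when the region is selective, that is, when it rarely activates in the absence of the mental state.

```python
def posterior_given_activation(p_act_given_state, p_act_given_other, prior_state):
    """Bayes' rule: probability of the mental state given observed activation.

    p_act_given_state: P(activation | mental state present)
    p_act_given_other: P(activation | mental state absent)
    prior_state:       P(mental state present) before seeing the scan
    """
    evidence = (p_act_given_state * prior_state
                + p_act_given_other * (1 - prior_state))
    return p_act_given_state * prior_state / evidence


# Hypothetical selective region: rarely active without the mental state.
selective = posterior_given_activation(0.8, 0.1, 0.2)

# Hypothetical unselective region: active in many other tasks as well.
unselective = posterior_given_activation(0.8, 0.6, 0.2)

print(f"selective region:   P(state | activation) = {selective:.2f}")
print(f"unselective region: P(state | activation) = {unselective:.2f}")
```

With these made-up inputs the posterior drops from about two in three to one in four, even though the region responds to the mental state equally often in both cases. What differs is only how often it also responds to everything else.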

Toward integration

Contemporary psychology is moving toward integration: biopsychosocial models that treat biological, psychological, and social factors as interacting levels of explanation. A complete account of depression requires understanding genetic vulnerability (biological), maladaptive thought patterns (cognitive), learned helplessness (behavioral), unconscious conflicts (psychodynamic), and social isolation (social) — not choosing among them.

Psychology as liberation and oppression Master

Psychology has been used both to advance human freedom and to justify its violation. This dual history is not a footnote — it is central to understanding the field's relationship with political power.

Psychology in the service of liberation

During the civil rights era, psychologists produced research that directly challenged racist policies. Kenneth Clark and Mamie Phipps Clark's doll studies (1939-1947) demonstrated that Black children in segregated schools developed negative self-perceptions, systematically preferring white dolls over Black dolls and attributing more positive qualities to the white dolls. This research was cited in the Supreme Court's unanimous Brown v. Board of Education decision (1954), which declared school segregation unconstitutional. The Clarks' findings supported the Court's conclusion that separate was inherently unequal: segregation damaged children's self-concept and sense of worth.

Gordon Allport's The Nature of Prejudice (1954) synthesized research on stereotyping, scapegoating, and intergroup conflict, providing a scientific framework for understanding and combating discrimination. Allport's contact hypothesis proposed that under appropriate conditions — equal status, common goals, intergroup cooperation, and institutional support — intergroup contact reduces prejudice.

Muzafer Sherif's Robbers Cave experiment (1954) demonstrated intergroup dynamics in a controlled setting. Twenty-two eleven-year-old boys at a summer camp were randomly assigned to two groups. After establishing group identities, the researchers created intergroup conflict through competitive activities, then reduced it through superordinate goals requiring cooperation. The study showed that hostility arises from group competition, not individual malice, and that cooperation toward shared objectives can reduce it.

Social psychology has continued producing research relevant to justice and equity. Claude Steele and Joshua Aronson's stereotype threat research (1995) demonstrated that awareness of negative stereotypes about one's group impairs performance on diagnostic tasks, providing a mechanism through which social prejudice becomes self-fulfilling. Anthony Greenwald and Mahzarin Banaji's work on implicit social cognition (1995) laid the groundwork for methods that measure unconscious prejudices, which people may not consciously endorse but which shape behavior in hiring, policing, healthcare, and education.

Psychology in the service of oppression

The same discipline that contributed to desegregation also provided intellectual cover for some of the twentieth century's worst abuses.

Eugenics and "scientific racism." In the early twentieth century, psychologists played prominent roles in the eugenics movement. Lewis Terman, who developed the Stanford-Binet IQ test, wrote that racial mixing between "high-grade" and "low-grade" races should be prevented. Henry Goddard's 1912 study of the Kallikak family claimed to demonstrate the hereditary transmission of "feeble-mindedness." The study was later discredited for methodological flaws and biased interpretation, but not before it influenced policy. Psychologists' work was used to justify forced sterilization laws (over 60,000 Americans were sterilized under eugenic statutes), immigration restriction (IQ testing data were invoked in the debates surrounding the 1924 Immigration Act), and racial segregation.

The IQ testing movement illustrates how ostensibly objective science can encode social bias. Early IQ tests contained items requiring specific cultural knowledge — identifying brands of cars, knowing the rules of tennis — that systematically advantaged white, middle-class, American-born test-takers. Results were then used to "prove" the intellectual inferiority of immigrants and Black Americans, creating a circular process: culturally biased tests produced biased results that were cited as evidence for the biases built into the tests.

CIA-funded torture and behavior modification research. From the 1950s through the 1970s, the CIA funded psychological research on interrogation, sensory deprivation, and behavior modification under programs codenamed MKULTRA, ARTICHOKE, and BLUEBIRD. Psychiatrist Donald Ewen Cameron, funded by a CIA front organization, conducted experiments on psychiatric patients at McGill University without their informed consent. His techniques (drug-induced sleep for weeks at a time, massive electroconvulsive therapy, and repetitive taped messages played during drug-induced coma) were intended to "depattern" personality and rebuild it. Patients suffered permanent harm.

These programs were not conducted by rogue operators working against the wishes of the psychological establishment. They were funded by the U.S. government, sometimes routed through legitimate research institutions, and some of the researchers involved held positions of prominence in the field. The American Psychological Association's involvement in post-9/11 "enhanced interrogation" programs — with APA members consulting on interrogation techniques used at Guantanamo Bay and CIA black sites — represents a more recent chapter in this history. An independent review commissioned by the APA in 2015 found that the organization had colluded with the Department of Defense to enable psychologist involvement in interrogations that violated professional ethics.

Understanding this history is essential for maintaining vigilance against similar failures. Psychological expertise is a tool, and like any tool, it can be used toward liberation or oppression depending on who wields it and toward what end.

Connections Master

Statistical inference [02.XX]. The statistical tools used throughout psychology — hypothesis testing, confidence intervals, effect sizes, regression — are applications of the probability theory and statistical inference developed in the mathematical statistics strand. Psychology is one of the heaviest consumers of statistical methods in the empirical sciences.
Epistemology: knowledge, justification, and truth 20.01.01. Psychology's reliance on empirical evidence raises philosophical questions about what counts as knowledge. The demarcation problem — what separates science from non-science — applies directly to debates about whether psychoanalysis, evolutionary psychology, or social neuroscience qualify as scientific. Popper's falsificationism, Kuhn's paradigms, and Lakatos's research programmes all describe episodes in psychology's history.
Philosophy of science: demarcation, falsification, and paradigms 20.08.01. The replication crisis is a live case study in the philosophy of science. Kuhn's framework of normal science, anomaly, and revolutionary change maps onto psychology's current self-examination. The role of values in science — which questions get asked, whose experiences count as data, who controls funding — is visible in psychology because the subject matter is politically charged.
Consciousness: the hard problem, qualia, and the mind-body debate 20.06.01. The mind-body problem frames the entire history of psychology. Is consciousness a brain process? Can subjective experience be fully explained by neural activity? These questions connect the biological perspective in psychology to the philosophy of mind.
Research ethics and bioethics [20.02.XX]. The Belmont Report's three principles — respect for persons, beneficence, justice — apply to all human-subjects research. Psychology's ethical violations (Milgram, Zimbardo, MKULTRA) are standard case studies in bioethics. The Tuskegee Syphilis Study illustrates the intersection of research ethics with racial injustice.
Behavioral economics 23.01.25. The integration of psychology into economics — prospect theory (Kahneman and Tversky), bounded rationality (Herbert Simon), nudge theory (Thaler and Sunstein) — created behavioral economics, which challenges the assumption of rational self-interest in classical economics. Kahneman received the 2002 Nobel Prize in Economics for this work.
Neuroscience: brain and behaviour 29.02.01. The next unit in the psychology sequence covers neuroscience: neurons, synapses, brain anatomy, and the biological mechanisms underlying behavior. The research methods covered here — experimental design, statistical analysis, ethical review — are applied throughout neuroscience.
Social psychology 29.07.01. The study of how people think about, influence, and relate to one another relies on the experimental and correlational methods introduced here. Milgram's obedience studies and Sherif's group conflict studies are foundational to social psychology.
Cross-cultural and indigenous psychology 29.12.01. The WEIRD critique, introduced in this unit, is developed in full in the cross-cultural unit. Research methods must be adapted for cross-cultural validity: instruments validated in one culture may not measure the same construct in another, and emic (culture-specific) approaches complement etic (universal) approaches.

Historical and philosophical context Master

The founding of scientific psychology

Psychology's emergence as an independent discipline is conventionally dated to 1879, when Wilhelm Wundt established the first laboratory dedicated to psychological research at the University of Leipzig. Wundt used introspection — trained observers reporting on their conscious experiences under controlled conditions — to study the basic elements of mental life: sensations, feelings, and images. Wundt's approach, called structuralism by his student Edward Titchener, aimed to identify the building blocks of consciousness the way chemists had identified elements.

Wundt's introspection method had serious limitations. Different laboratories produced different results, and the method was inherently subjective — two observers could report different experiences of the same stimulus. Despite these problems, Wundt's lasting contribution was establishing the principle that mental processes could be studied empirically, not just philosophically.

William James, teaching at Harvard, took a different approach. His The Principles of Psychology (1890) — still readable and insightful more than a century later — treated mental life as a continuous stream rather than a collection of discrete elements. James's functionalism asked what mental processes do: how they help organisms adapt to their environments. This pragmatic orientation influenced American psychology's emphasis on practical application and set the stage for behaviorism.

Psychoanalysis and the discovery of the unconscious

Sigmund Freud, a Viennese neurologist, developed psychoanalysis in the 1890s and refined it over four decades. Freud's model of the mind divided it into conscious, preconscious, and unconscious regions, with the unconscious containing repressed wishes, traumatic memories, and instinctual drives (primarily sexual and aggressive). Psychoanalytic therapy aimed to bring unconscious material into awareness through free association, dream analysis, and interpretation of transference patterns.

Freud's influence on Western thought extends far beyond clinical psychology. Concepts like the unconscious, defense mechanisms (repression, projection, displacement, sublimation), and the importance of early childhood experience have entered common language. Freud's specific claims, however, have been criticized on multiple grounds: many (penis envy, the Oedipus complex) lack empirical support; the theory's unfalsifiability makes it scientifically problematic; and Freud's case studies involved a small number of upper-middle-class Viennese patients, a population that is not representative of humanity.

Carl Jung, Alfred Adler, and later object-relations theorists (Melanie Klein, D. W. Winnicott, Heinz Kohut) developed alternative psychodynamic frameworks, but all share the core assumption that unconscious processes shape conscious experience in ways the individual does not fully recognize.

Behaviorism's rise and fall

John B. Watson's 1913 manifesto "Psychology as the Behaviorist Views It" rejected introspection and mental states as subjects of scientific study. Watson argued that psychology should study only observable behavior and that all complex behavior is learned through conditioning. His famous (and ethically reprehensible) Little Albert study (1920) with Rosalie Rayner demonstrated that fear could be conditioned in an infant.

B. F. Skinner extended behaviorism with operant conditioning — the principle that behavior is shaped by its consequences. Skinner's analysis of reinforcement schedules (fixed-ratio, variable-ratio, fixed-interval, variable-interval), shaping, and stimulus control generated precise, quantitative predictions about behavior. His inventions (the operant conditioning chamber, the cumulative recorder, programmed instruction, the air crib) demonstrated behaviorism's practical applications.
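The four schedules differ in what triggers a reinforcer: a count of responses (ratio schedules) or elapsed time since the last reinforcer (interval schedules). The following is a minimal sketch of that distinction, not a model of any particular experiment. All function names, the uniform random draws used for the variable schedules, and the response timings are assumptions made for illustration.

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th response."""
    count = 0

    def on_response(_time):
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False

    return on_response


def variable_ratio(mean_n, rng):
    """Reinforce after a random number of responses averaging mean_n."""
    count = 0
    target = rng.randint(1, 2 * mean_n - 1)  # assumed distribution

    def on_response(_time):
        nonlocal count, target
        count += 1
        if count >= target:
            count = 0
            target = rng.randint(1, 2 * mean_n - 1)
            return True
        return False

    return on_response


def fixed_interval(seconds):
    """Reinforce the first response at least `seconds` after the last reinforcer."""
    last = 0.0

    def on_response(time):
        nonlocal last
        if time - last >= seconds:
            last = time
            return True
        return False

    return on_response


def variable_interval(mean_seconds, rng):
    """Like fixed_interval, but the required wait varies around mean_seconds."""
    last = 0.0
    wait = rng.uniform(0, 2 * mean_seconds)  # assumed distribution

    def on_response(time):
        nonlocal last, wait
        if time - last >= wait:
            last = time
            wait = rng.uniform(0, 2 * mean_seconds)
            return True
        return False

    return on_response


def count_reinforcers(schedule, response_times):
    """Feed a sequence of response times to a schedule and count reinforcers."""
    return sum(schedule(t) for t in response_times)


# Hypothetical responders over one minute: one response per second
# versus two responses per second.
steady = [float(t) for t in range(1, 61)]
fast = [0.5 * k for k in range(1, 121)]

print(count_reinforcers(fixed_ratio(5), steady))        # 12
print(count_reinforcers(fixed_ratio(5), fast))          # 24
print(count_reinforcers(fixed_interval(10.0), steady))  # 6
print(count_reinforcers(fixed_interval(10.0), fast))    # 6
```

In this toy setup, doubling the response rate doubles the reinforcers earned on the ratio schedule but earns nothing extra on the interval schedule, which is the structural difference between the two families.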

Behaviorism dominated American psychology from the 1920s through the 1950s. During this period, research on classical conditioning (Pavlov, Watson, Rayner), operant conditioning (Skinner, Ferster), and applied behavior analysis flourished.

The cognitive revolution

The cognitive revolution was catalyzed by multiple converging developments in the 1950s. Chomsky's critique of Skinner's account of language acquisition demonstrated that behaviorism could not explain the creativity and systematicity of human language. George Miller's work on the limits of working memory established that cognitive processing has measurable capacity constraints. The invention of the digital computer provided a new metaphor for the mind: an information-processing system that transforms inputs into outputs through intermediate representations.

Ulric Neisser's Cognitive Psychology (1967) consolidated the field and gave it its name. The information-processing framework generated research programs in memory (encoding specificity, levels of processing, working memory models), attention (selective attention, cocktail party effect, attentional blink), language (parsing, ambiguity resolution, discourse comprehension), decision-making (heuristics and biases, prospect theory), and problem-solving (insight, functional fixedness, expertise).

The cognitive revolution also produced one of the most influential research programs in the behavioral sciences: the heuristics-and-biases tradition founded by Amos Tversky and Daniel Kahneman. Their work on availability, representativeness, anchoring, framing, and prospect theory demonstrated systematic deviations from rational decision-making, earning Kahneman the 2002 Nobel Prize in Economics (Tversky died in 1996 and could not receive it).

The biological revolution

Starting in the 1970s and accelerating with the development of functional magnetic resonance imaging (fMRI) in the early 1990s, the biological revolution added neuroscience to psychology's toolkit. Brain imaging allows researchers to measure which brain regions are metabolically active during specific cognitive tasks. Genetic methods (twin studies, adoption studies, genome-wide association studies) estimate the heritability of psychological traits and identify specific genetic variants associated with mental disorders. Psychopharmacology tests the effects of drugs on behavior and mood, providing evidence for neurochemical models of mental illness.

The biological perspective has generated genuine breakthroughs: the discovery of long-term potentiation as a cellular mechanism of learning (Bliss and Lømo, 1973), the role of the amygdala in fear conditioning (LeDoux, 1990s), and the involvement of the prefrontal cortex in executive function and self-regulation. But it has also generated critiques: that brain imaging is correlational rather than causal (most fMRI studies show associations, not causes), that genetic explanations of complex behavior are incomplete (heritability estimates do not identify specific causal pathways), and that the biological perspective can medicalize social problems.

The WEIRD critique and its implications

Henrich, Heine, and Norenzayan's 2010 paper "The Weirdest People in the World?" consolidated a growing body of evidence that psychology's participant pool is extraordinarily narrow. By their analysis, 96% of participants in studies published in leading psychology journals came from Western industrialized countries that hold just 12% of the world's population. Within those countries, the majority of participants are university students, a group that differs from the global population along dozens of dimensions.

The WEIRD critique does not claim that findings based on WEIRD samples are necessarily wrong. It claims that they cannot be assumed to generalize until tested in other populations. Many findings presented as human universals have not been tested outside WEIRD contexts, and when they are tested, they sometimes change substantially. The Müller-Lyer visual illusion, long assumed to be a universal feature of human visual perception, is significantly weaker or absent in cultures whose visual environments lack the right angles and straight lines that characterize Western architecture. Fairness norms measured with the ultimatum game also vary dramatically across cultures: in some small-scale societies proposers make low offers and responders accept almost any offer, while Americans typically offer roughly 50% and reject offers below 20%.
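For readers unfamiliar with the ultimatum game, its payoff rule can be sketched in a few lines. One player proposes a split of a fixed sum; the other either accepts, in which case the split stands, or rejects, in which case both receive nothing. The function name, the 100-unit pie, and the thresholds below are hypothetical values for illustration and are not taken from any specific study.

```python
def ultimatum_round(pie, offer, min_acceptable):
    """Return (proposer_payoff, responder_payoff) for one round.

    pie:            total amount to be divided
    offer:          amount the proposer offers to the responder
    min_acceptable: smallest offer the responder is willing to accept
    """
    if offer >= min_acceptable:
        return pie - offer, offer
    return 0, 0  # a rejected offer leaves both players with nothing


# Hypothetical responder who rejects anything under 20 out of 100.
print(ultimatum_round(100, 50, 20))  # (50, 50)
print(ultimatum_round(100, 10, 20))  # (0, 0)

# Hypothetical responder who accepts any positive offer.
print(ultimatum_round(100, 10, 1))   # (90, 10)
```

The same low offer is costly to make in one setting and profitable in another, which is why observed offers and rejection thresholds are read as measures of local fairness norms.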

The WEIRD bias intersects with other structural biases in the field. Most psychology research is conducted by WEIRD researchers studying WEIRD participants in WEIRD institutions, creating a feedback loop that reinforces culturally specific assumptions. The questions that get asked, the methods that seem natural, and the theories that feel plausible are all shaped by the cultural position of the researchers.

Ethics: from Nuremberg to the present

The history of research ethics in psychology is a history of violations followed by reforms, each reform prompted by a scandal.

The Nuremberg Code (1947) was drafted in response to Nazi medical experiments on concentration camp prisoners. Its core principles — voluntary consent, the right to withdraw, a favorable risk-benefit ratio — remain foundational. The Declaration of Helsinki (1964, revised multiple times) extended ethical principles to clinical research.

The Tuskegee Syphilis Study (1932-1972) — in which the U.S. Public Health Service observed the natural progression of untreated syphilis in 399 Black men without their informed consent and actively withheld treatment even after penicillin became the standard cure — was a medical study, but its revelation shocked the behavioral sciences as well and led directly to the Belmont Report and the modern IRB system.

The Belmont Report (1979) established three principles. Respect for persons: individuals have the right to make autonomous decisions about their participation; participants with diminished autonomy (children, prisoners, cognitively impaired individuals) require additional protections. Beneficence: researchers must minimize harm and maximize benefits; the risk-benefit ratio must be favorable. Justice: the burdens and benefits of research should be distributed fairly; vulnerable populations should not bear disproportionate research risks without receiving proportional benefits.

Milgram's obedience studies and Zimbardo's prison experiment were conducted before the Belmont Report formalized these principles. Both would face far greater scrutiny from modern IRBs. Milgram's use of deception and induced emotional distress, and Zimbardo's dual role as researcher and prison superintendent, are considered serious ethical violations by contemporary standards. Whether the knowledge gained justified the harm remains actively debated within the field.

Bibliography Master

Primary literature and major secondary sources:

Wundt, W., Grundzüge der physiologischen Psychologie (Engelmann, 1874). The first systematic textbook of experimental psychology.
James, W., The Principles of Psychology (Henry Holt, 1890). The foundational American psychology text; still in print and still insightful.
Freud, S., Die Traumdeutung (The Interpretation of Dreams) (Deuticke, 1900). The founding work of psychoanalysis.
Watson, J. B., "Psychology as the Behaviorist Views It," Psychological Review 20(2) (1913), 158-177. The behaviorist manifesto.
Pavlov, I. P., Conditioned Reflexes (Oxford University Press, 1927). Classical conditioning.
Skinner, B. F., The Behavior of Organisms: An Experimental Analysis (Appleton-Century, 1938). The founding text of operant conditioning.
Clark, K. B. and Clark, M. P., "Racial Identification and Preference in Negro Children," in Newcomb and Hartley (eds.), Readings in Social Psychology (Holt, 1947), 169-178. Cited in Brown v. Board of Education.
Allport, G. W., The Nature of Prejudice (Addison-Wesley, 1954). Foundational text on stereotyping and intergroup conflict.
Sherif, M., Harvey, O. J., White, B. J., Hood, W. R., and Sherif, C. W., Intergroup Conflict and Cooperation: The Robbers Cave Experiment (University of Oklahoma Book Exchange, 1961).
Milgram, S., "Behavioral Study of Obedience," Journal of Abnormal and Social Psychology 67(4) (1963), 371-378.
Chomsky, N., "Review of Verbal Behavior by B. F. Skinner," Language 35(1) (1959), 26-58.
Miller, G. A., "The Magical Number Seven, Plus or Minus Two," Psychological Review 63(2) (1956), 81-97.
Atkinson, R. C. and Shiffrin, R. M., "Human Memory: A Proposed System and Its Control Processes," in Spence and Spence (eds.), The Psychology of Learning and Motivation Vol. 2 (Academic Press, 1968), 89-195.
Neisser, U., Cognitive Psychology (Appleton-Century-Crofts, 1967). Named the field.
Kahneman, D. and Tversky, A., "Prospect Theory: An Analysis of Decision under Risk," Econometrica 47(2) (1979), 263-291.
Damasio, A. R., Descartes' Error: Emotion, Reason, and the Human Brain (Putnam, 1994).
Steele, C. M. and Aronson, J., "Stereotype Threat and the Intellectual Test Performance of African Americans," Journal of Personality and Social Psychology 69(5) (1995), 797-811.
Greenwald, A. G. and Banaji, M. R., "Implicit Social Cognition: Attitudes, Self-Esteem, and Stereotypes," Psychological Review 102(1) (1995), 4-27.
National Commission for the Protection of Human Subjects, The Belmont Report (DHEW, 1979).
Rosenthal, R., "The File Drawer Problem and Tolerance for Null Results," Psychological Bulletin 86(3) (1979), 638-641.
Seligman, M. E. P., "On the Generality of the Laws of Learning," Psychological Review 77(5) (1970), 406-418.
Henrich, J., Heine, S. J., and Norenzayan, A., "The Weirdest People in the World?," Behavioral and Brain Sciences 33(2-3) (2010), 61-83.
Open Science Collaboration, "Estimating the Reproducibility of Psychological Science," Science 349(6251) (2015), aac4716.
Simmons, J. P., Nelson, L. D., and Simonsohn, U., "False-Positive Psychology," Psychological Science 22(11) (2011), 1359-1366.
Wasserstein, R. L. and Lazar, N. A., "The ASA Statement on p-Values: Context, Process, and Purpose," American Statistician 70(2) (2016), 129-133.
Shedler, J., "The Efficacy of Psychodynamic Psychotherapy," American Psychologist 65(2) (2010), 98-109.
Ioannidis, J. P. A., "Why Most Published Research Findings Are False," PLoS Medicine 2(8) (2005), e124.
Zimbardo, P. G., The Lucifer Effect (Random House, 2007).

Prerequisites

none — this is a leaf unit

Tier anchors

beginner: Gray, Psychology (8e), Ch. 1-2; Gazzaniga and Heatherton, Psychological Science, Ch. 1-2
intermediate: Myers and Hansen, Experimental Psychology (7e); Cohen et al., Applied Multiple Regression/Correlation Analysis
master: primary sources: Wundt 1874, James 1890, Freud 1900, Watson 1913, Skinner 1938, Milgram 1963, Open Science Collaboration 2015, Henrich et al. 2010; secondary: Hergenhahn, Henley, Fancher

References

Gray, Psychology (8e, Worth Publishers, 2018) · Ch. 1-2 · source being verified
Myers and Hansen, Experimental Psychology (7e, Cengage, 2022) · Ch. 1-4 · source being verified
Henrich, Heine, and Norenzayan, 'The weirdest people in the world?' Behavioral and Brain Sciences 33(2-3), 2010 · pp. 61-83 · source being verified
Open Science Collaboration, 'Estimating the reproducibility of psychological science' Science 349(6251), 2015 · aac4716 · source being verified
Milgram, S., 'Behavioral study of obedience' Journal of Abnormal and Social Psychology 67(4), 1963 · pp. 371-378 · source being verified
Zimbardo, The Lucifer Effect (Random House, 2007) · Ch. 2-5

Estimated time

beginner: 25m
intermediate: 50m
master: 75m