33.08.01 · history-of-science / contemporary-science

Contemporary science: challenges, open science, and the future

shipped · 3 tiers · Lean: none

Anchor (Master): primary sources: Open Science Collaboration 2015, Ioannidis 2005, Wilkinson et al. 2016 (FAIR principles), Intergovernmental Panel on Climate Change reports; secondary: Mirowski, Ravetz, Funtowicz, Jasanoff

Intuition Beginner

Science in the 21st century faces a paradox. On one hand, it produces more knowledge, more quickly, than at any point in human history. Over three million scientific papers are published each year. The Human Genome Project, the detection of gravitational waves, the development of mRNA vaccines, and the imaging of black holes are among the remarkable achievements of the past two decades. On the other hand, there is growing concern that much of this output is unreliable, that the system that produces it is dysfunctional, and that the relationship between science and society is in crisis.

The replication crisis (sometimes called the reproducibility crisis) is the most visible symptom of these problems. Beginning around 2010, researchers in several fields — particularly psychology, medicine, and economics — began systematically attempting to replicate published findings and discovered that many could not be reproduced. A 2015 study by the Open Science Collaboration attempted to replicate 100 studies published in top psychology journals and found that only 36% of the replications produced statistically significant results in the same direction as the original studies. A 2016 survey by Nature found that 70% of researchers had tried and failed to reproduce another scientist's experiments.

John Ioannidis's 2005 paper "Why Most Published Research Findings Are False" argued that for most study designs and most research settings, the majority of published research findings are false. His argument was statistical: when studies are underpowered (too few participants to reliably detect effects), when effect sizes are small, when there are many possible relationships to test, and when there is flexibility in data analysis, the probability that a statistically significant result reflects a true effect can be far below 50%.

The causes of the replication crisis are complex and systemic. The "publish or perish" culture in academic science creates pressure to produce novel, statistically significant results. Journals preferentially publish positive findings, and null results are rarely submitted or accepted for publication (the "file drawer problem"). Statistical practices like $p$-hacking (running many analyses and reporting only those that produce significant results) and HARKing (hypothesizing after results are known) inflate the false positive rate. Small sample sizes, combined with the pressure to publish, produce findings that are too noisy to replicate.

The open science movement has emerged as a response to these problems. Open science advocates for making research data, methods, and results freely available to all. Key practices include preregistration (specifying hypotheses and analysis plans before data collection), open data (sharing datasets so others can verify and reanalyze results), open access publishing (making papers freely available, not behind paywalls), and open-source software (sharing code so others can reproduce computational analyses).

The Reproducibility Project: Cancer Biology, launched in 2013, attempted to replicate 50 high-profile cancer biology studies and encountered difficulties even understanding what the original researchers had done. Many published papers omitted critical methodological details. Some original authors could not or would not share their data. These practical difficulties highlighted the gap between the ideal of scientific transparency and the reality of much published research.

Beyond the replication crisis, contemporary science faces several other challenges. The cost of "big science" — large-scale projects like particle accelerators, genome sequencing, and climate modeling — has grown enormously, concentrating resources in a few well-funded institutions and creating a "winner-take-all" dynamic in which most researchers compete for a shrinking pool of grant money. The increasing complexity and specialization of scientific knowledge makes it difficult for non-specialists (including other scientists) to evaluate research quality. And the relationship between science and the public has been strained by controversies over climate change, vaccines, and genetic modification, in which scientific consensus has been challenged by political and economic interests.

Climate science provides a stark illustration. The scientific consensus on human-caused climate change is overwhelming: over 97% of climate scientists agree that the Earth is warming and that human activities are the primary cause. The Intergovernmental Panel on Climate Change (IPCC) has produced increasingly urgent reports documenting the causes and consequences of climate change. Yet public acceptance of climate science varies dramatically across countries and political affiliations, and effective policy responses have been slow and inadequate. The gap between scientific knowledge and political action is one of the defining challenges of the 21st century.

The COVID-19 pandemic, beginning in late 2019, put the strengths and weaknesses of contemporary science on full display. The rapid development of mRNA vaccines (within about a year of the virus being identified) was an extraordinary scientific achievement. But the pandemic also exposed problems: the flood of preprints (non-peer-reviewed papers) made it difficult to distinguish reliable findings from speculation, the "infodemic" of misinformation eroded public trust in science, and the unequal global distribution of vaccines revealed deep inequities in the global scientific and public health infrastructure.

Visual Beginner

Challenge	Description	Open science response
Replication crisis	Many published findings cannot be reproduced	Preregistration, open data, replication studies
Publication bias	Journals prefer positive/novel results	Registered reports, null results journals
$p$-hacking	Selective reporting of significant analyses	Preregistered analysis plans, open code
Paywall access	Most research locked behind expensive subscriptions	Open access publishing, preprint servers
Big science costs	Concentration of resources in large projects	Distributed computing, citizen science
Public trust	Controversies erode trust in scientific authority	Science communication, public engagement
Inequality	Global disparities in scientific capacity	Open educational resources, capacity building

Worked example Beginner

The concept of statistical power illustrates one of the key problems in contemporary science. Statistical power is the probability that a study will detect a true effect (if one exists). Power depends on three factors: the sample size, the effect size, and the significance level (usually $α = 0.05$).

Suppose a researcher is testing whether a new teaching method improves test scores compared to a traditional method. The true effect size is small: the new method improves scores by an average of 2 points on a 100-point test, with a standard deviation of 15 points (a standardized effect of $d = 2/15 \approx 0.13$). With a sample of 20 students per group and a two-sided significance level of $α = 0.05$, the statistical power is approximately 0.07, only a 7% chance of detecting the true effect.

This means that if the researcher runs the study 100 times, they will detect the effect only about 7 times. The other 93 times, they will get a null result (no statistically significant difference). Given publication bias, those 7 "successful" studies are the ones that get published, while the 93 null studies languish in file drawers.

But there is a further problem. Among the 7 studies that do detect an effect, the estimated effect sizes will be much larger than the true effect size. This is because, with low power, only studies that happen (by chance) to overestimate the effect will produce significant results: with 20 students per group, the observed difference must be roughly 9 points or more to reach significance, several times the true 2-point effect. This "winner's curse" means that published effect sizes are systematically inflated, and subsequent attempts to replicate the finding will tend to find smaller effects.

The solution is to increase sample sizes. With 200 students per group (a much more expensive study), power rises only to about 0.27. Reaching the conventional target of 0.80 takes roughly 880 students per group, and exceeding 0.95 takes roughly 1,460 per group. But larger studies cost more money and take more time, creating a tension between scientific rigor and practical constraints that researchers must navigate.
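The winner's curse described in this example can be seen in a small simulation. The sketch below is illustrative only. It assumes a z-test with a known standard deviation, so each study's observed difference in group means is drawn directly from a normal distribution; a real study would use a t-test on raw scores, and the exact numbers would differ slightly. The function name and its parameters are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def simulate_studies(true_diff=2.0, sd=15.0, n_per_group=20,
                     alpha=0.05, n_studies=20_000, seed=0):
    """Simulate many two-group studies and summarise the significant ones.

    Assumption: z-test with known sd, so the observed difference in group
    means is sampled directly from Normal(true_diff, se).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    se = sd * (2 / n_per_group) ** 0.5             # SE of the difference in means

    significant = []
    for _ in range(n_studies):
        observed_diff = rng.gauss(true_diff, se)
        if abs(observed_diff) > z_crit * se:       # study reaches significance
            significant.append(observed_diff)

    share_significant = len(significant) / n_studies
    mean_abs_significant = fmean(abs(x) for x in significant)
    return share_significant, mean_abs_significant


share, mean_abs = simulate_studies()
print(f"share of studies reaching significance: {share:.3f}")
print(f"mean |estimate| among significant studies: {mean_abs:.1f} points "
      f"(true effect: 2.0)")
```

Rerunning with a much larger `n_per_group` shows the share of significant studies rising toward 1 and the average significant estimate shrinking toward the true 2-point effect.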

Check your understanding Beginner

Exercise (easy, multiple choice).

What is the replication crisis?

A. The inability of scientists to agree on new theories
B. The discovery that many published scientific findings cannot be reproduced
C. The shortage of laboratory space for conducting experiments
D. The competition between countries to publish the most papers

Hint

Consider what happens when independent researchers try to repeat a published experiment and get different results.

Answer

Option B.

The replication crisis refers to the growing body of evidence that many findings published in the scientific literature cannot be reproduced when independent researchers attempt to replicate them. This has been documented most extensively in psychology and medicine but affects many fields. The crisis has led to increased scrutiny of research practices and the development of reforms (preregistration, open data, larger samples) designed to improve the reliability of published research.

Exercise (easy, multiple choice).

What is $p$-hacking?

A. Stealing passwords from scientific databases
B. Running many statistical analyses and reporting only the significant ones
C. Using supercomputers to perform statistical calculations
D. Applying machine learning to scientific data

Hint

The $p$ in $p$-hacking refers to the $p$-value in statistical testing. What does it mean to "hack" a $p$-value?

Answer

Option B.

$p$-hacking (also called data dredging or specification searching) is the practice of running many different statistical analyses on a dataset and selectively reporting only those that produce statistically significant results ($p < 0.05$). If you test enough hypotheses, some will be significant purely by chance, even if no real effect exists. $p$-hacking inflates the false positive rate and is one of the major causes of the replication crisis.
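A short calculation makes the "test enough hypotheses" point concrete. The sketch below assumes every tested hypothesis is actually null and that the tests are independent, which is a simplification: analyses run on the same dataset are usually correlated. Under those assumptions the chance of at least one significant result among $k$ tests is $1 - (1 - α)^k$. The function name is hypothetical.

```python
def prob_any_false_positive(n_tests, alpha=0.05):
    """Chance of at least one significant result among n_tests independent
    tests when no real effect exists (all null hypotheses are true)."""
    return 1 - (1 - alpha) ** n_tests


for k in (1, 5, 20, 100):
    print(f"{k:>3} tests -> {prob_any_false_positive(k):.3f}")
```

With 20 analyses the probability is already about 0.64, so reporting only "the one that worked" badly misrepresents the evidence.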

Exercise (medium, short answer).

Explain what preregistration is and how it helps address the replication crisis.

Answer

Preregistration is the practice of publicly registering a research hypothesis, study design, and analysis plan before collecting data. This is typically done through platforms like the Open Science Framework (osf.io) or clinicaltrials.gov.

Preregistration addresses the replication crisis in two ways. First, it prevents $p$-hacking by creating a public record of the planned analysis, making it difficult to change the analysis after seeing the data and pretend the changes were planned all along. Second, it prevents HARKing (hypothesizing after results are known) by requiring researchers to specify their hypotheses before data collection.

Preregistration does not prevent researchers from doing exploratory analyses — they are free to analyze their data however they like. But it makes the distinction between confirmatory (preregistered) and exploratory (post hoc) analyses transparent, allowing readers to evaluate the strength of evidence accordingly.

Formal definition Intermediate+

The false discovery rate (FDR) is the expected proportion of false positives among all positive results. John Ioannidis's 2005 argument can be formalized using Bayesian reasoning about the FDR.

Let $R$ be the ratio of "true relationships" to "no relationships" among the hypotheses tested in a field. If a field tests many hypotheses, most of which are false (e.g., searching for genetic variants associated with a disease by testing millions of variants), then $R$ is small.

The positive predictive value (PPV), the probability that a positive finding is true, depends on the statistical power ($1 - β$), the significance level ($α$), and the prior odds ($R$):

$PPV = \frac{R ( 1 - β )}{R ( 1 - β ) + α}$

For a field with $R = 0.1$ (one true relationship for every ten null ones, so about 9% of tested hypotheses are true), power of 0.20 (typical of many underpowered studies), and $α = 0.05$:

$PPV = \frac{0.1 \times 0.20}{0.1 \times 0.20 + 0.05} = \frac{0.02}{0.07} \approx 0.29$

Under these conditions, only 29% of positive findings are true. This is the statistical core of Ioannidis's argument: under realistic assumptions about power, significance levels, and the proportion of true hypotheses, most published research findings are expected to be false.
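The PPV formula is simple enough to evaluate directly. The following is a minimal sketch (the function name `ppv` is hypothetical) that reproduces the example above and shows how the result moves when power or prior odds change. It ignores bias, as the formula does.

```python
def ppv(prior_odds, power, alpha):
    """Positive predictive value: R(1 - beta) / (R(1 - beta) + alpha)."""
    true_positive_weight = prior_odds * power
    return true_positive_weight / (true_positive_weight + alpha)


print(round(ppv(0.1, 0.20, 0.05), 2))   # the example from the text: 0.29
print(round(ppv(0.1, 0.80, 0.05), 2))   # same field, well-powered studies
print(round(ppv(1.0, 0.80, 0.05), 2))   # even prior odds, well-powered studies
```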

The situation worsens with bias (such as $p$-hacking or selective reporting), which effectively increases $α$ (the probability of a false positive). Ioannidis showed that with even modest bias, the PPV drops below 50% for most realistic research settings.

The concept of statistical power can be formalized more precisely. For a two-sample t-test comparing means $μ_{1}$ and $μ_{2}$ with common standard deviation $σ$ and sample size $n$ per group, the power to detect an effect size $d = (μ_{1} - μ_{2}) / σ$ at two-sided significance level $α$ is approximately (using the normal approximation to the t distribution):

$1 - β \approx Φ \left( d \sqrt{\frac{n}{2}} - z_{α /2} \right)$

where $Φ$ is the standard normal cumulative distribution function and $z_{α /2}$ is the upper $α/2$ critical value of the standard normal distribution (1.96 for $α = 0.05$). This formula makes explicit the dependence of power on sample size ($n$), effect size ($d$), and significance level ($α$). For a small effect size ($d = 0.2$, typical in social science research) with $n = 50$ per group and $α = 0.05$, the argument of $Φ$ is $0.2 \sqrt{25} - 1.96 = -0.96$, so power is approximately 0.17, meaning the study has only a 17% chance of detecting the true effect. This is the statistical basis for the claim that many published findings from underpowered studies are unreliable.
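The power calculation can be sketched with the Python standard library. The code below uses the normal approximation $1 - β \approx Φ(d \sqrt{n/2} - z_{α/2})$ rather than the exact noncentral t distribution, so its results are approximate for small samples. Both function names are hypothetical, and the sample-size function simply inverts the same approximation.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test (normal approximation)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(d * sqrt(n_per_group / 2) - z_crit)


def n_per_group_for_power(d, target_power=0.80, alpha=0.05):
    """Smallest n per group reaching target_power under the same approximation."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(target_power)
    return ceil(2 * ((z_crit + z_power) / d) ** 2)


print(f"power for d=0.2, n=50 per group: {approx_power(0.2, 50):.2f}")
print(f"n per group for 80% power at d=0.2: {n_per_group_for_power(0.2)}")
```

Because $n$ scales with $1/d^2$, halving the effect size quadruples the sample needed for the same power.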

The Benjamini-Hochberg procedure provides a method for controlling the false discovery rate (FDR) when testing multiple hypotheses simultaneously. Given $m$ hypotheses with ordered $p$-values $p_{(1)} \leq p_{(2)} \leq \dots \leq p_{(m)}$, the procedure rejects the hypotheses $H_{(1)}, \dots, H_{(k)}$, where $k$ is the largest index such that $p_{(k)} \leq \frac{k}{m} \cdot q$ for a desired FDR level $q$ (if no such index exists, nothing is rejected). This procedure has become standard in genomics and other fields where thousands or millions of hypotheses are tested simultaneously, and it provides a more appropriate error control than the Bonferroni correction (which controls the familywise error rate but is excessively conservative for large numbers of tests).
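The step-up rule is easier to follow in code. The sketch below implements the procedure as described above and, for comparison, the Bonferroni rule. The function names and the example $p$-values are illustrative and not taken from any real study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans (in the input order): True where the
    hypothesis is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k = rank                  # keep the LARGEST rank passing its threshold
    rejected = [False] * m
    for idx in order[:k]:             # reject everything up to that rank
        rejected[idx] = True
    return rejected


def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha / m (familywise error control)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


p_vals = [0.001, 0.03, 0.035, 0.5]    # hypothetical p-values
print(benjamini_hochberg(p_vals))     # [True, True, True, False]
print(bonferroni(p_vals))             # [True, False, False, False]
```

In this example the second $p$-value (0.03) exceeds its own threshold of $2/4 \times 0.05 = 0.025$ but is still rejected, because the third (0.035) falls below $3/4 \times 0.05 = 0.0375$. That is what "largest index $k$" means in practice.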

Key theorem with proof Intermediate+

Theorem (Ioannidis's PPV formula): For a study with significance level $α$, power $1 - β$, and prior odds $R$ of a true relationship, and in the absence of bias, the positive predictive value is:

$PPV = \frac{R ( 1 - β )}{R ( 1 - β ) + α}$

Proof:

Define the events: $T$ = the hypothesis is true, $F$ = the hypothesis is false, $+$ = the test is significant (positive result).

We seek $P (T ∣ +) = P (T \cap +) / P (+)$.

$P (T \cap +) = P (T) \cdot P (+ ∣ T) = \frac{R}{R + 1} \cdot (1 - β)$

$P (F \cap +) = P (F) \cdot P (+ ∣ F) = \frac{1}{R + 1} \cdot α$

$P (+) = P (T \cap +) + P (F \cap +) = \frac{R ( 1 - β ) + α}{R + 1}$

Therefore:

$PPV = P (T ∣ +) = \frac{R ( 1 - β ) / ( R + 1 )}{[ R ( 1 - β ) + α ] / ( R + 1 )} = \frac{R ( 1 - β )}{R ( 1 - β ) + α}$

This formula shows that PPV increases with higher power ($1 - β$), higher prior odds ($R$), and lower significance level ($α$). When power is low, prior odds are low, and multiple comparisons inflate the effective $α$, PPV can be very small, meaning most "discoveries" are false positives.
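The derivation can also be checked empirically. The sketch below is a Monte Carlo illustration, not part of the proof: it draws hypotheses that are true with probability $R/(R+1)$, marks each as significant with probability $1 - β$ (if true) or $α$ (if false), and compares the observed share of true positives with the closed-form PPV. All names, the seed, and the number of draws are arbitrary choices.

```python
import random


def ppv_formula(prior_odds, power, alpha):
    """Closed form from the theorem: R(1 - beta) / (R(1 - beta) + alpha)."""
    return prior_odds * power / (prior_odds * power + alpha)


def simulated_ppv(prior_odds, power, alpha, n_hypotheses=200_000, seed=1):
    """Estimate P(T | +) by simulating tested hypotheses one at a time."""
    rng = random.Random(seed)
    p_true = prior_odds / (prior_odds + 1)      # P(T)
    true_pos = false_pos = 0
    for _ in range(n_hypotheses):
        is_true = rng.random() < p_true
        p_significant = power if is_true else alpha
        if rng.random() < p_significant:        # the test comes out positive
            if is_true:
                true_pos += 1
            else:
                false_pos += 1
    return true_pos / (true_pos + false_pos)


print(f"formula:    {ppv_formula(0.1, 0.20, 0.05):.3f}")
print(f"simulation: {simulated_ppv(0.1, 0.20, 0.05):.3f}")
```

The two values should agree up to sampling noise, which shrinks as `n_hypotheses` grows.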

This result has profound implications for the practice of science. It shows that the reliability of published findings depends not just on the statistical methods used in individual studies but on the broader context: how many hypotheses are being tested, how much prior evidence supports them, and whether the research culture rewards rigor or novelty. The PPV formula also demonstrates why multiple testing corrections are essential: when many hypotheses are tested, the effective $α$ is much larger than the nominal 0.05, and the PPV drops correspondingly. The Bonferroni correction, which divides $α$ by the number of tests, provides a conservative adjustment, while the Benjamini-Hochberg procedure controls the false discovery rate less conservatively. Both methods attempt to restore the PPV to a reasonable level by accounting for the multiple comparisons problem.

Exercises Intermediate+

Exercise (medium, calculation).

A genome-wide association study (GWAS) tests 1,000,000 genetic variants for association with a disease. Suppose that 10 of these variants actually affect the disease risk. Using a significance level of $α = 0.05$ and power of 80%, how many true positives and false positives do you expect? What fraction of all positive results are true?

Answer

True hypotheses: 10. False hypotheses: 999,990.

True positives: $10 \times 0.80 = 8$

False positives: $999{,}990 \times 0.05 = 49{,}999.5 \approx 50{,}000$

Total positives: $8 + 50{,}000 = 50{,}008$

Fraction of positives that are true: $8 / 50{,}008 \approx 0.00016$ (about 0.016%).

This is why GWAS studies use much more stringent significance thresholds (typically $5 \times 10^{-8}$) to control the false positive rate. With $α = 5 \times 10^{-8}$: false positives = $999{,}990 \times 5 \times 10^{-8} \approx 0.05$. Now total positives $\approx 8.05$, and the fraction that are true is approximately $8/8.05 \approx 99.4\%$.
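The arithmetic in this answer can be reproduced with a few lines of code. The sketch below follows the exercise's simplifying assumption that power stays at 80% at both thresholds; in a real study, tightening $α$ would lower power unless the sample grew. The function name is hypothetical.

```python
def expected_gwas_outcomes(n_variants, n_causal, power, alpha):
    """Expected true positives, false positives, and the share of positives
    that are true, for a scan of n_variants tests."""
    true_pos = n_causal * power
    false_pos = (n_variants - n_causal) * alpha
    return true_pos, false_pos, true_pos / (true_pos + false_pos)


for alpha in (0.05, 5e-8):
    tp, fp, share_true = expected_gwas_outcomes(1_000_000, 10, 0.80, alpha)
    print(f"alpha={alpha:g}: TP={tp:g}, FP={fp:g}, share true={share_true:.5f}")
```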

Exercise (hard, essay).

Sheila Jasanoff has argued that the relationship between science and democracy requires "technologies of humility" — institutional mechanisms that acknowledge the limits of scientific knowledge and the need for public participation in decisions about technology. Evaluate this argument using two examples of technologies where scientific uncertainty intersects with public policy.

Hint

Consider technologies like AI, genetic engineering, climate intervention (geoengineering), or pandemic response. In each case, what is uncertain, who gets to decide, and what are the consequences of getting it wrong?

Answer

Example 1: Climate geoengineering (deliberate large-scale intervention in the climate system, such as injecting sulfate aerosols into the stratosphere to reflect sunlight). The scientific uncertainties are enormous: we do not know the full effects on precipitation patterns, ecosystems, or regional climates. The governance challenges are equally large: who gets to decide whether to deploy geoengineering, given that the effects would be global but unevenly distributed? Jasanoff's "technologies of humility" would require international deliberative processes that include affected communities, not just scientific experts.

Example 2: Artificial intelligence. The capabilities and risks of advanced AI are deeply uncertain. Some experts believe AI poses existential risks, while others see it as a beneficial tool. The governance challenge is that AI development is driven by private companies and military organizations that may not adequately represent public interests. "Technologies of humility" would require mechanisms for public deliberation about AI development priorities, transparency requirements for AI systems, and regulatory frameworks that can adapt to rapidly changing capabilities.

In both cases, Jasanoff's argument highlights that scientific expertise, while essential, is not sufficient for making good decisions about technology. The public has legitimate interests and perspectives that must be incorporated into governance frameworks. The challenge is designing institutions that combine scientific rigor with democratic accountability.

Exercise (medium, short answer).

Explain what Registered Reports are and how they differ from the traditional publishing model. What specific problems do they address?

Answer

Registered Reports are a publication format in which peer review occurs in two stages. In Stage 1, before data collection, researchers submit their research question, theoretical rationale, and proposed methodology for peer review. If the proposal is accepted (an "in-principle acceptance"), the journal commits to publishing the final paper regardless of the results, provided the researchers follow the approved protocol. In Stage 2, after data collection and analysis, the completed paper is reviewed to ensure it followed the approved protocol.

This addresses three specific problems. First, publication bias: since the journal commits to publishing before results are known, null results are as likely to be published as positive ones. Second, $p$-hacking: since the analysis plan is specified and approved before data collection, post hoc analyses are explicitly flagged as exploratory. Third, HARKing: since hypotheses must be stated before data collection, researchers cannot retroactively present post hoc hypotheses as if they were planned.

Exercise (medium, short answer).

What is "Goodhart's law" and how does it apply to the use of the h-index and journal impact factor in evaluating scientists?

Answer

Goodhart's law states: "When a measure becomes a target, it ceases to be a good measure." In the context of scientific evaluation, the h-index and journal impact factor were originally designed as approximate indicators of research quality and influence. But when hiring committees, promotion boards, and funding agencies began using these metrics as primary criteria for evaluation, researchers began optimizing for the metrics themselves rather than for the underlying quality they were meant to represent.

This leads to behaviors that increase metrics without improving science: submitting papers to high-impact journals regardless of fit, citing one's own work excessively, splitting results into multiple small papers rather than comprehensive ones (salami slicing), and pursuing trendy topics likely to generate citations rather than important but unfashionable questions.

Exercise (hard, essay).

Critically evaluate the claim that "science is self-correcting." Using at least two historical examples, discuss the conditions under which scientific self-correction works effectively and the conditions under which it fails or is delayed.

Hint

Consider how long it took for specific false findings to be corrected, and what institutional or social factors accelerated or delayed correction.

Answer

Example 1 (effective self-correction): The claim that cold fusion had been achieved, announced by Fleischmann and Pons in 1989, was debunked within months as laboratories worldwide failed to replicate the results. The self-correction was rapid because the claim was dramatic, the experiments were relatively straightforward to replicate, and many independent laboratories had both the incentive and the capability to test it.

Example 2 (delayed self-correction): The claim that the MMR vaccine causes autism, based on a 1998 paper by Andrew Wakefield in The Lancet, was not retracted until 2010, despite the fact that numerous large-scale epidemiological studies had found no association. The delay was caused by several factors: the emotional power of the claim (parents of autistic children desperately wanted an explanation), media amplification that created a false sense of controversy, and the reluctance of the medical establishment to appear dismissive of parental concerns. The Wakefield case illustrates that self-correction can be delayed when findings resonate with powerful cultural or emotional narratives, when the media creates a false balance between scientific consensus and fringe views, and when the costs of the false belief fall on a different population than the researchers who propagated it.

The conditions for effective self-correction include: results that can be independently replicated, a community with the motivation and resources to conduct replications, journals willing to publish negative results, and a culture that rewards methodological rigor over novelty. When these conditions are absent, self-correction may be slow, incomplete, or may not occur at all.

Advanced results Master

The contemporary landscape of science policy is shaped by several interacting trends that are transforming how scientific knowledge is produced, validated, and used. These trends have both positive and negative dimensions, and their long-term consequences are uncertain.

The commercialization of science, documented by Philip Mirowski in Science-Mart (2011), has accelerated since the 1980s. The Bayh-Dole Act of 1980 allowed universities to patent inventions resulting from federally funded research, creating incentives for academic scientists to commercialize their discoveries. This has produced economic benefits (the biotechnology industry, university technology transfer offices) but has also created conflicts of interest, reduced the openness of scientific communication (as researchers delay publication to protect patent applications), and shifted research priorities toward commercially promising areas at the expense of basic research with no obvious commercial application.

The globalization of science has been transformative. China has become the world's largest producer of scientific papers by volume, surpassing the United States around 2018. India, Brazil, South Korea, and other countries have dramatically expanded their scientific capacity. This globalization has the potential to reduce the Western bias that has characterized modern science since its inception, and to bring new perspectives and approaches to bear on scientific questions. But it has also raised concerns about quality control (the proliferation of predatory journals and paper mills), intellectual property, and the brain drain from developing to developed countries.

The rise of artificial intelligence in science is creating new possibilities and new challenges. Machine learning algorithms can now analyze datasets too large for humans to process, identify patterns in complex data, generate hypotheses, and even design experiments. AlphaFold, developed by DeepMind, predicts protein structures with accuracy comparable to experimental methods, solving a problem that had challenged biologists for decades. But the use of AI in science also raises questions about reproducibility (neural networks are "black boxes" whose internal logic is opaque), about the potential for automation to reduce the role of human creativity and insight, and about whether AI-generated knowledge should be evaluated by the same standards as human-generated knowledge.

The citizen science movement represents another important trend. Citizen science projects recruit non-professional volunteers to collect data, classify images, transcribe historical documents, and perform other tasks that contribute to scientific research. Projects like eBird (bird observation), Galaxy Zoo (galaxy classification), and Foldit (protein folding) have produced genuine scientific discoveries. Citizen science democratizes participation in science, expands the scale of data collection beyond what professional researchers could achieve alone, and creates opportunities for public engagement with the scientific process.

The crisis of expertise in democratic societies is perhaps the most troubling trend. In many countries, public trust in scientific institutions has declined, and scientific findings on topics like climate change, vaccine safety, and genetic modification have become politically polarized. This is not simply a matter of public ignorance — it reflects genuine disagreements about values, risk tolerance, and the proper role of expert authority in democratic decision-making. The challenge for science is to maintain its commitment to evidence-based reasoning while acknowledging the legitimacy of public concerns and the limits of expert knowledge.

The concept of "post-normal science," developed by Silvio Funtowicz and Jerome Ravetz, describes situations where facts are uncertain, values are in dispute, stakes are high, and decisions are urgent — conditions that characterize many contemporary policy challenges (climate change, pandemic response, AI governance). In post-normal science, the traditional model of scientific advice (experts provide certain knowledge, policymakers make decisions) breaks down. Instead, Funtowicz and Ravetz argue for an "extended peer community" that includes stakeholders beyond the traditional scientific community in the process of assessing evidence and making decisions.

The reproducibility crisis in detail

The reproducibility crisis has unfolded differently across disciplines, revealing field-specific pathologies in research culture. In psychology, the Open Science Collaboration's 2015 replication effort found that while 97% of original studies reported statistically significant results, only 36% of replications did so. Effect sizes in replications were on average about half those reported in the original studies. Replication failures appeared in both subfields sampled, although cognitive psychology findings replicated at roughly twice the rate of social psychology findings.

In medicine, John Ioannidis and others have documented systematic problems in clinical research. A 2012 study by Begley and Ellis attempted to replicate 53 landmark cancer biology studies and found that only 6 could be reproduced. Many published clinical trials are underpowered, selectively report outcomes, or fail to publish negative results. The AllTrials initiative, launched in 2013, advocates for the registration and reporting of all clinical trials, arguing that the selective publication of positive results distorts the evidence base that doctors use to make treatment decisions.

In economics, the replication rate is somewhat higher than in psychology, but the field has its own distinctive problems. The "identification crisis" (also called the "credibility crisis") concerns the difficulty of establishing causal relationships from observational data. Many widely cited economic findings depend on specific statistical specifications, and small changes in the model can produce dramatically different results. The economics profession has responded by encouraging pre-registration and promoting the publication of "null results."

Structural incentives and reform movements

The structural incentives driving the replication crisis are deeply embedded in the institutional architecture of modern science. The "publish or perish" culture is not merely a cliche but a quantifiable reality. Academic hiring, tenure, promotion, and grant funding decisions are heavily influenced by publication counts, journal prestige (as measured by impact factor), and citation metrics. This creates pressure to produce a steady stream of novel, statistically significant findings — pressure that is incompatible with the careful, time-consuming work of replication and methodological rigor.

The h-index, proposed by Jorge Hirsch in 2005 as a measure of scientific productivity, has become a dominant metric for evaluating researchers despite widespread criticism. A researcher has an h-index of $h$ if $h$ is the largest number such that $h$ of their publications have each been cited at least $h$ times. While intended as a more nuanced measure than raw publication counts, the h-index rewards prolific publication in high-visibility journals and penalizes researchers who publish fewer but more rigorous studies. Goodhart's law applies: when a measure becomes a target, it ceases to be a good measure.
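The definition of the h-index translates into a few lines of code, which also makes its bias toward volume visible. This is a minimal sketch; the function name and the citation counts are invented for illustration.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank          # the top `rank` papers all have >= rank citations
        else:
            break
    return h


few_influential = [500, 300, 200]   # hypothetical: three heavily cited papers
many_modest = [12] * 12             # hypothetical: twelve papers, 12 citations each
print(h_index(few_influential))     # 3
print(h_index(many_modest))         # 12
```

The first record has 1,000 citations in total and the second only 144, yet the second scores four times higher, because the h-index can never exceed the number of papers published.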

Several reform movements have emerged in response. The Center for Open Science, founded by Brian Nosek in 2013, has developed the Open Science Framework (OSF), a platform for preregistration, data sharing, and collaboration. The TOP Guidelines (Transparency and Openness Promotion), published in 2015, provide standards for journals and funders to promote open science practices. Registered Reports, a publication format in which peer review occurs before data collection (based on the theoretical rationale and methodology rather than the results), have been adopted by over 300 journals and have been shown to dramatically increase the proportion of "null" results published.

Science funding and inequality

The distribution of scientific funding has become increasingly unequal. A 2015 study by Way, Morgan, Clauset, and Larremore found that a small number of "elite" universities concentrate the vast majority of research funding, creating a Matthew effect ("the rich get richer") in which established researchers at prestigious institutions have outsized advantages in securing grants. The success rate for NIH R01 grants has fallen from approximately 30% in 2000 to below 20% in recent years, meaning that the majority of carefully designed, peer-reviewed research proposals go unfunded.

The globalization of science has created new centers of scientific excellence outside the traditional Western strongholds. China's investment in scientific research and development has grown at approximately 10-15% per year for the past two decades, reaching over $500 billion annually. The result has been a dramatic increase in the volume and quality of Chinese scientific publications. India, South Korea, Brazil, and other countries have also expanded their scientific capacity significantly. This globalization has the potential to diversify the perspectives and questions that drive scientific inquiry, though it also raises concerns about quality control and academic integrity in rapidly expanding systems.

The FAIR principles and data stewardship

The FAIR principles (Findable, Accessible, Interoperable, Reusable), published by Wilkinson et al. in 2016, provide a framework for scientific data management that has been widely adopted by funders, journals, and institutions. The principles specify that data should be findable through rich metadata and unique identifiers, accessible through standardized protocols, interoperable through use of common vocabularies and standards, and reusable through clear licensing and documentation. The FAIR principles represent a shift from viewing data as a byproduct of research to viewing it as a valuable resource that must be actively managed and shared.

Implementing FAIR principles raises practical challenges. Many datasets are too large to share conveniently (genomic datasets, climate model outputs, particle physics data). Some data cannot be shared freely due to privacy concerns (medical records, social science survey data). The infrastructure for data storage, curation, and access requires sustained funding that is not always available. And the incentive structures of academic science — which reward publication, not data curation — work against the time and effort required for good data stewardship.

The future of science will likely be shaped by the interaction of several forces: the continued growth of computing power (enabling AI-assisted research), the increasing scale and cost of experimental facilities (concentrating research in large institutions), the tension between open and proprietary models of knowledge production, the globalization of the scientific workforce, and the evolving relationship between science and democratic governance. Whether science can maintain its commitment to truth-seeking while adapting to these pressures is one of the most important questions facing humanity.

Connections Master

Contemporary science policy connects to statistics (chapter 26) through the statistical methods used in meta-science: power analysis, false discovery rates, Bayesian inference, and meta-analysis. The replication crisis is fundamentally a statistical problem, and its resolution requires better statistical practices. The Bayesian framework provides a natural language for discussing the probability that a research finding is true given the data, as formalized in Ioannidis's PPV calculation. Meta-analysis, the statistical synthesis of results from multiple studies, has become an essential tool for establishing the reliability of findings across the literature, and the Cochrane Collaboration has set gold standards for systematic reviews in medicine.
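The PPV calculation mentioned above can be sketched in a few lines of Python. This is the simplest, bias-free form of the formula in Ioannidis (2005), $\mathrm{PPV} = (1-\beta)R / ((1-\beta)R + \alpha)$, where $R$ is the pre-study odds that a tested relationship is true, $\alpha$ is the significance threshold, and $1-\beta$ is the statistical power. The function name `ppv` and the two scenarios are assumptions chosen for illustration, not values taken from the paper.

```python
def ppv(prior_odds, alpha=0.05, power=0.8):
    """Positive predictive value of a statistically significant finding.

    prior_odds: pre-study odds R that a tested relationship is true
    alpha:      type I error rate (significance threshold)
    power:      1 - beta, the probability of detecting a true effect
    Bias and multiple competing teams are ignored in this simplified form.
    """
    true_positives = power * prior_odds   # proportional to true effects detected
    false_positives = alpha               # proportional to null effects flagged
    return true_positives / (true_positives + false_positives)


# Hypothetical scenarios (assumed values, for illustration only).
well_powered = ppv(prior_odds=1.0, power=0.8)   # 1:1 odds, adequate power
exploratory = ppv(prior_odds=0.1, power=0.2)    # 1:10 odds, underpowered

print(f"well-powered confirmatory study: {well_powered:.2f}")
print(f"underpowered exploratory study:  {exploratory:.2f}")
```

Under these assumed inputs the underpowered exploratory scenario yields a PPV below one half, which is the sense in which a significant result can be more likely false than true.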

The sociology of science (chapter 30) provides the theoretical framework for understanding the social and institutional dynamics of contemporary science. Robert Merton's norms of science (universalism, communism, disinterestedness, organized skepticism) describe the ideal behavior of the scientific community. The extent to which contemporary science departs from these norms — through conflicts of interest, proprietary research, politicization, and the publish-or-perish culture — is a central concern of the sociology of science. Pierre Bourdieu's concept of scientific capital — the accumulation of prestige, recognition, and institutional power within the scientific field — helps explain the winner-take-all dynamics of contemporary funding and hiring. The sociology of scientific knowledge (SSK), associated with the Edinburgh School, goes further by arguing that the content of scientific knowledge itself is shaped by social factors, a position that remains controversial but has generated productive research programs.

The ethics of scientific research connects to philosophy (chapter 20). The responsible conduct of research — including honesty in reporting, fair credit allocation, protection of human subjects, and responsible communication of results to the public — is an ethical domain that has been codified in institutional policies and professional codes. The Belmont Report (1979) established three principles for the protection of human subjects — respect for persons, beneficence, and justice — that remain the foundation of research ethics. The Tuskegee syphilis study (1932-1972), in which African American men with syphilis were deliberately left untreated so researchers could study the natural progression of the disease, is a permanent reminder of why ethical oversight is necessary.

The open science movement connects to computer science (chapter 25) through the technical infrastructure required for open research: data repositories, version control systems, computational notebooks, and preprint servers. The development of this infrastructure is itself a scientific and engineering challenge. GitHub and GitLab have become essential tools for sharing and version-controlling research code. Jupyter notebooks combine code, data, and narrative text in a single reproducible document. The arXiv preprint server, founded in 1991, has transformed the culture of physics, mathematics, and computer science by making research available immediately, before peer review.

Climate science connects to earth science (chapter 27) and astronomy (chapter 28). The detection and attribution of climate change draws on atmospheric physics, oceanography, paleoclimatology, and computational modeling. The IPCC assessment process is a unique example of large-scale scientific synthesis and communication, involving thousands of scientists from around the world assessing tens of thousands of studies to produce consensus reports that inform policy. The history of climate science illustrates the challenge of translating scientific knowledge into political action: the basic physics of the greenhouse effect has been understood since the 19th century (Arrhenius, 1896), yet effective policy responses remain inadequate more than a century later.

The COVID-19 pandemic connects to health and medicine (chapter 35), genetics (chapter 33.06), and the digital revolution (chapter 33.07). The rapid development of mRNA vaccines was enabled by decades of basic research on mRNA biology, advances in genomic sequencing that allowed rapid characterization of the virus, and computational tools for vaccine design. The pandemic also demonstrated the importance of international scientific collaboration and the dangers of politicizing public health. The unprecedented speed of vaccine development — from viral genome sequence (January 2020) to emergency use authorization (December 2020) — was a triumph of the scientific enterprise, even as the unequal global distribution of vaccines revealed persistent inequities.

The debate about scientific expertise in democracy connects to political philosophy, media literacy (chapter 36), and psychology (chapter 29). The psychology of motivated reasoning — the tendency to evaluate evidence in ways that confirm pre-existing beliefs — helps explain why scientific findings are often rejected when they conflict with political or economic interests. Dan Kahan's work on cultural cognition has shown that individuals with greater scientific literacy are actually more likely to polarize on controversial scientific issues, because they use their scientific knowledge to construct arguments supporting their existing positions rather than to evaluate evidence objectively. This finding has troubling implications for the "deficit model" of science communication, which assumes that public rejection of scientific findings is caused by ignorance and can be remedied by education.

The rise of AI in science connects to the history of computing and the digital revolution (chapter 33.07) and to contemporary discussions about automation, labor, and cognitive work. AlphaFold's advances in protein structure prediction build on decades of research in molecular biology (chapter 33.06), structural biology, and machine learning. The question of whether AI systems can be said to "know" something, or whether they merely process patterns, connects to the philosophy of mind and epistemology (chapter 20).

Historical & philosophical context Master

The contemporary crisis of scientific reliability has deep historical roots. The institutional structures that organize modern science — peer review, competitive grant funding, the research university, the scientific journal — all emerged in specific historical contexts and carry the assumptions and biases of those contexts.

Peer review, widely regarded as the gold standard of scientific quality control, became standard practice surprisingly recently. Refereeing by outside experts dates back to the learned-society journals of the 19th century, but many prominent journals did not require it until the second half of the 20th century; Nature made external peer review a requirement for publication only in 1973. Before that, editorial decisions were often made by journal editors without external review. The adoption of peer review reflected growing concerns about quality as the volume of scientific publication increased, but it also reflected Cold War anxieties about scientific accountability. Melinda Baldwin's Making Nature (2015) documents how the institutionalization of peer review changed the culture of scientific publishing.

The grant funding system, which determines how most scientific research is financed, was largely created in the mid-20th century. In the United States, the National Science Foundation (NSF) was established in 1950 and the National Institutes of Health (NIH) expanded dramatically in the postwar period. The competitive grant system — in which researchers submit proposals that are evaluated by panels of their peers — became the dominant funding mechanism. This system has been enormously successful in supporting scientific research, but it has also created perverse incentives: researchers spend increasing amounts of time writing grant proposals (success rates for NIH grants have fallen below 20%), they are incentivized to pursue safe, incremental projects rather than risky but potentially transformative ones, and the concentration of funding among a small number of "star" researchers creates winner-take-all dynamics.

The concept of scientific autonomy — the idea that scientists should be free to pursue their research without external interference — has been a core value of the scientific community since the 17th century. But this autonomy has always been qualified: scientists depend on funding from governments and corporations, and their research has social consequences that create legitimate grounds for public oversight. The question of how to balance scientific autonomy with public accountability is a recurring theme in the history of science and is especially pressing in areas like genetic engineering, AI development, and climate intervention.

The philosopher of science Thomas Kuhn argued that normal science — the day-to-day practice of puzzle-solving within an established paradigm — is the default mode of scientific activity. Scientific revolutions, in which one paradigm is replaced by another, are rare and disruptive events. This framework helps explain some features of contemporary science: the conservatism of peer review (which tends to favor work within established paradigms), the difficulty of publishing genuinely novel ideas, and the resistance to findings that challenge established theories.

The concept of "post-truth" politics, in which objective facts are less influential in shaping public opinion than appeals to emotion and personal belief, has been applied to the relationship between science and society. The term gained prominence in 2016 (it was Oxford Dictionaries' word of the year) and has been used to describe the rejection of scientific consensus on climate change, vaccine safety, and other issues. But the concept of post-truth raises its own questions: who gets to decide what counts as a "fact," and how should societies navigate genuine uncertainty about complex scientific questions?

The philosophy of scientific self-correction

A central philosophical question underlying the contemporary crisis is whether science is genuinely self-correcting. Karl Popper's falsificationist model holds that science progresses by proposing bold conjectures and subjecting them to severe tests. On this view, false findings are eventually eliminated because they fail to replicate. The self-correcting property of science is one of its defining features, distinguishing it from dogmatic systems that resist challenge.

But the replication crisis raises questions about whether self-correction operates effectively in practice. If false findings are published in prestigious journals, cited thousands of times, and incorporated into subsequent research, the correction process may take decades and may never fully undo the damage. The "dead salmon" study by Bennett et al. (2009), in which an fMRI analysis without correction for multiple comparisons appeared to show neural activity in a dead fish (a methodological joke with a serious point), was published after being rejected by several journals that did not appreciate its significance. The episode illustrates that methodological critiques, even when correct, may face resistance from the community whose practices they challenge.
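The multiple-comparisons problem behind the dead salmon result can be illustrated with a short calculation. The sketch below assumes independent tests, which is a simplification (neighboring fMRI voxels are correlated), and it is not the analysis Bennett et al. performed; the function names and the test counts are hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


# Hypothetical numbers of simultaneous tests (assumed for illustration).
for n in (1, 10, 100, 10_000):
    fwer = familywise_error_rate(n)
    per_test = bonferroni_threshold(n)
    print(f"{n:>6} tests: P(any false positive) = {fwer:.3f}, "
          f"Bonferroni per-test alpha = {per_test:.2e}")
```

With thousands of uncorrected tests a spurious "significant" result is close to certain, which is why a scan of a dead fish can show apparent activity unless the threshold is adjusted.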

Thomas Kuhn's model of normal science and revolutionary science provides another lens. In normal science, researchers solve puzzles within an established paradigm, not questioning the fundamental assumptions. The replication crisis can be understood as a breakdown of normal science: the accumulated anomalies (failed replications) have reached a point where the community can no longer ignore them, and a period of methodological reform (analogous to a mini-revolution) is underway. Whether this reform will succeed in restoring confidence in the scientific enterprise or will lead to a more fundamental questioning of scientific institutions remains to be seen.

Trust, authority, and democratic governance

The relationship between scientific expertise and democratic governance is one of the most important philosophical and political questions of the 21st century. In technocratic models, experts provide authoritative knowledge that policymakers implement. In democratic models, citizens participate in decisions about how scientific knowledge should be applied. The tension between these models is inherent in modern governance.

Sheila Jasanoff's concept of "technologies of humility" argues that democratic societies need institutional mechanisms for acknowledging the limits of scientific knowledge, incorporating diverse perspectives, and making decisions under uncertainty. This contrasts with what Jasanoff calls "technologies of hubris" — the assumption that scientific expertise alone can solve complex social problems. The COVID-19 pandemic illustrated both approaches: the rapid development of vaccines was a triumph of expert knowledge, but the uneven public health response revealed the limits of expertise without public trust and democratic engagement.

The question of who counts as a scientific expert has become increasingly contested. In areas of policy-relevant science — climate change, vaccine safety, genetically modified organisms, pandemic response — there are always dissenting experts who challenge the consensus. How should democratic societies weigh competing expert claims? The "weight of evidence" approach, which considers the totality of scientific evidence rather than individual studies or experts, is the standard scientific answer. But this approach requires that citizens trust the institutions that evaluate and synthesize evidence — the IPCC, the CDC, the WHO — precisely the institutions whose authority has been challenged.

Non-Western perspectives on scientific governance

The contemporary science policy landscape is dominated by Western institutions and Western assumptions about the relationship between science, politics, and knowledge. But non-Western perspectives offer important alternatives. Indigenous and traditional knowledge systems (including Traditional Ecological Knowledge (TEK), Ayurvedic medicine, and traditional Chinese medicine) have been recognized by some scientific institutions as valuable sources of knowledge that complement Western scientific approaches.

The Nagoya Protocol (2010), an international agreement on access to genetic resources and benefit-sharing, represents an attempt to address the historical exploitation of indigenous knowledge by scientific and commercial interests. The protocol requires that researchers obtain informed consent from indigenous communities before using their traditional knowledge and that benefits derived from such knowledge be shared equitably. The implementation of this protocol has been uneven, but it represents a recognition that the governance of science must extend beyond Western institutional frameworks.

The future trajectory of science will depend on how these institutional, philosophical, and political questions are resolved. Several scenarios are conceivable. In the most optimistic scenario, the open science movement succeeds in reforming research practices, AI accelerates scientific discovery, and the globalization of science brings new perspectives and approaches to bear on humanity's most pressing problems. In a more pessimistic scenario, the commercialization of science continues unabated, public trust erodes further, and the capacity of science to address global challenges is undermined by political interference and institutional dysfunction. The actual future will likely fall somewhere between these extremes, and the outcome will depend on choices made by scientists, policymakers, and citizens in the coming decades.

Bibliography Master

Primary sources:

Ioannidis, J. P. A. "Why Most Published Research Findings Are False." PLoS Medicine 2(8), 2005: e124.
Open Science Collaboration. "Estimating the Reproducibility of Psychological Science." Science 349(6251), 2015: aac4716.
Wilkinson, M. D. et al. "The FAIR Guiding Principles for Scientific Data Management and Stewardship." Scientific Data 3, 2016: 160018.
Intergovernmental Panel on Climate Change. Climate Change 2021: The Physical Science Basis. Cambridge: Cambridge University Press, 2021.

Secondary works:

Mirowski, P. Science-Mart: Privatizing American Science. Cambridge, MA: Harvard University Press, 2011.
Nielsen, M. Reinventing Discovery: The New Era of Networked Science. Princeton: Princeton University Press, 2012.
Jasanoff, S. The Ethics of Invention: Technology and the Human Future. New York: Norton, 2016.
Funtowicz, S. and Ravetz, J. "Science for the Post-Normal Age." Futures 25(7), 1993: 739-755.
Ravetz, J. R. Scientific Knowledge and Its Social Problems. Oxford: Oxford University Press, 1971.
Oreskes, N. and Conway, E. M. Merchants of Doubt: How a Handful of Scientists Obscured the Truth on Issues from Tobacco Smoke to Global Warming. New York: Bloomsbury, 2010.
Magnani, L. Abductive Cognition: The Epistemological and Eco-Cognitive Dimensions of Hypothetical Reasoning. Berlin: Springer, 2009.

Prerequisites

none — this is a leaf unit

Tier anchors

beginner: McClellan and Dorn, Science and Technology in World History (3e), Ch. 20; Nielsen, Reinventing Discovery
intermediate: Baldwin, Scientific Autonomy, Public Accountability, and the Rise of 'Peer Review' in the Cold War United States; Mirowski, Science-Mart
master: primary sources: Open Science Collaboration 2015, Ioannidis 2005, Wilkinson et al. 2016 (FAIR principles), Intergovernmental Panel on Climate Change reports; secondary: Mirowski, Ravetz, Funtowicz, Jasanoff

References

McClellan, J. E. III and Dorn, H., Science and Technology in World History (3e, Johns Hopkins UP, 2015) · Ch. 20 · source being verified
Nielsen, M., Reinventing Discovery (Princeton UP, 2012) · full text · source being verified
Ioannidis, J. P. A., 'Why Most Published Research Findings Are False' PLoS Medicine 2(8), 2005 · e124 · source being verified
Open Science Collaboration, 'Estimating the Reproducibility of Psychological Science' Science 349(6251), 2015 · aac4716 · source being verified
Mirowski, P., Science-Mart: Privatizing American Science (Harvard UP, 2011) · Ch. 1-5 · source being verified
Jasanoff, S., The Ethics of Invention (Norton, 2016) · Ch. 1-4

Estimated time

beginner: 25m
intermediate: 50m
master: 75m