Detecting implicit biases of large language models with Bayesian hypothesis testing.

Shijing Si¹, Xiaoming Jiang²˒³, Qinliang Su⁴˒⁵

¹School of Economics and Finance, Shanghai International Studies University, Shanghai, 201620, China.

Scientific Reports, April 11, 2025

Summary

This summary is machine-generated.

This study introduces a hypothesis testing framework for detecting social biases in large language models (LLMs). Bayes factors quantify the evidence for and against bias and outperform the classical exact binomial test.

Keywords:

Bayes factor Fairness Group bias Large language models

Area of Science:

Artificial Intelligence
Natural Language Processing
Computational Social Science

Background:

Large language models (LLMs) exhibit impressive capabilities but often perpetuate societal biases present in their training data.
Detecting and quantifying these implicit biases is crucial for responsible AI development.

Purpose of the Study:

To introduce a novel framework for detecting social bias in LLMs by framing it as a hypothesis testing problem.
To compare the efficacy of classical statistical tests with Bayesian inference for bias quantification.
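
For readers who want the framing in symbols, the block below is one way to write it down. The notation is an assumption made for illustration and is not taken from the paper: θ is the probability that a model picks the stereotyped option in a binary-choice question, k is the number of stereotyped choices out of n questions, and the Beta(a, b) prior under the alternative is a common default rather than the authors' confirmed choice.

```latex
% Hypotheses (theta_0 = 1/2 represents "no implicit bias")
H_0:\ \theta = \theta_0 = \tfrac{1}{2}
\qquad\text{vs.}\qquad
H_1:\ \theta \neq \tfrac{1}{2},\quad \theta \sim \mathrm{Beta}(a, b)

% Bayes factor: ratio of marginal likelihoods of the data (k out of n)
\mathrm{BF}_{10}
  = \frac{p(k \mid H_1)}{p(k \mid H_0)}
  = \frac{B(k + a,\; n - k + b)\,/\,B(a, b)}{\theta_0^{\,k}\,(1 - \theta_0)^{\,n-k}}

% Special case: uniform prior (a = b = 1) and theta_0 = 1/2
\mathrm{BF}_{10} = \frac{2^{n}}{(n + 1)\binom{n}{k}}
```

Here B(·, ·) is the Beta function. The binomial coefficient cancels between numerator and denominator, which is why it does not appear in the general expression. Values of BF₁₀ above 1 favor bias, and values below 1 favor the absence of bias.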

Main Methods:

Reformulated bias detection as a hypothesis testing problem with the null hypothesis representing the absence of implicit bias.
Used binary-choice questions to measure social bias in several LLMs (e.g., ChatGPT, DeepSeek-V3, Llama-3.1-70B).
Integrated exact binomial tests with Bayesian inference using Bayes factors for bias detection and quantification.
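
The methods above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the two-sided p-value uses the common "sum of outcomes no more likely than the observed one" rule, and the Bayes factor assumes a Beta(a, b) prior on the choice probability under the alternative (uniform by default).

```python
from math import exp, lgamma, log


def log_binom_pmf(i: int, n: int, p: float) -> float:
    """Log-probability of exactly i successes under Binomial(n, p), 0 < p < 1."""
    log_choose = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return log_choose + i * log(p) + (n - i) * log(1 - p)


def exact_binomial_p(k: int, n: int, p0: float = 0.5) -> float:
    """Two-sided exact binomial p-value for H0: theta = p0.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k (small tolerance guards against rounding).
    """
    pmf = [exp(log_binom_pmf(i, n, p0)) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)
    return min(1.0, sum(q for q in pmf if q <= cutoff))


def log_beta(a: float, b: float) -> float:
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bayes_factor_10(k: int, n: int, a: float = 1.0, b: float = 1.0,
                    p0: float = 0.5) -> float:
    """Bayes factor for H1 (theta ~ Beta(a, b)) against H0 (theta = p0).

    The prior is an illustrative assumption; the binomial coefficient
    cancels in the ratio, so it is omitted from both marginal likelihoods.
    """
    log_m1 = log_beta(k + a, n - k + b) - log_beta(a, b)
    log_m0 = k * log(p0) + (n - k) * log(1 - p0)
    return exp(log_m1 - log_m0)


# Hypothetical counts: a model picks the stereotyped option 70 times out of 100.
k, n = 70, 100
print(f"exact binomial p-value: {exact_binomial_p(k, n):.2e}")
print(f"Bayes factor BF10:      {bayes_factor_10(k, n):.1f}")
```

The counts in the usage lines are invented for illustration. Working in log space keeps the computation stable for large n, where the raw Beta function and powers of 0.5 would underflow.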

Main Results:

Bayes factors quantify the evidence for each competing hypothesis better than the exact binomial test does.
Bayes factors are robust to small sample sizes, offering more reliable bias quantification.
LLM bias behavior showed consistency across English and French versions of the CrowS-Pairs dataset, with minor variations attributed to socio-cultural contexts.

Conclusions:

The proposed hypothesis testing framework, particularly with Bayes factors, provides a robust method for detecting and quantifying social biases in LLMs.
Bayesian inference offers advantages over classical tests in distinguishing evidence of bias from evidence of no bias.
Cross-lingual consistency in bias suggests underlying patterns, though cultural nuances warrant further investigation.
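
The distinction between evidence of no bias and a mere lack of evidence can be made concrete with a small sketch. Everything here is an assumption for illustration: the uniform Beta(1, 1) prior, the hypothetical function names, and the threshold of 3, which is a widely used convention for "substantial" evidence rather than a value taken from the paper.

```python
from math import exp, lgamma, log


def log_beta(a: float, b: float) -> float:
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bayes_factor_10(k: int, n: int, a: float = 1.0, b: float = 1.0) -> float:
    """Bayes factor for H1 (theta ~ Beta(a, b)) against H0 (theta = 0.5)."""
    log_m1 = log_beta(k + a, n - k + b) - log_beta(a, b)
    log_m0 = n * log(0.5)
    return exp(log_m1 - log_m0)


def verdict(k: int, n: int, threshold: float = 3.0) -> str:
    """Three-way reading of the Bayes factor (threshold is a convention)."""
    bf10 = bayes_factor_10(k, n)
    if bf10 >= threshold:
        return "evidence of bias"
    if bf10 <= 1.0 / threshold:
        return "evidence of no bias"
    return "inconclusive"


# Hypothetical counts of stereotyped choices (k) out of n questions.
for k, n in [(5, 10), (500, 1000), (70, 100)]:
    print(f"k={k:4d}, n={n:5d}: BF10={bayes_factor_10(k, n):8.3f} -> {verdict(k, n)}")
```

A perfectly balanced 5 out of 10 and 500 out of 1000 are equally unremarkable to a two-sided test of θ = 0.5, since neither leads to rejection. Under these assumptions the Bayes factor separates them: the small sample is inconclusive (BF₀₁ is roughly 2.7), while the large sample gives positive support for no bias (BF₀₁ is roughly 25).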