



Updated: May 14, 2025


Detecting implicit biases of large language models with Bayesian hypothesis testing.

Shijing Si¹, Xiaoming Jiang²,³, Qinliang Su⁴,⁵

  • ¹ School of Economics and Finance, Shanghai International Studies University, Shanghai, 201620, China.

Scientific Reports | April 11, 2025
Summary
This summary is machine-generated.

This study introduces a new hypothesis testing framework to detect social biases in large language models (LLMs). Bayes factors effectively quantify bias evidence, outperforming traditional statistical tests.

Keywords: Bayes factor, Fairness, Group bias, Large language models


Area of Science:

  • Artificial Intelligence
  • Natural Language Processing
  • Computational Social Science

Background:

  • Large language models (LLMs) exhibit impressive capabilities but often perpetuate societal biases from training data.
  • Detecting and quantifying these implicit biases in LLMs is crucial for responsible AI development.

Purpose of the Study:

  • To introduce a novel framework for detecting social bias in LLMs by framing it as a hypothesis testing problem.
  • To compare the efficacy of classical statistical tests with Bayesian inference for bias quantification.

Main Methods:

  • Reformulated bias detection as a hypothesis testing problem with the null hypothesis representing the absence of implicit bias.
  • Utilized binary-choice questions to measure social bias in various LLMs (e.g., ChatGPT, DeepSeek-V3, Llama-3.1-70B).
  • Integrated exact binomial tests with Bayesian inference using Bayes factors for bias detection and quantification.
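As a rough illustration of the two tests named above, the following sketch computes an exact binomial p-value and a Bayes factor for a set of binary-choice answers. It is not the authors' implementation. It assumes that "no implicit bias" means the model picks the stereotypical option with probability 0.5, and that the alternative hypothesis places a Beta(a, b) prior (uniform by default) on that probability; the summary does not state either choice. The function names and the counts `k` (stereotypical choices) and `n` (questions asked) are hypothetical.

```python
from math import comb, exp, lgamma, log


def exact_binomial_p_value(k: int, n: int, p0: float = 0.5) -> float:
    """Two-sided exact binomial p-value for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one under the null value p0.
    """
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so ties are not lost to rounding.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))


def bayes_factor_10(k: int, n: int, a: float = 1.0, b: float = 1.0,
                    p0: float = 0.5) -> float:
    """Bayes factor for H1: p ~ Beta(a, b) against H0: p = p0.

    Assumption: the Beta prior is an illustrative choice, not taken
    from the paper. The binomial coefficient cancels in the ratio.
    """
    def log_beta(x: float, y: float) -> float:
        return lgamma(x) + lgamma(y) - lgamma(x + y)

    log_m1 = log_beta(k + a, n - k + b) - log_beta(a, b)  # marginal under H1
    log_m0 = k * log(p0) + (n - k) * log(1 - p0)          # likelihood under H0
    return exp(log_m1 - log_m0)


if __name__ == "__main__":
    # Hypothetical counts: 10 stereotypical choices out of 10 questions.
    print(exact_binomial_p_value(10, 10))
    print(bayes_factor_10(10, 10))
```

Values of `bayes_factor_10` above 1 favor the bias hypothesis and values below 1 favor the no-bias hypothesis, whereas the p-value only measures how surprising the data are under the null.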

Main Results:

  • Bayes factors demonstrate superior ability in quantifying evidence for competing hypotheses compared to the exact binomial test.
  • Bayes factors are robust to small sample sizes, offering more reliable bias quantification.
  • LLM bias behavior showed consistency across English and French versions of the CrowS-Pairs dataset, with minor variations attributed to socio-cultural contexts.

Conclusions:

  • The proposed hypothesis testing framework, particularly with Bayes factors, provides a robust method for detecting and quantifying social biases in LLMs.
  • Bayesian inference offers advantages over classical tests in distinguishing evidence of bias from evidence of no bias.
  • Cross-lingual consistency in bias suggests underlying patterns, though cultural nuances warrant further investigation.
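
To make the distinction between "evidence of no bias" and "no evidence of bias" concrete, consider a worked example under assumptions that are not taken from the paper: the no-bias hypothesis H0 fixes the probability of a stereotypical choice at 0.5, the bias hypothesis H1 places a uniform prior on that probability, and a model gives k stereotypical answers out of n questions. The counts n = 100 and k = 50 are purely illustrative.

```latex
\mathrm{BF}_{01}
  = \frac{P(k \mid H_0)}{P(k \mid H_1)}
  = \frac{\binom{n}{k}\, 2^{-n}}{\int_0^1 \binom{n}{k}\, p^{k} (1-p)^{n-k}\, dp}
  = (n+1)\binom{n}{k}\, 2^{-n}
\qquad
n = 100,\; k = 50:\quad
\mathrm{BF}_{01} \approx 101 \times 0.0796 \approx 8.04
```

Under these assumptions the data are about eight times more likely under the no-bias hypothesis than under the bias hypothesis. An exact binomial test on the same counts returns a p-value of 1, which indicates only that the null hypothesis is not rejected and does not quantify support for it.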