



Updated: Sep 14, 2025


Quantifying uncert-AI-nty: Testing the accuracy of LLMs' confidence judgments.

Trent N Cash¹,², Daniel M Oppenheimer³,⁴, Sara Christie⁴

¹Department of Social and Decision Sciences, Carnegie Mellon University, 5000 Forbes Ave., 224 Porter Hall, Pittsburgh, PA, 15213, USA. trentncash@gmail.com.

Memory & Cognition | July 22, 2025

Summary

This summary is machine-generated.

Large Language Model (LLM) chatbots show strong metacognitive accuracy in confidence judgments, comparable to humans. However, LLMs, particularly ChatGPT and Gemini, struggle to adjust confidence based on past performance, revealing a key limitation.

Keywords:

Artificial intelligence; Confidence judgments; Large Language Models; Metacognition; Metacognitive accuracy


Area of Science:

Artificial Intelligence
Cognitive Science
Human-Computer Interaction

Background:

Large Language Models (LLMs) like ChatGPT and Gemini are transforming information access.
Metacognitive confidence judgments are crucial for human uncertainty quantification.
The accuracy of LLM confidence judgments remains largely unexplored.

Purpose of the Study:

To investigate the capability of LLMs to quantify uncertainty through confidence judgments.
To compare the metacognitive accuracy of LLMs and humans across various tasks.
To identify similarities and differences in confidence judgment strategies between LLMs and humans.

Main Methods:

Four LLMs (ChatGPT, Bard/Gemini, Claude Sonnet, Claude Haiku) and human participants rated their confidence in their own predictions and answers.
Studies covered aleatory uncertainty (NFL and Oscar predictions) and epistemic uncertainty (Pictionary, trivia, and university life questions).
Absolute and relative accuracy of confidence judgments were analyzed.
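
This page does not say which statistics the study used for absolute and relative accuracy. As a minimal sketch, assuming two common choices from the metacognition literature, absolute accuracy can be pictured as overconfidence (mean confidence minus proportion correct) and relative accuracy as a Goodman-Kruskal gamma correlation between confidence and correctness. The function names and the data below are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def overconfidence(confidences, correct):
    """Absolute accuracy (assumed measure): mean confidence minus proportion correct.

    Positive values indicate overconfidence, negative values underconfidence.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_confidence = sum(confidences) / len(confidences)
    proportion_correct = sum(correct) / len(correct)
    return mean_confidence - proportion_correct


def goodman_kruskal_gamma(confidences, correct):
    """Relative accuracy (assumed measure): do higher-confidence items tend to be correct?

    Returns a value in [-1, 1], or None when every pair of items is tied.
    """
    concordant = discordant = 0
    for (c1, k1), (c2, k2) in combinations(zip(confidences, correct), 2):
        product = (c1 - c2) * (k1 - k2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        return None
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical data: confidence on a 0-1 scale, correctness coded 1 or 0.
confidences = [0.9, 0.8, 0.6, 0.7, 0.5]
correct = [1, 1, 0, 0, 1]

print(overconfidence(confidences, correct))
print(goodman_kruskal_gamma(confidences, correct))
```

In this made-up example, mean confidence (0.7) exceeds the proportion correct (0.6), so the judge is slightly overconfident, while the positive gamma indicates that confidence still discriminates correct from incorrect answers to some degree.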

Main Results:

LLMs' absolute and relative metacognitive accuracy was comparable, and sometimes superior, to that of humans.
Both LLMs and humans exhibited overconfidence in their judgments.
LLMs, especially ChatGPT and Gemini, often failed to adjust confidence based on prior performance, unlike humans.
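
One way to picture "adjusting confidence based on prior performance" is to compare an estimate of the score made before a task with an estimate made after it. The sketch below assumes that design, which this page does not confirm; the function name and all numbers are hypothetical.

```python
def estimate_errors(predicted_before, estimated_after, actual_score):
    """Compare a pre-task and a post-task estimate of one's own score.

    Returns (error_before, error_after, improvement), where improvement is
    positive when the post-task estimate is closer to the actual score.
    """
    error_before = abs(predicted_before - actual_score)
    error_after = abs(estimated_after - actual_score)
    return error_before, error_after, error_before - error_after


# Hypothetical judge who revises the estimate downward after a weak performance.
print(estimate_errors(predicted_before=18, estimated_after=15, actual_score=14))

# Hypothetical judge who repeats the original estimate despite the same performance.
print(estimate_errors(predicted_before=18, estimated_after=18, actual_score=14))
```

Under these assumptions, the first judge's error shrinks after the task, while the second judge's error is unchanged, which is the pattern the result above describes as a failure to adjust.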

Conclusions:

LLMs possess significant capabilities in metacognitive confidence judgments, approaching human levels of accuracy.
Overconfidence is a shared trait between LLMs and humans.
A key limitation for LLMs is their reduced ability to dynamically adjust confidence based on experience, unlike humans.