Quantifying uncert-AI-nty: Testing the accuracy of LLMs' confidence judgments.

Trent N Cash (1,2), Daniel M Oppenheimer (3,4), Sara Christie (4)

  • (1) Department of Social and Decision Sciences, Carnegie Mellon University, 5000 Forbes Ave., 224 Porter Hall, Pittsburgh, PA, 15213, USA. trentncash@gmail.com.

Memory & Cognition | July 22, 2025
Summary
This summary is machine-generated.

Large Language Model (LLM) chatbots show strong metacognitive accuracy in confidence judgments, comparable to humans. However, LLMs, particularly ChatGPT and Gemini, struggle to adjust confidence based on past performance, revealing a key limitation.

Keywords:
Artificial intelligence, Confidence judgments, Large Language Models, Metacognition, Metacognitive accuracy

Area of Science:

  • Artificial Intelligence
  • Cognitive Science
  • Human-Computer Interaction

Background:

  • Large Language Models (LLMs) like ChatGPT and Gemini are transforming information access.
  • Metacognitive confidence judgments are crucial for human uncertainty quantification.
  • The accuracy of LLM confidence judgments remains largely unexplored.

Purpose of the Study:

  • To investigate the capability of LLMs to quantify uncertainty through confidence judgments.
  • To compare the metacognitive accuracy of LLMs and humans across various tasks.
  • To identify similarities and differences in confidence judgment strategies between LLMs and humans.

Main Methods:

  • Four LLMs (ChatGPT, Bard/Gemini, Sonnet, Haiku) and human participants evaluated their confidence in predictions and answers.
  • Studies covered aleatory uncertainty (NFL, Oscar predictions) and epistemic uncertainty (Pictionary, Trivia, university life questions).
  • Absolute and relative accuracy of confidence judgments were analyzed.
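
The distinction between absolute and relative accuracy can be made concrete with a small sketch. The summary does not say which statistics the authors used, so the two measures below are assumptions chosen because they are common in the metacognition literature: overconfidence (mean confidence minus proportion correct) for absolute accuracy, and the Goodman-Kruskal gamma correlation between confidence and correctness for relative accuracy. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def overconfidence(confidences, correct):
    """Absolute accuracy: mean confidence minus proportion correct.

    Positive values indicate overconfidence, negative values underconfidence.
    `confidences` are probabilities in [0, 1]; `correct` holds 1 (right) or 0 (wrong).
    """
    mean_confidence = sum(confidences) / len(confidences)
    proportion_correct = sum(correct) / len(correct)
    return mean_confidence - proportion_correct


def goodman_kruskal_gamma(confidences, correct):
    """Relative accuracy: do higher confidence ratings go with correct answers?

    Counts concordant and discordant item pairs; pairs tied on either
    variable are ignored. Returns None when no informative pairs exist.
    """
    concordant = 0
    discordant = 0
    for (conf_a, right_a), (conf_b, right_b) in combinations(
        zip(confidences, correct), 2
    ):
        product = (conf_a - conf_b) * (right_a - right_b)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        return None
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical item-level data for one respondent (human or LLM).
confidences = [0.9, 0.8, 0.6, 0.7, 0.5]
correct = [1, 1, 0, 1, 0]

print("overconfidence:", round(overconfidence(confidences, correct), 2))
print("gamma:", goodman_kruskal_gamma(confidences, correct))
```

In this toy data, mean confidence (0.70) exceeds accuracy (0.60), so the respondent is slightly overconfident in absolute terms, while every correct item received higher confidence than every incorrect one, so relative accuracy is perfect. The two measures are independent: a respondent can be badly calibrated overall and still discriminate well between right and wrong answers.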

Main Results:

  • LLMs demonstrated absolute and relative metacognitive accuracy comparable, and sometimes superior, to that of humans.
  • Both LLMs and humans exhibited overconfidence in their judgments.
  • LLMs, especially ChatGPT and Gemini, often failed to adjust confidence based on prior performance, unlike humans.
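
One way to picture "adjusting confidence based on prior performance" is to compare an estimate made before a task with an estimate made after it. The sketch below assumes a hypothetical design in which each respondent predicts a score beforehand (prospective), completes the task, and then estimates the score again (retrospective); the summary does not describe the authors' actual procedure, and the function name and numbers are illustrative only.

```python
def adjustment_toward_performance(prospective, retrospective, actual):
    """How much closer the retrospective estimate is to the actual score.

    Positive: the respondent moved toward their real performance.
    Zero: no improvement. Negative: the estimate got worse.
    """
    return abs(prospective - actual) - abs(retrospective - actual)


# Hypothetical scores out of 20 for two respondents.
respondents = {
    "adjusts": {"prospective": 18, "retrospective": 14, "actual": 13},
    "does_not_adjust": {"prospective": 18, "retrospective": 18, "actual": 13},
}

for name, scores in respondents.items():
    gain = adjustment_toward_performance(**scores)
    print(f"{name}: error reduced by {gain} points")
```

Here the first respondent cuts the estimation error from 5 points to 1, while the second repeats the original estimate despite having performed worse than predicted, which is the pattern the result above attributes to some LLMs.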

Conclusions:

  • LLMs possess significant capabilities in metacognitive confidence judgments, approaching human levels of accuracy.
  • Overconfidence is a shared trait between LLMs and humans.
  • A key limitation of LLMs, relative to humans, is their reduced ability to adjust confidence dynamically based on experience.