
Related Concept Videos

Improving Translational Accuracy

Base complementarity between the three bases of the mRNA codon and the tRNA anticodon is not a failsafe mechanism. Inaccuracies can range from a single mismatch to no correct base pairing at all. The free energy difference between correct and nearly correct base pairs can be as small as 3 kcal/mol. If complementarity were the only proofreading step, the estimated error frequency would be one wrong amino acid in every 100 incorporated. However, error frequencies observed in...

Detection of Gross Error: The Q Test

When one or more data points appear far from the rest of the data, there is a need to determine whether they are outliers and whether they should be eliminated from the data set to ensure an accurate representation of the measured value. In many cases, outliers arise from gross errors (or human errors) and do not accurately reflect the underlying phenomenon. In some cases, however, these apparent outliers reflect true phenomenological differences. In these cases, we can use statistical methods...
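One common statistical method for this decision is Dixon's Q test, which compares the gap between a suspect value and its nearest neighbour with the overall range of the data. The sketch below only computes that statistic; the function names `dixon_q` and `is_outlier` are hypothetical, and the critical value `q_crit` is an assumed input that must be taken from a published Q table for the sample size and confidence level in use.

```python
def dixon_q(values, suspect="high"):
    """Dixon's Q statistic for the most extreme high or low value.

    Q = |suspect - nearest neighbour| / (max - min)
    """
    if len(values) < 3:
        raise ValueError("the Q test needs at least 3 values")
    xs = sorted(values)
    spread = xs[-1] - xs[0]
    if spread == 0:
        return 0.0  # all values identical: nothing can be an outlier
    if suspect == "high":
        gap = xs[-1] - xs[-2]
    elif suspect == "low":
        gap = xs[1] - xs[0]
    else:
        raise ValueError("suspect must be 'high' or 'low'")
    return gap / spread


def is_outlier(values, q_crit, suspect="high"):
    """Flag the suspect value when Q exceeds the tabulated critical value.

    q_crit is supplied by the caller (hypothetical parameter); it depends on
    the number of values and the chosen confidence level.
    """
    return dixon_q(values, suspect) > q_crit


if __name__ == "__main__":
    data = [10.1, 10.2, 10.3, 12.0]
    print("Q for the highest value:", round(dixon_q(data), 3))
```

A value flagged this way is only a candidate for rejection; as the paragraph notes, an apparent outlier may still reflect a real phenomenological difference.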

Variability: Analysis

Measures of variability are statistical metrics that reveal the dispersion pattern within a dataset. They are pivotal in biostatistics, providing insights into the heterogeneity within health and biological data. Variability signifies the degree to which data points diverge from one another, helping researchers understand the potential range of values and associated uncertainty within the data.
The range is a simple measure of variability, indicating the difference between the highest and...

Random Error

Random or indeterminate errors originate from various uncontrollable variables, such as variations in environmental conditions, instrument imperfections, or the inherent variability of the phenomena being measured. Usually, these errors cannot be predicted, estimated, or characterized because their magnitude and direction often vary even between consecutive measurements. As a result, they are difficult to eliminate. However, the aggregate effect of these errors can be...

Survival Tree

Survival trees are a non-parametric method used in survival analysis to model the relationship between a set of covariates and the time until an event of interest occurs, often referred to as the "time-to-event" or "survival time." This method is particularly useful when dealing with censored data, where the event has not occurred for some individuals by the end of the study period, or when the exact time of the event is unknown.
Building a Survival Tree
Constructing a...

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Poly(glutamic acid-block-tyrosine) peptides designed for gastrointestinal drug adsorption.

Journal of materials chemistry. B·2026

Same author

Frontier Language Models and Optical Character Recognition Preprocessing Against Invisible Text Injection in AI Peer Review.

JAMA network open·2026

Same author

A new type of table one: showing instead of telling.

Journal of clinical epidemiology·2026

Same author

Overlooked and Undernourished: A Case Report of Scurvy Linked to Food Insecurity.

Journal of education & teaching in emergency medicine·2026

Same author

Clinical Predictors of Observation Unit Failure in Patients with Acute Heart Failure Exacerbation: A Quality Improvement Initiative.

American journal of medical quality : the official journal of the American College of Medical Quality·2026

Same author

Crossover Evaluation of Two Ambient AI Scribe Tools in the Emergency Department.

Applied clinical informatics·2026

Same journal

Pleural Toxocariasis Presenting as Eosinophilic Pleural Effusion: A Case Report.

Cureus·2026

Same journal

Left Clavicular Pain Following Splenic Rupture After Colonoscopy: A Variant of Kehr's Sign?

Cureus·2026

Same journal

Severe Polyhydramnios Associated With Antenatal Bartter Syndrome.

Cureus·2026

Same journal

Focal Takotsubo Syndrome Mimicking a Distal Coronary Pathology: A Case Report.

Cureus·2026

Same journal

Metachronous Colorectal Carcinomas and Pancreatic Metastasis in Clinically Suspected Lynch Syndrome: An 18-Year Oncologic Course.

Cureus·2026

Same journal

Regional Blocks in the Era of the Opioid Crisis: Evaluating Their Opioid-Sparing Effect.

Cureus·2026

Related Experiment Video

Updated: Jan 11, 2026

Augmenting Large Language Models via Vector Embeddings to Improve Domain-Specific Responsiveness

Published on: December 6, 2024

Piloting Temperature-Driven Variability in Emergency Diagnostic Accuracy Using a Leading Large Language Model.

Philip C Jarrett¹, Jared Hill¹, Marshall Howell¹

¹Emergency Medicine, University of Texas Southwestern Medical Center, Dallas, USA.

November 14, 2025

Summary

This summary is machine-generated.

Lowering the temperature parameter in large language models (LLMs) like GPT-4o improves diagnostic accuracy in emergency medicine cases. Lower temperatures enhance reliability and consistency for clinical AI applications.

Keywords:

artificial intelligence in medicine, clinical decision support, clinical informatics, diagnostic accuracy, emergency medicine

More Related Videos

Foreign Accent and Forensic Speaker Identification in Voice Lineups: The Influence of Acoustic Features Based on Prosody

Published on: September 27, 2024

Evidence-based Knowledge Synthesis and Hypothesis Validation: Navigating Biomedical Knowledge Bases via Explainable AI and Agentic Systems

Published on: June 13, 2025

Area of Science:

Artificial Intelligence
Medical Diagnostics
Clinical Decision Support

Background:

Large language models (LLMs) utilize a 'temperature' parameter to control output randomness.
This parameter's influence on clinical diagnostic accuracy, particularly in emergency medicine, is not well understood.
Understanding temperature's impact is crucial for reliable AI in healthcare.
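The source does not describe GPT-4o's internal decoding, but in the standard formulation temperature divides the model's raw scores (logits) before a softmax, so low values sharpen the distribution toward the top candidate and high values flatten it. The sketch below illustrates that textbook mechanism only; the candidate diagnoses and logit values are hypothetical, and treating temperature 0.0 as greedy (argmax) selection is an assumption about how a zero setting is usually handled.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature.

    A temperature of 0 is treated as greedy decoding (assumption): all
    probability goes to the highest-scoring option, first one on ties.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(options, logits, temperature, rng):
    """Draw one option according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(options, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical candidates and scores, for illustration only.
    options = ["pulmonary embolism", "pneumonia", "pericarditis"]
    logits = [2.0, 1.0, 0.5]
    rng = random.Random(0)
    for t in (0.0, 0.5, 1.0):
        draws = [sample(options, logits, t, rng) for _ in range(1000)]
        print(f"T={t}: {len(set(draws))} distinct leading answers")
```

At 0.0 every draw returns the same top-scoring option, while higher temperatures spread probability onto lower-scoring candidates, which mirrors the divergence the study measures.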

Purpose of the Study:

To evaluate the effect of the temperature parameter on GPT-4o's diagnostic accuracy for emergency medicine cases.
To assess how temperature influences diagnostic divergence and consistency across multiple iterations.
To determine optimal temperature settings for reliable clinical diagnostic tasks using LLMs.

Main Methods:

A simulation-based study used four challenging emergency medicine cases.
GPT-4o generated 10,000 differential diagnoses across five temperature settings spanning 0.0 to 1.0, with and without physical examination findings.
Diagnostic accuracy was benchmarked against gold-standard diagnoses; diagnostic divergence was measured as the number of unique diagnoses generated.
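The two metrics above can be sketched as follows, assuming each model run returns a ranked list of diagnosis strings. The helper names are hypothetical, and the matching rule (case-insensitive, whitespace-normalized exact string match) and counting unique diagnoses across all ranks are assumptions; the study may have matched synonyms or counted divergence differently.

```python
from typing import Sequence

Differential = Sequence[str]  # ranked diagnoses from one model run


def normalize(dx: str) -> str:
    """Hypothetical matching rule: lower-case and collapse whitespace."""
    return " ".join(dx.lower().split())


def leading_accuracy(runs: Sequence[Differential], gold: str) -> float:
    """Fraction of runs whose top-ranked diagnosis matches the gold standard."""
    if not runs:
        raise ValueError("no runs to score")
    target = normalize(gold)
    hits = sum(1 for r in runs if r and normalize(r[0]) == target)
    return hits / len(runs)


def divergence(runs: Sequence[Differential]) -> int:
    """Number of distinct diagnoses produced across all runs and ranks."""
    return len({normalize(dx) for r in runs for dx in r})


if __name__ == "__main__":
    # Hypothetical outputs for one case at one temperature setting.
    runs = [
        ["Aortic dissection", "Myocardial infarction"],
        ["aortic  dissection", "Pulmonary embolism"],
        ["Myocardial infarction", "Aortic dissection"],
    ]
    print(leading_accuracy(runs, "Aortic Dissection"), divergence(runs))
```

Computing both metrics separately for each temperature and physical-exam condition gives the per-setting comparison the abstract reports.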

Main Results:

GPT-4o achieved 100% leading diagnosis accuracy at temperature 0.0, decreasing to 89.4% at temperature 1.0.
Higher temperatures significantly increased diagnostic inaccuracy and divergence (483% increase from 0.0 to 1.0).
Sensitivity to temperature varied by case, and some diagnoses were heavily affected when physical examination findings were excluded.
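To read the reported figures, assume the 483% is a relative increase in a count D (for example, the number of unique diagnoses) between temperatures 0.0 and 1.0; the abstract does not state the base quantity, so this interpretation is an assumption. The leading-diagnosis change, by contrast, is an absolute drop in percentage points.

```latex
\Delta_{\%} = \frac{D_{T=1.0} - D_{T=0.0}}{D_{T=0.0}} \times 100\%
\quad\Longrightarrow\quad
D_{T=1.0} = \left(1 + \tfrac{483}{100}\right) D_{T=0.0} = 5.83\, D_{T=0.0}

\text{Leading-diagnosis accuracy: } 100\% - 89.4\% = 10.6 \text{ percentage points}
```

Under this reading, the setting at 1.0 produced almost six times as many distinct outputs as the setting at 0.0.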

Conclusions:

Increasing the temperature parameter in GPT-4o systematically reduces diagnostic accuracy and consistency in emergency medicine scenarios.
Lower temperature settings (e.g., 0.0) are associated with higher accuracy and reliability, making them potentially preferable for clinical use.
Transparent reporting of temperature settings is vital for reproducibility in clinical AI research.