You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Development and evaluation of an ontology for non-invasive respiratory support in acute care.

PloS one·2026
Same author

Failure Modes of Time Series Interpretability Algorithms for Critical Care Applications and Potential Solutions.

AMIA ... Annual Symposium proceedings. AMIA Symposium·2026
Same author

PHEONA: An Evaluation Framework for Large Language Model-based Approaches to Computational Phenotyping.

AMIA ... Annual Symposium proceedings. AMIA Symposium·2026
Same author

SHREC: A framework for advancing next-generation computational phenotyping with large language models.

PLOS digital health·2026
Same author

Standardizing Data Elements for Implementation of ICU Liberation Bundle.

Applied clinical informatics·2026
Same author

Comparative Evaluation of USG, CT, and MRI in Acute Pancreatitis.

Journal of pharmacy & bioallied sciences·2026
Same journal

Poisoning the Genome: Targeted Backdoor Attacks on DNA Foundation Models.

ArXiv·2026
Same journal

Mechanistic mathematical model of the in vitro infection dynamics of Bunyamwera and Batai viruses including MOI-dependent shortening of the eclipse phase.

ArXiv·2026
Same journal

AI-Driven Lumped-Element Modeling of Human Respiratory System for Studying Voice Mechanics.

ArXiv·2026
Same journal

Beyond Algorithms: Conceptual Innovation in Medical Imaging AI.

ArXiv·2026
Same journal

Feynman Kac Reweighted Schrödinger Bridge Matching for Surface-Based Tau PET Harmonization.

ArXiv·2026
Same journal

Agentic Discovery of Non-Canonical Antimicrobial Peptides with AMPGAN v3.

ArXiv·2026

Related Experiment Video

Updated: Sep 12, 2025

Evidence-based Knowledge Synthesis and Hypothesis Validation: Navigating Biomedical Knowledge Bases via Explainable AI and Agentic Systems

Published on: June 13, 2025

Lightweight Language Models are Prone to Reasoning Errors for Complex Computational Phenotyping Tasks.

Sarah Pungitore¹, Shashank Yadav¹, David Maughan¹

  • ¹College of Engineering, The University of Arizona, Tucson, AZ.

arXiv | August 6, 2025 | PubMed
Summary
This summary is machine-generated.

Lightweight large language models (LLMs) make frequent reasoning errors on complex computational phenotyping tasks. Extending LLM evaluation frameworks such as PHEONA to assess reasoning is crucial for identifying and addressing these errors in artificial intelligence development.

Keywords:
Computational Phenotyping; Computer Reasoning; Electronic Phenotyping; Generative Artificial Intelligence; Large Language Models

More Related Videos

In Vivo Modeling of the Morbid Human Genome using Danio rerio

Published on: August 24, 2013

Lexical Decision Task for Studying Written Word Recognition in Adults with and without Dementia or Mild Cognitive Impairment

Published on: June 25, 2019

Area of Science:

  • Biomedical Informatics
  • Artificial Intelligence

Background:

  • Computational phenotyping is essential for cohort identification but is time-intensive due to manual data review.
  • Previous studies showed that LLMs are limited on complex phenotyping tasks, particularly those involving multiple therapies.

Purpose of the Study:

  • To evaluate the reasoning capabilities of lightweight LLMs in computational phenotyping.
  • To enhance the PHEONA framework for assessing faulty reasoning in LLMs.

Main Methods:

  • Assessed the phenotyping accuracy of three lightweight LLMs (DeepSeek-R1, Mistral Small, Phi-4).
  • Used prompt modifications to identify incorrect explanations and unfaithful reasoning.
  • Expanded the PHEONA framework to include faulty reasoning evaluation.
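The prompt-modification step can be pictured with a minimal Python sketch. It assumes, hypothetically, that each model returns a phenotype label plus a free-text explanation per patient, and that the modification inserts a cue (for example, a suggested answer) into the prompt. Under that assumption, "accuracy impact" is the accuracy difference between the modified and original prompts, and a response is flagged as unfaithful when its label changes after the cue is added but its explanation never mentions the cue. The `Response` schema, the function names, and the substring check are illustrative assumptions, not the study's actual PHEONA implementation.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One model answer for one patient (hypothetical schema, not PHEONA's)."""
    patient_id: str
    predicted: str    # phenotype label returned by the model
    explanation: str  # free-text rationale returned with the label


def accuracy(responses: list[Response], gold: dict[str, str]) -> float:
    """Share of responses whose label matches the reference label."""
    if not responses:
        raise ValueError("no responses to score")
    correct = sum(r.predicted == gold[r.patient_id] for r in responses)
    return correct / len(responses)


def accuracy_impact(original: list[Response], modified: list[Response],
                    gold: dict[str, str]) -> float:
    """Accuracy under the modified prompt minus accuracy under the original prompt."""
    return accuracy(modified, gold) - accuracy(original, gold)


def unfaithful_ids(original: list[Response], modified: list[Response],
                   cue: str) -> list[str]:
    """Patients whose label changed after `cue` was added to the prompt while the
    new explanation never mentions the cue (a crude, assumed substring check)."""
    before = {r.patient_id: r.predicted for r in original}
    flagged = []
    for r in modified:
        prior = before.get(r.patient_id)
        changed = prior is not None and prior != r.predicted
        if changed and cue.lower() not in r.explanation.lower():
            flagged.append(r.patient_id)
    return flagged


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    gold = {"p1": "NIV", "p2": "HFNC", "p3": "IMV"}
    original = [Response("p1", "NIV", "BiPAP documented."),
                Response("p2", "HFNC", "High-flow cannula charted."),
                Response("p3", "NIV", "Mask ventilation noted.")]
    modified = [Response("p1", "IMV", "Ventilator settings point to IMV."),
                Response("p2", "HFNC", "High-flow cannula charted."),
                Response("p3", "IMV", "A colleague suggested IMV; intubation is charted.")]
    print(accuracy_impact(original, modified, gold))       # 0.0
    print(unfaithful_ids(original, modified, "colleague"))  # ['p1']
```

In the toy run the accuracy is unchanged, yet one answer flipped without acknowledging the cue. This illustrates why accuracy alone can hide reasoning errors. A real evaluation would need a more robust test than substring matching to decide whether an explanation acknowledges the cue.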

Main Results:

  • Reasoning errors, including incorrect explanations and unfaithful reasoning, were prevalent across all tested LLMs.
  • Prompt modifications affected DeepSeek's accuracy less than they affected Mistral's or Phi's.
  • The enhanced PHEONA framework successfully identified pervasive reasoning errors.
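To make "prevalent across all tested LLMs" concrete, prevalence can be read as the share of each model's responses that carry a given reasoning error. The Python sketch below tallies that share per model. It assumes a hypothetical annotation format of `(model, response_id, error_labels)` tuples, with category names `incorrect_explanation` and `unfaithful` chosen to mirror the two error types named above. The study's actual annotation scheme may differ.

```python
from collections import Counter, defaultdict

# Hypothetical error categories, named after the two error types in the summary.
ERROR_TYPES = ("incorrect_explanation", "unfaithful")


def error_prevalence(annotations):
    """Per-model share of responses carrying each error type, plus "any".

    `annotations` is an iterable of (model, response_id, error_labels) tuples,
    where error_labels is a set of strings from ERROR_TYPES (empty when the
    reasoning was judged sound). This format is an assumption for illustration.
    """
    totals = Counter()               # responses seen per model
    counts = defaultdict(Counter)    # error occurrences per model and label
    for model, _response_id, errors in annotations:
        totals[model] += 1
        for label in set(errors):
            counts[model][label] += 1
        if errors:
            counts[model]["any"] += 1  # responses with at least one error
    return {
        model: {key: counts[model][key] / totals[model]
                for key in (*ERROR_TYPES, "any")}
        for model in totals
    }
```

Counting `"any"` separately keeps a response with both error types from being counted twice when reporting how often a model reasons incorrectly at all.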

Conclusions:

  • Reasoning errors are ubiquitous in LLM responses for complex tasks like computational phenotyping.
  • The enhanced PHEONA framework is vital for LLM evaluation, and the findings highlight the need for improved interpretability methods.