


Updated: Apr 11, 2026


Reproducibility and Robustness of Large Language Models for Mobility Functional Status Extraction.

Xingyi Liu¹, Muskan Garg¹, Eunji Jeon¹

  • ¹Department of AI and Informatics, Mayo Clinic, Rochester, USA.

medRxiv: the Preprint Server for Health Sciences | April 10, 2026
Summary
This summary is machine-generated.

Deploying large language models (LLMs) for clinical information extraction (IE) requires knowing how stable their outputs are. This study found that prompt paraphrasing and model choice significantly affect LLM stability, and that self-consistency (majority voting over repeated samples) can improve reliability.

Keywords:
Clinical NLP, Information Extraction, Large Language Models, Mobility Functional Status, Reliability, Trustworthiness


Area of Science:

  • Natural Language Processing
  • Medical Informatics
  • Artificial Intelligence in Healthcare

Background:

  • Clinical narrative text is rich in patient data but challenging for information extraction (IE) due to variability.
  • Large language models (LLMs) show promise for clinical IE, but their reproducibility and robustness are critical for deployment.
  • Quantifying LLM stability is essential for reliable clinical applications.

Purpose of the Study:

  • To evaluate the reproducibility and robustness of three open-weight large language models (LLMs) for clinical information extraction.
  • To assess the impact of prompt variations and model architecture on LLM performance and stability.
  • To provide recommendations for improving LLM reliability in clinical settings.

Main Methods:

  • Evaluated three distinct LLMs (Llama 3.3, Llama 4, MedGemma) on binary clinical IE tasks related to the mobility domain of the International Classification of Functioning, Disability and Health (ICF).
  • Quantified intra-prompt reproducibility (repeated sampling) and inter-prompt robustness (paraphrased prompts).
  • Measured predictive performance (F1-score) and stability (Fleiss' Kappa), analyzing factor effects with ANOVA.
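
The two metrics named above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: it assumes each note receives the same number of binary labels (one per repeated sample or paraphrased prompt), and the function names `fleiss_kappa` and `f1_score` are hypothetical helpers written for this example.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of categorical labels.

    Assumption for this sketch: every item has the same number of labels,
    one per "rater" (here, one per repeated LLM run or prompt variant).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of labels")

    category_totals = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Fraction of rater pairs that agree on this item.
        pairs_agreeing = sum(c * (c - 1) for c in counts.values())
        agreement_sum += pairs_agreeing / (n_raters * (n_raters - 1))

    observed = agreement_sum / n_items
    total_labels = n_items * n_raters
    # Agreement expected by chance from the overall category proportions.
    expected = sum((t / total_labels) ** 2 for t in category_totals.values())
    if expected == 1.0:
        # Only one category was ever used; kappa is undefined (0/0).
        return float("nan")
    return (observed - expected) / (1.0 - expected)


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class of a binary task."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

Under this framing, intra-prompt reproducibility would pass the labels from repeated samples of one prompt as the "raters", while inter-prompt robustness would pass one label per paraphrased prompt.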

Main Results:

  • Increasing temperature generally decreased LLM agreement, with model-dependent effects.
  • Prompt paraphrasing significantly reduced LLM stability, especially for Mixture-of-Experts (MoE) models.
  • Self-consistency via majority voting improved stability (Kappa) and often maintained or improved performance (F1-score).
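
Self-consistency via majority voting can be illustrated with a short Python sketch. The summary does not state how many samples were drawn or how ties were broken, so the tie-break rule here (fall back to a default label) and the helper names `majority_vote` and `self_consistent_labels` are assumptions made for illustration only.

```python
from collections import Counter


def majority_vote(samples, tie_break=0):
    """Return the most frequent label among repeated samples for one note.

    Assumption for this sketch: ties fall back to `tie_break`; the source
    does not say how ties were handled.
    """
    if not samples:
        raise ValueError("need at least one sampled label")
    ranked = Counter(samples).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_break
    return ranked[0][0]


def self_consistent_labels(sampled_outputs, tie_break=0):
    """Collapse repeated samples per note into one voted label per note."""
    return [majority_vote(samples, tie_break) for samples in sampled_outputs]


# Illustrative data: three sampled binary labels for each of two notes.
voted = self_consistent_labels([[1, 1, 0], [0, 0, 1]])
```

Using an odd number of samples per note avoids ties entirely for a binary task, which keeps the tie-break rule from influencing the voted labels.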

Conclusions:

  • LLM reliability in clinical IE is sensitive to prompt design and model architecture.
  • Self-consistency offers a practical method to enhance the stability and performance of LLMs for clinical tasks.
  • A reproducible framework is presented for evaluating and improving LLM reliability in healthcare.