Related Experiment Video

Updated: May 11, 2026

Augmenting Large Language Models via Vector Embeddings to Improve Domain-Specific Responsiveness (03:14)

Published on: December 6, 2024

Confidence-Accuracy Alignment in Cardiology Knowledge: Comparing Medical-Specific and General-Purpose Large Language

Ali Zidan1, Mousa El-Sururi2, Avi Belbase3

  • 1Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada.

The American Journal of Cardiology | May 9, 2026
Summary
This summary is machine-generated.

General-purpose large language models (LLMs) outperformed a medical-specific LLM in cardiology knowledge assessment. Despite high confidence, all models showed poor calibration, limiting their clinical reliability without supervision.

Keywords:
Artificial knowledge, Cardiology, Large-language models

Area of Science:

  • Artificial Intelligence in Medicine
  • Clinical Decision Support Systems
  • Medical Education Technology

Background:

  • Large language models (LLMs) are increasingly used in healthcare, but their clinical reliability hinges on accuracy and confidence calibration.
  • General-purpose LLMs show promise in medical tasks, while medical-specific LLMs aim for domain alignment, but their comparative clinical reliability is unclear.
  • Cardiology, with its intricate case-based reasoning, presents a high-stakes environment to evaluate LLM performance.

Purpose of the Study:

  • To compare the diagnostic accuracy, confidence calibration, uncertainty, and fidelity of general-purpose and medical-specific LLMs on a cardiology knowledge benchmark.
  • To assess the impact of domain specialization versus broad training on LLM performance in a complex medical field.

Main Methods:

  • Evaluated 365 text-based cardiology questions from the ACCSAP, excluding image-dependent items.
  • Compared ChatGPT-4o and Gemini 2.5 Pro (general-purpose) against MedGemma 27B (medical-specific LLM).
  • Used standardized prompts that asked each model for stepwise reasoning, an answer selection, and ratings of confidence, uncertainty, and fidelity; the responses were then analyzed statistically.
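
The summary does not say how the responses were scored, so the following is only a minimal sketch of one plausible way to turn per-question records into the quantities reported here: accuracy, the confidence gap between correct and incorrect answers, and a confidence-accuracy correlation. The function names, the 0-100 confidence scale, and the toy data are all assumptions made for illustration, not details taken from the study.

```python
from math import sqrt
from statistics import mean


def accuracy(correct):
    """Fraction of items answered correctly (correct is a list of 0/1 flags)."""
    return sum(correct) / len(correct)


def confidence_gap(confidence, correct):
    """Mean confidence on correct items minus mean confidence on incorrect items."""
    right = [c for c, ok in zip(confidence, correct) if ok]
    wrong = [c for c, ok in zip(confidence, correct) if not ok]
    return mean(right) - mean(wrong)


def pearson_r(xs, ys):
    """Pearson correlation; with a 0/1 variable this equals the point-biserial r."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy records (NOT study data): self-reported confidence on an
# assumed 0-100 scale and whether each answer matched the answer key.
toy_confidence = [95, 90, 85, 90, 80, 95, 70, 85]
toy_correct = [1, 1, 1, 0, 0, 1, 0, 1]

print("accuracy:", accuracy(toy_correct))
print("confidence gap:", confidence_gap(toy_confidence, toy_correct))
print("confidence-accuracy r:", round(pearson_r(toy_confidence, toy_correct), 2))
```

A small confidence gap together with uniformly high confidence is the pattern the results describe: the model sounds almost as sure when it is wrong as when it is right.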

Main Results:

  • General-purpose LLMs demonstrated higher accuracy: Gemini (87%), ChatGPT (85%), versus MedGemma (67%).
  • All models reported high confidence, but confidence-accuracy calibration was modest: reported confidence differed only slightly between correct and incorrect answers.
  • ChatGPT showed the strongest confidence-accuracy correlation (r=0.80), while MedGemma exhibited higher uncertainty and lower fidelity.
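
To make "calibration" concrete, here is a hedged sketch of expected calibration error (ECE), a commonly used metric that bins answers by stated confidence and compares each bin's average confidence with its observed accuracy. The summary does not state which calibration measure the authors used, so ECE is a stand-in chosen for illustration; the function name, the bin count, and the 0-1 confidence scale are assumptions.

```python
def expected_calibration_error(confidence, correct, n_bins=5):
    """Weighted average of |mean confidence - accuracy| over equal-width bins.

    confidence: probabilities in [0, 1]; correct: matching list of 0/1 flags.
    """
    bins = [[] for _ in range(n_bins)]
    for p, ok in zip(confidence, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, ok))

    total = len(confidence)
    ece = 0.0
    for items in bins:
        if not items:
            continue  # empty bins contribute nothing
        avg_conf = sum(p for p, _ in items) / len(items)
        acc = sum(ok for _, ok in items) / len(items)
        ece += len(items) / total * abs(avg_conf - acc)
    return ece


# Hypothetical example (NOT study data): a model that always reports 90%
# confidence but is right only half the time is badly overconfident.
overconfident = expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5)
print(round(overconfident, 2))
```

An ECE near 0 means stated confidence tracks observed accuracy; larger values mean the self-reported certainty is a poor guide to correctness.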

Conclusions:

  • General-purpose LLMs may offer advantages in complex clinical reasoning tasks within cardiology compared to specialized models.
  • Confidence calibration remains a significant challenge for all evaluated LLMs, rendering self-reported certainty an unreliable indicator of correctness.
  • Current LLM applications in cardiology should be supportive and clinician-supervised until uncertainty estimation and calibration improve.