


Related Experiment Video

Updated: Jan 16, 2026

Augmenting Large Language Models via Vector Embeddings to Improve Domain-Specific Responsiveness

Published on: December 6, 2024

Performance Comparison of Cutting-Edge Large Language Models on the ACR In-Training Examination: An Update for 2025.

Austin Young¹, Rinald Paloka², Ariba Islam³

  • ¹Northwell Health, Mather Hospital, Port Jefferson, New York (A.Y.).

Academic Radiology | September 25, 2025
Summary
This summary is machine-generated.

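Newer large language models (LLMs) show improved performance on radiology board-style examination questions. GPT-o1, GPT-4o, and GPT-o3 achieved the highest accuracy. Comparable results on revised questions suggest that data contamination played a limited role, and the findings indicate that LLMs could support resident learning.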

Keywords: Artificial intelligence; LLM; Large Language Models; Medical education; Radiology education

Area of Science:

  • Artificial Intelligence in Medical Education
  • Radiology Board Examination Preparation
  • Large Language Model (LLM) Performance Evaluation

Background:

  • Prior research evaluated large language model (LLM) performance on radiology board-style assessments.
  • This study extends previous work by assessing newer LLMs on the ACR diagnostic radiology in-training examination (DXIT).

Purpose of the Study:

  • To evaluate the performance of cutting-edge LLMs, including GPT-4o, GPT-o1, GPT-o3, Claude, Gemini, and Grok, on standardized DXIT questions.
  • To compare model accuracy on text-based versus image-based questions to assess multi-modal reasoning.
  • To investigate the impact of potential data contamination by comparing performance on original versus revised questions.

Main Methods:

  • Seven LLMs were evaluated using 106 publicly available DXIT questions.
  • Models were prompted using a standardized instruction set to simulate resident responses.
  • Unadjusted and logic-adjusted accuracy were calculated, with subgroup analysis for text vs. image-based questions. Revised questions were used to test for data contamination.
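
The unadjusted accuracy and subgroup calculations described above can be sketched as follows. This is a minimal illustration only: it assumes each graded response is stored as a record with the hypothetical fields `modality` ("text" or "image") and `correct`, and the records shown are toy placeholders rather than study data. Logic-adjusted accuracy is left out because its definition is not given in this summary.

```python
from collections import defaultdict


def accuracy(records):
    """Unadjusted accuracy: fraction of records marked correct."""
    if not records:
        raise ValueError("no records to score")
    return sum(1 for r in records if r["correct"]) / len(records)


def accuracy_by(records, key):
    """Accuracy computed separately for each value of `key` (e.g. modality)."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return {name: accuracy(group) for name, group in groups.items()}


# Toy placeholder records (hypothetical, NOT data from the study).
toy_records = [
    {"modality": "text", "correct": True},
    {"modality": "text", "correct": True},
    {"modality": "text", "correct": False},
    {"modality": "image", "correct": True},
    {"modality": "image", "correct": False},
]

overall = accuracy(toy_records)                      # 3 of 5 correct
by_modality = accuracy_by(toy_records, "modality")   # text: 2 of 3, image: 1 of 2
```

The same `accuracy_by` helper could be pointed at a hypothetical `version` field ("original" or "revised") to compare performance on original versus revised questions.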

Main Results:

  • GPT-o1 (71.7%), GPT-4o (69.8%), and GPT-o3 (68.9%) achieved the highest unadjusted accuracy.
  • Similar trends were observed for logic-adjusted accuracy, with GPT-o1, GPT-4o, and GPT-o3 outperforming other models.
  • GPT-4o performed significantly better on text-based questions than on image-based questions. Performance on revised questions was comparable to performance on the original questions, suggesting limited data contamination.
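
As a rough consistency check, the reported percentages can be converted back to whole-number counts. The sketch below assumes that each of the three models was scored on all 106 questions listed under Main Methods; the summary does not state this explicitly, so the implied counts are an inference rather than reported figures.

```python
TOTAL_QUESTIONS = 106  # from Main Methods; assumed to apply to every model

# Unadjusted accuracies as reported in Main Results (percent).
reported = {"GPT-o1": 71.7, "GPT-4o": 69.8, "GPT-o3": 68.9}


def implied_correct(percent, total):
    """Nearest whole number of correct answers for a reported percentage."""
    return round(percent / 100 * total)


implied = {
    model: implied_correct(pct, TOTAL_QUESTIONS)
    for model, pct in reported.items()
}

for model, count in implied.items():
    # Recompute the percentage from the implied count to confirm it rounds
    # back to the reported value.
    recomputed = round(100 * count / TOTAL_QUESTIONS, 1)
    print(f"{model}: {count}/{TOTAL_QUESTIONS} = {recomputed}%")
```

Under that assumption the top three models are separated by only one to three questions, which is worth keeping in mind when reading the ranking.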

Conclusions:

  • Modern LLMs, particularly from OpenAI, demonstrate strong and improving performance on radiology board-style assessments.
  • Comparable performance on revised prompts indicates a limited role for data contamination.
  • LLMs show significant potential to support radiology resident education through personalized feedback and practice.