Related Experiment Video

Updated: Jan 16, 2026

Augmenting Large Language Models via Vector Embeddings to Improve Domain-Specific Responsiveness

Published on: December 6, 2024

Performance Comparison of Cutting-Edge Large Language Models on the ACR In-Training Examination: An Update for 2025.

Austin Young¹, Rinald Paloka², Ariba Islam³

¹Northwell Health, Mather Hospital, Port Jefferson, New York (A.Y.).

Academic Radiology | September 25, 2025

Summary

This summary is machine-generated.

Newer large language models (LLMs) show improved performance on radiology board-style examination questions. GPT-o1, GPT-4o, and GPT-o3 achieved the highest accuracy. Comparable results on revised questions suggest limited data contamination, and the models may aid resident learning.

Keywords: Artificial intelligence; LLM; Large language models; Medical education; Radiology education

Area of Science:

Artificial Intelligence in Medical Education
Radiology Board Examination Preparation
Large Language Model (LLM) Performance Evaluation

Background:

Prior research evaluated large language model (LLM) performance on radiology board-style assessments.
This study extends previous work by assessing newer LLMs on the ACR diagnostic radiology in-training examination (DXIT).

Purpose of the Study:

To evaluate the performance of cutting-edge LLMs (GPT-4o, GPT-o1, GPT-o3, Claude, Gemini, Grok) on standardized DXIT questions.
To compare model accuracy on text-based versus image-based questions to assess multi-modal reasoning.
To investigate the impact of potential data contamination by comparing performance on original versus revised questions.

Main Methods:

Seven LLMs were evaluated using 106 publicly available DXIT questions.
Models were prompted using a standardized instruction set to simulate resident responses.
Unadjusted and logic-adjusted accuracy were calculated, with subgroup analysis for text-based vs. image-based questions.
Revised questions were used to test for data contamination.
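
The unadjusted accuracy and subgroup analysis described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the record layout, the function name `accuracy_by_group`, and the toy answers are all hypothetical. The logic-adjusted metric is left out because the summary does not define it.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return overall and per-group accuracy (as fractions) for graded answers.

    Each record is a (group, model_answer, correct_answer) tuple, where
    group is a label such as "text" or "image" (hypothetical layout).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, model_answer, correct_answer in records:
        # Count every question once overall and once in its own subgroup.
        for key in ("overall", group):
            totals[key] += 1
            hits[key] += int(model_answer == correct_answer)
    return {key: hits[key] / totals[key] for key in totals}


# Hypothetical toy data, not taken from the study.
records = [
    ("text", "A", "A"),
    ("text", "C", "B"),
    ("image", "D", "D"),
    ("image", "B", "A"),
    ("text", "B", "B"),
]
result = accuracy_by_group(records)
print(result)
```

The same function could be run separately on original and revised question sets to compare accuracy between them, which is one simple way to look for signs of data contamination.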

Main Results:

GPT-o1 (71.7%), GPT-4o (69.8%), and GPT-o3 (68.9%) achieved the highest unadjusted accuracy.
Similar trends were observed for logic-adjusted accuracy, with GPT-o1, GPT-4o, and GPT-o3 outperforming other models.
GPT-4o performed significantly better on text-based questions than on image-based questions.
Performance on revised questions was comparable to performance on original questions, suggesting limited data contamination.
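
As a quick arithmetic check, the reported unadjusted accuracies can be back-calculated to correct-answer counts. The summary does not state these counts; the sketch below assumes that all 106 questions were scored for each model and that the percentages were rounded to one decimal place. The helper name `consistent_counts` is illustrative.

```python
TOTAL_QUESTIONS = 106  # number of DXIT questions stated in the summary

# Unadjusted accuracies (percent) as reported in the summary.
reported = {"GPT-o1": 71.7, "GPT-4o": 69.8, "GPT-o3": 68.9}


def consistent_counts(percent, total):
    """Return every correct-answer count whose percentage rounds to `percent`."""
    return [k for k in range(total + 1) if round(100 * k / total, 1) == percent]


# Assumption: each model was scored on all 106 questions.
implied = {
    model: consistent_counts(pct, TOTAL_QUESTIONS)
    for model, pct in reported.items()
}
print(implied)
# → {'GPT-o1': [76], 'GPT-4o': [74], 'GPT-o3': [73]}
```

Under these assumptions, the top three models differ by only two or three questions, which is worth keeping in mind when reading the ranking.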

Conclusions:

Modern LLMs, particularly those from OpenAI, demonstrate strong and improving performance on radiology board-style assessments.
Comparable performance on revised prompts indicates a limited role for data contamination.
LLMs show significant potential to support radiology resident education through personalized feedback and practice.