Evaluating Large Language Models for Enhancing Radiology Specialty Examination: A Comparative Study with Human

Hao-Yun Liu1, Shyh-Jye Chen2, Weichung Wang3

  • 1Department of Medical Imaging, National Taiwan University Hospital Hsin-Chu Branch, Br. No.25, Lane 442, Sec.1, Jingguo Rd., Hsinchu City 300, Taiwan ROC (H.Y.L.); National Taiwan University College of Medicine, No.1 Jen Ai Road Section 1, Taipei 100, Taiwan ROC (H.Y.L.).

Academic Radiology | May 28, 2025
Summary
This summary is machine-generated.

Advanced large language models (LLMs) show promise in radiology exams, with GPT-4o and o1-preview outperforming human examinees. These AI tools can help assess question difficulty and standardize medical examinations.

Keywords:
Accuracy, Evaluation, GPT, LLMs, Radiology specialty examination

Area of Science:

  • Medical Education
  • Artificial Intelligence
  • Radiology

Background:

  • Traditional radiology exams face challenges in accuracy and relevance due to expanding medical knowledge.
  • Large language models (LLMs) show potential for enhancing medical education and assessment.
  • Developing objective frameworks for radiology exam design is crucial.

Purpose of the Study:

  • To evaluate LLM performance in radiology specialty examinations.
  • To explore LLMs' role in assessing question difficulty and reasoning processes.
  • To investigate the potential for LLMs in developing a more objective and efficient exam design framework.

Main Methods:

  • Compared performance of three LLMs (GPT-4o, o1-preview, GPT-3.5-turbo-1106) against human examinees in a radiology exam.
  • Utilized zero-shot conditions for LLM evaluation.
  • Assessed accuracy, discrimination index, and point-biserial correlation to analyze question difficulty and reasoning.
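The three item-analysis measures named above can be illustrated with a minimal sketch. The summary does not say which definitions the authors used, so the following are assumptions: accuracy is the proportion of correct answers per question, the discrimination index is the difference in that proportion between the top and bottom 27% of examinees ranked by total score, and the point-biserial correlation relates a 0/1 item score to the total score (uncorrected, so the total includes the item itself). The function names and the toy response matrix are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def difficulty(item):
    """Proportion of examinees who answered the item correctly (0/1 scores)."""
    return mean(item)


def discrimination_index(item, totals, frac=0.27):
    """Upper-group minus lower-group proportion correct.

    Groups are the top and bottom `frac` of examinees by total score
    (27% is a common convention and an assumption here).
    """
    n = len(totals)
    k = max(1, round(n * frac))
    order = sorted(range(n), key=lambda i: totals[i])
    lower, upper = order[:k], order[-k:]
    return mean(item[i] for i in upper) - mean(item[i] for i in lower)


def point_biserial(item, totals):
    """Point-biserial correlation between a 0/1 item and the total score."""
    p = mean(item)
    sd = pstdev(totals)
    if p in (0, 1) or sd == 0:
        return 0.0  # undefined when there is no variation; report 0 by convention
    m1 = mean(t for x, t in zip(item, totals) if x == 1)
    m0 = mean(t for x, t in zip(item, totals) if x == 0)
    return (m1 - m0) / sd * sqrt(p * (1 - p))


# Toy data (hypothetical): rows are examinees, columns are questions.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
totals = [sum(row) for row in responses]
items = [list(col) for col in zip(*responses)]

for q, item in enumerate(items, start=1):
    print(
        f"Q{q}: accuracy={difficulty(item):.2f} "
        f"D={discrimination_index(item, totals):.2f} "
        f"r_pb={point_biserial(item, totals):.2f}"
    )
```

In this toy data the second question separates high and low scorers cleanly, while the fourth question is answered correctly about equally often in both groups, so it carries little information about overall ability.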

Main Results:

  • GPT-4o (88.0%) and o1-preview (90.9%) significantly outperformed human examinees (76.3%) in accuracy.
  • GPT-3.5-turbo-1106 showed lower accuracy (50.2%) with greater performance variability.
  • Advanced LLMs accurately identified the questions that differentiate examinees, mirroring human reasoning patterns and indicating that they can be used to assess exam difficulty.
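As a quick worked check of the reported figures, the gap between each model and the human examinees can be expressed in percentage points. The sketch below uses only the accuracies listed above; the variable names are illustrative, and it makes no claim about statistical significance, which the summary reports without giving test details.

```python
# Reported accuracies (percent), taken from the results listed above.
human_accuracy = 76.3
model_accuracy = {
    "GPT-4o": 88.0,
    "o1-preview": 90.9,
    "GPT-3.5-turbo-1106": 50.2,
}

# Difference from the human examinees, in percentage points.
gaps = {
    name: round(acc - human_accuracy, 1)
    for name, acc in model_accuracy.items()
}

for name, gap in gaps.items():
    print(f"{name}: {gap:+.1f} percentage points vs. human examinees")
```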

Conclusions:

  • Advanced LLMs like GPT-4o and o1-preview demonstrate strong problem-solving capabilities in radiology exams.
  • LLMs can serve as valuable tools for assessing exam question difficulty.
  • LLMs can assist in the standardized development and evaluation of medical examinations.