Evaluating Large Language Models for Enhancing Radiology Specialty Examination: A Comparative Study with Human

Hao-Yun Liu¹, Shyh-Jye Chen², Weichung Wang³

¹Department of Medical Imaging, National Taiwan University Hospital Hsin-Chu Branch, Br. No.25, Lane 442, Sec.1, Jingguo Rd., Hsinchu City 300, Taiwan ROC (H.Y.L.); National Taiwan University College of Medicine, No.1 Jen Ai Road Section 1, Taipei 100, Taiwan ROC (H.Y.L.).

Academic Radiology

May 28, 2025

Summary

This summary is machine-generated.

Advanced large language models (LLMs) show promise in radiology exams, with GPT-4o and o1-preview outperforming human examinees. These AI tools can help assess question difficulty and standardize medical examinations.

Keywords:

Accuracy; Evaluation; GPT; LLMs; Radiology specialty examination

Area of Science:

Medical Education
Artificial Intelligence
Radiology

Background:

Traditional radiology exams face challenges in accuracy and relevance due to expanding medical knowledge.
Large language models (LLMs) show potential for enhancing medical education and assessment.
Developing objective frameworks for radiology exam design is crucial.

Purpose of the Study:

To evaluate LLM performance in radiology specialty examinations.
To explore LLMs' role in assessing question difficulty and reasoning processes.
To investigate the potential for LLMs in developing a more objective and efficient exam design framework.

Main Methods:

Compared the performance of three LLMs (GPT-4o, o1-preview, GPT-3.5-turbo-1106) with that of human examinees on a radiology specialty examination.
Evaluated the LLMs under zero-shot conditions, that is, without worked examples in the prompt.
Assessed accuracy, discrimination index, and point-biserial correlation to analyze question difficulty and reasoning.
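
The three item statistics named above can be illustrated with a minimal sketch. This is not the study's code: the function names, the 27% upper/lower grouping, and the toy data are assumptions chosen for illustration, and the summary does not state which variant of each statistic the authors used. Each examinee is represented by a 0/1 result on one item and a total exam score.

```python
from statistics import mean, pstdev


def accuracy(item_correct):
    """Fraction of examinees who answered the item correctly (0/1 scores)."""
    return mean(item_correct)


def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Upper-lower discrimination index for one item.

    Compares the proportion correct in the top-scoring group with the
    proportion correct in the bottom-scoring group. The 27% cut is a
    common convention and an assumption here.
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])  # ascending
    lower, upper = order[:k], order[-k:]
    p_upper = mean(item_correct[i] for i in upper)
    p_lower = mean(item_correct[i] for i in lower)
    return p_upper - p_lower


def point_biserial(item_correct, total_scores):
    """Point-biserial correlation between a 0/1 item and the total score.

    Returns None when the statistic is undefined (everyone right,
    everyone wrong, or no variance in total scores).
    """
    right = [t for c, t in zip(item_correct, total_scores) if c == 1]
    wrong = [t for c, t in zip(item_correct, total_scores) if c == 0]
    sd = pstdev(total_scores)
    if not right or not wrong or sd == 0:
        return None
    p = len(right) / len(total_scores)
    return (mean(right) - mean(wrong)) / sd * (p * (1 - p)) ** 0.5


# Toy data (hypothetical): six examinees, one item.
item = [1, 1, 1, 0, 0, 0]
totals = [9, 8, 7, 3, 2, 1]
print(accuracy(item))
print(discrimination_index(item, totals))
print(point_biserial(item, totals))
```

An item with a high discrimination index and a high point-biserial correlation is answered correctly mostly by examinees who also score well overall, which is the sense in which an item separates stronger from weaker examinees.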

Main Results:

GPT-4o (88.0%) and o1-preview (90.9%) significantly outperformed human examinees (76.3%) in accuracy.
GPT-3.5-turbo-1106 showed lower accuracy (50.2%) with greater performance variability.
Advanced LLMs accurately identified the questions that differentiated examinees, mirroring human reasoning patterns and supporting their use in assessing exam difficulty.

Conclusions:

Advanced LLMs like GPT-4o and o1-preview demonstrate strong problem-solving capabilities in radiology exams.
LLMs can serve as valuable tools for assessing exam question difficulty.
LLMs can assist in the standardized development and evaluation of medical examinations.