
Using a Large Language Model-Generated Prompt to Extract Features from Synthetic MRI Brain Scan Reports: A

John J Hanna¹,²,³, Christopher S Evans²,⁴, Christopher R Dennis²

¹Department of Internal Medicine, ECU Brody School of Medicine, Greenville, North Carolina, United States.

Methods of Information in Medicine

February 19, 2026

Summary


This summary is machine-generated.

Large language models (LLMs) show promise for extracting features from MRI brain reports. Newer models like GPT-4 perform well, and LLMs can even generate effective prompts for this task.

Area of Science:

Medical Informatics
Artificial Intelligence in Medicine

Background:

Automated feature extraction from medical reports is crucial for clinical, operational, and research purposes.
Large Language Models (LLMs) offer potential for automating feature extraction and category assignment from clinical text.

Purpose of the Study:

To compare the accuracy of feature extraction from MRI brain scan reports using clinician-engineered versus LLM-generated prompts.
To evaluate the performance of five OpenAI LLMs in extracting predefined features from synthetic MRI reports.

Main Methods:

Five OpenAI LLMs were tested on their ability to extract nine binary features from synthetic MRI brain reports.
Two prompt types were used: clinician-engineered and LLM-generated. Performance was assessed using recall, precision, accuracy, and F1 score.
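The four metrics named above can be computed from the counts of true and false positives and negatives for a single binary feature. The following is a minimal sketch, not the study's evaluation code: the function name `binary_metrics`, the `BinaryMetrics` container, and the toy labels are all hypothetical, and the zero-denominator convention (returning 0.0) is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    recall: float
    precision: float
    accuracy: float
    f1: float


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryMetrics:
    """Score one binary feature (hypothetical helper, not from the paper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts for the positive class (label 1).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # Assumed convention: report 0.0 when a denominator is zero.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return BinaryMetrics(recall, precision, accuracy, f1)


# Toy labels for one feature across eight reports (illustrative only).
reference = [1, 1, 1, 0, 0, 0, 0, 1]
extracted = [1, 1, 0, 0, 0, 1, 0, 1]
result = binary_metrics(reference, extracted)
print(result)
```

With nine features per report, one option is to call such a function once per feature and then average the results; how the study aggregated across features, models, and prompts is not stated in this summary.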

Main Results:

High overall performance was observed across all models and prompts, with average recall of 0.956, precision of 0.9347, accuracy of 0.982, and F1 score of 0.9431.
GPT-3.5-turbo performed better with the LLM-generated prompt than with the clinician-engineered prompt, while the GPT-4 models consistently outperformed the other models regardless of prompt type.
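A note on reading the averaged figures: F1 is the harmonic mean of precision and recall for a single evaluation, so an average of per-run F1 scores need not equal the F1 computed from average precision and average recall. The short check below uses only the averages reported above; the helper name `f1_from` is hypothetical, and the interpretation that the reported F1 was averaged per run is an assumption, not something the summary states.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)


# Averages as reported in the summary.
mean_recall = 0.956
mean_precision = 0.9347
reported_mean_f1 = 0.9431

# F1 computed from the averaged precision and recall.
f1_of_means = f1_from(mean_precision, mean_recall)

print(f"F1 of averaged precision/recall: {f1_of_means:.4f}")
print(f"Reported average F1:             {reported_mean_f1:.4f}")
print(f"Difference:                      {f1_of_means - reported_mean_f1:.4f}")
```

The two values differ slightly, which is what one would expect if F1 was computed per model and prompt and then averaged.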

Conclusions:

LLMs demonstrate significant potential for accurate feature extraction from MRI brain reports, with newer models like GPT-4 showing robust performance.
Both the choice of LLM and the prompt engineering strategy materially affect the accuracy of automated feature extraction from medical imaging reports.