
Using a Large Language Model-Generated Prompt to Extract Features from Synthetic MRI Brain Scan Reports: A

John J Hanna1,2,3, Christopher S Evans2,4, Christopher R Dennis2

  • 1Department of Internal Medicine, ECU Brody School of Medicine, Greenville, North Carolina, United States.

Methods of Information in Medicine | February 19, 2026
Summary

This summary is machine-generated.

Large language models (LLMs) show promise for extracting features from MRI brain reports. Newer models like GPT-4 perform well, and LLMs can even generate effective prompts for this task.

Area of Science:

  • Medical Informatics
  • Artificial Intelligence in Medicine

Background:

  • Automated feature extraction from medical reports is crucial for clinical, operational, and research purposes.
  • Large Language Models (LLMs) offer potential for automating feature extraction and category assignment from clinical text.

Purpose of the Study:

  • To compare the accuracy of feature extraction from MRI brain scan reports using clinician-engineered versus LLM-generated prompts.
  • To evaluate the performance of five OpenAI LLMs in extracting predefined features from synthetic MRI reports.

Main Methods:

  • Five OpenAI LLMs were tested on their ability to extract nine binary features from synthetic MRI brain reports.
  • Two prompt types were used: clinician-engineered and LLM-generated. Performance was assessed using recall, precision, accuracy, and F1 score.
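
The four metrics named above can be computed for a single binary feature as in the minimal sketch below. It is illustrative only and is not the authors' evaluation code: the function name `score_binary`, the `BinaryMetrics` container, and the sample labels are hypothetical, and the convention of returning 0.0 when a denominator is zero is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    recall: float
    precision: float
    accuracy: float
    f1: float


def score_binary(y_true, y_pred):
    """Score one binary feature (hypothetical helper, not from the paper).

    y_true: reference labels (1 = feature present, 0 = absent)
    y_pred: labels extracted by the model, in the same order
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BinaryMetrics(recall, precision, accuracy, f1)


# Made-up labels for one feature across five reports.
example = score_binary([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(example)
```

In a setup like the one described, a function of this kind would be called once per feature, model, and prompt type, and the results averaged; how the study aggregated its scores is not stated in this summary.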

Main Results:

  • High overall performance was observed across all models and prompts, with average recall of 0.956, precision of 0.9347, accuracy of 0.982, and F1 score of 0.9431.
  • GPT-3.5-turbo showed better performance with an LLM-generated prompt, while GPT-4 models consistently outperformed others regardless of prompt type.
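
A reader may notice that the reported average F1 score (0.9431) is not exactly the harmonic mean of the reported average precision and recall. The short check below shows the arithmetic. The helper name `f1_from` is hypothetical, and the explanation offered in the comments (that F1 was computed per run and then averaged) is an assumption, since this summary does not describe the aggregation.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)


# Averages as reported in the summary.
avg_precision = 0.9347
avg_recall = 0.956
reported_avg_f1 = 0.9431

# F1 computed from the averaged precision and recall.
f1_of_averages = f1_from(avg_precision, avg_recall)

# Assumption: the small gap arises because the mean of per-run F1 scores
# generally differs from the F1 of the mean precision and recall.
gap = f1_of_averages - reported_avg_f1
print(round(f1_of_averages, 4), round(gap, 4))
```

The two quantities differ only in the third decimal place, so the reported figures are mutually plausible under that assumption.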

Conclusions:

  • LLMs demonstrate significant potential for accurate feature extraction from MRI brain reports, with newer models like GPT-4 showing robust performance.
  • The choice of LLM and prompt engineering strategy significantly impacts the efficacy of automated feature extraction from medical imaging reports.