Updated: May 29, 2026

Automated RECIST tumor response classification through prompt-guided large language models.

Markus Mergen¹,², Felix Busch³, Andreas P Sauter³

¹Department of Diagnostic and Interventional Radiology, Technical University of Munich, School of Medicine and Health, Klinikum rechts der Isar, TUM University Hospital, 81675, Munich, Germany. markus.mergen@tum.de.

Scientific Reports | May 27, 2026

Summary

This summary is machine-generated.

An offline large language model (LLM) classified oncology radiology reports into Response Evaluation Criteria in Solid Tumors (RECIST) categories using prompting strategies alone, without fine-tuning. Chain-of-thought prompting achieved the best results (micro F1 score of 0.81), and running the model offline preserved data privacy.

Area of Science:

Medical Imaging and Radiology
Artificial Intelligence in Healthcare
Oncology

Background:

Accurate tumor response assessment is crucial for cancer treatment evaluation.
Manual classification of radiology reports can be time-consuming and prone to variability.
Large language models (LLMs) show potential for automating clinical text analysis.

Purpose of the Study:

To evaluate an offline, general-purpose LLM's ability to classify radiology reports according to Response Evaluation Criteria in Solid Tumors (RECIST) guidelines.
To assess the impact of different prompting strategies (zero-shot, few-shot, chain-of-thought) on classification accuracy.
To ensure privacy-preserving tumor response assessment without model fine-tuning.
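
The three prompting strategies named above can be illustrated with a minimal Python sketch. The prompt wording, the function names `build_prompt` and `parse_category`, and the answer format are all hypothetical; the summary does not give the study's actual prompts, so this only shows how the strategies differ in structure.

```python
# Category names are taken from the summary; everything else is illustrative.
RECIST_CATEGORIES = [
    "Baseline",
    "Complete Response",
    "Partial Response",
    "Stable Disease",
    "Progressive Disease",
]


def build_prompt(report, strategy, examples=None):
    """Build a classification prompt for one report (hypothetical wording)."""
    labels = ", ".join(RECIST_CATEGORIES)
    task = f"Classify the CT report into one of these RECIST categories: {labels}."
    if strategy == "zero-shot":
        # Task description only, then the report.
        parts = [task, f"Report: {report}\nCategory:"]
    elif strategy == "few-shot":
        # Task description plus a handful of labeled example reports.
        shots = [f"Report: {text}\nCategory: {label}" for text, label in (examples or [])]
        parts = [task, *shots, f"Report: {report}\nCategory:"]
    elif strategy == "chain-of-thought":
        # Ask for explicit reasoning before the final label.
        hint = (
            "Reason step by step about changes in the lesions, "
            "then give the answer on a final line as 'Category: <label>'."
        )
        parts = [task, hint, f"Report: {report}\nReasoning:"]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return "\n\n".join(parts)


def parse_category(model_output):
    """Return the label on the last 'Category:' line, or None if absent or invalid."""
    for line in reversed(model_output.splitlines()):
        if line.strip().lower().startswith("category:"):
            answer = line.split(":", 1)[1].strip(" .").lower()
            for label in RECIST_CATEGORIES:
                if label.lower() == answer:
                    return label
            return None
    return None
```

A prompt built this way would be sent to the locally hosted model, and `parse_category` would map the free-text reply back to one of the five categories; replies that name no valid category return `None` and can be counted as errors.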

Main Methods:

An in-house, offline LLaMA-3.3 (70B) model was used to process CT imaging reports from oncology patients.

Reports were classified into RECIST categories (Baseline, Complete Response, Partial Response, Stable Disease, Progressive Disease) using three prompting strategies.

Model performance was benchmarked against expert labels using accuracy, precision, recall, and F1 scores.
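
As a rough illustration of the benchmarking step, the sketch below computes accuracy and per-category precision, recall, and F1 from expert labels and model predictions. The helper names and the toy labels are invented for illustration and are not data from the study.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of reports whose predicted category matches the expert label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, labels):
    """Precision, recall, and F1 for each category (one label per report)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted category gains a false positive
            fn[t] += 1  # true category gains a false negative
    results = {}
    for label in labels:
        pred_total = tp[label] + fp[label]
        true_total = tp[label] + fn[label]
        precision = tp[label] / pred_total if pred_total else 0.0
        recall = tp[label] / true_total if true_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results[label] = {"precision": precision, "recall": recall, "f1": f1}
    return results


# Toy example with invented labels (abbreviated category names).
expert = ["PR", "SD", "PD", "SD", "CR"]
model = ["PR", "SD", "SD", "SD", "CR"]
scores = per_class_metrics(expert, model, ["CR", "PR", "SD", "PD"])
overall = accuracy(expert, model)
```

In the toy data, one Progressive Disease report is misread as Stable Disease, which lowers precision for Stable Disease and recall for Progressive Disease while leaving the other categories untouched.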

Main Results:

The LLM achieved strong classification performance across all prompting strategies.
Chain-of-thought prompting yielded the best results, with a micro F1 score of 0.81.
Model predictions showed good alignment with human expert assessments.
The offline system maintained strict data privacy compliance.
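
For reference, the micro-averaged F1 score reported above is conventionally defined by pooling counts over all categories $c$ before computing precision and recall. The notation below is the standard definition and is an assumption about how the score was computed; the summary does not spell out the formula.

```latex
P_{\text{micro}} = \frac{\sum_{c} \mathrm{TP}_c}{\sum_{c} \left(\mathrm{TP}_c + \mathrm{FP}_c\right)},
\qquad
R_{\text{micro}} = \frac{\sum_{c} \mathrm{TP}_c}{\sum_{c} \left(\mathrm{TP}_c + \mathrm{FN}_c\right)},
\qquad
F_{1,\text{micro}} = \frac{2\, P_{\text{micro}}\, R_{\text{micro}}}{P_{\text{micro}} + R_{\text{micro}}}
```

If every report receives exactly one predicted category, each misclassification counts once as a false positive and once as a false negative, so $P_{\text{micro}} = R_{\text{micro}} = F_{1,\text{micro}}$, and all three equal the overall accuracy.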

Conclusions:

Prompt-driven LLMs can accurately and reliably classify tumor response categories from real-world radiology reports.
Offline LLM deployment, coupled with optimized prompting, offers a scalable and privacy-preserving solution for oncology report interpretation.
This approach has the potential to enhance consistency and efficiency in clinical decision support.