

Leveraging Large Language Models for Real-World Data Evidence: A Framework for Automated Treatment Extraction and

Abhishek Shivanna¹, Austin Fitts², Jordan Tschida¹

¹Advanced Computing for Health Sciences, Oak Ridge National Laboratory, Oak Ridge, TN.

Journal of Registry Management

May 11, 2026


Related Articles

Articles linked to this work by shared authors, journal, and citation graph.


Same author

Linking mixtures of air pollution exposures and preterm birth with a self-organizing map.

Journal of exposure science & environmental epidemiology·2026

Same author

Global Explainability of A Deep Abstaining Classifier for Cancer Pathology Reports.

IEEE journal of biomedical and health informatics·2026

Same author

Learning to Diagnose Privately: DP-Powered LLMs for Radiology Report Classification.

IEEE access : practical innovations, open solutions·2026

Same author

Comparing machine and deep learning models for pediatric anxiety classification using structured EHRs and area-based measures of health data.

PloS one·2026

Same author

Evolving language of pediatric anxiety in electronic health records.

JAMIA open·2026

Same author

Mitigating Algorithmic Bias in Cancer Site Classification Models.

JCO clinical cancer informatics·2026

Same journal

Radon Exposure as an Occupational Hazard and Environmental Risk Factor for Lung Cancer in Utah: Assessment, Mitigation, and Policy Implications.

Journal of registry management·2026

Same journal

Preliminary Estimates of New Invasive Cancers Diagnosed in 2023 in the United States.

Journal of registry management·2026

Same journal

Validation of ICD-9-CM and ICD-10-CM Diagnosis Codes for Identifying Preterm Birth and Low Birthweight Infants in Florida Birth Hospitalizations, 2008-2018.

Journal of registry management·2026

Same journal

Letter from the Editor.

Journal of registry management·2026

Same journal

Evaluation of Cancer Cases Reported through Inter-Registry Data Exchange in New York State.

Journal of registry management·2026

Same journal

The Population Cancer Assessment and Surveillance Engine (PopCASE): An Emerging Population Cancer Data Platform.

Journal of registry management·2026


This summary is machine-generated.

Large language models (LLMs) can automate the extraction of oncology treatment information from clinical text. Among the Llama models evaluated, the 8B-parameter model offers a good balance of accuracy and computational efficiency for cancer registries.

Area of Science:

Oncology
Natural Language Processing
Health Informatics

Background:

Collecting comprehensive cancer treatment data from medical records is crucial for real-world evidence studies.
Because much of this information is recorded as unstructured clinical text, extracting treatment information systematically and with high quality is difficult.
Manual data extraction is time-consuming, which motivates automated solutions.

Purpose of the Study:

To evaluate the utility of Llama-family large language models (LLMs) for automated extraction of oncology treatment information.
To guide researchers in using cancer registry data to gain insights beyond those available from clinical trials.

Main Methods:

Four instruction-tuned Llama models (1B, 3B, 8B, 70B parameters) were assessed for treatment extraction from clinical documents.
A unified oncology knowledge base was developed to standardize the extracted entities.
Performance was measured with accuracy metrics (precision, recall, and F1-score) and with operational feasibility.

Keywords:

Artificial intelligence; cancer registry; data extraction; data normalization; large language models; natural language processing; oncology informatics

Main Results:

A positive correlation between model size and extraction accuracy was observed, with F1-scores ranging from 0.609 (1B) to 0.828 (70B).
Larger models showed higher accuracy and compliance but incurred greater computational costs.
The 8B model performed strongly; scaling further to the 70B model yielded diminishing returns.
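The F1-scores above combine precision and recall. The sketch below shows one way such entity-level scores can be computed. It assumes exact, case-insensitive string matching against manual annotations and micro-averaging across documents; the function names `match_counts` and `micro_prf1` and the toy data are hypothetical and do not reproduce the study's evaluation protocol or results.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def match_counts(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[int, int, int]:
    """Return (tp, fp, fn) for one document.

    Assumed matching rule: an extracted treatment is correct only if it equals a
    gold annotation after trimming and lowercasing; repeated mentions are matched
    at most as often as they occur in the gold list (multiset intersection).
    """
    pred = Counter(p.strip().lower() for p in predicted)
    ref = Counter(g.strip().lower() for g in gold)
    tp = sum((pred & ref).values())   # entities present in both lists
    fp = sum(pred.values()) - tp      # extracted but not annotated
    fn = sum(ref.values()) - tp       # annotated but missed
    return tp, fp, fn


def micro_prf1(documents: Iterable[Tuple[List[str], List[str]]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over (predicted, gold) pairs."""
    tp = fp = fn = 0
    for predicted, gold in documents:
        d_tp, d_fp, d_fn = match_counts(predicted, gold)
        tp, fp, fn = tp + d_tp, fp + d_fp, fn + d_fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented toy example (not data from the study): model output vs. manual annotation.
docs = [
    (["Cisplatin", "radiation therapy"], ["cisplatin", "radiation therapy", "pembrolizumab"]),
    (["tamoxifen", "letrozole"], ["tamoxifen"]),
]
print(micro_prf1(docs))  # (0.75, 0.75, 0.75)
```

Micro-averaging pools the counts across documents, so documents that mention many treatments carry more weight. A macro average (the mean of per-document scores) is a common alternative; which averaging the study used is not stated in this summary.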

Conclusions:

LLMs are a viable technology for automating oncology treatment extraction.
The 8B-parameter Llama model provides an effective balance of accuracy and computational efficiency.
Standardized data integration via a knowledge base enhances real-world evidence analyses.
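The role of a knowledge base in standardizing extracted entities can be illustrated with a minimal sketch. It assumes a simple dictionary lookup that maps surface forms (brand names, abbreviations) to canonical concepts. The `Concept` type, the `KNOWLEDGE_BASE` entries, and the `normalize` function are hypothetical and do not reproduce the study's unified oncology knowledge base.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    canonical: str   # preferred name to store in the registry
    category: str    # hypothetical treatment class


# Hypothetical miniature knowledge base: canonical concept -> known surface forms.
KNOWLEDGE_BASE = {
    Concept("cisplatin", "chemotherapy"): ["cisplatin", "cddp", "cis-platinum"],
    Concept("pembrolizumab", "immunotherapy"): ["pembrolizumab", "keytruda"],
    Concept("tamoxifen", "hormone therapy"): ["tamoxifen", "nolvadex"],
    Concept("radiation therapy", "radiation"): ["radiation therapy", "radiotherapy", "xrt"],
}


def _key(text: str) -> str:
    """Normalize a surface form: trim, lowercase, and collapse internal whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


# Invert the knowledge base into a synonym -> concept lookup table.
SYNONYM_INDEX = {
    _key(synonym): concept
    for concept, synonyms in KNOWLEDGE_BASE.items()
    for synonym in synonyms
}


def normalize(mentions):
    """Map raw LLM-extracted mentions to unique concepts; keep unmatched mentions for review."""
    matched, unmatched = [], []
    for mention in mentions:
        concept = SYNONYM_INDEX.get(_key(mention))
        if concept is None:
            unmatched.append(mention)
        elif concept not in matched:   # deduplicate repeated mentions of one concept
            matched.append(concept)
    return matched, unmatched


matched, unmatched = normalize(["Keytruda", "  XRT ", "CDDP", "cisplatin", "drug X"])
print([c.canonical for c in matched])  # ['pembrolizumab', 'radiation therapy', 'cisplatin']
print(unmatched)                       # ['drug X']
```

Returning unmatched mentions separately, rather than dropping them, leaves room for manual review or for extending the knowledge base; a production system would likely also need fuzzy matching or a terminology service, which this sketch does not attempt.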