
Updated: May 12, 2026

Leveraging Large Language Models for Real-World Data Evidence: A Framework for Automated Treatment Extraction and

Abhishek Shivanna¹, Austin Fitts², Jordan Tschida¹

  • ¹ Advanced Computing for Health Sciences, Oak Ridge National Laboratory, Oak Ridge, TN.

Journal of Registry Management | May 11, 2026
Summary

This summary is machine-generated.

Large language models (LLMs) can automate oncology treatment extraction from clinical text. The 8B-parameter Llama model offers a balance of accuracy and efficiency for cancer registries.

Area of Science:

  • Oncology
  • Natural Language Processing
  • Health Informatics

Background:

  • Collecting comprehensive cancer treatment data from medical records is crucial for real-world evidence studies.
  • Unstructured clinical text hinders systematic, high-quality treatment information extraction.
  • Manual data extraction is time-consuming, necessitating automated solutions.

Purpose of the Study:

  • To evaluate the utility of Llama family large language models (LLMs) for automated oncology treatment information extraction.
  • To guide researchers in utilizing cancer registry data for insights beyond clinical trials.

Main Methods:

  • Four instruction-tuned Llama models (1B, 3B, 8B, 70B parameters) were assessed for treatment extraction from clinical documents.
  • A unified oncology knowledge base was developed for standardizing extracted entities.
  • Performance was measured using accuracy metrics (precision, recall, F1-score) and operational feasibility.

Main Results:

  • A positive correlation between model size and extraction accuracy was observed, with F1-scores ranging from 0.609 (1B) to 0.828 (70B).
  • Larger models showed higher accuracy and compliance but incurred greater computational costs.
  • The 8B model demonstrated strong performance, with diminishing returns noted for the 70B model.

Conclusions:

  • LLMs are a viable technology for automating oncology treatment extraction.
  • The 8B-parameter LLM provides an effective balance of accuracy and computational efficiency.
  • Standardized data integration via a knowledge base enhances real-world evidence analyses.

Keywords:

  • Artificial intelligence
  • Cancer registry
  • Data extraction
  • Data normalization
  • Large language models
  • Natural language processing
  • Oncology informatics
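
The summary names two steps that are easy to picture in code: standardizing extracted entities against a knowledge base, and scoring the result with precision, recall, and F1-score. The following is a minimal sketch of both, not the study's implementation. The `KNOWLEDGE_BASE` entries, the function names `normalize` and `precision_recall_f1`, and the choice of exact set matching on normalized names are all assumptions made for illustration; the summary does not describe the study's actual knowledge base or matching rules.

```python
from collections.abc import Iterable

# Hypothetical miniature knowledge base: alias -> canonical treatment name.
# These entries are illustrative only and are not taken from the study.
KNOWLEDGE_BASE = {
    "taxol": "paclitaxel",
    "paclitaxel": "paclitaxel",
    "herceptin": "trastuzumab",
    "trastuzumab": "trastuzumab",
    "xrt": "radiation therapy",
    "radiation therapy": "radiation therapy",
}


def normalize(mentions: Iterable[str]) -> set[str]:
    """Map raw mentions to canonical names.

    Mentions not found in the knowledge base pass through
    stripped and lowercased (an assumption of this sketch).
    """
    normalized = set()
    for mention in mentions:
        key = mention.strip().lower()
        normalized.add(KNOWLEDGE_BASE.get(key, key))
    return normalized


def precision_recall_f1(
    predicted: set[str], gold: set[str]
) -> tuple[float, float, float]:
    """Score predicted entities against gold entities by exact set overlap."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical model output and reference annotation for one document.
    predicted = normalize(["Taxol", "Herceptin ", "cisplatin"])
    gold = normalize(["paclitaxel", "trastuzumab", "XRT"])
    p, r, f1 = precision_recall_f1(predicted, gold)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Normalizing before scoring means that a brand name and its generic equivalent count as the same treatment, which is one plausible reason a shared knowledge base would matter for downstream analyses.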