Accelerating real-world data collection using large language models in rare neoplasms: a bone sarcoma example.

P Teterycz¹,², S Rynkun¹, B Szostakowski²,³

¹Digital Medicine Center, Maria Sklodowska-Curie National Research Institute of Oncology, Warsaw, Poland.

ESMO Real World Data and Digital Oncology

April 23, 2026

Summary

This summary is machine-generated.

Small large language models (LLMs) showed low single-model accuracy (17.5% to 30.3%) when extracting oncology data from Polish medical notes. An ensemble voting approach raised overall accuracy to 83.6%, demonstrating potential for automated clinical research data extraction.

Keywords:

LLMs, artificial intelligence, bone sarcoma, data extraction

Area of Science:

Medical Informatics
Natural Language Processing
Oncology Data Science

Background:

Real-world data collection in oncology is challenging due to unstructured medical notes.
Large language models (LLMs) show promise for extracting information from free-text data.
This study assesses small LLMs for information extraction from Polish medical notes.

Purpose of the Study:

To evaluate the performance of multiple small LLMs as information extractors on Polish medical notes.
To determine the effectiveness of different prompting techniques and ensemble methods for data extraction.
To assess the feasibility of automating data extraction from electronic health records (EHRs) in a non-English setting.

Main Methods:

Utilized EHRs from 302 bone sarcoma patients (2016-2022).
Annotated five key variables: pathology type, tumor size, localization, grade, and primary resection.
Employed four small LLMs with multiple prompting techniques and an ensemble voting strategy.
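
The ensemble voting step can be illustrated with a minimal sketch. The summary does not describe how votes were counted, so this assumes simple majority voting over normalized answers, with ties going to the answer seen first. The field names in VARIABLES and the functions majority_vote and ensemble_extract are hypothetical and chosen only to mirror the five annotated variables.

```python
from collections import Counter
from typing import Optional

# Hypothetical field names mirroring the five annotated variables.
VARIABLES = [
    "pathology_type",
    "tumor_size",
    "localization",
    "grade",
    "primary_resection",
]


def majority_vote(answers: list[Optional[str]]) -> Optional[str]:
    """Return the most common non-empty answer, or None if all are empty.

    Answers are lowercased and stripped before counting (an assumption).
    Ties go to the answer that appeared first in the input.
    """
    normalized = [a.strip().lower() for a in answers if a and a.strip()]
    if not normalized:
        return None
    # most_common keeps first-encountered order among equal counts.
    return Counter(normalized).most_common(1)[0][0]


def ensemble_extract(
    predictions: list[dict[str, Optional[str]]],
) -> dict[str, Optional[str]]:
    """Combine one prediction dict per model/prompt run into a single record."""
    return {
        var: majority_vote([p.get(var) for p in predictions])
        for var in VARIABLES
    }


if __name__ == "__main__":
    runs = [
        {"localization": "Femur", "grade": "G3"},
        {"localization": "femur", "grade": "G2"},
        {"localization": "tibia", "grade": "G3"},
    ]
    print(ensemble_extract(runs))
```

Each dict in runs stands for the output of one model and prompt combination for the same patient note. Variables that no run answered come back as None rather than a guessed value.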

Main Results:

Single-model accuracy ranged from 17.5% to 30.3% and depended strongly on the prompt used.
Tumor localization was the easiest variable to extract (up to 36.2% accuracy).
The ensemble voting approach raised overall accuracy to 83.6%, reaching 90.0% for resection type.
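
Accuracy figures of this kind can be computed by comparing extracted values against the manual annotations. The sketch below is one possible scoring scheme, not the study's own: it assumes exact match after lowercasing and trimming, and it defines overall accuracy as the share of correct (patient, variable) pairs. The function name accuracy_by_variable and the record layout are hypothetical.

```python
from typing import Optional

Record = dict[str, Optional[str]]


def _norm(value: Optional[str]) -> Optional[str]:
    """Lowercase and trim a string; anything else is treated as missing."""
    return value.strip().lower() if isinstance(value, str) else None


def accuracy_by_variable(
    gold: list[Record], predicted: list[Record]
) -> dict[str, float]:
    """Return exact-match accuracy per variable plus an 'overall' score.

    gold and predicted are aligned lists with one record per patient.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and aligned")

    correct: dict[str, int] = {}
    total: dict[str, int] = {}
    for truth, guess in zip(gold, predicted):
        for var, expected in truth.items():
            total[var] = total.get(var, 0) + 1
            if _norm(guess.get(var)) == _norm(expected):
                correct[var] = correct.get(var, 0) + 1

    scores = {var: correct.get(var, 0) / total[var] for var in total}
    # Overall accuracy: correct (patient, variable) pairs over all pairs.
    scores["overall"] = sum(correct.values()) / sum(total.values())
    return scores


if __name__ == "__main__":
    gold = [
        {"localization": "femur", "grade": "G3"},
        {"localization": "tibia", "grade": "G2"},
    ]
    predicted = [
        {"localization": "Femur", "grade": "G2"},
        {"localization": "tibia", "grade": "G2"},
    ]
    print(accuracy_by_variable(gold, predicted))
```

Exact matching is strict for free-text fields such as tumor size, so a real evaluation would likely need variable-specific comparison rules.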

Conclusions:

Lightweight LLMs show potential for automating data extraction from medical notes, accelerating clinical research.
Individual small LLMs are insufficient for real-world, non-English applications.
Prompt engineering and ensemble methods are crucial for improving LLM performance in medical data extraction.