A Pipeline for the Automatic Identification of Randomized Controlled Oncology Trials and Assignment of Tumor Entities Using Natural Language Processing
- Paul Windisch 1,2, Fabio Dennstädt 2, Carole Koechli 1,2, Robert Förster 1,2, Christina Schröder 1,2, Daniel M Aebersold 2, Daniel R Zwahlen 1
- 1 Department of Radiation Oncology, Cantonal Hospital Winterthur, Winterthur, Switzerland.
- 2 Department of Radiation Oncology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland.
Summary
This summary is machine-generated. This study shows that classifying medical publications as randomized controlled trials (RCTs) or not, and as oncology-related or not, is feasible. This specialized classification enables more efficient data processing for oncology RCTs.
Area Of Science
- Biomedical Informatics
- Natural Language Processing
- Clinical Trials
Background
- General information extraction tools lack domain specificity.
- Domain-specific retrieval of trials can improve data processing.
Purpose Of The Study
- To classify medical publications into randomized controlled trials (RCTs) vs. non-RCTs and oncology vs. non-oncology topics.
- To evaluate the performance of a small transformer model and large language models (GPT-4o, GPT-4o mini) for this classification task.
- To develop a rule-based system for extracting tumor entities from oncology RCTs.
Main Methods
- Trained a small transformer model for binary classification of RCT status and oncology topic.
- Utilized GPT-4o and GPT-4o mini for the same classification tasks.
- Developed a rule-based system to extract tumor entities from classified oncology RCTs.
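The methods describe a two-stage design: publications are first classified by RCT status and oncology topic, and only those labeled as both proceed to rule-based tumor entity assignment. The sketch below illustrates that gating with simple keyword rules. The entity names, regular expressions, and the functions `assign_tumor_entities` and `process_publication` are hypothetical and chosen for illustration only; they are not the rules or code used in the study, and the classifier outputs are assumed to be supplied as booleans.

```python
import re
from typing import Dict, List, Optional

# Hypothetical keyword rules (illustrative only, NOT the study's rule set):
# tumor entity name -> list of case-insensitive regex patterns.
TUMOR_ENTITY_RULES: Dict[str, List[str]] = {
    "breast cancer": [r"\bbreast\b"],
    "lung cancer": [r"\blung\b", r"\bNSCLC\b", r"\bSCLC\b"],
    "prostate cancer": [r"\bprostat\w*"],
    "colorectal cancer": [r"\bcolorectal\b", r"\bcolon\b", r"\brectal\b"],
}


def assign_tumor_entities(text: str) -> List[str]:
    """Return every entity whose patterns match the text (case-insensitive)."""
    matches = []
    for entity, patterns in TUMOR_ENTITY_RULES.items():
        if any(re.search(p, text, flags=re.IGNORECASE) for p in patterns):
            matches.append(entity)
    return matches


def process_publication(text: str, is_rct: bool, is_oncology: bool) -> Optional[List[str]]:
    """Only publications classified as both RCT and oncology reach the rule stage.

    is_rct / is_oncology are assumed to come from upstream classifiers
    (e.g. a fine-tuned transformer or an LLM); returns None otherwise.
    """
    if not (is_rct and is_oncology):
        return None
    return assign_tumor_entities(text)


if __name__ == "__main__":
    abstract = "A randomized phase III trial of stereotactic radiotherapy in early-stage NSCLC."
    print(process_publication(abstract, is_rct=True, is_oncology=True))
```

Gating on the two classifier outputs keeps the rule set small: rules only need to distinguish tumor entities among texts already known to be oncology RCTs, rather than also filtering out unrelated publications.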
Main Results
- Small transformer achieved F1 scores of 0.96 for RCT classification and 0.84 for oncology classification.
- GPT-4o achieved F1 scores of 0.94 for RCT classification and 0.91 for oncology classification.
- The rule-based system accurately assigned all oncology RCTs to a tumor entity.
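The results are reported as F1 scores. For a binary task with the positive class labeled 1, F1 is the harmonic mean of precision and recall, which simplifies to 2·TP / (2·TP + FP + FN). The minimal sketch below computes it from label lists; the function name `f1_score` here is a hypothetical stand-in, and whether the study used exactly this positive-class formulation or an averaged variant is not stated in this summary.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Positive-class F1 for binary labels (1 = positive, 0 = negative).

    Uses F1 = 2*TP / (2*TP + FP + FN), equivalent to 2PR / (P + R).
    Returns 0.0 when there are no true positives, false positives, or false negatives.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy labels, not data from the study: TP=3, FP=1, FN=1 -> 6 / 8 = 0.75
    print(f1_score([1, 1, 1, 0, 0, 1], [1, 1, 0, 0, 1, 1]))
```

Because F1 ignores true negatives, it is a reasonable summary when positives (for example, oncology RCTs among all publications) are the minority class of interest.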
Conclusions
- Classifying publications as randomized controlled oncology trials is feasible.
- This specialized classification facilitates downstream processing with rule-based systems and dedicated models.