Development of a Synthetic Oncology Pathology Dataset for Large Language Model Evaluation in Medical Text Classification.

Werner O Hackl¹, Sabrina B Neururer¹,², Stefan Richter¹,³

  • ¹Division for Digital Health and Telemedicine, UMIT TIROL - Private University for Health Sciences and Health Technology, Hall in Tirol, Austria.

Studies in Health Technology and Informatics | April 24, 2025

Summary
This summary is machine-generated.

A synthetic oncology pathology dataset was created for evaluating Large Language Models (LLMs) in cancer report classification. This privacy-preserving benchmark enables reproducible AI research without using real patient data.

Keywords:
Large Language Models, Neoplasm Registries, Text Mining

Area of Science:

  • Artificial Intelligence in Pathology
  • Computational Pathology
  • Medical Informatics

Background:

  • Large Language Models (LLMs) show potential for automating oncology pathology report classification.
  • Use of real patient data is limited by privacy, legal, and ethical concerns.
  • Privacy-compliant alternatives are crucial for AI research in this field.

Purpose of the Study:

  • To develop a synthetic oncology pathology dataset.
  • To establish a benchmark for evaluating LLM performance in pathology report classification.
  • To facilitate reproducible and privacy-preserving AI research.

Main Methods:

  • Generated 227 synthetic pathology reports using diverse LLMs (Microsoft Copilot, ChatGPT Plus, Perplexity Pro).
  • Included prostate, lung, and breast cancer cases, balanced for malignant and benign findings.
  • Validated reports through classification by three independent cancer registrars.
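
The registrar validation step can be illustrated with a small sketch. The summary does not say how the three registrars' classifications were combined, so the majority-vote rule below is an assumption, and the report IDs, labels, and the `consensus_label` helper are hypothetical names used only for illustration.

```python
from collections import Counter


def consensus_label(votes):
    """Return the label chosen by a strict majority of raters, else None.

    Majority voting is an assumed consensus rule; the source does not
    specify how registrar agreement was resolved.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


# Hypothetical classifications from three independent registrars.
ratings = {
    "report_001": ["malignant", "malignant", "malignant"],
    "report_002": ["benign", "malignant", "benign"],
    "report_003": ["benign", "benign", "benign"],
}

# One consensus label per report.
consensus = {rid: consensus_label(votes) for rid, votes in ratings.items()}

# Number of reports on which all three registrars agreed.
unanimous = sum(len(set(votes)) == 1 for votes in ratings.values())
```

With binary labels and three raters a strict majority always exists; the `None` branch only matters if more than two categories are used.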

Main Results:

  • Created a structured dataset of synthetic oncology pathology reports.
  • Ensured structural and linguistic diversity across generated reports.
  • Achieved consensus-based validation by expert registrars.

Conclusions:

  • The synthetic dataset serves as a clinically relevant benchmark for LLM evaluation.
  • Enables AI model assessment in pathology text classification without compromising patient privacy.
  • Supports scalable and ethical development of AI in oncology documentation.
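
As a rough sketch of how such a benchmark might be used, the snippet below scores a model's malignant/benign predictions against registrar-validated labels. The metrics chosen (accuracy, sensitivity, specificity), the `evaluate` function, and all report IDs and labels are assumptions for illustration; the summary does not describe an evaluation protocol.

```python
def evaluate(gold, predicted, positive="malignant"):
    """Compare predicted labels with gold labels for a binary task.

    `gold` and `predicted` map report IDs to labels. Metric choice is
    an illustrative assumption, not the study's protocol.
    """
    tp = fp = tn = fn = 0
    for report_id, truth in gold.items():
        guess = predicted[report_id]
        if truth == positive:
            if guess == positive:
                tp += 1
            else:
                fn += 1
        else:
            if guess == positive:
                fp += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Hypothetical gold labels and model outputs for four reports.
gold = {"r1": "malignant", "r2": "benign", "r3": "malignant", "r4": "benign"}
predicted = {"r1": "malignant", "r2": "benign", "r3": "benign", "r4": "benign"}

scores = evaluate(gold, predicted)
```

Reporting sensitivity and specificity alongside accuracy shows whether a model's errors fall mainly on missed malignant reports or on false alarms.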