Evaluating Medium Scale, Open-Source Large Language Models: Towards Decision Support in a Precision Oncology Care Delivery Context
- Kevin Kaufmes 1, Georg Mathes 1, Dilyana Vladimirova 2, Stephanie Berger 2, Christian Fegeler 1,3, Stefan Sigle 1
- 1MOLIT Institute, Heilbronn, Germany.
- 2SLK Clinics, Heilbronn, Germany.
- 3University of Heilbronn, Heilbronn, Germany.
Summary
This summary is machine-generated. Medium-scale large language models (LLMs) show insufficient reliability for precision oncology molecular tumor board (MTB) preparation. Current LLMs frequently provide outdated or incorrect information, posing risks to patient safety.
Area Of Science
- Artificial Intelligence in Medicine
- Clinical Decision Support Systems
Background
- Precision oncology requires up-to-date knowledge for complex patient cases.
- Preparing cases for molecular tumor boards (MTBs) is labor-intensive.
- Large language models (LLMs) offer potential to streamline information retrieval for MTBs.
Purpose Of The Study
- To evaluate the utility of medium-scale LLMs for answering clinical questions in MTB preparation.
- To assess how LLMs perform when run on-premises on consumer hardware, a setup suited to handling sensitive data.
Main Methods
- Three LLMs were selected based on benchmarks and reasoning capabilities.
- Domain experts provided exemplary MTB-related questions.
- Experts evaluated LLM-generated responses for quality and correctness.
Main Results
- Overall LLM performance was modest, with significant issues identified.
- A high percentage of responses contained outdated, incomplete, or factually erroneous information.
- Evaluator discordance and varying confidence levels were observed.
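One simple way to put numbers on the outcomes listed above is to tally the share of responses per rating category and the fraction of responses on which two evaluators disagree. The sketch below is illustrative only: the summary does not state how ratings were coded or how discordance was measured, so the label set, the function names `discordance_rate` and `label_shares`, and the example ratings are all assumptions.

```python
from collections import Counter

# Hypothetical rating categories; the source does not specify the scale used.
LABELS = ("correct", "outdated", "incomplete", "erroneous")


def discordance_rate(ratings_a, ratings_b):
    """Fraction of responses on which two evaluators assigned different labels."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both evaluators must rate the same responses")
    if not ratings_a:
        raise ValueError("no ratings given")
    disagreements = sum(a != b for a, b in zip(ratings_a, ratings_b))
    return disagreements / len(ratings_a)


def label_shares(ratings):
    """Share of responses assigned to each label by one evaluator."""
    if not ratings:
        raise ValueError("no ratings given")
    counts = Counter(ratings)
    return {label: counts.get(label, 0) / len(ratings) for label in LABELS}


# Made-up ratings for four LLM responses, for illustration only.
evaluator_1 = ["correct", "outdated", "erroneous", "incomplete"]
evaluator_2 = ["correct", "correct", "erroneous", "outdated"]

print(discordance_rate(evaluator_1, evaluator_2))
print(label_shares(evaluator_1))
```

A raw disagreement rate does not correct for agreement expected by chance; a chance-corrected statistic such as Cohen's kappa may be more appropriate when the label distribution is skewed.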
Conclusions
- Medium-scale LLMs are currently unreliable for precision oncology applications.
- Outdated information and confident misinformation highlight a gap between benchmark and real-world performance.
- Future research should explore advanced techniques such as retrieval-augmented generation (RAG) and web search, prioritizing patient safety.

