Evaluating Medium Scale, Open-Source Large Language Models: Towards Decision Support in a Precision Oncology Care Delivery Context

  • MOLIT Institute, Heilbronn, Germany.

Summary

This summary is machine-generated.

Medium-scale large language models (LLMs) show insufficient reliability for precision oncology molecular tumor board (MTB) preparation. Current LLMs frequently provide outdated or incorrect information, posing risks to patient safety.

Area Of Science

  • Artificial Intelligence in Medicine
  • Clinical Decision Support Systems

Background

  • Precision oncology requires up-to-date knowledge for complex patient cases.
  • Preparing cases for molecular tumor boards (MTBs) is labor-intensive.
  • Large language models (LLMs) offer potential to streamline information retrieval for MTBs.

Purpose Of The Study

  • To evaluate the utility of medium-scale LLMs for answering clinical questions in MTB preparation.
  • To assess how on-premises LLMs running on consumer hardware perform, a setup suited to handling sensitive data.

Main Methods

  • Three LLMs were selected based on benchmarks and reasoning capabilities.
  • Domain experts provided exemplary MTB-related questions.
  • Experts evaluated LLM-generated responses for quality and correctness.

Main Results

  • Overall LLM performance was modest, with significant issues identified.
  • A high percentage of responses contained outdated, incomplete, or factually erroneous information.
  • Evaluator discordance and varying confidence levels were observed.
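
Evaluator discordance of the kind noted above is often quantified with a chance-corrected agreement statistic. The following is a minimal sketch of Cohen's kappa for two evaluators, not the study's own analysis: this summary does not state which agreement measure the authors used, and the function name, the labels, and the ratings below are hypothetical toy values chosen only for illustration.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labeling the same items.

    Hypothetical helper for illustration; not taken from the study.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")

    n = len(ratings_a)

    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Expected agreement by chance, from each rater's label frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    # Both raters used a single identical label for every item.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Toy ratings (assumed, not study data): two evaluators judging four responses.
evaluator_1 = ["ok", "ok", "flawed", "flawed"]
evaluator_2 = ["ok", "flawed", "flawed", "flawed"]
kappa = cohen_kappa(evaluator_1, evaluator_2)
print(f"kappa = {kappa:.2f}")
```

A kappa near 1 indicates strong agreement between evaluators, while values near 0 indicate agreement no better than chance. With more than two evaluators, a multi-rater statistic such as Fleiss' kappa would be the more usual choice.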

Conclusions

  • Medium-scale LLMs are currently unreliable for precision oncology applications.
  • Outdated information and confident misinformation highlight a gap between benchmark and real-world performance.
  • Future research should explore techniques such as retrieval-augmented generation (RAG) and web search, while prioritizing patient safety.