Enhancing Oncology-Specific Question Answering With Large Language Models Through Fine-Tuned Embeddings With Synthetic Data
Summary
This summary is machine-generated. This study introduces an enhanced retrieval-augmented generation (RAG) model for oncology electronic health records (EHRs). The new model significantly improves the accuracy and relevance of retrieving clinical notes for cancer-related queries.
Area Of Science
- Medical Informatics
- Natural Language Processing
- Oncology Data Extraction
Background
- Advancements in retrieval-augmented generation (RAG) and large language models (LLMs) have transformed real-world evidence extraction from electronic health records (EHRs).
- Extracting precise clinical information from unstructured oncology EHRs remains a challenge, impacting research and patient care.
Purpose Of The Study
- To enhance RAG effectiveness for oncology EHRs by developing a specialized retriever encoder.
- To improve the precision and relevance of retrieved clinical notes for oncology-specific queries.
Main Methods
- Pretraining a retriever encoder on over six million oncology notes from 209,135 patients.
- Fine-tuning the model as a sentence transformer using 12,371 LLM-synthesized query-passage pairs.
- Evaluating retrieval performance against six embedding models on 50 oncology questions, using normalized discounted cumulative gain (NDCG), Precision, and Recall.
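
The three evaluation metrics named above can be illustrated with a minimal sketch. It uses the common textbook definitions with a cutoff k and a log2 discount for NDCG; the study's exact formulation, relevance grading, and tooling are not given here, so the function names, note IDs, and relevance labels below are hypothetical.

```python
import math


def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k retrieved notes that are relevant."""
    top_k = ranked_ids[:k]
    return sum(1 for note_id in top_k if note_id in relevant_ids) / k


def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of all relevant notes that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    top_k = ranked_ids[:k]
    return sum(1 for note_id in top_k if note_id in relevant_ids) / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevance, k=10):
    """NDCG@k with gains taken from a {note_id: grade} mapping.

    Assumption: linear gain and a log2 rank discount, one common variant.
    """
    dcg = sum(
        relevance.get(note_id, 0) / math.log2(rank + 2)
        for rank, note_id in enumerate(ranked_ids[:k])
    )
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 2) for rank, gain in enumerate(ideal_gains))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical example: one query, four retrieved notes, three relevant notes.
ranked = ["note_3", "note_1", "note_7", "note_2"]
relevance = {"note_1": 1, "note_2": 1, "note_5": 1}
relevant = set(relevance)

scores = {
    "precision": precision_at_k(ranked, relevant, k=3),
    "recall": recall_at_k(ranked, relevant, k=3),
    "ndcg": ndcg_at_k(ranked, relevance, k=3),
}
print(scores)
```

In an evaluation like the one described, scores of this kind would typically be averaged over all questions for each embedding model before the models are compared.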
Main Results
- The developed model outperformed the runner-up by 9% in NDCG, 7% in Precision, and 6% in Recall (top 10 results).
- Retrieval performance was strong across all three metrics for key oncology categories such as diagnosis, disease status, and tumor characteristics.
- The model demonstrated superior ability in retrieving pertinent clinical notes from oncology EHRs.
Conclusions
- Pretrained contextual embeddings and sentence transformers are effective for retrieving relevant oncology EHR notes.
- LLM-synthesized query-passage pairs offer a viable data augmentation strategy for specialized domains.
- This fine-tuning approach shows promise for improving data extraction in healthcare settings with limited annotated data.

