Leveraging GPT-4 for identifying cancer phenotypes in electronic health records: a performance comparison between GPT-4, GPT-3.5-turbo, Flan-T5, Llama-3-8B, and spaCy's rule-based and machine learning-based methods
Summary
This summary is machine-generated. Generative Pre-trained Transformer (GPT)-4 excels at identifying clinical phenotypes from electronic health records (EHRs) of non-small cell lung cancer (NSCLC) patients, outperforming the Flan-T5, Llama-3-8B, and spaCy-based baselines in precision, recall, and F1 score.
Area Of Science
- Artificial Intelligence in Medicine
- Natural Language Processing for Healthcare
- Oncology Data Extraction
Background
- Accurate identification of clinical phenotypes from Electronic Health Records (EHRs) is crucial for gaining insights into patient health, especially when structured data is limited.
- Non-small cell lung cancer (NSCLC) research requires detailed phenotypic information often embedded within unstructured clinical notes.
- Existing methods for phenotype extraction face challenges in accurately capturing complex clinical details.
Purpose Of The Study
- To evaluate the efficacy of OpenAI's Generative Pre-trained Transformer (GPT)-4 model for identifying clinical phenotypes in NSCLC patients from EHR text.
- To compare the performance of GPT-4 against other large language models (LLMs) including GPT-3.5-turbo, Flan-T5 variants, and Llama-3-8B.
- To assess GPT-4's performance relative to established rule-based and machine learning methods like scispaCy and medspaCy.
Main Methods
- Phenotype extraction for NSCLC patients using clinical notes from Washington University.
- Identification of key phenotypes including initial cancer stage, treatment, recurrence, and affected organs.
- Comparative analysis of GPT-4 against GPT-3.5-turbo, Flan-T5-xl, Flan-T5-xxl, Llama-3-8B, medspaCy, and scispaCy using precision, recall, and micro-F1 scores.
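Micro-F1 pools true positives, false positives, and false negatives across all notes and phenotype categories before computing precision and recall. The following is a minimal Python sketch of that computation, assuming each extraction is represented as a hypothetical `(note_id, category, value)` tuple and scored by exact match against a gold-standard annotation. The `Phenotype` type, the `micro_prf` helper, and the example values are illustrative only; the paper's actual data format and matching criteria are not described here.

```python
from typing import NamedTuple


class Phenotype(NamedTuple):
    """One extracted phenotype (hypothetical representation, not from the paper)."""
    note_id: str
    category: str  # e.g. "initial_stage", "treatment", "recurrence", "organ"
    value: str     # normalized value, e.g. "IIIA"


def micro_prf(gold: set[Phenotype], predicted: set[Phenotype]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1.

    Counts are pooled over every note and category; a prediction counts as
    correct only if the whole normalized tuple exactly matches a gold tuple.
    """
    tp = len(gold & predicted)   # predicted and present in the gold standard
    fp = len(predicted - gold)   # predicted but not in the gold standard
    fn = len(gold - predicted)   # in the gold standard but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative (made-up) annotations for two notes.
gold = {
    Phenotype("n1", "initial_stage", "IIIA"),
    Phenotype("n1", "treatment", "carboplatin"),
    Phenotype("n1", "organ", "lung"),
    Phenotype("n2", "recurrence", "yes"),
}
predicted = {
    Phenotype("n1", "initial_stage", "IIIA"),
    Phenotype("n1", "treatment", "carboplatin"),
    Phenotype("n1", "organ", "brain"),   # wrong organ: one FP and one FN
    Phenotype("n2", "recurrence", "yes"),
}

print(micro_prf(gold, predicted))  # (0.75, 0.75, 0.75)
```

Because the counts are pooled before dividing, frequent phenotype categories weigh more heavily in micro-F1 than rare ones; a macro average would instead compute F1 per category and then average.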
Main Results
- GPT-4 demonstrated superior performance, achieving higher F1 scores, precision, and recall compared to Flan-T5 variants, Llama-3-8B, medspaCy, and scispaCy.
- GPT-3.5-turbo exhibited performance comparable to GPT-4.
- LLMs (GPT, Flan-T5, Llama) showed advantages over the spaCy-based rule-based and machine learning models (medspaCy, scispaCy) owing to their ability to recognize contextual patterns without explicit rule constraints.
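To illustrate what "explicit rule constraints" means in practice, here is a toy, hypothetical regular-expression rule for initial cancer stage. It is a sketch under stated assumptions and is not taken from the medspaCy or scispaCy pipelines used in the study; `STAGE_RULE`, `rule_based_stage`, and the example notes are invented for illustration.

```python
import re
from typing import Optional

# Hypothetical hand-written rule: the word "stage" followed by a Roman-numeral
# stage (I-IV) with an optional A/B/C substage. Not from medspaCy or scispaCy.
STAGE_RULE = re.compile(r"\bstage\s+(IV|I{1,3})([ABC])?\b", re.IGNORECASE)


def rule_based_stage(note: str) -> Optional[str]:
    """Return the first stage matched by the explicit rule, or None."""
    match = STAGE_RULE.search(note)
    if match is None:
        return None
    # Normalize case so "IIIa" and "IIIA" yield the same value.
    return (match.group(1) + (match.group(2) or "")).upper()


# Made-up example notes expressing the same kind of information differently.
notes = [
    "Diagnosed with stage IIIA NSCLC in 2019.",
    "Stage 3a adenocarcinoma of the right upper lobe.",
    "Clinical staging T2N2M0, treated with chemoradiation.",
]

for note in notes:
    print(rule_based_stage(note), "<-", note)
# Only the first note matches; Arabic-numeral and TNM phrasings fall
# outside the pattern the rule's author anticipated.
```

A rule-based system can only match the surface forms its authors encoded, so each new phrasing requires another rule; the summary attributes the LLMs' advantage to not being bound by such hand-written patterns.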
Conclusions
- GPT-4 significantly enhances clinical phenotype identification from EHRs owing to its advanced pre-training and pattern recognition capabilities.
- GPT models offer superior contextual understanding and robust clinical phenotype extraction compared to traditional rule-based approaches.
- The findings highlight the potential of advanced LLMs like GPT-4 for improving data-driven healthcare insights in oncology.

