Leveraging large language models to mimic domain expert labeling in unstructured text-based electronic healthcare records in non-english languages
View abstract on PubMed
Summary
This summary is machine-generated.Fine-tuned large language models (LLMs) accurately classify electronic health records (EHR) with errors. This AI approach significantly improves data labeling efficiency for healthcare applications.
Area Of Science
- Artificial Intelligence in Healthcare
- Natural Language Processing for EHR Analysis
- Machine Learning for Medical Diagnostics
Background
- Electronic health records (EHR) offer vast potential for improving healthcare diagnostics and patient outcomes.
- Processing unstructured EHR data, especially with errors, is a major bottleneck for AI applications.
- Domain-specific, fine-tuned large language models (LLMs) are investigated for classifying noisy EHR text.
Purpose Of The Study
- To evaluate the efficacy of fine-tuned LLMs in classifying unstructured EHR texts containing typographical errors.
- To improve the efficiency and reliability of AI-driven supervised learning models in healthcare settings.
- To assess the performance of LLMs in named entity recognition tasks on clinical notes.
Main Methods
- Analysis of Turkish clinical notes from pediatric emergency room admissions (2018-2023).
- Data preprocessing using Python libraries and classification with a pretrained GPT-3 model before and after domain-specific fine-tuning on respiratory tract infections (RTI).
- Comparison of model predictions against ground truth labels determined by pediatric specialists.
Main Results
- A fine-tuned LLM achieved 99.88% accuracy in identifying RTI cases, a significant improvement over the pretrained model's 78.54%.
- The fine-tuned model demonstrated superior performance metrics across all evaluated aspects.
- Out of 24,229 poorly labeled records, 18,879 were confirmed for RTI after filtering.
Conclusions
- Fine-tuned LLMs can accurately categorize unstructured EHR data, nearing expert performance levels.
- This method substantially reduces manual data labeling time and costs.
- The approach shows promise for streamlining large-scale healthcare data processing for AI.
Related Concept Videos
Electronic Medical Records (EMRs) primarily center around electronically documenting patients' health information within a single healthcare organization or practice. They contain essential clinical data related to a patient's medical history, diagnoses, medications, treatment plans, lab results, and other pertinent information relevant to the specific encounter or episode of care. EMRs are designed to streamline documentation and workflow processes within individual healthcare...
The issues and trends in healthcare delivery are constantly changing. The COVID-19 pandemic is one recent issue that wreaked havoc on healthcare systems, causing a shortage of healthcare workers, high demand for medicines and supplies, and increased medical expenditure due to a lack of insurance. Other issues include rising healthcare costs and care fragmentation.
Cost Containment
Payment for healthcare services has historically promoted adoption of costly and often unnecessary or inefficient...

