Improving Drug Identification in Overdose Death Surveillance using Large Language Models
Summary
This summary is machine-generated. Natural language processing (NLP) models, specifically fine-tuned BioClinicalBERT, accurately classify drug-involved overdose deaths from free-text reports. This enables timelier surveillance and overcomes the limitations of manual coding in tracking emerging substance use trends.
Area Of Science
- Public Health Surveillance
- Computational Linguistics
- Data Science
Background
- Drug-related deaths in the U.S., driven primarily by fentanyl, require rapid and precise surveillance.
- Current methods rely on manually coding coroner reports into ICD-10 classifications, which delays the data and loses information.
- Existing natural language processing (NLP) applications for overdose surveillance have shown limitations.
Purpose Of The Study
- To evaluate and compare various NLP models for classifying specific drug involvement from unstructured death certificate text.
- To assess the performance of traditional machine learning, general-domain BERT, and large language models (LLMs) against fine-tuned clinical NLP models.
- To determine the efficacy of NLP in automating and enhancing overdose surveillance data extraction.
Main Methods
- Utilized a dataset of 35,433 U.S. death records from 2020 for training and internal testing.
- Performed external validation on a separate dataset of 3,335 records from 2023-2024.
- Compared traditional classifiers, Bidirectional Encoder Representations from Transformers (BERT), BioClinicalBERT, Qwen 3, and Llama 3 using macro-averaged F1 scores.
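The macro-averaged F1 score used for the comparison can be illustrated with a minimal sketch. It assumes the standard definition (the unweighted mean of per-class F1 scores); the summary does not describe the study's exact evaluation code, and the function name and toy labels below are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores (standard macro F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0

    # Count true positives, false positives, and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, not data from the study.
y_true = ["fentanyl", "fentanyl", "cocaine", "other", "cocaine", "fentanyl"]
y_pred = ["fentanyl", "cocaine", "cocaine", "other", "cocaine", "fentanyl"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the average, a model cannot reach a high macro F1 by performing well only on the most common drug category.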
Main Results
- Fine-tuned BioClinicalBERT models achieved near-perfect performance (macro F1 >=0.998) on the internal test set.
- External validation demonstrated the robustness of BioClinicalBERT (macro F1=0.966), outperforming other evaluated models.
- Fine-tuned BioClinicalBERT significantly outperformed traditional machine learning classifiers, general-domain BERT, and the evaluated LLMs (Qwen 3 and Llama 3).
Conclusions
- Fine-tuned clinical NLP models, like BioClinicalBERT, provide a highly accurate and scalable solution for classifying drug-involved overdose deaths from free-text reports.
- These NLP methods can substantially accelerate surveillance workflows, surpassing the limitations of manual ICD-10 coding.
- The approach supports near real-time detection of emerging substance use trends, improving public health response.

