Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text

Area of Science:

Biomedical Informatics
Natural Language Processing
Health Informatics

Background:

The volume of patient-generated health text is growing, but this text is difficult to process with existing biomedical Natural Language Processing (NLP) tools.
Current NLP tools are often designed for text written by clinicians or researchers, and they struggle to keep pace with evolving technologies and vocabularies.
Manual annotation for NLP evaluation is time-consuming and resource-intensive, so alternative evaluation methods are needed.

Purpose of the Study:

To explore a low-cost, automated approach for detecting failures in biomedical NLP tools processing patient-generated text.
To characterize common NLP failures in online health community text.
To demonstrate the feasibility of automated failure detection using MetaMap, a popular biomedical NLP tool.

Main Methods:

Manual review of 9,657 online cancer community posts processed by MetaMap to characterize and categorize NLP failures.
Identification of 12 causes for inaccurate concept mappings across three failure types: boundary, missed term, and word ambiguity.
Development of automated methods that combine NLP techniques and dictionary matching to detect the identified failure types, followed by manual evaluation of the detection results.
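This summary does not spell out the detection rules, so the sketch below is a hypothetical illustration of how dictionary matching could flag the three failure types; it is not the authors' method. It assumes a simplified mapping record (character offsets plus a concept name) rather than MetaMap's real output format, a small hand-made dictionary of multi-word health terms, and a hand-made list of ambiguous words. The names `Mapping`, `find_terms`, `overlaps`, `span`, and `detect_failures` are invented for this example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Hypothetical, simplified stand-in for one MetaMap mapping (not MetaMap's real output format)."""
    start: int    # character offset where the mapped text begins
    end: int      # character offset just past the end of the mapped text
    concept: str  # name of the concept the text was mapped to


def find_terms(text, dictionary):
    """Return sorted (start, end, term) hits for whole-word, case-insensitive dictionary matches."""
    hits = []
    for term in dictionary:
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((m.start(), m.end(), term))
    return sorted(hits)


def overlaps(a_start, a_end, b_start, b_end):
    """True when the half-open spans [a_start, a_end) and [b_start, b_end) share a character."""
    return a_start < b_end and b_start < a_end


def span(text, phrase):
    """Offsets of the first occurrence of phrase in text (helper for building toy mappings)."""
    start = text.index(phrase)
    return start, start + len(phrase)


def detect_failures(text, mappings, dictionary, ambiguous_words):
    """Flag candidate boundary, missed-term, and word-ambiguity failures."""
    failures = []
    for start, end, term in find_terms(text, dictionary):
        touching = [mp for mp in mappings if overlaps(start, end, mp.start, mp.end)]
        if not touching:
            # A known term that no mapping touches: candidate missed-term failure.
            failures.append(("missed_term", term))
        elif not any(mp.start == start and mp.end == end for mp in touching):
            # Mappings touch the term but none covers exactly its span: candidate boundary failure.
            failures.append(("boundary", term))
    for mp in mappings:
        word = text[mp.start:mp.end].lower()
        if word in ambiguous_words:
            # Crude heuristic: flags the word without trying to resolve its sense.
            failures.append(("word_ambiguity", word))
    return failures


# Toy example: every input below is invented for illustration.
text = "Staying positive after my breast cancer diagnosis, starting chemo today."
dictionary = {"breast cancer", "chemo"}
ambiguous_words = {"positive"}
mappings = [
    Mapping(*span(text, "positive"), "Positive (finding)"),  # lab-result sense; poster means "optimistic"
    Mapping(*span(text, "breast"), "Breast"),                # "breast" alone, not "breast cancer"
]
print(detect_failures(text, mappings, dictionary, ambiguous_words))
# → [('boundary', 'breast cancer'), ('missed_term', 'chemo'), ('word_ambiguity', 'positive')]
```

Flagging every occurrence of a listed word over-reports word ambiguity by design; a fuller approach would also check the surrounding context before calling a mapping wrong.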

Main Results:

Characterized three primary failure types: boundary, missed term, and word ambiguity, with 12 underlying causes.
The automated methods flagged nearly half of the 383,572 MetaMap mappings as problematic.
Word ambiguity failures were the most frequent (82.22% of detected failures), followed by boundary failures (15.90%) and missed-term failures (1.88%).
Automated failure detection achieved high performance metrics: 83.00% precision, 92.57% recall, 88.17% accuracy, and 87.52% F1 score.
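As a consistency check on the reported figures: F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so the reported precision and recall should reproduce the reported F1. Accuracy also depends on true negatives, which this summary does not report, so it cannot be recomputed from the figures here. The function names below are illustrative, and the confusion-matrix helper is only a general definition, not fitted to the study's data.

```python
def f1_from(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy, f1_from(precision, recall)


# Reported precision and recall (as fractions) reproduce the reported F1 percentage.
print(round(f1_from(0.8300, 0.9257) * 100, 2))  # → 87.52

# The three failure-type shares together account for all detected failures.
print(round(82.22 + 15.90 + 1.88, 2))  # → 100.0
```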

Conclusions:

The study highlights the challenges of processing patient-generated health text with existing NLP tools.
A feasible, low-cost automated approach for detecting NLP failures in patient-generated text was demonstrated.
The approach offers a scalable solution for continuously assessing and improving NLP tools and vocabularies for patient-generated health data.