Chinese Clinical Named Entity Recognition With Segmentation Synonym Sentence Synthesis Mechanism: Algorithm Development and Validation
Summary
This summary is machine-generated. This study introduces a novel dataset augmentation algorithm for clinical named entity recognition (CNER) that improves model performance by mitigating data scarcity and annotation difficulties. The Segmentation Synonym Sentence Synthesis (SSSS) method raises CNER F1-scores without requiring extensive manual annotation.
Area Of Science
- Natural Language Processing
- Machine Learning
- Bioinformatics
Background
- Clinical Named Entity Recognition (CNER) is crucial for extracting information from electronic medical records.
- Deep learning models are prevalent in CNER but often require large annotated datasets and extensive dictionaries.
- Existing CNER methods face challenges due to text complexity, entity diversity, and boundary ambiguity.
Purpose Of The Study
- To address data scarcity and labeling difficulties in CNER tasks.
- To propose a dataset augmentation algorithm that leverages existing knowledge without manual dictionary expansion.
- To improve the generalization performance of CNER models.
Main Methods
- Developed the Segmentation Synonym Sentence Synthesis (SSSS) algorithm, which combines lexical segmentation with a calculation of proximate (semantically similar) words.
- Recombined synonymous vocabulary drawn from natural language data to synthesize new sentences and expand the dataset.
- Applied the SSSS algorithm to RoBERTa + CRF and RoBERTa + BiLSTM + CRF models, and evaluated both on the CCKS-2017 and CCKS-2019 datasets.
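The segmentation, proximity, and recombination steps listed above can be illustrated with a minimal Python sketch. This is not the authors' implementation, and it rests on several assumptions. First, sentences are already segmented into (word, label) pairs. Second, a hypothetical toy table `VECTORS` stands in for real pretrained word embeddings. Third, "proximity" is taken to mean cosine similarity at or above a hypothetical `threshold`. Finally, every combination of candidates is recombined. Each substitute inherits its segment's label, and labels are expanded to character-level BIO tags, as is common in Chinese CNER.

```python
import math
from itertools import product

# Hypothetical toy word vectors standing in for pretrained embeddings (assumption).
VECTORS = {
    "患者": (0.0, 0.1, 0.95),
    "病人": (0.02, 0.12, 0.93),
    "头痛": (0.9, 0.1, 0.0),
    "头疼": (0.88, 0.12, 0.01),
    "腹痛": (0.2, 0.9, 0.1),
    "肚子痛": (0.22, 0.87, 0.12),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def neighbours(word, threshold):
    """Return the word itself, then vocabulary words with similarity >= threshold."""
    if word not in VECTORS:
        return [word]  # out-of-vocabulary segments are kept unchanged
    vec = VECTORS[word]
    close = sorted(w for w, v in VECTORS.items() if w != word and cosine(vec, v) >= threshold)
    return [word] + close


def synthesize(segments, threshold=0.95):
    """Yield (sentence, BIO tags) for every recombination of per-segment candidates.

    `segments` is a pre-segmented sentence: a list of (word, label) pairs,
    where label is an entity type or "O". Substituted words inherit the label.
    The first yielded item is always the original sentence.
    """
    options = [[(w, label) for w in neighbours(word, threshold)] for word, label in segments]
    for combo in product(*options):
        chars, tags = [], []
        for word, label in combo:
            for i, ch in enumerate(word):
                chars.append(ch)
                if label == "O":
                    tags.append("O")
                else:
                    tags.append(("B-" if i == 0 else "I-") + label)
        yield "".join(chars), tags


if __name__ == "__main__":
    sentence = [("患者", "O"), ("主诉", "O"), ("头痛", "SYMPTOM"), ("伴", "O"), ("腹痛", "SYMPTOM")]
    for text, tags in synthesize(sentence):
        print(text, " ".join(tags))
```

On the sample sentence, each of the three words 患者, 头痛 and 腹痛 has one neighbour in the toy table. The sketch therefore emits 2 × 2 × 2 = 8 labeled sentences, including the original. A practical pipeline would probably cap or sample combinations to avoid combinatorial growth on longer sentences.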
Main Results
- SSSS + RoBERTa + CRF achieved an F1-score of 91.30% on the CCKS-2017 dataset.
- SSSS + RoBERTa + BiLSTM + CRF achieved an F1-score of 91.35% on the CCKS-2017 dataset.
- Both models achieved F1-scores above 83% on the CCKS-2019 dataset.
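The F1-scores above are the harmonic mean of precision and recall. The sketch below computes F1 at the entity level from BIO tag sequences. It assumes strict matching, in which a predicted entity counts as correct only if its span and type both equal a gold entity; the summary does not state the exact CCKS scoring rule. The function names are hypothetical.

```python
def extract_entities(tags):
    """Return a set of (start, end, type) spans from BIO tags; end is exclusive.

    Stray I- tags that do not continue an open entity of the same type are ignored.
    """
    entities, start, etype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last entity
        continues = tag.startswith("I-") and start is not None and tag[2:] == etype
        if not continues:
            if start is not None:
                entities.add((start, i, etype))
            start, etype = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return entities


def entity_f1(gold_seqs, pred_seqs):
    """Micro-averaged strict entity-level F1 over paired gold/predicted tag sequences."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_entities(gold), extract_entities(pred)
        tp += len(g & p)  # exact span-and-type matches
        fp += len(p - g)  # predicted entities with no exact gold match
        fn += len(g - p)  # gold entities that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


if __name__ == "__main__":
    gold = [["B-SYM", "I-SYM", "O", "B-DIS"]]
    pred = [["B-SYM", "I-SYM", "O", "O"]]
    print(entity_f1(gold, pred))  # precision 1.0, recall 0.5
```

In this toy case the prediction finds one of two gold entities and makes no false positives. Precision is therefore 1.0, recall is 0.5, and F1 is 2/3.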
Conclusions
- The SSSS algorithm effectively expands CNER datasets.
- The proposed method substantially improves CNER model performance, with F1-scores above 91% on CCKS-2017 and above 83% on CCKS-2019.
- This approach successfully mitigates challenges related to data acquisition, annotation, and model generalization.

