Domain-Specific Pretraining of NorDeClin-Bidirectional Encoder Representations From Transformers for International Statistical Classification of Diseases, Tenth Revision, Code Prediction in Norwegian Clinical Texts: Model Development and Evaluation Study
Summary
This summary is machine-generated. The new Norwegian BERT models, NorDeClin-BERT (base and large), significantly improve International Statistical Classification of Diseases, Tenth Revision (ICD-10) coding accuracy, and domain-specific pretraining outperforms general-purpose models on Norwegian clinical text.
Area Of Science
- Natural Language Processing (NLP)
- Machine Learning
- Health Informatics
Background
- Accurate International Statistical Classification of Diseases, Tenth Revision (ICD-10) coding is vital for healthcare operations, but manual processes are error-prone and inefficient.
- Existing NLP models for ICD-10 coding primarily focus on English, creating a research gap for Norwegian clinical text.
- There is a need for automated ICD-10 coding solutions tailored to the Norwegian healthcare system.
Purpose Of The Study
- Introduce NorDeClin-BERT, a domain-specific Norwegian BERT model for enhanced medical language understanding.
- Evaluate the impact of domain-specific pretraining and model size on ICD-10 code classification performance.
- Compare NorDeClin-BERT against general-purpose and cross-lingual BERT models for Norwegian ICD-10 coding.
Main Methods
- Pretrained two versions of NorDeClin-BERT (base and large) on the ClinCode Gastro Corpus, comprising 8.8 million deidentified Norwegian clinical notes (a pretraining sketch follows this list).
- Fine-tuned the models for ICD-10 diagnosis code prediction.
- Benchmarked NorDeClin-BERT against SweDeClin-BERT, ScandiBERT, NorBERT3-base, and NorBERT3-large using accuracy, precision, recall, and F1-score (a fine-tuning and evaluation sketch also follows this list).
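The pretraining step described above is domain-adaptive masked-language-model (MLM) training on clinical notes. The sketch below illustrates that idea with Hugging Face Transformers; the checkpoint path, corpus file, and hyperparameters are illustrative assumptions, not the authors' actual setup, and the ClinCode Gastro Corpus itself is not publicly available.

```python
# Minimal sketch of domain-adaptive MLM pretraining (assumed setup, not the paper's exact recipe).
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_ckpt = "path/to/norwegian-bert-base"   # assumed general-domain starting checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_ckpt)
model = AutoModelForMaskedLM.from_pretrained(base_ckpt)

# Hypothetical corpus file: one deidentified clinical note per line.
raw = load_dataset("text", data_files={"train": "clinical_notes.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# Randomly mask 15% of tokens so the model learns clinical vocabulary in context.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="nordeclin-mlm",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=5e-5,
)

Trainer(model=model, args=args, train_dataset=tokenized["train"],
        data_collator=collator).train()
```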
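The fine-tuning and benchmarking steps can be pictured as sequence classification over ICD-10 codes scored with accuracy, precision, recall, and F1. The sketch below assumes a single-label, multi-class setup with a hypothetical CSV of text and integer label columns; the real task formulation (e.g., multi-label coding) and hyperparameters may differ.

```python
# Minimal sketch of ICD-10 fine-tuning and evaluation (assumed data format and paths).
import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

ckpt = "path/to/nordeclin-bert-base"        # assumed path to the clinical pretrained checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)

# Hypothetical CSVs with columns: text (clinical note) and label (ICD-10 code index).
data = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})
num_labels = len(set(data["train"]["label"]))
model = AutoModelForSequenceClassification.from_pretrained(ckpt, num_labels=num_labels)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

data = data.map(tokenize, batched=True)
collator = DataCollatorWithPadding(tokenizer=tokenizer)

def compute_metrics(eval_pred):
    # Report the same metrics used in the benchmark: accuracy, precision, recall, F1.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0)
    return {"accuracy": accuracy_score(labels, preds),
            "precision": precision, "recall": recall, "f1": f1}

args = TrainingArguments(output_dir="nordeclin-icd10",
                         per_device_train_batch_size=16,
                         num_train_epochs=3)

trainer = Trainer(model=model, args=args,
                  train_dataset=data["train"], eval_dataset=data["test"],
                  data_collator=collator, compute_metrics=compute_metrics)
trainer.train()
print(trainer.evaluate())
```

The same evaluation loop can be rerun with each baseline checkpoint (SweDeClin-BERT, ScandiBERT, NorBERT3-base, NorBERT3-large) to reproduce a comparison of the kind reported in the study.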
Main Results
- Both NorDeClin-BERT versions outperformed general Norwegian BERT models and Swedish clinical BERT models in classifying ICD-10 codes.
- NorDeClin-BERT-large achieved the highest performance across all evaluation metrics, demonstrating the benefits of domain-specific pretraining and model capacity.
- Swedish clinical models showed limited transferability, underscoring the need for Norwegian-specific clinical pretraining.
Conclusions
- NorDeClin-BERT shows significant potential to improve ICD-10 code classification in Norwegian gastroenterology, streamlining documentation and reducing administrative burden.
- The study establishes NorDeClin-BERT as a state-of-the-art model for Norwegian medical NLP and ICD-10 coding, setting a new research baseline.
- Future research should explore advanced domain adaptation, external knowledge integration, and cross-hospital generalizability for broader clinical applications.