Mind the Performance Gap: Examining Dataset Shift During Prospective Validation | JoVE Visualize

Area of Science:

Clinical Informatics
Health Services Research
Predictive Analytics

Background:

Patient risk stratification models are crucial for clinical care but often show decreased performance post-implementation compared to initial retrospective validation.
Performance degradation is attributed to temporal shifts (care processes, patient populations) and infrastructure shifts (data access, extraction, transformation).
Prospective validation is infrequently reported, hindering understanding of real-world model performance and the factors contributing to performance gaps.

Purpose of the Study:

To compare the prospective performance of a patient risk stratification model with its prior retrospective validation performance.
To quantify the performance gap between retrospective and prospective validation of a healthcare-associated infection (HAI) prediction model.
To differentiate the contributions of temporal shift and infrastructure shift to the observed performance gap.
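
One way to separate the two sources can be sketched as follows. It assumes a third evaluation is available: the model scored on the later period's encounters with data pulled through the retrospective pipeline. This summary does not describe the study's actual procedure, so that evaluation, the function name `decompose_gap`, and its argument names are all hypothetical.

```python
def decompose_gap(retro_prior, retro_same_period, prospective):
    """Split a metric gap into temporal and infrastructure parts.

    retro_prior:       metric on the earlier period, retrospective pipeline
    retro_same_period: metric on the later period, retrospective pipeline
                       (assumed to be available; not stated in the summary)
    prospective:       metric on the later period, prospective pipeline
    """
    # Same pipeline, different period: isolates the temporal shift.
    temporal = retro_same_period - retro_prior
    # Same period, different pipeline: isolates the infrastructure shift.
    infrastructure = prospective - retro_same_period
    return {
        "temporal": temporal,
        "infrastructure": infrastructure,
        "total": prospective - retro_prior,
    }


# Synthetic AUROC values for illustration only (not study results).
parts = decompose_gap(retro_prior=0.80, retro_same_period=0.79, prospective=0.75)
```

By construction the two parts sum to the total gap, so whichever part is larger in magnitude can be read as the main driver.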

Main Methods:

A patient risk stratification model for predicting HAIs was applied prospectively to 26,864 hospital encounters from July 2020 to June 2021.
The prospective performance was compared to the model's retrospective validation performance using data from July 2019 to June 2020.
Key performance metrics included Area Under the Receiver Operating Characteristic Curve (AUROC) and Brier score.
Temporal and infrastructure shifts were analyzed as contributors to the performance gap.
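
The two metrics named above can be computed as in the minimal sketch below, which uses only the Python standard library. The percentile bootstrap for the 95% interval is an assumption: the summary does not say how the study derived its confidence intervals. The function names and the toy data are illustrative.

```python
import random


def auroc(y_true, y_score):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pairs = sorted(zip(y_score, y_true))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by tied scores
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs both classes")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    # Mann-Whitney U statistic, normalised to [0, 1].
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def brier(y_true, y_prob):
    """Mean squared error between predicted risk and the 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def bootstrap_ci(metric, y_true, y_prob, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval over resampled encounters."""
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("need both classes to resample")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # redraw resamples that contain a single class
            stats.append(metric(ys, [y_prob[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


# Toy data for illustration only (not study data).
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
point_auroc = auroc(y, p)
point_brier = brier(y, p)
auroc_lo, auroc_hi = bootstrap_ci(auroc, y, p)
```

AUROC measures how well the model ranks encounters, while the Brier score also penalises poorly calibrated probabilities, so the two can move by different amounts under the same shift.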

Main Results:

The model achieved a prospective AUROC of 0.767 (95% CI: 0.737, 0.801) and a Brier score of 0.189 (95% CI: 0.186, 0.191).
Retrospective validation showed an AUROC of 0.778 (95% CI: 0.744, 0.815) and a Brier score of 0.163 (95% CI: 0.161, 0.165).
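Prospective performance was lower on both metrics (AUROC 0.767 vs. 0.778; Brier score 0.189 vs. 0.163, where lower is better), and the gap was driven primarily by infrastructure shift rather than temporal shift.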
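
The size of the gap follows directly from the figures reported above. The short sketch below takes the differences and checks whether the reported 95% intervals overlap. The overlap check is only a rough reading aid, not a formal significance test, and the helper names are illustrative.

```python
# Point estimates and 95% CIs exactly as reported: (estimate, lower, upper).
reported = {
    "prospective": {"auroc": (0.767, 0.737, 0.801), "brier": (0.189, 0.186, 0.191)},
    "retrospective": {"auroc": (0.778, 0.744, 0.815), "brier": (0.163, 0.161, 0.165)},
}


def intervals_overlap(a, b):
    """True if the closed intervals (lo, hi) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def gap(metric):
    """Prospective minus retrospective estimate, plus a CI-overlap flag."""
    pro = reported["prospective"][metric]
    retro = reported["retrospective"][metric]
    return {
        "difference": round(pro[0] - retro[0], 3),
        "ci_overlap": intervals_overlap(pro[1:], retro[1:]),
    }


auroc_gap = gap("auroc")  # difference -0.011, intervals overlap
brier_gap = gap("brier")  # difference 0.026, intervals do not overlap
```

On these numbers the AUROC drop is small relative to its interval width, whereas the Brier score intervals are disjoint.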

Conclusions:

Prospective performance of risk stratification models can fall short of their retrospective validation performance, and infrastructure shift can be a significant contributor.
The study highlights the critical impact of data access, extraction, and transformation processes on model performance in real-world clinical settings.
Future model development and validation should account for and mitigate the effects of data infrastructure differences to ensure reliable prospective performance.
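
One way to surface data infrastructure differences before deployment is to compare the features that each pipeline produces for the same encounters. The sketch below is illustrative only: the encounter IDs, the feature names, and the `feature_mismatch_rates` helper are hypothetical and are not taken from the study.

```python
from collections import Counter


def feature_mismatch_rates(retro_rows, prosp_rows):
    """Per-feature share of shared encounters where the two pipelines disagree.

    Each argument maps an encounter ID to a dict of feature name -> value.
    """
    shared = retro_rows.keys() & prosp_rows.keys()
    mismatches, totals = Counter(), Counter()
    for enc_id in shared:
        r, p = retro_rows[enc_id], prosp_rows[enc_id]
        for name in r.keys() | p.keys():
            totals[name] += 1
            if r.get(name) != p.get(name):  # a feature missing on one side counts as a mismatch
                mismatches[name] += 1
    return {name: mismatches[name] / totals[name] for name in totals}


# Hypothetical extracts for illustration only.
retro = {
    "e1": {"on_antibiotic": 1, "wbc_high": 0},
    "e2": {"on_antibiotic": 0, "wbc_high": 1},
}
prosp = {
    "e1": {"on_antibiotic": 1, "wbc_high": 0},
    "e2": {"on_antibiotic": 0},  # feature not yet available at prediction time
    "e3": {"on_antibiotic": 1, "wbc_high": 1},  # no retrospective counterpart, ignored
}
rates = feature_mismatch_rates(retro, prosp)
```

Features with high mismatch rates are natural candidates for closer inspection of how they are accessed, extracted, and transformed in each pipeline.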