Optimal two-phase sampling design for comparing accuracies of two binary classification rules

Area of Science:

Statistical methodology
Machine learning evaluation

Background:

Comparing binary classification rules (e.g., record linkage, screening tests) is crucial for performance assessment.
Existing statistical methods often lack optimized sampling schemes, potentially leading to higher variance in performance estimates.
Gold-standard classification is typically obtained either for all units or, in two-phase studies, for a subsample only, but existing subsampling schemes are not optimized for variance reduction.

Purpose of the Study:

To develop and evaluate an optimal sampling design for comparing the performance of two binary classification rules.
To minimize the variance of estimators for key accuracy measures, including sensitivity, specificity, and positive predictive values.
To provide a method for optimizing sampling schemes when comparing classification algorithms or diagnostic tests.

Main Methods:

Derived analytic variance formulas for estimates of differences in sensitivity, specificity, and positive predictive values.
Developed an optimal sampling design based on these variance formulas.
Conducted an empirical investigation comparing the optimal design with simple random sampling and proportional allocation.
Applied the optimal sampling strategy to a real-world record linkage case study.
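
The two-phase setup behind these methods can be illustrated with a minimal sketch. It assumes that phase 1 records the result of both rules for every unit, and that phase 2 verifies a simple random sample against the gold standard within each of the four result cells. The function name, dictionary layout, and counts are hypothetical and not taken from the study; the estimator shown is a generic stratum-weighted one, not necessarily the one the authors analyzed.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]  # (result of rule A, result of rule B), each 0 or 1


def sensitivity_difference(
    phase1_counts: Dict[Cell, int],     # N_s: units per cell in phase 1
    sampled: Dict[Cell, int],           # n_s: units verified in phase 2
    sampled_positive: Dict[Cell, int],  # m_s: verified units that are true positives
) -> Tuple[float, float, float]:
    """Return (sens_A, sens_B, sens_A - sens_B) from a two-phase sample."""
    est_pos = {}
    for cell, n_units in phase1_counts.items():
        n = sampled[cell]
        if n == 0:
            raise ValueError(f"no phase-2 sample in cell {cell}")
        # Scale the sampled true-positive fraction up to the whole cell.
        est_pos[cell] = n_units * sampled_positive[cell] / n

    total_pos = sum(est_pos.values())
    if total_pos == 0:
        raise ValueError("no true positives estimated; sensitivity undefined")

    sens_a = (est_pos[(1, 1)] + est_pos[(1, 0)]) / total_pos
    sens_b = (est_pos[(1, 1)] + est_pos[(0, 1)]) / total_pos
    return sens_a, sens_b, sens_a - sens_b


# Hypothetical counts, for illustration only.
N = {(1, 1): 100, (1, 0): 50, (0, 1): 30, (0, 0): 820}
n = {(1, 1): 20, (1, 0): 25, (0, 1): 15, (0, 0): 41}
m = {(1, 1): 18, (1, 0): 15, (0, 1): 5, (0, 0): 1}

sens_a, sens_b, diff = sensitivity_difference(N, n, m)
print(round(sens_a, 3), round(sens_b, 3), round(diff, 3))
```

In this estimator the concordant-positive cell cancels out of the numerator of the difference, which depends only on the two discordant cells; the concordant cells enter through the estimated total number of true positives in the denominator.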

Main Results:

The optimal sampling design is similar for estimating differences in sensitivities and specificities.
Significant variance reduction was achieved by over-sampling subjects with discordant results and under-sampling those with concordant results.
The empirical study demonstrated the efficiency of the optimal sampling design compared to traditional methods.
A heuristic rule was proposed for situations with limited prior knowledge of classification rule performance or prevalence.
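
To give a rough feel for why an allocation can shift phase-2 effort toward discordant cells, the sketch below compares proportional allocation with textbook Neyman allocation, using guessed true-positive rates per cell. This is an assumption-laden stand-in: it is not the optimal design derived in the study, nor its heuristic rule, and the counts, guessed rates, and function names are hypothetical.

```python
import math
from typing import Dict, Tuple

Cell = Tuple[int, int]  # (result of rule A, result of rule B)


def proportional_allocation(counts: Dict[Cell, int], n_total: int) -> Dict[Cell, float]:
    """Split the phase-2 sample in proportion to the phase-1 cell sizes."""
    total = sum(counts.values())
    return {cell: n_total * size / total for cell, size in counts.items()}


def neyman_allocation(
    counts: Dict[Cell, int],
    guessed_positive_rate: Dict[Cell, float],  # prior guess, NOT study data
    n_total: int,
) -> Dict[Cell, float]:
    """Textbook Neyman allocation: n_s proportional to N_s * sqrt(p_s * (1 - p_s))."""
    weights = {
        cell: size * math.sqrt(guessed_positive_rate[cell] * (1 - guessed_positive_rate[cell]))
        for cell, size in counts.items()
    }
    total = sum(weights.values())
    if total == 0:
        raise ValueError("all guessed rates are 0 or 1; allocation undefined")
    return {cell: n_total * w / total for cell, w in weights.items()}


# Hypothetical phase-1 counts and guessed rates, for illustration only.
counts = {(1, 1): 100, (1, 0): 50, (0, 1): 30, (0, 0): 820}
guess = {(1, 1): 0.9, (1, 0): 0.5, (0, 1): 0.5, (0, 0): 0.02}

prop = proportional_allocation(counts, 100)
neym = neyman_allocation(counts, guess, 100)
for cell in counts:
    print(cell, round(prop[cell], 1), round(neym[cell], 1))
```

With these made-up inputs, both discordant cells receive a larger share than under proportional allocation, and the large concordant-negative cell receives a smaller one. The allocations are returned as fractional sizes; in practice they would be rounded and capped at the cell size.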

Conclusions:

The proposed optimal sampling design effectively reduces variance in comparing binary classification rules.
This approach improves the precision and efficiency of evaluations of record linkage algorithms and screening tests.
The findings offer practical guidance for designing studies that compare classification performance, particularly when optimizing resource allocation.
The optimal sampling strategy is valuable for real-world applications requiring precise performance comparisons.