Evaluating Binary Outcome Classifiers Estimated from Survey Data.

Adway S Wadekar¹, Jerome P Reiter

¹ From the Department of Statistical Science, Duke University, Durham, NC.

Epidemiology (Cambridge, Mass.) | August 14, 2024
Summary
This summary is machine-generated.

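Using survey weights improves the evaluation of predictive models on complex survey data. Weighted metrics reflect performance in the target population, whereas unweighted metrics computed on sample test data can misrepresent it. The benefit persists even when models are trained with class imbalance mitigation such as upsampling.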

Area of Science:

  • Epidemiology
  • Health Sciences
  • Social and Behavioral Sciences

Background:

  • Surveys are vital research tools but often use complex sampling designs, not simple random samples.
  • Survey respondents are typically assigned weights to account for unequal selection probabilities.
  • Evaluating predictive models on survey data requires careful consideration of these complex designs.
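
For orientation, the role of the weights can be written out under a common textbook convention. This is an assumption for illustration, not something the summary states: respondent i in the sample S has inclusion probability \pi_i, the design weight is its inverse, and a weighted mean of any quantity y_i estimates the corresponding mean over the finite population of size N.

```latex
w_i = \frac{1}{\pi_i}, \qquad
\hat{N} = \sum_{i \in S} w_i, \qquad
\hat{\bar{Y}} = \frac{\sum_{i \in S} w_i \, y_i}{\sum_{i \in S} w_i}
```

Under this convention each respondent stands in for roughly w_i population members, so respondents from undersampled groups count for more in any estimate. Weights released with real surveys often also include nonresponse and calibration adjustments, which this sketch ignores.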

Purpose of the Study:

  • To demonstrate the benefit of using survey weights for assessing predictive model quality.
  • To compare weighted versus unweighted performance metrics on complex survey data.
  • To evaluate the impact of weighting on models trained with class imbalance mitigation.

Main Methods:

  • Characterized model assessment statistics (e.g., sensitivity, specificity) as finite population quantities.
  • Computed survey-weighted estimates using random subsets of original survey data for testing.
  • Conducted simulations using data from the National Survey on Drug Use and Health and National Comorbidity Survey.
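
The idea of a survey-weighted estimate of sensitivity and specificity can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `weighted_sens_spec`, the toy labels, and the toy weights are all hypothetical. Each test case contributes its survey weight, rather than a count of one, to the confusion-matrix cell it falls into.

```python
from typing import Sequence, Tuple


def weighted_sens_spec(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    weights: Sequence[float],
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) with each test case counted by its weight."""
    tp = fn = tn = fp = 0.0
    for y, p, w in zip(y_true, y_pred, weights):
        if y == 1 and p == 1:
            tp += w  # true positive
        elif y == 1:
            fn += w  # false negative
        elif p == 0:
            tn += w  # true negative
        else:
            fp += w  # false positive
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one weighted positive and one weighted negative")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy test set: labels, predictions, and survey weights are made up.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 1]
weights = [1.0, 3.0, 2.0, 2.0]

print(weighted_sens_spec(y_true, y_pred, weights))    # (0.25, 0.5)
print(weighted_sens_spec(y_true, y_pred, [1.0] * 4))  # (0.5, 0.5)
```

Setting every weight to 1 recovers the usual unweighted metrics, which makes it easy to compare the two on the same test set.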

Main Results:

  • Unweighted metrics using sample test data can inaccurately represent population performance.
  • Weighted metrics appropriately adjust for complex sampling designs, providing accurate population estimates.
  • The benefit of weighted metrics persists even when models are trained using upsampling for class imbalance.
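
A small made-up example shows how the gap between unweighted and weighted metrics can arise. It is a hypothetical sketch, not data from the National Survey on Drug Use and Health or the National Comorbidity Survey. It assumes two strata of true positives: stratum A is oversampled (each respondent represents 2 people) and the model performs well there, while stratum B is undersampled (each respondent represents 18 people) and the model performs worse.

```python
def sensitivity(preds_for_positives, weights):
    """Weighted share of true-positive cases that the model flags as positive."""
    flagged = sum(w for p, w in zip(preds_for_positives, weights) if p == 1)
    return flagged / sum(weights)


# Hypothetical sample containing only cases whose true label is positive.
# Stratum A: 50 sampled, weight 2 each, model flags 45 of them.
# Stratum B: 50 sampled, weight 18 each, model flags 25 of them.
preds = [1] * 45 + [0] * 5 + [1] * 25 + [0] * 25
weights = [2.0] * 50 + [18.0] * 50

unweighted = sensitivity(preds, [1.0] * len(preds))
weighted = sensitivity(preds, weights)

print(round(unweighted, 2))  # 0.7
print(round(weighted, 2))    # 0.54
```

In this constructed population of 1,000 positives (100 in A, 900 in B), the model would flag 90 + 450 = 540, so the population sensitivity is 0.54. The weighted estimate matches it, while the unweighted figure of 0.70 overstates performance because the easy, oversampled stratum dominates the sample.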

Conclusions:

  • Survey weights are crucial for accurate predictive model performance evaluation on complex survey data.
  • Weighted metrics provide a more reliable assessment of model generalizability to the target population.
  • Researchers should adopt weighted metrics when evaluating models trained or tested on complex survey datasets.