Evaluating Binary Outcome Classifiers Estimated from Survey Data.

Adway S Wadekar¹, Jerome P Reiter

¹From the Department of Statistical Science, Duke University, Durham, NC.

Epidemiology (Cambridge, Mass.)

August 14, 2024

Summary

This summary is machine-generated.

Using survey weights improves the evaluation of predictive models on complex survey data. Weighted metrics accurately reflect population performance, whereas unweighted metrics may not, and this advantage persists even when models are trained with class imbalance mitigation.

Area of Science:

Epidemiology
Health Sciences
Social and Behavioral Sciences

Background:

Surveys are vital research tools, but they often use complex sampling designs rather than simple random sampling.
Survey respondents are typically assigned weights to account for unequal selection probabilities.
Evaluating predictive models on survey data requires careful consideration of these complex designs.
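
As a minimal illustration of why weights matter, the sketch below assumes the simplest case, where each weight is the inverse of a unit's known selection probability (real survey weights usually also include nonresponse and calibration adjustments). The two-stratum design, the probabilities, and the function names are hypothetical. It shows how an unweighted sample proportion can diverge from the weighted (Hájek-type) estimate when one stratum is under-sampled.

```python
# A minimal sketch: base survey weights as inverse selection probabilities.
# The two-stratum design and all numbers below are hypothetical.


def base_weights(selection_probs):
    """Return design (base) weights w_i = 1 / pi_i for each sampled unit."""
    if any(p <= 0 or p > 1 for p in selection_probs):
        raise ValueError("selection probabilities must lie in (0, 1]")
    return [1.0 / p for p in selection_probs]


def weighted_proportion(values, weights):
    """Hajek-style weighted proportion: sum(w_i * y_i) / sum(w_i)."""
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, values)) / total


# Stratum A is sampled with pi = 0.5 and stratum B with pi = 0.1, so each
# stratum-B respondent stands in for more population units.
probs = [0.5, 0.5, 0.5, 0.5, 0.1, 0.1]
outcome = [1, 1, 1, 0, 0, 0]

w = base_weights(probs)
print(w)                                # [2.0, 2.0, 2.0, 2.0, 10.0, 10.0]
print(sum(outcome) / len(outcome))      # unweighted: 0.5
print(weighted_proportion(outcome, w))  # weighted: 6 / 28, about 0.214
```

The unweighted estimate treats every respondent equally and so over-represents the heavily sampled stratum A.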

Purpose of the Study:

To demonstrate the benefit of using survey weights for assessing predictive model quality.
To compare weighted versus unweighted performance metrics on complex survey data.
To evaluate the impact of weighting on models trained with class imbalance mitigation.

Main Methods:

Characterized model assessment statistics (e.g., sensitivity, specificity) as finite population quantities.
Computed survey-weighted estimates of these quantities on test sets formed from random subsets of the original survey data.
Conducted simulations using data from the National Survey on Drug Use and Health and National Comorbidity Survey.
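
One way to make "finite population quantities" concrete: population sensitivity is (number of population units with y = 1 and predicted 1) / (number with y = 1), and a natural survey estimate replaces each count with a sum of survey weights over the test sample. The sketch below uses that ratio-of-weighted-totals form for sensitivity, specificity, positive predictive value, and accuracy. It is an illustration under that assumption, not the authors' exact estimator, and it omits variance estimation. The function names and the toy test rows are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Metrics:
    sensitivity: float
    specificity: float
    ppv: float
    accuracy: float


def _ratio(num: float, den: float) -> float:
    # Return NaN instead of raising when a denominator class is absent.
    return num / den if den > 0 else float("nan")


def weighted_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], weights: Sequence[float]
) -> Metrics:
    """Survey-weighted confusion-matrix metrics (ratios of weighted totals).

    Each count in the usual formulas is replaced by a sum of survey weights.
    Passing weights of all 1.0 gives the ordinary unweighted metrics.
    """
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("inputs must have the same length")
    tp = fp = tn = fn = 0.0
    for y, yhat, w in zip(y_true, y_pred, weights):
        if y not in (0, 1) or yhat not in (0, 1):
            raise ValueError("labels and predictions must be 0 or 1")
        if y == 1 and yhat == 1:
            tp += w
        elif y == 0 and yhat == 1:
            fp += w
        elif y == 0 and yhat == 0:
            tn += w
        else:  # y == 1 and yhat == 0
            fn += w
    return Metrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        accuracy=_ratio(tp + tn, tp + tn + fp + fn),
    )


# Hypothetical held-out test rows: true labels, predictions, survey weights.
y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
w = [1.0, 4.0, 2.0, 3.0, 5.0, 1.0]

weighted = weighted_metrics(y_true, y_pred, w)
unweighted = weighted_metrics(y_true, y_pred, [1.0] * len(y_true))
print(weighted)
print(unweighted)
```

In this toy test set the single false negative carries weight 4.0, so weighted sensitivity is 2/6 = 1/3, while the unweighted value is 2/3. The same predictions can therefore look quite different once each row counts for the population units it represents.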

Main Results:

Unweighted metrics computed on sample test data can misrepresent population performance.
Weighted metrics appropriately adjust for complex sampling designs, providing accurate population estimates.
The benefit of weighted metrics persists even when models are trained using upsampling for class imbalance.
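
The upsampling result concerns how training data are prepared, not how the test set is scored. The sketch below assumes rows are (features, label, weight) tuples and that "upsampling" means randomly duplicating minority-class training rows until the classes are balanced. That is one common variant, and the paper's exact procedure may differ. Only the training split is altered, and the held-out test rows keep their original survey weights so they can be scored with weighted metrics. Whether or how survey weights enter model training is a separate choice this sketch does not cover. The function name and data are hypothetical.

```python
import random
from collections import Counter


def split_and_upsample(rows, test_frac=0.3, seed=0):
    """Randomly split rows into train/test, then upsample the minority class in train only.

    rows: list of (features, label, survey_weight) tuples with label in {0, 1}.
    Test rows are returned untouched, with their original survey weights.
    Duplicated training rows are drawn only from the training split.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(test_frac * len(shuffled)))
    test, train = shuffled[:n_test], shuffled[n_test:]

    pos = [r for r in train if r[1] == 1]
    neg = [r for r in train if r[1] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if minority:
        # Draw minority rows with replacement until both classes have equal counts.
        extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
        train = train + extra
    return train, test


# Hypothetical data: 20 respondents, 4 positives, illustrative weights 1.0..20.0.
rows = [(i, 1 if i < 4 else 0, float(i + 1)) for i in range(20)]
train, test = split_and_upsample(rows, test_frac=0.3, seed=0)
print(Counter(label for _, label, _ in train))
print(len(test), "test rows with original weights:", [r[2] for r in test])
```

Keeping the test split free of duplicated rows means the weighted test metrics still target the finite-population quantities rather than the rebalanced training distribution.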

Conclusions:

Survey weights are crucial for accurate predictive model performance evaluation on complex survey data.
Weighted metrics provide a more reliable assessment of model generalizability to the target population.
Researchers should adopt weighted metrics when evaluating models trained or tested on complex survey datasets.