LOCAL CASE-CONTROL SAMPLING: EFFICIENT SUBSAMPLING IN IMBALANCED DATA SETS.

William Fithian¹, Trevor Hastie¹

¹Department of Statistics, Stanford University, 390 Serra Mall, Stanford, California 94305-4065, USA.

Annals of Statistics

December 11, 2014

Summary

This summary is machine-generated.

This study introduces an efficient subsampling method for imbalanced classification, improving logistic regression parameter estimation. The technique offers a consistent and more accurate alternative to standard case-control sampling.

Keywords:

Logistic regression, case-control sampling, subsampling

Area of Science:

Machine Learning
Statistical Modeling
Data Science

Background:

Class imbalance in classification presents challenges for model parameter estimation.
Subsampling methods reduce computational costs but can inflate variance.
Standard case-control sampling may lack consistency under model misspecification.
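
For context, standard case-control subsampling can be sketched as follows. This is a minimal illustration, not code from the paper: the simulated data, the function names `fit_logistic` and `case_control_fit`, and the `control_rate` parameter are all assumptions made for the example. It keeps every positive example, keeps each negative with a fixed probability, and then applies the textbook intercept correction, which relies on the logistic model being correctly specified.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson logistic regression; X already contains an intercept column."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        grad = X.T @ (y - p)
        # Tiny ridge term keeps the Hessian invertible; it is not part of the method.
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + 1e-8 * np.eye(X.shape[1])
        theta = theta + np.linalg.solve(hess, grad)
    return theta

def case_control_fit(X, y, control_rate, rng):
    """Keep every case (y == 1) and each control (y == 0) with probability control_rate."""
    keep = (y == 1) | (rng.random(len(y)) < control_rate)
    theta = fit_logistic(X[keep], y[keep])
    # Subsampling shifts the log-odds by log(1 / control_rate); undo it on the intercept.
    theta[0] += np.log(control_rate)
    return theta, int(keep.sum())

# Hypothetical imbalanced data with roughly 5% positives (assumed for illustration).
rng = np.random.default_rng(0)
n = 100_000
theta_true = np.array([-4.0, 1.5])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ theta_true)))).astype(float)

theta_cc, n_kept = case_control_fit(X, y, control_rate=0.05, rng=rng)
print("kept", n_kept, "of", n, "rows; estimate:", theta_cc)
```

The acceptance probability here depends only on the label, not on the features. That is the sense in which the class balance is adjusted globally rather than locally.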

Purpose of the Study:

To develop an efficient subsampling method for logistic regression in imbalanced datasets.
To improve parameter estimation accuracy and consistency compared to existing methods.
To address the trade-off between computational efficiency and statistical variance.

Main Methods:

Proposes an accept-reject scheme to adjust class balance locally in feature space.
Uses a pilot estimate to preferentially select examples whose responses are conditionally rare given their features.
Employs a post-hoc analytic adjustment to correct for biased subsampling.
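
The three steps above can be sketched roughly as follows. The summary does not give formulas, so the specifics are assumptions based on our reading of the Fithian and Hastie paper: each example is accepted with probability |y − p̃(x)|, where p̃ is the pilot model's predicted probability, and the correction adds the pilot coefficients back to the subsample fit. The simulated data, the way the pilot is obtained (a small held-out split), and all function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson logistic regression; X already contains an intercept column."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ theta)
        grad = X.T @ (y - p)
        # Tiny ridge term keeps the Hessian invertible; it is not part of the method.
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + 1e-8 * np.eye(X.shape[1])
        theta = theta + np.linalg.solve(hess, grad)
    return theta

def local_case_control_fit(X, y, theta_pilot, rng):
    """Accept-reject subsampling driven by a pilot fit, then an additive correction."""
    p_pilot = sigmoid(X @ theta_pilot)
    # Assumed acceptance rule: examples the pilot finds surprising are kept more often.
    accept_prob = np.abs(y - p_pilot)
    keep = rng.random(len(y)) < accept_prob
    theta_sub = fit_logistic(X[keep], y[keep])
    # Assumed post-hoc adjustment: add the pilot coefficients back.
    return theta_sub + theta_pilot, int(keep.sum())

# Hypothetical imbalanced data with roughly 5% positives (assumed for illustration).
rng = np.random.default_rng(1)
n = 100_000
theta_true = np.array([-4.0, 1.5])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (rng.random(n) < sigmoid(X @ theta_true)).astype(float)

# Pilot estimate from a small held-out split, so it is independent of the main data.
X_pilot, y_pilot = X[:5_000], y[:5_000]
X_main, y_main = X[5_000:], y[5_000:]
theta_pilot = fit_logistic(X_pilot, y_pilot)

theta_lcc, n_kept = local_case_control_fit(X_main, y_main, theta_pilot, rng)
print("kept", n_kept, "of", len(y_main), "rows; estimate:", theta_lcc)
```

Each row is accepted or rejected independently of the others, so the subsampling pass can be split across workers. When the pilot is accurate, the subsample fit is close to zero and most of the final estimate comes from the pilot plus a small correction.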

Main Results:

The proposed method generalizes standard case-control sampling.
The estimator is consistent for the population risk-minimizing coefficients, provided the pilot estimate is itself consistent.
Demonstrates substantial performance improvements over standard case-control subsampling in simulations and real-world data.

Conclusions:

The novel subsampling technique offers an efficient and statistically robust approach for imbalanced classification.
Provides a consistent estimator that outperforms traditional methods, especially in severely imbalanced scenarios.
The method is simple, parallelizable, and adaptable for improved variance reduction.