
LOCAL CASE-CONTROL SAMPLING: EFFICIENT SUBSAMPLING IN IMBALANCED DATA SETS.

William Fithian¹, Trevor Hastie¹

  • ¹Department of Statistics, Stanford University, 390 Serra Mall, Stanford, California 94305-4065, USA.

Annals of Statistics | December 11, 2014
Summary
This summary is machine-generated.

This study introduces an efficient subsampling method for imbalanced classification, improving logistic regression parameter estimation. The technique offers a consistent and more accurate alternative to standard case-control sampling.

Keywords:
Logistic regression, case-control sampling, subsampling


Area of Science:

  • Machine Learning
  • Statistical Modeling
  • Data Science

Background:

  • Class imbalance in classification presents challenges for model parameter estimation.
  • Subsampling methods reduce computational costs but can inflate variance.
  • Standard case-control sampling can be inconsistent for the population risk-minimizing coefficients when the model is misspecified.
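
As a point of reference for the background above, here is a minimal sketch of standard case-control subsampling in the simplest possible setting: an intercept-only logistic model, where the maximum-likelihood intercept is just the empirical log-odds. The population size, the 2% positive rate, the keep probabilities, and all function names are illustrative assumptions, not values from the paper. The sketch keeps every positive, keeps a small fraction of negatives, and then subtracts the known log ratio of keep probabilities to recover the full-data intercept.

```python
import math
import random


def case_control_subsample(ys, keep_pos, keep_neg, rng):
    """Keep each label with a probability that depends only on its class."""
    return [y for y in ys if rng.random() < (keep_pos if y == 1 else keep_neg)]


def log_odds(ys):
    """Intercept MLE of an intercept-only logistic model (empirical log-odds)."""
    p = sum(ys) / len(ys)
    return math.log(p / (1.0 - p))


rng = random.Random(42)

# Imbalanced population: roughly 2% positives (illustrative value).
population = [1 if rng.random() < 0.02 else 0 for _ in range(200_000)]

# Keep all positives and about 5% of negatives (illustrative values).
keep_pos, keep_neg = 1.0, 0.05
sample = case_control_subsample(population, keep_pos, keep_neg, rng)

raw = log_odds(sample)                             # biased by the subsampling
corrected = raw - math.log(keep_pos / keep_neg)    # analytic intercept correction
full = log_odds(population)                        # full-data reference

print(f"kept {len(sample)} of {len(population)} points")
print(f"raw={raw:.3f} corrected={corrected:.3f} full={full:.3f}")
```

The subsample is a small fraction of the data, yet the corrected intercept should land close to the full-data value. The correction here touches only the intercept because the keep probabilities ignore the features.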

Purpose of the Study:

  • To develop an efficient subsampling method for logistic regression in imbalanced datasets.
  • To improve parameter estimation accuracy and consistency compared to existing methods.
  • To address the trade-off between computational efficiency and statistical variance.

Main Methods:

  • Proposes an accept-reject scheme to adjust class balance locally in feature space.
  • Uses a pilot estimate to preferentially select examples whose responses are conditionally rare given their features.
  • Employs a post-hoc analytic adjustment to correct for biased subsampling.
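
The three steps above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it assumes the acceptance probability is |y − p̃(x)|, where p̃ is the pilot model's predicted probability, and that the post-hoc adjustment adds the pilot coefficients to the coefficients fitted on the subsample. Neither detail is spelled out in this summary. The single feature, the simulated data, the true coefficients, and the small pure-Python Newton solver are all hypothetical choices made to keep the example self-contained.

```python
import math
import random


def sigmoid(t):
    """Numerically safe logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(data, iters=25):
    """Newton's method for logit P(y=1|x) = a + b*x on (x, y) pairs."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in data:
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient, intercept component
            g1 += (y - p) * x      # gradient, slope component
            h00 += w               # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def simulate(n, alpha, beta, rng):
    """Draw (x, y) pairs from a logistic model with a standard normal feature."""
    out = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < sigmoid(alpha + beta * x) else 0
        out.append((x, y))
    return out


def local_case_control(data, pilot, rng):
    """Accept-reject subsample, fit, then add the pilot coefficients back."""
    a0, b0 = pilot
    kept = []
    for x, y in data:
        p_tilde = sigmoid(a0 + b0 * x)
        # Assumed acceptance probability: large when the pilot is "surprised".
        if rng.random() < abs(y - p_tilde):
            kept.append((x, y))
    a_s, b_s = fit_logistic(kept)
    return (a_s + a0, b_s + b0), len(kept)   # assumed additive adjustment


rng = random.Random(0)
TRUE_ALPHA, TRUE_BETA = -4.0, 1.5            # hypothetical "true" coefficients

pilot_data = simulate(5_000, TRUE_ALPHA, TRUE_BETA, rng)   # small pilot sample
data = simulate(50_000, TRUE_ALPHA, TRUE_BETA, rng)        # full data set

pilot = fit_logistic(pilot_data)
(alpha_hat, beta_hat), n_kept = local_case_control(data, pilot, rng)

print(f"kept {n_kept} of {len(data)} points")
print(f"alpha_hat={alpha_hat:.2f} beta_hat={beta_hat:.2f}")
```

The accept-reject loop needs only one pass over the data and treats each point independently, which is why a scheme of this kind is easy to parallelize. In this sketch the pilot is fitted on a separate simulated sample so that it is independent of the data being subsampled.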

Main Results:

  • The proposed method generalizes standard case-control sampling.
  • The estimator is consistent for the population risk-minimizing coefficients, provided the pilot estimate is itself consistent.
  • Demonstrates substantial performance improvements over standard case-control subsampling in simulations and real-world data.
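
A short derivation may help show why an additive correction is enough and why the scheme reduces to standard case-control sampling as a special case. It assumes the acceptance probability is $a(x, y) = |y - \tilde{p}(x)|$, with $\tilde{p}$ the pilot model's predicted probability and $\tilde{f} = \operatorname{logit}\tilde{p}$. This notation is introduced here for illustration and does not appear in the summary.

```latex
\begin{align*}
\operatorname{logit} P(y = 1 \mid x, \text{accepted})
  &= \operatorname{logit} P(y = 1 \mid x) + \log\frac{a(x, 1)}{a(x, 0)} \\
  &= f(x) + \log\frac{1 - \tilde{p}(x)}{\tilde{p}(x)} \\
  &= f(x) - \tilde{f}(x).
\end{align*}
```

Under these assumptions, a model fitted on the accepted points targets $f - \tilde{f}$, so adding the pilot back recovers $f$. If the pilot is a constant, $\tilde{p}(x) \equiv \tilde{p}$, the acceptance probability depends only on $y$, which is ordinary case-control sampling with an intercept-only correction.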

Conclusions:

  • The novel subsampling technique offers an efficient and statistically robust approach for imbalanced classification.
  • Provides a consistent estimator that outperforms traditional methods, especially in severely imbalanced scenarios.
  • The method is simple, parallelizable, and adaptable for improved variance reduction.