Efficient posterior sampling for high-dimensional imbalanced logistic regression.

Deborshee Sen¹, Matthias Sachs², Jianfeng Lu²

¹Department of Statistical Science, Duke University, Box 90251, Durham, North Carolina 27708, U.S.A.

January 19, 2021

Summary

This summary is machine-generated.

This study introduces improved Bayesian classification methods for high-dimensional, imbalanced data. New algorithms enhance computational efficiency and accuracy, outperforming existing techniques in simulations and cancer data analysis.

Keywords:

Imbalanced data; Logistic regression; Piecewise-deterministic Markov process; Scalable inference; Subsampling

Area of Science:

Statistics
Machine Learning
Computational Biology

Background:

High-dimensional data classification is crucial but challenging, especially with imbalanced datasets.
Current Bayesian classification methods using Markov chain Monte Carlo (MCMC) are computationally inefficient for large datasets due to slow mixing rates and high computational cost per step.
Standard subsampling techniques for efficiency fail with imbalanced data.
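
One way to see why uniform subsampling struggles with imbalanced data is to ask how often a uniformly drawn mini-batch contains no minority-class observations at all. The sketch below is only an illustration of that intuition, not an analysis from the study; the function name `prob_no_minority` and the example sizes are assumptions chosen for the demonstration.

```python
def prob_no_minority(n_total, n_minority, batch_size):
    """Probability that a uniform mini-batch, drawn without replacement,
    contains zero minority-class rows (a hypergeometric tail)."""
    n_majority = n_total - n_minority
    if batch_size > n_majority:
        return 0.0  # the batch is too large to avoid the minority class
    p = 1.0
    for i in range(batch_size):
        # Chance that the (i+1)-th draw is again a majority-class row.
        p *= (n_majority - i) / (n_total - i)
    return p


if __name__ == "__main__":
    # Hypothetical setting: 100 positives among 100,000 rows, batches of 100.
    print(prob_no_minority(100_000, 100, 100))
```

In this hypothetical setting roughly nine out of ten uniform mini-batches miss the minority class entirely, so a gradient estimate built from them carries almost no information about the rare outcomes.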

Purpose of the Study:

To develop efficient Bayesian classification algorithms for high-dimensional and imbalanced data.
To overcome the computational limitations of traditional MCMC methods in large-scale Bayesian classification.
To address the breakdown of standard subsampling in imbalanced data scenarios.

Main Methods:

Generalization of piecewise-deterministic Markov chain Monte Carlo (PD-MCMC) algorithms.
Incorporation of importance-weighted and mini-batch subsampling strategies.
Theoretical analysis and validation through simulated data and a real-world cancer dataset.
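
The idea behind importance-weighted subsampling can be sketched with a plain gradient estimator for logistic regression: draw observations with non-uniform probabilities and divide each sampled term by its sampling probability so that the estimate remains unbiased for the full-data sum. This is a minimal sketch under stated assumptions, not the algorithm from the study. The class-balanced weighting in `class_balanced_probs` and all function names are hypothetical choices made for illustration, and the estimator is shown on its own rather than inside a piecewise-deterministic sampler.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def per_example_grads(beta, X, y):
    """Gradient of each observation's negative log-likelihood, shape (n, d)."""
    return (sigmoid(X @ beta) - y)[:, None] * X


def class_balanced_probs(y):
    """Assumed weighting: give each class half of the total sampling mass."""
    y = np.asarray(y)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return np.where(y == 1, 0.5 / n_pos, 0.5 / n_neg)


def weighted_subsample_grad(beta, X, y, probs, batch_size, rng):
    """Estimate of the full-data gradient sum from a weighted mini-batch.

    Each sampled term is divided by its sampling probability, so the
    expectation equals sum_i grad_i for any strictly positive `probs`.
    """
    idx = rng.choice(len(y), size=batch_size, replace=True, p=probs)
    grads = per_example_grads(beta, X[idx], y[idx])
    return (grads / probs[idx, None]).mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 3))
    y = np.zeros(2000)
    y[:20] = 1.0  # 1% minority class
    beta = np.zeros(3)
    probs = class_balanced_probs(y)
    print(weighted_subsample_grad(beta, X, y, probs, 50, rng))
    print(per_example_grads(beta, X, y).sum(axis=0))  # full-data target
```

With this weighting, about half of every mini-batch comes from the minority class regardless of how rare it is, while the reweighting by `1 / probs` keeps the estimate centred on the full-data gradient.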

Main Results:

The proposed generalized PD-MCMC algorithms maintain correct stationary distributions even with small subsamples.
The new methods show substantial performance gains over existing competing algorithms.
The approach shows effectiveness in both simulated scenarios and practical application to cancer data.
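
To make the notion of a piecewise-deterministic sampler with a correct stationary distribution concrete, the sketch below simulates a one-dimensional zig-zag process, one well-known member of this family, for a standard normal target. It is an illustration only: the summary above does not state which piecewise-deterministic process the study generalizes, no subsampling is used here, and the Gaussian target is an assumption chosen because its event times have a closed form. The function name `zigzag_standard_normal` is hypothetical.

```python
import numpy as np


def zigzag_standard_normal(n_events, rng, x0=0.0):
    """Simulate a 1D zig-zag process targeting N(0, 1).

    The state moves at velocity theta in {-1, +1} and flips direction at
    rate max(0, theta * x). Returns the time-averaged first and second
    moments of the continuous trajectory.
    """
    x, theta = x0, 1.0
    total_time = int_x = int_x2 = 0.0
    for _ in range(n_events):
        a = theta * x
        e = rng.exponential()
        # Solve integral_0^tau max(0, a + s) ds = e for the next flip time.
        tau = -a + np.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        x_new = x + theta * tau
        # Exact integrals of x and x^2 along the linear segment.
        int_x += x * tau + 0.5 * theta * tau**2
        int_x2 += theta * (x_new**3 - x**3) / 3.0
        total_time += tau
        x, theta = x_new, -theta
    return int_x / total_time, int_x2 / total_time


if __name__ == "__main__":
    mean, second_moment = zigzag_standard_normal(20_000, np.random.default_rng(0))
    print(mean, second_moment)
```

If the process leaves the target invariant, the two time averages should settle near 0 and 1, the mean and second moment of a standard normal distribution.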

Conclusions:

The developed importance-weighted and mini-batch subsampling for PD-MCMC offers a robust solution for Bayesian classification with high-dimensional, imbalanced data.
This approach significantly improves computational efficiency and classification accuracy.
The methods provide a valuable tool for analyzing complex biological datasets, such as cancer data.