Analyzing large datasets with bootstrap penalization.

Kuangnan Fang¹, Shuangge Ma¹,²

¹Department of Statistics, Xiamen University, Xiamen, Fujian, China.

Biometrical Journal. Biometrische Zeitschrift

November 22, 2016

Summary

This summary is machine-generated.

Bootstrap penalization offers a computationally efficient solution for analyzing large datasets. This method breaks down complex penalized estimation into smaller, parallelizable tasks, reducing the need for high-performance computing resources.

Keywords:

Bootstrap, Computational feasibility, Large datasets, Penalization

Area of Science:

Statistical computing
Machine learning
Data science

Background:

Large-scale datasets (high dimension, large sample size, or both) are increasingly common.
Standard penalized estimation methods require significant computational resources.
Computationally feasible methods for big data analysis are therefore needed.
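
For reference, a penalized estimate is usually defined as the minimizer of a loss function plus a penalty term. The block below writes out the lasso as one common instance. The notation (response y_i, covariate vector x_i, coefficient vector β, tuning parameter λ) is assumed here for illustration and is not taken from the paper, which may use other losses and penalties.

```latex
\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^p}
\left\{ \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - x_i^\top \beta \right)^2
+ \lambda \sum_{j=1}^{p} |\beta_j| \right\}
```

Solving this directly requires working with all n subjects and all p covariates at once, which is the cost that becomes significant when n, p, or both are large.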

Purpose of the Study:

To develop a computationally efficient penalized estimation method for large datasets.
To introduce bootstrap penalization as an alternative to straightforward penalization.
To provide strategies tailored to different data characteristics (large p, large n, or both).

Main Methods:

Bootstrap penalization breaks a large penalized estimation problem into smaller, parallelizable tasks.
For large p/small n data: covariate block clustering and sequential penalization.
For large n/small p data: subject bootstrapping and weighted averaging.
For large p and large n data: combination of the above strategies.
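
The large n/small p strategy listed above can be sketched roughly as follows. This is a minimal illustration in standard-library Python, not the authors' implementation or their R package. Several details are assumptions made for the sketch: the lasso is used as the penalty, each subsample is fitted by cyclic coordinate descent, and the subsample estimates are combined with weights inversely proportional to each fit's mean squared residual. The summary does not state how the weights are chosen. The function names `lasso_cd` and `bootstrap_penalized` and all parameter values are hypothetical.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Lasso by cyclic coordinate descent (no intercept).

    Minimizes (1 / (2n)) * sum((y - X b)^2) + lam * sum(|b_j|).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product of column j with the partial residual, scaled by n
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def bootstrap_penalized(X, y, lam, n_boot=20, m=200, seed=0):
    """Fit the lasso on n_boot bootstrap subsamples of size m, then average.

    ASSUMPTION: weights are 1 / (mean squared residual of each fit);
    the source only says "weighted averaging".
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    fits, weights = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(m)]  # subjects drawn with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        beta = lasso_cd(Xb, yb, lam)
        mse = sum(
            (yb[k] - sum(Xb[k][j] * beta[j] for j in range(p))) ** 2 for k in range(m)
        ) / m
        fits.append(beta)
        weights.append(1.0 / (mse + 1e-12))
    total = sum(weights)
    return [sum(w * b[j] for w, b in zip(weights, fits)) / total for j in range(p)]


# Simulated data for illustration only: n = 2000 subjects, p = 5 covariates.
rng = random.Random(1)
true_beta = [2.0, -1.0, 0.0, 0.0, 0.5]
X = [[rng.gauss(0, 1) for _ in true_beta] for _ in range(2000)]
y = [sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0, 0.5) for row in X]

est = bootstrap_penalized(X, y, lam=0.05)
print([round(b, 2) for b in est])
```

Each subsample fit touches only `m` subjects and does not depend on the other fits, so the loop in `bootstrap_penalized` could be distributed across workers. This is the sense in which the summary describes the tasks as smaller and parallelizable.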

Main Results:

The proposed bootstrap penalization demonstrates computational and numerical advantages.
It effectively reduces the computational burden compared to standard penalization.
The method is validated through simulations and real-world data analysis.

Conclusions:

Bootstrap penalization provides an efficient and scalable approach for penalized estimation on large datasets.
The method is adaptable to various data structures.
An R package is available for practical implementation.