Jove
Visualize
Contact Us
JoVE
x logofacebook logolinkedin logoyoutube logo
ABOUT JoVE
OverviewLeadershipBlogJoVE Help Center
AUTHORS
Publishing ProcessEditorial BoardScope & PoliciesPeer ReviewFAQSubmit
LIBRARIANS
TestimonialsSubscriptionsAccessResourcesLibrary Advisory BoardFAQ
RESEARCH
JoVE JournalMethods CollectionsJoVE Encyclopedia of ExperimentsArchive
EDUCATION
JoVE CoreJoVE BusinessJoVE Science EducationJoVE Lab ManualFaculty Resource CenterFaculty Site
Terms & Conditions of Use
Privacy Policy
Policies

Related Concept Videos

Bootstrapping01:24

Bootstrapping

883
The term "bootstrap" originated in the 19th century as a metaphor for self-improvement or achieving something independently, without external assistance. This concept extends to statistical bootstrapping, a self-contained method for estimating population parameters through resampling, even though it can be computationally intensive. Developed by the American statistician Dr. Bradley Efron in 1979, bootstrapping provides a robust way to perform inference when the original sample size is...
883
Censoring Survival Data01:09

Censoring Survival Data

625
Survival analysis is a statistical method used to analyze time-to-event data, often employed in fields such as medicine, engineering, and social sciences. One of the key challenges in survival analysis is dealing with incomplete data, a phenomenon known as "censoring." Censoring occurs when the event of interest (such as death, relapse, or system failure) has not occurred for some individuals by the end of the study period or is otherwise unobservable, and it might have many different...
625
Survival Tree01:19

Survival Tree

457
Survival trees are a non-parametric method used in survival analysis to model the relationship between a set of covariates and the time until an event of interest occurs, often referred to as the "time-to-event" or "survival time." This method is particularly useful when dealing with censored data, where the event has not occurred for some individuals by the end of the study period, or when the exact time of the event is unknown.
 Building a Survival Tree
Constructing a...
457
Distributions to Estimate Population Parameter01:26

Distributions to Estimate Population Parameter

5.3K
The accurate values of population parameters such as population proportion, population mean, and population standard deviation (or variance) are usually unknown. These are fixed values that can only be estimated from the data collected from the samples. The estimates of each of these parameters are sample proportion, the sample mean, and sample standard deviation (or variance). To obtain the values of these sample statistics, data are required that have particular distribution and central...
5.3K
Quantifying and Rejecting Outliers: The Grubbs Test01:02

Quantifying and Rejecting Outliers: The Grubbs Test

4.3K
Sometimes, a data set can have a recorded numerical observation that greatly  deviates from the rest of the data. Assuming that the data is normally distributed, a statistical method called the Grubbs test can be used to determine whether the observation is truly an outlier.  To perform a two-tailed Grubbs test, first, calculate the absolute difference between the outlier and the mean. Then, calculate the ratio between this difference and the standard deviation of the sample. This...
4.3K
Choosing Between z and t Distribution01:25

Choosing Between z and t Distribution

3.7K
The z and the Student t distribution estimate the population mean using the sample mean and standard deviation. However, to decide which distribution to use for a calculation, one needs to determine the sample size, the nature of the distribution, and whether the population standard deviation is known. If the population standard deviation is known and the population is normally distributed, or if the sample size is greater than 30, the z distribution is preferred. The Student t distribution is...
3.7K

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Sort by
Same author

Heterogeneity-aware Clustered Distributed Learning for Multi-source Data Analysis.

Journal of machine learning research : JMLR·2026
Same author

Medicare Insurance Type and Broad Genomic Profiling in Metastatic Cancer.

JAMA network open·2026
Same author

Doubly Robust Estimators of the Restricted Mean Time in Favor Estimands in Individual- and Cluster-Randomized Trials.

Statistics in medicine·2026
Same author

JOINT IDENTIFICATION OF SPATIALLY VARIABLE GENES VIA A NETWORK-ASSISTED BAYESIAN REGULARIZATION APPROACH.

The annals of applied statistics·2026
Same author

Subgroup Analysis of Differential Networks with Latent Variables.

Statistics and computing·2026
Same author

Robust Heterogeneity Adjustment for Gaussian Graphical Model With Latent Variables.

Statistics in medicine·2026
Same journal

Ensuring Quality in Preclinical Research: The Importance of Being Human.

Biometrical journal. Biometrische Zeitschrift·2026
Same journal

Addressing Cluster-Level Treatment Effect Heterogeneity in Sample Size Determination for Hierarchical 2 × 2 Factorial Designs.

Biometrical journal. Biometrische Zeitschrift·2026
Same journal

A Multiple Imputation Approach to Distinguish Curative From Life-Prolonging Effects in the Presence of Missing Covariates.

Biometrical journal. Biometrische Zeitschrift·2026
Same journal

Tests for Categorical Data Beyond Pearson: A Distance Covariance and Energy Distance Approach.

Biometrical journal. Biometrische Zeitschrift·2026
Same journal

Nonparametric Estimation of the Patient-Weighted While-Alive Estimand.

Biometrical journal. Biometrische Zeitschrift·2026
Same journal

Two-Stage Multiple Test Procedures Controlling False Discovery Rate With Auxiliary Variable and Their Application to Set4 <math><semantics><mi>Δ</mi> <annotation>$\Delta$</annotation></semantics></math> Mutant Data.

Biometrical journal. Biometrische Zeitschrift·2026
See all related articles

Related Experiment Video

Updated: Mar 11, 2026

An R-Based Landscape Validation of a Competing Risk Model
05:37

An R-Based Landscape Validation of a Competing Risk Model

Published on: September 16, 2022

2.7K

Analyzing large datasets with bootstrap penalization.

Kuangnan Fang1, Shuangge Ma1,2

  • 1Department of Statistics, Xiamen University, Xiamen, Fujian, China.

Biometrical Journal. Biometrische Zeitschrift
|November 22, 2016
PubMed
Summary
This summary is machine-generated.

Bootstrap penalization offers a computationally efficient solution for analyzing large datasets. This method breaks down complex penalized estimation into smaller, parallelizable tasks, reducing the need for high-performance computing resources.

Keywords:
BootstrapComputational feasibilityLarge datasetsPenalization

More Related Videos

A User-friendly and Powerful R Analysis of Large-scale Datasets
10:56

A User-friendly and Powerful R Analysis of Large-scale Datasets

Published on: November 4, 2025

438
A Machine Learning Approach to Design an Efficient Selective Screening of Mild Cognitive Impairment
12:18

A Machine Learning Approach to Design an Efficient Selective Screening of Mild Cognitive Impairment

Published on: January 11, 2020

8.2K

Related Experiment Videos

Last Updated: Mar 11, 2026

An R-Based Landscape Validation of a Competing Risk Model
05:37

An R-Based Landscape Validation of a Competing Risk Model

Published on: September 16, 2022

2.7K
A User-friendly and Powerful R Analysis of Large-scale Datasets
10:56

A User-friendly and Powerful R Analysis of Large-scale Datasets

Published on: November 4, 2025

438
A Machine Learning Approach to Design an Efficient Selective Screening of Mild Cognitive Impairment
12:18

A Machine Learning Approach to Design an Efficient Selective Screening of Mild Cognitive Impairment

Published on: January 11, 2020

8.2K

Area of Science:

  • Statistical computing
  • Machine learning
  • Data science

Background:

  • Large-scale datasets (high dimensions/sample sizes) are increasingly common.
  • Standard penalized estimation methods require significant computational resources.
  • Need for computationally feasible methods for big data analysis.

Purpose of the Study:

  • To develop a computationally efficient penalized estimation method for large datasets.
  • To introduce bootstrap penalization as an alternative to straightforward penalization.
  • To provide strategies tailored to different data characteristics (large p, large n, or both).

Main Methods:

  • Bootstrap penalization dissects large penalized estimation into smaller, parallelizable tasks.
  • For large p/small n data: covariate block clustering and sequential penalization.
  • For large n/small p data: subject bootstrapping and weighted averaging.
  • For large p and large n data: combination of the above strategies.

Main Results:

  • The proposed bootstrap penalization demonstrates computational and numerical advantages.
  • Effectively reduces computational burden compared to standard penalization.
  • Validated through simulations and real-world data analysis.

Conclusions:

  • Bootstrap penalization provides an efficient and scalable approach for penalized estimation on large datasets.
  • The method is adaptable to various data structures.
  • An R package is available for practical implementation.