Sample size requirements for training high-dimensional risk predictors.

Kevin K Dobbin¹, Xiao Song

¹College of Public Health, University of Georgia, 101 Buck Road, Athens, GA 30602, USA.

Biostatistics (Oxford, England)

July 23, 2013

Summary

This summary is machine-generated.

Developing accurate patient survival predictors requires determining the optimal number of samples. This study introduces a new non-parametric sample size method for survival data, improving study design for biomarker discovery.

Keywords:

Conditional score; Cox regression; High-dimensional data; Risk prediction; Sample size; Training set

Area of Science:

Biostatistics
Bioinformatics
Medical Informatics

Background:

Biomarker studies commonly aim to predict patient survival outcomes.
Accurate sample size determination is crucial for designing effective biomarker studies.
Existing methods for sample size calculation in survival analysis are limited by their parametric assumptions and by an inability to handle right-censored data.

Purpose of the Study:

To develop a novel, non-parametric sample size calculation method for training survival predictors.
To address the limitations of existing methods in handling high-dimensional data and right-censored outcomes.
To ensure the expected performance of a trained predictor is within a specified tolerance of optimal performance.
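
One way to write the tolerance criterion in the last point is sketched below. The notation is an assumption introduced here for illustration, not taken from the paper: $\hat{f}_n$ is a predictor trained on $n$ samples, $f_{\mathrm{opt}}$ is the optimal predictor, $\mathrm{Perf}(\cdot)$ is any chosen performance measure, and $\tau > 0$ is the user-specified tolerance.

```latex
n^{*} \;=\; \min\Bigl\{\, n \;:\; \bigl|\,\mathrm{Perf}(f_{\mathrm{opt}}) - \mathrm{E}\bigl[\mathrm{Perf}(\hat{f}_n)\bigr]\,\bigr| \le \tau \Bigr\}
```

Under this reading, the required sample size $n^{*}$ is the smallest training set size at which the expected performance of the trained predictor falls within $\tau$ of the optimum.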

Main Methods:

A non-parametric approach for sample size determination applicable to various prediction algorithms.
Utilizing a pilot dataset to determine the required sample size.
Developing a method for constructing confidence intervals to quantify uncertainty in performance tolerance.
Presenting an alternative model-based method, which uses a specified covariance matrix, for sample size estimation when pilot data are insufficient.
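
The pilot-data idea above can be illustrated with a generic learning-curve sketch. This is not the procedure from the paper: it swaps in a plain resampling learning curve, a ridge-penalised Cox fit by gradient ascent, and Harrell's C-index, all chosen here for illustration. The function names, the synthetic data (identity covariance, exponential event and censoring times), and the tolerance of 0.05 are assumptions. The reference performance uses the simulation's true coefficients, which are available only in a simulation.

```python
import numpy as np

def simulate_pilot(n, p, beta, rng, censor_rate=0.3):
    """Synthetic right-censored data: X ~ N(0, I_p), exponential event times."""
    X = rng.standard_normal((n, p))
    event_time = rng.exponential(1.0 / np.exp(X @ beta))
    censor_time = rng.exponential(1.0 / censor_rate, size=n)
    return X, np.minimum(event_time, censor_time), event_time <= censor_time

def concordance_index(time, event, risk):
    """Harrell's C: share of usable pairs in which the higher risk fails first."""
    concordant, usable = 0.0, 0
    for i in np.flatnonzero(event):
        later = time > time[i]  # subjects known to outlive subject i
        usable += later.sum()
        concordant += (risk[i] > risk[later]).sum() + 0.5 * (risk[i] == risk[later]).sum()
    return concordant / usable  # undefined if there are no usable pairs

def fit_ridge_cox(X, time, event, penalty=1.0, steps=300, lr=0.05):
    """Ridge-penalised Cox fit by plain gradient ascent (illustrative only)."""
    order = np.argsort(-time)  # descending time, so each risk set is a prefix
    X, event = X[order], event[order]
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        eta = X @ beta
        w = np.exp(eta - eta.max())  # shift for numerical stability
        cum_w = np.cumsum(w)
        cum_wx = np.cumsum(w[:, None] * X, axis=0)
        grad = (X[event] - cum_wx[event] / cum_w[event][:, None]).sum(axis=0)
        beta += lr * (grad / len(time) - penalty * beta)
    return beta

def learning_curve(X, time, event, sizes, n_rep, rng):
    """Mean held-out C-index when training on random subsamples of each size."""
    means = []
    for m in sizes:
        scores = []
        for _ in range(n_rep):
            perm = rng.permutation(len(time))
            train, test = perm[:m], perm[m:]
            beta = fit_ridge_cox(X[train], time[train], event[train])
            scores.append(concordance_index(time[test], event[test], X[test] @ beta))
        means.append(np.mean(scores))
    return np.array(means)

def smallest_size_within_tolerance(sizes, perf, optimal, tol):
    """First size whose mean performance is within tol of optimal, else None."""
    for m, c in zip(sizes, perf):
        if optimal - c <= tol:
            return m
    return None

rng = np.random.default_rng(0)
p = 50
beta_true = np.zeros(p)
beta_true[:5] = 0.8  # hypothetical signal: 5 informative features out of 50
X, time, event = simulate_pilot(300, p, beta_true, rng)
sizes = [50, 100, 150, 200]
perf = learning_curve(X, time, event, sizes, n_rep=10, rng=rng)
optimal = concordance_index(time, event, X @ beta_true)  # known only in simulation
n_needed = smallest_size_within_tolerance(sizes, perf, optimal, tol=0.05)
print(sizes, np.round(perf, 3), round(float(optimal), 3), n_needed)
```

If no subsample size reaches the tolerance, the function returns None, which would suggest that the pilot set is too small to read the required sample size off the curve directly and that some form of extrapolation or a model-based calculation is needed.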

Main Results:

The proposed method is non-parametric with respect to the high-dimensional vectors and handles right-censored survival data.
Sample size is determined to achieve a predictor performance within a user-defined tolerance of optimal.
A confidence interval method is developed to assess the uncertainty of the performance tolerance.
In the model-based approach, the identity covariance matrix is shown to give an adequate sample size under specific user-defined quantities.
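
To give a feel for how uncertainty in a performance tolerance might be quantified, the sketch below computes a percentile bootstrap interval for the mean gap between optimal and achieved performance. It is a generic stand-in, not the confidence interval method developed in the paper; the function name and the gap values are made up for illustration.

```python
import numpy as np

def bootstrap_percentile_ci(values, statistic=np.mean, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for a statistic of a one-dimensional sample."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    stats = np.array([statistic(values[rng.integers(0, n, size=n)])
                      for _ in range(n_boot)])
    alpha = 1.0 - level
    return np.quantile(stats, alpha / 2), np.quantile(stats, 1 - alpha / 2)

# Hypothetical performance gaps (optimal minus achieved), made up for illustration.
gaps = np.array([0.06, 0.04, 0.09, 0.05, 0.07, 0.03, 0.08, 0.05, 0.06, 0.04])
lo, hi = bootstrap_percentile_ci(gaps)
print(f"95% percentile interval for the mean gap: [{lo:.3f}, {hi:.3f}]")
```

An interval whose upper end sits below the chosen tolerance would support the planned sample size; an interval that straddles the tolerance would argue for more pilot data or a more conservative design.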

Conclusions:

The presented non-parametric sample size method offers a robust approach for designing biomarker studies predicting survival outcomes.
The method accommodates right-censored data and is adaptable to different prediction algorithms.
Both pilot dataset-based and model-based approaches are provided to estimate sample size requirements, enhancing study design flexibility.