Related Concept Videos

Sampling Plans

Sampling is a crucial step in analytical chemistry, allowing researchers to collect representative data from a large population. Common sampling methods include random, judgmental, systematic, stratified, and cluster sampling.
Random sampling is a method where each member of the population has an equal chance of being selected for the sample. It involves selecting individuals randomly, often using random number generators or lottery-type methods. For example, when analyzing the properties of a...

Cluster Sampling Method

Appropriate sampling methods ensure that samples are drawn without bias and accurately represent the population. Because measuring the entire population in a study is not practical, researchers use samples to represent the population of interest.
To choose a cluster sample, divide the population into clusters (groups) and then randomly select some of the clusters. All the members from these clusters are in the cluster sample. For example, if you randomly sample four departments from your...
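The two steps described above (partition the population into clusters, then randomly select whole clusters and keep every member) can be sketched in Python. This is a minimal illustration only: the function name `cluster_sample`, the department names, and the member lists are hypothetical and do not come from the source.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Return every member of `n_clusters` randomly chosen clusters.

    `clusters` maps a cluster name to the list of its members.
    """
    if n_clusters > len(clusters):
        raise ValueError("cannot select more clusters than exist")
    rng = random.Random(seed)
    # Step 1: randomly select cluster names without replacement.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Step 2: keep all members of each selected cluster.
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical population: six departments and their staff.
departments = {
    "Chemistry": ["Ana", "Ben", "Chen"],
    "Physics": ["Dee", "Eli"],
    "Biology": ["Fay", "Gus", "Hana", "Ivo"],
    "Statistics": ["Jun", "Kai"],
    "Geology": ["Lea"],
    "Ecology": ["Mio", "Nia", "Oto"],
}

sample = cluster_sample(departments, n_clusters=4, seed=1)
for name, members in sample.items():
    print(name, members)
```

The unit of randomization is the cluster, not the individual, so the number of people in the sample depends on which clusters happen to be drawn.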

Sample Size Calculation

Knowledge of the sample size is the first requirement to conduct random sampling or an experiment. The sample size is the total number of units, observations, or groups (in some cases) used to get the data to estimate a population parameter. As the name suggests, the sample size is that of the sample drawn from the population and differs from the population size.
The sample size for the given experiment or sampling effort is fundamental to any study design. Sample size decides the number of...

Systematic Sampling Method

Sampling is a technique to select a portion (or subset) of the larger population and study that portion (the sample) to gain information about the population. Data are the result of sampling from a population. The sampling method ensures that samples are drawn without bias and accurately represent the population. Because measuring the entire population in a study is not practical, researchers use samples to represent the population of interest.
Systematic sampling is one of the simplest methods...

One-Way ANOVA: Equal Sample Sizes

One-Way ANOVA can be performed on three or more samples with equal or unequal sample sizes. When one-way ANOVA is performed on two datasets with samples of equal sizes, it can be easily observed that the computed F statistic is highly sensitive to the sample mean.
Different sample means can result in different values for the variance estimate: variance between samples. This is because the variance between samples is calculated as the product of the sample size and the variance between the...

One-Way ANOVA: Unequal Sample Sizes

One-way ANOVA can be performed on three or more samples of unequal sizes. However, the calculations become more complicated when the sample sizes differ. When performing ANOVA with unequal sample sizes, the following equation is used:


Related Experiment Video

Updated: Jun 25, 2026

Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances

Published on: October 11, 2018

A simulation-approximation approach to sample size planning for high-dimensional classification studies.

Perry de Valpine¹, Hans-Marcus Bitter, Michael P S Brown

¹Department of Environmental Science, Policy, & Management, University of California, 137 Hilgard Hall No. 3114, Berkeley, CA 94720-3114, USA. pdevalpine@berkeley.edu

Biostatistics (Oxford, England)·February 24, 2009

Summary

This summary is machine-generated.

This study introduces a new method for estimating generalization error in high-dimensional classification. The findings suggest that many realistic study designs may yield suboptimal pattern estimates and low statistical power for validation.

More Related Videos

Sampling Soils in a Heterogeneous Research Plot

Published on: January 7, 2019


Area of Science:

Statistics
Machine Learning
Bioinformatics

Background:

High-dimensional classification studies with limited sample sizes are common.
Assessing the impact of sample size on study performance is crucial but challenging due to complex pattern discovery methods.

Purpose of the Study:

To develop an efficient method for estimating generalization error in high-dimensional classification.
To investigate how study design parameters influence classification performance and validation results.

Main Methods:

Combines Monte Carlo methods with novel approximations for linear discriminant analysis under multivariate normal distributions.
Compares a Taylor series approximation of the generalization error with a normal distribution approximation of the discriminant scores.
Uses full simulations to evaluate the developed method across a range of realistic study design scenarios.
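
The general idea of pairing Monte Carlo draws of training sets with an analytic error calculation can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm or either of their approximations. It assumes two equally likely classes, equal class sizes, independent features with known unit variance (so the linear discriminant reduces to a nearest-centroid rule), and a fixed mean shift `delta` on the informative features. The function name `expected_error` and all parameter names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def expected_error(n_per_class, n_features, n_informative, delta,
                   n_reps=200, seed=0):
    """Monte Carlo estimate of the expected generalization error.

    Each replicate draws one training set, builds the discriminant
    w.x - c from the sample centroids, and computes that rule's
    error rate exactly under the assumed normal model.
    """
    rng = random.Random(seed)
    phi = NormalDist().cdf
    # True class means: class 1 is shifted on the informative features.
    mu0 = [0.0] * n_features
    mu1 = [delta if j < n_informative else 0.0 for j in range(n_features)]
    # The centroid of n i.i.d. N(mu, I) draws is N(mu, I/n), so the
    # centroids are drawn directly instead of simulating each sample.
    sd = 1.0 / math.sqrt(n_per_class)
    total = 0.0
    for _ in range(n_reps):
        m0 = [rng.gauss(mu, sd) for mu in mu0]
        m1 = [rng.gauss(mu, sd) for mu in mu1]
        w = [b - a for a, b in zip(m0, m1)]
        c = sum(wj * (a + b) / 2 for wj, a, b in zip(w, m0, m1))
        norm = math.sqrt(sum(wj * wj for wj in w))
        # For a new observation, the score w.x - c is normal with
        # standard deviation `norm`; class 1 is predicted when it is > 0.
        s0 = sum(wj * mu for wj, mu in zip(w, mu0)) - c
        s1 = sum(wj * mu for wj, mu in zip(w, mu1)) - c
        err0 = phi(s0 / norm)    # P(score > 0 | class 0)
        err1 = phi(-s1 / norm)   # P(score <= 0 | class 1)
        total += 0.5 * (err0 + err1)
    return total / n_reps


# Same 5 informative features among 200, two training-set sizes.
small = expected_error(n_per_class=5, n_features=200, n_informative=5, delta=1.0)
large = expected_error(n_per_class=50, n_features=200, n_informative=5, delta=1.0)
print(f"n=5 per class:  {small:.3f}")
print(f"n=50 per class: {large:.3f}")
```

Because the error of each trained rule is computed in closed form, no test set needs to be simulated; only the training-set variability is handled by Monte Carlo. Under these assumptions the estimate approaches the Bayes error as the training set grows, and it degrades as uninformative features are added at a fixed sample size.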

Main Results:

The combined Monte Carlo and approximation approach efficiently estimates expected generalization error.
Both approximation methods generally performed well, but the normal approximation of discriminant scores performed better when many features were uninformative.
Simulations revealed that many realistic study designs may lead to suboptimal pattern estimation and low statistical validation power.

Conclusions:

The developed method provides an efficient way to estimate generalization error in high-dimensional classification.
Study design choices strongly affect classification performance, and realistic designs can yield suboptimal pattern estimates and low statistical power for validation.
Careful consideration of sample size, feature informativeness, and feature selection is critical for robust high-dimensional classification studies.