
Big Data, Small Sample.

Inna Gerlovina, Mark J van der Laan, Alan Hubbard

The International Journal of Biostatistics

June 11, 2017

Summary

This summary is machine-generated.

Genomic studies with many tests and small sample sizes often have unreliable results. New methods using Edgeworth expansions can improve error control and reduce false positives in big data analysis.

Keywords:

finite sample inference, hypothesis testing, multiple comparisons

Area of Science:

Statistics
Genomics
Bioinformatics

Background:

Big Data studies, especially in genomics, face challenges with multiple comparisons and small sample sizes, impacting inference reliability.
Multiple testing procedures depend on very small tail probabilities of the test statistic's distribution, and the conditions needed to approximate those probabilities accurately are rarely met in practice.
Existing methods like permutation tests may not adequately control error rates in these scenarios.
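
To make "very small tail probabilities" concrete, here is a minimal sketch that assumes a Bonferroni correction. Bonferroni is one common multiple testing procedure and is not named in the summary; the family-wise level of 0.05 and the test counts are hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def bonferroni_per_test_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    return family_alpha / n_tests


def two_sided_z_cutoff(per_test_alpha: float) -> float:
    """Standard normal cutoff c such that P(|Z| > c) = per_test_alpha."""
    # Use the lower tail and flip the sign to avoid computing 1 - tiny number.
    return -NormalDist().inv_cdf(per_test_alpha / 2.0)


if __name__ == "__main__":
    family_alpha = 0.05  # assumed family-wise error rate
    for n_tests in (1, 100, 20_000, 1_000_000):  # hypothetical test counts
        alpha = bonferroni_per_test_alpha(family_alpha, n_tests)
        cutoff = two_sided_z_cutoff(alpha)
        print(f"{n_tests:>9} tests: per-test alpha = {alpha:.2e}, "
              f"|z| cutoff = {cutoff:.2f}")
```

As the number of tests grows, the cutoff moves far into the tail of the reference distribution, which is where approximations based on small samples tend to be least accurate.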

Purpose of the Study:

To investigate the impact of non-standard sampling distributions on error rates in multiple testing.
To assess the reliability of statistical inference in genomic studies with large numbers of tests and limited sample sizes.
To explore advanced statistical methods for improving error rate control.

Main Methods:

Utilizing Edgeworth expansions to approximate sampling distributions beyond typical assumptions.
Analyzing departures from standard distributional assumptions and their effect on actual error rates.
Reviewing commonly used methods like permutation tests and finite sampling inequalities.
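
As a rough illustration of what an Edgeworth expansion does, the sketch below applies the standard one-term correction to the normal approximation for a standardized sample mean with known variance. This is a simplification and not the study's procedure: the choice of Exp(1) data, the sample size of 10, and the function names are all assumptions made for the example. Exponential data are used only because the exact distribution of their sum (Erlang) is available in closed form for comparison.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def edgeworth_cdf(x: float, n: int, skewness: float) -> float:
    """One-term Edgeworth approximation to P(S_n <= x), where S_n is the
    standardized mean of n i.i.d. observations with the given skewness."""
    correction = skewness * (x * x - 1.0) / (6.0 * math.sqrt(n))
    return STD_NORMAL.cdf(x) - STD_NORMAL.pdf(x) * correction


def exact_cdf_exponential(x: float, n: int) -> float:
    """Exact P(S_n <= x) for Exp(1) data: the sum of n draws is Erlang(n, 1)."""
    t = n + x * math.sqrt(n)  # threshold on the raw (unstandardized) sum
    if t <= 0.0:
        return 0.0
    term, partial_sum = 1.0, 0.0
    for k in range(n):  # accumulates sum over k < n of t**k / k!
        partial_sum += term
        term *= t / (k + 1)
    return 1.0 - math.exp(-t) * partial_sum


if __name__ == "__main__":
    n, x = 10, 2.0        # assumed sample size and evaluation point
    exp_skewness = 2.0    # skewness of the Exp(1) distribution
    print("upper tail, normal:   ", 1.0 - STD_NORMAL.cdf(x))
    print("upper tail, Edgeworth:", 1.0 - edgeworth_cdf(x, n, exp_skewness))
    print("upper tail, exact:    ", 1.0 - exact_cdf_exponential(x, n))
```

In this setting the corrected value lands closer to the exact tail probability than the plain normal approximation does. Expansions for studentized statistics such as the t-statistic use different correction terms.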

Main Results:

Actual error rates can significantly deviate from nominal levels, leading to excessive false positives.
The condition for error rate control based on large deviation theory is frequently not satisfied.
Edgeworth expansions demonstrate potential for higher-order approximations to sampling distributions.
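
The gap between nominal and actual error rates can be shown with a small exact calculation. This is a sketch under assumed conditions rather than a reproduction of the study's results: it uses a one-sided z-test with known variance on Exp(1) data, so that a normal cutoff is applied to a statistic whose true distribution is skewed. The sample sizes and nominal levels are hypothetical.

```python
import math
from statistics import NormalDist


def actual_upper_tail(nominal_alpha: float, n: int) -> float:
    """Exact P(S_n > z) for the standardized mean S_n of n Exp(1) draws,
    where z is the one-sided normal cutoff for nominal_alpha (< 0.5).

    Intended for small n only: the naive series can overflow for large n.
    """
    z = -NormalDist().inv_cdf(nominal_alpha)
    t = n + z * math.sqrt(n)  # cutoff expressed on the raw sum
    term, partial_sum = 1.0, 0.0
    for k in range(n):  # Erlang upper tail: e^{-t} * sum over k < n of t^k/k!
        partial_sum += term
        term *= t / (k + 1)
    return math.exp(-t) * partial_sum


if __name__ == "__main__":
    for n in (10, 30):  # assumed small sample sizes
        for alpha in (0.05, 1e-3, 1e-6):  # assumed nominal levels
            actual = actual_upper_tail(alpha, n)
            print(f"n = {n:>2}, nominal = {alpha:.0e}, actual = {actual:.2e}, "
                  f"ratio = {actual / alpha:.1f}")
```

Under these assumptions the ratio of actual to nominal error rate grows as the nominal level shrinks, which is the regime that corrections for many tests push an analysis into.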

Conclusions:

Widespread problems with error rate control, particularly inflated false positive rates, may exist in Big Data and genomic studies.
This lack of reliability contributes to the "reproducibility crisis" in science.
Edgeworth expansions offer a promising approach to enhance the reliability of studies with numerous comparisons and modest sample sizes.