



Big Data, Small Sample.

Inna Gerlovina, Mark J van der Laan, Alan Hubbard

    The International Journal of Biostatistics | June 11, 2017
    Summary
    This summary is machine-generated.

    Genomic studies with many tests and small sample sizes often have unreliable results. New methods using Edgeworth expansions can improve error control and reduce false positives in big data analysis.

    Keywords:
    finite sample inference, hypothesis testing, multiple comparisons


    Area of Science:

    • Statistics
    • Genomics
    • Bioinformatics

    Background:

    • Big Data studies, especially in genomics, face challenges with multiple comparisons and small sample sizes, impacting inference reliability.
    • Current multiple testing procedures require very small tail probabilities, a condition rarely met in practice.
    • Existing methods like permutation tests may not adequately control error rates in these scenarios.
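To make the "very small tail probabilities" point concrete, here is a minimal sketch of how the per-test threshold shrinks as the number of tests grows. It assumes a Bonferroni correction and a one-sided test against a standard normal reference distribution; neither choice is taken from the paper, and the function names are illustrative only.

```python
from statistics import NormalDist


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test level that keeps the family-wise error rate at alpha (Bonferroni, assumed here)."""
    return alpha / n_tests


def one_sided_critical_value(p: float) -> float:
    """Standard normal quantile whose upper-tail probability is p."""
    return NormalDist().inv_cdf(1.0 - p)


# Show how far into the tail the test statistic must fall as the number of tests grows.
for m in (1, 100, 10_000, 1_000_000):
    p = bonferroni_threshold(0.05, m)
    z = one_sided_critical_value(p)
    print(f"{m:>9} tests: per-test p = {p:.1e}, critical z = {z:.2f}")
```

With tens of thousands of tests the decision depends on how well the reference distribution matches the true sampling distribution several standard deviations from the center, which is where small-sample approximations tend to be weakest.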

    Purpose of the Study:

    • To investigate the impact of non-standard sampling distributions on error rates in multiple testing.
    • To assess the reliability of statistical inference in genomic studies with large numbers of tests and limited sample sizes.
    • To explore advanced statistical methods for improving error rate control.

    Main Methods:

    • Utilizing Edgeworth expansions to approximate sampling distributions beyond typical assumptions.
    • Analyzing departures from standard distributional assumptions and their effect on actual error rates.
    • Reviewing commonly used methods like permutation tests and finite sampling inequalities.
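As an illustration of what a higher-order approximation looks like, the sketch below applies the standard one-term Edgeworth correction to the upper tail of a standardized sample mean. It is a simplified stand-in, not the authors' procedure: it assumes i.i.d. data with known variance, uses Exponential(1) data (skewness 2) because the exact tail is then available in closed form from the gamma distribution, and all function names are hypothetical.

```python
import math


def normal_sf(x: float) -> float:
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def normal_pdf(x: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def edgeworth_sf(x: float, n: int, skewness: float) -> float:
    """One-term Edgeworth upper tail for the standardized mean of n i.i.d. draws."""
    correction = skewness * (x * x - 1.0) / (6.0 * math.sqrt(n)) * normal_pdf(x)
    return normal_sf(x) + correction


def exact_sf_exponential(x: float, n: int) -> float:
    """Exact upper tail of the standardized mean of n Exponential(1) draws."""
    t = n + x * math.sqrt(n)  # equivalent threshold on the sum, which is Gamma(n, 1)
    return math.exp(-t) * sum(t**k / math.factorial(k) for k in range(n))


n, x = 10, 3.0
print("normal approximation:", normal_sf(x))
print("one-term Edgeworth:  ", edgeworth_sf(x, n, skewness=2.0))
print("exact (gamma):       ", exact_sf_exponential(x, n))
```

In this toy setting the skewness term moves the approximation toward the exact tail probability, though a single correction term does not close the gap entirely at n = 10.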

    Main Results:

    • Actual error rates can significantly deviate from nominal levels, leading to excessive false positives.
    • The condition for error rate control based on large deviation theory is frequently not satisfied.
    • Edgeworth expansions demonstrate potential for higher-order approximations to sampling distributions.
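The gap between nominal and actual error rates can be sketched with a small exact calculation. The example assumes a one-sided z-test on the mean of n Exponential(1) observations with known variance, and a Bonferroni-adjusted normal critical value; these are illustrative assumptions rather than the settings analyzed in the paper, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def exact_tail_exponential(z: float, n: int) -> float:
    """Exact P(standardized mean of n Exponential(1) draws > z), via the Gamma(n, 1) sum."""
    t = n + z * math.sqrt(n)
    return math.exp(-t) * sum(t**k / math.factorial(k) for k in range(n))


def error_inflation(alpha: float, n_tests: int, n: int) -> float:
    """Ratio of the actual per-test error rate to the nominal Bonferroni level."""
    p_nominal = alpha / n_tests
    z_crit = NormalDist().inv_cdf(1.0 - p_nominal)  # critical value if the statistic were normal
    return exact_tail_exponential(z_crit, n) / p_nominal


# Inflation grows as the nominal level is pushed further into the tail.
for m in (1, 1_000, 100_000):
    print(f"{m:>7} tests, n = 10: actual / nominal = {error_inflation(0.05, m, 10):.1f}")
```

Under these assumptions the mismatch is modest for a single test and becomes much larger once the threshold is adjusted for many tests, which is the pattern the result above describes.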

    Conclusions:

    • Widespread problems with error rate control, particularly inflated false positive rates, may exist in Big Data and genomic studies.
    • This lack of reliability contributes to the "reproducibility crisis" in science.
    • Edgeworth expansions offer a promising approach to enhance the reliability of studies with numerous comparisons and modest sample sizes.