Sample size determination for training set optimization in genomic prediction.

Po-Ya Wu¹,², Jen-Hsiang Ou¹,³, Chen-Tuo Liao⁴

¹Department of Agronomy, National Taiwan University, Taipei, Taiwan.

TAG. Theoretical and Applied Genetics. Theoretische Und Angewandte Genetik

March 13, 2023

Summary

This summary is machine-generated.

Determining the optimal training set size is crucial for genomic prediction (GP) studies. This research uses logistic growth curves to find a cost-effective training set size for selective phenotyping, helping breeders select genotypes economically.

Area of Science:

Quantitative genetics
Animal and plant breeding
Statistical genomics

Background:

Genomic prediction (GP) utilizes genomic estimated breeding values (GEBVs) for trait selection in breeding programs.
Establishing an optimal training set size for GP models is critical but often unresolved.
Current practices often overlook cost-effectiveness and resource constraints in training set determination.

Purpose of the Study:

To develop a practical and cost-effective approach for determining the optimal training set size in genomic prediction studies.
To provide a method for optimizing selective phenotyping strategies.
To facilitate the efficient selection of genotypes with economical sample sizes.

Main Methods:

Applied logistic growth curve analysis to model prediction accuracy in relation to training set size.
Utilized three real genome datasets to validate the proposed approach.
Developed an R function to enable widespread application of the sample size determination method.
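
To make the curve-fitting idea concrete, here is a minimal Python sketch of fitting a logistic growth curve to pairs of (training set size, prediction accuracy). It is not the authors' R function. The three-parameter form accuracy(n) = K / (1 + exp(-r(n - n0))), the grid-search fitting strategy, and all names (logistic, fit_logistic, K, r, n0) are assumptions made for illustration, and the demo points are synthetic, not data from the study.

```python
import math


def logistic(n, K, r, n0):
    """Assumed three-parameter logistic curve: accuracy at training set size n."""
    return K / (1.0 + math.exp(-r * (n - n0)))


def fit_logistic(sizes, accuracies, k_steps=400):
    """Estimate (K, r, n0) from observed (size, accuracy) pairs.

    For a fixed asymptote K, log(y / (K - y)) = r * n - r * n0 is linear in n,
    so r and n0 come from ordinary least squares. K itself is chosen by a grid
    search that minimizes the squared error on the original accuracy scale.
    """
    if len(sizes) != len(accuracies) or len(sizes) < 3:
        raise ValueError("need at least three (size, accuracy) pairs")
    if any(not 0.0 < y < 1.0 for y in accuracies):
        raise ValueError("accuracies must lie strictly between 0 and 1")
    mean_n = sum(sizes) / len(sizes)
    sxx = sum((n - mean_n) ** 2 for n in sizes)
    if sxx == 0:
        raise ValueError("training set sizes must not all be equal")

    y_max = max(accuracies)
    best = None
    for i in range(1, k_steps + 1):
        K = y_max + i * (1.0 - y_max) / k_steps  # candidate asymptote in (y_max, 1]
        z = [math.log(y / (K - y)) for y in accuracies]
        mean_z = sum(z) / len(z)
        slope = sum((n - mean_n) * (zi - mean_z) for n, zi in zip(sizes, z)) / sxx
        if slope <= 0:
            continue  # accuracy must grow with n for a growth curve
        intercept = mean_z - slope * mean_n
        r, n0 = slope, -intercept / slope
        sse = sum((y - logistic(n, K, r, n0)) ** 2 for n, y in zip(sizes, accuracies))
        if best is None or sse < best[0]:
            best = (sse, K, r, n0)
    if best is None:
        raise ValueError("no increasing logistic curve fits these points")
    return best[1], best[2], best[3]


if __name__ == "__main__":
    # Synthetic points generated from a known curve, purely for demonstration.
    demo_sizes = [25, 50, 100, 150, 200, 300]
    demo_acc = [logistic(n, 0.8, 0.02, 50) for n in demo_sizes]
    print(fit_logistic(demo_sizes, demo_acc))
```

The grid search keeps the sketch dependency-free; with real, noisy accuracies a nonlinear least-squares routine would likely be a better choice.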

Main Results:

A method was established to identify a cost-effective optimal training set size for genomic prediction.
The approach effectively balances prediction accuracy with the economic constraints of phenotyping.
Demonstrated the utility of the method across diverse genomic datasets.
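
One way to picture the trade-off between accuracy and phenotyping cost is to ask how large the training set must be before a fitted curve reaches a chosen share of its plateau. The Python sketch below does this for an assumed logistic curve accuracy(n) = K / (1 + exp(-r(n - n0))). The 95% threshold, the per-genotype cost, and the names size_for_fraction and phenotyping_cost are hypothetical choices for illustration; the summary does not state the criterion the authors actually use.

```python
import math


def logistic(n, K, r, n0):
    """Assumed three-parameter logistic curve: accuracy at training set size n."""
    return K / (1.0 + math.exp(-r * (n - n0)))


def size_for_fraction(r, n0, fraction=0.95):
    """Smallest integer n at which the curve reaches `fraction` of its asymptote.

    Solving K / (1 + exp(-r * (n - n0))) = fraction * K for n gives
    n = n0 - ln(1 / fraction - 1) / r; the asymptote K cancels out.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    if r <= 0:
        raise ValueError("growth rate r must be positive")
    n = n0 - math.log(1.0 / fraction - 1.0) / r
    return max(0, math.ceil(n))


def phenotyping_cost(n, cost_per_genotype):
    """Total cost of phenotyping n genotypes at a flat (hypothetical) unit cost."""
    return n * cost_per_genotype


if __name__ == "__main__":
    # Hypothetical curve parameters and unit cost, chosen only for illustration.
    K, r, n0 = 0.8, 0.02, 50
    n_star = size_for_fraction(r, n0, fraction=0.95)
    print(n_star, logistic(n_star, K, r, n0), phenotyping_cost(n_star, 10.0))
```

Raising the fraction pushes the required size up sharply near the plateau, which is why a threshold below 100% of the asymptote can save substantial phenotyping effort for a small loss in accuracy.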

Conclusions:

The developed approach provides a practical solution for optimizing training set sizes in genomic prediction.
The accompanying R function simplifies the implementation for breeders seeking economical phenotyping strategies.
This facilitates more efficient and cost-effective breeding programs through informed genotype selection.