Correction for population stratification in random forest analysis.

Yang Zhao¹, Feng Chen, Rihong Zhai

  • ¹Environmental and Occupational Medicine and Epidemiology Program, Department of Environmental Health, Harvard School of Public Health, Harvard University, Boston, MA, USA.

International Journal of Epidemiology | November 14, 2012
Summary
This summary is machine-generated.

This study introduces a method to correct for population structure in random forest analysis of genome-wide association study (GWAS) data. In simulations, the correction raises the importance measures of causal SNPs; in a real dataset, it removes a spurious association, improving the accuracy of random forest based GWAS.

Area of Science:

  • Genetics
  • Bioinformatics
  • Statistical Genomics

Background:

  • Population structure (PS) confounds genome-wide association studies (GWAS), potentially causing spurious associations.
  • Random forest (RF) is useful for high-dimensional genetic data analysis in GWAS, providing SNP importance measures for feature selection.
  • Left uncorrected, PS can inflate the RF importance measures of SNPs that are unrelated to the phenotype, leading to inaccurate results.
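
The confounding described above can be illustrated with a small simulation. This is a minimal sketch, not taken from the study: the two subpopulations, their allele frequencies (0.1 and 0.7), and their phenotype means (0.0 and 1.0) are hypothetical values chosen for illustration. The SNP has no effect on the phenotype inside either subpopulation, yet pooling the two groups produces a clear correlation.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def simulate(n_per_pop=2000, seed=0):
    """Simulate a non-causal SNP and a phenotype in two subpopulations."""
    rng = random.Random(seed)
    pops, snp, pheno = [], [], []
    # Hypothetical (allele frequency, phenotype mean) for each subpopulation.
    for pop, (freq, mean) in enumerate([(0.1, 0.0), (0.7, 1.0)]):
        for _ in range(n_per_pop):
            # Genotype coded as the number of minor alleles (0, 1, or 2).
            g = sum(rng.random() < freq for _ in range(2))
            # Phenotype depends only on subpopulation, never on the genotype.
            y = rng.gauss(mean, 1.0)
            pops.append(pop)
            snp.append(g)
            pheno.append(y)
    return pops, snp, pheno


pops, snp, pheno = simulate()

# Correlation when the subpopulations are pooled (structure ignored).
pooled = pearson(snp, pheno)

# Correlation inside each subpopulation (structure accounted for).
within = [
    pearson(
        [g for p, g in zip(pops, snp) if p == k],
        [y for p, y in zip(pops, pheno) if p == k],
    )
    for k in (0, 1)
]

print(f"pooled correlation: {pooled:.3f}")
print(f"within-population correlations: {within[0]:.3f}, {within[1]:.3f}")
```

The pooled correlation arises only because allele frequency and phenotype mean both differ between the groups, which is the kind of spurious signal a structure-unaware analysis can pick up.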

Purpose of the Study:

  • To propose and evaluate a method for correcting the confounding effect of population structure in RF analysis for GWAS data.
  • To enhance the accuracy of SNP importance measures and reduce spurious findings in GWAS.

Main Methods:

  • Population structure information is extracted using EIGENSTRAT or multidimensional scaling (MDS).
  • Both the phenotype and the genotype data are adjusted for population structure.
  • The adjusted phenotype is used as the outcome, and the adjusted genotypes as the predictors, in the random forest analysis.
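
The three steps above might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: plain PCA (via SVD) stands in for EIGENSTRAT, "adjusted" is assumed to mean taking residuals from a linear regression on the top components, and the function names `top_pcs`, `residualize`, and `rf_importance_adjusted`, the toy data, and the choice of `k=2` components are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def top_pcs(G, k):
    """Top-k principal component scores of a (samples x SNPs) matrix.

    Plain PCA is used here as a stand-in for EIGENSTRAT (assumption).
    """
    Gc = G - G.mean(axis=0)
    U, S, _ = np.linalg.svd(Gc, full_matrices=False)
    return U[:, :k] * S[:k]


def residualize(A, pcs):
    """Remove the part of A (vector or matrix) linearly explained by the PCs."""
    X = np.column_stack([np.ones(len(pcs)), pcs])
    beta, *_ = np.linalg.lstsq(X, A, rcond=None)
    return A - X @ beta


def rf_importance_adjusted(G, y, k=2, seed=0):
    """Adjust genotypes and phenotype for structure, then fit a random forest."""
    pcs = top_pcs(G, k)
    G_adj = residualize(G.astype(float), pcs)  # adjusted predictors
    y_adj = residualize(y.astype(float), pcs)  # adjusted outcome
    rf = RandomForestRegressor(n_estimators=200, random_state=seed)
    rf.fit(G_adj, y_adj)
    return rf.feature_importances_, G_adj, y_adj, pcs


# Toy data (hypothetical): two subpopulations with different allele frequencies.
rng = np.random.default_rng(0)
pop = np.repeat([0, 1], 100)
freqs = np.where(
    pop[:, None] == 0,
    rng.uniform(0.1, 0.4, 20),
    rng.uniform(0.5, 0.9, 20),
)
G = rng.binomial(2, freqs)  # 200 samples x 20 SNPs, coded 0/1/2
# SNP 0 is causal; the phenotype mean also shifts with subpopulation.
y = 0.5 * G[:, 0] + pop + rng.normal(size=200)

importances, G_adj, y_adj, pcs = rf_importance_adjusted(G, y)
print(importances.round(3))
```

Because the adjusted outcome is continuous, the sketch uses a regression forest. After residualization, neither the predictors nor the outcome carry any linear trace of the leading components, so the forest cannot rank a SNP highly merely because it tracks ancestry along those components.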

Main Results:

  • Simulations show increased importance measures for causal SNPs after PS correction.
  • The proposed method successfully removed a spurious association between the lactase gene and height in a real dataset.
  • Overall, the correction improved accuracy in identifying true genetic associations.

Conclusions:

  • A straightforward method is presented to correct for population structure in RF-based GWAS.
  • The approach demonstrates potential for improving the reliability of GWAS findings.
  • Further validation on diverse GWAS datasets is recommended to confirm the robustness of the method.