Related Concept Videos

Chi-square Analysis


The chi-square test is a statistical hypothesis test. It is used to check whether observed values differ significantly from expected values. In the context of genetics, it lets us reject or fail to reject a hypothesis, based on how much the observed values deviate from the expected values.
The chi-square test was developed by Karl Pearson in 1900.
The first step of performing a Chi-square analysis is to establish a null hypothesis, which assumes that there is no real...

Finding Critical Values for Chi-Square


Consider a curve representing sample data drawn randomly from a normally distributed population. One must construct confidence intervals to estimate or to test a claim regarding the population standard deviation. For example, a 95% confidence interval covers 95% of the area under the curve, and the remaining 5% is split equally between the two tails. To construct such confidence intervals, one must determine the critical values. The critical values are simply the values separating the...

Chi-square Distribution


How does one determine whether bingo numbers are evenly distributed or whether some numbers occur with greater frequency? Or whether the types of movies people prefer differ across age groups, or whether a coffee machine dispenses approximately the same amount of coffee each time? These questions can be addressed by conducting a hypothesis test. One distribution that can be used to answer such questions is the chi-square distribution. The chi-square distribution has...

Expected Frequencies in Goodness-of-Fit Tests


A goodness-of-fit test is conducted to determine whether the observed frequencies are statistically consistent with the frequencies expected for the dataset. Suppose the expected frequencies for a dataset are all equal, as when predicting how often each face appears when rolling a fair die. In that case, the expected frequency for each category is the ratio of the total number of observations (n) to the number of categories (k).
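The equal-frequency case can be sketched in a few lines of Python. This is only an illustration: the function names and the die-roll counts are hypothetical, and the statistic used is the standard Pearson sum of (observed - expected)^2 / expected.

```python
def equal_expected_frequency(n, k):
    """Expected count per category when all k categories are equally likely."""
    return n / k


def chi_square_gof(observed):
    """Pearson goodness-of-fit statistic against equal expected frequencies."""
    n = sum(observed)   # total number of observations
    k = len(observed)   # number of categories
    expected = equal_expected_frequency(n, k)
    return sum((o - expected) ** 2 / expected for o in observed)


# Hypothetical counts for faces 1..6 over 60 rolls (illustrative data only).
rolls = [8, 12, 9, 11, 10, 10]
expected = equal_expected_frequency(sum(rolls), len(rolls))
statistic = chi_square_gof(rolls)
print(f"expected per face: {expected:.1f}, chi-square: {statistic:.2f}")
```

With k categories and no estimated parameters, the statistic would be compared against a chi-square distribution with k - 1 degrees of freedom.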

Test for Homogeneity


The goodness-of-fit test can be used to decide whether a population fits a given distribution, but it will not suffice to decide whether two populations follow the same unknown distribution. A different test, called the test for homogeneity, can be used to conclude whether two populations have the same distribution. To calculate the test statistic for a test for homogeneity, follow the same procedure as with the test of independence. The hypotheses for the test for homogeneity can be stated as...

Introduction to Test of Independence


In statistics, the term independence means that one can directly obtain the probability of any event involving both variables by multiplying their individual probabilities. Tests of independence are chi-square tests involving the use of a contingency table of observed (data) values.
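The test statistic for a test of independence is similar to that of a goodness-of-fit test: χ² = Σ (O − E)² / E, summed over all cells of the contingency table, where O is an observed count and E is the corresponding expected count.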
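As a minimal sketch of that statistic, the Python below computes the Pearson sum of (O - E)^2 / E over the cells of a contingency table, taking each expected count as (row total × column total) / grand total. The function names and the 2×2 table are hypothetical and chosen only for illustration.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical 2x2 table of observed counts (illustrative data only).
observed = [[20, 30],
            [30, 20]]
statistic, dof = chi_square_independence(observed)
print(f"chi-square = {statistic:.2f} with {dof} degree(s) of freedom")
```

The sketch assumes every row and column total is positive, so no expected count is zero.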


Maximal conditional chi-square importance in random forests.

Minghui Wang¹, Xiang Chen, Heping Zhang

¹Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520-8034, USA.

Bioinformatics (Oxford, England)

February 5, 2010

Summary

This summary is machine-generated.

We developed a new method using maximal conditional chi-square (MCC) in random forests to identify disease-associated single nucleotide polymorphisms (SNPs) in genome-wide association studies (GWAS). This approach improves upon existing methods for detecting complex genetic effects and understanding disease etiology.

Area of Science:

Genetics and Bioinformatics
Statistical Genomics
Computational Biology

Background:

Genome-wide association studies (GWAS) generate high-dimensional data, necessitating robust methods for identifying disease-associated single nucleotide polymorphisms (SNPs).
Existing variable importance measures in random forests have limitations for complex genetic analyses.
Accurate identification of SNPs associated with diseases is crucial for understanding disease etiology.

Purpose of the Study:

To propose and evaluate a novel importance measure for random forests to overcome shortcomings of existing methods.
To enhance the identification of disease-associated SNPs in high-dimensional genetic data.
To improve the understanding of the relationship between genetic variants and complex diseases.

Main Methods:

Developed a new importance measure utilizing maximal conditional chi-square (MCC) within random forests.
Employed a permutation test with the MCC importance measure to estimate empirical P-values for SNPs.
Compared the proposed method against univariate tests and existing random forest importance measures (Gini, permutation importance).
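The permutation step can be illustrated with a generic sketch. The summary does not define how the MCC importance is computed, so the code below substitutes an ordinary Pearson chi-square on a genotype-by-status table as a stand-in score; it is not the authors' measure. The function names, the toy data, and the (hits + 1) / (n_perm + 1) estimator are assumptions made for illustration.

```python
import random


def chi_square_stat(genotypes, labels):
    """Pearson chi-square for a genotype-by-status contingency table.

    Stand-in score only; the MCC importance itself is not specified here.
    """
    g_levels = sorted(set(genotypes))
    y_levels = sorted(set(labels))
    counts = {(g, y): 0 for g in g_levels for y in y_levels}
    for g, y in zip(genotypes, labels):
        counts[(g, y)] += 1
    n = len(labels)
    stat = 0.0
    for g in g_levels:
        row_total = sum(counts[(g, y)] for y in y_levels)
        for y in y_levels:
            col_total = sum(counts[(g2, y)] for g2 in g_levels)
            expected = row_total * col_total / n
            stat += (counts[(g, y)] - expected) ** 2 / expected
    return stat


def permutation_p_value(genotypes, labels, n_perm=1000, seed=0):
    """Empirical P-value: share of label shuffles scoring at least as high."""
    rng = random.Random(seed)
    observed = chi_square_stat(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-status association
        if chi_square_stat(genotypes, shuffled) >= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy SNP coded 0/2 with a strong association to case status (illustrative).
genotypes = [0] * 20 + [2] * 20
labels = [0] * 18 + [1] * 2 + [1] * 18 + [0] * 2
print(permutation_p_value(genotypes, labels))
```

In a random-forest setting the score function would be replaced by the importance measure of interest, recomputed on each permuted dataset; that substitution is left open here because the summary does not give the details.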

Main Results:

The proposed MCC importance measure demonstrated superior performance in simulations for identifying risk SNPs compared to other methods.
In a GWAS of age-related macular degeneration, the method successfully confirmed two significant SNPs.
The MCC measure is more sensitive to complex SNP effects by incorporating conditional information, outperforming existing measures.

Conclusions:

The maximal conditional chi-square (MCC) importance measure in random forests offers a sensitive and efficient approach for SNP identification in GWAS.
The proposed method facilitates a deeper understanding of the etiological links between genetic variants and complex diseases.
This approach provides a valuable tool for genetic research, particularly in analyzing high-dimensional genomic data.