



Compositional transformations can reasonably introduce phenotype-associated values into sparse features.

George I Austin1,2, Tal Korem2,3

  • 1Department of Biomedical Informatics, Columbia University Irving Medical, New York, New York, USA.

mSystems | May 2, 2025
Summary
This summary is machine-generated.

Sparse microbiome features becoming phenotype-associated after transformation does not necessarily indicate information leakage. Our counterexamples show these changes can arise from valid data processing, not just flawed pipelines.

Keywords:
compositional data analysis, imputation, machine learning, microbiome


Area of Science:

  • Microbiome bioinformatics
  • Computational biology
  • Statistical analysis of high-dimensional data

Background:

  • Gihawi et al. questioned the validity of the tumor microbiome analyses by Poore et al., noting that sparse features (genera with few reads) became phenotype-associated after batch correction.
  • The critique implies that this observation indicates information leakage and invalidates the analysis, which affects the interpretation of The Cancer Genome Atlas (TCGA) microbiome studies and of microbiome research more broadly.

Purpose of the Study:

  • To investigate whether the emergence of phenotype-associated sparse features after data transformation necessarily signifies information leakage or processing errors in microbiome analyses.
  • To provide counterexamples demonstrating that such observations can result from valid statistical transformations, challenging the broad invalidation claims.

Main Methods:

  • Examined the centered log ratio (CLR) transformation, a common method for compositional microbiome data, noting its sample-wise nature and similarities to batch correction methods.
  • Used synthetic and vaginal microbiome datasets to demonstrate how the CLR transformation, coupled with imputation strategies, can associate sparse features with each sample's geometric mean and, consequently, with the phenotype.
  • Re-analyzed specific features highlighted by Gihawi et al. to show that the observed phenomenon can occur even after a CLR transformation, serving as a counterexample to information leakage claims.
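
As a minimal sketch of the transformation described above, CLR can be written in Python with NumPy. It assumes zeros are imputed by adding a pseudocount of 1 before taking logs, which is one common choice and not necessarily the one used in the study. The function name clr and the example counts are illustrative only.

```python
import numpy as np

def clr(counts, pseudocount=1.0):
    """Centered log ratio, computed independently for each row (sample)."""
    # Assumed imputation: add a pseudocount so that zeros have a finite log.
    x = np.asarray(counts, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtract each sample's mean log abundance, i.e. the log of its geometric mean.
    return log_x - log_x.mean(axis=1, keepdims=True)

# Hypothetical count table: rows are samples, columns are taxa.
counts = np.array([
    [120, 30, 0, 0],
    [900, 450, 0, 3],
])
transformed = clr(counts)
print(transformed)
```

In each row, every entry that was zero ends up at minus the log of that sample's geometric mean (computed after adding the pseudocount). This is how a sparse feature can come to track the geometric mean after transformation.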

Main Results:

  • The centered log ratio (CLR) transformation is applied to each sample independently, so it cannot by itself leak information between samples or invalidate downstream analyses.
  • Common methods for imputing zero or missing values alongside the CLR transformation can tie the transformed value of a sparse feature to the sample's geometric mean.
  • When the geometric mean is phenotype-associated, sparse and CLR-transformed features also become associated with the phenotype, a phenomenon observed in both synthetic and real microbiome data.
  • Re-analysis confirmed that sparse features becoming phenotype-associated can occur after CLR transformation, refuting the claim that this observation alone indicates information leakage.
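
The effect described in these results can be reproduced with a small synthetic example. The sketch below is not the authors' simulation. It assumes a hypothetical dataset in which five abundant taxa have higher counts in one phenotype group and a sixth taxon is zero in every sample, and it again assumes a pseudocount of 1 for imputation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_group = 20

# Hypothetical data: five abundant taxa whose counts are higher in group 1,
# plus one sparse taxon that is zero in every sample.
abundant_g0 = rng.poisson(lam=50, size=(n_per_group, 5))
abundant_g1 = rng.poisson(lam=500, size=(n_per_group, 5))
sparse = np.zeros((2 * n_per_group, 1))
counts = np.hstack([np.vstack([abundant_g0, abundant_g1]), sparse])
phenotype = np.repeat([0, 1], n_per_group)

# CLR with a pseudocount of 1 (an assumed imputation choice).
log_x = np.log(counts + 1.0)
clr_values = log_x - log_x.mean(axis=1, keepdims=True)

sparse_raw = counts[:, -1]
sparse_clr = clr_values[:, -1]
group0 = sparse_clr[phenotype == 0]
group1 = sparse_clr[phenotype == 1]

print("raw sparse feature is all zeros:", bool(np.all(sparse_raw == 0)))
print("mean CLR value, group 0:", group0.mean())
print("mean CLR value, group 1:", group1.mean())
```

Before transformation the sparse taxon carries no information about the phenotype. After CLR its value equals minus the log geometric mean of the sample, which differs between the groups by construction, so the transformed feature separates them even though each sample was processed on its own.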

Conclusions:

  • The appearance of phenotype-associated sparse features after data transformation is not sufficient evidence to claim information leakage in machine learning pipelines.
  • Sample-wise transformations like CLR can generate such associations without artificially inflating performance, suggesting that the original critique may be overly broad.
  • Interpreting individual features in microbiome data requires caution due to the multivariate nature of the data and the impact of transformations and batch correction methods.
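
One way to see why a sample-wise transformation differs from one that pools information across samples is to check whether a held-out sample's transformed values depend on the other samples in the table. The sketch below contrasts CLR with column-wise z-scoring, chosen here only as a simple example of a dataset-wise operation and not as a method discussed in the study. All names and counts are hypothetical.

```python
import numpy as np

def clr(counts, pseudocount=1.0):
    """Sample-wise: each row is transformed using only its own values."""
    log_x = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return log_x - log_x.mean(axis=1, keepdims=True)

def zscore_columns(values):
    """Dataset-wise: each output depends on every row via the column mean and std."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean(axis=0)) / values.std(axis=0)

train = np.array([[10, 200, 0], [15, 180, 2], [400, 20, 0]])
held_out = np.array([[12, 190, 1]])

# CLR of the held-out sample is the same whether it is transformed alone
# or together with the training samples.
clr_alone = clr(held_out)
clr_together = clr(np.vstack([train, held_out]))[-1:]
clr_unchanged = bool(np.allclose(clr_alone, clr_together))

# Column-wise z-scores of the held-out sample change when the other rows change.
z_a = zscore_columns(np.vstack([train, held_out]))[-1]
z_b = zscore_columns(np.vstack([train * 3, held_out]))[-1]
z_unchanged = bool(np.allclose(z_a, z_b))

print("CLR independent of other samples:", clr_unchanged)
print("z-score independent of other samples:", z_unchanged)
```

Dependence on other samples does not by itself mean that a pipeline leaks information. The check only shows that a per-sample transformation such as CLR has no route by which information from other samples could enter.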