



Compositional transformations can reasonably introduce phenotype-associated values into sparse features.

George I Austin (1,2), Tal Korem (2,3)

  • (1) Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA.

bioRxiv: the preprint server for biology | January 20, 2025
Summary
This summary is machine-generated.

Sparse features in tumor microbiome data can become associated with a phenotype through sample-wise transformations such as the centered log ratio (CLR), so such associations do not necessarily indicate a flaw in the processing or machine learning pipeline. This finding challenges the claim that these associations are evidence of information leakage in microbiome analysis.


Area of Science:

  • Microbiome Research
  • Bioinformatics
  • Machine Learning

Background:

  • Recent arguments suggest tumor-associated microbiome data analysis is invalid if sparse features become phenotype-associated after batch correction.
  • The argument treats such associations as a sign of information leakage in the processing or machine learning pipeline.

Purpose of the Study:

  • To investigate whether sparse features becoming phenotype-associated necessarily indicates issues with microbiome data processing or machine learning pipelines.
  • To demonstrate that sample-wise transformations can create such associations without information leakage.

Main Methods:

  • Utilized the centered log ratio (CLR) transformation, a common method for compositional microbiome data.
  • Analyzed synthetic and vaginal microbiome datasets.
  • Re-analyzed features previously highlighted as problematic by Gihawi et al.
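
For reference, the standard definition of the CLR transformation can be written out as below. The symbols (x for one sample's abundances across D features, g(x) for its geometric mean, ε for a pseudocount) are notation chosen here, and the pseudocount is only one common way to impute zeros; the summary does not state which strategy the study used.

```latex
% CLR of a single sample x = (x_1, ..., x_D) with strictly positive entries
g(x) = \Bigl(\prod_{j=1}^{D} x_j\Bigr)^{1/D},
\qquad
\operatorname{clr}(x)_i = \log\frac{x_i}{g(x)}
                        = \log x_i - \frac{1}{D}\sum_{j=1}^{D}\log x_j .

% With zeros imputed by a pseudocount, \tilde{x}_j = x_j + \varepsilon,
% a feature that is zero in the sample (x_i = 0) is mapped to
\operatorname{clr}(\tilde{x})_i
  = \log\varepsilon - \frac{1}{D}\sum_{j=1}^{D}\log\tilde{x}_j
  = \log\varepsilon - \log g(\tilde{x}) .
```

Under this notation, the transformed value of a zero entry depends only on the log geometric mean of its own sample, so it differs between samples whenever their geometric means differ.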

Main Results:

  • Demonstrated that the CLR transformation, a sample-wise operation, can cause initially sparse features to become associated with a phenotype.
  • The association arises when the per-sample geometric mean used by CLR is itself linked to the phenotype, particularly under common strategies for imputing zero values.
  • Showed that this behavior of CLR serves as a counterexample to the claim that such associations necessarily imply information leakage.
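
The mechanism in these results can be illustrated with a small synthetic example. This is a minimal sketch, not the authors' analysis: the two compositions (`p_even`, `p_dominated`), the sequencing depth, the group sizes, and the pseudocount of 1 are all assumptions chosen to make the effect easy to see. The sparse feature is zero in every sample, which is a deliberately extreme case.

```python
import numpy as np

def clr(counts, pseudocount=1.0):
    """Row-wise centered log ratio; zeros are imputed by adding a pseudocount."""
    logs = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logs - logs.mean(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_per_group, n_dense, depth = 50, 19, 10_000
phenotype = np.repeat([0, 1], n_per_group)

# Hypothetical compositions: group 0 is even, group 1 is dominated by one taxon,
# so the per-sample geometric mean differs between the two phenotypes.
p_even = np.full(n_dense, 1.0 / n_dense)
p_dominated = np.array([0.9] + [0.1 / (n_dense - 1)] * (n_dense - 1))

dense = np.vstack([
    rng.multinomial(depth, p_even, size=n_per_group),
    rng.multinomial(depth, p_dominated, size=n_per_group),
])

# The sparse feature is zero in every sample, so it carries no raw signal.
sparse = np.zeros((2 * n_per_group, 1), dtype=int)
counts = np.hstack([sparse, dense])

# Each row is transformed using only its own values; labels are never used.
sparse_clr = clr(counts)[:, 0]
group0 = sparse_clr[phenotype == 0]
group1 = sparse_clr[phenotype == 1]

print("raw sparse feature, unique values:", np.unique(counts[:, 0]))
print("CLR mean, phenotype 0:", group0.mean())
print("CLR mean, phenotype 1:", group1.mean())
print("groups perfectly separated:", group1.min() > group0.max())
```

In this setup the transformed sparse feature separates the two groups even though its raw values are identical everywhere, because the imputed zero is shifted by each sample's own log geometric mean.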

Conclusions:

  • The appearance of phenotype-associated sparse features after a sample-wise transformation such as CLR does not by itself prove information leakage in a machine learning pipeline.
  • Such observations can arise from the nature of the transformation and data characteristics, not necessarily from pipeline artifacts.
  • Emphasized the need for cautious interpretation of individual features in multivariate microbiome data.
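
One way to make the "sample-wise" property concrete is to check that a sample's transformed values do not change when the other samples change. The sketch below assumes a pseudocount-based CLR and arbitrary Poisson counts; both are illustrative choices rather than details taken from the study.

```python
import numpy as np

def clr(counts, pseudocount=1.0):
    """Row-wise centered log ratio; zeros are imputed by adding a pseudocount."""
    logs = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logs - logs.mean(axis=1, keepdims=True)

rng = np.random.default_rng(1)
counts = rng.poisson(5.0, size=(6, 8))  # 6 hypothetical samples, 8 features

# Transforming the whole matrix equals transforming each sample on its own.
full = clr(counts)
one_at_a_time = np.vstack([clr(counts[i:i + 1]) for i in range(len(counts))])
same_result = np.allclose(full, one_at_a_time)

# Replacing every other sample leaves the first sample's values untouched.
altered = counts.copy()
altered[1:] = rng.poisson(500.0, size=(5, 8))
row0_unchanged = np.allclose(clr(altered)[0], full[0])

print(same_result, row0_unchanged)
```

A transformation that passes both checks cannot pass information between samples or from labels into features, which is why an association it produces is not, on its own, evidence of leakage.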