Phenotype driven data augmentation methods for transcriptomic data.

Nikita Janakarajan (1,2), Mara Graziani (1), María Rodríguez Martínez (1)

  • (1) AI for Scientific Discovery, IBM Research Europe, Rüschlikon 8803, Switzerland.

Bioinformatics Advances | June 9, 2025
Summary
This summary is machine-generated.

We developed new phenotype-driven data augmentation methods for high-dimensional transcriptomic data, improving patient stratification in cancer studies by 5-15% and offering insights into optimal data augmentation strategies.

Area of Science:

  • Biomedical data science
  • Computational biology
  • Machine learning in genomics

Background:

  • High-dimensional transcriptomic data presents challenges for supervised learning, including overfitting and poor generalization.
  • Existing data augmentation methods for transcriptomic data are often computationally intensive or produce limited sample diversity.
  • Class imbalance and low sample sizes are common issues in transcriptomic datasets, hindering model performance.
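The class-imbalance point can be made concrete with a small sketch that counts how many synthetic samples each phenotype would need to match the largest class. The function name, the labels "A", "B", "C", and the choice of balancing to the majority class are illustrative assumptions, not details from the study.

```python
from collections import Counter

def synthetic_samples_needed(labels, target_per_class=None):
    """Return, per class, how many synthetic samples would balance the dataset.

    By default every class is topped up to the size of the largest class
    (an assumed policy; other targets can be passed explicitly).
    """
    counts = Counter(labels)
    target = target_per_class if target_per_class is not None else max(counts.values())
    return {cls: max(target - n, 0) for cls, n in counts.items()}

# Hypothetical phenotype labels for a small, imbalanced cohort.
labels = ["A"] * 12 + ["B"] * 40 + ["C"] * 8
plan = synthetic_samples_needed(labels)
print(plan)  # {'A': 28, 'B': 0, 'C': 32}
```

A plan like this only says how many samples to generate per class; how they are generated is the job of the augmentation method itself.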

Purpose of the Study:

  • To introduce novel phenotype-driven data augmentation methods for transcriptomic data.
  • To address the challenges of high dimensionality, overfitting, and limited generalization in supervised learning tasks.
  • To improve patient stratification accuracy in cancer transcriptomic studies.

Main Methods:

  • Developed two classes of phenotype-driven data augmentation: signature-dependent and signature-independent methods.
  • Signature-dependent methods utilize gene signatures for non-parametric data augmentation.
  • Signature-independent methods adapt established Gamma-Poisson and Poisson sampling techniques for gene expression data.
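A minimal sketch of what Poisson and Gamma-Poisson sampling can look like for count data, assuming raw counts in a samples-by-genes NumPy array and parameters estimated separately for each phenotype. The function names, the method-of-moments estimator, and the simulated input are illustrative assumptions; this summary does not describe the authors' exact procedure.

```python
import numpy as np

def poisson_augment(counts, n_new, rng):
    """Sample each gene from Poisson(mean of that gene within the class)."""
    mu = counts.mean(axis=0)
    return rng.poisson(lam=mu, size=(n_new, counts.shape[1]))

def gamma_poisson_augment(counts, n_new, rng, eps=1e-8):
    """Sample per-gene rates from a Gamma, then counts from Poisson(rate)."""
    mu = counts.mean(axis=0)
    var = counts.var(axis=0, ddof=1)
    # Method of moments for a Gamma-Poisson mixture: var = mu + mu**2 / r.
    # Genes with var <= mu get a very large r, which approaches plain Poisson.
    r = np.maximum(mu**2 / np.maximum(var - mu, eps), eps)
    rates = rng.gamma(shape=r, scale=mu / r, size=(n_new, counts.shape[1]))
    return rng.poisson(rates)

def augment_by_phenotype(X, y, n_new, rng, sampler=gamma_poisson_augment):
    """Fit and sample separately per phenotype so labels stay attached."""
    X_new, y_new = [], []
    for label in np.unique(y):
        X_new.append(sampler(X[y == label], n_new, rng))
        y_new.extend([label] * n_new)
    return np.vstack(X_new), np.array(y_new)

rng = np.random.default_rng(0)
# Simulated stand-in for real data: 30 samples, 5 genes, two phenotypes.
X = rng.negative_binomial(n=5, p=0.05, size=(30, 5))
y = np.array([0] * 18 + [1] * 12)
X_syn, y_syn = augment_by_phenotype(X, y, n_new=10, rng=rng)
print(X_syn.shape, y_syn.shape)  # (20, 5) (20,)
```

Passing `sampler=poisson_augment` switches to the plain Poisson variant. The Gamma-Poisson variant adds per-gene overdispersion, which is the usual reason to prefer it for sequencing counts.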

Main Results:

  • Applied augmentation methods to colorectal and breast cancer transcriptomic data.
  • Improved patient stratification by 5-15% compared with existing augmentation methods.
  • Showcased enhanced model generalization and reduced overfitting through discriminative and generative experiments with external validation.

Conclusions:

  • Phenotype-driven data augmentation effectively enhances supervised learning on transcriptomic data.
  • The proposed methods are computationally efficient and generate diverse samples.
  • Over-augmentation yields limited additional benefit, so the amount of augmentation should be chosen deliberately.