Adversarial deconfounding autoencoder for learning robust gene expression embeddings.

Ayse B Dincer1, Joseph D Janizek1,2, Su-In Lee1

  • 1Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA 98195, USA.

Bioinformatics (Oxford, England) | December 31, 2020
PubMed
Summary
This summary is machine-generated.

This paper presents a new machine learning method called the Adversarial Deconfounding AutoEncoder (AD-AE) to improve how we analyze gene expression data. Large datasets often contain unwanted noise from technical issues or biological factors like age, which can hide the true biological signals. The AD-AE model uses two neural networks working together: one to learn the data structure and another to ensure that unwanted noise is removed from the final results. By testing this approach on different datasets, the researchers show that it successfully separates useful information from interference, leading to more reliable and generalizable findings. This tool helps scientists better understand gene expression patterns across different experimental conditions.

Keywords:
neural networks, genomics, batch effects, latent space, machine learning

Area of Science:

  • Computational biology and Adversarial Deconfounding AutoEncoder applications
  • Bioinformatics and machine learning within genomics

Background:

Large-scale gene expression datasets frequently contain significant noise from technical artifacts and irrelevant biological variables. These unwanted sources of variation often obscure the underlying biological signals researchers seek to identify. Prior research has shown that standard unsupervised neural networks struggle to distinguish these confounding factors from meaningful data, and previous approaches often failed to generalize when applied to datasets with different distributions of confounding variables. This difficulty drove the development of methods aimed at creating more robust representations of gene expression profiles. No prior work had fully resolved the challenge of ensuring that these latent spaces transfer effectively across experimental domains, which motivated new architectures capable of separating true biological information from systematic interference such as technical batch effects.

Purpose Of The Study:

This study introduces the Adversarial Deconfounding AutoEncoder (AD-AE), an approach for improving the quality of gene expression latent spaces. Analyses of large datasets are complicated by technical artifacts and by biological variables that are not of interest. These confounding factors often lead to embeddings that fail to generalize across experimental domains, so embeddings learned from one dataset do not transfer effectively to others. The authors seek to disentangle these unwanted signals from the true biological information in the expression profiles, producing more robust and biologically informative representations of genomic data. To that end, the team develops a specialized neural network architecture intended to yield latent spaces that are free from systematic interference and reliable for downstream analysis.

Main Methods:

The proposed neural network architecture is evaluated on two distinct gene expression datasets. The researchers implement a dual-network system consisting of a primary autoencoder and a secondary adversary. The autoencoder reconstructs the original input measurements from a compressed latent representation. Simultaneously, the adversary network attempts to predict specific confounding variables from that same latent space. The two networks are trained jointly toward conflicting objectives, a design that pushes the encoder to retain only information that is independent of the identified confounders. The study compares the new model against standard autoencoder benchmarks and other existing deconfounding strategies. All code and supplementary data are provided to support the reproducibility of the experimental findings.
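The dual-network setup described above can be sketched as a single forward pass that computes the two competing losses. This is a minimal illustration and not the authors' implementation: the linear layers, the mean-squared-error losses, the weight `lam`, the array sizes, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 64 samples, 20 genes, 4 latent dimensions.
n_samples, n_genes, n_latent = 64, 20, 4
X = rng.normal(size=(n_samples, n_genes))  # synthetic expression matrix
c = rng.normal(size=(n_samples, 1))        # synthetic continuous confounder

# Linear stand-ins for the encoder, decoder, and adversary (assumed, not from the paper).
W_enc = rng.normal(scale=0.1, size=(n_genes, n_latent))
W_dec = rng.normal(scale=0.1, size=(n_latent, n_genes))
W_adv = rng.normal(scale=0.1, size=(n_latent, 1))


def losses(X, c, W_enc, W_dec, W_adv):
    """Return (reconstruction MSE, adversary MSE) for one forward pass."""
    Z = X @ W_enc        # latent embedding
    X_hat = Z @ W_dec    # autoencoder reconstruction of the input
    c_hat = Z @ W_adv    # adversary's confounder prediction from the embedding
    recon = float(np.mean((X - X_hat) ** 2))
    adv = float(np.mean((c - c_hat) ** 2))
    return recon, adv


def autoencoder_objective(recon, adv, lam=1.0):
    """Quantity the autoencoder would minimize: low reconstruction error, high adversary error."""
    return recon - lam * adv


recon, adv = losses(X, c, W_enc, W_dec, W_adv)
print("reconstruction loss:", recon)
print("adversary loss:", adv)
print("autoencoder objective:", autoencoder_objective(recon, adv, lam=1.0))
```

In a full training loop the adversary would be updated to lower `adv` while the autoencoder is updated to lower `autoencoder_objective`, which is what makes the two objectives conflict.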

Main Results:

The results indicate that the proposed model generates embeddings that do not encode confounding information. The researchers demonstrate that their approach conserves the biological signals present in the original input space. The model achieves superior performance compared to standard autoencoder architectures in all tested scenarios. The team reports that the embeddings generalize successfully across different confounder domains, unlike previous methods. The results show that the adversary effectively prevents the latent space from capturing unwanted technical artifacts or biological variables. The study confirms that the model maintains high reconstruction accuracy while simultaneously removing the specified confounding influences. The authors provide quantitative evidence that their approach outperforms other existing deconfounding techniques on two distinct datasets. These findings highlight the robustness of the adversarial training process in isolating true biological signals.

Conclusions:

The authors conclude that the Adversarial Deconfounding AutoEncoder effectively isolates biological signals from confounding variables and produces representations that remain stable across diverse experimental settings. The framework outperforms traditional autoencoder architectures, and the adversarial training process prevents the encoding of unwanted technical or biological noise while maintaining high reconstruction accuracy. These results imply that the method enhances the generalizability of latent spaces derived from complex genomic profiles. The study confirms that joint training of the autoencoder and adversary achieves superior disentanglement compared to existing techniques. This work provides a robust tool for researchers seeking to extract reliable insights from heterogeneous gene expression data.

The model utilizes a dual-network architecture where an autoencoder reconstructs input measurements while an adversary attempts to predict confounders from the resulting latent space. This joint training forces the encoder to discard confounding signals while preserving essential biological information.

The framework incorporates an adversary network specifically tasked with identifying confounding variables from the latent embedding. This component acts as a filter, ensuring the primary autoencoder does not retain information related to batch effects or age.
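One way to check whether an embedding still carries a confounder is to fit a simple probe that predicts the confounder from the embedding. The sketch below uses a linear ridge probe as a stand-in for the adversary; the function name `probe_r2`, the ridge strength, and the synthetic data are assumptions for illustration, and the score is computed in-sample rather than on held-out data.

```python
import numpy as np


def probe_r2(Z, c, ridge=1e-6):
    """In-sample R^2 of a linear probe predicting confounder c from embedding Z."""
    Z1 = np.hstack([Z, np.ones((Z.shape[0], 1))])   # append an intercept column
    A = Z1.T @ Z1 + ridge * np.eye(Z1.shape[1])     # ridge-regularized normal equations
    w = np.linalg.solve(A, Z1.T @ c)
    resid = c - Z1 @ w
    ss_res = float(np.sum(resid ** 2))
    ss_tot = float(np.sum((c - c.mean()) ** 2))
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
c = rng.normal(size=200)                             # synthetic confounder, e.g. age

# Embedding whose first dimension is a noisy copy of the confounder.
Z_leaky = np.column_stack([c + 0.1 * rng.normal(size=200), rng.normal(size=200)])
# Embedding drawn independently of the confounder.
Z_clean = rng.normal(size=(200, 2))

print("leaky embedding R^2:", probe_r2(Z_leaky, c))
print("clean embedding R^2:", probe_r2(Z_clean, c))
```

A high score suggests the confounder is still linearly recoverable from the embedding; a score near zero only rules out linear leakage, so a nonlinear probe may still be worth trying.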

The researchers indicate that joint training is necessary to achieve effective disentanglement. This approach allows the model to balance the competing goals of accurate data reconstruction and the complete removal of confounding influences.
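The competing goals can be written as a pair of coupled optimization problems. The form below is a common way to express adversarial deconfounding and is offered as an assumption, not as the authors' stated objective; the symbols $\theta_{\mathrm{enc}}$, $\theta_{\mathrm{dec}}$, $\theta_{\mathrm{adv}}$ (network parameters), $\mathcal{L}_{\mathrm{rec}}$ (reconstruction loss), $\mathcal{L}_{\mathrm{adv}}$ (adversary's prediction loss), and $\lambda$ (trade-off weight) are introduced here for illustration.

```latex
\min_{\theta_{\mathrm{enc}},\,\theta_{\mathrm{dec}}}\;
  \mathcal{L}_{\mathrm{rec}}(\theta_{\mathrm{enc}},\theta_{\mathrm{dec}})
  \;-\; \lambda\,\mathcal{L}_{\mathrm{adv}}(\theta_{\mathrm{enc}},\theta_{\mathrm{adv}}),
\qquad
\min_{\theta_{\mathrm{adv}}}\;
  \mathcal{L}_{\mathrm{adv}}(\theta_{\mathrm{enc}},\theta_{\mathrm{adv}})
```

Under this reading, a larger $\lambda$ puts more weight on removing confounder information and less on reconstruction accuracy.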

The latent embedding serves as the central data structure. It acts as a compressed representation of the original gene expression profiles, which the model refines to maximize biological signal while minimizing interference.

The authors measure the success of their model by assessing its ability to generalize across different confounder domains. They compare this performance against standard autoencoders and other existing deconfounding methods to validate the improvements.
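Generalization across confounder domains can be illustrated by fitting a predictor on embeddings from one domain and scoring it on another. The sketch below is not the authors' evaluation protocol: the helper names, the linear ridge predictor, the R^2 metric, and the synthetic two-domain data (where one embedding carries a domain-dependent shift) are all assumptions for the example.

```python
import numpy as np


def fit_linear(Z, y, ridge=1e-6):
    """Fit ridge-regularized linear weights (with intercept) mapping Z to y."""
    Z1 = np.hstack([Z, np.ones((Z.shape[0], 1))])
    A = Z1.T @ Z1 + ridge * np.eye(Z1.shape[1])
    return np.linalg.solve(A, Z1.T @ y)


def r2(Z, y, w):
    """R^2 of the fitted weights w on data (Z, y)."""
    Z1 = np.hstack([Z, np.ones((Z.shape[0], 1))])
    resid = y - Z1 @ w
    return 1.0 - float(np.sum(resid ** 2)) / float(np.sum((y - y.mean()) ** 2))


def transfer_r2(Z, y, domain, source, target):
    """Fit on samples from the source domain, then score on the target domain."""
    src, tgt = domain == source, domain == target
    w = fit_linear(Z[src], y[src])
    return r2(Z[tgt], y[tgt], w)


rng = np.random.default_rng(0)
n = 300
domain = rng.integers(0, 2, size=n)       # synthetic confounder domain label (0 or 1)
b = rng.normal(size=n)                    # synthetic biological signal
y = b + 0.1 * rng.normal(size=n)          # phenotype driven by the biological signal

Z_shifted = (b + 3.0 * domain)[:, None]   # embedding contaminated by a domain shift
Z_clean = b[:, None]                      # embedding without the domain shift

print("shifted embedding, domain 0 -> 1:", transfer_r2(Z_shifted, y, domain, 0, 1))
print("clean embedding, domain 0 -> 1:", transfer_r2(Z_clean, y, domain, 0, 1))
```

In this toy setting the predictor trained on the shifted embedding transfers poorly because the domain offset changes the input scale in the target domain, while the clean embedding transfers well.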

The researchers claim that their method enables the creation of biologically informative embeddings that transfer successfully between datasets. This improvement allows for more reliable analysis when dealing with diverse experimental sources.