



AutoImpute: Autoencoder based imputation of single-cell RNA-seq data.

Divyanshu Talwar1, Aanchal Mongia1, Debarka Sengupta2,3

  • 1Department of Computer Science and Engineering, Indraprastha Institute of Information Technology, Delhi, India.

Scientific Reports | November 7, 2018
Summary
This summary is machine-generated.

This article describes a new computational tool called AutoImpute, designed to fix missing data points in single-cell RNA sequencing experiments. These experiments often suffer from dropout events where gene expression levels appear as zero due to technical limitations. AutoImpute uses a machine learning approach to predict and fill in these missing values while preserving the biological integrity of the data. The authors demonstrate that their method improves data quality, leading to better cell classification and analysis results compared to existing techniques.

Keywords:
scRNA-seq, dropout events, gene expression matrix, machine learning


Area of Science:

  • Bioinformatics and computational biology, with AutoImpute as an application
  • Genomics and transcriptomics data analysis

Background:

Technical limitations in current sequencing protocols often hinder accurate quantification of gene expression at the cellular level. Researchers frequently encounter dropout events, in which a gene that is expressed in a cell is recorded as zero, and these events obscure the true biological signal. The resulting abundance of zero values complicates downstream statistical processing, and distinguishing these technical artifacts from genuine biological inactivity remained an open challenge. Existing computational frameworks often struggle to balance noise reduction with the preservation of essential gene expression patterns, and standard normalization techniques do not fully address the unique distribution of single-cell measurements. These gaps motivated the development of specialized imputation algorithms that can handle sparse matrices and infer missing information without introducing systematic bias into the results.

Purpose Of The Study:

The aim of this work is to introduce a novel imputation method for correcting sparse gene expression data in single-cell sequencing. Researchers face a persistent challenge where insufficient RNA quantities lead to frequent dropout events. These events manifest as false zero counts that obscure the true expression profiles of individual cells. The authors sought to develop a solution that learns the inherent data distribution to fill in missing values accurately. They intended to create a tool that preserves the biological integrity of genes while reducing technical noise. This motivation stems from the need for more reliable downstream analysis in high-throughput genomic studies. The team addressed the problem by designing a framework that minimizes alterations to genes that show no activity. Their objective was to provide a robust computational alternative to existing methods for improving data quality.

Main Methods:

The investigators implemented a deep learning approach to address sparsity within gene expression matrices. They trained an autoencoder to capture the underlying statistical distribution of the input data, which allows the system to identify patterns associated with technical dropout events. The researchers used real-world sequencing datasets to evaluate the model and compared its performance against several established imputation algorithms. The assessment focused on the ability of the model to recover original expression values from artificially subsampled data. The team also analyzed the impact of imputation on downstream tasks such as cell-type identification, specifically examining how the model influences clustering accuracy and the stabilization of variance across samples.
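The autoencoder idea described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the single hidden layer, the sigmoid activation, the loss restricted to nonzero entries, the L2 penalty, and every hyperparameter and name (`train_masked_autoencoder`, `hidden`, `lr`, `lam`) are assumptions made for the example, and the toy matrix is simulated.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_masked_autoencoder(X, hidden=8, lr=0.05, lam=1e-3, epochs=1000, seed=0):
    """Fit a one-hidden-layer autoencoder by gradient descent.

    Assumption for this sketch: the reconstruction error is computed only on
    nonzero (observed) entries, so zeros never pull predictions toward zero.
    """
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]
    mask = (X > 0).astype(float)             # 1 where a value was observed
    n_obs = max(mask.sum(), 1.0)
    W1 = rng.normal(0.0, 0.1, (n_genes, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, n_genes))
    b2 = np.zeros(n_genes)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)             # encoder
        err = (H @ W2 + b2 - X) * mask       # decoder error on observed entries
        losses.append(float((err ** 2).sum() / n_obs))
        # Backpropagation; lam * W is the gradient of an L2 weight penalty
        # (the recorded loss above excludes that penalty term).
        g_out = 2.0 * err / n_obs
        g_pre = (g_out @ W2.T) * H * (1.0 - H)
        W2_grad = H.T @ g_out + lam * W2
        W1_grad = X.T @ g_pre + lam * W1
        W2 -= lr * W2_grad
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * W1_grad
        b1 -= lr * g_pre.sum(axis=0)

    def reconstruct(X_in):
        return sigmoid(X_in @ W1 + b1) @ W2 + b2

    return reconstruct, losses

# Simulated toy data: 60 cells from two hypothetical cell types, 20 genes.
rng = np.random.default_rng(1)
rates = np.vstack([rng.uniform(1, 10, 20), rng.uniform(1, 10, 20)])
cell_type = np.repeat([0, 1], 30)
counts = rng.poisson(rates[cell_type])
X = np.log1p(counts) * (rng.random(counts.shape) < 0.7)  # about 30% simulated dropout

reconstruct, losses = train_masked_autoencoder(X)
X_hat = reconstruct(X)
# Design choice of this sketch: keep observed values, fill only the zeros.
X_imputed = np.where(X > 0, X, np.clip(X_hat, 0.0, None))
print(f"masked loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Restricting the loss to observed entries is one way to keep the network from simply learning to reproduce the zeros; whether the published tool handles zeros in exactly this way should be checked against the paper.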

Main Results:

The main finding is that the model performs competitively with existing imputation techniques across multiple benchmarks. Quantitative assessments show that the tool recovers gene expression values from subsampled datasets with high fidelity. The researchers observed improved cell-clustering accuracy after applying their imputation method to sparse matrices. Their results also show effective variance stabilization, which is important for reliable downstream statistical analysis. The method maintains high cell-type separability, allowing clearer distinction between cellular populations, and the analysis indicates that the autoencoder architecture minimizes modifications to genes that are biologically silent. Together, the findings suggest that the approach offers a balanced way to handle the high number of zero counts in sequencing data and is well suited to processing transcriptomic information at single-cell resolution.
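A recovery test of this kind can be sketched as follows: hide a fraction of the observed entries, impute, and score the error on the hidden entries only. The names `hide_entries`, `nmse`, and `gene_mean_impute` are hypothetical, the normalized mean squared error is one common choice of score rather than a metric confirmed by this summary, and the per-gene mean is a stand-in baseline, not the authors' method.

```python
import numpy as np

def hide_entries(X, frac=0.2, seed=0):
    """Set a random fraction of the nonzero entries to zero; return the
    masked matrix and a boolean array marking the hidden positions."""
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(X)
    n_hide = int(frac * rows.size)
    pick = rng.choice(rows.size, size=n_hide, replace=False)
    X_masked = X.copy()
    X_masked[rows[pick], cols[pick]] = 0.0
    held = np.zeros(X.shape, dtype=bool)
    held[rows[pick], cols[pick]] = True
    return X_masked, held

def nmse(X_true, X_imputed, held):
    """Normalized mean squared error, evaluated on hidden entries only."""
    diff = X_true[held] - X_imputed[held]
    return float(np.sum(diff ** 2) / np.sum(X_true[held] ** 2))

def gene_mean_impute(X_masked):
    """Stand-in imputer: replace zeros with the per-gene mean of nonzero values."""
    obs = X_masked > 0
    sums = X_masked.sum(axis=0)
    n_obs = obs.sum(axis=0)
    means = np.divide(sums, n_obs, out=np.zeros_like(sums, dtype=float), where=n_obs > 0)
    return np.where(obs, X_masked, means)

# Simulated sparse matrix: 50 cells by 30 genes, roughly 40% zeros.
rng = np.random.default_rng(2)
X_true = rng.gamma(2.0, 1.0, (50, 30)) * (rng.random((50, 30)) < 0.6)

X_masked, held = hide_entries(X_true, frac=0.2)
err_no_imputation = nmse(X_true, X_masked, held)                 # hidden entries left at zero
err_gene_mean = nmse(X_true, gene_mean_impute(X_masked), held)
print(err_no_imputation, err_gene_mean)
```

Any imputer that returns a matrix of the same shape can be scored the same way, which makes side-by-side comparison of methods straightforward.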

Conclusions:

The authors propose that their machine learning framework effectively recovers gene expression values from sparse datasets. Their findings suggest that the tool maintains high accuracy when reconstructing subsampled information and that it enhances cell-clustering performance compared to alternative approaches. Variance stabilization is a key strength of this imputation strategy. The team reports that the method improves the separability of distinct cell types within complex samples while minimizing unnecessary alterations to genes that show no biological activity. The authors conclude that their system provides a competitive alternative for processing large-scale transcriptomic data. These results imply that autoencoder architectures offer a viable path for improving the reliability of single-cell sequencing interpretations.

The method uses an autoencoder to learn the inherent distribution of the input data. By capturing these patterns, the tool predicts missing values while minimizing changes to biologically silent genes, thereby recovering expression levels from sparse matrices.

The tool employs an autoencoder-based architecture. This specific design allows the system to model complex data distributions, which distinguishes it from simpler statistical imputation methods that often rely on linear assumptions or basic neighborhood averaging.

The authors state that this architecture is necessary to handle the high-dimensional nature of single-cell data. Unlike traditional methods, this approach captures non-linear relationships between genes, which is required to accurately differentiate technical dropouts from true biological zeros.

The researchers use scRNA-seq expression matrices as the primary data type. This input is crucial because the model must learn the specific sparsity patterns inherent to these datasets to perform accurate imputation without distorting the underlying biological signals.
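To make "sparsity patterns" concrete, the sketch below summarizes the zeros in a small expression matrix with cells as rows and genes as columns. The function name `sparsity_report` and the toy counts are hypothetical and serve only as an illustration.

```python
import numpy as np

def sparsity_report(X):
    """Summarize zeros overall, per cell (row), and per gene (column)."""
    zero = (X == 0)
    return {
        "overall_zero_fraction": float(zero.mean()),
        "zero_fraction_per_cell": zero.mean(axis=1),
        "zero_fraction_per_gene": zero.mean(axis=0),
    }

# Toy count matrix: 3 cells by 4 genes.
counts = np.array([[0, 3, 0, 7],
                   [1, 0, 0, 2],
                   [0, 0, 0, 5]])

report = sparsity_report(counts)
# Genes that are zero in every cell carry no evidence of expression.
silent_genes = np.flatnonzero(report["zero_fraction_per_gene"] == 1.0)
print(report["overall_zero_fraction"], silent_genes)
```

In this toy matrix the third gene is zero in every cell, the kind of gene an imputation method would ideally leave unchanged.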

The team measures success through expression recovery from subsampled data, cell-clustering accuracy, and variance stabilization. These metrics demonstrate that the tool performs competitively against existing methods, specifically regarding the clear separation of different cell types.
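Two of these metrics can be illustrated briefly. The sketch assumes that clustering accuracy is scored with the adjusted Rand index and that variance stabilization is inspected through the per-gene coefficient of variation; both are common choices, but this summary does not confirm that the authors used exactly these definitions, and the function names are hypothetical.

```python
import numpy as np

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index computed from the contingency table of two labelings."""
    _, t = np.unique(np.ravel(labels_true), return_inverse=True)
    _, p = np.unique(np.ravel(labels_pred), return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)              # count co-occurrences of label pairs

    def pairs(x):
        return x * (x - 1) / 2.0             # number of unordered pairs among x items

    sum_cells = pairs(table).sum()
    sum_rows = pairs(table.sum(axis=1)).sum()
    sum_cols = pairs(table.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / pairs(table.sum())
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:                # degenerate case, e.g. a single cluster
        return 1.0
    return float((sum_cells - expected) / (max_index - expected))

def coefficient_of_variation(X):
    """Per-gene standard deviation divided by mean, across cells (rows)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return np.divide(std, mean, out=np.zeros_like(std, dtype=float), where=mean > 0)

# The index ignores label names: swapping cluster ids still scores 1.0.
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
ari_split = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
cv = coefficient_of_variation(np.array([[1.0, 2.0], [3.0, 2.0]]))
print(ari_same, ari_split, cv)
```

A lower coefficient of variation among cells of the same type after imputation would be consistent with the variance stabilization described above, and a higher adjusted Rand index against known cell-type labels would indicate better clustering.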

The authors propose that their method offers a robust solution for improving the reliability of downstream transcriptomic analyses. They suggest that by stabilizing variance and enhancing cell-type separability, the tool facilitates more accurate biological insights from noisy single-cell datasets.