Cross-platform normalization of microarray and RNA-seq data for machine learning applications.

Jeffrey A Thompson¹, Jie Tan², Casey S Greene³

¹Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America; Quantitative Biomedical Sciences Program, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America.

February 5, 2016

Summary

This summary is machine-generated.

Training Distribution Matching (TDM) adapts RNA-sequencing data for machine learning models trained on older microarray data. This enables larger training sets for improved gene expression analysis.

Keywords:

Cross-platform normalization; Distribution; Gene expression; Machine learning; Microarray; Nonparanormal transformation; Normalization; Quantile normalization; RNA-sequencing; Training

Area of Science:

Bioinformatics
Computational Biology
Genomics

Background:

Large-scale gene expression datasets are crucial for machine learning applications.
Microarray data represents a significant legacy resource, while RNA-sequencing is the current standard.
Integrating these diverse datasets is challenging due to technological differences.

Purpose of the Study:

To develop a method for harmonizing RNA-sequencing data with legacy microarray data for machine learning.
To enable the creation of larger, more diverse training datasets for gene expression analysis.
To facilitate the application of models trained on historical data to new RNA-sequencing data.
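The idea of applying a model trained on legacy microarray data to new RNA-sequencing samples can be illustrated with a toy sketch. This is not the TDM algorithm and not the API of the TDM R package. It assumes a rank-based inverse normal transform (a simplified relative of the nonparanormal transformation) applied to each sample, followed by a nearest-centroid classifier. All function names, class labels, and expression values below are hypothetical.

```python
from statistics import NormalDist, mean
from typing import Dict, List, Sequence


def rank_to_normal(sample: Sequence[float]) -> List[float]:
    """Replace each value with the standard-normal quantile of its rank.

    Simplified stand-in for a nonparanormal-style transform (assumption).
    Ties are broken by input order.
    """
    n = len(sample)
    order = sorted(range(n), key=lambda i: sample[i])
    std_normal = NormalDist()
    out = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # rank / (n + 1) keeps the probability strictly inside (0, 1).
        out[idx] = std_normal.inv_cdf(rank / (n + 1))
    return out


def fit_centroids(samples: Sequence[Sequence[float]],
                  labels: Sequence[str]) -> Dict[str, List[float]]:
    """Average the training samples of each class, gene by gene."""
    by_label: Dict[str, List[Sequence[float]]] = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    return {label: [mean(col) for col in zip(*rows)]
            for label, rows in by_label.items()}


def predict(centroids: Dict[str, List[float]], sample: Sequence[float]) -> str:
    """Return the label of the closest centroid (squared Euclidean distance)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], sample))


# Hypothetical microarray training data: four genes, log-scale intensities.
microarray = [
    [12.0, 11.0, 5.0, 6.0],
    [11.5, 12.5, 6.5, 5.5],
    [5.0, 6.0, 12.0, 11.0],
    [6.5, 5.5, 11.0, 12.0],
]
labels = ["A", "A", "B", "B"]
centroids = fit_centroids([rank_to_normal(s) for s in microarray], labels)

# Hypothetical RNA-seq samples: raw counts on a very different scale.
rnaseq_1 = [900, 1500, 10, 30]
rnaseq_2 = [3, 20, 5000, 800]
prediction_1 = predict(centroids, rank_to_normal(rnaseq_1))
prediction_2 = predict(centroids, rank_to_normal(rnaseq_2))
```

Because both platforms are mapped onto the same rank-derived scale before training and prediction, the classifier never sees the raw difference between intensities and counts. A real analysis would involve thousands of genes and a properly validated model.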

Main Methods:

Development of the Training Distribution Matching (TDM) algorithm.
Evaluation of TDM against quantile normalization, nonparanormal transformation, and log2 transformation.
Assessment using simulated and biological gene expression datasets.
Inclusion of both supervised and unsupervised machine learning approaches in the evaluation.
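Two of the comparison methods named above, log2 transformation and quantile normalization, are simple enough to sketch. The following is a minimal illustration, not the study's implementation and not the TDM algorithm itself. The function names, the pseudocount of 1, and the example values are assumptions made for this sketch.

```python
import math
from typing import List, Sequence


def log2_transform(values: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """Return log2(x + pseudocount) for each value.

    The pseudocount (an assumption here) keeps zero counts finite.
    """
    return [math.log2(v + pseudocount) for v in values]


def quantile_normalize_to_reference(target: Sequence[float],
                                    reference: Sequence[float]) -> List[float]:
    """Map target values onto the reference distribution by rank.

    The smallest target value receives the smallest reference value, and so
    on, with linear interpolation when the two lengths differ. Ties in the
    target are broken by input order (a simplification).
    """
    if not target or not reference:
        raise ValueError("target and reference must be non-empty")
    ref_sorted = sorted(reference)
    n, m = len(target), len(ref_sorted)
    order = sorted(range(n), key=lambda i: target[i])
    out = [0.0] * n
    for rank, idx in enumerate(order):
        # Relative position of this rank in [0, 1], then the matching
        # (possibly fractional) index into the sorted reference.
        q = rank / (n - 1) if n > 1 else 0.5
        pos = q * (m - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        out[idx] = ref_sorted[lo] * (1 - frac) + ref_sorted[hi] * frac
    return out


# Hypothetical values: microarray intensities (already log-scale) and raw
# RNA-seq counts for the same five genes.
microarray_reference = [4.0, 6.0, 8.0, 10.0, 12.0]
rnaseq_counts = [0, 3, 15, 1023, 63]

logged = log2_transform(rnaseq_counts)
matched = quantile_normalize_to_reference(rnaseq_counts, microarray_reference)
```

With these inputs, `logged` is `[0.0, 2.0, 4.0, 10.0, 6.0]` and `matched` is `[4.0, 6.0, 8.0, 12.0, 10.0]`. The log2 transform compresses the dynamic range of the counts but leaves them on their own scale, whereas quantile normalization forces the RNA-seq values to take on the reference distribution exactly while preserving their rank order.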

Main Results:

TDM demonstrated consistently strong performance across various settings.
Quantile normalization also showed good performance in many scenarios.
The study provides a TDM package for the R programming language for practical application.

Conclusions:

TDM is an effective method for integrating RNA-sequencing and microarray gene expression data.
The developed method enhances the utility of legacy data for contemporary machine learning analyses.
The availability of the R package promotes wider adoption and improved gene expression research.