



Cross-platform normalization of microarray and RNA-seq data for machine learning applications.

Jeffrey A Thompson1, Jie Tan2, Casey S Greene3

  • 1Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America; Quantitative Biomedical Sciences Program, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America.

PeerJ | February 5, 2016
Summary
This summary is machine-generated.

Training Distribution Matching (TDM) transforms RNA-sequencing data so that it can be used with machine learning models trained on legacy microarray data. This makes larger, combined training sets available for gene expression analysis.

Keywords:
Cross-platform normalization; Distribution; Gene expression; Machine learning; Microarray; Nonparanormal transformation; Normalization; Quantile normalization; RNA-sequencing; Training


Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Large-scale gene expression datasets are crucial for machine learning applications.
  • Microarray data represents a significant legacy resource, while RNA-sequencing is the current standard.
  • Integrating these diverse datasets is challenging due to technological differences.

Purpose of the Study:

  • To develop a method for harmonizing RNA-sequencing data with legacy microarray data for machine learning.
  • To enable the creation of larger, more diverse training datasets for gene expression analysis.
  • To facilitate the application of models trained on historical data to new RNA-sequencing data.

Main Methods:

  • Development of Training Distribution Matching (TDM) algorithm.
  • Evaluation of TDM against quantile normalization, nonparanormal transformation, and log2 transformation.
  • Assessment using simulated and biological gene expression datasets.
  • Inclusion of both supervised and unsupervised machine learning approaches in the evaluation.
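
To make two of the comparison methods listed above concrete, the following is a minimal Python sketch of a log2 transformation and of quantile normalization against a reference distribution. It is not the TDM algorithm, whose internals this summary does not describe, and it is not the API of the authors' R package. The function names, the pseudocount of 1, and the tie handling are illustrative assumptions.

```python
import math


def log2_transform(counts, pseudocount=1.0):
    """Log2-transform raw counts; the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(c + pseudocount) for c in counts]


def quantile_normalize_to_reference(target, reference):
    """Map each target value onto the reference value at the same relative rank.

    Hypothetical helper: `reference` stands in for values from the training
    platform (e.g. microarray), `target` for values from the new platform
    (e.g. RNA-seq). Ties in `target` are broken by position rather than
    averaged, which is a simplification.
    """
    ref_sorted = sorted(reference)
    n_t, n_r = len(target), len(ref_sorted)
    if n_t == 0 or n_r == 0:
        raise ValueError("target and reference must be non-empty")

    # Indices of target values in ascending order of value.
    order = sorted(range(n_t), key=lambda i: target[i])

    out = [0.0] * n_t
    for rank, idx in enumerate(order):
        # Relative rank in [0, 1], then the matching position in the reference.
        q = rank / (n_t - 1) if n_t > 1 else 0.5
        pos = q * (n_r - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n_r - 1)
        frac = pos - lo
        # Linear interpolation between neighbouring reference quantiles.
        out[idx] = ref_sorted[lo] * (1 - frac) + ref_sorted[hi] * frac
    return out


if __name__ == "__main__":
    rnaseq_counts = [100, 0, 10, 1000]      # made-up counts for one sample
    microarray_ref = [2.0, 4.0, 6.0, 8.0]   # made-up reference intensities
    print(log2_transform(rnaseq_counts))
    print(quantile_normalize_to_reference(rnaseq_counts, microarray_ref))
```

After this mapping the target values keep their rank order but take on the reference distribution, which is the property that lets a model trained on the reference platform see inputs on a familiar scale.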

Main Results:

  • TDM demonstrated consistently strong performance across various settings.
  • Quantile normalization also showed good performance in many scenarios.
  • The study provides a TDM package for the R programming language for practical application.

Conclusions:

  • TDM is an effective method for integrating RNA-sequencing and microarray gene expression data.
  • The developed method enhances the utility of legacy data for contemporary machine learning analyses.
  • The availability of the R package promotes wider adoption and improved gene expression research.