Optimising parallel R correlation matrix calculations on gene expression data using MapReduce.

Shicai Wang (1), Ioannis Pandis (2), David Johnson (3)

  • (1) Data Science Institute, Imperial College London, London, UK. s.wang11@imperial.ac.uk.

BMC Bioinformatics | November 6, 2014
Summary
This summary is machine-generated.

We developed a MapReduce-based algorithm to speed up correlation calculations for large molecular datasets. This approach significantly improves the efficiency of analyzing high-throughput sequencing data for better clinical decision-making.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • High-throughput molecular profiling generates large datasets that support clinical decision-making by stratifying subjects.
  • Unsupervised clustering algorithms are used for stratification, but their speed is limited by inefficient correlation matrix calculations.
  • The increasing scale of molecular data necessitates optimized algorithms to maintain performance.
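To make the cost concrete, the following is a minimal single-process sketch of a Pearson correlation matrix, written in Python for illustration only (the study itself works in R). The function names and the tiny `profiles` data set are hypothetical. With n profiles there are n(n-1)/2 pairs to correlate, which is why this step comes to dominate clustering time as data sets grow.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return cov / norm  # raises ZeroDivisionError for a constant profile


def correlation_matrix(rows):
    """Symmetric matrix of pairwise correlations between rows."""
    n = len(rows)
    matrix = [[1.0] * n for _ in range(n)]
    # Each unordered pair is computed once and mirrored across the diagonal.
    for i, j in combinations(range(n), 2):
        r = pearson(rows[i], rows[j])
        matrix[i][j] = matrix[j][i] = r
    return matrix


# Hypothetical toy data: three expression profiles over four samples.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
corr = correlation_matrix(profiles)
```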

Purpose of the Study:

  • To evaluate current parallel correlation calculation methods.
  • To introduce an efficient MapReduce-based algorithm for optimizing correlation calculations in large-scale molecular data analysis.
  • To improve the speed and scalability of analyzing high-throughput genomic data.

Main Methods:

  • Developed a data distribution and parallel calculation algorithm using the MapReduce framework.
  • Implemented the algorithm using the R package RHIPE.
  • Evaluated performance using micro- and macro-benchmarks with gene expression data and the TCGA dataset.
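The summary does not describe the data distribution scheme in detail, so the sketch below shows only one generic way to phrase a correlation matrix as map and reduce steps: rows are split into blocks, each map task handles one pair of blocks, and the reduce step assembles the partial results. It is written in Python rather than R/RHIPE, the built-in `map` stands in for a distributed map phase, and all names (`split_blocks`, `map_block_pair`, `reduce_to_matrix`, `blockwise_correlation`) are hypothetical. It should not be read as the authors' algorithm.

```python
import math
from itertools import combinations_with_replacement


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def split_blocks(n_rows, block_size):
    """Return (start, stop) row ranges that cover 0..n_rows in order."""
    return [(s, min(s + block_size, n_rows)) for s in range(0, n_rows, block_size)]


def map_block_pair(task):
    """Map step: correlate every row of one block with every row of another."""
    rows, (a_start, a_stop), (b_start, b_stop) = task
    partial = []
    for i in range(a_start, a_stop):
        for j in range(b_start, b_stop):
            if i < j:  # skip the diagonal and avoid duplicate pairs
                partial.append(((i, j), pearson(rows[i], rows[j])))
    return partial


def reduce_to_matrix(n_rows, mapped):
    """Reduce step: merge the partial results into one symmetric matrix."""
    matrix = [[1.0] * n_rows for _ in range(n_rows)]
    for partial in mapped:
        for (i, j), r in partial:
            matrix[i][j] = matrix[j][i] = r
    return matrix


def blockwise_correlation(rows, block_size=2):
    blocks = split_blocks(len(rows), block_size)
    # One task per unordered pair of blocks, including a block with itself.
    tasks = [(rows, a, b) for a, b in combinations_with_replacement(blocks, 2)]
    mapped = map(map_block_pair, tasks)  # sequential stand-in for the map phase
    return reduce_to_matrix(len(rows), mapped)


# Hypothetical toy data: five expression profiles over four samples.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0, 4.0],
    [2.0, 1.0, 4.0, 3.0],
]
block_corr = blockwise_correlation(profiles, block_size=2)
```

Because the tasks share no state, they can be handed to any worker pool. In a real deployment the block size would trade off task count against the amount of data each worker has to receive.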

Main Results:

  • In micro-benchmarks, the MapReduce-based RHIPE implementation showed significant speedups (3.26-5.83x) over default Snowfall and basic RHIPE for Euclidean distance and for Pearson and Spearman correlation calculations.
  • In macro-benchmarks, optimized RHIPE was 2.03-16.56x faster than vanilla R and 1.22-1.71x faster than optimized Snowfall, helped by faster data preparation.
  • Both optimized RHIPE and optimized Snowfall completed the Kendall correlation on the TCGA dataset within 7 hours, more than 30x faster than the estimated run time of vanilla R.
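For readers unfamiliar with the measures named in these results, the sketch below gives textbook definitions in plain Python; it is illustrative only and is not the R code that was benchmarked. Spearman is computed as Pearson on average ranks, and Kendall is the naive tau-a variant, which compares every pair of observations. That quadratic inner loop is one plausible reason the Kendall run is by far the most expensive. The function names and the sample vectors `x` and `y` are hypothetical.

```python
import math
from itertools import combinations


def euclidean(x, y):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of positions pos+1 .. end+1
        for k in order[pos:end + 1]:
            ranks[k] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Naive Kendall tau-a: O(n^2) comparisons, no correction for ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical sample vectors.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
distance = euclidean(x, y)
rho = spearman(x, y)
tau = kendall_tau_a(x, y)
```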

Conclusions:

  • The MapReduce algorithm implemented in RHIPE outperforms vanilla R and conventional parallel approaches, such as those built on the R package Snowfall, for molecular data analysis.
  • The MapReduce framework is highly promising for analyzing large, high-dimensional genomic datasets.
  • This algorithm serves as a foundation for optimizing correlation calculations in Big Data from high-throughput molecular profiling.