Simple integrative preprocessing preserves what is shared in data sources.

Abhishek Tripathi¹, Arto Klami, Samuel Kaski

  • ¹ Department of Computer Science, P.O. Box 68, FI-00014, University of Helsinki, Finland. abhishek@cs.helsinki.fi

BMC Bioinformatics | February 23, 2008

Summary
This summary is machine-generated.

This study introduces a novel data fusion method using Canonical Correlation Analysis (CCA) for bioinformatics. The approach integrates multiple data sources, preserving shared properties while discarding source-specific noise for faster, interpretable analysis.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Data Science

Background:

  • Bioinformatics tools require fast, interpretable data integration methods for exploratory analysis.
  • Existing methods like Principal Components Analysis (PCA) and Canonical Correlation Analysis (CCA) have limitations in handling multi-source data.
  • Focus is on vector-valued data where source-specific variation is considered noise.

Purpose of the Study:

  • To develop a general-purpose, fast, and interpretable preprocessing tool for data integration in bioinformatics.
  • To fuse multiple data sources by preserving shared properties and discarding source-specific variation.
  • To enable effective exploratory data analysis through a novel application of CCA.

Main Methods:

  • A novel data fusion method is proposed by combining components from Canonical Correlation Analysis (CCA).

  • The method performs linear feature extraction, preserving shared variation across data sources.
  • Source-specific variation is identified and discarded as uninteresting noise.
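
The idea of keeping shared variation and dropping source-specific variation can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: it assumes that the CCA components are obtained by whitening each source and then running PCA on the concatenated whitened data, which is one standard way to compute a generalized CCA. The function names `whiten` and `fuse_sources`, the synthetic data, and all parameter values are hypothetical.

```python
import numpy as np


def whiten(X, eps=1e-8):
    """Center X and rescale it so its sample covariance is the identity."""
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U @ diag(S) @ Vt. The columns of U are centered and orthonormal.
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    keep = S > eps * S.max()  # drop numerically degenerate directions
    return U[:, keep] * np.sqrt(X.shape[0] - 1)


def fuse_sources(sources, n_components):
    """Project several co-occurring data sources onto their shared directions.

    Assumption: generalized CCA is computed as PCA on the concatenation of
    the individually whitened sources. After whitening, within-source
    variance is equalized, so the leading principal components are
    dominated by variation that is correlated across sources.
    """
    Z = np.hstack([whiten(X) for X in sources])
    # Z is already centered, so PCA reduces to an SVD of Z.
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


# Synthetic example: one shared signal plus strong source-specific noise.
rng = np.random.default_rng(0)
n = 500
shared = rng.normal(size=n)
source_a = np.column_stack([shared + 0.1 * rng.normal(size=n),
                            5.0 * rng.normal(size=n)])
source_b = np.column_stack([shared + 0.1 * rng.normal(size=n),
                            5.0 * rng.normal(size=n)])

fused = fuse_sources([source_a, source_b], n_components=1)
```

In this toy setup the noise columns have far larger variance than the shared signal, so plain PCA on the raw concatenation would favor the noise. Whitening first removes that advantage, and the leading fused component tracks `shared` instead. The projection remains linear in the original features, which is what keeps it fast and interpretable.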

Main Results:

  • The developed method is linear, fast, and easily interpretable.
  • It successfully fuses multiple data sources, retaining essential shared characteristics.
  • Demonstrated effectiveness on gene expression data for yeast cell cycle, leukemia gene expression, and yeast stress response classification.

Conclusions:

  • A new method for data fusion in exploratory data analysis is introduced.
  • The method leverages Canonical Correlation Analysis (CCA) for dimensionality reduction in a novel way.
  • The approach offers simplicity, speed, and interpretability as a linear projection for multi-source data analysis.