Principal component analysis for clustering gene expression data.

K Y Yeung¹, W L Ruzzo

  • ¹ Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA 98195, USA. kayee@cs.washington.edu

Bioinformatics (Oxford, England) | October 9, 2001
Summary
This summary is machine-generated.

Principal component analysis (PCA) does not always improve the clustering of gene expression data. Clustering on principal components (PCs) instead of the original variables can degrade cluster quality, particularly when only the first few PCs are used, because those components may not capture the cluster structure.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Gene expression data analysis requires robust methodologies due to high dimensionality and complex biological networks.
  • Clustering and Principal Component Analysis (PCA) are common exploratory techniques for gene expression data.
  • Different analytical approaches can yield divergent conclusions from the same dataset.

Purpose of the Study:

  • To evaluate the effectiveness of principal components (PCs) in preserving cluster structures within gene expression data.
  • To compare clustering quality on original data versus data projected onto principal component axes.

Main Methods:

  • Utilized both real and synthetic gene expression datasets.
  • Performed clustering on original data and compared it to clustering on data reduced to principal component subspaces.
  • Assessed cluster quality across different clustering algorithms and similarity metrics.

Main Results:

  • Clustering using PCs often degrades, rather than improves, cluster quality compared to using original variables.
  • The leading principal components, while capturing data variance, do not necessarily capture the underlying cluster structure.
  • The impact of PCA on clustering varies significantly depending on the algorithm and similarity metric employed.

Conclusions:

  • Principal Component Analysis (PCA) is generally not recommended as a pre-processing step before clustering gene expression data.
  • Exceptions may exist, but careful evaluation is needed for specific applications.
  • The study highlights the importance of validating analytical choices in bioinformatics.
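
The comparison described under Main Methods can be sketched on toy data. This is an illustration under stated assumptions, not the authors' procedure: the summary does not name the algorithms or the quality measure, so k-means and the adjusted Rand index are choices made here, and the two-column synthetic matrix, the helper names (`pca_scores`, `kmeans`, `adjusted_rand_index`), and all parameter values are hypothetical. The data are built so that the highest-variance column is unrelated to the two clusters, which is the situation in which the first PC would be expected to miss the cluster structure.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project mean-centered rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(X, k, n_restarts=100, max_iter=100, seed=0):
    """Lloyd's k-means with random restarts; returns the lowest-inertia labels."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_restarts):
        # Start from k distinct rows of the data.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        inertia = ((X - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def adjusted_rand_index(a, b):
    """Agreement between two partitions, corrected for chance (1.0 = identical)."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=np.int64)
    np.add.at(table, (ai, bi), 1)  # contingency table of label pairs

    def pairs(x):
        return x * (x - 1) / 2.0

    sum_cells = pairs(table).sum()
    sum_rows = pairs(table.sum(axis=1)).sum()
    sum_cols = pairs(table.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / pairs(len(ai))
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy "genes x conditions" matrix with two known clusters.
n_genes = 200
truth = np.arange(n_genes) % 2
high_variance = np.linspace(-6.0, 6.0, n_genes)   # large spread, unrelated to clusters
cluster_signal = np.where(truth == 0, -3.3, 3.3)  # smaller spread, separates clusters
X = np.column_stack([high_variance, cluster_signal])

ari_original = adjusted_rand_index(truth, kmeans(X, k=2))
ari_pc1 = adjusted_rand_index(truth, kmeans(pca_scores(X, 1), k=2))
ari_all_pcs = adjusted_rand_index(truth, kmeans(pca_scores(X, 2), k=2))

print(f"original variables: ARI = {ari_original:.3f}")
print(f"first PC only:      ARI = {ari_pc1:.3f}")
print(f"all PCs:            ARI = {ari_all_pcs:.3f}")
```

Keeping every PC only rotates the data, so any loss in agreement with the known labels comes from discarding components, not from PCA itself. On real expression data the same loop could be repeated over several numbers of retained PCs, algorithms, and similarity metrics before deciding whether PCA helps for a given application.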