Principal component analysis for clustering gene expression data.

K Y Yeung¹, W L Ruzzo

¹Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA 98195, USA. kayee@cs.washington.edu

Bioinformatics (Oxford, England)

October 9, 2001

Summary

This summary is machine-generated.

Principal Component Analysis (PCA) does not always improve the clustering of gene expression data. Clustering on principal components (PCs) instead of the original variables can degrade cluster quality, especially when only the first few PCs are used: these capture the most variance, but they may not capture the cluster structure.
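Why the leading PCs can miss clusters follows from how PCs are defined. In the standard formulation (notation assumed here, not taken from the study), the PCs are the eigenvectors of the covariance matrix S of the profiles x_1, ..., x_n, ordered by eigenvalue, and the data reduced to the first q PCs are the coordinates z_i. The toy case below is a hypothetical illustration, not data from the study.

```latex
% First PC: the unit direction of maximum variance; reduced coordinates on the first q PCs
\mathbf{w}_1 = \arg\max_{\lVert \mathbf{w} \rVert = 1} \mathbf{w}^\top S\, \mathbf{w},
\qquad
\mathbf{z}_i = W_q^\top (\mathbf{x}_i - \bar{\mathbf{x}}),
\quad W_q = [\mathbf{w}_1, \dots, \mathbf{w}_q]

% Toy case: two equal-sized groups with means (0, +1) and (0, -1),
% within-group variances 4 (first coordinate) and 0.01 (second), no correlation
S = \begin{pmatrix} 4 & 0 \\ 0 & 1 + 0.01 \end{pmatrix}
\quad\Longrightarrow\quad
\mathbf{w}_1 = (1, 0)^\top
```

The group separation adds only 1 to the variance of the second coordinate, which stays below the within-group variance of 4 in the first, so projecting onto w_1 alone (q = 1) discards exactly the coordinate that distinguishes the groups.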

Area of Science:

Bioinformatics
Computational Biology
Genomics

Background:

Gene expression data are high-dimensional and shaped by complex biological networks, so their analysis requires robust methods.
Clustering and Principal Component Analysis (PCA) are common exploratory techniques for gene expression data.
Different analytical approaches can yield divergent conclusions from the same dataset.
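Reducing data to its first few PCs can be sketched in pure Python as follows, assuming rows are observations (for example, genes) and columns are variables (for example, experimental conditions). The helper names `covariance_matrix`, `leading_eigenvectors`, and `project_onto_pcs` are hypothetical, and power iteration with deflation stands in for the eigensolver or SVD a numerical library would normally provide.

```python
import math
import random


def covariance_matrix(data):
    """Return the sample covariance (n - 1 denominator) and the mean-centered rows.

    Rows of `data` are observations; columns are variables.
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    return cov, centered


def leading_eigenvectors(cov, q, iters=500, seed=0):
    """Top-q unit eigenvectors of a symmetric PSD matrix via power iteration
    with deflation (assumes well-separated leading eigenvalues)."""
    p = len(cov)
    rng = random.Random(seed)
    a = [row[:] for row in cov]  # working copy, deflated after each component
    vectors = []
    for _ in range(q):
        v = [rng.random() + 0.1 for _ in range(p)]  # nonzero starting vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # remaining matrix is (numerically) zero
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance along v (its eigenvalue)
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(p)) for i in range(p))
        vectors.append(v)
        # Deflate so the next pass finds the next-largest component
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return vectors


def project_onto_pcs(data, q):
    """Coordinates of each centered row on the first q principal components."""
    cov, centered = covariance_matrix(data)
    pcs = leading_eigenvectors(cov, q)
    return [[sum(x * w for x, w in zip(r, pc)) for pc in pcs] for r in centered]


# Hypothetical toy data: the second column alternates between two levels
# (two groups), while the first column has far larger variance.
toy = [[-4.0, 0.5], [-2.0, -0.5], [0.0, 0.5], [2.0, -0.5], [4.0, 0.5]]
print(project_onto_pcs(toy, 1))
```

In this toy data the second column splits the rows into two groups (rows 1, 3, 5 versus rows 2 and 4), but PC1 aligns with the first column, so the one-PC coordinates are, up to sign, -4, -2, 0, 2, 4 and interleave the two groups.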

Purpose of the Study:

To evaluate the effectiveness of principal components (PCs) in preserving cluster structures within gene expression data.
To compare clustering quality on original data versus data projected onto principal component axes.

Main Methods:

Utilized both real and synthetic gene expression datasets.
Performed clustering on original data and compared it to clustering on data reduced to principal component subspaces.
Assessed cluster quality across different clustering algorithms and similarity metrics.
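The summary does not name the cluster-quality measure. One common choice when the true classes are known (as with synthetic data) is the adjusted Rand index, which scores agreement between two partitions while correcting for chance. The sketch below assumes that measure; `adjusted_rand_index` is a hypothetical helper, not the study's code.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat partitions of the same items.

    1.0 means identical partitions (up to relabeling); random labelings score
    close to 0 in expectation; negative values are possible.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0
    # Pairs of items placed together in both partitions (contingency-table cells)
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together within each partition on its own
    pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = pairs_true * pairs_pred / comb(n, 2)
    maximum = (pairs_true + pairs_pred) / 2
    if maximum == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (pairs_both - expected) / (maximum - expected)


known_classes = [0, 0, 1, 1]
print(adjusted_rand_index(known_classes, [1, 1, 0, 0]))  # relabeled but identical partition
print(adjusted_rand_index(known_classes, [0, 0, 1, 2]))  # one class split in two
```

Because the index compares partitions rather than label values, the same function can score a clustering of the original data and a clustering of its PC projection against the same known classes.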

Main Results:

Clustering on PCs instead of the original variables often degrades, rather than improves, cluster quality.
The leading principal components, while capturing data variance, do not necessarily capture the underlying cluster structure.
The impact of PCA on clustering varies significantly depending on the algorithm and similarity metric employed.
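Part of why the similarity metric matters can be seen on raw profiles, before any projection. In the sketch below, the profiles `gene_a`, `gene_b`, and `gene_c` are hypothetical values invented for illustration, not data from the study. Euclidean distance and Pearson correlation disagree about which profile is most similar to `gene_a`, so clusterings built on the two metrics can differ.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a, b):
    """Pearson correlation between two expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant profile")
    return sum(x * y for x, y in zip(da, db)) / denom


# Hypothetical profiles across four conditions
gene_a = [1.0, 2.0, 3.0, 2.0]
gene_b = [10.0, 20.0, 30.0, 20.0]  # same shape as gene_a, ten times larger
gene_c = [3.0, 2.0, 1.0, 2.0]      # similar magnitude, opposite shape

candidates = {"gene_b": gene_b, "gene_c": gene_c}
# Smallest distance vs. largest correlation to gene_a
closest_euclidean = min(candidates, key=lambda k: euclidean(gene_a, candidates[k]))
closest_correlation = max(candidates, key=lambda k: pearson(gene_a, candidates[k]))
print(closest_euclidean, closest_correlation)  # gene_c gene_b
```

Euclidean distance is driven by magnitude, while correlation compares only the shape of the profiles, so each metric pairs `gene_a` with a different neighbor.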

Conclusions:

Principal Component Analysis (PCA) is generally not recommended as a preprocessing step for clustering gene expression data.
Exceptions may exist, but the effect of PCA on clustering should be evaluated for each specific application.
The study highlights the importance of validating analytical choices in bioinformatics.