Efficient clustering of large EST data sets on parallel computers.

Anantharaman Kalyanaraman1, Srinivas Aluru, Suresh Kothari

  • 1Department of Computer Science, Iowa State University, Ames, IA 50011, USA.

Nucleic Acids Research | May 29, 2003
Summary
This summary is machine-generated.

PaCE (Parallel Clustering of ESTs) clusters expressed sequence tags (ESTs) on parallel computers, which speeds up gene identification and makes large data sets tractable. It keeps memory use linear in the input size and uses parallel processing to shorten run times, so it can cluster much larger EST data sets than existing software.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Clustering expressed sequence tags (ESTs) is crucial for gene identification, expression studies, and genetic variation discovery.
  • Existing EST clustering software is limited in the size of data set it can process within practical run-time and memory budgets.
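
The following calculation is illustrative and is not reported in the study. It gives a sense of scale for the n = 168,200 Arabidopsis ESTs clustered in this work. A naive strategy that aligned every pair would need about 1.4 × 10^10 pairwise comparisons:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}
  = \frac{168{,}200 \times 168{,}199}{2}
  = 14{,}145{,}535{,}900 \approx 1.41 \times 10^{10}
```

Because this count grows quadratically with n, limiting which pairs are compared plausibly matters as much as adding processors.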

Purpose of the Study:

  • To develop and evaluate PaCE (Parallel Clustering of ESTs), a software program designed for rapid and efficient EST clustering on parallel systems.
  • To address limitations in memory usage and computational time associated with large-scale EST data analysis.

Main Methods:

  • Designed memory-efficient algorithms whose memory requirement is linear in the size of the input.
  • Combined algorithmic techniques that reduce the computational work without sacrificing clustering quality.

  • Used parallel processing to reduce run time and enable clustering of larger EST data sets.
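
The clustering idea can be illustrated with a minimal, sequential Python sketch. This is not PaCE's implementation. The summary does not describe how PaCE generates or tests candidate pairs, so two stand-ins are used here: a shared k-mer index and an exact suffix/prefix overlap test. All names (`UnionFind`, `candidate_pairs`, `overlaps`, `cluster_ests`, `k`, `min_overlap`) are hypothetical. Under these assumptions the sketch shows two ideas from the methods above:

* Cluster membership lives in a union-find structure whose memory is linear in the number of ESTs.
* A candidate pair whose ESTs already share a cluster is skipped without running the overlap test. This saves work without changing the final clusters.

```python
from collections import defaultdict
from collections.abc import Iterator
from itertools import combinations


class UnionFind:
    """Disjoint-set forest; memory is O(n) for n ESTs."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


def candidate_pairs(ests: list[str], k: int) -> Iterator[tuple[int, int]]:
    """Hypothetical stand-in for pair generation: yield (i, j) for ESTs sharing a k-mer.

    The index holds at most one entry per k-mer occurrence, so it is linear in
    total sequence length. A pair may be yielded once per shared k-mer.
    """
    index: dict[str, set[int]] = defaultdict(set)
    for i, seq in enumerate(ests):
        for p in range(len(seq) - k + 1):
            index[seq[p:p + k]].add(i)
    for ids in index.values():
        yield from combinations(sorted(ids), 2)


def overlaps(a: str, b: str, min_overlap: int) -> bool:
    """Hypothetical stand-in for alignment: exact containment or suffix/prefix overlap."""
    if a in b or b in a:
        return True
    for length in range(min(len(a), len(b)), max(min_overlap, 1) - 1, -1):
        if a[-length:] == b[:length] or b[-length:] == a[:length]:
            return True
    return False


def cluster_ests(ests: list[str], k: int, min_overlap: int) -> tuple[list[list[int]], int]:
    """Return (clusters of EST indices, number of overlap tests actually run)."""
    uf = UnionFind(len(ests))
    tests = 0
    for i, j in candidate_pairs(ests, k):
        if uf.find(i) == uf.find(j):
            continue  # already in one cluster: skip the expensive test
        tests += 1
        if overlaps(ests[i], ests[j], min_overlap):
            uf.union(i, j)
    groups: dict[int, list[int]] = defaultdict(list)
    for i in range(len(ests)):
        groups[uf.find(i)].append(i)
    return sorted(groups.values()), tests


if __name__ == "__main__":
    toy = ["ACGTACGTTTGACCA", "TTGACCAGGCATTAC", "GGCATTACCCGATAA", "TTTTTTTTCCCCCCCC"]
    clusters, tests = cluster_ests(toy, k=4, min_overlap=6)
    print(clusters, tests)  # [[0, 1, 2], [3]] 2
```

On the four toy sequences, the first three chain together through exact overlaps of 7 and 8 bases, and the fourth stays alone. Of the nine candidate pairs generated, only two reach the overlap test; the rest are skipped because their ESTs are already clustered together.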

Main Results:

  • Clustered 168,200 Arabidopsis ESTs in 15 minutes on a 30-node IBM xSeries cluster.
  • Clustered 327,632 rat ESTs in 47 minutes and 420,694 Triticum aestivum (wheat) ESTs in 3.25 hours.
  • Achieved clustering quality comparable to that of CAP3, a widely used EST assembly program, while handling much larger data sets than existing software.

Conclusions:

  • PaCE substantially improves the speed and scalability of EST clustering.
  • Because it runs quickly, PaCE makes it practical to repeat clustering with different parameters, giving biologists a better tool for analyzing EST data.
  • PaCE has been applied to EST data from 23 plant species, and the results are available on the PlantGDB website.