Accelerating protein classification using suffix trees.

B Dorohonceanu¹, C G Nevill-Manning

¹Computer Science Department, Rutgers University, Piscataway, NJ 07310, USA. dbogdan@cs.rutgers.edu

Proceedings. International Conference on Intelligent Systems for Molecular Biology

September 8, 2000

Summary

This summary is machine-generated.

This study introduces a suffix tree method that accelerates position-specific scoring matrix (PSSM) searches for conserved protein regions. The method excludes large groups of protein segments from evaluation at once, speeding up searches by up to tenfold.

Area of Science:

Bioinformatics
Computational Biology
Genomics and Proteomics

Background:

Position-specific scoring matrices (PSSMs) are crucial for identifying conserved protein regions.
Current methods for PSSM searches can be computationally intensive.
The need for faster algorithms to analyze large protein sequence datasets is growing.
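
To make the cost concrete, here is a minimal sketch of the baseline that such searches start from: scoring every window of a sequence against a PSSM. It is an illustration only, not the authors' code. The matrix layout (a list of per-column dictionaries), the `default` score for unlisted residues, the function names, and the toy values are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Assumed representation: one dict per matrix column, mapping residue -> score.
PSSM = List[Dict[str, float]]


def score_window(pssm: PSSM, window: str, default: float = -1.0) -> float:
    """Sum the column scores of `window`; unlisted residues score `default`."""
    return sum(col.get(res, default) for col, res in zip(pssm, window))


def naive_scan(pssm: PSSM, sequence: str, threshold: float) -> List[Tuple[int, float]]:
    """Score every window of width len(pssm); keep those at or above threshold."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        s = score_window(pssm, sequence[start:start + width])
        if s >= threshold:
            hits.append((start, s))
    return hits


# Toy 3-column matrix with made-up scores (hypothetical, for illustration).
toy_pssm = [
    {"A": 2.0, "C": -1.0},
    {"C": 3.0, "G": 0.5},
    {"D": 1.5, "A": 0.0},
]
hits = naive_scan(toy_pssm, "GACDACA", threshold=5.0)
print(hits)
```

Every window is scored in full, so the work grows with sequence length times matrix width, and identical windows that occur more than once are scored again each time.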

Purpose of the Study:

To develop and present an accelerated method for searching protein sequences using PSSMs.
To leverage suffix tree data structures for enhanced search efficiency.
To reduce the computational resources required for identifying conserved protein regions.

Main Methods:

Implemented a search acceleration technique utilizing a suffix tree data structure.
Integrated early termination strategies for scoring matrix evaluation within the suffix tree framework.

Optimized suffix tree node storage to minimize memory usage (17 bytes per input symbol).
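
The idea behind these methods can be sketched as follows, under loud assumptions: the code uses a simple depth-limited suffix trie built from nested dictionaries rather than a compact suffix tree, it does not attempt the 17-byte node layout, and the pruning bound (best achievable score from the remaining columns) is one common way to stop evaluation early, not necessarily the rule used in the paper. All names and toy values are hypothetical.

```python
from typing import Dict, List, Tuple

PSSM = List[Dict[str, float]]
DEFAULT = -1.0  # assumed score for residues absent from a column


def build_trie(text: str, depth: int) -> dict:
    """Index every length-`depth` window; the '$' key at a leaf lists start positions."""
    root: dict = {}
    for start in range(len(text) - depth + 1):
        node = root
        for ch in text[start:start + depth]:
            node = node.setdefault(ch, {})
        node.setdefault("$", []).append(start)
    return root


def max_remaining(pssm: PSSM) -> List[float]:
    """best[i] is the highest score obtainable from columns i onward."""
    best = [0.0] * (len(pssm) + 1)
    for i in range(len(pssm) - 1, -1, -1):
        best[i] = best[i + 1] + max(max(pssm[i].values()), DEFAULT)
    return best


def pruned_search(pssm: PSSM, text: str, threshold: float) -> List[Tuple[int, float]]:
    """Walk the trie, scoring shared prefixes once and skipping hopeless subtrees."""
    root = build_trie(text, len(pssm))
    best = max_remaining(pssm)
    hits: List[Tuple[int, float]] = []
    stack = [(root, 0, 0.0)]
    while stack:
        node, depth, score = stack.pop()
        if depth == len(pssm):
            # Full-width match: report every position sharing this window.
            hits.extend((pos, score) for pos in node["$"])
            continue
        for ch, child in node.items():
            s = score + pssm[depth].get(ch, DEFAULT)
            # Early termination: even a perfect remainder cannot reach the
            # threshold, so the whole subtree under `child` is skipped.
            if s + best[depth + 1] < threshold:
                continue
            stack.append((child, depth + 1, s))
    return sorted(hits)


# Toy 3-column matrix with made-up scores (hypothetical, for illustration).
toy_pssm = [
    {"A": 2.0, "C": -1.0},
    {"C": 3.0, "G": 0.5},
    {"D": 1.5, "A": 0.0},
]
print(pruned_search(toy_pssm, "GACDACA", threshold=5.0))
```

In this toy run, the branches starting with G, C, and D are rejected after a single column, which discards every window below them without scoring the remaining positions. A production implementation would use a compressed suffix tree and a far more compact node encoding than Python dictionaries.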

Main Results:

The suffix tree-based method accelerates PSSM searches by up to a factor of ten.
The method efficiently prunes large portions of the sequence space, reducing unnecessary computations.
Suffix tree storage was reduced to 17 bytes per input symbol, making the approach practical for large datasets.
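
As a rough feel for what 17 bytes per input symbol means in practice, the following sketch multiplies that figure by a few database sizes. The sizes are hypothetical round numbers chosen for illustration, not datasets from the study, and the estimate ignores any overhead beyond the per-symbol figure.

```python
BYTES_PER_SYMBOL = 17  # per-symbol figure reported in the summary


def suffix_tree_bytes(num_symbols: int) -> int:
    """Estimated index size in bytes for a database of `num_symbols` residues."""
    return num_symbols * BYTES_PER_SYMBOL


# Hypothetical database sizes, in residues.
for n in (1_000_000, 10_000_000, 100_000_000):
    print(f"{n:>11,} residues -> {suffix_tree_bytes(n) / 1e6:,.0f} MB")
```

The footprint scales linearly with database size, so the per-symbol constant directly determines how large a database fits in memory.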

Conclusions:

The proposed suffix tree method offers a substantial improvement in the speed of PSSM searches.
This technique provides a computationally efficient solution for identifying conserved protein regions.
The optimized memory footprint makes this method suitable for large-scale bioinformatics analyses.