High similarity sequence comparison in clustering large sequence databases.

Lorie Dudoignon¹, Eric Glemet, Hendrik Cornelis Heus

  • ¹ IMT, INRIA, Marseille Cedex 20, 13451, France. Lorie.Dudoignon@sophia.inria.fr

Proceedings. IEEE Computer Society Bioinformatics Conference, April 20, 2005
Summary
This summary is machine-generated.

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

  • Easy identification of generalized common and conserved nested intervals. Journal of computational biology : a journal of computational molecular cell biology, 2014 (same author)
  • On the identification of conflicting contiguities in ancestral genome reconstruction. Journal of computational biology : a journal of computational molecular cell biology, 2013 (same author)
  • Identification of genomic features using microsyntenies of domains: domain teams. Genome research, 2005 (same author)
  • Approximate matching of structured motifs in DNA sequences. Journal of bioinformatics and computational biology, 2005 (same author)
  • Balancing protein similarity and gene co-expression reveals new links between genetic conservation and developmental diversity in invertebrates. Bioinformatics (Oxford, England), 2004 (same author)
  • Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching. Journal of computational biology : a journal of computational molecular cell biology, 2004 (same author)
  • Epitope prediction algorithms for peptide-based vaccine design. Proceedings. IEEE Computer Society Bioinformatics Conference, 2006 (same journal)
  • Keynote address: the role of algorithmic research in computational genomics. Proceedings. IEEE Computer Society Bioinformatics Conference, 2006 (same journal)
  • Stepping up the pace of discovery: the genomes to life program. Proceedings. IEEE Computer Society Bioinformatics Conference, 2006 (same journal)
  • Prokaryote phylogeny without sequence alignment: from avoidance signature to composition distance. Proceedings. IEEE Computer Society Bioinformatics Conference, 2006 (same journal)
  • Efficient reconstruction of phylogenetic networks with constrained recombination. Proceedings. IEEE Computer Society Bioinformatics Conference, 2006 (same journal)
  • A new approach for gene annotation using unambiguous sequence joining. Proceedings. IEEE Computer Society Bioinformatics Conference, 2006 (same journal)

We developed an algorithm for clustering and searching large sequence databases that is faster than conventional EST clustering methods. It handles both nucleotide and protein sequences and scales to inputs as large as entire chromosomes.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Sequence clustering and searching are fundamental in bioinformatics.
  • Existing algorithms scale poorly to large datasets and do not readily handle diverse sequence types.

Purpose of the Study:

  • To introduce a novel, fast algorithm for sequence clustering and searching.
  • To address limitations of conventional approaches for large-scale sequence data analysis.

Main Methods:

  • The algorithm uses a strictly defined similarity measure.
  • Its computational cost is proportional to the number of subwords the sequences share, not to the size of the database.
  • It applies to both nucleotide and protein sequences.
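The summary does not spell out the exact similarity definition, so the following is only a minimal Python sketch of the general subword-index idea, not the authors' method. It assumes, hypothetically, that two sequences are "similar" when they share at least `min_shared` distinct subwords of fixed length `k`, and that clusters are formed by single linkage. The names `subwords`, `build_index`, `search`, `cluster`, `k` and `min_shared` are illustrative. An inverted index maps each subword to the sequences containing it, so a pair of sequences is only examined if it actually shares a subword.

```python
from collections import defaultdict
from itertools import combinations


def subwords(seq: str, k: int) -> set[str]:
    """Distinct length-k subwords (k-mers) of seq; empty if seq is shorter than k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(seqs: dict[str, str], k: int) -> dict[str, list[str]]:
    """Inverted index (hypothetical design): subword -> ids of sequences containing it."""
    index: dict[str, list[str]] = defaultdict(list)
    for sid, seq in seqs.items():
        for word in subwords(seq, k):
            index[word].append(sid)
    return index


def search(query: str, index: dict[str, list[str]], k: int, min_shared: int) -> dict[str, int]:
    """Ids sharing at least min_shared distinct subwords with query.

    Only the index entries for the query's own subwords are visited, so the
    work tracks the number of hits rather than the number of stored sequences.
    """
    counts: dict[str, int] = defaultdict(int)
    for word in subwords(query, k):
        for sid in index.get(word, []):
            counts[sid] += 1
    return {sid: c for sid, c in counts.items() if c >= min_shared}


def cluster(seqs: dict[str, str], k: int, min_shared: int) -> list[set[str]]:
    """Single-linkage clusters (assumed rule): join sequences sharing >= min_shared subwords."""
    index = build_index(seqs, k)
    # Count shared subwords per pair; a pair is touched only if both members
    # appear in the same posting list, i.e. only if they share a subword.
    shared: dict[tuple[str, str], int] = defaultdict(int)
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1

    parent = {sid: sid for sid in seqs}  # union-find forest over sequence ids

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), count in shared.items():
        if count >= min_shared:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = defaultdict(set)
    for sid in seqs:
        groups[find(sid)].add(sid)
    return list(groups.values())


if __name__ == "__main__":
    db = {"a": "ACGTACGTGA", "b": "ACGTACGTTA", "c": "TTTTGGGCCC"}
    # a and b share four 4-mers (ACGT, CGTA, GTAC, TACG); c shares none.
    print(cluster(db, k=4, min_shared=3))
    print(search("ACGTACGA", build_index(db, 4), k=4, min_shared=3))  # → {'a': 4, 'b': 4}
```

In this sketch the pair-counting cost grows with the sum of squared posting-list lengths, which matches "proportional to shared subwords" only when subwords are reasonably specific. Very frequent subwords (for example low-complexity repeats) produce long posting lists and would need filtering in practice.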

Main Results:

  • Demonstrated superior speed compared to conventional EST clustering methods.
  • Successfully processed large sequence databases, including entire chromosomes.
  • Validated the algorithm's efficiency and scalability through theoretical analysis and experiments.

Conclusions:

  • The proposed algorithm clusters and searches large sequence databases faster than conventional EST clustering methods.
  • Its subword-based approach provides a scalable and versatile solution for sequence analysis.
  • Because it works on both nucleotide and protein sequences, the method is relevant to genomics and proteomics research alike.