High similarity sequence comparison in clustering large sequence databases.

Lorie Dudoignon¹, Eric Glemet, Hendrik Cornelis Heus

¹IMT, INRIA, Marseille Cedex 20, 13451, France. Lorie.Dudoignon@sophia.inria.fr

Proceedings. IEEE Computer Society Bioinformatics Conference

April 20, 2005

Summary

This summary is machine-generated.

We developed a fast algorithm for clustering and searching large sequence databases that runs faster than conventional EST clustering methods. It handles both nucleotide and protein sequences and scales to entire chromosomes.

Area of Science:

Bioinformatics
Computational Biology
Genomics

Background:

Sequence clustering and searching are fundamental in bioinformatics.
Existing algorithms face challenges with large datasets and diverse sequence types.

Purpose of the Study:

To introduce a novel, fast algorithm for sequence clustering and searching.
To address limitations of conventional approaches for large-scale sequence data analysis.

Main Methods:

The algorithm utilizes a strictly defined similarity measure.
Its computational cost scales with the number of subwords shared between sequences, not with database size.
It is applicable to both nucleotide and protein sequences.
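
The subword idea above can be illustrated with a minimal sketch. It is not the authors' algorithm: the summary does not state the actual similarity measure, so the sketch assumes a stand-in (the number of distinct length-k subwords two sequences share), and the names `kmers`, `shared_kmer_counts`, `cluster`, `k`, and `min_shared` are hypothetical. What it shows is why the work depends on shared subwords: an inverted index from subword to sequence ids means only pairs that actually share a subword are ever compared.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of distinct length-k subwords of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_counts(seqs, k):
    """Count distinct k-mers shared by each pair of sequences.

    Assumed stand-in similarity, not the measure used in the paper.
    Only pairs that co-occur in some index bucket are visited.
    """
    index = defaultdict(list)  # k-mer -> ids of sequences containing it
    for sid, seq in enumerate(seqs):
        for word in kmers(seq, k):
            index[word].append(sid)

    counts = defaultdict(int)  # (i, j) with i < j -> shared k-mer count
    for ids in index.values():
        for i, j in combinations(ids, 2):
            counts[(i, j)] += 1
    return counts


def cluster(seqs, k=4, min_shared=3):
    """Single-linkage clustering: join pairs sharing >= min_shared k-mers."""
    parent = list(range(len(seqs)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), n in shared_kmer_counts(seqs, k).items():
        if n >= min_shared:
            parent[find(j)] = find(i)

    groups = defaultdict(list)
    for sid in range(len(seqs)):
        groups[find(sid)].append(sid)
    return sorted(groups.values())


if __name__ == "__main__":
    example = ["ACGTACGTAA", "CGTACGTAAC", "TTTTGGGGCC", "TTTGGGGCCA"]
    print(cluster(example))
```

The same code works unchanged on protein strings, since it treats sequences as plain text over any alphabet. The threshold `min_shared` and subword length `k` are illustrative values, not parameters reported in the paper.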

Main Results:

Demonstrated superior speed compared to conventional EST clustering methods.
Successfully processed large sequence databases, including entire chromosomes.
Validated the algorithm's efficiency and scalability through theoretical analysis and experiments.

Conclusions:

The proposed algorithm offers a significant advancement in handling large sequence data.
Its subword-based approach provides a scalable and versatile solution for sequence analysis.
This method has broad implications for genomics and proteomics research.