Sequence embedding for fast construction of guide trees for multiple sequence alignment.

Gordon Blackshields [1], Fabian Sievers, Weifeng Shi

  • [1] UCD Conway Institute of Biomolecular and Biomedical Sciences, University College Dublin, Dublin 4, Ireland. fabian.sievers@ucd.ie.

Algorithms for Molecular Biology : AMB | May 18, 2010
Summary
This summary is machine-generated.

This study introduces efficient embedding methods for clustering large biological sequence datasets. These techniques significantly reduce computational demands for multiple sequence alignment.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Multiple sequence alignment (MSA) relies on initial sequence clustering.
  • Traditional clustering methods require on the order of N^2 computations for N sequences, which becomes prohibitive for large datasets (more than 10,000 sequences).
  • High computational and memory costs limit large-scale MSA.
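
To make the quadratic growth concrete, the short sketch below counts the distinct sequence pairs that a full pairwise distance matrix would need. The function name `pair_count` is hypothetical and chosen only for illustration; the arithmetic is simply N(N-1)/2.

```python
def pair_count(n: int) -> int:
    """Number of distinct unordered pairs among n sequences: n(n-1)/2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Print how the number of pairwise comparisons grows with dataset size.
    for n in (100, 1_000, 10_000, 100_000):
        print(f"{n:>7} sequences -> {pair_count(n):>13,} pairwise comparisons")
```

At 10,000 sequences this already amounts to roughly 50 million comparisons, and the count grows a hundredfold for every tenfold increase in N.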

Purpose of the Study:

  • To develop and test novel embedding methods for efficient clustering of large sequence sets.
  • To overcome the computational barriers of traditional pairwise distance calculations in sequence clustering.

Main Methods:

  • Utilized embedding methods to approximate sequence similarities in a reduced space.
  • Avoided the need for computing a full pairwise distance matrix for all sequences.
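
The general idea of embedding can be sketched as follows: each sequence is represented by a short vector of its distances to a small set of reference ("seed") sequences, so only N × (number of seeds) distances are computed rather than all N^2 pairs. This is a minimal illustration and not the authors' implementation. The k-mer based distance, the random choice of seeds, and the names `kmer_distance` and `embed` are all assumptions made for the example.

```python
import random
from collections import Counter


def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Count every overlapping k-mer in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Alignment-free distance: 1 - (shared k-mers / k-mers in the shorter sequence).

    This particular distance is an assumption chosen for simplicity.
    """
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    shared = sum((pa & pb).values())  # Counter '&' keeps the minimum count per k-mer
    smaller = min(sum(pa.values()), sum(pb.values()))
    return 1.0 - shared / smaller if smaller else 1.0


def embed(sequences, n_seeds, k=3, rng=None):
    """Map each sequence to a vector of distances to n_seeds seed sequences.

    Seeds are picked at random here (an assumption); the cost is
    len(sequences) * n_seeds distance computations.
    """
    rng = rng or random.Random(0)
    seeds = rng.sample(sequences, n_seeds)
    return [[kmer_distance(s, seed, k) for seed in seeds] for s in sequences]


if __name__ == "__main__":
    seqs = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "TTTTGGGGCA"]
    for s, vec in zip(seqs, embed(seqs, n_seeds=2)):
        print(s, [round(x, 3) for x in vec])
```

Similar sequences end up with similar vectors, so the vectors can be clustered directly without ever building the full sequence-to-sequence distance matrix.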

Main Results:

  • Demonstrated significant reductions in computation time and memory requirements for large-scale sequence clustering.
  • Validated the quality of generated clusters by using them as guide trees for multiple sequence alignment.
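
As a rough illustration of how clusters of embedded vectors can be turned into a guide tree, the sketch below applies naive average-linkage (UPGMA-style) clustering to a handful of vectors and returns the tree as nested tuples. It is an assumption-laden toy, not the method used in the study: the function `guide_tree`, the Euclidean distance, and the input vectors are hypothetical, and this naive loop still compares all pairs of vectors, so it is only suitable for very small inputs.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def guide_tree(vectors, labels):
    """Build a binary tree (nested tuples of labels) by average linkage.

    Each cluster is stored as (subtree, member_indices). Illustrative only.
    """
    clusters = [(label, [i]) for i, label in enumerate(labels)]

    def avg_dist(c1, c2):
        # Mean distance over all cross-cluster pairs of member vectors.
        total = sum(euclidean(vectors[i], vectors[j]) for i in c1[1] for j in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        # Find the two closest clusters and merge them into one node.
        pairs = ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters)))
        a, b = min(pairs, key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]))
        merged = ((clusters[a][0], clusters[b][0]), clusters[a][1] + clusters[b][1])
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return clusters[0][0]


if __name__ == "__main__":
    vectors = [[0.0, 1.0], [0.1, 1.0], [1.0, 0.0], [1.0, 0.2]]
    print(guide_tree(vectors, ["s1", "s2", "s3", "s4"]))
```

The resulting nested structure determines the order in which sequences or groups of sequences would be aligned in a progressive alignment.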

Conclusions:

  • The proposed embedding approach offers a scalable solution for clustering massive sequence datasets.
  • This method facilitates large-scale multiple sequence alignments by providing efficient and accurate clustering.
  • Source code is available for download.