HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed

Shixiang Wan¹, Quan Zou¹,²

  • ¹School of Computer Science and Technology, Tianjin University, Tianjin, China.

Algorithms for Molecular Biology : AMB | October 14, 2017
Summary
This summary is machine-generated.

HAlign-II accelerates ultra-large multiple sequence alignment and phylogenetic tree construction using distributed computing. This efficient tool saves time and space for large biological datasets, outperforming existing software.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Multiple sequence alignment (MSA) is vital for biological sequence analysis and phylogenetic tree construction.
  • The rapid growth of next-generation sequencing data necessitates efficient methods for handling ultra-large biological sequence files.
  • Current alignment approaches struggle with the scale and diversity of modern biological sequence data.

Purpose of the Study:

  • To develop a cost-efficient and time-efficient tool for ultra-large multiple biological sequence alignment and phylogenetic tree construction.
  • To address the limitations of existing software in handling massive biological sequence datasets.

Main Methods:

  • Implementation of HAlign-II, a tool built on HAlign and the Spark distributed computing system.
  • Utilizing distributed and parallel computing techniques to accelerate sequence analysis.
  • Testing HAlign-II on large-scale DNA and protein datasets exceeding 1 GB.

Main Results:

  • HAlign-II demonstrated significant savings in time and space for ultra-large datasets.
  • The tool outperformed existing software in efficiency and performance.
  • HAlign-II exhibits high memory efficiency and excellent scalability with increased computing resources.
  • Successful execution of MSA and phylogenetic tree construction for ultra-large sequence sets.

Conclusions:

  • HAlign-II provides a user-friendly web server built upon a distributed computing infrastructure.
  • The open-source code and datasets for HAlign-II are publicly available for research use.

Keywords:

  • Distributed computing
  • Multiple sequence alignment
  • Phylogenetic trees
  • Spark