HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed

Shixiang Wan¹, Quan Zou¹,²

¹School of Computer Science and Technology, Tianjin University, Tianjin, China.

Algorithms for Molecular Biology: AMB

October 14, 2017

Summary

This summary is machine-generated.


HAlign-II accelerates ultra-large multiple sequence alignment and phylogenetic tree construction using distributed computing. This efficient tool saves time and space for large biological datasets, outperforming existing software.

Area of Science:

Bioinformatics
Computational Biology
Genomics

Background:

Multiple sequence alignment (MSA) is vital for biological sequence analysis and phylogenetic tree construction.
The rapid growth of next-generation sequencing data necessitates efficient methods for handling ultra-large biological sequence files.
Current alignment approaches struggle with the scale and diversity of modern biological sequence data.
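A multiple sequence alignment is usually assembled from many pairwise alignments. As a minimal, hedged illustration of that building block, the sketch below implements textbook Needleman-Wunsch global alignment with a linear gap penalty. The function name `needleman_wunsch` and the scores (match = 1, mismatch = -1, gap = -2) are illustrative assumptions; the abstract does not say which pairwise algorithm or scoring scheme HAlign-II uses.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Globally align a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), where '-' marks a gap.
    """
    def sub(x: str, y: str) -> int:
        # Substitution score for one aligned column.
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("ACGT", "AGT")` returns `(1, "ACGT", "A-GT")`: three matches and one gap. Each call fills an O(nm) table, and an MSA heuristic repeats this step many times, so total work grows quickly with the number and length of the sequences.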

Purpose of the Study:

To develop a cost-efficient and time-efficient tool for ultra-large multiple biological sequence alignment and phylogenetic tree construction.
To address the limitations of existing software in handling massive biological sequence datasets.

Main Methods:

Implementation of HAlign-II, a tool based on HAlign and the Spark distributed computing system.
Utilizing distributed and parallel computing techniques to accelerate sequence analysis.
Testing HAlign-II on large-scale DNA and protein datasets exceeding 1 GB.

Keywords:

Distributed computing; Multiple sequence alignment; Phylogenetic trees; Spark
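Distributed alignment pays off because many sub-tasks are independent. One common MSA heuristic, the center-star strategy, first chooses a "center" sequence with the smallest total distance to all others, and every pairwise distance in that step can be computed separately. The sketch below is a hypothetical stand-in for such a job, not HAlign-II's code: it maps pairwise Levenshtein distances over a thread pool and picks the center. The names `edit_distance` and `pick_center`, and the use of edit distance as the similarity measure, are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from typing import List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling one-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,               # delete ca
                         cur[j - 1] + 1,            # insert cb
                         prev[j - 1] + (ca != cb))  # match or substitute
        prev = cur
    return prev[-1]


def pick_center(seqs: Sequence[str], workers: int = 4) -> Tuple[int, List[int]]:
    """Return the index of the sequence with the smallest summed distance
    to all others, plus each sequence's summed distance.

    The pairs are independent, so they are mapped over a worker pool; in a
    cluster setting this map would be spread across executors instead.
    """
    pairs = list(combinations(range(len(seqs)), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        dists = list(pool.map(lambda p: edit_distance(seqs[p[0]], seqs[p[1]]), pairs))
    totals = [0] * len(seqs)
    for (i, j), d in zip(pairs, dists):
        totals[i] += d
        totals[j] += d
    center = min(range(len(seqs)), key=totals.__getitem__)
    return center, totals
```

With PySpark, the same decomposition could be written as `sc.parallelize(pairs).map(...).collect()`; threads are used here only so the sketch runs on the standard library. Because of CPython's global interpreter lock, threads will not speed up this pure-Python loop; the point is the independent map step, not the timing.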

Main Results:

HAlign-II demonstrated significant savings in time and space for ultra-large datasets.
The tool outperformed existing software in efficiency and performance.
HAlign-II exhibits high memory efficiency and excellent scalability with increased computing resources.
HAlign-II successfully performed MSA and phylogenetic tree construction on ultra-large sequence sets.
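The abstract does not name the tree-building method. As one hedged example of reconstructing a phylogeny from pairwise distances, such as those derived from an alignment, the sketch below implements standard neighbor-joining and returns a Newick string. The function name `neighbor_joining`, the input format (unique labels plus a symmetric distance matrix), and the tie-breaking rule are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, List


def neighbor_joining(labels: List[str], dist: List[List[float]]) -> str:
    """Build an unrooted tree by neighbor-joining and return it as Newick."""
    # Each active node is identified by its Newick subtree string.
    nodes = list(labels)
    d: Dict[str, Dict[str, float]] = {
        a: {b: dist[i][j] for j, b in enumerate(labels)} for i, a in enumerate(labels)
    }
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Q-criterion: join the pair minimising (n - 2) * d(i, j) - r_i - r_j.
        # Ties go to the first pair in iteration order.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from each joined node to the new internal node.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"
        # Distance from the new node to every remaining node.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return f"({a},{b}:{d[a][b]:g});"
```

On the classic five-taxon matrix with rows `[0,5,9,9,8]`, `[5,0,10,10,9]`, `[9,10,0,8,7]`, `[9,10,8,0,3]`, `[8,9,7,3,0]` for taxa a to e, this sketch returns `((c:4,(a:2,b:3):3),(d:2,e:1):2);`. Because ties in the Q-criterion are broken by iteration order, another implementation may write an equivalent tree in a different order.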

Conclusions:

HAlign-II provides a user-friendly web server built upon a distributed computing infrastructure.
The open-source codes and datasets for HAlign-II are publicly available for research use.