MLProbs: A Data-Centric Pipeline for Better Multiple Sequence Alignment.

Mengmeng Kuang, Yong Zhang, Tak-Wah Lam

    IEEE/ACM Transactions on Computational Biology and Bioinformatics | February 4, 2022
    Summary
    This summary is machine-generated.

    This study introduces MLProbs, a novel data-centric pipeline for Multiple Sequence Alignment (MSA). MLProbs utilizes machine learning to outperform existing tools, especially for low-similarity protein families.

    Area of Science:

    • Bioinformatics
    • Computational Biology
    • Machine Learning

    Background:

    • Traditional Multiple Sequence Alignment (MSA) construction relies on algorithm-centric approaches, often reducing the problem to complex combinatorial optimization.
    • These methods may not optimally handle the inherent variability and complexity of biological sequence data.
    • A data-centric approach, leveraging machine learning on benchmark datasets, offers a promising alternative.
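
For readers unfamiliar with the combinatorial framing mentioned above, one common textbook formulation (not necessarily the one targeted by any specific tool in this study) is the sum-of-pairs objective. The symbols below are assumptions introduced for illustration: $s_1,\dots,s_n$ are the input sequences, $\mathcal{A}$ is the set of valid alignments, $L_A$ is the number of columns of alignment $A$, and $\sigma$ is a substitution score that also penalizes gaps.

```latex
\hat{A} \;=\; \arg\max_{A \in \mathcal{A}(s_1,\dots,s_n)}
\sum_{1 \le i < j \le n} \; \sum_{c=1}^{L_A} \sigma\!\left(A_{i,c},\, A_{j,c}\right)
```

Maximizing this score exactly is NP-hard in general, which is why practical tools rely on heuristics such as progressive alignment.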

    Purpose of the Study:

    • To develop and evaluate a novel data-centric pipeline for Multiple Sequence Alignment (MSA) construction.
    • To demonstrate the efficacy of shallow machine learning models in guiding MSA tool selection and realignment decisions.
    • To improve MSA accuracy, particularly for challenging datasets like low-similarity protein families.

    Main Methods:

    • Developed MLProbs, a new MSA pipeline based on a data-centric approach.
    • Trained shallow machine learning classification models on benchmark data to guide alignment tool choice and realignment.
    • Evaluated MLProbs against 10 popular MSA tools using four benchmark databases (BAliBASE, OXBench, OXBench-X, SABMark).
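
The idea of letting a shallow classifier choose an alignment tool can be sketched as follows. This is a minimal illustration, not the MLProbs implementation: the summary does not state which features, models, or candidate tools the pipeline uses, so the features (mean length and length variability), the decision-stump model, the labels `tool_A` and `tool_B`, and the training families are all hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, str, str]


def features(seqs: List[str]) -> Tuple[float, float]:
    """Hypothetical features: mean length and coefficient of variation of length."""
    lengths = [len(s) for s in seqs]
    avg = mean(lengths)
    return (avg, pstdev(lengths) / avg)


def train_stump(X: Sequence[Sequence[float]], y: Sequence[str]) -> Stump:
    """Fit a one-split decision stump (a very shallow classifier) for two labels.

    Returns (feature index, threshold, label if value <= threshold, label otherwise),
    chosen to minimize the number of training errors.
    """
    labels = sorted(set(y))
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for low, high in ((labels[0], labels[-1]), (labels[-1], labels[0])):
                errors = sum(
                    (low if x[f] <= t else high) != lab for x, lab in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, low, high)
    return best[1:]


def predict(stump: Stump, x: Sequence[float]) -> str:
    """Apply the stump to one feature vector."""
    f, t, low, high = stump
    return low if x[f] <= t else high


# Hypothetical benchmark: each family is labeled with the tool assumed to
# have produced the better alignment for it.
training = [
    (["MKV", "MKVL", "MKV"], "tool_A"),
    (["MKVLA", "MKVLAG"], "tool_A"),
    (["MK", "MKVLAGTRRS"], "tool_B"),
    (["MKVLAGT", "M", "MKVL"], "tool_B"),
]
X = [features(seqs) for seqs, _ in training]
y = [label for _, label in training]
stump = train_stump(X, y)

# Choose a tool for two unseen families.
choice_uniform = predict(stump, features(["ACDE", "ACDF"]))
choice_variable = predict(stump, features(["A", "ACDEFGHIKL"]))
```

On this toy data the stump splits on length variability, so the uniform-length family is routed to `tool_A` and the highly variable one to `tool_B`. A second classifier of the same kind could, under similar assumptions, decide whether a realignment step is worth running.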

    Main Results:

    • MLProbs consistently achieved the highest total-column (TC) score across the four benchmark databases.
    • The gains were clearest for protein families with low sequence similarity (≤ 50%), where MLProbs outperformed the top competing tools by more than 1.8%.
    • MLProbs exhibited superior performance in real-life applications, including phylogenetic tree construction and protein secondary structure prediction.
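
To make the TC metric concrete, here is a small sketch of how a total-column score can be computed. It assumes the simplest convention: every column of the reference alignment is counted, and a column is correct only if the test alignment contains a column pairing exactly the same residues (and gaps) from every sequence. Benchmark suites may restrict scoring to selected core columns, so this is not a drop-in replacement for their scoring programs; the function names are illustrative.

```python
from typing import List, Optional, Tuple

Column = Tuple[Optional[int], ...]


def columns(alignment: List[str], gap: str = "-") -> List[Column]:
    """Convert aligned rows into columns of residue indices (None marks a gap)."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    counters = [0] * len(alignment)  # next residue index for each sequence
    cols = []
    for j in range(len(alignment[0])):
        col = []
        for i, row in enumerate(alignment):
            if row[j] == gap:
                col.append(None)
            else:
                col.append(counters[i])
                counters[i] += 1
        cols.append(tuple(col))
    return cols


def tc_score(test: List[str], reference: List[str]) -> float:
    """Fraction of reference columns reproduced exactly in the test alignment."""
    ref_cols = columns(reference)
    test_cols = set(columns(test))
    if not ref_cols:
        return 0.0
    matched = sum(1 for col in ref_cols if col in test_cols)
    return matched / len(ref_cols)


reference = ["AC-GT", "ACTGT"]
test = ["ACG-T", "ACTGT"]
score = tc_score(test, reference)  # 3 of 5 reference columns are reproduced
```

In this toy case the first two columns and the last column agree, while the misplaced gap breaks the two middle columns, giving a score of 3/5.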

    Conclusions:

    • The data-centric approach, powered by shallow machine learning, offers a robust and effective strategy for Multiple Sequence Alignment.
    • MLProbs provides a significant advancement in MSA accuracy, especially for evolutionarily distant sequences.
    • Future research could explore deep learning methods to further enhance MSA construction capabilities.