Large-scale structure-informed multiple sequence alignment of proteins with SIMSApiper
- Charlotte Crauwels 1,2,3, Sophie-Luise Heidig 1,3,4, Adrián Díaz 1,2,3, Wim F Vranken 1,2,3
- 1 Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, 1050, Belgium.
- 2 Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, 1050, Belgium.
- 3 AI Lab, Vrije Universiteit Brussel, Brussels, 1050, Belgium.
- 4 Evolutionary Biology & Ecology, Université libre de Bruxelles, Brussels, 1050, Belgium.
Summary
This summary is machine-generated. SIMSApiper is a Nextflow pipeline for creating reliable, structure-informed multiple sequence alignments (MSAs) of thousands of protein sequences. It speeds up alignment through parallelization and uses structural information, including conserved secondary structure elements, to reduce the number of gaps.
Area Of Science
- Bioinformatics
- Computational Biology
- Structural Biology
Background
- Multiple sequence alignment (MSA) is crucial for understanding protein function and evolution.
- Existing structure-based alignment methods can be computationally intensive and slow for large datasets.
- Integrating structural information can improve MSA accuracy and reliability.
Purpose Of The Study
- To develop a fast and reliable pipeline for structure-informed multiple sequence alignment.
- To enable the alignment of thousands of protein sequences efficiently.
- To reduce the number of gaps in MSAs by leveraging structural data.
Main Methods
- Developed SIMSApiper, a Nextflow pipeline utilizing Python3 and Bash.
- Incorporated user-provided or automatically retrieved structural information.
- Implemented parallelization strategies based on sequence identity subsets.
- Utilized conserved secondary structure elements to minimize gaps.
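The idea of parallelizing over sequence identity subsets can be illustrated with a minimal Python sketch. This is not SIMSApiper's implementation: the `identity` function is a crude position-wise stand-in for a real identity measure, `greedy_subsets` is a hypothetical greedy grouping, `align_subset` is a placeholder for an actual structure-informed aligner, and the sequences and the 0.8 threshold are made-up illustration values.

```python
from concurrent.futures import ThreadPoolExecutor


def identity(a: str, b: str) -> float:
    """Crude stand-in: matching positions divided by the longer length."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_subsets(seqs: dict[str, str], threshold: float) -> list[list[str]]:
    """Put each sequence in the first subset whose representative
    (its first member) is at least `threshold` identical to it."""
    subsets: list[list[str]] = []
    for name, seq in seqs.items():
        for subset in subsets:
            if identity(seq, seqs[subset[0]]) >= threshold:
                subset.append(name)
                break
        else:
            # No existing subset is similar enough: start a new one.
            subsets.append([name])
    return subsets


def align_subset(names: list[str]) -> list[str]:
    """Placeholder for a per-subset alignment step (hypothetical)."""
    return sorted(names)


# Toy input: two pairs of near-identical sequences (assumed data).
seqs = {
    "p1": "MKTAYIAKQR",
    "p2": "MKTAYIAKQK",
    "p3": "GSHMLEDPVE",
    "p4": "GSHMLEDPAE",
}

subsets = greedy_subsets(seqs, threshold=0.8)

# Each subset is independent, so the per-subset work can run concurrently.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(align_subset, subsets))
```

Because the subsets share no state, the per-subset step can be distributed across workers; in a workflow manager such as Nextflow, each subset would typically become its own task.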
Main Results
- SIMSApiper generates reliable, structure-informed MSAs.
- The pipeline significantly outperforms standard structure-based alignment methods in speed.
- Achieved substantial speed-up through parallelization techniques.
- Reduced the number of gaps in alignments by effectively using secondary structure information.
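One simple way to quantify a reduction in gaps is the fraction of gap characters in an alignment. The summary does not say how gaps were measured, so the sketch below is an assumption for illustration only: `gap_fraction` is a hypothetical helper, and the two toy alignments are invented.

```python
def gap_fraction(alignment: list[str], gap_chars: str = "-.") -> float:
    """Return the share of alignment cells that are gap characters."""
    if not alignment:
        return 0.0
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    total = width * len(alignment)
    if total == 0:
        return 0.0
    gaps = sum(row.count(c) for row in alignment for c in gap_chars)
    return gaps / total


# Two toy alignments of the same three sequences (assumed data).
wide = ["MK-TA-YI", "MKQTA-YI", "MK-TAGYI"]
compact = ["MK-TAYI", "MKQTAYI", "MKTAGYI"]

wide_score = gap_fraction(wide)        # 4 gaps over 24 cells
compact_score = gap_fraction(compact)  # 1 gap over 21 cells
```

A lower gap fraction alone does not make an alignment better; it is only meaningful alongside a check that conserved structural elements remain aligned.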
Conclusions
- SIMSApiper offers a highly efficient and accurate solution for large-scale protein sequence alignment.
- The pipeline's ability to integrate structural data enhances MSA quality.
- Its speed and reliability make it a valuable tool for bioinformatics research.

