Pairing interacting protein sequences using masked language modeling
Summary
This summary is machine-generated. We developed Differentiable Pairing using Alignment-based Language Models (DiffPALM) to predict interacting protein sequences. DiffPALM leverages protein language models and outperforms existing methods, improving protein complex structure prediction.
Area Of Science
- Computational biology
- Protein bioinformatics
- Machine learning in structural biology
Background
- Predicting protein-protein interactions from amino acid sequences is crucial for understanding biological functions.
- Existing coevolution-based pairing methods often struggle when multiple sequence alignments (MSAs) are shallow, that is, when they contain few sequences.
- Accurate pairing of interacting protein sequences is essential for predicting protein complex structures.
Purpose Of The Study
- To develop a novel method for predicting interacting protein sequences using protein language models.
- To formulate the protein pairing problem in a differentiable manner.
- To improve the accuracy of protein complex structure prediction.
Main Methods
- Developed Differentiable Pairing using Alignment-based Language Models (DiffPALM).
- Leveraged MSA Transformer, a protein language model trained on MSAs, to predict interacting partners.
- Exploited MSA Transformer's ability to infer coevolutionary signals for inter-chain interactions.
- Formulated the pairing of paralogs from two protein families in a differentiable way.
Main Results
- DiffPALM outperforms existing coevolution-based pairing methods on challenging benchmarks with shallow MSAs.
- DiffPALM surpasses methods using single-sequence trained protein language models.
- Using DiffPALM-paired sequences as input to AlphaFold-Multimer significantly improves structure prediction for eukaryotic protein complexes.
- DiffPALM achieves performance competitive with orthology-based pairing.
Conclusions
- DiffPALM offers a powerful new approach for predicting interacting protein sequences, particularly from limited MSA data.
- The method demonstrates the utility of MSA-based language models for capturing inter-chain coevolution.
- DiffPALM-generated sequence pairs are valuable inputs for deep learning-based protein structure prediction tools.