Identifying promising sequences for protein engineering using a deep transformer protein language model.

Trevor S Frisby (1), Christopher James Langmead (1)

  • (1) Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.

Proteins | June 20, 2023
Summary
This summary is machine-generated.

This study introduces a novel Promise Score, derived from deep transformer protein language models, to efficiently identify promising protein sequences for engineering. The score aids in discovering novel nanobodies and optimizing existing proteins by predicting binding interactions.

Keywords:
attention, fine-tuning, protein design, protein engineering, protein language model, transfer learning, transformer

Area of Science:

  • Computational Biology
  • Protein Engineering
  • Artificial Intelligence in Biochemistry

Background:

  • Discovering novel protein sequences with desired properties is challenging due to the vast sequence space.
  • Identifying promising sequences for applications like nanobody discovery and protein optimization is often costly and time-consuming.
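To give a sense of the scale behind "vast sequence space", the short sketch below counts the possible sequences of a given length over the 20 standard amino acids, alongside the much smaller number of single-substitution variants of one sequence. The lengths are illustrative assumptions (120 is roughly the size of a nanobody domain); none of these numbers come from the study.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def sequence_space_size(length: int) -> int:
    """Number of distinct sequences of `length` residues over 20 amino acids."""
    return len(AMINO_ACIDS) ** length


def single_site_variants(length: int) -> int:
    """Number of sequences exactly one substitution away from a given sequence."""
    return length * (len(AMINO_ACIDS) - 1)


# Illustrative lengths only; 120 is an assumed, nanobody-like size.
for n in (10, 50, 120):
    exponent = math.log10(sequence_space_size(n))
    print(f"length {n}: about 10^{exponent:.0f} sequences, "
          f"{single_site_variants(n)} single-site variants")
```

Even the single-site neighborhood of one protein can run to thousands of candidates, which is why a cheap computational ranking step before any wet-lab work is attractive.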

Purpose of the Study:

  • To develop and validate a deep learning-based method for efficiently identifying high-potential protein sequences.
  • To introduce a 'Promise Score' to guide protein engineering efforts in sequence discovery and optimization.

Main Methods:

  • Utilized a deep transformer protein language model and its self-attention map to calculate a Promise Score.
  • Applied the Promise Score to nanobody (Nb) discovery and protein optimization workflows.
  • Analyzed self-attention maps to identify key protein regions involved in intermolecular interactions.
  • Explored fine-tuning strategies for the protein language model for predictive property modeling.
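This summary does not state how the Promise Score is computed from the self-attention map, so the sketch below is not the authors' formula. It only illustrates what a self-attention map is (standard scaled dot-product attention weights) and one hypothetical way to reduce such a map to a single number for a region of interest. The function `toy_region_score`, the random embeddings, and the chosen region are all assumptions made for illustration; a real protein language model would also apply learned query and key projections, which are omitted here.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_map(queries, keys):
    """Scaled dot-product attention weights: softmax(Q K^T / sqrt(d)), row by row."""
    d = len(queries[0])
    scores = [
        [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        for q in queries
    ]
    return [softmax(r) for r in scores]


def toy_region_score(attn, region):
    """HYPOTHETICAL summary statistic: the average attention mass that
    all positions place on the residues in `region`. Not the Promise Score."""
    region = list(region)
    return sum(sum(row[j] for j in region) for row in attn) / len(attn)


# Toy stand-in for per-residue embeddings of an 8-residue sequence.
random.seed(0)
seq_len, dim = 8, 4
embeddings = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]

# Using the embeddings directly as both queries and keys (no learned projections).
attn = self_attention_map(embeddings, embeddings)
print(toy_region_score(attn, region=range(2, 5)))
```

Each row of `attn` sums to 1, so the toy score lies between 0 and 1 and can be read as the share of attention directed at the chosen region.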

Main Results:

  • The Promise Score effectively selects promising lead sequences from nanobody repertoires.
  • The Promise Score guides site-specific mutagenesis experiments, identifying a high percentage of improved protein sequences.
  • Self-attention maps provide insights into protein regions driving specific interactions.
  • Demonstrated the utility of fine-tuning protein language models for targeted protein engineering tasks.
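A score-guided site-specific mutagenesis workflow of the kind described above can be sketched as: enumerate single-residue substitutions at chosen positions, score each variant, and keep the top candidates for experimental testing. In the sketch below, `toy_score` is a placeholder (fraction of aromatic residues) standing in for a model-derived score; it is not the Promise Score, and the example sequence and positions are made up.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_site_mutants(seq, positions):
    """Yield (label, mutant) for every substitution at the given 0-based positions.
    Labels use the common 1-based notation, e.g. 'A1F'."""
    for pos in positions:
        for aa in AMINO_ACIDS:
            if aa != seq[pos]:
                yield f"{seq[pos]}{pos + 1}{aa}", seq[:pos] + aa + seq[pos + 1:]


def rank_mutants(seq, positions, score_fn, top_k=5):
    """Score every single-site mutant and return the top_k as (score, label, mutant)."""
    scored = [(score_fn(m), label, m) for label, m in single_site_mutants(seq, positions)]
    scored.sort(key=lambda t: t[0], reverse=True)  # stable sort: ties keep enumeration order
    return scored[:top_k]


def toy_score(seq):
    """PLACEHOLDER scorer: fraction of aromatic residues (F, W, Y).
    A real workflow would call a model-derived score here instead."""
    return sum(seq.count(a) for a in "FWY") / len(seq)


# Made-up sequence and positions, for illustration only.
for score, label, mutant in rank_mutants("ACDAGH", positions=[0, 3], score_fn=toy_score, top_k=3):
    print(label, mutant, round(score, 3))
```

Passing the scorer in as `score_fn` keeps the enumeration and ranking logic independent of whichever model supplies the score.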

Conclusions:

  • The Promise Score offers a computationally efficient approach to accelerate protein engineering.
  • Deep learning models, particularly transformer-based language models, are powerful tools for predicting and designing functional proteins.
  • Understanding protein interactions through self-attention maps enhances the design process.