Training data composition affects performance of protein structure analysis algorithms.

Alexander Derry¹, Kristy A Carpenter, Russ B Altman

¹ Biomedical Informatics Training Program, Stanford University, Stanford, CA 94305, USA.

Pacific Symposium on Biocomputing | December 10, 2021
Summary
This summary is machine-generated.

Machine learning models for protein structure analysis perform best when trained on diverse data. Including X-ray crystallography, NMR, and cryo-EM structures in the training data reduces performance gaps across structure types without harming performance on X-ray structures.

Area of Science:

  • Structural biology
  • Computational biology
  • Biochemistry

Background:

  • Accurate three-dimensional protein structures are vital for understanding molecular mechanisms and interactions.
  • Machine learning models for protein structure analysis are crucial for protein engineering and drug development.
  • The composition of the training data can significantly affect the performance of these machine learning models.

Purpose of the Study:

  • To evaluate how the experimental methods used to determine structures (X-ray crystallography, NMR, cryo-EM) introduce bias into machine learning training data.
  • To assess how this bias affects model performance in tasks like accuracy estimation, protein sequence design, and catalytic residue prediction.
  • To determine optimal strategies for composing training datasets to mitigate bias and enhance model generalizability.
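A natural first step toward the composition question is simply measuring it. The sketch below tallies what fraction of a structure dataset comes from each experimental method. The record format (dicts with a `method` key) and the helper name `method_fractions` are hypothetical illustrations, not the study's code; real entries would need their method taken from the structure's deposition metadata.

```python
from collections import Counter
from typing import Iterable, Mapping


def method_fractions(entries: Iterable[Mapping[str, str]]) -> dict[str, float]:
    """Return the fraction of entries determined by each experimental method.

    Each entry is assumed (hypothetically) to carry a "method" key such as
    "X-ray", "NMR", or "cryo-EM".
    """
    counts = Counter(entry["method"] for entry in entries)
    total = sum(counts.values())
    if total == 0:
        return {}  # empty dataset: nothing to report
    return {method: n / total for method, n in counts.items()}


# Toy, illustrative records only; not data from the study.
toy_entries = [
    {"id": "A", "method": "X-ray"},
    {"id": "B", "method": "X-ray"},
    {"id": "C", "method": "X-ray"},
    {"id": "D", "method": "NMR"},
]
print(method_fractions(toy_entries))  # {'X-ray': 0.75, 'NMR': 0.25}
```

A skewed result like this toy one is the kind of imbalance the study examines: a model trained on such data sees mostly X-ray structures.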

Main Methods:

  • Trained machine learning models on datasets comprising either all three structure types (X-ray, NMR, cryo-EM) or only X-ray data.
  • Evaluated model performance on test sets derived from each experimental method.
  • Analyzed the relationship between model performance and the biophysical properties of each structure determination method.
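The comparison described in the methods (two training regimes, scored on method-specific test sets) can be sketched as below. The helper names and the dict-based records are hypothetical; model training itself is left out because the models and tasks differ across the study. Note that a real split would also need to keep similar proteins from appearing in both training and test sets, which this sketch does not attempt.

```python
from collections import defaultdict
from typing import Mapping, Sequence

Entry = Mapping[str, str]  # hypothetical record with "id" and "method" keys


def training_regimes(train: Sequence[Entry]) -> dict[str, list[Entry]]:
    """Build the two training sets being compared: all three methods vs X-ray only."""
    return {
        "all_methods": list(train),
        "xray_only": [e for e in train if e["method"] == "X-ray"],
    }


def stratified_test_sets(test: Sequence[Entry]) -> dict[str, list[Entry]]:
    """Group held-out entries by experimental method so each model can be
    scored separately on X-ray, NMR, and cryo-EM structures."""
    groups: dict[str, list[Entry]] = defaultdict(list)
    for e in test:
        groups[e["method"]].append(e)
    return dict(groups)


# Toy records for illustration only.
train = [
    {"id": "1", "method": "X-ray"},
    {"id": "2", "method": "NMR"},
    {"id": "3", "method": "cryo-EM"},
]
test = [
    {"id": "4", "method": "X-ray"},
    {"id": "5", "method": "NMR"},
    {"id": "6", "method": "NMR"},
]
regimes = training_regimes(train)
print({name: len(s) for name, s in regimes.items()})  # {'all_methods': 3, 'xray_only': 1}
print({m: len(s) for m, s in stratified_test_sets(test).items()})  # {'X-ray': 1, 'NMR': 2}
```

Each regime would then be trained once and evaluated on every method-specific test group, giving a regime-by-method grid of scores.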

Main Results:

  • Models trained solely on X-ray data performed worse on NMR and cryo-EM test sets.
  • Including NMR and cryo-EM structures in training datasets mitigated performance disparities across structure types.
  • Training on all three structure types did not degrade, and sometimes improved, performance on X-ray test sets.
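One way to quantify the performance disparities mentioned above is to average a per-structure metric within each method and report each method's gap from X-ray. This is a minimal sketch assuming a single higher-is-better metric; the study's actual metrics are task-specific and are not reproduced here, and the numbers below are invented for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable


def per_method_means(scores: Iterable[tuple[str, float]]) -> dict[str, float]:
    """Average a per-structure metric within each experimental method."""
    by_method: dict[str, list[float]] = defaultdict(list)
    for method, score in scores:
        by_method[method].append(score)
    return {m: mean(v) for m, v in by_method.items()}


def disparity(means: dict[str, float], reference: str = "X-ray") -> dict[str, float]:
    """Each method's mean minus the reference method's mean.

    For a higher-is-better metric, a negative value means worse than the reference.
    """
    ref = means[reference]
    return {m: v - ref for m, v in means.items() if m != reference}


# Illustrative numbers only, not results from the paper.
toy_scores = [("X-ray", 1.0), ("X-ray", 0.5), ("NMR", 0.5), ("cryo-EM", 0.25)]
means = per_method_means(toy_scores)
print(means)             # {'X-ray': 0.75, 'NMR': 0.5, 'cryo-EM': 0.25}
print(disparity(means))  # {'NMR': -0.25, 'cryo-EM': -0.5}
```

Computing these gaps for both training regimes makes the reported pattern checkable: smaller gaps for the mixed regime, with no drop in the X-ray mean.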

Conclusions:

  • The choice of experimental methods for protein structure determination introduces bias into machine learning training data.
  • Diverse training datasets incorporating X-ray, NMR, and cryo-EM data enhance model robustness and generalizability.
  • When curating training sets for protein structure analysis models, considering the biophysical properties of each structure determination method and the specific biochemical task is recommended.
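The study's remedy was to include all three structure types in training. If NMR and cryo-EM entries remain a small minority even after inclusion, one generic option (not something the source reports using) is inverse-frequency weighting, so each method contributes equal total weight to a weighted loss or sampler. The helper below is a hypothetical sketch; whether reweighting helps will depend on the task.

```python
from collections import Counter
from typing import Sequence


def inverse_frequency_weights(methods: Sequence[str]) -> list[float]:
    """Weight each example inversely to how common its experimental method is.

    Weights are scaled so every method's examples sum to len(methods) / n_methods,
    which keeps the average weight at 1.0.
    """
    counts = Counter(methods)
    n_methods = len(counts)
    total = len(methods)
    return [total / (n_methods * counts[m]) for m in methods]


# Toy labels for illustration only.
toy_methods = ["X-ray", "X-ray", "X-ray", "NMR"]
weights = inverse_frequency_weights(toy_methods)
print(weights)  # each X-ray entry gets about 0.667; the NMR entry gets 2.0
```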