Machine learning for discovering missing or wrong protein function annotations: A comparison using updated benchmark datasets

Felipe Kenji Nakano¹,², Mathias Lietaert³, Celine Vens⁴,⁵

  • ¹ KU Leuven, Campus KULAK - Department of Public Health and Primary Care, Etienne Sabbelaan 53, Kortrijk, 8500, Belgium. felipekenji.nakano@kuleuven.be.

BMC Bioinformatics | September 25, 2019
Summary
This summary is machine-generated.

Machine learning models for protein function prediction are improved using updated datasets. The Clus-Ensemble method shows superior performance in discovering new protein annotations, highlighting the need for current data in bioinformatics research.

Keywords:

  • Benchmark datasets
  • Hierarchical multi-label classification
  • Protein function prediction

Area of Science:

  • Bioinformatics and Computational Biology
  • Machine Learning in Proteomics
  • Functional Genomics

Background:

  • Massive proteomic data generation necessitates automated protein function annotation.
  • Existing machine learning studies often rely on outdated versions of the Functional Catalogue (FunCat) and Gene Ontology (GO) annotation datasets, which limits predictive accuracy.
  • This study addresses the need for updated benchmark datasets for hierarchical multi-label classification (HMC) in protein function prediction.
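
In hierarchical multi-label classification, a protein annotated with a class is implicitly annotated with every ancestor of that class. The sketch below illustrates this constraint on a toy hierarchy. The class identifiers, the `PARENTS` mapping, and both function names are hypothetical and are not taken from the study's datasets or code.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy hierarchy (child -> set of parents). A DAG such as GO may
# give a class several parents; a tree such as FunCat gives each class at most one.
PARENTS: Dict[str, Set[str]] = {
    "01": set(),
    "01.01": {"01"},
    "01.01.03": {"01.01"},
    "02": set(),
}


def close_under_ancestors(labels: Iterable[str],
                          parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the label set extended with every ancestor of each label."""
    closed: Set[str] = set()
    stack = list(labels)
    while stack:
        node = stack.pop()
        if node in closed:
            continue  # already visited through another path
        closed.add(node)
        stack.extend(parents.get(node, ()))
    return closed


def is_consistent(labels: Iterable[str],
                  parents: Dict[str, Set[str]]) -> bool:
    """True if the label set already contains all of its ancestors."""
    label_set = set(labels)
    return close_under_ancestors(label_set, parents) == label_set


if __name__ == "__main__":
    print(sorted(close_under_ancestors({"01.01.03"}, PARENTS)))
    print(is_consistent({"01.01"}, PARENTS))
```

A label vector that fails `is_consistent` violates the hierarchy constraint, which is one reason HMC methods differ from flat multi-label classifiers.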

Purpose of the Study:

  • To create and provide updated FunCat and GO yeast annotation datasets.
  • To establish baseline performance results for four HMC methods on these new datasets.
  • To evaluate the ability of predictive models to discover novel or incorrect annotations using updated versus old data.

Main Methods:

  • Generated 24 new datasets by querying recent versions of FunCat and GO yeast annotations.
  • Compared the performance of four HMC methods, including Clus-Ensemble (predictive clustering trees) and HMC-GA (genetic algorithms).
  • Trained models on old data and evaluated against recent annotations to assess discovery capabilities.
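
The third step above (train on old annotations, check predictions against recent ones) can be sketched as follows. This is a minimal illustration, not the authors' evaluation protocol: the set-of-pairs data layout, the fixed `threshold`, and the function name `discovery_report` are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (protein_id, class_id); hypothetical identifiers


def discovery_report(old: Set[Pair],
                     new: Set[Pair],
                     scores: Dict[Pair, float],
                     threshold: float = 0.5) -> Dict[str, List[Pair]]:
    """Compare predictions of a model trained on `old` against `new` annotations.

    `scores` holds the model's predicted score per (protein, class) pair.
    The threshold is an assumption chosen only for illustration.
    """
    # Pairs absent from the old data but predicted positive: candidate missing annotations.
    flagged_missing = sorted(p for p, s in scores.items()
                             if p not in old and s >= threshold)
    # Pairs present in the old data but predicted negative: candidate wrong annotations.
    flagged_wrong = sorted(p for p, s in scores.items()
                           if p in old and s < threshold)
    return {
        "flagged_missing": flagged_missing,
        "flagged_wrong": flagged_wrong,
        # Candidates that the recent annotation version confirms.
        "confirmed_added": [p for p in flagged_missing if p in new],
        "confirmed_removed": [p for p in flagged_wrong if p not in new],
    }


if __name__ == "__main__":
    old = {("P1", "01"), ("P2", "02")}
    new = {("P1", "01"), ("P1", "02")}  # ("P1","02") added, ("P2","02") removed
    scores = {("P1", "01"): 0.9, ("P1", "02"): 0.8,
              ("P2", "02"): 0.2, ("P2", "01"): 0.7}
    for key, pairs in discovery_report(old, new, scores).items():
        print(key, pairs)
```

Comparing the sizes of the confirmed lists with the flagged lists gives a rough precision for each discovery task; any threshold-free measure could be substituted.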

Main Results:

  • Clus-Ensemble outperformed more recent methods on standard evaluation tasks using the updated datasets.
  • Clus-Ensemble excelled at discovering new FunCat annotations, while HMC-GA was better at detecting removed annotations.
  • Similar trends were observed for GO datasets, though differences between methods were less pronounced for detecting removed annotations.

Conclusions:

  • Protein function prediction remains a challenging area requiring further investigation.
  • The provided updated datasets and baseline results serve as essential guidelines for future HMC research.
  • Old datasets retain value for other machine learning tasks, emphasizing the importance of data versioning.