GOThresher: a program to remove annotation biases from protein function annotation datasets.

Parnal Joshi1,2, Sagnik Banerjee1,3, Xiao Hu2

  • 1Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA.

Bioinformatics (Oxford, England) | January 23, 2023
Summary
This summary is machine-generated.

GOThresher is a Python tool that identifies and removes biases in gene function annotation databases. Filtering out these biases supports more accurate protein function prediction with machine learning and gives a clearer picture of the annotation landscape.

Area of Science:

  • Genomics
  • Bioinformatics
  • Computational Biology

Background:

  • High-throughput sequencing generates vast genomic data, but gene product functions remain largely unknown.
  • Experimental characterization cannot keep pace with the influx of sequence data, so the available annotations are biased.
  • Biases in Gene Ontology (GO) terms affect protein function understanding and machine learning model training.

Purpose of the Study:

  • To address the challenge of biased protein function annotations.
  • To introduce a computational tool for identifying and removing biases in annotation databases.

Main Methods:

  • Development of GOThresher, a Python-based software tool.
  • Implementation of algorithms to detect and correct biases in Gene Ontology annotations.
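
As an illustration of what a bias filter over Gene Ontology annotations might look like, the sketch below drops annotations that come from references annotating an unusually large number of proteins, and optionally restricts annotations to a set of evidence codes. This is a minimal sketch under stated assumptions, not GOThresher's actual interface or algorithm: the `Annotation` record, the `filter_annotations` function, and its `max_proteins_per_reference` and `allowed_evidence` parameters are all hypothetical names introduced here, and the choice of these two criteria is an assumption about what "bias" could mean in practice.

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    """Hypothetical annotation record (not GOThresher's data model)."""
    protein_id: str     # e.g. a UniProt accession
    go_term: str        # e.g. "GO:0005515"
    reference: str      # e.g. a PubMed ID
    evidence_code: str  # e.g. "IDA", "IPI", "IEA"


def filter_annotations(annotations, max_proteins_per_reference, allowed_evidence=None):
    """Return annotations that pass two illustrative (assumed) bias filters.

    1. Drop every annotation whose reference annotates more than
       `max_proteins_per_reference` distinct proteins.
    2. If `allowed_evidence` is given, drop annotations whose evidence
       code is not in that set.
    """
    # Count the distinct proteins each reference annotates.
    proteins_by_reference = defaultdict(set)
    for ann in annotations:
        proteins_by_reference[ann.reference].add(ann.protein_id)

    kept = []
    for ann in annotations:
        if len(proteins_by_reference[ann.reference]) > max_proteins_per_reference:
            continue  # reference contributes too many proteins
        if allowed_evidence is not None and ann.evidence_code not in allowed_evidence:
            continue  # evidence code not accepted
        kept.append(ann)
    return kept
```

For example, with a threshold of 2, a reference that annotates three distinct proteins would have all of its annotations removed, while references annotating one or two proteins would be kept. A suitable threshold depends on the dataset and would have to be chosen by the user.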

Main Results:

  • GOThresher effectively identifies and removes biases from protein function annotation databases.
  • The tool facilitates the creation of more accurate and representative annotation datasets.

Conclusions:

  • Removing annotation biases is crucial for accurately predicting and understanding protein function.
  • GOThresher provides a valuable resource for the bioinformatics community to improve annotation quality.