Protein name tagging guidelines: lessons learned.

Inderjeet Mani¹, Zhangzhi Hu, Seok Bae Jang

¹Georgetown University, 37th and O Streets NW, Washington, DC 20057, USA. im5@georgetown.edu

Comparative and Functional Genomics

July 17, 2008

Summary

This summary is machine-generated.

Developing standardized protein name tagging guidelines improves information extraction from biomedical literature. This enhances structured database creation for genes and proteins.

Area of Science:

Biomedical Informatics
Natural Language Processing
Computational Biology

Background:

Automated information extraction from biomedical literature is crucial for building structured databases.
A lack of standardized definitions for protein name tagging hinders accurate data extraction.
Existing methods struggle with ambiguous gene/protein names and determining exact name boundaries.

Purpose of the Study:

To address the lack of a standard definition for protein name tagging.
To develop guidelines for consistent protein named entity recognition.
To present initial inter-coder reliability results as a performance benchmark.

Main Methods:

Defined tagging targets as protein named entities, including related objects like domains, pathways, and genes.
Introduced two tag types: standard protein tags and optional long-form tags for extended boundaries.
Evaluated inter-coder consistency using three annotators on 300 MEDLINE abstracts.
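
One way to score inter-coder consistency of this kind is a pairwise F-measure over tagged spans, averaged across annotator pairs. The sketch below is illustrative only: the tuple layout `(abstract_id, start, end, tag_type)`, the function names, the toy annotations, and the choice of exact-match comparison with pairwise averaging are all assumptions, not the procedure reported by the study.

```python
from itertools import combinations


def f_measure(reference, candidate):
    """F-measure between two annotators' span sets, using exact matches.

    Each span is a hypothetical tuple: (abstract_id, start, end, tag_type).
    """
    ref, cand = set(reference), set(candidate)
    if not ref and not cand:
        return 1.0  # both annotators tagged nothing: treat as full agreement
    matched = len(ref & cand)
    if matched == 0:
        return 0.0
    precision = matched / len(cand)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)


def mean_pairwise_f(annotators):
    """Average the F-measure over every unordered pair of annotators."""
    pairs = list(combinations(sorted(annotators), 2))
    return sum(f_measure(annotators[a], annotators[b]) for a, b in pairs) / len(pairs)


# Toy annotations for three hypothetical coders (not data from the study).
coders = {
    "A": [("pmid1", 0, 4, "protein"), ("pmid1", 10, 18, "protein"), ("pmid2", 5, 9, "protein")],
    "B": [("pmid1", 0, 4, "protein"), ("pmid1", 10, 18, "protein")],
    "C": [("pmid1", 0, 4, "protein"), ("pmid2", 5, 9, "protein"), ("pmid2", 20, 27, "protein")],
}

print(round(f_measure(coders["A"], coders["B"]), 3))
print(round(mean_pairwise_f(coders), 3))
```

Because swapping the reference and candidate roles only exchanges precision and recall, the pairwise F-measure is symmetric, which makes it convenient when no annotator is treated as the gold standard.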

Main Results:

Achieved an F-measure of 0.868 for inter-coder consistency on protein tags.
Identified key challenges including name ambiguity and boundary determination.
Developed and refined guidelines to address these challenges.
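
The boundary-determination challenge can be made concrete by comparing exact span matching with a relaxed, overlap-based match. The example below is a minimal sketch: the sentence, the character offsets, and the helper names are hypothetical and chosen only to show how two coders who agree on the entity but not on its extent are scored under each criterion.

```python
def spans_overlap(a, b):
    """True if two half-open character spans (start, end) share any character."""
    return a[0] < b[1] and b[0] < a[1]


def count_matches(reference, candidate, relaxed=False):
    """Count candidate spans that match a not-yet-used reference span.

    relaxed=False requires identical boundaries; relaxed=True accepts any overlap.
    """
    used = set()
    matches = 0
    for cand in candidate:
        for i, ref in enumerate(reference):
            if i in used:
                continue
            agrees = spans_overlap(ref, cand) if relaxed else ref == cand
            if agrees:
                used.add(i)
                matches += 1
                break
    return matches


text = "human p53 protein binds MDM2"
coder_1 = [(6, 9), (24, 28)]    # tags "p53" and "MDM2"
coder_2 = [(6, 17), (24, 28)]   # tags "p53 protein" and "MDM2"

print([text[s:e] for s, e in coder_1])
print([text[s:e] for s, e in coder_2])
print(count_matches(coder_1, coder_2))                # exact boundaries
print(count_matches(coder_1, coder_2, relaxed=True))  # any overlap
```

Under exact matching the two coders agree on only one of two names, while the relaxed criterion counts both, which is why explicit boundary rules in the guidelines matter for the reported consistency.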

Conclusions:

The developed guidelines provide a standardized approach to protein name tagging.
High inter-coder consistency suggests the guidelines are effective and reliable.
The guidelines, datasets, and tools are available for research to advance biomedical information extraction.