High-recall protein entity recognition using a dictionary.

Zhenzhen Kou¹, William W Cohen, Robert F Murphy

¹Center for Automated Learning and Discovery, Carnegie Mellon University Pittsburgh, PA 15213, USA. zkou@andrew.cmu.edu

Bioinformatics (Oxford, England)

June 18, 2005

Summary

This summary is machine-generated.

We developed semi-Markov conditional random fields (semiCRFs) and dictionary hidden Markov models (HMMs) for protein name extraction from biological literature. Both methods outperformed the previous best results on two of the three datasets evaluated.

Area of Science:

Bioinformatics
Computational Biology
Natural Language Processing

Background:

Protein name extraction is crucial for biological literature mining.
Existing methods have limitations in utilizing dictionary information effectively.

Purpose of the Study:

To introduce and evaluate two novel methods for protein name extraction: semiCRFs and dictionary HMMs.
To compare their performance against established methods like Maximum Entropy and standard CRFs.

Main Methods:

SemiCRFs: An extension of CRFs incorporating dictionary information as features.
Dictionary HMMs: Converting dictionaries into HMMs to recognize phrases and their variations.
Comparative analysis on three datasets using F-measure and dictionary match metrics.
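
One way to picture "dictionary information as features" is to score whole candidate segments against the dictionary rather than single tokens. The sketch below is illustrative only and is not the authors' implementation: the function names, the 4-token segment limit, and the use of Python's difflib string similarity to capture "variations" of dictionary phrases are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def normalize(phrase):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(phrase.lower().split())


def segment_dictionary_features(tokens, dictionary, max_len=4):
    """Return {(start, end): features} for every segment of up to max_len tokens.

    `end` is exclusive. Each segment is compared with every dictionary entry.
    The similarity measure (difflib ratio) is an assumption for illustration.
    """
    entries = {normalize(entry) for entry in dictionary}
    features = {}
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            text = normalize(" ".join(tokens[start:end]))
            # Best similarity to any entry; 0.0 if the dictionary is empty.
            best = max(
                (SequenceMatcher(None, text, entry).ratio() for entry in entries),
                default=0.0,
            )
            features[(start, end)] = {
                "exact_match": text in entries,
                "best_similarity": best,
            }
    return features


tokens = ["the", "protein", "kinase", "C", "binds"]
feats = segment_dictionary_features(tokens, ["Protein kinase C"])
print(feats[(1, 4)])
```

A segment-level model could consume features like these directly, whereas a token-level model would have to spread a multi-word dictionary match across individual token labels.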

Main Results:

Both semiCRFs and dictionary HMMs demonstrated improved performance over previous best results on two datasets.
CRFs and semiCRFs achieved the highest overall performance based on the F-measure.
Dictionary HMMs excelled in identifying entities present in the dictionary.
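
The two kinds of score mentioned above can be sketched as follows. This is a minimal illustration, assuming entities are represented as (start, end) token spans and that a prediction counts only when it matches a gold span exactly. The summary does not state the matching criterion the authors used, and the function names here are hypothetical.

```python
def span_f_measure(gold, predicted):
    """Precision, recall, and F-measure over exactly matching spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def recall_on_dictionary_entities(gold, predicted, tokens, dictionary):
    """Recall restricted to gold spans whose text appears in the dictionary."""
    entries = {entry.lower() for entry in dictionary}
    in_dictionary = {
        (start, end)
        for start, end in gold
        if " ".join(tokens[start:end]).lower() in entries
    }
    if not in_dictionary:
        return 0.0
    return len(in_dictionary & set(predicted)) / len(in_dictionary)


tokens = ["the", "protein", "kinase", "C", "binds", "to", "Raf1", "and", "more"]
gold = {(1, 4), (6, 7)}
predicted = {(1, 4), (8, 9)}
print(span_f_measure(gold, predicted))
print(recall_on_dictionary_entities(gold, predicted, tokens, ["protein kinase C"]))
```

Reporting both numbers separates overall extraction quality from how well a method recovers names that the dictionary already contains.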

Conclusions:

SemiCRFs and dictionary HMMs both advance automated protein name extraction, with dictionary HMMs strongest on names that appear in the dictionary.
These methods enhance the ability to mine biological literature for protein-related information.
The developed algorithms are available via the MINORTHIRD package.