High-recall protein entity recognition using a dictionary.

Zhenzhen Kou¹, William W Cohen, Robert F Murphy

  • ¹Center for Automated Learning and Discovery, Carnegie Mellon University, Pittsburgh, PA 15213, USA. zkou@andrew.cmu.edu

Bioinformatics (Oxford, England) | June 18, 2005
Summary
This summary is machine-generated.

We describe and evaluate two methods for extracting protein names from biological literature: semi-Markov conditional random fields (semiCRFs) and dictionary hidden Markov models (HMMs). Both improved on the previous best results on two of the three datasets tested.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Natural Language Processing

Background:

  • Protein name extraction is crucial for biological literature mining.
  • Existing methods have limitations in utilizing dictionary information effectively.

Purpose of the Study:

  • To introduce and evaluate two novel methods for protein name extraction: semiCRFs and dictionary HMMs.
  • To compare their performance against established methods like Maximum Entropy and standard CRFs.

Main Methods:

  • SemiCRFs: An extension of CRFs incorporating dictionary information as features.
  • Dictionary HMMs: Converting dictionaries into HMMs to recognize phrases and their variations.
  • Comparative analysis on three datasets using F-measure and dictionary match metrics.
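
To make the idea of dictionary information as a segment-level feature more concrete, here is a minimal sketch. It is not the feature set used in the paper: the token-set Jaccard similarity, the whitespace tokenizer, the `max_len` limit, and all function names are assumptions chosen for illustration. The point it shows is that a model scoring whole segments, rather than single tokens, can compare a multi-word candidate directly against dictionary entries.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def jaccard(a, b):
    """Token-set Jaccard similarity between two token lists."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def best_dictionary_score(segment_tokens, dictionary):
    """Highest similarity between a candidate segment and any dictionary entry."""
    return max(
        (jaccard(segment_tokens, tokenize(entry)) for entry in dictionary),
        default=0.0,
    )


def segment_features(tokens, dictionary, max_len=4):
    """Return (start, end, score) for every segment of up to max_len tokens.

    `score` is a hypothetical dictionary feature: the similarity between
    tokens[start:end] and the closest dictionary entry.
    """
    feats = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            score = best_dictionary_score(tokens[start:end], dictionary)
            feats.append((start, end, score))
    return feats


# Illustrative data only; not taken from the paper's datasets.
dictionary = ["protein kinase C", "p53"]
tokens = tokenize("activation of protein kinase C alpha")
best = max(segment_features(tokens, dictionary), key=lambda f: f[2])
print(best)  # (2, 5, 1.0): the segment "protein kinase c"
```

In a trained model such a score would be one feature among many, weighted alongside contextual and orthographic evidence; a soft similarity rather than an exact lookup lets near-variants of dictionary phrases still contribute.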

Main Results:

  • Both semiCRFs and dictionary HMMs demonstrated improved performance over previous best results on two datasets.
  • CRFs and semiCRFs achieved the highest overall performance based on the F-measure.
  • Dictionary HMMs excelled in identifying entities present in the dictionary.
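
The F-measure used to rank the methods is the harmonic mean of precision and recall. The sketch below computes it over entity spans, assuming that a prediction counts as correct only when its (start, end) boundaries match a gold span exactly; the matching criterion actually used in the study may differ, and the function name and sample spans are hypothetical.

```python
def span_f_measure(predicted, gold):
    """Return (precision, recall, f1) for two collections of (start, end) spans.

    Assumes exact boundary matching: a predicted span is a true positive
    only if the identical span appears in the gold annotations.
    """
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative spans only; not results from the paper.
gold = [(0, 2), (5, 6), (9, 12)]
predicted = [(0, 2), (5, 7), (9, 12), (14, 15)]
precision, recall, f1 = span_f_measure(predicted, gold)
print(precision, recall, f1)  # precision 0.5, recall 2/3, F-measure 4/7
```

Because the harmonic mean is pulled toward the lower of the two values, a method tuned for high recall on dictionary entries can find more known proteins and still score a lower F-measure if its precision drops.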

Conclusions:

  • SemiCRFs and dictionary HMMs represent significant advancements in automated protein name extraction.
  • These methods enhance the ability to mine biological literature for protein-related information.
  • The developed algorithms are available via the MINORTHIRD package.