Automatic term list generation for entity tagging.

Ted Sandler¹, Andrew I Schein, Lyle H Ungar

  • ¹ Department of Computer and Information Science, University of Pennsylvania, 3330 Walnut Street, Philadelphia, PA 19104, USA. tsandler@seas.upenn.edu

Bioinformatics (Oxford, England) | October 27, 2005
Summary
This summary is machine-generated.

Distributional clustering methods can automatically generate term lists for entity recognition, improving gene tagger performance. These lists enhance precision and recall, complementing supervised methods for better results.

Area of Science:

  • Computational linguistics
  • Bioinformatics
  • Natural Language Processing

Background:

  • Entity recognition systems and information extraction rely on curated term lists.
  • Manual construction of these lists is time-consuming and labor-intensive.
  • Distributional clustering offers an automated approach to term list generation.
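
As a rough illustration of the idea behind distributional clustering, the sketch below groups words that occur in similar contexts. It is not the method from the paper: the toy corpus, the one-word context window (standing in for the syntactic relations the study derives from shallow parsing), the cosine threshold, and the greedy "leader" clustering are all assumptions chosen to keep the example small.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=1):
    """Map each token to counts of (relative position, neighbour) pairs."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[tok][(j - i, tokens[j])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def leader_cluster(vectors, threshold=0.8):
    """Greedy clustering: a word joins the first cluster whose leader
    (first member) has a context vector similar enough to its own."""
    clusters = []
    for word in sorted(vectors):
        for members in clusters:
            if cosine(vectors[word], vectors[members[0]]) >= threshold:
                members.append(word)
                break
        else:
            clusters.append([word])
    return clusters


# Toy corpus, invented for illustration only.
corpus = [
    "the BRCA1 gene is expressed".split(),
    "the TP53 gene is expressed".split(),
    "the MYC gene is mutated".split(),
    "the patient is treated".split(),
]

clusters = leader_cluster(context_vectors(corpus))
for members in clusters:
    print(members)
```

On this toy corpus the three gene symbols share identical contexts and end up in one cluster, which could then be reviewed and used as a candidate term list.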

Purpose of the Study:

  • To investigate the utility of distributional clustering for creating term lists.
  • To evaluate the impact of automatically generated term lists on gene entity recognition.
  • To compare the performance of automated lists with manually curated and supervised methods.

Main Methods:

  • Utilized distributional clustering based on word context and shallow parsing.
  • Applied shallow parsing to extract syntactic relations.
  • Integrated automatically generated term lists into a Conditional Random Field (CRF)-based gene tagger.
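
One plausible way to feed term lists into a CRF-based tagger is to expose list membership as token features. The sketch below assumes that approach; the feature names, the `cluster_17` list, and the helper functions are hypothetical and are not taken from the paper. It only builds per-token feature dictionaries, a format that CRF toolkits such as sklearn-crfsuite accept, and does not train a model.

```python
def token_features(tokens, i, term_lists):
    """Build a feature dict for tokens[i], including term-list membership."""
    tok = tokens[i]
    feats = {
        "lower": tok.lower(),
        "is_upper": tok.isupper(),
        "has_digit": any(ch.isdigit() for ch in tok),
    }
    for name, terms in term_lists.items():
        # Membership of the current token and of its immediate neighbours.
        feats[f"in_list:{name}"] = tok in terms
        if i > 0:
            feats[f"prev_in_list:{name}"] = tokens[i - 1] in terms
        if i + 1 < len(tokens):
            feats[f"next_in_list:{name}"] = tokens[i + 1] in terms
    return feats


def sentence_features(tokens, term_lists):
    """Feature dicts for every token in a sentence."""
    return [token_features(tokens, i, term_lists) for i in range(len(tokens))]


# Hypothetical automatically generated term list (one cluster of gene symbols).
term_lists = {"cluster_17": {"BRCA1", "TP53", "MYC"}}

feats = sentence_features("the TP53 gene".split(), term_lists)
for f in feats:
    print(f)
```

Keeping one feature per list lets the tagger learn a separate weight for each cluster, so a noisy list can be down-weighted rather than trusted outright.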

Main Results:

  • Automatically generated term lists significantly improved the precision of a state-of-the-art gene tagger.
  • Recall was boosted beyond that achieved with hand-curated lists.
  • Distributional clustering lists complemented supervised techniques for enhanced performance.

Conclusions:

  • Distributional clustering is an effective method for aiding the construction of entity term lists.
  • Automated term lists can significantly enhance the performance of gene taggers.
  • Combining distributional clustering with supervised methods yields superior results.