Automatic term list generation for entity tagging.

Ted Sandler¹, Andrew I Schein, Lyle H Ungar

¹Department of Computer and Information Science, University of Pennsylvania, 3330 Walnut Street, Philadelphia, 19104, USA. tsandler@seas.upenn.edu

Bioinformatics (Oxford, England)

October 27, 2005

Summary

This summary is machine-generated.

Distributional clustering methods can automatically generate term lists for entity recognition, improving gene tagger performance. These lists enhance precision and recall, complementing supervised methods for better results.

Area of Science:

Computational linguistics
Bioinformatics
Natural language processing

Background:

Entity recognition and information extraction systems rely on curated term lists.
Manual construction of these lists is time-consuming and labor-intensive.
Distributional clustering offers an automated approach to term list generation.

Purpose of the Study:

To investigate the utility of distributional clustering for creating term lists.
To evaluate the impact of automatically generated term lists on gene entity recognition.
To compare automatically generated term lists with hand-curated lists and with supervised methods.

Main Methods:

Used distributional clustering, which groups words by the contexts in which they occur.
Applied shallow parsing to extract the syntactic relations used in clustering.
Integrated automatically generated term lists into a Conditional Random Field (CRF)-based gene tagger.
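
To make the idea of grouping words by context concrete, here is a minimal Python sketch. It is a simplified stand-in, not the authors' procedure: it expands a seed term by cosine similarity over context vectors instead of running a full clustering algorithm, and it uses adjacent words as context where the study used shallow-parse relations. The function names, the toy corpus, and the 0.8 threshold are all illustrative assumptions.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=1):
    """Map each word to a Counter of (offset, neighbor) context features."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for offset in range(-window, window + 1):
                j = i + offset
                if offset != 0 and 0 <= j < len(tokens):
                    vectors[word][(offset, tokens[j])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[feat] for feat, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def expand_term_list(seeds, vectors, threshold=0.8):
    """Return the seeds plus every word whose contexts resemble a seed's contexts."""
    known_seeds = [s for s in seeds if s in vectors]
    terms = set(seeds)
    for word, vec in vectors.items():
        if any(cosine(vec, vectors[s]) >= threshold for s in known_seeds):
            terms.add(word)
    return terms


# Toy corpus (hypothetical): two gene names share contexts, "patient" only partly.
corpus = [
    "the BRCA1 gene is expressed".split(),
    "the TP53 gene is expressed".split(),
    "the patient is treated".split(),
]
vectors = context_vectors(corpus)
gene_terms = expand_term_list({"BRCA1"}, vectors)
```

In this toy corpus, "TP53" occurs in exactly the same contexts as the seed "BRCA1" and joins the list, while "patient" shares only one of two context features and stays below the threshold.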

Main Results:

Automatically generated term lists significantly improved the precision of a state-of-the-art gene tagger.
Recall was boosted beyond that achieved with hand-curated lists.
Distributional clustering lists complemented supervised techniques for enhanced performance.
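
For readers unfamiliar with the two metrics, the following sketch shows how entity-level precision and recall are typically computed from predicted and gold-standard spans. The spans are invented for illustration and are not results from the study; exact-match scoring is also an assumption, since the summary does not state the matching criterion.

```python
def precision_recall(predicted, gold):
    """Entity-level precision and recall over sets of (start, end) spans."""
    true_positives = len(predicted & gold)  # spans that match exactly
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gene mention spans (token offsets), not data from the paper.
gold = {(0, 1), (5, 7), (10, 11), (14, 16)}
predicted = {(0, 1), (5, 7), (12, 13)}
p, r = precision_recall(predicted, gold)
```

Here two of the three predicted spans are correct and two of the four gold spans are found, so precision measures how many tagged mentions are right and recall measures how many true mentions are found.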

Conclusions:

Distributional clustering is an effective method for aiding the construction of entity term lists.
Automated term lists can significantly enhance the performance of gene taggers.
Combining distributional clustering with supervised methods yields superior results.
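
One common way to combine a term list with a supervised sequence tagger is to expose list membership as a token feature alongside the usual orthographic features. The sketch below assumes that design; the summary does not describe the feature set the authors used, and the feature names, `token_features`, and the example term list are hypothetical.

```python
def token_features(tokens, i, term_list):
    """Build a feature dict for token i, including term-list membership flags."""
    word = tokens[i]
    return {
        "word.lower": word.lower(),
        "word.has_digit": any(ch.isdigit() for ch in word),
        "word.is_upper": word.isupper(),
        "in_term_list": word in term_list,
        "prev_in_term_list": i > 0 and tokens[i - 1] in term_list,
    }


# Hypothetical term list, e.g. the output of a clustering step.
term_list = {"BRCA1", "TP53"}
tokens = "Mutations in TP53 are common".split()
features = [token_features(tokens, i, term_list) for i in range(len(tokens))]
```

Feature dicts of this shape can be passed to a CRF toolkit, which learns how much weight to give the membership flags relative to the other features.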