
Corpus refactoring: a feasibility study.

Helen L Johnson¹, William A Baumgartner, Martin Krallinger

  • ¹Center for Computational Pharmacology, University of Colorado School of Medicine, Aurora, CO, USA. helen.johnson@uchsc.edu.

Journal of Biomedical Discovery and Collaboration | September 15, 2007
Summary
This summary is machine-generated.


Refactoring biomedical corpora into new formats is feasible and cost-effective. It makes valuable evaluation data more usable for biomedical text mining.

Area of Science:

  • Biomedical Natural Language Processing
  • Computational Linguistics
  • Bioinformatics

Background:

  • Biomedical corpora are underutilized due to distribution format limitations.
  • Corpus accessibility is a bottleneck for progress in biomedical text mining.
  • Refactoring aims to improve corpus usability without semantic alteration.

Purpose of the Study:

  • To test the feasibility of corpus refactoring.
  • To demonstrate a semi-automatable and time-efficient refactoring process.
  • To increase the accessibility and utility of biomedical evaluation data.

Main Methods:

  • Applied simple text processing techniques.
  • Used limited human validation to check accuracy.
  • Converted the Protein Design Group corpus into WordFreak and embedded XML formats.
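As a rough illustration of what conversion to an embedded XML format can involve, the sketch below turns standoff annotations (character offsets plus a label) into inline tags, then strips the tags again to confirm that the underlying text is unchanged. It is not the authors' code: the function names, the `protein` tag, the standoff input layout, and the sample sentence are all assumptions made for this example, and it handles only non-overlapping spans.

```python
import re
from xml.sax.saxutils import escape, unescape


def embed_annotations(text, spans):
    """Convert standoff (start, end, label) spans into inline XML tags.

    Assumption: spans are character offsets into `text` and do not
    overlap or nest. Labels are used directly as tag names.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor or end > len(text) or start >= end:
            raise ValueError(f"bad or overlapping span: {(start, end, label)}")
        # Escape untagged text so that '&', '<' and '>' stay valid XML.
        pieces.append(escape(text[cursor:start]))
        pieces.append(f"<{label}>{escape(text[start:end])}</{label}>")
        cursor = end
    pieces.append(escape(text[cursor:]))
    return "".join(pieces)


def strip_tags(xml_text):
    """Remove inline tags and undo escaping to recover the source text."""
    return unescape(re.sub(r"</?[A-Za-z_][\w.-]*>", "", xml_text))


# Hypothetical sentence and offsets, not taken from the corpus.
sentence = "p53 binds MDM2 & inhibits it."
standoff = [(0, 3, "protein"), (10, 14, "protein")]

embedded = embed_annotations(sentence, standoff)
print(embedded)

# Automated stand-in for validation: the text must survive the round trip.
assert strip_tags(embedded) == sentence
```

The round-trip check at the end is one cheap way to confirm that a format change did not alter the text itself. It does not replace human validation of whether the annotations landed on the right spans.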

Main Results:

  • The refactored corpus is available via BioNLP SourceForge.
  • Total effort involved approximately three person-weeks (102 hours programming, 20 hours validation).
  • The refactoring process was demonstrated to be time-efficient.
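The "approximately three person-weeks" figure is consistent with the hours reported above if a 40-hour working week is assumed. The week length is an assumption made here, not something the summary states:

```latex
\[
102\,\text{h} + 20\,\text{h} = 122\,\text{h},
\qquad
\frac{122\,\text{h}}{40\,\text{h/week}} = 3.05 \approx 3\ \text{person-weeks}
\]
```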

Conclusions:

  • Corpus refactoring is technically and economically viable.
  • This method enhances the use of existing evaluation data.
  • Increased corpus usability will accelerate biomedical language processing research.