Transformer-Based Multilabel NER Using Wikipedia Corpora in Multiple Languages.

Yelyzaveta Ahapova¹, Johann Frei¹, Frank Kramer¹

  • ¹ IT-Infrastructure for Translational Medical Research, University of Augsburg, Germany.

Studies in Health Technology and Informatics | May 17, 2025
Summary
This summary is machine-generated.

This study introduces an unsupervised method to create medical text datasets for named entity recognition (NER) in multiple languages. The approach improves German medication recognition, especially with limited data.

Area of Science:

  • Natural Language Processing
  • Medical Informatics
  • Computational Linguistics

Background:

  • Manual data labeling for medical texts is expensive and raises privacy issues, leading to a lack of non-English medical annotations.
  • Existing methods often require extensive labeled data, which is scarce for many languages.
  • Ontology-based corpus construction offers a potential solution to data scarcity.

Purpose of the Study:

  • To evaluate an unsupervised approach for creating ontology-annotated corpora from Wikipedia for medical Named Entity Recognition (NER).
  • To assess the effectiveness of this approach across English, German, Spanish, and French.
  • To improve medication and diagnosis entity recognition in low-resource medical text scenarios.

Main Methods:

  • An unsupervised method was used to construct ontology-annotated corpora from Wikipedia (Wikidata).
  • The approach was applied to generate multilabel corpora for English, German, Spanish, and French.
  • The generated corpora were used to train and evaluate models for medication and diagnosis entity recognition.

Main Results:

  • The unsupervised approach yielded notable improvements in German medication entity detection, particularly under sparse annotation conditions.
  • Consistent performance was observed across English, German, Spanish, and French for entity recognition tasks.
  • The generated multilabel corpora demonstrated effectiveness in enhancing NER performance compared to baseline methods.

Conclusions:

  • Unsupervised ontology-based corpus construction is a viable strategy to address the scarcity of medical annotations in non-English languages.
  • This method offers a cost-effective and privacy-preserving alternative to manual data labeling for medical NER.
  • The approach shows promise for improving cross-lingual medical information extraction and analysis.

Keywords:

  • diagnosis extraction
  • medical NER
  • medication extraction
  • named entity recognition
  • natural language processing
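
Illustrative sketch:

The summary does not describe the authors' pipeline in detail, so the following is only a minimal sketch of the general idea behind ontology-based corpus annotation as outlined under Main Methods: terms from an ontology are matched against raw text, and every matched token receives a set of labels (hence "multilabel"). The toy `ONTOLOGY` dictionary, the `annotate` function, the label names, and the `max_len` parameter are all assumptions made for illustration. In the study the terms would be derived from Wikipedia and Wikidata rather than written by hand, and the resulting corpora would then be used to train a transformer model.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical toy ontology (an assumption for illustration): maps a
# lowercase surface form to the set of entity labels it may carry.
ONTOLOGY: Dict[str, Set[str]] = {
    "aspirin": {"MEDICATION"},
    "ibuprofen": {"MEDICATION"},
    "migraine": {"DIAGNOSIS"},
    "heart failure": {"DIAGNOSIS"},
}


def annotate(
    text: str, ontology: Dict[str, Set[str]], max_len: int = 3
) -> List[Tuple[str, List[str]]]:
    """Assign a (possibly empty) list of labels to every token in `text`.

    Phrases of up to `max_len` tokens are looked up in the ontology,
    trying the longest phrase first at each start position.
    """
    # Split into word tokens and single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    lowered = [tok.lower() for tok in tokens]
    labels: List[Set[str]] = [set() for _ in tokens]

    for start in range(len(tokens)):
        for length in range(max_len, 0, -1):
            end = start + length
            if end > len(tokens):
                continue
            phrase = " ".join(lowered[start:end])
            if phrase in ontology:
                # Every token inside the matched phrase inherits its labels.
                for i in range(start, end):
                    labels[i] |= ontology[phrase]
                break  # keep only the longest match at this position

    return [(tok, sorted(lab)) for tok, lab in zip(tokens, labels)]


if __name__ == "__main__":
    sentence = "Aspirin can relieve migraine but may worsen heart failure."
    for token, token_labels in annotate(sentence, ONTOLOGY):
        print(token, token_labels)
```

Storing a set of labels per token, rather than a single tag, is what lets one token belong to several entity classes at once. Plain string matching of this kind produces noisy labels, which is one reason such corpora are typically used as training data and not as a gold standard.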