Exploiting and assessing multi-source data for supervised biomedical named entity recognition.

Dieter Galea¹, Ivan Laponogov¹, Kirill Veselkov¹

¹Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, UK.

Bioinformatics (Oxford, England)

March 15, 2018

Summary

This summary is machine-generated.

Combining multiple training datasets improves biomolecular named entity recognition (NER) model generalizability. Models trained on diverse sources show better performance across independent corpora, addressing overtraining and annotation differences.

Area of Science:

Biomedical Natural Language Processing
Bioinformatics
Computational Biology

Background:

Named entity recognition (NER) is crucial for extracting biomedical information from scientific text.
Supervised machine learning models for NER heavily depend on annotated training data.
Models trained and tested on the same data may overestimate performance and fail to generalize to diverse corpora.

Purpose of the Study:

To evaluate the generalizability of biomolecular NER models across independent corpora.
To investigate the impact of single-source versus multi-source training data on model performance.
To identify factors contributing to model overtraining and performance discrepancies.

Main Methods:

Aggregated published corpora for biomolecular entity recognition (genes, RNA, proteins, etc.).
Employed a leave-corpus-out cross-validation strategy to assess model performance on independent datasets.
Investigated orthographic features and annotation standard differences to explain performance variations.
Conducted learning-curve-based power analysis to evaluate data quantity limitations.
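The leave-corpus-out strategy listed above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `leave_corpus_out`, the corpus names, and the toy sentences and tags are all hypothetical. Each fold holds out one whole corpus for testing and trains on the union of the others, so the test data never shares a source with the training data.

```python
from typing import Dict, Iterator, List, Tuple

# A sentence is a list of (token, tag) pairs; the tag scheme is an assumption.
Sentence = List[Tuple[str, str]]


def leave_corpus_out(
    corpora: Dict[str, List[Sentence]],
) -> Iterator[Tuple[str, List[Sentence], List[Sentence]]]:
    """Yield (held_out_name, train_sentences, test_sentences), one fold per corpus."""
    for held_out in sorted(corpora):
        # Train on every corpus except the held-out one.
        train = [
            sent
            for name in sorted(corpora)
            if name != held_out
            for sent in corpora[name]
        ]
        # Test only on the held-out corpus.
        yield held_out, train, list(corpora[held_out])


# Hypothetical toy corpora, for illustration only.
corpora = {
    "corpus_a": [[("BRCA1", "B-GENE"), ("mutations", "O")]],
    "corpus_b": [[("p53", "B-PROTEIN"), ("binds", "O"), ("DNA", "O")]],
    "corpus_c": [[("miR-21", "B-RNA"), ("is", "O"), ("upregulated", "O")]],
}

folds = list(leave_corpus_out(corpora))
for held_out, train, test in folds:
    print(f"held out: {held_out}  train: {len(train)}  test: {len(test)}")
```

In a real evaluation, a NER model would be fitted on `train` and scored on `test` inside the loop; comparing those scores with within-corpus cross-validation scores exposes the generalization gap the study describes.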

Main Results:

Model accuracies significantly decreased when tested on independent corpora, indicating poor generalizability.
Combined use of multi-source training corpora resulted in more generalizable NER models.
Multi-source models performed comparably to single-source models while being more robust across corpora.
Found that model performance was often not limited by the quantity of annotated data.
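The last finding comes from the learning-curve analysis. The summary does not state which functional form was fitted, so the sketch below assumes a simple power law, error = b * n^(-c), fitted by least squares in log-log space. The function names and the synthetic points are hypothetical and are not results from the study.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], errors: Sequence[float]) -> Tuple[float, float]:
    """Fit error = b * n**(-c) via linear regression on log-transformed data.

    Returns (b, c). The power-law form is an assumption of this sketch.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predicted_error(b: float, c: float, n: float) -> float:
    """Error predicted by the fitted curve at training-set size n."""
    return b * n ** (-c)


# Synthetic points generated from a known curve (b=0.5, c=0.3), for illustration only.
sizes = [100, 200, 400, 800, 1600]
errors = [0.5 * n ** -0.3 for n in sizes]

b, c = fit_power_law(sizes, errors)
# Expected error reduction from doubling the largest training set.
gain = predicted_error(b, c, 1600) - predicted_error(b, c, 3200)
print(f"b={b:.3f} c={c:.3f} gain from doubling={gain:.4f}")
```

A small projected gain from doubling the training set suggests that adding more annotated data of the same kind would help little, which is the sense in which performance is "not limited by quantity".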

Conclusions:

Single-source training leads to overfitted models with limited generalizability in biomolecular NER.
Aggregating diverse training corpora enhances model robustness and performance across different datasets.
Future efforts should focus on creating diverse, multi-source training datasets for improved biomedical information extraction.