Exploiting and assessing multi-source data for supervised biomedical named entity recognition.

Dieter Galea¹, Ivan Laponogov¹, Kirill Veselkov¹

  • ¹Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, UK.

Bioinformatics (Oxford, England) | March 15, 2018
Summary
This summary is machine-generated.

Combining multiple training datasets improves biomolecular named entity recognition (NER) model generalizability. Models trained on diverse sources show better performance across independent corpora, addressing overtraining and annotation differences.

Area of Science:

  • Biomedical Natural Language Processing
  • Bioinformatics
  • Computational Biology

Background:

  • Named entity recognition (NER) is crucial for extracting biomedical information from scientific text.
  • Supervised machine learning models for NER heavily depend on annotated training data.
  • Models trained and tested on the same data may overestimate performance and fail to generalize to diverse corpora.

Purpose of the Study:

  • To evaluate the generalizability of biomolecular NER models across independent corpora.
  • To investigate the impact of single-source versus multi-source training data on model performance.
  • To identify factors contributing to model overtraining and performance discrepancies.

Main Methods:

  • Aggregated published corpora for biomolecular entity recognition (genes, RNA, proteins, etc.).
  • Employed a leave-corpus-out cross-validation strategy to assess model performance on independent datasets.
  • Investigated orthographic features and annotation standard differences to explain performance variations.
  • Conducted learning-curve-based power analysis to evaluate data quantity limitations.
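
The leave-corpus-out strategy listed above can be illustrated with a minimal sketch: each fold holds out one entire corpus for testing and trains on all the others. The corpus names (`corpus_a`, `corpus_b`, `corpus_c`), the BIO-style labels, and the helper `leave_corpus_out` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from typing import Dict, Iterator, List, Tuple

# A sentence is a list of (token, label) pairs; the labels here are illustrative.
Sentence = List[Tuple[str, str]]


def leave_corpus_out(
    corpora: Dict[str, List[Sentence]],
) -> Iterator[Tuple[str, List[Sentence], List[Sentence]]]:
    """Yield (held_out_name, train, test), holding out one whole corpus per fold."""
    for held_out in sorted(corpora):
        # Training data: every sentence from every corpus except the held-out one.
        train = [
            sentence
            for name in sorted(corpora)
            if name != held_out
            for sentence in corpora[name]
        ]
        # Test data: the held-out corpus, untouched during training.
        test = list(corpora[held_out])
        yield held_out, train, test


# Toy data standing in for published corpora (hypothetical content).
corpora = {
    "corpus_a": [[("BRCA1", "B-GENE"), ("mutations", "O")]],
    "corpus_b": [
        [("p53", "B-PROTEIN"), ("binds", "O"), ("DNA", "O")],
        [("TNF", "B-GENE")],
    ],
    "corpus_c": [[("mRNA", "B-RNA"), ("levels", "O")]],
}

folds = list(leave_corpus_out(corpora))
for held_out, train, test in folds:
    print(f"held out: {held_out}, train sentences: {len(train)}, test sentences: {len(test)}")
```

Splitting at the corpus level, rather than shuffling sentences across corpora, keeps annotation conventions of the test corpus unseen during training, which is what makes the resulting score an estimate of cross-corpus generalizability.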

Main Results:

  • Model accuracies significantly decreased when tested on independent corpora, indicating poor generalizability.
  • Combined use of multi-source training corpora resulted in more generalizable NER models.
  • Multi-source models performed comparably to single-source models while being more robust across corpora.
  • Found that model performance was often not limited by the quantity of annotated data.
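
One simple way to check whether performance is limited by data quantity is to train on growing subsets and see whether the score still improves. The sketch below is a simplified plateau check, not the learning-curve-based power analysis used in the study; `learning_curve`, `has_plateaued`, the `tolerance` value, and the toy scorer are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def learning_curve(
    data: Sequence,
    fractions: Sequence[float],
    train_and_score: Callable[[Sequence], float],
    seed: int = 0,
) -> List[Tuple[int, float]]:
    """Return (training size, score) for each fraction of the shuffled data."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    points = []
    for fraction in fractions:
        n = max(1, int(round(fraction * len(shuffled))))
        points.append((n, train_and_score(shuffled[:n])))
    return points


def has_plateaued(points: List[Tuple[int, float]], tolerance: float = 0.01) -> bool:
    """True if the score gain between the two largest training sizes is below tolerance."""
    if len(points) < 2:
        return False
    return (points[-1][1] - points[-2][1]) < tolerance


def toy_scorer(subset: Sequence) -> float:
    """Stand-in for training and evaluating an NER model: saturates at 0.8."""
    return min(0.8, 0.1 * len(subset))


points = learning_curve(list(range(20)), [0.25, 0.5, 0.75, 1.0], toy_scorer)
plateaued = has_plateaued(points)
print(points, plateaued)
```

In practice `train_and_score` would train a model on the subset and evaluate it on a fixed held-out corpus; a flat curve at the largest sizes suggests that adding more data of the same kind is unlikely to help.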

Conclusions:

  • Single-source training leads to overfitted models with limited generalizability in biomolecular NER.
  • Aggregating diverse training corpora enhances model robustness and performance across different datasets.
  • Future efforts should focus on creating diverse, multi-source training datasets for improved biomedical information extraction.