
Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author:
  • Creating a smoother path to scientific publication. Molecular oncology, 2026.
  • An Artificial Intelligence-Driven Multimorbidity Framework Reveals a Shared Metabolic and Immune Core Across Alzheimer's Disease, Amyotrophic Lateral Sclerosis, and Frontotemporal Dementia. Biomedicines, 2026.
  • How do authors want to use AI for review? : A survey to assess the perception of scientists who received both AI and human reviews of their manuscripts. EMBO reports, 2026.
  • Artificial Intelligence-Based Analysis of Central Nervous System Vasculopathy in Pediatric Sickle Cell Anemia. American journal of hematology, 2026.
  • CervSpineNet: a hybrid deep learning-based approach for the segmentation of cervical spinous processes. Frontiers in bioengineering and biotechnology, 2026.
  • Towards an AI biomedical scientist: Accelerating discoveries in neurodegenerative disease. The journal of prevention of Alzheimer's disease, 2025.

Same journal (Bioinformatics (Oxford, England), 2026):
  • 3DICE: Interpretable 3D Cross-Modal Learning for Drug-Target Interaction Prediction and Large-Scale Drug Discovery.
  • KASSPer: Kinase Active Site Structure Prediction using Protein and Ligand Language Models and Its Application to Virtual Screening.
  • IDR searcher: a search engine solution for public image resources.
  • KCFtools: Rapid alignment-free method for introgression screening and GWAS using k-mer profiles.
  • Meta2DB: Curated shotgun metagenomic feature sets and metadata for health state prediction.
  • conMItion: an R package adjusting confounding factors for associations in multi-omics.

Related Experiment Video

Updated: Jan 7, 2026

  • Evidence-based Knowledge Synthesis and Hypothesis Validation: Navigating Biomedical Knowledge Bases via Explainable AI and Agentic Systems (05:47; published June 13, 2025)

Integrating curation into scientific publishing to train AI models.

Jorge Abreu-Vicente¹, Hannah Sonntag¹, Thomas Eidens¹

  • ¹EMBO, Heidelberg 69117, Germany.

Bioinformatics (Oxford, England) | December 27, 2025 | PubMed
Summary
This summary is machine-generated.

This study embeds data curation into the academic publishing workflow to build SourceData-NLP, a large annotated dataset for machine learning that supports the analysis of figures and text in biomedical research articles.

More Related Videos

  • Artificial Intelligence Approaches to Assessing Primary Cilia (08:58; published May 1, 2021)
  • Augmenting Large Language Models via Vector Embeddings to Improve Domain-Specific Responsiveness (03:14; published December 6, 2024)


Area of Science:

  • Biomedical Informatics
  • Computational Biology
  • Scientific Publishing

Background:

  • High-throughput data extraction and structured labeling from academic articles are vital for machine learning and secondary analyses.
  • Existing methods lack integration with the publishing workflow and comprehensive annotation of experimental roles and methodologies.
  • There is a need for advanced bioentity recognition and annotation within the scientific literature.

Purpose of the Study:

  • To embed multimodal data curation into the academic publishing process.
  • To create a comprehensive dataset for training AI models in biomedical research.
  • To improve the annotation accuracy of figure panels and captions.

Main Methods:

  • Integrated multimodal data curation into the academic publishing workflow.
  • Combined natural language processing with author feedback to produce the annotations.
  • Annotated segmented figure panels and captions from molecular and cell biology articles.
  • Defined AI tasks for evaluating the dataset's utility, including named-entity recognition and new context-dependent semantic tasks.
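The summary does not describe the annotation format or how annotated entities become training data for named-entity recognition, listed among the evaluation tasks above. As a minimal sketch, assuming character-offset span annotations (a common convention, not confirmed by the source), the Python below converts spans into token-level BIO tags, the usual input for training an NER model. The `EntitySpan` record, the helpers `span_of`, `whitespace_tokens`, and `spans_to_bio`, and label strings such as `GENEPROD` are hypothetical placeholders, not the SourceData-NLP schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntitySpan:
    """Hypothetical span annotation over a caption (placeholder schema)."""

    start: int                  # inclusive character offset
    end: int                    # exclusive character offset
    label: str                  # entity class, e.g. "GENEPROD" (placeholder name)
    role: Optional[str] = None  # experimental role (placeholder); not encoded in BIO tags


def span_of(text: str, phrase: str, label: str, role: Optional[str] = None) -> EntitySpan:
    """Hypothetical helper: build a span from the first occurrence of `phrase`."""
    start = text.index(phrase)
    return EntitySpan(start, start + len(phrase), label, role)


def whitespace_tokens(text: str) -> list[tuple[str, int, int]]:
    """Split on whitespace, keeping each token's character offsets."""
    tokens = []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        tokens.append((tok, start, end))
        pos = end
    return tokens


def spans_to_bio(text: str, spans: list[EntitySpan]) -> list[tuple[str, str]]:
    """Tag each whitespace token as B-<label>, I-<label>, or O."""
    tagged = []
    for tok, t_start, t_end in whitespace_tokens(text):
        tag = "O"
        for span in spans:
            # The token belongs to the entity if the character ranges overlap.
            if t_start < span.end and t_end > span.start:
                # A token starting at or before the entity start opens it (B);
                # tokens starting later continue it (I).
                prefix = "B" if t_start <= span.start else "I"
                tag = f"{prefix}-{span.label}"
                break
        tagged.append((tok, tag))
    return tagged


caption = "HeLa cells expressing green fluorescent protein were treated with rapamycin"
spans = [
    span_of(caption, "HeLa cells", "CELL_TYPE"),
    span_of(caption, "green fluorescent protein", "GENEPROD", role="measured"),
    span_of(caption, "rapamycin", "SMALL_MOLECULE", role="intervention"),
]
for token, tag in spans_to_bio(caption, spans):
    print(f"{token}\t{tag}")
```

Whitespace tokenization keeps the example short; a real pipeline would use the tokenizer of the model being trained and would need to handle punctuation attached to words. Experimental roles are carried on each span here but would need a separate tagging scheme or task to be learned.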

Main Results:

  • Created the SourceData-NLP dataset with over 620,000 annotated biomedical entities from 18,689 figures across 3,223 articles.
  • Annotations span eight bioentity classes and also capture experimental roles and methodologies.
  • Demonstrated the dataset's utility for AI model training in named-entity recognition, figure caption segmentation, and novel semantic tasks.
  • Showcased multimodal applications for segmenting figures into panels and captions.
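Figure caption segmentation, one of the tasks listed above, splits the caption of a compound figure into the legend of each panel. The source evaluates trained models on this task; the Python below is only a rule-based baseline under one stated assumption, namely that every panel legend begins with a single uppercase letter in parentheses such as `(A)`. The function name `segment_caption` and the `"title"` key are illustrative, not part of the dataset.

```python
import re

# Assumption: each panel legend begins with one uppercase letter in
# parentheses, e.g. "(A)". Real captions are far less regular than this.
PANEL_LABEL = re.compile(r"\(([A-Z])\)\s*")


def segment_caption(caption: str) -> dict[str, str]:
    """Split a caption into {"title": ..., "A": ..., "B": ...}.

    Text before the first panel label (usually the figure title) goes under
    the key "title". A repeated label overwrites the earlier legend.
    """
    matches = list(PANEL_LABEL.finditer(caption))
    segments: dict[str, str] = {}
    first_label_start = matches[0].start() if matches else len(caption)
    title = caption[:first_label_start].strip()
    if title:
        segments["title"] = title
    for i, match in enumerate(matches):
        # Each legend runs until the next panel label or the end of the caption.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(caption)
        segments[match.group(1)] = caption[match.end():end].strip()
    return segments


caption = (
    "Rapamycin reduces S6 phosphorylation. "
    "(A) Western blot of lysates from treated cells. "
    "(B) Quantification of band intensities."
)
for label, legend in segment_caption(caption).items():
    print(f"{label}: {legend}")
```

Captions that cite panels inline (for example "as in (A)") or use ranges such as "(A-C)" break this rule, which is one reason a learned segmentation model is preferable to a pattern match.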

Conclusions:

  • The SourceData-NLP dataset significantly enhances machine learning applications in biomedical research.
  • Integrating data curation into publishing streamlines the creation of valuable, structured datasets.
  • The developed models and dataset facilitate advanced analysis of scientific figures and text.