All Models are Wrong, Some are Annotated: Automating Metadata in Biomedical Repositories.

Inessa Cohen¹, Hongyi Yu², Robert A McDougal¹,²,³,⁴,⁵

¹Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06510, United States.

bioRxiv: the preprint server for biology

May 7, 2026

Summary

This summary is machine-generated.

Large language models (LLMs) can automatically generate metadata for scientific models from source code, improving discovery. These AI tools show promise for annotating large biomedical repositories more efficiently than traditional methods.

Keywords:

computational biology; large language models; machine learning; metadata; natural language processing

Area of Science:

Computational Neuroscience
Bioinformatics
Artificial Intelligence

Background:

High-quality metadata is crucial for scientific discovery but is often sparse in large data repositories.
Manually annotating complex biological details, such as ion channel and receptor subtypes, from source code is time-consuming and challenging.

Purpose of the Study:

To evaluate the effectiveness of large language models (LLMs) in automatically inferring ion channel and receptor subtype metadata directly from source code.
To compare LLM performance against a traditional feature-engineered baseline model.

Main Methods:

Extracted 5,133 model files from the ModelDB repository.
Manually annotated 1,100 models, with 253 reserved for testing.
Evaluated LLM approaches (GPT-5.2, GPT-mini) using zero-shot and heuristic-augmented prompting.
Compared LLM performance (accuracy, precision, recall, F1 score) against an XGBoost baseline model.
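
The four metrics named above can be computed from paired true and predicted labels. The following is a minimal sketch, not the authors' evaluation code. It assumes macro averaging across classes (the summary does not state which averaging was used), and the function name and the example labels are hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging is an assumption here; the source does not say
    how per-class scores were combined.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Hypothetical channel-type labels, for illustration only.
    truth = ["Na", "Na", "K", "Ca"]
    predicted = ["Na", "K", "K", "Ca"]
    print(classification_metrics(truth, predicted))
```

The same function can score both an LLM's outputs and a baseline's outputs on the held-out test set, so the two are compared on identical terms.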

Main Results:

LLMs significantly outperformed the XGBoost baseline model in metadata annotation.
Heuristically augmented GPT-mini achieved 96.0% accuracy at the type level and 88.1% accuracy at the subtype level.
LLM outputs were consistent and errors were generally limited to related biological families.
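
The distinction between type-level and subtype-level accuracy can be made concrete with a short sketch. It assumes each annotation is a (type, subtype) pair and treats "same type, different subtype" as a rough proxy for an error that stays within a related family. Both the data layout and that proxy are assumptions, and the labels shown are hypothetical rather than drawn from the study.

```python
def hierarchical_accuracy(y_true, y_pred):
    """Score (type, subtype) predictions at two levels of granularity.

    A prediction is correct at the type level when the type matches,
    and at the subtype level only when both type and subtype match.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    type_hits = sum(t[0] == p[0] for t, p in zip(y_true, y_pred))
    subtype_hits = sum(t == p for t, p in zip(y_true, y_pred))

    subtype_errors = n - subtype_hits
    # Errors where the type was right but the subtype was wrong.
    within_type_errors = type_hits - subtype_hits

    return {
        "type_accuracy": type_hits / n,
        "subtype_accuracy": subtype_hits / n,
        # Share of subtype errors that stayed inside the correct type;
        # None when there are no subtype errors at all.
        "within_type_error_share": (
            within_type_errors / subtype_errors if subtype_errors else None
        ),
    }


if __name__ == "__main__":
    # Hypothetical annotations, for illustration only.
    truth = [("K", "A-type"), ("K", "delayed rectifier"),
             ("Na", "transient"), ("Ca", "L-type")]
    predicted = [("K", "A-type"), ("K", "A-type"),
                 ("Na", "transient"), ("K", "A-type")]
    print(hierarchical_accuracy(truth, predicted))
```

Under this definition subtype-level accuracy can never exceed type-level accuracy, which is consistent with the 96.0% and 88.1% figures reported above.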

Conclusions:

LLMs show strong potential for scalable metadata generation directly from scientific source code, requiring minimal tuning.
Although LLMs are effective overall, their performance can vary across subtypes, so domain-specific validation and careful evaluation remain necessary.
The approach is promising for enhancing biomedical repositories and may generalize to other scientific code repositories.
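
One simple form of the domain-specific validation mentioned above is a per-subtype breakdown that flags labels for manual review. The sketch below is illustrative only: the function names, the 0.8 threshold, and the example labels are all assumptions, not part of the study.

```python
from collections import defaultdict


def per_label_recall(y_true, y_pred):
    """Return, for each true label, the fraction of its items predicted correctly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be equal in length")
    totals = defaultdict(int)
    hits = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    return {label: hits[label] / totals[label] for label in totals}


def flag_weak_labels(recall_by_label, threshold=0.8):
    """List labels whose recall falls below the (hypothetical) threshold."""
    return sorted(label for label, r in recall_by_label.items() if r < threshold)


if __name__ == "__main__":
    # Hypothetical subtype labels, for illustration only.
    truth = ["A", "A", "B", "B", "B"]
    predicted = ["A", "B", "B", "B", "B"]
    recalls = per_label_recall(truth, predicted)
    print(recalls)
    print(flag_weak_labels(recalls))
```

Subtypes with few test examples give noisy estimates, so a flagged label is a prompt for closer inspection rather than evidence of a systematic failure.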