All Models are Wrong, Some are Annotated: Automating Metadata in Biomedical Repositories.

Inessa Cohen¹, Hongyi Yu², Robert A McDougal¹˒²˒³˒⁴˒⁵

  • ¹Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06510, United States.

bioRxiv : the Preprint Server for Biology | May 7, 2026
Summary
This summary is machine-generated.

Large language models (LLMs) can automatically generate metadata for scientific models from source code, improving discovery. These AI tools show promise for annotating large biomedical repositories more efficiently than traditional methods.

Keywords:
computational biology; large language models; machine learning; metadata; natural language processing

Area of Science:

  • Computational Neuroscience
  • Bioinformatics
  • Artificial Intelligence

Background:

  • High-quality metadata is crucial for scientific discovery but is often sparse in large data repositories.
  • Manually annotating complex biological details, such as ion channel and receptor subtypes, from source code is time-consuming and challenging.

Purpose of the Study:

  • To evaluate the effectiveness of large language models (LLMs) in automatically inferring ion channel and receptor subtype metadata directly from source code.
  • To compare LLM performance against a traditional feature-engineered baseline model.

Main Methods:

  • Extracted 5,133 model files from the ModelDB repository.
  • Manually annotated 1,100 models, with 253 reserved for testing.
  • Evaluated LLM approaches (GPT-5.2, GPT-mini) using zero-shot and heuristic-augmented prompting.
  • Compared LLM performance (accuracy, precision, recall, F1 score) against an XGBoost baseline model.
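The summary does not say what the "heuristics" in heuristic-augmented prompting were. The sketch below is one hypothetical way to build such a prompt, assuming the model files are NEURON NMODL (.mod) sources, where mechanisms are named by `SUFFIX` or `POINT_PROCESS` and ions are declared with `USEION` statements. The helper names `extract_hints` and `build_prompt` and the prompt wording are illustrative inventions, not the authors' method, and no LLM is actually called here.

```python
import re

# Hypothetical heuristics: in NMODL, "USEION k READ ek WRITE ik" declares an ion,
# and "SUFFIX kv1" / "POINT_PROCESS ..." names the mechanism.
USEION_RE = re.compile(r"^\s*USEION\s+(\w+)", re.MULTILINE)
MECH_RE = re.compile(r"^\s*(?:SUFFIX|POINT_PROCESS)\s+(\w+)", re.MULTILINE)


def extract_hints(mod_source: str) -> dict:
    """Collect ion names and mechanism names found in the source as prompt hints."""
    return {
        "ions": sorted(set(USEION_RE.findall(mod_source))),
        "mechanisms": sorted(set(MECH_RE.findall(mod_source))),
    }


def build_prompt(mod_source: str) -> str:
    """Assemble a prompt that prepends the heuristic hints to the raw source code."""
    hints = extract_hints(mod_source)
    ions = ", ".join(hints["ions"]) or "none"
    mechs = ", ".join(hints["mechanisms"]) or "none"
    return "\n".join([
        "Identify the ion channel and receptor types and subtypes implemented in this model code.",
        f"Heuristic hints - ions used: {ions}; mechanism names: {mechs}.",
        "Source code:",
        mod_source,
    ])


example = """
NEURON {
    SUFFIX kv1
    USEION k READ ek WRITE ik
    RANGE gbar
}
"""
print(build_prompt(example))
```

A zero-shot variant would simply omit the hints line, so the two prompting strategies differ only in whether these cheap, rule-based cues are handed to the model alongside the code.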

Main Results:

  • LLMs significantly outperformed the XGBoost baseline model in metadata annotation.
  • Heuristically augmented GPT-mini achieved 96.0% accuracy at the type level and 88.1% accuracy at the subtype level.
  • LLM outputs were consistent and errors were generally limited to related biological families.
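The results distinguish type-level from subtype-level accuracy, but the summary does not define the metrics precisely. As a minimal sketch, assume each model's annotation is a set of hypothetical "Type:Subtype" labels (e.g. "K:Kv1"), that accuracy means an exact set match per model, and that precision, recall and F1 are micro-averaged over labels. Under those assumptions, a prediction can be wrong at the subtype level yet correct at the type level, which is one way to read errors "limited to related biological families".

```python
from typing import Iterable


def to_types(labels: Iterable[str]) -> set:
    """Collapse hypothetical 'Type:Subtype' labels to their type prefix."""
    return {label.split(":", 1)[0] for label in labels}


def set_metrics(gold: list, pred: list) -> dict:
    """Exact-match accuracy plus micro-averaged precision/recall/F1 over label sets."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = exact = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and present
        fp += len(p - g)   # labels predicted but absent
        fn += len(g - p)   # labels present but missed
        exact += g == p    # whole annotation correct
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = exact / len(gold) if gold else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


gold = [{"K:Kv1", "Na:Nav1.6"}, {"Ca:CaL"}]
pred = [{"K:Kv1", "Na:Nav1.2"}, {"Ca:CaL"}]

subtype_level = set_metrics(gold, pred)
type_level = set_metrics([to_types(g) for g in gold], [to_types(p) for p in pred])
print(subtype_level)
print(type_level)
```

In this toy example the Nav1.6/Nav1.2 confusion costs the first model its subtype-level match (accuracy 0.5) while leaving the type-level scores perfect.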

Conclusions:

  • LLMs show strong potential for scalable metadata generation directly from scientific source code, requiring minimal tuning.
  • Although LLMs are effective overall, their performance varies across subtypes, so domain-specific validation and careful evaluation remain necessary.
  • The approach is promising for enhancing biomedical repositories and may generalize to other scientific code repositories.