Structured Knowledge Base Enhances Effective Use of Large Language Models for Metadata Curation.

Sowmya S Sundaram¹, Benjamin Solomon¹,², Avani Khatri²

¹Center for Biomedical Informatics Research, School of Medicine, Stanford University, Stanford, California, USA.

AMIA ... Annual Symposium Proceedings. AMIA Symposium

May 26, 2025

Summary

This summary is machine-generated.

Large language models (LLMs) show promise for improving how well dataset metadata adhere to metadata standards. Integrating LLMs with structured knowledge bases significantly improves the accuracy of research data metadata.

Area of Science:

Bioinformatics
Computational Biology
Data Science

Background:

Metadata are essential for dataset findability, accessibility, interoperability, and reusability.
Ensuring adherence to metadata standards is critical for scientific data management.

Purpose of the Study:

To investigate the efficacy of large language models (LLMs), specifically GPT-4, in improving metadata standard adherence.
To assess the impact of domain information integration on LLM performance in metadata curation.

Main Methods:

Experiments were conducted on 200 human sample records from the NCBI BioSample repository.
GPT-4's ability to suggest metadata edits was evaluated through peer review.
Metadata adherence accuracy was calculated for field name-field value pairs.
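
The adherence calculation can be illustrated with a minimal sketch. It assumes that a field name-field value pair "adheres" when the field name appears in a data dictionary and the value passes that field's check, and that a record's accuracy is the fraction of its pairs that adhere. The dictionary contents, field names, and function names below are hypothetical and chosen only for illustration; the study's actual scoring procedure may differ.

```python
from typing import Callable, Dict, List

# Hypothetical data dictionary: each permitted field name maps to a check
# on its value. These fields and permitted values are illustrative only
# and are not taken from the study or from any real metadata template.
Validator = Callable[[str], bool]

DATA_DICTIONARY: Dict[str, Validator] = {
    "tissue": lambda v: v.lower() in {"lung", "liver", "blood"},
    "sex": lambda v: v.lower() in {"male", "female"},
    "age": lambda v: v.isdigit(),
}


def pair_adheres(name: str, value: str) -> bool:
    """A pair adheres if the field name is known and the value is valid."""
    validator = DATA_DICTIONARY.get(name)
    return validator is not None and validator(value)


def adherence_accuracy(record: Dict[str, str]) -> float:
    """Fraction of field name-field value pairs in one record that adhere."""
    if not record:
        return 0.0
    adhering = sum(pair_adheres(n, v) for n, v in record.items())
    return adhering / len(record)


def mean_adherence(records: List[Dict[str, str]]) -> float:
    """Average per-record adherence accuracy over a set of records."""
    if not records:
        return 0.0
    return sum(adherence_accuracy(r) for r in records) / len(records)


# Example record: "sex" has a non-standard value and "Tissue Type" is not a
# field name in the dictionary, so 2 of 4 pairs adhere.
example = {"tissue": "Lung", "sex": "M", "age": "54", "Tissue Type": "lung"}
print(adherence_accuracy(example))  # 0.5
```

Scoring pairs rather than whole records means a single non-standard field lowers a record's score without discarding its correctly formatted fields.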

Main Results:

Unaided GPT-4 showed a marginal improvement in metadata adherence from 79% to 80%.
GPT-4, when provided with domain information (CEDAR metadata templates), raised adherence to 97%, a statistically significant improvement (p<0.01).

Conclusions:

LLMs hold potential for automated metadata curation when combined with structured knowledge.
LLMs may require domain-specific context to effectively improve metadata quality for scientific datasets.