Scaling sensor metadata extraction for exposure health using LLMs.

Fatemeh Shah-Mohammadi¹, Sunho Im², Julio C Facelli¹,³

¹Department of Biomedical Informatics, The University of Utah, Salt Lake City, UT 84108, United States.

March 27, 2026

Summary

This summary is machine-generated.

We developed a large language model (LLM) pipeline to automate sensor metadata extraction from research papers. This approach significantly improves efficiency and accuracy for exposure health research.

Area of Science:

Environmental health
Data science
Bioinformatics

Background:

Sensor technologies are rapidly evolving, creating diverse data formats.
Inconsistent sensor metadata reporting hinders exposome and exposure health research.
Manual extraction of sensor metadata from literature is unscalable.

Purpose of the Study:

To develop and evaluate a large language model (LLM)-based pipeline for automating sensor metadata extraction.
To address the bottleneck of manual metadata extraction from unstructured sources.
To harmonize sensor metadata into structured formats for exposure health research.

Keywords:

GPT, exposure health, information extraction, metadata, sensor

Main Methods:

Utilized GPT-4 in a zero-shot setting to construct the LLM pipeline.
Developed a pipeline to parse full-text PDFs for sensor metadata extraction.
Implemented harmonization of extracted metadata into structured formats.
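
The three Main Methods steps (zero-shot prompting, extraction from paper text, harmonization into a structured format) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The field list, the unit table, and the names `build_zero_shot_prompt`, `harmonize`, `extract_sensor_metadata`, and `stub_llm` are all hypothetical, since the summary does not describe the actual schema or prompts. PDF parsing is left out: the sketch assumes the full text has already been extracted. The `llm` argument is any callable that takes a prompt and returns a string, so a real GPT-4 client could be swapped in for the offline stub.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical target schema; the summary does not list the real fields.
FIELDS = ("sensor_name", "manufacturer", "measured_quantity", "unit")

# Hypothetical synonym table used to harmonize unit spellings.
UNIT_SYNONYMS = {
    "ug/m3": "ug/m3",
    "micrograms per cubic meter": "ug/m3",
    "ppb": "ppb",
    "parts per billion": "ppb",
}


@dataclass
class SensorRecord:
    sensor_name: Optional[str] = None
    manufacturer: Optional[str] = None
    measured_quantity: Optional[str] = None
    unit: Optional[str] = None


def build_zero_shot_prompt(paper_text: str) -> str:
    """Zero-shot prompt: instructions and a schema, no worked examples."""
    return (
        "Extract sensor metadata from the paper text below. "
        f"Reply with one JSON object with the keys {list(FIELDS)}; "
        "use null for anything the text does not state.\n\n"
        f"Paper text:\n{paper_text}"
    )


def harmonize(raw: dict) -> SensorRecord:
    """Map a raw model reply onto the fixed schema and normalize values."""
    cleaned = {}
    for key in FIELDS:
        value = raw.get(key)
        # Keep only non-empty strings; anything else becomes None.
        cleaned[key] = value.strip() or None if isinstance(value, str) else None
    unit = cleaned["unit"]
    if unit is not None:
        # Unknown units pass through unchanged.
        cleaned["unit"] = UNIT_SYNONYMS.get(unit.lower(), unit)
    return SensorRecord(**cleaned)


def extract_sensor_metadata(paper_text: str, llm: Callable[[str], str]) -> SensorRecord:
    """Prompt the model, parse its JSON reply, and harmonize the result."""
    reply = llm(build_zero_shot_prompt(paper_text))
    try:
        raw = json.loads(reply)
    except json.JSONDecodeError:
        raw = {}
    if not isinstance(raw, dict):
        raw = {}
    return harmonize(raw)


def stub_llm(prompt: str) -> str:
    """Offline stand-in for a GPT-4 call; the returned values are made up."""
    return json.dumps({
        "sensor_name": " ExampleSensor X1 ",
        "manufacturer": "Example Corp",
        "measured_quantity": "PM2.5",
        "unit": "Micrograms per cubic meter",
    })


record = extract_sensor_metadata("full text of a paper", stub_llm)
print(record)
```

Passing the model in as a callable keeps the prompt construction and harmonization testable without network access, and an unparseable reply yields an empty record rather than an exception.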

Main Results:

The automated pipeline demonstrated substantial efficiency gains over manual review.
Achieved high performance metrics: 88.0% accuracy, 88.0% precision, 93.0% recall, and 90.0% F1-score.
Successfully extracted and harmonized sensor metadata from exposure health literature.
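
For readers less familiar with the four reported metrics, the short Python sketch below shows how they are conventionally computed from confusion-matrix counts. The counts are hypothetical, and the summary does not say how the authors aggregated results across metadata fields, so this illustrates the standard definitions rather than the study's evaluation code. The function names `f1_from` and `classification_metrics` are assumptions.

```python
def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard metrics from true/false positive and negative counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Hypothetical counts, chosen only to exercise the function.
print(classification_metrics(tp=8, fp=2, fn=2, tn=8))

# F1 implied by the reported precision (88.0%) and recall (93.0%).
print(round(f1_from(0.88, 0.93), 3))
```

Plugging the reported precision and recall into the harmonic-mean formula gives an F1 of about 0.904, close to the reported 90.0%. The small gap may reflect rounding or averaging across fields, which the summary does not specify.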

Conclusions:

LLM-driven pipelines are feasible and scalable for automating sensor metadata extraction in exposure health.
This automation reduces manual burden and enhances metadata completeness and consistency.
Findings support integrating LLM pipelines into exposure health informatics platforms.