Updated: May 26, 2025

Leveraging large language models for data analysis automation.

Jacqueline A Jansen^1,2,3, Artür Manukyan^1,2, Nour Al Khoury^1,2,4

¹Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany.

February 21, 2025

Summary

This summary is machine-generated.

Large Language Models (LLMs) can generate data analysis pipelines for genomics, but their accuracy on complex tasks still needs improvement. The mergen R package improves code generation through specialized prompts and self-correction, raising the share of generated code that executes.

Area of Science:

Bioinformatics
Computational Biology
Genomics

Background:

Data analysis in biology is crucial but limited by expert shortages.
Large Language Models (LLMs) show promise for automating code generation.
The accuracy of LLMs for specialized omics data analysis remains an open question.

Purpose of the Study:

To develop and evaluate an R package, mergen, that uses LLMs for generating and executing data analysis pipelines.
To enable researchers to perform complex data analysis through natural language descriptions.
To investigate the effectiveness of prompt engineering and self-correction mechanisms for improving LLM-generated code accuracy.

Main Methods:

Developed the mergen R package integrating LLMs for data analysis code generation and execution.
Employed specialized prompt engineering and error feedback mechanisms to enhance code quality.
Evaluated performance on various genomics data analysis tasks with different complexity levels.
Utilized a self-correction strategy to iteratively improve code generation.
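
The error-feedback idea in the methods above can be sketched as a small loop: generate code, run it, and if it fails, send the error message back with the next request. This is a minimal illustration in Python, not the mergen package's API (mergen itself is written in R). The names `run_code`, `self_correct`, and `fake_llm`, the prompt wording, and the attempt limit are all assumptions made for this example, and `fake_llm` is a stand-in for a real model call.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_code(code: str) -> Optional[str]:
    """Execute generated code; return None on success or the error text on failure."""
    try:
        exec(code, {})  # fresh namespace per attempt; this is NOT a security sandbox
        return None
    except Exception:
        return traceback.format_exc(limit=1)


def self_correct(
    task: str,
    generate: Callable[[str], str],
    max_attempts: int = 3,
) -> Tuple[str, bool, int]:
    """Ask `generate` for code, feeding any execution error back into the next prompt.

    Returns (last code produced, whether it executed, attempts used).
    """
    prompt = task
    code = ""
    for attempt in range(1, max_attempts + 1):
        code = generate(prompt)
        error = run_code(code)
        if error is None:
            return code, True, attempt
        # Hypothetical feedback prompt: original task + failing code + error message.
        prompt = (
            f"{task}\n\nThe previous code was:\n{code}\n\n"
            f"It failed with this error:\n{error}\nPlease return corrected code."
        )
    return code, False, max_attempts


def fake_llm(prompt: str) -> str:
    """Stand-in for a model call: returns broken code first, fixed code after feedback."""
    if "failed with this error" in prompt:
        return "result = sum([1, 2, 3]) / 3"
    return "result = mean([1, 2, 3])"  # fails: `mean` is not defined


if __name__ == "__main__":
    code, ok, attempts = self_correct("Compute the mean of 1, 2, 3", fake_llm)
    print(ok, attempts)
```

Capping the number of attempts keeps the loop from running indefinitely when the generator cannot repair its own output. A real setup would also need proper isolation for executing untrusted code.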

Main Results:

LLMs can generate code for some data analysis tasks, but challenges persist for complex analyses.
The self-correction mechanism significantly improved executable code generation rates across task complexities (22.5% to 52.5%).
Statistical analysis confirmed significant differences in performance across prompting strategies.
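
As an illustration of how executable-code rates for two prompting strategies might be compared, here is a small sketch using a Pearson chi-square test on a 2x2 table. The summary does not say which statistical test the authors used, so the choice of test is an assumption, and the counts below are invented for the example and are not results from the study.

```python
import math
from typing import Tuple


def executable_rate(n_executable: int, n_total: int) -> float:
    """Fraction of generated responses whose code ran without error."""
    return n_executable / n_total


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Rows are strategies; columns are (executable, not executable).
    Returns (test statistic, p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / margins
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts, NOT data from the paper:
    # strategy A: 12 of 50 responses executable; strategy B: 30 of 50.
    rate_a = executable_rate(12, 50)
    rate_b = executable_rate(30, 50)
    stat, p = chi_square_2x2(12, 38, 30, 20)
    print(f"rate A={rate_a:.2f}, rate B={rate_b:.2f}, chi2={stat:.2f}, p={p:.2g}")
```

With more than two strategies or small counts, a different test (for example one from a statistics library) would be more appropriate than this hand-rolled 2x2 version.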

Conclusions:

LLMs show potential for automating bioinformatics data analysis, but require careful implementation.
The mergen package and its self-correction feature offer a practical approach to improve LLM-driven code generation.
Further research is needed to fully address LLM limitations in complex, domain-specific data analysis.