



Leveraging large language models for data analysis automation.

Jacqueline A Jansen1,2,3, Artür Manukyan1,2, Nour Al Khoury1,2,4

  • 1Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany.

PLOS ONE | February 21, 2025
Summary
This summary is machine-generated.

Large Language Models (LLMs) can generate data analysis pipelines for genomics, but their accuracy on complex tasks still needs improvement. The mergen R package improves code generation through specialized prompts and a self-correction mechanism, increasing the share of generated code that executes.


Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Data analysis is crucial in biology, but a shortage of experts limits how much of it can be done.
  • Large Language Models (LLMs) show promise for automating code generation.
  • The accuracy of LLMs for specialized omics data analysis remains an open question.

Purpose of the Study:

  • To develop and evaluate an R package, mergen, that uses LLMs for generating and executing data analysis pipelines.
  • To enable researchers to perform complex data analysis through natural language descriptions.
  • To investigate the effectiveness of prompt engineering and self-correction mechanisms for improving LLM-generated code accuracy.

Main Methods:

  • Developed the mergen R package integrating LLMs for data analysis code generation and execution.
  • Employed specialized prompt engineering and error feedback mechanisms to enhance code quality.
  • Evaluated performance on various genomics data analysis tasks with different complexity levels.
  • Utilized a self-correction strategy to iteratively improve code generation.
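
The error-feedback and self-correction steps listed above can be illustrated with a minimal Python sketch. This is not the mergen API (mergen is an R package and its functions are not shown here); the names `run_code`, `self_correct`, and `fake_llm` are hypothetical, and `fake_llm` is a stand-in for a real model call. The sketch assumes the loop works by executing the generated code and, on failure, sending the error message back to the model in a follow-up prompt.

```python
from typing import Callable, Optional, Tuple


def run_code(code: str) -> Optional[str]:
    """Execute code; return None on success or an error message on failure."""
    try:
        exec(code, {})  # fresh namespace for every attempt
        return None
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"


def self_correct(
    task: str,
    ask_llm: Callable[[str], str],
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Ask for code, run it, and feed any error back until it executes.

    Returns (code, attempts_used); code is None if no attempt executed.
    """
    prompt = task
    for attempt in range(1, max_attempts + 1):
        code = ask_llm(prompt)
        error = run_code(code)
        if error is None:
            return code, attempt
        # Error feedback: include the failing code and its error message.
        prompt = (
            f"{task}\n\nThe previous code:\n{code}\n"
            f"failed with:\n{error}\nReturn corrected code only."
        )
    return None, max_attempts


def fake_llm(prompt: str) -> str:
    """Hypothetical model: the first answer has a bug, the retry fixes it."""
    if "failed with" in prompt:
        return "values = [1, 2, 3]\nmean = sum(values) / len(values)"
    return "values = [1, 2, 3]\nmean = sum(values) / count"  # 'count' is undefined


code, attempts = self_correct("Compute the mean of 1, 2, 3.", fake_llm)
print(attempts)
print(code)
```

Capping the number of attempts keeps the loop from running indefinitely when the model cannot repair its own output. Note that this sketch only checks that the code executes, not that its result is correct.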

Main Results:

  • LLMs can generate code for some data analysis tasks, but challenges persist for complex analyses.
  • The self-correction mechanism significantly improved executable code generation rates across task complexities (22.5% to 52.5%).
  • Statistical analysis confirmed significant differences in performance across prompting strategies.

Conclusions:

  • LLMs show potential for automating bioinformatics data analysis, but require careful implementation.
  • The mergen package and its self-correction feature offer a practical approach to improve LLM-driven code generation.
  • Further research is needed to fully address LLM limitations in complex, domain-specific data analysis.