Updated: May 23, 2025

Enhancing gene set overrepresentation analysis with large language models.

Jiqing Zhu¹, Rebecca Y Wang¹, Xiaoting Wang¹

  • ¹Alector, Inc., South San Francisco, CA 94080, United States.

Bioinformatics Advances | May 22, 2025
Summary
This summary is machine-generated.

This study introduces llm2geneset, a framework that uses large language models (LLMs) to generate gene set databases dynamically for the analysis of high-throughput biological data. The approach supports flexible, context-aware interpretation, and the generated gene sets are of comparable quality to human-curated ones.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Traditional gene set overrepresentation analysis (ORA) relies on static, human-curated databases, limiting flexibility in interpreting high-throughput transcriptomics and proteomics data.
  • Existing methods struggle to adapt to specific biological contexts or dynamically generated gene lists.
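
For readers unfamiliar with ORA, the test it usually rests on can be written out. The formula below is the standard one-sided hypergeometric test, given as general background rather than as a statement of the exact statistic used in the study. The symbols are assumptions introduced here: N is the number of genes in the background universe, K the size of a gene set, n the size of the input gene list, and k the number of input genes that fall in the gene set.

```latex
p \;=\; P(X \ge k) \;=\; \sum_{i=k}^{\min(n,\,K)}
  \frac{\binom{K}{i}\,\binom{N-K}{\,n-i\,}}{\binom{N}{n}}
```

A small p indicates that the overlap k is larger than expected if the n input genes had been drawn at random from the universe.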

Purpose of the Study:

  • To develop a flexible framework, llm2geneset, that utilizes large language models (LLMs) to dynamically generate gene set databases.
  • To enable context-aware functional interpretation of biological data by integrating LLM-generated gene sets with analysis methods like ORA.
  • To benchmark the performance of LLM-generated gene sets against human-curated databases.

Main Methods:

  • Development of the llm2geneset framework, leveraging LLMs to create gene sets based on input genes and natural language biological context.
  • Integration of dynamically generated gene sets with established analysis methods, such as ORA, for functional annotation.
  • Comparative analysis of LLM-generated gene sets against human-curated databases using benchmarking studies.
  • Application of the framework to RNA-sequencing data from iPSC-derived microglia treated with a TREM2 agonist.
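
The ORA step named in the methods above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the llm2geneset API: the function names `ora_pvalue` and `run_ora`, the placeholder gene identifiers, and the hard-coded `gene_sets` dictionary (standing in for gene sets that an LLM would generate) are all hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def ora_pvalue(overlap, set_size, query_size, universe_size):
    """One-sided hypergeometric tail probability P(X >= overlap)."""
    upper = min(set_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(set_size, i) * comb(universe_size - set_size, query_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


def run_ora(query_genes, gene_sets, universe):
    """Test every gene set for overrepresentation in the query list.

    Returns (set name, overlapping genes, p-value) tuples sorted by p-value.
    """
    universe = set(universe)
    query = set(query_genes) & universe
    results = []
    for name, members in gene_sets.items():
        members = set(members) & universe
        hits = query & members
        p = ora_pvalue(len(hits), len(members), len(query), len(universe))
        results.append((name, sorted(hits), p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical placeholder data: 20 background genes and two gene sets.
# In the workflow described above, the gene sets would come from an LLM.
universe = {f"G{i}" for i in range(1, 21)}
gene_sets = {
    "set_a": {"G1", "G2", "G3", "G4", "G5"},
    "set_b": {"G6", "G7", "G8", "G9", "G10"},
}
query = ["G1", "G2", "G3", "G11"]

results = run_ora(query, gene_sets, universe)
for name, hits, p in results:
    print(f"{name}: overlap={hits} p={p:.4f}")
```

Because the gene sets are passed in as a plain dictionary, the same test runs unchanged whether the sets come from a curated database or are generated on demand for a specific biological context.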

Main Results:

  • LLM-generated gene sets demonstrated comparable quality to human-curated gene sets.
  • The llm2geneset framework successfully identified biological processes within input gene sets, outperforming traditional ORA and direct LLM prompting.
  • The framework facilitated flexible, context-aware gene set generation and improved the interpretation of high-throughput biological data, as shown in the TREM2 agonist study.

Conclusions:

  • llm2geneset provides a flexible alternative to traditional ORA against static, human-curated databases, using LLMs to generate the gene set database dynamically.
  • The framework enhances the interpretation of complex biological datasets by offering context-specific functional annotations.
  • llm2geneset represents a significant advancement in bioinformatics tools for biological data analysis and discovery.