Using semantic search to find publicly available gene-expression datasets.

Grace S Brown (1), James Wengler (1, 2), Aaron Joyce S Fabelico (1)

  • (1) Department of Biology, Brigham Young University, Provo, Utah, USA.

bioRxiv: the preprint server for biology | March 31, 2025
Summary
This summary is machine-generated.

Language models can improve the discovery of relevant scientific datasets by encoding dataset descriptions as embeddings. Researchers can then search for datasets with similar descriptions to reuse and to validate findings, which in many cases works better than existing search methods.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Data Science

Background:

  • Vast numbers of high-throughput molecular datasets are publicly available in repositories like Gene Expression Omnibus (GEO).
  • Reusing these datasets is crucial for validating findings and exploring new research questions.
  • Discovering relevant datasets is challenging because of the sheer volume of data, inconsistent descriptions, and a lack of semantic annotations, all of which undermine the FAIR (findable, accessible, interoperable, reusable) data principles.

Purpose of the Study:

  • To evaluate the effectiveness of language models in improving dataset discovery within the Gene Expression Omnibus (GEO).
  • To assess whether embeddings generated by language models can identify relevant datasets more effectively than traditional search methods.

Main Methods:

  • Used 30 language models to generate numerical representations (embeddings) of dataset descriptions from GEO.
  • Focused on six human medical conditions, using datasets previously curated by humans.
  • Compared the performance of language model-based similarity searches against GEO's built-in search engine.
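
The embedding-based similarity search described in the methods can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_embed` is a bag-of-words stand-in for a real language model, the dataset IDs and descriptions are invented, and cosine similarity is one common way to compare embeddings, not necessarily the measure used in the study.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(texts):
    """Assign each distinct token a fixed position in the embedding vector."""
    vocab = {}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    return vocab


def toy_embed(text, vocab):
    """Hypothetical stand-in for a language-model embedding (word counts)."""
    vec = [0.0] * len(vocab)
    for token, count in Counter(tokenize(text)).items():
        if token in vocab:
            vec[vocab[token]] = float(count)
    return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_similarity(query, descriptions, embed):
    """Return dataset IDs ordered from most to least similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), ds_id) for ds_id, text in descriptions.items()]
    return [ds_id for _, ds_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Invented descriptions; the IDs are placeholders, not GEO accessions.
descriptions = {
    "DS_A": "RNA-seq of breast cancer tumor tissue",
    "DS_B": "Microarray profiling of yeast heat shock response",
    "DS_C": "Gene expression in lung tumor biopsies",
}
vocab = build_vocab(descriptions.values())
ranking = rank_by_similarity(
    "breast cancer tumor expression",
    descriptions,
    lambda text: toy_embed(text, vocab),
)
print(ranking)  # ['DS_A', 'DS_C', 'DS_B']
```

In practice the `embed` argument would wrap a call to an actual embedding model; the ranking logic stays the same because it only depends on the vectors.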

Main Results:

  • Language models often outperformed GEO's search engine in identifying relevant datasets, particularly models trained on general corpora with contrastive learning that produce large embeddings.
  • The effectiveness varied, indicating that this approach is promising but not universally superior.
  • Identified specific model characteristics that correlate with better performance in dataset discovery.
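
One way to compare two search approaches against a human-curated set of relevant datasets is to score the top of each ranked list. The summary does not say which metric the study used, so the sketch below assumes precision and recall at a cutoff `k`; the dataset IDs, the curated set, and both rankings are invented and do not reproduce any result from the work.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that appear in the curated relevant set."""
    top = ranked_ids[:k]
    if not top:
        return 0.0
    return sum(1 for ds_id in top if ds_id in relevant_ids) / len(top)


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the curated relevant set that appears in the top-k results."""
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids[:k]) & set(relevant_ids)) / len(relevant_ids)


# Hypothetical curated set and two hypothetical rankings of the same collection.
curated = {"DS_1", "DS_4", "DS_7"}
embedding_ranking = ["DS_1", "DS_7", "DS_3", "DS_4", "DS_9"]
keyword_ranking = ["DS_3", "DS_1", "DS_9", "DS_2", "DS_4"]

k = 3
scores = {
    name: (precision_at_k(ranking, curated, k), recall_at_k(ranking, curated, k))
    for name, ranking in [("embedding", embedding_ranking), ("keyword", keyword_ranking)]
}
for name, (precision, recall) in scores.items():
    print(f"{name}: precision@{k}={precision:.2f}, recall@{k}={recall:.2f}")
```

Repeating such a comparison per condition and per model is what would reveal the kind of variation noted above, where the embedding approach wins often but not always.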

Conclusions:

  • Language models show significant potential to improve the discovery of scientific datasets, complementing existing search tools.
  • This approach can aid researchers in efficiently finding and reusing valuable molecular data.
  • Further development and integration of language models could streamline data discovery and enhance scientific reproducibility.