Updated: May 17, 2025

Using semantic search to find publicly available gene-expression datasets.

Grace S Brown¹, James Wengler¹,², Aaron Joyce S Fabelico¹

¹Department of Biology, Brigham Young University, Provo, Utah, USA.

bioRxiv: the preprint server for biology

March 31, 2025

Summary

This summary is machine-generated.

Language models can enhance the discovery of relevant scientific datasets by summarizing descriptions into embeddings. This approach aids researchers in finding similar data for reuse and validation, improving upon existing search methods.

Area of Science:

Bioinformatics
Computational Biology
Data Science

Background:

Vast numbers of high-throughput molecular datasets are publicly available in repositories like Gene Expression Omnibus (GEO).
Reusing these datasets is crucial for validating findings and exploring new research questions.
Discovering relevant datasets is difficult because of the sheer volume of data, inconsistent descriptions, and a lack of semantic annotations, all of which work against the FAIR (findable, accessible, interoperable, reusable) data principles.

Purpose of the Study:

To evaluate the effectiveness of language models in improving dataset discovery within the Gene Expression Omnibus (GEO).
To assess whether language model-generated embeddings can identify relevant datasets more effectively than traditional search methods.

Main Methods:

Used 30 language models to generate numerical representations (embeddings) of dataset descriptions from GEO.
Focused on six human medical conditions, using datasets previously curated by humans.
Compared the performance of language model-based similarity searches against GEO's built-in search engine.
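
The embedding-based similarity search described above can be sketched roughly as follows. This is only an illustration: the `embed` function is a toy bag-of-words stand-in for a language model (the study used 30 real models, which are not named here), and the dataset identifiers and descriptions are invented rather than taken from GEO. The ranking step, cosine similarity between a query vector and each description vector, is the part the sketch is meant to show.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for a language-model embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_datasets(query: str, descriptions: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (dataset id, similarity) pairs, most similar to the query first."""
    query_vec = embed(query)
    scored = [
        (ds_id, cosine_similarity(query_vec, embed(text)))
        for ds_id, text in descriptions.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical identifiers and descriptions, invented for illustration only.
descriptions = {
    "DATASET_A": "RNA-seq of breast cancer tumor samples and matched normal tissue",
    "DATASET_B": "Microarray expression profiling of yeast under heat shock",
    "DATASET_C": "Gene expression in breast tumor biopsies before chemotherapy",
}
ranking = rank_datasets("breast cancer tumor gene expression", descriptions)
for ds_id, score in ranking:
    print(f"{ds_id}\t{score:.3f}")
```

Replacing `embed` with a call to an actual embedding model would leave the ranking logic unchanged, provided the similarity function is adapted to dense vectors.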

Main Results:

Language models, particularly those trained on general corpora using contrastive learning with large embeddings, often outperformed GEO's search engine in identifying relevant datasets.
The effectiveness varied, indicating that this approach is promising but not universally superior.
Identified specific model characteristics that correlate with better performance in dataset discovery.
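
One way to compare two search approaches against a human-curated set of relevant datasets is to score each ranked result list. The sketch below uses average precision, which is an assumption made for illustration: this summary does not state which metric the study used. The dataset identifiers and both rankings are also invented.

```python
from typing import Iterable, Set


def average_precision(ranked_ids: Iterable[str], relevant_ids: Set[str]) -> float:
    """Mean of precision@k taken at each rank k where a relevant dataset appears."""
    hits = 0
    precision_sum = 0.0
    for rank, ds_id in enumerate(ranked_ids, start=1):
        if ds_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    # Relevant datasets that are never retrieved contribute zero precision.
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0


# Hypothetical curated labels and rankings, invented for illustration only.
relevant = {"DS1", "DS3"}
embedding_ranking = ["DS1", "DS3", "DS2", "DS4"]
keyword_ranking = ["DS2", "DS1", "DS4", "DS3"]

print("embedding search:", average_precision(embedding_ranking, relevant))
print("keyword search:  ", average_precision(keyword_ranking, relevant))
```

A ranking that places the curated datasets nearer the top scores higher, so repeating the comparison per condition and per model gives a picture of where an approach helps and where it does not.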

Conclusions:

Language models show significant potential to improve the discovery of scientific datasets, complementing existing search tools.
This approach can aid researchers in efficiently finding and reusing valuable molecular data.
Further development and integration of language models could streamline data discovery and enhance scientific reproducibility.