Improved Search of Large Transcriptomic Sequencing Databases Using Split Sequence Bloom Trees.

Brad Solomon¹, Carl Kingsford¹

¹Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania.

Journal of Computational Biology : a Journal of Computational Molecular Cell Biology

April 12, 2018

Summary

This summary is machine-generated.

Split Sequence Bloom Trees (SSBTs) enable efficient sequence search over very large RNA-sequencing (RNA-seq) databases. Researchers can find the experiments in which a given transcript is expressed across thousands of datasets, at roughly one-fifth of the search and storage cost of the earlier Sequence Bloom Tree (SBT) index.

Keywords:

RNA-seq; data indexing; Sequence Bloom Trees; sequence search.

Area of Science:

Bioinformatics
Computational Biology
Genomics

Background:

Vast RNA-sequencing (RNA-seq) datasets, such as those in the NIH Sequence Read Archive (SRA), offer immense potential for biological discovery.
Current limitations in searching these large-scale datasets hinder the extraction of valuable information on condition-specific expression and population variation.

Purpose of the Study:

To develop and evaluate a novel indexing scheme for efficient sequence-based querying of terabyte-scale RNA-seq data collections.
To improve upon existing data structures like Sequence Bloom Trees (SBTs) for handling massive sequencing experiment data.

Main Methods:

Introduction of Split Sequence Bloom Trees (SSBTs), an enhanced data structure for indexing short-read sequencing data.
Application of SSBTs to query the expression patterns of specific transcripts across a dataset of 2,652 RNA-seq experiments from breast, blood, and brain tissues.
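
To make the tree idea concrete, the following is a minimal Python sketch of a Sequence Bloom Tree style search; it is an illustration, not the authors' implementation. It assumes the commonly described design: each leaf holds a Bloom filter of the k-mers of one experiment, each internal node holds the union of its children, and a query descends only into subtrees where at least a fraction `theta` of the query k-mers are found. The `split` helper illustrates the SSBT idea of separating bits shared by every experiment below a node from the remainder; that description comes from the general SSBT design, not from this summary. All names and parameters (`K`, `M`, `NUM_HASHES`, `theta`, the toy reads) are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

K = 5            # k-mer length (toy value; real indexes use a larger k)
M = 1024         # bits per Bloom filter (toy value)
NUM_HASHES = 2   # hash functions per k-mer (toy value)


def kmers(seq: str, k: int = K) -> Set[str]:
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bit_positions(kmer: str) -> List[int]:
    """Map a k-mer to NUM_HASHES bit positions using salted SHA-256."""
    positions = []
    for seed in range(NUM_HASHES):
        digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % M)
    return positions


def contains(bits: int, kmer: str) -> bool:
    """Bloom filter membership test: may give false positives, never false negatives."""
    return all((bits >> p) & 1 for p in bit_positions(kmer))


@dataclass
class Node:
    bits: int                          # Bloom filter stored as a Python int bit vector
    name: Optional[str] = None         # experiment name (leaves only)
    children: List["Node"] = field(default_factory=list)


def leaf(name: str, reads: List[str]) -> Node:
    """Build a leaf whose filter contains every k-mer of the experiment's reads."""
    bits = 0
    for read in reads:
        for km in kmers(read):
            for p in bit_positions(km):
                bits |= 1 << p
    return Node(bits, name)


def parent(*children: Node) -> Node:
    """Build an internal node whose filter is the union (bitwise OR) of its children."""
    bits = 0
    for child in children:
        bits |= child.bits
    return Node(bits, children=list(children))


def query(node: Node, query_kmers: Set[str], theta: float = 0.8) -> List[str]:
    """Return names of leaves where at least theta of the query k-mers are found."""
    hits = sum(contains(node.bits, km) for km in query_kmers)
    if hits < theta * len(query_kmers):
        return []                      # prune: nothing below can reach the threshold
    if not node.children:
        return [node.name]
    found: List[str] = []
    for child in node.children:
        found.extend(query(child, query_kmers, theta))
    return found


def shared_bits(node: Node) -> int:
    """Bits that are set in every leaf below this node."""
    if not node.children:
        return node.bits
    bits = shared_bits(node.children[0])
    for child in node.children[1:]:
        bits &= shared_bits(child)
    return bits


def split(node: Node) -> Tuple[int, int]:
    """Split a node's filter into (shared by all leaves, present in only some leaves)."""
    sim = shared_bits(node)
    rem = node.bits & ~sim
    return sim, rem


# Toy usage with made-up reads and experiment names.
breast = leaf("breast_1", ["ACGTACGTTGCA"])
blood = leaf("blood_1", ["TTTTGGGGCCCCAAAA"])
brain = leaf("brain_1", ["ACGTACGTTGCA", "GGGTTTAAACCC"])
root = parent(parent(breast, blood), brain)

result = query(root, kmers("ACGTACGTTGCA"))
sim, rem = split(root)
```

Because Bloom filters have no false negatives, `result` always includes `breast_1` and `brain_1`, the two toy experiments that contain the query sequence. The pruning step is what keeps a search from visiting every experiment: a subtree is skipped as soon as its union filter cannot satisfy the threshold.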

Main Results:

SSBTs provide a fivefold improvement in search and storage costs compared to SBTs.
Querying a 1000-nucleotide sequence within the indexed dataset takes under 4 minutes on a single thread.
The SSBT index for the entire dataset requires only 39 GB of storage.
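
As a rough back-of-the-envelope reading of these figures, the short Python sketch below derives the average index footprint per experiment and the SBT costs implied by a fivefold improvement. It assumes decimal units (1 GB = 1000 MB) and treats "fivefold" as an exact factor of 5 applied to both the 39 GB index and the 4-minute query bound; the derived SBT values are inferences from the stated numbers, not measurements reported here.

```python
INDEX_GB = 39        # SSBT index size for the whole dataset (from the results)
EXPERIMENTS = 2652   # number of indexed RNA-seq experiments (from the methods)
QUERY_MIN = 4        # upper bound on single-thread query time, in minutes
IMPROVEMENT = 5      # "fivefold", treated here as an exact factor (assumption)

# Average index footprint per experiment, in decimal megabytes.
per_experiment_mb = INDEX_GB * 1000 / EXPERIMENTS

# SBT costs implied by the fivefold claim (inferred, not reported values).
implied_sbt_gb = INDEX_GB * IMPROVEMENT
implied_sbt_query_min = QUERY_MIN * IMPROVEMENT

print(f"{per_experiment_mb:.1f} MB of index per experiment")
print(f"implied SBT index: about {implied_sbt_gb} GB")
print(f"implied SBT query bound: about {implied_sbt_query_min} minutes")
```

On these assumptions, the SSBT index averages about 14.7 MB per experiment.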

Conclusions:

SSBTs offer a scalable and efficient solution for sequence-based querying of large RNA-seq experiment collections.
This methodology significantly enhances the utility of public sequencing data archives for biological research.
The indexing scheme speeds up identifying the conditions under which a transcript is expressed and how its expression varies across experiments.