Improved Search of Large Transcriptomic Sequencing Databases Using Split Sequence Bloom Trees.

Brad Solomon¹, Carl Kingsford¹

  • ¹Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania.

Journal of Computational Biology: A Journal of Computational Molecular Cell Biology | April 12, 2018
Summary
This summary is machine-generated.

Split sequence Bloom trees (SSBTs), a new indexing data structure, enable efficient searching of massive RNA-sequencing databases. They allow researchers to find specific gene expression patterns across thousands of experiments faster, and with less storage, than the earlier sequence Bloom trees.

Keywords:
RNA-seq; data indexing; sequence bloom trees; sequence search.
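The keywords name the building block of the method: each sequencing experiment is summarized by a Bloom filter over its k-mers (all overlapping substrings of length k), and a query asks what fraction of its own k-mers the filter reports as present. The Python below is a minimal sketch of that idea only; the filter size, the number of hash functions, k = 5, and the use of `hashlib.blake2b` with a per-function salt are illustrative assumptions, not parameters taken from the paper.

```python
import hashlib
from typing import List


class KmerBloomFilter:
    """Minimal Bloom filter over k-mers (illustrative parameters only)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a growable bit array

    def _positions(self, kmer: str):
        # Derive num_hashes bit positions by salting blake2b with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=i.to_bytes(16, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, kmer: str) -> None:
        for pos in self._positions(kmer):
            self.bits |= 1 << pos

    def __contains__(self, kmer: str) -> bool:
        # Can give false positives, never false negatives.
        return all((self.bits >> pos) & 1 for pos in self._positions(kmer))


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def fraction_present(bf: KmerBloomFilter, query: str, k: int) -> float:
    """Fraction of the query's k-mers that the filter reports as present."""
    qk = kmers(query, k)
    return sum(km in bf for km in qk) / len(qk)


if __name__ == "__main__":
    k = 5
    bf = KmerBloomFilter()
    for km in kmers("ACGTACGTTGCA", k):
        bf.add(km)
    print(fraction_present(bf, "ACGTACGTTGCA", k))  # 1.0 (no false negatives)
    print(fraction_present(bf, "TTTTGGGGCCCC", k))  # low, barring false positives
```

Because a Bloom filter can report false positives but never false negatives, a query can return spurious matches but will not miss k-mers that were actually inserted.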


Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Vast RNA-sequencing (RNA-seq) datasets, such as those in the NIH Sequence Read Archive (SRA), offer immense potential for biological discovery.
  • Because these datasets are difficult to search at scale, much of the information they hold about condition-specific expression and population variation remains hard to extract.

Purpose of the Study:

  • To develop and evaluate a novel indexing scheme for efficient sequence-based querying of terabyte-scale RNA-seq data collections.
  • To improve upon existing data structures, such as Sequence Bloom Trees (SBTs), for indexing large collections of sequencing experiments.
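A sequence Bloom tree, as it is commonly described, is a binary tree whose leaves hold one Bloom filter per experiment and whose internal nodes hold the bitwise union of their children's filters. A query descends from the root and abandons any subtree whose filter already contains fewer than a threshold fraction (θ, `theta` here) of the query's k-mers; this is safe because a union can only over-report what lies below it. The sketch below assumes a single CRC-32 hash, a 4,096-bit filter, k = 5, and toy sequences; all names (`Node`, `build`, `query`) are hypothetical.

```python
import zlib
from dataclasses import dataclass, field
from typing import List, Optional

NUM_BITS = 4096  # hypothetical filter size
K = 5            # hypothetical k-mer length


def kmers(seq: str) -> List[str]:
    """All overlapping K-mers of seq."""
    return [seq[i:i + K] for i in range(len(seq) - K + 1)]


def bit(kmer: str) -> int:
    # One deterministic hash function per k-mer, for brevity.
    return zlib.crc32(kmer.encode()) % NUM_BITS


def bloom(seq: str) -> int:
    """Bloom filter of seq's k-mers, stored as a Python int used as a bit array."""
    bits = 0
    for km in kmers(seq):
        bits |= 1 << bit(km)
    return bits


@dataclass
class Node:
    bits: int
    children: List["Node"] = field(default_factory=list)
    name: Optional[str] = None  # experiment name, set on leaves only


def build(leaves: List[Node]) -> Node:
    """Pair nodes bottom-up; each parent's filter is the bitwise OR of its children's."""
    level = leaves
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 1:
                nxt.append(pair[0])  # odd node out is promoted unchanged
            else:
                nxt.append(Node(pair[0].bits | pair[1].bits, pair))
        level = nxt
    return level[0]


def query(root: Node, seq: str, theta: float) -> List[str]:
    """Names of experiments whose filters contain at least theta of the query's k-mers."""
    qbits = [bit(km) for km in kmers(seq)]
    hits: List[str] = []

    def visit(n: Node) -> None:
        present = sum((n.bits >> b) & 1 for b in qbits)
        if present < theta * len(qbits):
            return  # a union filter over-approximates every leaf below it, so prune
        if not n.children:
            hits.append(n.name)
        for c in n.children:
            visit(c)

    visit(root)
    return hits


if __name__ == "__main__":
    experiments = {"exp1": "ACGTACGTTGCA", "exp2": "TTTTGGGGCCCC", "exp3": "ACGTACGTAAAA"}
    root = build([Node(bloom(s), name=n) for n, s in experiments.items()])
    print(query(root, "ACGTACGTTGCA", theta=0.9))
```

Every node on the path to a matching leaf must itself pass the threshold, so the pruned search returns the same experiments as scanning every leaf filter directly, only with fewer filter lookups when most subtrees fail early.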

Main Methods:

  • Introduction of split sequence Bloom trees (SSBTs), an enhanced data structure for indexing short-read sequencing data.
  • Application of SSBTs to query the expression of specific transcripts across a dataset of 2,652 RNA-seq experiments from breast, blood, and brain tissues.
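The split variant can be sketched as follows, under our own reading of the idea: every node stores a similarity filter with the bits set in all leaves below it (minus bits an ancestor already stores) and a remainder filter with the bits set in some but not all of those leaves. During a query, a k-mer found in the similarity filter is counted as present for the whole subtree, and one absent from both filters is counted as absent for the whole subtree, so either count can settle the subtree early. The filter size, k, the single CRC-32 hash, and all names (`SplitNode`, `build`, `query`) are illustrative assumptions, and the published construction may differ in detail.

```python
import zlib
from dataclasses import dataclass, field
from typing import List, Optional

NUM_BITS = 4096  # hypothetical filter size
K = 5            # hypothetical k-mer length


def kmers(seq: str) -> List[str]:
    return [seq[i:i + K] for i in range(len(seq) - K + 1)]


def bit(kmer: str) -> int:
    # One deterministic hash function for brevity.
    return zlib.crc32(kmer.encode()) % NUM_BITS


def bloom(seq: str) -> int:
    """Bloom filter of seq's k-mers as an int bit array."""
    bits = 0
    for km in kmers(seq):
        bits |= 1 << bit(km)
    return bits


@dataclass
class SplitNode:
    union: int  # OR of the leaf filters below (needed only during construction)
    inter: int  # AND of the leaf filters below (needed only during construction)
    children: List["SplitNode"] = field(default_factory=list)
    name: Optional[str] = None  # experiment name, leaves only
    sim: int = 0  # bits set in every leaf below, minus bits held by an ancestor's sim
    rem: int = 0  # bits set in some, but not all, leaves below


def _split(n: SplitNode, parent_inter: int) -> None:
    n.sim = n.inter & ~parent_inter  # bits in parent_inter were recorded higher up
    n.rem = n.union & ~n.inter
    for c in n.children:
        _split(c, n.inter)


def build(leaves: List[SplitNode]) -> SplitNode:
    level = leaves
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 1:
                nxt.append(pair[0])
            else:
                a, b = pair
                nxt.append(SplitNode(a.union | b.union, a.inter & b.inter, pair))
        level = nxt
    _split(level[0], parent_inter=0)
    return level[0]


def query(root: SplitNode, seq: str, theta: float) -> List[str]:
    qbits = [bit(km) for km in kmers(seq)]
    total = len(qbits)
    hits: List[str] = []

    def leaves_under(n: SplitNode) -> List[str]:
        if not n.children:
            return [n.name]
        return [x for c in n.children for x in leaves_under(c)]

    def visit(n: SplitNode, unresolved: List[int], present: int, absent: int) -> None:
        still = []
        for b in unresolved:
            if (n.sim >> b) & 1:
                present += 1          # set in every leaf below n
            elif not ((n.rem >> b) & 1):
                absent += 1           # set in no leaf below n
            else:
                still.append(b)       # decided further down the tree
        if present >= theta * total:
            hits.extend(leaves_under(n))  # every leaf below already passes
        elif total - absent < theta * total:
            return                        # no leaf below can reach theta
        else:
            for c in n.children:
                visit(c, still, present, absent)

    visit(root, qbits, 0, 0)
    return hits


if __name__ == "__main__":
    experiments = {"exp1": "ACGTACGTTGCA", "exp2": "TTTTGGGGCCCC", "exp3": "ACGTACGTAAAA"}
    root = build([SplitNode(bloom(s), bloom(s), name=n) for n, s in experiments.items()])
    print(query(root, "ACGTACGTTGCA", theta=0.9))
```

In this sketch the storage saving comes from the subtraction in `_split`: a bit set in every experiment below a node is stored once in that node's similarity filter rather than repeated in every descendant. A brute-force scan of the leaf filters should return the same experiments, which makes a convenient correctness check for any variant of the construction.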

Main Results:

  • SSBTs provide a fivefold improvement in search and storage costs compared to SBTs.
  • Querying a 1000-nucleotide sequence within the indexed dataset takes under 4 minutes on a single thread.
  • The SSBT index for the entire dataset requires only 39 GB of storage.
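Two quick calculations put these figures in context. A query of length L contributes L − k + 1 overlapping k-mers; k is not given in this summary, so k = 20 below is only a placeholder. Dividing the index size by the number of experiments gives the average storage per indexed experiment (decimal units assumed).

```latex
\begin{aligned}
n_{\text{k-mers}} &= L - k + 1 = 1000 - k + 1 \quad (\text{e.g. } k = 20 \Rightarrow 981),\\
\frac{39\ \text{GB}}{2652\ \text{experiments}} &\approx 0.0147\ \text{GB} \approx 14.7\ \text{MB per experiment}.
\end{aligned}
```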

Conclusions:

  • SSBTs offer a scalable and efficient solution for sequence-based querying of large RNA-seq experiment collections.
  • This methodology substantially increases the usefulness of public sequencing data archives for biological research.
  • The developed indexing scheme facilitates faster identification of gene expression conditions and variations.