Sparse Binary Relation Representations for Genome Graph Annotation.

Mikhail Karasikov (1,2,3), Harun Mustafa (1,2,3), Amir Joudaki (1,2,3)

  • 1Department of Computer Science, ETH Zurich, Zurich, Switzerland.

Journal of Computational Biology: a Journal of Computational Molecular Cell Biology | January 1, 2020
Summary
This summary is machine-generated.

A new method, Multi-BRWT, builds on the binary relation wavelet tree (BRWT) to compress the labels attached to de Bruijn graphs of DNA sequencing data. The method adapts to characteristics of the input, such as label sparsity and correlations between labels, which reduces the storage needed for large biological datasets.

Keywords:
binary relations; compressed data structures; genome graph annotation; sparse binary matrices

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Data Compression

Background:

  • High-throughput DNA sequencing generates massive datasets requiring efficient storage and indexing.
  • Labeled de Bruijn graphs are emerging as effective structures for representing and querying sequencing data.
  • Methods for storing labels on top of these graphs remain underexplored, and existing ones do not adapt to characteristics of the data such as label sparsity and correlations.
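
To make the background concrete, the sketch below shows one way to picture a labeled de Bruijn graph: every k-mer is a node, and each node carries the set of labels (for example, sample names) in which it occurs. Stacking these label sets gives a binary matrix with one row per k-mer and one column per label, which is the object that annotation compression schemes operate on. The function names, the toy sequences, and the choice of k = 3 are illustrative assumptions, and graph edges (overlaps of k-1 characters) are left implicit.

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield every length-k substring (k-mer) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_annotation(labeled_seqs, k):
    """Map each k-mer to the set of labels whose sequence contains it."""
    annotation = defaultdict(set)
    for label, seq in labeled_seqs.items():
        for kmer in kmers(seq, k):
            annotation[kmer].add(label)
    return annotation

def to_binary_matrix(annotation, labels):
    """Rows are the sorted k-mers, columns are labels; 1 means 'k-mer has label'."""
    rows = sorted(annotation)
    matrix = [[1 if lab in annotation[km] else 0 for lab in labels]
              for km in rows]
    return rows, matrix

# Toy input (hypothetical): two short "samples".
samples = {"sampleA": "ACGTAC", "sampleB": "GTACCA"}
annotation = build_annotation(samples, k=3)
rows, matrix = to_binary_matrix(annotation, ["sampleA", "sampleB"])

for kmer, bits in zip(rows, matrix):
    print(kmer, bits)
```

On real data this matrix has a very large number of rows and is mostly zeros, which is why storing it uncompressed is impractical.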

Purpose of the Study:

  • To introduce a novel, adaptive compression approach for labels on de Bruijn graphs.
  • To evaluate the performance of the new method against existing state-of-the-art techniques.
  • To analyze how data characteristics influence compression efficiency for sequencing data.
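
As a rough illustration of why data characteristics matter, the sketch below compares two deliberately simple cost models for a binary annotation matrix: storing every cell as one bit, and storing only the coordinates of the non-zero cells. Both models, the matrix dimensions, and the densities are assumptions chosen for illustration; they are not among the schemes evaluated in the study.

```python
def bits_needed(n):
    """Bits required to address n distinct values (at least 1)."""
    return max(1, (n - 1).bit_length())

def flat_matrix_bits(n_rows, n_cols):
    """One bit per cell, regardless of content."""
    return n_rows * n_cols

def coordinate_list_bits(n_rows, n_cols, density):
    """One (row, column) pair per non-zero cell."""
    nnz = round(n_rows * n_cols * density)
    return nnz * (bits_needed(n_rows) + bits_needed(n_cols))

# Hypothetical dimensions: one million k-mers, 1024 labels.
n_rows, n_cols = 1_000_000, 1024
for density in (0.001, 0.01, 0.1):
    flat = flat_matrix_bits(n_rows, n_cols)
    coords = coordinate_list_bits(n_rows, n_cols, density)
    better = "coordinate list" if coords < flat else "flat matrix"
    print(f"density={density}: flat={flat} bits, coords={coords} bits -> {better}")
```

Even in this toy model the preferable representation flips as the matrix gets denser, which is the kind of dependence on input characteristics that motivates an adaptive method.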

Main Methods:

  • Development of Multi-BRWT, a compression method that builds on the binary relation wavelet tree (BRWT).
  • Systematic analysis and evaluation of five state-of-the-art annotation compression schemes.
  • Testing on both artificial and diverse real-world sequencing datasets.
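
The general idea behind a binary relation wavelet tree can be sketched as follows. Each node covers a group of columns and stores one bitvector marking which of its rows are non-zero in any of those columns; its children then index only those non-zero rows, so all-zero regions are stored once near the root instead of once per column. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses plain Python lists instead of compressed bitvectors with rank support, and the even split of columns controlled by the hypothetical `arity` parameter stands in for whatever column grouping the paper actually uses.

```python
def build_brwt(columns, n_rows, arity=2):
    """Build a BRWT-style tree.

    columns: list of (label, set_of_row_indices) in this node's row coordinates.
    n_rows:  number of rows visible to this node.
    arity:   maximum number of children per internal node (must be >= 2).
    """
    if arity < 2:
        raise ValueError("arity must be at least 2")
    nonzero = sorted(set().union(*(rows for _, rows in columns)))
    index = [0] * n_rows              # 1 where any column of this node has a 1
    for r in nonzero:
        index[r] = 1
    if len(columns) == 1:             # leaf: the index is the column itself
        return {"index": index, "label": columns[0][0], "children": []}
    rank = {r: i for i, r in enumerate(nonzero)}   # drop all-zero rows
    group_size = -(-len(columns) // arity)         # ceiling division
    children = []
    for start in range(0, len(columns), group_size):
        group = [(label, {rank[r] for r in rows})
                 for label, rows in columns[start:start + group_size]]
        children.append(build_brwt(group, len(nonzero), arity))
    return {"index": index, "label": None, "children": children}

def row_labels(node, row):
    """Return the labels set for a row (given in this node's coordinates)."""
    if not node["index"][row]:
        return []
    if not node["children"]:
        return [node["label"]]
    local = sum(node["index"][:row])  # rank of this row among the non-zero rows
    found = []
    for child in node["children"]:
        found.extend(row_labels(child, local))
    return found

def total_bits(node):
    """Total length of all index bitvectors in the tree."""
    return len(node["index"]) + sum(total_bits(c) for c in node["children"])

# Toy matrix (hypothetical): 8 rows, 4 label columns, mostly zeros.
columns = [("A", {0, 1}), ("B", {1}), ("C", {5}), ("D", {5, 6})]
tree = build_brwt(columns, n_rows=8, arity=2)
print(total_bits(tree), "bits in the tree vs", 8 * len(columns), "bits uncompressed")
print(row_labels(tree, 5))
```

In this toy example the tree needs 24 bits where the uncompressed matrix needs 32. The saving comes from the four all-zero rows being dropped below the root, so the benefit grows with sparsity and with how well the column grouping places similar columns together.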

Main Results:

  • Up to 29% better compression performance than the basic BRWT method.
  • Up to 68% better compression performance than the current state of the art for de Bruijn graph label compression.
  • The improvements held up across a range of real-world datasets, supporting the method's adaptability.

Conclusions:

  • The proposed Multi-BRWT method, which builds on the binary relation wavelet tree (BRWT), substantially improves label compression for de Bruijn graphs.
  • The method's adaptability to data sparsity and correlations leads to superior compression performance.
  • This work provides valuable insights into optimizing data storage for large-scale biological sequence repositories.