Related Concept Videos

Genome Annotation and Assembly

The genome refers to all of the genetic material in an organism. Genome size ranges from a few million base pairs in microbial cells to several billion base pairs in many eukaryotic organisms. Genome assembly is the process of putting DNA sequencing data back together in the correct order to create a close representation of the original genome. It is followed by genome annotation, the identification of functional elements in the newly assembled genome.
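
To make "putting the sequencing data back together in the correct order" concrete, here is a toy greedy overlap assembler. It is a minimal sketch, not how production assemblers work: it assumes short, error-free reads from one linear sequence and repeatedly merges the two reads with the longest suffix-prefix overlap. The function names `overlap` and `greedy_assemble` and the `min_len` threshold are hypothetical.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Greedily merge reads by longest overlap; returns the remaining contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read: they add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no overlaps left: several contigs remain
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


print(greedy_assemble(["ATGGCG", "GCGTGC", "GTGCA"]))  # ['ATGGCGTGCA']
```

Greedy merging can join reads incorrectly when the genome contains repeats longer than the read overlaps, which is one reason large-scale assemblers often rely on graph-based approaches such as de Bruijn graphs instead.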

Evolutionary Relationships through Genome Comparisons

Genome comparison is one of the most effective ways to infer evolutionary relationships between organisms. The basic principle is that if two species share a common feature, it is likely encoded by DNA sequence conserved between them. The advent of genome sequencing technologies in the late 20th century enabled scientists to study the conservation of domains between species and helped them deduce evolutionary relationships across diverse...

RNA-seq

RNA sequencing, or RNA-seq, is a high-throughput sequencing technology used to study the transcriptome of a cell. Transcriptomics helps interpret the functional elements of a genome and identify the molecular constituents of an organism. It also helps in understanding the development of an organism and the occurrence of diseases.
Before the development of RNA-seq, microarray-based methods and Sanger sequencing were used for transcriptome analysis. However, while...

Genome Size and the Evolution of New Genes

While every living organism has a genome of some kind (be it RNA or DNA), these blueprints vary considerably in size. One major factor that affects genome size is whether the organism is prokaryotic or eukaryotic. Prokaryotic genomes contain relatively little non-coding sequence, so genes are tightly packed along the chromosome, often clustered in operons. In contrast, eukaryotic genes are punctuated by long stretches of non-coding sequence.

Genome-wide Association Studies-GWAS

Genome-wide association studies, or GWAS, are used to identify whether common SNPs are associated with certain diseases. If specific SNPs are observed more frequently in individuals with a particular disease than in those without it, those SNPs are said to be associated with the disease. A chi-square test is performed to assess whether the difference in allele frequencies between the two groups is statistically significant, that is, whether an allele is likely to be associated with the disease.
GWAS does not require the identification of the target gene involved in...
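
The chi-square step can be illustrated with an allelic association test on a 2×2 table of allele counts (cases vs. controls, risk allele vs. other allele). This is a minimal sketch under simplifying assumptions: a single biallelic SNP, allele counts rather than genotypes, and no continuity correction or covariates. The function name `allelic_chi_square` and the example counts are hypothetical. The p-value uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so its upper tail equals erfc(sqrt(x / 2)).

```python
import math
from typing import Tuple


def allelic_chi_square(case_risk: int, case_other: int,
                       ctrl_risk: int, ctrl_other: int) -> Tuple[float, float]:
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Assumes every row and column total is non-zero. Returns (statistic, p-value).
    """
    table = [[case_risk, case_other], [ctrl_risk, ctrl_other]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if allele and disease status were independent
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: risk allele on 30 of 100 case alleles, 10 of 100 control alleles
chi2, p = allelic_chi_square(30, 70, 10, 90)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

In a real GWAS this test is repeated for every SNP, so the significance threshold must be corrected for multiple testing; a genome-wide threshold of 5 × 10⁻⁸ is commonly used.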

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Representation learning for multi-modal spatially resolved transcriptomics data.

Bioinformatics (Oxford, England)·2026

Same author

Estimation of Physiological Metrics from Resting ECGs Using Deep Learning in the UK Biobank, Including submaximal exercise derived V̇O₂ max, Body Fat Percentage, and Grip Strength.

medRxiv : the preprint server for health sciences·2026

Same author

Rawsamble: overlapping raw nanopore signals using a hash-based seeding mechanism.

Bioinformatics (Oxford, England)·2026

Same author

CAMP: a modular metagenomics analysis system for integrated multistep data exploration.

NAR genomics and bioinformatics·2026

Same author

RMS: a ML-based system for ICU respiratory monitoring and resource planning.

NPJ digital medicine·2025

Same author

ImmunoPepper: extracting personalized peptides from complex splicing graphs.

Bioinformatics (Oxford, England)·2025

Same journal

GMSA: A Graph Matching and Point Cloud Registration-Based Method for Spatial Transcriptomics Data Alignment.

Journal of computational biology : a journal of computational molecular cell biology·2026

Same journal

Investigations on Multiple Protein Scaffold Filling.

Journal of computational biology : a journal of computational molecular cell biology·2026

Same journal

Cell Type Prediction for Single-Cell RNA Sequencing Utilizing Unsupervised Domain Adaptation and Semi-Supervised Learning.

Journal of computational biology : a journal of computational molecular cell biology·2026

Same journal

PPIGAN: Prediction of Protein-Protein Interactions Using Generative Adversarial Networks.

Journal of computational biology : a journal of computational molecular cell biology·2026

Same journal

Deep Structure-Enhanced Cell Clustering Model for Single-Cell RNA Sequencing Data.

Journal of computational biology : a journal of computational molecular cell biology·2026

Same journal

Asymmetric Drug-Drug Interaction Prediction Based on Generative Adversarial Networks and Knowledge Graph.

Journal of computational biology : a journal of computational molecular cell biology·2026

Related Experiment Video

Updated: Dec 31, 2025

A Web Tool for Generating High Quality Machine-readable Biological Pathways

Published on: February 8, 2017

Sparse Binary Relation Representations for Genome Graph Annotation.

Mikhail Karasikov¹,²,³, Harun Mustafa¹,²,³, Amir Joudaki¹,²,³

¹Department of Computer Science, ETH Zurich, Zurich, Switzerland.

Journal of Computational Biology : a Journal of Computational Molecular Cell Biology

January 1, 2020

Summary

This summary is machine-generated.

A new Multi-binary relation wavelet tree (Multi-BRWT) method improves the compression of labels that annotate DNA sequencing data on de Bruijn graphs. The approach adapts to data characteristics, improving storage efficiency for large biological datasets.
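
The summary's method builds on the basic binary relation wavelet tree (BRWT). The structure can be pictured with the simplified Python sketch below, which stores a sparse binary matrix (rows are graph nodes, columns are labels). Each tree node keeps one bit per row of its submatrix indicating whether that row has any 1 in the node's column group, and its children index only the rows whose bit is set, so empty rows are dropped early. Assumptions: bit vectors are plain Python lists with a prefix-sum array standing in for rank support (no succinct compression), columns are split into contiguous groups, and the `arity` parameter only loosely mimics a multi-ary tree. The class and method names are hypothetical, and this is not the authors' Multi-BRWT implementation.

```python
from typing import List, Set


class BRWT:
    """Simplified binary relation wavelet tree with up to `arity` children per node.

    The matrix is given as one set of column indices per row. Each node keeps one
    bit per row of its submatrix (1 = the row has a 1 in this node's columns);
    its children only index the rows whose bit is 1.
    """

    def __init__(self, rows: List[Set[int]], cols: List[int], arity: int = 2):
        if arity < 2:
            raise ValueError("arity must be at least 2")
        colset = set(cols)
        self.cols = cols
        self.bits = [1 if row & colset else 0 for row in rows]
        # prefix[i] = number of 1-bits in bits[:i]; a plain stand-in for rank support
        self.prefix = [0]
        for b in self.bits:
            self.prefix.append(self.prefix[-1] + b)
        self.children: List["BRWT"] = []
        if len(cols) > 1:
            # Keep only non-empty rows, restricted to this node's columns
            kept = [row & colset for row in rows if row & colset]
            step = -(-len(cols) // arity)  # ceiling division: columns per child
            for start in range(0, len(cols), step):
                self.children.append(BRWT(kept, cols[start:start + step], arity))

    def get_row(self, i: int) -> Set[int]:
        """Return the set of columns that are 1 in row i of this node's submatrix."""
        if not self.bits[i]:
            return set()
        if not self.children:
            return {self.cols[0]}  # leaf: exactly one column
        j = self.prefix[i]  # rank of row i among this node's non-empty rows
        result: Set[int] = set()
        for child in self.children:
            result |= child.get_row(j)
        return result

    def num_bits(self) -> int:
        """Total bits stored in all nodes (no bit-vector compression applied)."""
        return len(self.bits) + sum(child.num_bits() for child in self.children)


matrix = [{0, 2}, set(), {1}, {0, 1, 2, 3}, {3}]
tree = BRWT(matrix, cols=[0, 1, 2, 3], arity=2)
assert [tree.get_row(i) for i in range(len(matrix))] == matrix
print(tree.num_bits())  # 25, vs 20 for the dense 5 x 4 matrix: too small to benefit
```

The savings in this model come from sparsity: rows that are empty at a node are never stored in its subtree. Raising `arity` makes the tree shallower at the cost of wider nodes, and choosing such parameters and the column grouping from the data is one way to picture the adaptivity the summary describes.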

Keywords:

binary relations; compressed data structures; genome graph annotation; sparse binary matrices

More Related Videos

Author Spotlight: Impact of Intergenic Interactions on Disease-Identifying Dark Biomarkers

Published on: March 1, 2024

Heuristic Mining of Hierarchical Genotypes and Accessory Genome Loci in Bacterial Populations

Published on: December 7, 2021

Area of Science:

Bioinformatics
Computational Biology
Data Compression

Background:

High-throughput DNA sequencing generates massive datasets requiring efficient storage and indexing.
Labeled de Bruijn graphs are emerging as effective structures for representing and querying sequencing data.
Methods for compressing the labels on these graphs have not been fully explored, and existing ones lack adaptability to data characteristics.
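
A labeled de Bruijn graph can be pictured as a set of k-mers (nodes), with an edge wherever the last k-1 characters of one k-mer equal the first k-1 characters of another, plus an annotation recording which labels (for example, samples) each k-mer came from. The sketch below builds this annotation from hypothetical labeled sequences and exports it as the binary node-by-label matrix whose compression the study addresses. It is a toy under stated assumptions: uncompressed Python dicts and sets, forward strand only (no reverse complements), and hypothetical function names.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_annotation(samples: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer (a de Bruijn graph node) to the labels of sequences containing it."""
    annotation: Dict[str, Set[str]] = defaultdict(set)
    for label, seq in samples.items():
        for i in range(len(seq) - k + 1):
            annotation[seq[i:i + k]].add(label)
    return dict(annotation)


def successors(kmer: str, nodes: Set[str]) -> List[str]:
    """Outgoing edges: nodes whose first k-1 characters equal this k-mer's last k-1."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in nodes]


def to_binary_matrix(annotation: Dict[str, Set[str]],
                     labels: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Rows = k-mers in sorted order, columns = labels; 1 if the k-mer carries the label."""
    rows = sorted(annotation)
    matrix = [[1 if lab in annotation[kmer] else 0 for lab in labels] for kmer in rows]
    return rows, matrix


samples = {"s1": "ACGTT", "s2": "CGTAA"}  # hypothetical labeled sequences
annotation = build_annotation(samples, k=3)
rows, matrix = to_binary_matrix(annotation, ["s1", "s2"])
for kmer, bits in zip(rows, matrix):
    print(kmer, bits, successors(kmer, set(annotation)))
```

Real annotation matrices have vastly more rows and columns than this example but are typically very sparse, which is why compressed representations of this binary relation matter.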

Purpose of the Study:

To introduce a novel, adaptive compression approach for labels on de Bruijn graphs.
To evaluate the performance of the new method against existing state-of-the-art techniques.
To analyze how data characteristics influence compression efficiency for sequencing data.

Main Methods:

Development of the Multi-binary relation wavelet tree (Multi-BRWT) compression method.
Systematic analysis and evaluation of five state-of-the-art annotation compression schemes.
Testing on both artificial and diverse real-world sequencing datasets.
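
Adapting to correlations between labels can be illustrated with one simple, hypothetical way to decide which columns share a subtree: greedily pair the columns whose sets of 1-rows overlap most. In an uncompressed BRWT-style model, the children of a node each index every row that is non-empty in that node, so grouping similar columns shrinks that union and the bits stored below it. This pairing rule, the function names, and the example matrix are assumptions for illustration only, not the procedure from the paper.

```python
from itertools import combinations
from typing import List, Set, Tuple


def greedy_column_pairing(columns: List[Set[int]]) -> List[Tuple[int, ...]]:
    """Pair columns (each given as the set of rows holding a 1) by largest overlap first."""
    pairs = sorted(combinations(range(len(columns)), 2),
                   key=lambda p: len(columns[p[0]] & columns[p[1]]),
                   reverse=True)  # Python's sort is stable, so ties keep index order
    used: Set[int] = set()
    groups: List[Tuple[int, ...]] = []
    for a, b in pairs:
        if a not in used and b not in used:
            groups.append((a, b))
            used.update((a, b))
    # A leftover column (odd count) becomes a group of its own
    groups.extend((c,) for c in range(len(columns)) if c not in used)
    return groups


def union_size(columns: List[Set[int]], groups: List[Tuple[int, ...]]) -> int:
    """Sum over groups of the number of rows that are non-empty in that group."""
    return sum(len(set().union(*(columns[c] for c in g))) for g in groups)


columns = [{0, 1, 2}, {5, 6}, {0, 1, 3}, {5, 6, 7}]
similar = greedy_column_pairing(columns)  # pairs 0 with 2 and 1 with 3
naive = [(0, 1), (2, 3)]                  # contiguous grouping
print(similar, union_size(columns, similar), union_size(columns, naive))
```

Here the similarity-based grouping leaves 7 non-empty rows across the two groups versus 11 for the contiguous grouping, so the nodes below them store fewer bits.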

Main Results:

Achieved up to 29% improvement over the basic BRWT method.
Demonstrated up to 68% improvement compared to current state-of-the-art de Bruijn graph label compression.
Showcased robust performance across various real-world datasets, confirming adaptability.

Conclusions:

The proposed Multi-binary relation wavelet tree (Multi-BRWT) significantly improves the compression of labels on de Bruijn graphs.
The method's adaptability to data sparsity and correlations leads to superior compression performance.
This work provides valuable insights into optimizing data storage for large-scale biological sequence repositories.