Related Concept Videos

Genome Annotation and Assembly

The genome is all of the genetic material in an organism. It ranges from a few million base pairs in microbial cells to several billion base pairs in many eukaryotic organisms. Genome assembly is the process of taking DNA sequencing data and putting the pieces back together in the correct order to create a close representation of the original genome. It is followed by genome annotation, the identification of functional elements on the newly assembled genome.

HapCol: accurate and memory-efficient haplotype assembly from long reads.

Yuri Pirola¹, Simone Zaccaria¹, Riccardo Dondi²

¹Dipartimento di Informatica Sistemistica e Comunicazione (DISCo), Univ. degli Studi di Milano-Bicocca, Milan, Italy.

Bioinformatics (Oxford, England)

August 29, 2015

Summary

This summary is machine-generated.

Haplotype assembly is computationally challenging for diploid organisms. HapCol, a new exact algorithm, efficiently reconstructs haplotypes from long-read sequencing data, improving accuracy and reducing computational demands.

Area of Science:

Genomics
Bioinformatics
Computational Biology

Background:

Haplotype assembly is crucial for understanding genotype-phenotype relationships.
Newer sequencing technologies produce long reads at increasing coverage, which benefits haplotype assembly but strains existing methods.
Accuracy or performance of current methods degrades as read length and coverage increase, or the methods rely on restrictive assumptions.

Purpose of the Study:

To develop an exact algorithm for haplotype assembly that effectively utilizes long-read sequencing data.
To address the limitations of existing methods in terms of accuracy, performance, and assumptions.

Main Methods:

Designed HapCol, an exact algorithm that exploits the uniform distribution of sequencing errors in long-read data.
The algorithm's complexity is exponential in the maximum number of corrections per single-nucleotide polymorphism (SNP) position.
Minimizes the overall error-correction score.
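
To make the idea of an error-correction score concrete, the following is a minimal brute-force sketch and not HapCol itself. It assumes reads are encoded as strings over `0`, `1`, and `-` (no call), one character per SNP position, and it tries every split of the reads into two haplotype groups, counting how many entries must be corrected so that each group agrees at every position. The function names and the `max_per_column` parameter are hypothetical; the parameter only mimics the idea of bounding corrections per SNP position and makes no claim about how HapCol implements that bound.

```python
from itertools import product


def column_cost(reads, col):
    """Fewest corrections that make one column agree within a group of reads."""
    zeros = sum(1 for r in reads if r[col] == "0")
    ones = sum(1 for r in reads if r[col] == "1")
    # Flip the minority allele; '-' entries are uncalled and cost nothing.
    return min(zeros, ones)


def min_error_correction(reads, max_per_column=None):
    """Exhaustively search bipartitions of reads for the lowest correction score.

    Returns (score, labels), or None if no bipartition respects max_per_column.
    Illustrative only: the search is exponential in the number of reads.
    """
    n_cols = len(reads[0])
    best = None
    # Pin the first read to group 0 so mirrored partitions are not tried twice.
    for rest in product((0, 1), repeat=len(reads) - 1):
        labels = (0,) + rest
        groups = (
            [r for r, g in zip(reads, labels) if g == 0],
            [r for r, g in zip(reads, labels) if g == 1],
        )
        per_col = [
            column_cost(groups[0], c) + column_cost(groups[1], c)
            for c in range(n_cols)
        ]
        # Optional bound on corrections at any single SNP position (assumption).
        if max_per_column is not None and max(per_col) > max_per_column:
            continue
        score = sum(per_col)
        if best is None or score < best[0]:
            best = (score, labels)
    return best


# Toy input: five reads over four SNP positions (made-up data).
reads = ["0011", "001-", "1100", "-100", "0111"]
print(min_error_correction(reads))
```

On this toy input the best split puts reads 1, 2, and 5 in one group and reads 3 and 4 in the other, with a single correction at the second position of the last read. Because each group's consensus is chosen independently per column, the sketch does not require the two haplotypes to differ at every position.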

Main Results:

HapCol is competitive with state-of-the-art combinatorial methods on real and simulated data.
It improves accuracy and the number of phased positions compared to existing approaches.
It requires significantly fewer computational resources, particularly memory, on simulated datasets.

Conclusions:

HapCol offers a computationally efficient solution for haplotype assembly, overcoming previous limitations.
Enables phasing of datasets with higher coverage and relaxes the all-heterozygous assumption.
Provides a valuable tool for genomic research, particularly in characterizing SNP effects.