Feature-by-feature--evaluating de novo sequence assembly.

Francesco Vezzi¹, Giuseppe Narzisi, Bud Mishra

¹Department of Mathematics and Informatics, University of Udine, Udine, Italy.

February 10, 2012

Summary

This summary is machine-generated.

This study reveals that common whole-genome sequence assembly metrics are insufficient for accurately comparing assembler performance. Multivariate analysis identifies key features for a more reliable assessment, highlighting limitations of simulated data in evaluations.

Area of Science:

Computational Biology
Bioinformatics
Genomics

Background:

Whole-genome sequence assembly (WGSA) is a critical problem in computational biology.
Existing tools (assemblers) often claim to solve WGSA, but systematic accuracy comparisons are lacking.
Traditional evaluation metrics (e.g., N50) and simulated datasets have limitations in reflecting true assembly quality and correctness.
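To make the limitation of N50 concrete, here is a minimal sketch using the standard definition: the length of the contig at which the running sum over contigs, sorted from longest to shortest, first reaches half of the total assembled length. The contig lengths below are invented for illustration and do not come from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (standard definition)."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical assemblies of the same 100-unit genome (made-up numbers).
correct_assembly = [30, 30, 20, 20]
# Same contigs, but the two 30-unit contigs were wrongly joined into one.
misjoined_assembly = [60, 20, 20]

n50_correct = n50(correct_assembly)
n50_misjoined = n50(misjoined_assembly)
print(n50_correct, n50_misjoined)
```

In this toy case the misjoined assembly scores the higher N50, which is one way a contiguity metric can reward an assembly error rather than penalize it.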

Purpose of the Study:

To systematically analyze the relationships and importance of different features used in evaluating genome assembly quality and correctness.
To address the limitations of the Feature Response Curve (FRC) method by accounting for feature correlations.
To identify a reduced set of highly informative features for more accurate and reliable assembler performance comparison.
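The FRC idea can be sketched as follows. This is a simplified reading, not the authors' implementation: it assumes each contig carries a count of suspicious features, that contigs are taken from longest to shortest, and that for a feature threshold the curve reports the fraction of the genome covered by the longest contigs whose cumulative feature count stays within that threshold. The function names, the `(length, feature_count)` representation, and the numbers are all hypothetical.

```python
def feature_response_curve(contigs, genome_length):
    """Build (cumulative_features, coverage) points from (length, feature_count) pairs."""
    ordered = sorted(contigs, key=lambda c: c[0], reverse=True)
    curve = []
    cum_features = 0
    cum_length = 0
    for length, n_features in ordered:
        cum_features += n_features
        cum_length += length
        curve.append((cum_features, cum_length / genome_length))
    return curve


def coverage_at(curve, threshold):
    """Coverage reached by the longest contigs without exceeding the feature threshold."""
    best = 0.0
    for cum_features, coverage in curve:
        if cum_features > threshold:
            break  # cumulative counts never decrease, so later points also exceed it
        best = coverage
    return best


# Hypothetical assembly of a 1000-unit genome: (contig length, feature count).
contigs = [(400, 3), (300, 0), (200, 1), (100, 0)]
curve = feature_response_curve(contigs, genome_length=1000)
print(coverage_at(curve, 0), coverage_at(curve, 3), coverage_at(curve, 4))
```

Under this reading, an assembler whose curve rises faster reaches a given coverage with fewer suspicious features, which is why the choice and independence of the counted features matters for the comparison.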

Main Methods:

Analysis of feature correlations in whole-genome sequence assembly.
Application of multivariate statistical techniques, including Principal Component Analysis (PCA) and Independent Component Analysis (ICA).
Use of the Feature Response Curve (FRC) method with a refined set of features.
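The first of these steps, checking whether features are correlated, can be illustrated with a small pure-Python sketch. It covers only pairwise Pearson correlation, not the PCA or ICA used in the study. The feature names and per-contig counts are hypothetical, and the 0.9 cutoff is an arbitrary assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant feature carries no correlation information
    return cov / (spread_x * spread_y)


def correlated_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for feature pairs with |r| >= threshold."""
    names = sorted(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


# Hypothetical feature counts measured on five contigs (made-up values).
features = {
    "low_coverage": [1, 2, 3, 4, 5],
    "compressed_mates": [2, 4, 6, 8, 10],
    "high_snp": [5, 1, 4, 2, 3],
}
redundant = correlated_pairs(features)
print(redundant)
```

Pairs flagged this way carry largely the same signal, so counting both in a curve would weight that signal twice. This is the kind of redundancy that motivates reducing the feature set.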

Main Results:

Multivariate analysis revealed 'excess-dimensionality' in the feature space and demonstrated the inadequacy of the N50 metric for assessing assembly quality.
Independent Component Analysis identified a subset of features that better describe assembler performance.
The study confirmed that evaluations based on simulated data can yield unrealistic results.

Conclusions:

A reduced set of highly informative features, identified through multivariate analysis, enables a more accurate comparison of genome assemblers using the FRC method.
The findings underscore the need for improved evaluation strategies beyond traditional metrics and simulated datasets.
This work provides a more robust framework for assessing and comparing whole-genome sequence assembly tools.