Improving de novo sequence assembly using machine learning and comparative genomics for overlap correction.

Lance E Palmer¹, Mathaeus Dejori, Randall Bolanos

¹Siemens Corporate Research, 755 College Road East, Princeton, NJ, USA. lance.palmer@siemens.com

BMC Bioinformatics | January 19, 2010

Summary

This summary is machine-generated.

This study introduces a machine learning approach that improves de novo genome assembly by more accurately identifying true read overlaps. Using comparative genomics and sequence statistics, the method increases contig length and improves assembly quality.

Area of Science:

Genomics
Bioinformatics
Computational Biology

Background:

Advances in DNA sequencing make it possible to leverage data from existing projects in new genome projects.
De novo assembly relies heavily on accurately identifying overlapping DNA reads.
Distinguishing true overlaps from spurious alignments due to repetitive sequences remains a challenge.
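To make the overlap-detection step concrete, here is a minimal Python sketch, assuming an ungapped comparison (no insertions or deletions) and a simple mismatch-fraction cutoff. The names `best_overlap`, `min_len`, and `max_mismatch_frac` are hypothetical and are not part of Minimus or any other assembler; the function finds the longest suffix of one read that matches a prefix of another within the allowed mismatch rate.

```python
def best_overlap(a: str, b: str, min_len: int = 3, max_mismatch_frac: float = 0.1):
    """Hypothetical helper: return (length, mismatch_fraction) for the longest
    suffix of read `a` that matches a prefix of read `b` with a mismatch
    fraction <= max_mismatch_frac, or None if no overlap of at least
    `min_len` bases qualifies. Ungapped comparison only (no indels)."""
    # Try the longest possible overlap first, then shrink one base at a time.
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        suffix = a[-length:]
        prefix = b[:length]
        mismatches = sum(1 for x, y in zip(suffix, prefix) if x != y)
        frac = mismatches / length
        if frac <= max_mismatch_frac:
            return length, frac
    return None


print(best_overlap("ACGTACGGA", "CGGATTT"))  # 4-base exact overlap "CGGA"
```

Because this check looks only at sequence similarity, two reads drawn from different copies of a repeat can pass it just as easily as genuinely adjacent reads, which is exactly the ambiguity the study sets out to resolve.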

Purpose of the Study:

To enhance de novo genome assembly by improving the read overlap detection step.
To develop a data-driven method for classifying read overlaps as true or false.
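The idea of a data-driven overlap classifier can be illustrated with logistic regression fitted by batch gradient descent. This is a stand-in chosen for brevity, not the Weka models used in the study; the two-feature layout (mismatch fraction and a normalized repeat score), the toy training data, and the names `train_logistic` and `predict` are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights w and bias b so that sigmoid(w.x + b) estimates the
    probability that an overlap is true. Plain batch gradient descent on the
    mean log-loss; an illustrative stand-in, not the study's Weka models."""
    n_feat = len(features[0])
    n = len(features)
    w = [0.0] * n_feat
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of log-loss with respect to the logit
            for j in range(n_feat):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5) -> bool:
    # True means "classified as a true overlap".
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold


# Hypothetical toy data: [mismatch fraction, normalized repeat score]; 1 = true overlap.
X = [[0.00, 0.10], [0.01, 0.20], [0.02, 0.10], [0.08, 0.90], [0.10, 0.80], [0.01, 0.95]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)
```

Note that the last toy example has a low mismatch fraction but a high repeat score and is labeled false: a similarity threshold alone would accept it, while a learned model can weigh the repeat evidence.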

Main Methods:

Extended the Minimus assembler with a machine learning-based overlap classification module.
Trained classification models using Weka with features like percent mismatch, k-mer frequencies, and comparative genomics scores.
Utilized read data from prior sequencing projects and reference genomes for training.
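The sketch below shows how a k-mer frequency feature might be computed for an overlap region, and how training labels could be derived from known read positions on a reference genome. It assumes the maximum k-mer count over the overlap region as the feature and exact coordinate agreement as the definition of a true overlap; both choices, and the names `kmer_counts`, `max_kmer_count`, and `is_true_overlap`, are illustrative assumptions rather than the study's definitions. Comparative genomics scores are not shown.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in the read set."""
    counts = Counter()
    for r in reads:
        for i in range(len(r) - k + 1):
            counts[r[i:i + k]] += 1
    return counts


def max_kmer_count(region, counts, k):
    """Highest read-set count of any k-mer in the overlap region.
    Unusually high values hint that the region lies in a repeat.
    Returns 0 if the region is shorter than k."""
    return max((counts[region[i:i + k]] for i in range(len(region) - k + 1)), default=0)


def is_true_overlap(start_a, len_a, start_b, overlap_len, tol=0):
    """Training label (assumption): read b, placed after read a on the
    reference, truly overlaps a by overlap_len bases if b starts where
    a's last overlap_len bases start, within a tolerance of tol bases."""
    return abs(start_b - (start_a + len_a - overlap_len)) <= tol


reads = ["ACGTAC", "GTACGG", "TTACGT"]
counts = kmer_counts(reads, 3)
print(max_kmer_count("GTAC", counts, 3))  # TAC occurs in all three reads
```

In practice these values, together with the mismatch fraction of each candidate overlap, would form the feature vector passed to the classifier.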

Main Results:

Nearly doubled the N50 contig length (a length-weighted median of contig sizes) for E. coli and S. aureus whole-genome sequencing data.
Maintained genome coverage and did not increase the number of mis-assemblies.
Demonstrated the effectiveness of a curated set of overlaps in the contigging phase.
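N50, the metric behind the contig-length result, is the length L such that contigs of length at least L together contain at least half of all assembled bases. A minimal Python sketch of that standard definition follows; the function name `n50` is our own.

```python
def n50(contig_lengths):
    """Return the N50: walk contigs from longest to shortest and report the
    length at which the running total first reaches half of all bases.
    Returns 0 for an empty assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0


print(n50([100, 200, 300, 400]))  # 400 + 300 = 700 >= 500, so N50 = 300
```

Unlike a plain median, N50 is weighted by length: for contigs of 50, 50, 50, and 1000 bases the median is 50, but the N50 is 1000, because the single long contig holds most of the assembly.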

Conclusions:

Machine learning, incorporating comparative and non-comparative features, effectively classifies read overlaps.
This approach significantly improves the quality of de novo sequence assembly.