
Related Concept Videos

Genome Annotation and Assembly

The genome refers to all of the genetic material in an organism. It can range from a few million base pairs in microbial cells to several billion base pairs in many eukaryotic organisms. Genome assembly is the process of putting DNA sequencing reads back together in the correct order to create a close representation of the original genome. Assembly is followed by genome annotation: the identification of functional elements in the newly assembled genome.
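As a toy illustration of what "putting reads back together" means, the sketch below greedily merges the pair of reads with the longest suffix-prefix overlap until no overlaps remain. This greedy overlap strategy is a simplified stand-in chosen for illustration, not how production assemblers work: it assumes error-free reads taken from a single strand, and the function names `overlap` and `greedy_assemble` and the `min_len` threshold are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` matching a prefix of `b` (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of a matching suffix
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler: repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read; they add no new sequence.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: return the separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is easy to follow but can misassemble repeated sequence, which is one reason real assemblers rely on more elaborate graph-based methods.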

Next-generation Sequencing

The first human genome sequencing project cost $2.7 billion and was declared complete in 2003, after 15 years of international cooperation and collaboration between several research teams and funding agencies. Today, with the advent of next-generation sequencing technologies, the cost and time of sequencing a human genome have dropped more than 100-fold.
Next-Generation Sequencing Methods
Although next-generation methods rely on different technologies, they share a set of common features.

Genomics

Genomics is the science of genomes: it is the study of all the genetic material of an organism. In humans, the genome consists of information carried in 23 pairs of chromosomes in the nucleus, as well as mitochondrial DNA. In genomics, both coding and non-coding DNA are sequenced and analyzed. Genomics allows a better understanding of all living things, their evolution, and their diversity. It has a myriad of uses: for example, to build phylogenetic trees, to improve productivity and...

Sanger Sequencing

DNA sequencing is a fundamental technique that is routinely used in the biological sciences. It can be applied to questions at different scales: from sequencing a cloned DNA fragment or studying a mutation in a gene up to whole-genome sequencing. However, despite the widespread use of sequencing today, it was not until 1977 that Frederick Sanger and his collaborators developed the chain-termination method to decode DNA sequences. It relies on the separation of a...
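The chain-termination idea can be illustrated with an idealized simulation. It assumes error-free synthesis, a primer annealed at the template's 3' end, and one reaction per dideoxynucleotide (ddNTP), as in the original four-lane format; every fragment in a reaction ends where that ddNTP was incorporated. Reading the fragments from shortest to longest, as size separation does, recovers the newly synthesized strand, whose reverse complement is the template. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def chain_termination_lanes(template: str) -> dict[str, list[int]]:
    """Simulate the four ddNTP reactions on a template written 5'->3'.

    The polymerase builds the complementary strand 5'->3', and a reaction
    stops wherever its ddNTP is incorporated, so each lane holds the lengths
    of all fragments ending in that base.
    """
    synthesized = reverse_complement(template)
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for i, base in enumerate(synthesized):
        lanes[base].append(i + 1)  # fragment covering positions 0..i
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read bands from shortest to longest (size separation) to recover the new strand."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = chain_termination_lanes("GATTACA")
print(lanes)                                # {'A': [4, 5], 'C': [7], 'G': [2], 'T': [1, 3, 6]}
print(reverse_complement(read_gel(lanes)))  # GATTACA
```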

Genomic DNA in Eukaryotes

Eukaryotes have large genomes compared to prokaryotes. To fit inside the nucleus, eukaryotic DNA is packaged extraordinarily tightly. The DNA is wound around proteins called histones to form nucleosomes, which are joined by linker DNA and coil into chromatin fibers. Additional fibrous proteins further compact the chromatin, which becomes recognizable as chromosomes during certain phases of cell division.

Genome-wide Association Studies-GWAS

Genome-wide association studies (GWAS) are used to identify whether common single-nucleotide polymorphisms (SNPs) are associated with certain diseases. If specific SNPs are observed more frequently in individuals with a particular disease than in individuals without it, those SNPs are said to be associated with the disease. A chi-square test is used to assess whether an allele's frequency differs significantly between individuals with and without the disease.
GWAS does not require the identification of the target gene involved in...
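A minimal sketch of the chi-square step for a single SNP, assuming an allelic test on a 2x2 table of allele counts (cases versus controls, tested allele versus other allele), one degree of freedom, and no continuity correction. The counts and the function name are hypothetical, and the sketch assumes no expected count is zero.

```python
import math


def allelic_chi_square(case_alt: int, case_ref: int,
                       ctrl_alt: int, ctrl_ref: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 allele-count table.

    Rows are cases and controls; columns are counts of the tested allele and
    the other allele. Returns (chi2 statistic, p-value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # count expected under no association
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 500 cases and 500 controls, i.e. 1000 alleles per group.
chi2, p = allelic_chi_square(case_alt=300, case_ref=700, ctrl_alt=200, ctrl_ref=800)
print(round(chi2, 2))  # 26.67
```

In a real study this test is repeated for every genotyped SNP, so a much stricter threshold than 0.05 is applied; 5 × 10⁻⁸ is a widely used genome-wide significance level.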

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

A complete human pancreatic cancer genome.

bioRxiv : the preprint server for biology·2026

Same author

Ambiguity in identifying parameters of an SIR model when fitting epidemic incidence data.

Mathematical biosciences and engineering : MBE·2026

Same author

A reference genome sequence for the exceptionally long-lived Great Basin bristlecone pine, Pinus longaeva.

G3 (Bethesda, Md.)·2026

Same author

Mitochondrial heteroplasmy is a risk factor for the development of chronic lymphocytic leukemia.

Nature communications·2026

Same author

Deep generative classification of blood cell morphology.

Nature machine intelligence·2025

Same author

Finishing a complete giraffe genome from telomere to telomere with Verkko-Fillet.

bioRxiv : the preprint server for biology·2025

Same journal

3DICE: Interpretable 3D Cross-Modal Learning for Drug-Target Interaction Prediction and Large-Scale Drug Discovery.

Bioinformatics (Oxford, England)·2026

Same journal

KASSPer: Kinase Active Site Structure Prediction using Protein and Ligand Language Models and Its Application to Virtual Screening.

Bioinformatics (Oxford, England)·2026

Same journal

IDR searcher: a search engine solution for public image resources.

Bioinformatics (Oxford, England)·2026

Same journal

KCFtools: Rapid alignment-free method for introgression screening and GWAS using k-mer profiles.

Bioinformatics (Oxford, England)·2026

Same journal

Meta2DB: Curated shotgun metagenomic feature sets and metadata for health state prediction.

Bioinformatics (Oxford, England)·2026

Same journal

conMItion: an R package adjusting confounding factors for associations in multi-omics.

Bioinformatics (Oxford, England)·2026

Related Experiment Video

Updated: May 8, 2026

Hybrid De Novo Genome Assembly for the Generation of Complete Genomes of Urinary Bacteria using Short- and Long-read Sequencing Technologies

Published on: August 20, 2021

The MaSuRCA genome assembler.

Aleksey V Zimin¹, Guillaume Marçais, Daniela Puiu

¹Institute for Physical Sciences and Technology, University of Maryland, College Park, MD 20742, USA, Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA, Department of Mathematics and Department of Physics, University of Maryland, College Park, MD 20742, USA.

Bioinformatics (Oxford, England)

August 31, 2013

Summary

This summary is machine-generated.

The Maryland Super-Read Celera Assembler (MaSuRCA) is a novel hybrid approach for genome assembly. It efficiently combines short and long sequencing reads and performs competitively with, or better than, existing assemblers.

More Related Videos

Metagenomic Analysis of Silage

Published on: January 13, 2017

Novel Sequence Discovery by Subtractive Genomics

Published on: January 25, 2019

Area of Science:

Genomics
Bioinformatics
Computational Biology

Background:

Second-generation sequencing offers high coverage at low cost, driving demand for advanced genome assembly methods.
De Bruijn graph methods are computationally efficient and overlap-based strategies are more flexible, but each has limitations.
Existing assemblers struggle to integrate diverse read lengths and tolerate sequencing errors.
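One way to see part of this trade-off: a de Bruijn graph is built from fixed-length k-mers, so every read is cut into k-mers (a read shorter than k contributes nothing), and a single sequencing error introduces new k-mers that can create a spurious branch. The sketch below is a minimal illustration under simplifying assumptions: it ignores reverse complements and k-mer counts, and the reads and helper names are hypothetical.

```python
from collections import defaultdict


def kmers(read: str, k: int) -> list[str]:
    """All overlapping substrings of length k; empty if the read is shorter than k."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, set[str]]:
    """Nodes are (k-1)-mers; each k-mer adds an edge from its prefix to its suffix."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph: dict[str, set[str]]) -> list[str]:
    """(k-1)-mers with more than one distinct successor, where a simple path walk must stop."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)


clean = ["ACGTTGCA", "CGTTGCAT"]
print(branching_nodes(de_bruijn_graph(clean, 4)))                 # []
# Same sequence as the first read, with one substitution (T -> A at position 4).
print(branching_nodes(de_bruijn_graph(clean + ["ACGTAGCA"], 4)))  # ['CGT']
```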

Purpose of the Study:

To develop a novel hybrid genome assembly approach.
To create a system that combines the efficiency of de Bruijn graphs with the flexibility of overlap-based methods.
To enable assembly of mixed read lengths from various sequencing technologies while tolerating errors.

Main Methods:

Developed the Maryland Super-Read Celera Assembler (MaSuRCA).
Employs a hybrid strategy transforming paired-end reads into longer 'super-reads'.
Integrates variable length reads from Illumina, 454, and Sanger sequencing.
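The super-read idea can be sketched in a heavily simplified form, assuming error-free reads: extend each read base by base in both directions for as long as the extension is unambiguous in the k-mer graph (exactly one way forward and no merge point), so that reads from the same unambiguous stretch of genome collapse into one longer sequence. This omits joining the two reads of a pair, error handling, and everything else MaSuRCA does, so it should not be read as MaSuRCA's algorithm; all names here are hypothetical.

```python
ALPHABET = "ACGT"


def kmer_set(reads: list[str], k: int) -> set[str]:
    """All k-mers present in the reads (toy: no error filtering, no reverse complements)."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}


def extend_right(seq: str, kmers: set[str], k: int) -> str:
    """Append bases while the next (k-1)-mer is the only successor and has one predecessor."""
    seen = set()
    while True:
        node = seq[-(k - 1):]
        nexts = [b for b in ALPHABET if node + b in kmers]
        if len(nexts) != 1:
            return seq  # dead end or fork
        nxt = node[1:] + nexts[0]
        preds = [a for a in ALPHABET if a + nxt in kmers]
        if len(preds) != 1 or nxt in seen:
            return seq  # merge point or cycle
        seen.add(nxt)
        seq += nexts[0]


def extend_left(seq: str, kmers: set[str], k: int) -> str:
    """Mirror image of extend_right: prepend bases while the extension is unambiguous."""
    seen = set()
    while True:
        node = seq[:k - 1]
        prevs = [b for b in ALPHABET if b + node in kmers]
        if len(prevs) != 1:
            return seq
        prv = prevs[0] + node[:-1]
        succs = [c for c in ALPHABET if prv + c in kmers]
        if len(succs) != 1 or prv in seen:
            return seq
        seen.add(prv)
        seq = prevs[0] + seq


def super_reads(reads: list[str], k: int) -> dict[str, str]:
    """Map each read to its maximal unambiguous extension."""
    kmers = kmer_set(reads, k)
    return {r: extend_left(extend_right(r, kmers, k), kmers, k) for r in reads}


reads = ["ATTAGACC", "GACCTGCC", "TGCCGGAA", "CGGAATAC"]
print(set(super_reads(reads, 5).values()))  # {'ATTAGACCTGCCGGAATAC'}
```

Here all four reads extend to the same string, so four short reads are replaced by one longer sequence; at a fork or merge point the extension stops instead of guessing.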

Main Results:

MaSuRCA demonstrates competitive or superior performance compared to Allpaths-LG and SOAPdenovo2 on bacterial and mouse genome datasets.
The assembler effectively handles mixtures of Illumina reads with longer 454 and Sanger reads.
Augmenting data with long reads significantly improves MaSuRCA's assembly quality.

Conclusions:

MaSuRCA offers an efficient and flexible solution for genome assembly using diverse sequencing data.
The super-read approach effectively addresses challenges posed by variable read lengths and sequencing errors.
MaSuRCA represents a significant advancement in handling mixed sequencing technologies for high-quality genome reconstruction.