Related Concept Videos

Genome Annotation and Assembly

The genome refers to all of the genetic material in an organism. Genome size ranges from a few million base pairs in microbial cells to several billion base pairs in many eukaryotic organisms. Genome assembly is the process of taking DNA sequencing data and putting the fragments back together in the correct order to create a close representation of the original genome. Assembly is followed by the identification of functional elements on the newly assembled genome, a process called genome annotation.
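
As a rough illustration of the assembly step, the sketch below uses a greedy overlap strategy: it repeatedly merges the two fragments whose end and start overlap the most. It is a teaching toy that assumes short, error-free reads from a single strand; production assemblers must cope with sequencing errors, repeats, and both DNA strands, typically with overlap-layout-consensus or de Bruijn graph approaches. The function names `overlap` and `greedy_assemble` and the example reads are hypothetical.

```python
# Toy greedy assembler (hypothetical helper names); assumes error-free reads.
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Returns 0 if no such overlap of at least `min_len` bases exists.
    """
    start = 0
    while True:
        # Candidate positions are where b's first min_len bases occur in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Scanning left to right, the first hit that works is the longest overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Reads fully contained in another read add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no overlaps left: remaining pieces are separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is easy to follow but can misassemble repeated regions, which is one reason real assemblers rely on graph-based methods.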

Base-pairing and DNA Repair

Nucleic Acids and Nucleotides

Nucleic acids are the most important macromolecules for the continuity of life. They carry the cell's genetic blueprint and contain the instructions for its functioning. The two main types of nucleic acids are deoxyribonucleic acid (DNA) and ribonucleic acid (RNA).
Deoxyribonucleic Acid (DNA)
DNA is the genetic material in all living organisms, ranging from single-celled bacteria to multicellular mammals. It is found in the nucleus of eukaryotic cells and in organelles such as chloroplasts and mitochondria....

Next-generation Sequencing

The first human genome sequencing project cost $2.7 billion and was declared complete in 2003, after about 13 years of international collaboration among several research teams and funding agencies. Today, with the advent of next-generation sequencing technologies, the cost and time of sequencing a human genome have dropped more than 100-fold.
Next-Generation Sequencing Methods
Although next-generation methods use different technologies, they share a set of common features....

Multi-species Conserved Sequences

Next-generation sequencing technologies have created large genomic databases for a variety of animals and plants. Since the Human Genome Project was completed, scientists have studied the genomes of primates, other mammals, and phylogenetically distant organisms. Such large-scale studies have provided new insights into the evolutionary relationships between organisms.
Although genomes vary greatly between species, a few sequences are highly conserved. Such conserved...
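
One simple way to make "highly conserved" concrete is to align the same region from several species and score each alignment column by the fraction of sequences that share its most common base. The sketch below assumes the sequences are already aligned (equal length, gaps written as `-`) and uses a plain majority fraction; the function names, the threshold, the minimum block length, and the toy sequences are hypothetical, and comparative genomics tools typically use phylogeny-aware conservation models instead.

```python
from collections import Counter


def column_conservation(aligned: list[str]) -> list[float]:
    """Per-column fraction of sequences that share the most common base.

    Gap characters ('-') are never counted as a conserved base.
    """
    if not aligned:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*aligned):
        counts = Counter(base.upper() for base in column)
        counts.pop("-", None)
        top = max(counts.values(), default=0)
        scores.append(top / len(aligned))
    return scores


def conserved_blocks(aligned: list[str], threshold: float = 1.0,
                     min_len: int = 3) -> list[tuple[int, int]]:
    """Half-open (start, end) column ranges of at least `min_len` columns
    in which every column's conservation score is >= `threshold`."""
    scores = column_conservation(aligned)
    blocks, start = [], None
    for i, score in enumerate(scores + [float("-inf")]):  # sentinel closes a final run
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            if i - start >= min_len:
                blocks.append((start, i))
            start = None
    return blocks


# Hypothetical toy alignment of one region from three species.
aln = ["ACGTTGCA", "ACGTAGCA", "ACGATGCA"]
print(conserved_blocks(aln))  # [(0, 3), (5, 8)]
```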

Nucleic Acid Structure

The pentose sugar in DNA is deoxyribose, while in RNA it is ribose. The two sugars differ at the 2′ carbon, which carries a hydroxyl group in ribose but only a hydrogen in deoxyribose. The phosphate residue is attached to the hydroxyl group on the 5′ carbon of one sugar and to the hydroxyl group on the 3′ carbon of the next nucleotide's sugar, forming a 5′ to 3′ phosphodiester linkage.
DNA Structure
DNA...

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Deep learning in tumour genomics: from multi-omics integration to precision oncology.

Open biology·2026

Same author

How can biological databases support the new UN mechanism for benefit-sharing from digital sequence information?

Scientific data·2026

Same author

StNF-YC9 and StWRKY75 synergistically regulate StPAP10b-mediated root phosphatase activity to drive soil organic phosphorus mobilization in potato.

The Plant cell·2026

Same author

Rapid phylogenomic analysis for viral surveillance and metagenomic profiling with Omni2Tree.

bioRxiv : the preprint server for biology·2026

Same author

Population-scale interpretation of RNA isoform diversity enabled by Isopedia.

bioRxiv : the preprint server for biology·2026

Same author

Scalable and comprehensive mosaic variant calling using DRAGEN.

medRxiv : the preprint server for health sciences·2026

Same journal

Real-time Targeted Enrichment in Single-cell Long-read Sequencing.

Genomics, proteomics & bioinformatics·2026

Same journal

Decoding RNA N6-Methyladenosine Methylome of Wheat Using Machine Learning and Nanopore Direct RNA Sequencing.

Genomics, proteomics & bioinformatics·2026

Same journal

Tranquillyzer: A Neural Network Framework for Long-read Annotation and Demultiplexing.

Genomics, proteomics & bioinformatics·2026

Same journal

Advancing Functional Transcriptomics in Zebrafish with High-accuracy Full-length RNA Sequencing.

Genomics, proteomics & bioinformatics·2026

Same journal

NanoRAPID: A Deep Learning-based Framework for Single-molecule RNA Structure Analysis Using Nanopore Direct RNA Sequencing.

Genomics, proteomics & bioinformatics·2026

Same journal

Single-cell Multiomic and Spatiotemporal Dissection of the Liver Circadian Clock.

Genomics, proteomics & bioinformatics·2026

Related Experiment Video

Updated: Jun 23, 2025

Determining the Likelihood of Variant Pathogenicity Using Amino Acid-level Signal-to-Noise Analysis of Genetic Variation

Published on: January 16, 2019

GenBase: A Nucleotide Sequence Database.

Congfan Bu^1,2, Xinchang Zheng^1,2,3, Xuetong Zhao^1,2

¹National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.

Genomics, Proteomics & Bioinformatics

June 24, 2024

Summary

This summary is machine-generated.

GenBase is a new open-access repository for archiving and sharing the rapidly growing volume of nucleotide sequence data. It improves access to genomic data by following INSDC standards and exchanging data efficiently with GenBank.

Keywords:

Database; GenBank; GenBase; INSDC; Nucleotide sequence

More Related Videos

Novel Sequence Discovery by Subtractive Genomics

Published on: January 25, 2019

Targeted Next-generation Sequencing and Bioinformatics Pipeline to Evaluate Genetic Determinants of Constitutional Disease

Published on: April 4, 2018

Area of Science:

Genomics
Bioinformatics
Data Management

Background:

Sequencing technologies generate massive amounts of data, creating challenges for efficient management and timely access.
Existing data repositories struggle to keep pace with the exponential growth of sequence information.

Purpose of the Study:

To introduce GenBase, an open-access data repository designed for efficient archiving, searching, and sharing of nucleotide sequences.
To address the challenges posed by the rapid advancement of sequencing technologies and the increasing volume of genomic data.

Main Methods:

GenBase adheres to International Nucleotide Sequence Database Collaboration (INSDC) data standards.
It offers bilingual submission pipelines, local submission assistance, and a unique Excel format for metadata and annotation.
A real-time data validation system streamlines sequence submissions.
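
The abstract does not describe GenBase's validation rules, so the following is only a hypothetical sketch of the kind of check a submission validator might run on nucleotide sequences. It assumes FASTA input and accepts the IUPAC DNA alphabet (A, C, G, T plus ambiguity codes); the function names, the minimum-length rule, and the duplicate-ID rule are illustrative assumptions, not GenBase's actual checks, and the "real-time" aspect is not modeled.

```python
# Hypothetical submission checks; NOT GenBase's actual validation rules.
# IUPAC nucleotide codes for DNA: A, C, G, T plus the ambiguity codes.
IUPAC_DNA = frozenset("ACGTRYSWKMBDHVN")


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def validate_fasta(text: str, min_length: int = 1) -> list[tuple[str, list[str]]]:
    """Return (sequence ID, problems) for every record; an empty list means it passed."""
    report = []
    seen: set[str] = set()
    for header, seq in parse_fasta(text):
        seq_id = header.split()[0] if header else ""
        problems = []
        if not seq_id:
            problems.append("missing sequence identifier")
        elif seq_id in seen:
            problems.append("duplicate sequence identifier")
        seen.add(seq_id)
        seq = seq.upper()  # accept lowercase bases
        if len(seq) < min_length:
            problems.append(f"sequence shorter than {min_length} bases")
        bad = sorted(set(seq) - IUPAC_DNA)
        if bad:
            problems.append("invalid characters: " + "".join(bad))
        report.append((seq_id, problems))
    return report


submission = ">seq1 test record\nACGTN\nacgt\n>seq2\nACGXZ\n>seq1\nAC\n"
for seq_id, problems in validate_fasta(submission):
    print(seq_id, problems or "OK")
```

Returning every problem per record, rather than stopping at the first error, lets a submitter fix all issues in one pass.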

Main Results:

As of April 23, 2024, GenBase houses 68,251 nucleotide and 689,574 annotated protein sequences from 414 species.
Over 90% of submitted sequences are released and publicly accessible via web, FTP, and API.
An effective data exchange mechanism with GenBank has been established, enabling sequence sharing.

Conclusions:

GenBase provides a robust solution for managing and sharing large-scale nucleotide sequence data.
It actively contributes to global genomic data management by integrating with GenBank and ensuring public accessibility.
The repository streamlines data submission and enhances the discoverability of genomic information.