A generative adversarial network for multiple reads reconstruction in DNA storage.
Xiaodong Zheng1,2, Ranze Xie1, Xiangyu Yao1
1Institution of Computational Science and Technology, Guangzhou University, Guangzhou, China.
Scientific Reports | December 31, 2024
Summary
This study introduces DNA-GAN, a novel method using generative adversarial networks to correct errors in DNA data storage reads. It effectively reconstructs sequences from noisy data, even with significant errors and contamination.
Area of Science:
- Bioinformatics
- Data Storage
- Machine Learning
Background:
- DNA data storage is a promising answer to rapidly growing data volumes.
- Current DNA synthesis, PCR, and sequencing methods introduce significant errors (insertions, deletions, substitutions), particularly with third-generation sequencing technologies.
- Existing error-correction methods often struggle with high error rates and contamination.
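To make the three error types concrete, the following is a minimal sketch of a noisy channel that turns one stored sequence into a cluster of erroneous reads. The function names (noisy_copy, make_cluster) and the per-base error probabilities are illustrative assumptions, not values or code from the study.

```python
import random

BASES = "ACGT"


def noisy_copy(seq, p_sub=0.02, p_ins=0.02, p_del=0.02, rng=None):
    """Return a copy of seq with random substitutions, insertions, and deletions.

    The probabilities are hypothetical defaults chosen for illustration only.
    """
    if rng is None:
        rng = random.Random()
    out = []
    for base in seq:
        r = rng.random()
        if r < p_del:
            # Deletion: the base is dropped from the read.
            pass
        elif r < p_del + p_sub:
            # Substitution: the base is replaced by a different one.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
        if rng.random() < p_ins:
            # Insertion: a random extra base follows this position.
            out.append(rng.choice(BASES))
    return "".join(out)


def make_cluster(seq, n_reads=10, seed=0, **rates):
    """Produce n_reads independent noisy copies of the same sequence."""
    rng = random.Random(seed)
    return [noisy_copy(seq, rng=rng, **rates) for _ in range(n_reads)]
```

Because insertions and deletions change read length, the reads in a cluster generally do not line up position by position, which is what makes reconstruction harder than simple per-position voting.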
Purpose of the Study:
- To develop a novel computational method for reconstructing accurate DNA sequences from erroneous reads in DNA data storage.
- To address the limitations of existing error-correction techniques, especially for third-generation sequencing data.
- To introduce a robust method capable of handling noisy data and irrelevant read contamination.
Main Methods:
- Transformation of multiple erroneous DNA reads into a noisy image representation.
- Construction and application of a conditional generative adversarial network (GAN) to generate a 'smooth' image representing the consensus sequence.
- Evaluation of the DNA-GAN model on two real-world datasets, including assessment of robustness against contaminated clusters.
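The first step above can be pictured with a small sketch. The summary does not specify how DNA-GAN lays reads out as an image, so the encoding here (one row per read, one integer per base, zero for padding) is an assumption. The column-wise majority vote is a naive stand-in used only to show what "consensus" means; it is not the GAN generator described in the study.

```python
BASES = "ACGT"


def reads_to_image(reads, width):
    """Encode reads as a grid with one row per read and `width` columns.

    Cell values: 0 = padding or unknown symbol, 1..4 = A, C, G, T.
    Reads are truncated or right-padded to `width`; no alignment is done.
    """
    code = {b: i + 1 for i, b in enumerate(BASES)}
    image = []
    for read in reads:
        row = [code.get(b, 0) for b in read[:width]]
        row += [0] * (width - len(row))
        image.append(row)
    return image


def column_majority(image):
    """Naive consensus: the most frequent non-padding value in each column."""
    consensus = []
    for column in zip(*image):
        counts = [0] * 5
        for value in column:
            counts[value] += 1
        best = max(range(1, 5), key=lambda v: counts[v])
        # A column made only of padding has no vote; mark it as 'N'.
        consensus.append(BASES[best - 1] if counts[best] > 0 else "N")
    return "".join(consensus)


reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTA"]
image = reads_to_image(reads, width=6)
print(column_majority(image))  # prints ACGTAC
```

A vote like this copes with substitutions but breaks down once insertions or deletions shift the columns, which is the situation a learned image-to-image model is meant to handle.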
Main Results:
- The proposed DNA-GAN model successfully reconstructed the tested sequences from reads containing up to 5.9% errors.
- Demonstrated applicability to third-generation nanopore sequencing environments, outperforming transformer-based models tested only on next-generation sequencing data.
- Exhibited excellent robustness, maintaining performance even with up to 20% of clusters contaminated with irrelevant reads.
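The two figures above can be related to simple measurements. The summary does not state how the error percentage was computed or how contaminated clusters were built, so this sketch assumes a common convention (edit distance divided by reference length) and a hypothetical contamination scheme (random reads appended to a chosen fraction of clusters). The names error_rate and contaminate are illustrative.

```python
import random


def edit_distance(a, b):
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def error_rate(reference, read):
    """Edit distance normalised by reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, read) / len(reference)


def contaminate(clusters, fraction=0.2, reads_per_cluster=1, read_len=100, seed=0):
    """Append random, unrelated reads to a chosen fraction of clusters.

    Returns the new clusters and the set of contaminated cluster indices.
    """
    rng = random.Random(seed)
    n_hit = round(fraction * len(clusters))
    hit = set(rng.sample(range(len(clusters)), n_hit))
    result = []
    for i, cluster in enumerate(clusters):
        extra = []
        if i in hit:
            extra = ["".join(rng.choice("ACGT") for _ in range(read_len))
                     for _ in range(reads_per_cluster)]
        result.append(list(cluster) + extra)
    return result, hit
```

For example, a 10-base reference with a single substituted base gives an error rate of 0.1 under this definition, and calling contaminate on 10 clusters with fraction=0.2 adds irrelevant reads to exactly 2 of them.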
Conclusions:
- DNA-GAN represents a pioneering application of GANs for multi-read reconstruction in DNA-based storage systems.
- The method provides a viable solution for accurate sequence reconstruction in the presence of high error rates and data contamination.
- This approach holds significant potential for enhancing the reliability and practicality of DNA data storage, particularly with advanced sequencing technologies.

