GradHC: highly reliable gradual hash-based clustering for DNA storage systems
Summary
This summary is machine-generated. Synthetic DNA offers high-density data storage. A new algorithm, Gradual Hash-based clustering (GradHC), accurately groups DNA reads, improving storage system reliability and robustness against errors.
Area Of Science
- Biotechnology
- Bioinformatics
- Data Science
Background
- Growing data storage demands necessitate novel solutions beyond traditional methods.
- Synthetic DNA offers exceptional density and durability for long-term data archiving.
- Current DNA data storage faces challenges with read errors and accurate data reconstruction.
Purpose Of The Study
- To address the critical task of clustering DNA reads in DNA storage systems.
- To introduce and evaluate a novel clustering algorithm for DNA data storage.
Main Methods
- Reviewed existing methods for evaluating clustering algorithms.
- Developed and implemented Gradual Hash-based clustering (GradHC).
- Benchmarked GradHC against other clustering algorithms for DNA storage.
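To make the idea of hash-based clustering concrete, the following is a minimal sketch and not the GradHC algorithm itself, whose details this summary does not give. It assumes a min-hash signature over q-grams and merges reads whose signatures collide, repeating over several rounds with fresh hash seeds. The names `qgrams`, `signature`, `hash_cluster` and the parameters `rounds`, `hashes_per_round` and `q` are all hypothetical choices made for illustration.

```python
import zlib
from collections import defaultdict


def qgrams(read, q=4):
    """Return the set of all length-q substrings of a read."""
    return {read[i:i + q] for i in range(len(read) - q + 1)}


def signature(read, seeds, q=4):
    """Min-hash signature: one minimum CRC32 value per seed (illustrative scheme)."""
    grams = qgrams(read, q) or {read}  # fall back to the whole read if it is shorter than q
    return tuple(
        min(zlib.crc32(f"{seed}:{gram}".encode()) for gram in grams)
        for seed in seeds
    )


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def hash_cluster(reads, rounds=8, hashes_per_round=2, q=4):
    """Merge reads that share a signature in any round (assumed scheme, not GradHC)."""
    parent = list(range(len(reads)))
    for r in range(rounds):
        # Each round uses its own seeds, so two similar reads get several chances to collide.
        seeds = [r * hashes_per_round + k for k in range(hashes_per_round)]
        buckets = defaultdict(list)
        for idx, read in enumerate(reads):
            buckets[signature(read, seeds, q)].append(idx)
        for members in buckets.values():
            root = find(parent, members[0])
            for other in members[1:]:
                parent[find(parent, other)] = root
    clusters = defaultdict(list)
    for idx in range(len(reads)):
        clusters[find(parent, idx)].append(idx)
    return sorted(clusters.values())
```

Calling `hash_cluster(reads)` on a list of read strings returns a list of clusters, each a list of read indices. In this sketch, more hashes per round make merges stricter, while more rounds give noisy copies of the same strand more opportunities to meet; a practical system would tune both against the expected error rate.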
Main Results
- GradHC demonstrates high accuracy in clustering diverse DNA designs, including varying strand lengths and cluster sizes.
- The algorithm effectively handles different error ranges inherent in DNA sequencing.
- Benchmark analysis shows GradHC is more stable and robust than prior algorithms.
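Comparing clustering algorithms requires a score. The summary does not say which measure the study used, so the sketch below is only one plausible choice: it counts a true cluster as recovered when some output cluster contains no foreign reads and covers at least a fraction `gamma` of the true cluster. The function name `gamma_accuracy` and the default `gamma=0.9` are assumptions made for illustration.

```python
def gamma_accuracy(true_clusters, found_clusters, gamma=0.9):
    """Fraction of true clusters recovered by a pure, sufficiently large output cluster.

    Illustrative measure only; not necessarily the one used in the study.
    Clusters are given as iterables of read indices.
    """
    found_sets = [set(c) for c in found_clusters]
    recovered = 0
    for true in map(set, true_clusters):
        for found in found_sets:
            # Pure (no reads from other strands) and large enough relative to the truth.
            if found <= true and len(found) >= gamma * len(true):
                recovered += 1
                break
    return recovered / len(true_clusters)
```

A measure of this kind penalises both over-merging (an output cluster mixing reads from different strands is never counted) and over-splitting (fragments smaller than `gamma` of the true cluster are not counted either).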
Conclusions
- GradHC provides a reliable and robust solution for clustering DNA reads in DNA storage systems.
- The algorithm's performance across various conditions makes it suitable for practical DNA data storage applications.

