Efficient DNA-based data storage using shortmer combinatorial encoding
Summary
This summary is machine-generated. This study introduces combinatorial DNA encoding for data storage, achieving up to a 6.5-fold increase in logical density with near-zero reconstruction error. The approach improves both the efficiency of DNA data storage and its robustness against errors.
Area Of Science
- Biotechnology
- Bioinformatics
- Information Science
Background
- DNA data storage offers a promising archival solution due to its high density and longevity.
- Leveraging composite DNA alphabets can increase storage capacity, but noisy inference poses a significant challenge.
- Existing methods struggle with large composite alphabets, limiting practical applications.
Purpose Of The Study
- To introduce a novel combinatorial DNA encoding approach for data storage.
- To enhance logical density and minimize reconstruction errors in DNA-based storage systems.
- To investigate the theoretical properties and practical implementation of combinatorial DNA encoding.
Main Methods
- Developed and defined combinatorial DNA encoding schemes using distinguishable DNA shortmers.
- Investigated theoretical properties including information density and reconstruction probabilities.
- Proposed an end-to-end system design with 2D error correction codes and reconstruction algorithms.
- Validated the approach through simulations and experimental construction using Gibson assembly.
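To make the information-density idea concrete, here is a minimal sketch. It assumes each combinatorial letter is a subset of K shortmers chosen from a fixed set of N distinguishable shortmers, so one position can take C(N, K) values. The function names, the values N = 16 and K = 5, and the 2-bit baseline for a single standard nucleotide are illustrative assumptions, not figures taken from the study.

```python
from math import comb, log2


def alphabet_size(n_shortmers: int, k_per_letter: int) -> int:
    """Number of distinct letters when a letter is a K-subset of N shortmers."""
    return comb(n_shortmers, k_per_letter)


def bits_per_letter(n_shortmers: int, k_per_letter: int) -> float:
    """Bits carried by one combinatorial position (log2 of the alphabet size)."""
    return log2(alphabet_size(n_shortmers, k_per_letter))


def fold_increase(n_shortmers: int, k_per_letter: int,
                  baseline_bits: float = 2.0) -> float:
    """Ratio to a baseline position; 2 bits assumes one standard A/C/G/T base."""
    return bits_per_letter(n_shortmers, k_per_letter) / baseline_bits


if __name__ == "__main__":
    n, k = 16, 5  # hypothetical parameters, chosen only for illustration
    print(alphabet_size(n, k))
    print(round(bits_per_letter(n, k), 2))
    print(round(fold_increase(n, k), 2))
```

Under these assumptions, increasing N or moving K toward N/2 enlarges the alphabet, but it also makes each letter harder to infer from noisy reads, which is the trade-off the study's error correction is meant to address.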
Main Results
- Achieved up to a 6.5-fold increase in logical density compared to standard DNA storage.
- Demonstrated near-zero reconstruction error with the proposed combinatorial encoding and error correction.
- Successfully reconstructed combinatorial sequences, confirming robustness against various error types.
- Simulations showed significant improvement in reconstruction rates using 2D Reed-Solomon error correction.
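As a rough illustration of what reconstructing a combinatorial letter involves, the sketch below infers one letter from noisy sequencing reads by keeping the K most frequently observed shortmers at a position. This top-K counting rule is a simplifying assumption and is not presented as the study's reconstruction algorithm; the function name and the example shortmers are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable


def infer_letter(observed_shortmers: Iterable[str], k: int) -> FrozenSet[str]:
    """Guess a combinatorial letter as the K most frequently observed shortmers.

    observed_shortmers: the shortmer seen at one position in each read.
    k: number of shortmers that make up one letter (assumed known).
    """
    counts = Counter(observed_shortmers)
    # most_common(k) returns the k (shortmer, count) pairs with the highest counts.
    return frozenset(shortmer for shortmer, _ in counts.most_common(k))


if __name__ == "__main__":
    # Hypothetical reads: three true shortmers plus one rare erroneous read.
    reads = ["AAC"] * 5 + ["CGT"] * 4 + ["TTG"] * 4 + ["GGA"]
    print(sorted(infer_letter(reads, k=3)))
```

With few reads per position, a true shortmer can be missed or an erroneous one can outrank it, which is why an outer error-correcting code over the inferred letters is useful.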
Conclusions
- Combinatorial shortmer encoding shows significant potential for efficient and error-resilient DNA-based data storage.
- Further development of DNA synthesis technologies that support combinatorial synthesis is crucial.
- Combining combinatorial principles with advanced error-correcting strategies can lead to robust DNA storage solutions.

