DNA Punch Cards: Storing Data on Native DNA Sequences via Nicking

2019
Author(s):
S Kasra Tabatabaei
Boya Wang
Nagendra Bala Murali Athreya
Behnam Enghiad
Alvaro Gonzalo Hernandez
...

Abstract: Synthetic DNA-based data storage systems have received significant attention due to the promise of ultrahigh storage density and long-term stability. However, all platforms proposed so far suffer from high cost, read-write latency and error rates that render them noncompetitive with modern optical and magnetic storage devices. One means to avoid synthesizing DNA and to reduce the system error rates is to use readily available native DNA. As the symbol/nucleotide content of native DNA is fixed, one may adopt an alternative recording strategy that modifies the DNA topology to encode desired information. Here, we report the first macromolecular storage paradigm in which data is written in the form of “nicks (punches)” at predetermined positions on the sugar-phosphate backbone of native dsDNA. The platform accommodates parallel nicking on multiple “orthogonal” genomic DNA fragments and paired nicking and disassociation for creating “toehold” regions that enable single-bit random access and strand displacement in-memory computations. As a proof of concept, we used the programmable restriction enzyme Pyrococcus furiosus Argonaute to punch two files into the PCR products of Escherichia coli genomic DNA. The encoded data is accurately reconstructed through high-throughput sequencing and read alignment.
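The recording idea described in this abstract, bits stored as the presence or absence of a nick at predetermined backbone positions, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the convention that a 1-bit means "nick" and a 0-bit means "no nick", the function names, and the example positions are all assumptions made for illustration.

```python
from typing import List, Set


def write_nicks(bits: List[int], positions: List[int]) -> Set[int]:
    """Return the set of positions to nick: one nick per 1-bit.

    `positions` are hypothetical predetermined nicking sites on a
    native dsDNA fragment; bit i is recorded at positions[i].
    """
    if len(bits) != len(positions):
        raise ValueError("need exactly one nicking site per bit")
    return {pos for bit, pos in zip(bits, positions) if bit == 1}


def read_nicks(nicked: Set[int], positions: List[int]) -> List[int]:
    """Recover the bits by checking each predetermined site for a nick."""
    return [1 if pos in nicked else 0 for pos in positions]


# Hypothetical nicking sites (base-pair offsets) on one fragment.
sites = [25, 50, 75, 100, 125, 150, 175, 200]
message = [1, 0, 1, 1, 0, 0, 1, 0]

nicked_sites = write_nicks(message, sites)
recovered = read_nicks(nicked_sites, sites)
```

In the reported system the nick positions are inferred from sequencing reads and alignment; here that step is replaced by a direct set lookup.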

2019
Author(s):
Lee Organick
Yuan-Jyue Chen
Siena Dumas Ang
Randolph Lopez
Karin Strauss
...

Abstract: Synthetic DNA has been gaining momentum as a potential storage medium for archival data storage [1–9]. Digital information is translated into sequences of nucleotides and the resulting synthetic DNA strands are then stored for later individual file retrieval via PCR [7–9] (Fig. 1a). Using a previously presented encoding scheme [9] and new experiments, we demonstrate reliable file recovery when as few as 10 copies per sequence are stored, on average. This results in density of about 17 exabytes/g, nearly two orders of magnitude greater than prior work has shown [6]. Further, no prior work has experimentally demonstrated access to specific files in a pool more complex than approximately 10^6 unique DNA sequences [9], leaving the issue of accurate file retrieval at high data density and complexity unexamined. Here, we demonstrate successful PCR random access using three files of varying sizes in a complex pool of over 10^10 unique sequences, with no evidence that we have begun to approach complexity limits. We further investigate the role of file size on successful data recovery, the effect of increasing sequencing coverage to aid file recovery, and whether DNA strands drop out of solution in a systematic manner. These findings substantiate the robustness of PCR as a random access mechanism in complex settings, and that the number of copies needed for data retrieval does not compromise density significantly.


2017
Author(s):
Lee Organick
Siena Dumas Ang
Yuan-Jyue Chen
Randolph Lopez
Sergey Yekhanin
...

Current storage technologies can no longer keep pace with exponentially growing amounts of data [1]. Synthetic DNA offers an attractive alternative due to its potential information density of ~10^18 B/mm^3, 10^7 times denser than magnetic tape, and potential durability of thousands of years [2]. Recent advances in DNA data storage have highlighted technical challenges, in particular, coding and random access, but have stored only modest amounts of data in synthetic DNA [3,4,5]. This paper demonstrates an end-to-end approach toward the viability of DNA data storage with large-scale random access. We encoded and stored 35 distinct files, totaling 200MB of data, in more than 13 million DNA oligonucleotides (about 2 billion nucleotides in total) and fully recovered the data with no bit errors, representing an advance of almost an order of magnitude compared to prior work [6]. Our data curation focused on technologically advanced data types and historical relevance, including the Universal Declaration of Human Rights in over 100 languages [7], a high-definition music video of the band OK Go [8], and a CropTrust database of the seeds stored in the Svalbard Global Seed Vault [9]. We developed a random access methodology based on selective amplification, for which we designed and validated a large library of primers, and successfully retrieved arbitrarily chosen items from a subset of our pool containing 10.3 million DNA sequences. Moreover, we developed a novel coding scheme that dramatically reduces the physical redundancy (sequencing read coverage) required for error-free decoding to a median of 5x, while maintaining levels of logical redundancy comparable to the best prior codes. We further stress-tested our coding approach by successfully decoding a file using the more error-prone nanopore-based sequencing.
We provide a detailed analysis of errors in the process of writing, storing, and reading data from synthetic DNA at a large scale, which helps characterize DNA as a storage medium and justify our coding approach. Thus, we have demonstrated a significant improvement in data volume, random access, and encoding/decoding schemes that contribute to a whole-system vision for DNA data storage.
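The figures quoted in this abstract allow a rough back-of-the-envelope estimate of the net information rate. The short Python sketch below does that arithmetic only; it assumes 1 MB = 10^6 bytes and takes the rounded counts ("200MB", "about 2 billion nucleotides", "more than 13 million" oligonucleotides) at face value, so the results are approximate and are not values reported by the authors.

```python
# Rounded figures taken from the abstract (approximate by construction).
total_bytes = 200 * 10**6        # 200 MB, assuming 1 MB = 10^6 bytes
total_nucleotides = 2 * 10**9    # "about 2 billion nucleotides"
total_oligos = 13 * 10**6        # "more than 13 million" oligonucleotides

# Net payload bits carried per synthesized nucleotide.
bits_per_nucleotide = total_bytes * 8 / total_nucleotides

# Average oligonucleotide length implied by the two counts.
nucleotides_per_oligo = total_nucleotides / total_oligos

print(f"{bits_per_nucleotide:.2f} bits per nucleotide")
print(f"{nucleotides_per_oligo:.0f} nucleotides per oligonucleotide")
```

Under these assumptions the net rate comes out below the 2 bits per nucleotide that four bases could carry in principle, which is consistent with the abstract's mention of logical redundancy and primer regions for random access.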


2019
Vol 15 (01)
pp. 1-8
Author(s):
Ashish C Patel
C G Joshi

Current data storage technologies can no longer keep pace with the exponentially growing amounts of data generated by the extensive use of social networking, photos, and other media. The "digital world", which held 4.4 zettabytes in 2013, was predicted to reach 44 zettabytes by 2020. For the past 30 years, scientists and researchers have been trying to develop a robust way of storing data on a medium that is dense and long-lasting, and have found DNA to be the most promising storage medium. Unlike existing storage devices, DNA requires no maintenance other than storage in a cool, dark place. DNA is small and has a high density; just 1 gram of dry DNA can store about 455 exabytes of data. DNA stores information using four bases, viz., A, T, G, and C, while CDs, hard disks and other devices store information using 0s and 1s on spiral tracks. In DNA-based storage, the digital file is first converted into binary code; encoding and decoding are then the key steps of the storage system. Once the digital file is encoded, the next step is to synthesize arbitrary single-stranded DNA sequences, which can be stored in a deep freezer until use. When the information needs to be recovered, this can be done using DNA sequencing. Next-generation sequencing (NGS) is capable of producing sequences with very high throughput at a much lower cost (less than 0.1 USD per MB of data) than the first sequencing technologies. Post-sequencing processing includes alignment of all reads using multiple sequence alignment (MSA) algorithms to obtain the different consensus sequences. Each consensus sequence is decoded by reversing the encoding process. Most prior DNA data storage efforts sequenced and decoded the entire amount of stored digital information with no random access, but it has now become possible to extract selected files (e.g., retrieving only a required image from a collection) from a DNA pool using PCR-based random access.
Various scientists have successfully stored up to 110 zettabytes of data in one gram of DNA. In the future, with efficient encoding, error correction, and cheaper DNA synthesis and sequencing, DNA-based storage will become a practical solution for storing exponentially growing digital data.
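The pipeline outlined in this abstract (binary data, encoding into the four bases, sequencing reads, consensus, decoding as the reverse of encoding) can be sketched in a few lines of Python. The sketch makes several assumptions that are not in the source: a naive 2-bits-per-base mapping (00→A, 01→C, 10→G, 11→T), reads that are already aligned and of equal length, and a simple per-position majority vote standing in for the multiple sequence alignment step. Practical schemes also add error-correcting codes and avoid long homopolymer runs, which this sketch ignores.

```python
from collections import Counter
from typing import List

BASES = "ACGT"  # assumed mapping: index 0..3 <-> bit pairs 00, 01, 10, 11


def encode(data: bytes) -> str:
    """Map each byte to four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def decode(sequence: str) -> bytes:
    """Reverse of encode: every four bases become one byte."""
    if len(sequence) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(sequence), 4):
        byte = 0
        for base in sequence[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def consensus(reads: List[str]) -> str:
    """Per-position majority vote over equal-length, pre-aligned reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


strand = encode(b"Hi")
# Three simulated reads of the same strand; the last has one substitution.
reads = [strand, strand, "CTGACGGC"]
recovered = decode(consensus(reads))
```

With three reads and a single substitution error, the majority vote restores the original strand before decoding; insertions and deletions would require a real alignment step.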


2017
Vol 8 (2)
Author(s):
Slobodan Obradović
Borivoje Milošević
Nikola Davidović

For decades, the memory hierarchy has been determined by the latency, bandwidth, and cost of processors, random access memory (RAM), and secondary memory. Although the gap between the processor and RAM has been narrowed by fast cache memory, the gap between RAM and secondary memory has remained challenging, expanding to 12 size range in 2015 and continuing to expand by around 50% per year. The rapid development of nanotechnology has triggered a new field in the organization of memory space. For more than a decade, ferroelectric random access memory (FRAM) has been in use; it keeps data in the form of the polarization of a ferroelectric crystal, which is retained after the power is turned off. The real revolution is expected from magnetoresistive random access memory (MRAM), a data storage technology that uses magnetic moments rather than electric charges. Unlike conventional RAM chips, MRAM stores data not as an electrical charge but in magnetic storage elements. The advantage of this memory is energy independence (non-volatility), that is, recorded data is retained in the absence of a power supply. MRAM has properties similar to SRAM and a recording density similar to dynamic RAM (DRAM), with much lower power consumption; compared with flash, it is much faster and its performance does not degrade over time. Theoretically, there is no limit to the number of read and write cycles, so these new memories could last indefinitely. The paper discusses this new type of memory organization.


2020
Author(s):
Callista Bee
Yuan-Jyue Chen
David Ward
Xiaomeng Liu
Georg Seelig
...

Abstract: Synthetic DNA has the potential to store the world’s continuously growing amount of data in an extremely dense and durable medium. Current proposals for DNA-based digital storage systems include the ability to retrieve individual files by their unique identifier, but not by their content. Here, we demonstrate content-based retrieval from a DNA database by learning a mapping from images to DNA sequences such that an encoded query image will retrieve visually similar images from the database via DNA hybridization. We encoded and synthesized a database of 1.6 million images and queried it with a variety of images, showing that each query retrieves a sample of the database in which visually similar images appear at a rate much greater than chance. We compare our results with several algorithms for similarity search in electronic systems, and demonstrate that our molecular approach is competitive with state-of-the-art electronics.
One Sentence Summary: Learned encodings enable content-based image similarity search from a database of 1.6 million images encoded in synthetic DNA.


Author(s):
Yanmin Gao
Xin Chen
Jianye Hao
Chengwei Zhang
Hongyan Qiao
...

Abstract: In DNA data storage, the massive sequence complexity creates challenges in repeatable and efficient information readout. Here, our study clearly demonstrates that canonical polymerase chain reaction (PCR) creates significant DNA amplification biases, which greatly hinder fast and stable data retrieval from hundred-thousand synthetic DNA sequences encoding over 2.85 megabytes (MB) of digital data. To mitigate the amplification bias, we adapted an isothermal DNA amplification method for low-bias amplification of DNA pools with massive sequence complexity, and named the new method isothermal DNA reading (iDR). By using iDR, we were able to robustly and repeatedly retrieve the data stored in DNA strands attached to magnetic beads (MB) with significantly fewer sequencing reads than the PCR method required. Therefore, we believe that the low-bias iDR method provides an ideal platform for robust DNA data storage and for fast and reliable data readout.


2016
Author(s):
S. M. Hossein Tabatabaei Yazdi
Ryan Gabrys
Olgica Milenkovic

Abstract: DNA-based data storage is an emerging nonvolatile memory technology of potentially unprecedented density, durability, and replication efficiency [1–6]. The basic system implementation steps include synthesizing DNA strings that contain user information and subsequently reading them via high-throughput sequencing technologies. All existing architectures enable reading and writing, while some also allow for editing [3] and elementary sequencing error correction [3,4]. However, none of the current architectures offers error-free and random-access readouts from a portable device. Here we show through experimental and theoretical verification that such a platform may be easily implemented in practice using MinION sequencers. The gist of the approach is to design an integrated pipeline that encodes data to avoid synthesis and sequencing errors, enables random access through addressing, and leverages efficient portable nanopore sequencing via new anchored iterative alignment and insertion/deletion error-correcting codes. Our work represents the only known random access DNA-based data storage system that uses error-prone MinION sequencers and produces error-free readouts with the highest reported information rate and density.


2020
Author(s):
Lee Organick
Bichlien H. Nguyen
Rachel McAmis
Weida D. Chen
A. Xavier Kohll
...

Abstract: Synthetic DNA has recently risen as a viable alternative for long-term digital data storage. To ensure that information is safely recovered after storage, it is essential to appropriately preserve the physical DNA molecules encoding the data. While preservation of biological DNA has been studied previously, synthetic DNA differs in that it is typically much shorter in length, it has different sequence profiles with fewer, if any, repeats (or homopolymers), and it has different contaminants. In this paper we evaluate nine different methods used to preserve data files encoded in synthetic DNA by accelerated aging of nearly 29,000 DNA sequences. In addition to a molecular count comparison, we also sequence and analyze the DNA after aging. Our findings show that errors and erasures are stochastic and show no practical distribution difference between preservation methods. Finally, we compare the physical density of these methods and discuss the trade-offs between stability and density.


2017
Vol MCSP2017 (01)
pp. 7-10
Author(s):
Subhashree Rath
Siba Kumar Panda

Static random access memory (SRAM) is an important component of the embedded cache memory of handheld digital devices. SRAM has become a major data storage device due to its high storage density and short access time. The exponential growth of low-power digital devices has raised the demand for low-voltage, low-power SRAM. This paper presents the design and implementation of a 6T SRAM cell in 180 nm, 90 nm and 45 nm standard CMOS process technologies. The simulation was done in the Cadence Virtuoso environment. The performance of the SRAM cell is evaluated in terms of delay, power and static noise margin (SNM).


Author(s):
Kuldeepsingh A. Kalariya
Ram Prasnna Meena
Lipi Poojara
Deepa Shahi
Sandip Patel

Abstract
Background: Squalene synthase (SQS) is a rate-limiting enzyme necessary to produce pentacyclic triterpenes in plants. It is an important enzyme producing squalene molecules required to run steroidal and triterpenoid biosynthesis pathways working in competitive inhibition mode. Reports are available on the SQS gene in several plants, but detailed information on the SQS gene in Gymnema sylvestre R. Br. is not available. G. sylvestre is a priceless rare vine of the central eco-region known for its medicinally important triterpenoids. Our work aims to characterize the GS-SQS gene in this high-value medicinal plant.
Results: A coding DNA sequence (CDS) of 1245 bp representing the GS-SQS gene, predicted from transcriptome data in G. sylvestre, was used for further characterization. The SWISS protein structure modeled for the GS-SQS amino acid sequence had a MolProbity score of 1.44 and a clash score of 3.86. The quality estimates and statistical scores of the Ramachandran plot analysis indicated that the homology model was reliable. For full-length amplification of the gene, primers designed from the flanking regions of the CDS encoding GS-SQS were used with genomic DNA as template, which resulted in a single-band product of approximately 6.2 kb. Sequencing of this product through NGS generated 2.32 Gb of data and 3347 scaffolds with an N50 value of 457 bp. These scaffolds were compared to identify similarity with other SQS genes as well as the GS-SQSs of the transcriptome. Scaffold_3347, representing the GS-SQS gene, harbored two introns of 101 and 164 bp. Both intronic regions were validated with primers designed from the regions adjoining the introns on the scaffold representing the GS-SQS gene. Amplification succeeded when the template was genomic DNA and failed when the template was cDNA, confirming the presence of two introns in the GS-SQS gene in Gymnema sylvestre R. Br.
Conclusion: This study shows that the GS-SQS gene is very closely related to those of Coffea arabica and Gardenia jasminoides and that it harbors two introns of 101 and 164 bp.

