scholarly journals Enzymatic DNA synthesis for digital information storage

2018 ◽  
Author(s):  
Henry H. Lee ◽  
Reza Kalhor ◽  
Naveen Goela ◽  
Jean Bolot ◽  
George M. Church

AbstractDNA is an emerging storage medium for digital data but its adoption is hampered by limitations of phosphoramidite chemistry, which was developed for single-base accuracy required for biological functionality. Here, we establish a de novo enzymatic DNA synthesis strategy designed from the bottom-up for information storage. We harness a template-independent DNA polymerase for controlled synthesis of sequences with user-defined information content. We demonstrate retrieval of 144-bits, including addressing, from perfectly synthesized DNA strands using batch-processed Illumina and real-time Oxford Nanopore sequencing. We then develop a codec for data retrieval from populations of diverse but imperfectly synthesized DNA strands, each with a ~30% error tolerance. With this codec, we experimentally validate a kilobyte-scale design which stores 1 bit per nucleotide. Simulations of the codec support reliable and robust storage of information for large-scale systems. This work paves the way for alternative synthesis and sequencing strategies to advance information storage in DNA.

2019 ◽  
Vol 15 (01) ◽  
pp. 1-8
Author(s):  
Ashish C Patel ◽  
C G Joshi

Current data storage technologies cannot keep pace longer with exponentially growing amounts of data through the extensive use of social networking photos and media, etc. The "digital world” with 4.4 zettabytes in 2013 has predicted it to reach 44 zettabytes by 2020. From the past 30 years, scientists and researchers have been trying to develop a robust way of storing data on a medium which is dense and ever-lasting and found DNA as the most promising storage medium. Unlike existing storage devices, DNA requires no maintenance, except the need to store at a cool and dark place. DNA has a small size with high density; just 1 gram of dry DNA can store about 455 exabytes of data. DNA stores the informations using four bases, viz., A, T, G, and C, while CDs, hard disks and other devices stores the information using 0’s and 1’s on the spiral tracks. In the DNA based storage, after binarization of digital file into the binary codes, encoding and decoding are important steps in DNA based storage system. Once the digital file is encoded, the next step is to synthesize arbitrary single-strand DNA sequences and that can be stored in the deep freeze until use.When there is a need for information to be recovered, it can be done using DNA sequencing. New generation sequencing (NGS) capable of producing sequences with very high throughput at a much lower cost about less than 0.1 USD for one MB of data than the first sequencing technologies. Post-sequencing processing includes alignment of all reads using multiple sequence alignment (MSA) algorithms to obtain different consensus sequences. The consensus sequence is decoded as the reversal of the encoding process. Most prior DNA data storage efforts sequenced and decoded the entire amount of stored digital information with no random access, but nowadays it has become possible to extract selective files (e.g., retrieving only required image from a collection) from a DNA pool using PCR-based random access. Various scientists successfully stored up to 110 zettabytes data in one gram of DNA. In the future, with an efficient encoding, error corrections, cheaper DNA synthesis,and sequencing, DNA based storage will become a practical solution for storage of exponentially growing digital data.


2021 ◽  
Vol 23 (4) ◽  
pp. 796-815
Author(s):  
Yang Wang ◽  
Sun Sun Lim

People are today located in media ecosystems in which a variety of ICT devices and platforms coexist and complement each other to fulfil users’ heterogeneous requirements. These multi-media affordances promote a highly hyperlinked and nomadic habit of digital data management which blurs the long-standing boundaries between information storage, sharing and exchange. Specifically, during the pervasive sharing and browsing of fragmentary digital information (e.g. photos, videos, online diaries, news articles) across various platforms, life experiences and knowledge involved are meanwhile classified and stored for future retrieval and collective memory construction. For international migrants who straddle different geographical and cultural contexts, management of various digital materials is particularly complicated as they have to be familiar with and appropriately navigate technological infrastructures of both home and host countries. Drawing on ethnographic observations of 40 Chinese migrant mothers in Singapore, this article delves into their quotidian routines of acquiring, storing, sharing and exchanging digital information across a range of ICT devices and platforms, as well as cultural and emotional implications of these mediated behaviours for their everyday life experiences. A multi-layer and multi-sited repertoire of ‘life archiving’ was identified among these migrant mothers in which they leave footprints of everyday life through a tactical combination of interactive sharing, pervasive tagging and backup storage of diverse digital content.


2020 ◽  
Vol 117 (31) ◽  
pp. 18489-18496 ◽  
Author(s):  
William H. Press ◽  
John A. Hawkins ◽  
Stephen K. Jones ◽  
Jeffrey M. Schaub ◽  
Ilya J. Finkelstein

Synthetic DNA is rapidly emerging as a durable, high-density information storage platform. A major challenge for DNA-based information encoding strategies is the high rate of errors that arise during DNA synthesis and sequencing. Here, we describe the HEDGES (Hash Encoded, Decoded by Greedy Exhaustive Search) error-correcting code that repairs all three basic types of DNA errors: insertions, deletions, and substitutions. HEDGES also converts unresolved or compound errors into substitutions, restoring synchronization for correction via a standard Reed–Solomon outer code that is interleaved across strands. Moreover, HEDGES can incorporate a broad class of user-defined sequence constraints, such as avoiding excess repeats, or too high or too low windowed guanine–cytosine (GC) content. We test our code both via in silico simulations and with synthesized DNA. From its measured performance, we develop a statistical model applicable to much larger datasets. Predicted performance indicates the possibility of error-free recovery of petabyte- and exabyte-scale data from DNA degraded with as much as 10% errors. As the cost of DNA synthesis and sequencing continues to drop, we anticipate that HEDGES will find applications in large-scale error-free information encoding.


2020 ◽  
Author(s):  
Min Hao ◽  
Hongyan Qiao ◽  
Yanmin Gao ◽  
Zhaoguan Wang ◽  
Xin Qiao ◽  
...  

AbstractDNA emerged as novel material for mass data storage, the serious problem human society is facing. Taking advantage of current synthesis capacity, massive oligo pool demonstrated its high-potential in data storage in test tube. Herein, mixed culture of bacterial cells carrying mass oligo pool that was assembled in a high copy plasmid was presented as a stable material for large scale data storage. Living cells data storage was fabricated by a multiple-steps process, assembly, transformation and mixed culture. The underlying principle was explored by deep bioinformatic analysis. Although homology assembly showed sequence context dependent bias but the massive digital information oligos in mixed culture were constant over multiple successive passaging. In pushing the limitation, over ten thousand distinct oligos, totally 2304 Kbps encoding 445 KB digital data including texts and images, were stored in bacterial cell, the largest archival data storage in living cell reported so far. The mixed culture of living cell data storage opens up a new approach to simply bridge the in vitro and in vivo storage system with combined advantage of both storage capability and economical information propagation.


2014 ◽  
Vol 11 (5) ◽  
pp. 499-507 ◽  
Author(s):  
Sriram Kosuri ◽  
George M Church
Keyword(s):  

2020 ◽  
Author(s):  
Zhi Ping ◽  
Haoling Zhang ◽  
Shihong Chen ◽  
Qianlong Zhuang ◽  
Sha Joe Zhu ◽  
...  

AbstractChamaeleo is currently the only collection library that focuses on adapting multiple well-established coding schemes for DNA storage. It provides a tool for researchers to study various coding schemes and apply them in practice. Chamaeleo adheres to the concept of high aggregation and low coupling for software design which will enhance the performance efficiency. Here, we describe the working pipeline of Chamaeleo, and demonstrate its advantages over the implementation of existing single coding schemes. The source code is available at https://github.com/ntpz870817/Chamaeleo, it can be also installed by the command of pip.exe, “pip install chamaeleo”. Alternatively, the wheel file can be downloaded at https://pypi.org/project/Chamaeleo/. Detailed documentation is available at https://chamaeleo.readthedocs.io/en/latest/.Author SummaryDNA is now considered to be a promising candidate media for future digital information storage in order to tackle the global issue of data explosion. Transcoding between binary digital data and quanternary DNA information is one of the most important steps in the whole process of DNA digital storage. Although several coding schemes have been reported, researchers are still investigating better strategies. Moreover, the scripts of these coding schemes use different programming languages, software architectures and optimization contents. Therefore, we here introduce Chamaeleo, a library in which several classical coding schemes are collected, to reconstruct and optimize them. One of the key features of this tool is that we modulize the functions and make it feasible for more customized way of usage. Meanwhile, developers can also incorporate their new algorithms according to the framework expediently. Based on the benchmark tests we conducted, Chamaeleo shows better flexibility and expandability compared to original packages and we hope that it will help the further study and applications in DNA digital storage.


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Henry H. Lee ◽  
Reza Kalhor ◽  
Naveen Goela ◽  
Jean Bolot ◽  
George M. Church

Author(s):  
Weigang Chen ◽  
Mingzhe Han ◽  
Jianting Zhou ◽  
Qi Ge ◽  
Panpan Wang ◽  
...  

Abstract DNA digital storage provides an alternative for information storage with high density and long-term stability. Here, we report the de novo design and synthesis of an artificial chromosome that encodes two pictures and a video clip. The encoding paradigm utilizing the superposition of sparsified error correction codewords and pseudo-random sequences tolerates base insertions/deletions and is well suited to error-prone nanopore sequencing for data retrieval. The entire 254 kb sequence was 95.27% occupied by encoded data. The Transformation-Associated Recombination method was used in the construction of this chromosome from DNA fragments and necessary autonomous replication sequences. The stability was demonstrated by transmitting the data-carrying chromosome to the 100th generation. This study demonstrates a data storage method using encoded artificial chromosomes via in vivo assembly for write-once and stable replication for multiple retrievals, similar to a compact disc, with potential in economically massive data distribution.


2016 ◽  
Vol 44 (3) ◽  
pp. 702-708 ◽  
Author(s):  
Nicola J. Patron

Synthetic biology aims to apply engineering principles to the design and modification of biological systems and to the construction of biological parts and devices. The ability to programme cells by providing new instructions written in DNA is a foundational technology of the field. Large-scale de novo DNA synthesis has accelerated synthetic biology by offering custom-made molecules at ever decreasing costs. However, for large fragments and for experiments in which libraries of DNA sequences are assembled in different combinations, assembly in the laboratory is still desirable. Biological assembly standards allow DNA parts, even those from multiple laboratories and experiments, to be assembled together using the same reagents and protocols. The adoption of such standards for plant synthetic biology has been cohesive for the plant science community, facilitating the application of genome editing technologies to plant systems and streamlining progress in large-scale, multi-laboratory bioengineering projects.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Peter Michael Schwarz ◽  
Bernd Freisleben

Abstract Background DNA is a promising storage medium for high-density long-term digital data storage. Since DNA synthesis and sequencing are still relatively expensive tasks, the coding methods used to store digital data in DNA should correct errors and avoid unstable or error-prone DNA sequences. Near-optimal rateless erasure codes, also called fountain codes, are particularly interesting codes to realize high-capacity and low-error DNA storage systems, as shown by Erlich and Zielinski in their approach based on the Luby transform (LT) code. Since LT is the most basic fountain code, there is a large untapped potential for improvement in using near-optimal erasure codes for DNA storage. Results We present NOREC4DNA, a software framework to use, test, compare, and improve near-optimal rateless erasure codes (NORECs) for DNA storage systems. These codes can effectively be used to store digital information in DNA and cope with the restrictions of the DNA medium. Additionally, they can adapt to possible variable lengths of DNA strands and have nearly zero overhead. We describe the design and implementation of NOREC4DNA. Furthermore, we present experimental results demonstrating that NOREC4DNA can flexibly be used to evaluate the use of NORECs in DNA storage systems. In particular, we show that NORECs that apparently have not yet been used for DNA storage, such as Raptor and Online codes, can achieve significant improvements over LT codes that were used in previous work. NOREC4DNA is available on https://github.com/umr-ds/NOREC4DNA. Conclusion NOREC4DNA is a flexible and extensible software framework for using, evaluating, and comparing NORECs for DNA storage systems.


Sign in / Sign up

Export Citation Format

Share Document