Enzymatic DNA synthesis for digital information storage

Mapping Intimacies

10.1101/348987

2018

Cited By ~ 5

Author(s):

Henry H. Lee

Reza Kalhor

Naveen Goela

Jean Bolot

George M. Church

Keyword(s):

DNA Synthesis

Large Scale

De Novo

Data Retrieval

Information Storage

Digital Data

Digital Information

Synthesis Strategy

DNA Strands

Biological Functionality

Abstract: DNA is an emerging storage medium for digital data, but its adoption is hampered by limitations of phosphoramidite chemistry, which was developed for the single-base accuracy required for biological functionality. Here, we establish a de novo enzymatic DNA synthesis strategy designed from the bottom up for information storage. We harness a template-independent DNA polymerase for controlled synthesis of sequences with user-defined information content. We demonstrate retrieval of 144 bits, including addressing, from perfectly synthesized DNA strands using batch-processed Illumina and real-time Oxford Nanopore sequencing. We then develop a codec for data retrieval from populations of diverse but imperfectly synthesized DNA strands, each with a ~30% error tolerance. With this codec, we experimentally validate a kilobyte-scale design that stores 1 bit per nucleotide. Simulations of the codec support reliable and robust storage of information in large-scale systems. This work paves the way for alternative synthesis and sequencing strategies to advance information storage in DNA.
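The abstract does not say how bits are mapped onto enzymatically synthesized strands. One way to see why a template-independent polymerase invites a different codec from phosphoramidite synthesis: if, as an assumption here, the number of bases added per extension step is hard to control, homopolymer run lengths become unreliable, but the order in which bases change can still be read. The sketch below is a hypothetical transition code written for illustration only, not the authors' codec: each bit selects one of two bases that differ from the previous base (the third option is left unused), and decoding collapses homopolymer runs before reading bits, so run-length variation does not affect the result.

```python
import itertools
import random

ALPHABET = "ACGT"


def allowed_next(base: str) -> list[str]:
    """Bases that differ from `base`, in a fixed order (toy rule: only the first two carry data)."""
    return [b for b in ALPHABET if b != base]


def encode_transitions(bits: list[int], start: str = "A") -> str:
    """Map each bit to a base change; consecutive intended bases always differ."""
    seq = [start]
    for bit in bits:
        seq.append(allowed_next(seq[-1])[bit])
    return "".join(seq)


def simulate_uncontrolled_runs(seq: str, rng: random.Random, max_run: int = 4) -> str:
    """Mimic imprecise extension (an assumption): each intended base is added 1..max_run times."""
    return "".join(base * rng.randint(1, max_run) for base in seq)


def decode_transitions(strand: str) -> list[int]:
    """Collapse homopolymer runs, then read one bit from each base change."""
    collapsed = [base for base, _ in itertools.groupby(strand)]
    bits = []
    for prev, cur in zip(collapsed, collapsed[1:]):
        index = allowed_next(prev).index(cur)
        if index > 1:
            raise ValueError(f"unused transition {prev}->{cur}")
        bits.append(index)
    return bits


rng = random.Random(0)
message = [1, 0, 1, 1, 0, 0, 1]
strand = simulate_uncontrolled_runs(encode_transitions(message), rng)
assert decode_transitions(strand) == message
```

This toy stores 1 bit per base transition; it is not meant to reproduce the 1 bit per nucleotide design reported above.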

Deoxyribonucleic Acid as a Tool for Digital Information Storage: An Overview

THE INDIAN JOURNAL OF VETERINARY SCIENCES AND BIOTECHNOLOGY

10.21887/ijvsbt.15.1.1

2019

Vol 15 (01)

pp. 1-8

Author(s):

Ashish C Patel

C G Joshi

Keyword(s):

Data Storage

DNA Sequences

Consensus Sequence

Random Access

Information Storage

Digital Data

Digital Information

Multiple Sequence

Digital World

Digital File

Current data storage technologies can no longer keep pace with the exponentially growing amount of data produced through the extensive use of social networking, photos, media, etc. The "digital world", which held 4.4 zettabytes in 2013, was predicted to reach 44 zettabytes by 2020. For the past 30 years, scientists and researchers have been trying to develop a robust way of storing data on a dense and durable medium, and have found DNA to be the most promising storage medium. Unlike existing storage devices, DNA requires no maintenance other than storage in a cool, dark place. DNA is small and dense: just 1 gram of dry DNA can store about 455 exabytes of data. DNA stores information using four bases, viz., A, T, G, and C, while CDs, hard disks and other devices store information as 0s and 1s along physical tracks. In DNA-based storage, once the digital file has been binarized, encoding and decoding are the key steps. After the digital file is encoded, the next step is to synthesize the corresponding single-stranded DNA sequences, which can be stored frozen until use. When the information needs to be recovered, it can be read back by DNA sequencing. Next-generation sequencing (NGS) produces sequences at very high throughput and at a much lower cost than first-generation sequencing technologies, less than about 0.1 USD per MB of data. Post-sequencing processing includes aligning the reads with multiple sequence alignment (MSA) algorithms to obtain consensus sequences. Each consensus sequence is then decoded by reversing the encoding process. Most prior DNA data storage efforts sequenced and decoded the entire stored dataset, with no random access, but it is now possible to extract selected files (e.g., retrieving only a required image from a collection) from a DNA pool using PCR-based random access.
Various scientists have reported storing data at densities of up to 110 zettabytes per gram of DNA. In the future, with efficient encoding, error correction, and cheaper DNA synthesis and sequencing, DNA-based storage will become a practical solution for storing exponentially growing digital data.
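The write and read path described in this overview (binarize, encode to bases, sequence, build a consensus, decode) can be sketched end to end. This is a minimal illustration under stated assumptions: a plain two-bits-per-base mapping (00→A, 01→C, 10→G, 11→T, an arbitrary choice here) with no sequence constraints or error correction, and reads that are already aligned and of equal length, so a per-position majority vote stands in for MSA-based consensus.

```python
from collections import Counter

# Hypothetical 2-bit-per-base mapping; any fixed bijection would work.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Binarize the file, then map each 2-bit pair to one nucleotide."""
    bitstring = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bitstring[i:i + 2]] for i in range(0, len(bitstring), 2))


def dna_to_bytes(seq: str) -> bytes:
    """Reverse the encoding: nucleotides -> bits -> bytes."""
    bitstring = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bitstring[i:i + 8], 2) for i in range(0, len(bitstring), 8))


def majority_consensus(aligned_reads: list[str]) -> str:
    """Per-position majority vote over reads assumed to be aligned and of equal length."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads))


original = bytes_to_dna(b"Hi")
# Three simulated reads, two of which carry a single substitution error each.
reads = [original, original[:3] + "T" + original[4:], original[:5] + "T" + original[6:]]
recovered = dna_to_bytes(majority_consensus(reads))
```

Real systems replace the direct mapping with constrained, error-correcting codes, since a raw two-bit mapping can produce long homopolymers and extreme GC content.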

Nomadic life archiving across platforms: Hyperlinked storage and compartmentalized sharing

New Media & Society

10.1177/1461444820953507

2021

Vol 23 (4)

pp. 796-815

Author(s):

Yang Wang

Sun Sun Lim

Keyword(s):

Everyday Life

Collective Memory

Life Experiences

Information Storage

Digital Data

Digital Content

Digital Information

Host Countries

Multi Media

Nomadic Life

People today are situated in media ecosystems in which a variety of ICT devices and platforms coexist and complement each other to fulfil users’ heterogeneous requirements. These multi-media affordances promote a highly hyperlinked and nomadic habit of digital data management that blurs the long-standing boundaries between information storage, sharing and exchange. Specifically, during the pervasive sharing and browsing of fragmentary digital information (e.g. photos, videos, online diaries, news articles) across various platforms, the life experiences and knowledge involved are simultaneously classified and stored for future retrieval and collective memory construction. For international migrants who straddle different geographical and cultural contexts, managing various digital materials is particularly complicated, as they have to be familiar with, and appropriately navigate, the technological infrastructures of both home and host countries. Drawing on ethnographic observations of 40 Chinese migrant mothers in Singapore, this article delves into their quotidian routines of acquiring, storing, sharing and exchanging digital information across a range of ICT devices and platforms, as well as the cultural and emotional implications of these mediated behaviours for their everyday life experiences. A multi-layer and multi-sited repertoire of ‘life archiving’ was identified among these migrant mothers, in which they leave footprints of everyday life through a tactical combination of interactive sharing, pervasive tagging and backup storage of diverse digital content.

HEDGES error-correcting code for DNA storage corrects indels and allows sequence constraints

Proceedings of the National Academy of Sciences

10.1073/pnas.2004821117

2020

Vol 117 (31)

pp. 18489-18496

Cited By ~ 3

Author(s):

William H. Press

John A. Hawkins

Stephen K. Jones

Jeffrey M. Schaub

Ilya J. Finkelstein

Keyword(s):

DNA Synthesis

Large Scale

Broad Class

Error Correcting Code

GC Content

Information Storage

High Rate

Information Encoding

Outer Code

Sequence Constraints

Synthetic DNA is rapidly emerging as a durable, high-density information storage platform. A major challenge for DNA-based information encoding strategies is the high rate of errors that arise during DNA synthesis and sequencing. Here, we describe the HEDGES (Hash Encoded, Decoded by Greedy Exhaustive Search) error-correcting code that repairs all three basic types of DNA errors: insertions, deletions, and substitutions. HEDGES also converts unresolved or compound errors into substitutions, restoring synchronization for correction via a standard Reed–Solomon outer code that is interleaved across strands. Moreover, HEDGES can incorporate a broad class of user-defined sequence constraints, such as avoiding excess repeats, or too high or too low windowed guanine–cytosine (GC) content. We test our code both via in silico simulations and with synthesized DNA. From its measured performance, we develop a statistical model applicable to much larger datasets. Predicted performance indicates the possibility of error-free recovery of petabyte- and exabyte-scale data from DNA degraded with as much as 10% errors. As the cost of DNA synthesis and sequencing continues to drop, we anticipate that HEDGES will find applications in large-scale error-free information encoding.
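HEDGES is described as accepting user-defined sequence constraints such as avoiding excess repeats and too-high or too-low windowed GC content. The sketch below shows only what such a constraint test might look like; it is not the HEDGES encoder, and the window size, GC bounds and maximum run length are illustrative assumptions rather than values from the paper.

```python
import itertools


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max((len(list(run)) for _, run in itertools.groupby(seq)), default=0)


def windowed_gc_ok(seq: str, window: int = 12, low: float = 0.25, high: float = 0.75) -> bool:
    """True if every full window's GC fraction lies within [low, high].

    Sequences shorter than the window have no full window and pass trivially.
    """
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        gc = sum(base in "GC" for base in chunk) / window
        if not low <= gc <= high:
            return False
    return True


def satisfies_constraints(seq: str, max_run: int = 3, **gc_kwargs) -> bool:
    """Combine a homopolymer limit with a windowed GC-content limit (assumed thresholds)."""
    return longest_homopolymer(seq) <= max_run and windowed_gc_ok(seq, **gc_kwargs)
```

A constrained encoder would most naturally apply a test like this while choosing each next base, ruling out choices that would violate a constraint; filtering finished strands afterward is the cruder alternative.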

Mixed Culture of Bacterial Cell for Large Scale DNA Storage

10.1101/2020.02.21.960476

2020

Author(s):

Min Hao

Hongyan Qiao

Yanmin Gao

Zhaoguan Wang

Xin Qiao

...

Keyword(s):

Mixed Culture

Data Storage

Bacterial Cell

Living Cell

Large Scale

Digital Data

Human Society

Bacterial Cells

Digital Information

Abstract: DNA has emerged as a novel material for mass data storage, a serious problem facing human society. Taking advantage of current synthesis capacity, massive oligo pools have demonstrated high potential for data storage in the test tube. Herein, we present a mixed culture of bacterial cells carrying a massive oligo pool assembled into a high-copy plasmid as a stable material for large-scale data storage. Living-cell data storage was fabricated by a multi-step process: assembly, transformation and mixed culture. The underlying principle was explored by deep bioinformatic analysis. Although homology assembly showed sequence-context-dependent bias, the massive set of digital-information oligos in the mixed culture remained constant over multiple successive passages. Pushing the limits, over ten thousand distinct oligos, totaling 2304 kbp and encoding 445 KB of digital data including texts and images, were stored in bacterial cells, the largest archival data storage in living cells reported so far. The mixed-culture approach to living-cell data storage opens up a new way to bridge in vitro and in vivo storage systems, combining the advantages of storage capacity and economical information propagation.
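As a rough check on density, the totals quoted above (2304 kbp encoding 445 KB) imply a net number of payload bits per stored base pair. The abstract does not say whether "KB" means 1,000 or 1,024 bytes, so both are computed, and it does not say whether the 2304 kbp includes sequence that carries no payload; the result is therefore an overall rather than a coding density.

```python
def net_bits_per_bp(total_kbp: float, payload_kb: float, bytes_per_kb: int) -> float:
    """Payload bits divided by total stored base pairs."""
    payload_bits = payload_kb * bytes_per_kb * 8
    return payload_bits / (total_kbp * 1000)


for label, bytes_per_kb in (("KB = 1000 bytes", 1000), ("KB = 1024 bytes", 1024)):
    density = net_bits_per_bp(2304, 445, bytes_per_kb)
    print(f"{label}: {density:.2f} bits per bp (theoretical maximum 2.00)")
```

Either convention gives roughly 1.5 to 1.6 bits per base pair, below the log2(4) = 2 bits per base pair ceiling.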

Large-scale de novo DNA synthesis: technologies and applications

Nature Methods

10.1038/nmeth.2918

2014

Vol 11 (5)

pp. 499-507

Cited By ~ 383

Author(s):

Sriram Kosuri

George M Church

Keyword(s):

DNA Synthesis

Large Scale

De Novo

Chamaeleo: a robust library for DNA storage coding schemes

10.1101/2020.01.02.892588

2020

Author(s):

Zhi Ping

Haoling Zhang

Shihong Chen

Qianlong Zhuang

Sha Joe Zhu

...

Keyword(s):

Programming Languages

Information Storage

Digital Data

Digital Information

Link Type

Coding Schemes

Whole Process

Digital Storage

Global Issue

DNA Storage

Abstract: Chamaeleo is currently the only collection library that focuses on adapting multiple well-established coding schemes for DNA storage. It provides a tool for researchers to study various coding schemes and apply them in practice. Chamaeleo adheres to the principle of high cohesion and low coupling in its software design, which enhances performance efficiency. Here, we describe the working pipeline of Chamaeleo and demonstrate its advantages over implementations of existing single coding schemes. The source code is available at https://github.com/ntpz870817/Chamaeleo, and it can also be installed with pip: “pip install chamaeleo”. Alternatively, the wheel file can be downloaded at https://pypi.org/project/Chamaeleo/. Detailed documentation is available at https://chamaeleo.readthedocs.io/en/latest/.

Author Summary: DNA is now considered a promising candidate medium for future digital information storage, to tackle the global issue of data explosion. Transcoding between binary digital data and quaternary DNA information is one of the most important steps in the whole process of DNA digital storage. Although several coding schemes have been reported, researchers are still investigating better strategies. Moreover, the scripts implementing these coding schemes use different programming languages, software architectures and optimizations. Therefore, we introduce Chamaeleo, a library in which several classical coding schemes are collected, reconstructed and optimized. One of the key features of this tool is that its functions are modularized, enabling more customized usage. Meanwhile, developers can also conveniently incorporate their new algorithms within the framework. Based on the benchmark tests we conducted, Chamaeleo shows better flexibility and expandability than the original packages, and we hope that it will help further study and applications in DNA digital storage.
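The "high cohesion, low coupling" design and the ability for developers to plug in new algorithms can be illustrated with a small common interface plus a registry. This is a hypothetical sketch of the pattern only: `CodingScheme`, `register`, `SCHEMES`, `round_trip` and the two toy schemes are names invented here, not Chamaeleo's actual API, which is described in its documentation.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class CodingScheme(ABC):
    """Hypothetical common interface: every scheme transcodes a bit string to DNA and back."""

    @abstractmethod
    def encode(self, bits: str) -> str:
        ...

    @abstractmethod
    def decode(self, dna: str) -> str:
        ...


# Registry that pipeline code looks schemes up in; adding a scheme never edits the pipeline.
SCHEMES: Dict[str, Type[CodingScheme]] = {}


def register(name: str) -> Callable[[Type[CodingScheme]], Type[CodingScheme]]:
    def wrap(cls: Type[CodingScheme]) -> Type[CodingScheme]:
        SCHEMES[name] = cls
        return cls
    return wrap


@register("two_bit")
class TwoBitScheme(CodingScheme):
    """Direct mapping, 2 bits per nucleotide (expects an even-length bit string)."""
    TABLE = {"00": "A", "01": "C", "10": "G", "11": "T"}

    def encode(self, bits: str) -> str:
        return "".join(self.TABLE[bits[i:i + 2]] for i in range(0, len(bits), 2))

    def decode(self, dna: str) -> str:
        reverse = {base: pair for pair, base in self.TABLE.items()}
        return "".join(reverse[base] for base in dna)


@register("one_bit")
class OneBitScheme(CodingScheme):
    """A toy 'developer-added' scheme: 0 -> A, 1 -> T (1 bit per nucleotide)."""

    def encode(self, bits: str) -> str:
        return "".join("A" if bit == "0" else "T" for bit in bits)

    def decode(self, dna: str) -> str:
        return "".join("0" if base == "A" else "1" for base in dna)


def round_trip(scheme_name: str, bits: str) -> bool:
    """Pipeline code depends only on the interface, never on a concrete scheme class."""
    scheme = SCHEMES[scheme_name]()
    return scheme.decode(scheme.encode(bits)) == bits
```

The design choice being illustrated is that benchmarking or pipeline code iterates over `SCHEMES` and calls only `encode`/`decode`, so a new scheme is added by writing one class and one decorator line.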

Terminator-free template-independent enzymatic DNA synthesis for digital information storage

Nature Communications

10.1038/s41467-019-10258-1

2019

Vol 10 (1)

Cited By ~ 31

Author(s):

Henry H. Lee

Reza Kalhor

Naveen Goela

Jean Bolot

George M. Church

Keyword(s):

DNA Synthesis

Information Storage

Digital Information

An artificial chromosome for data storage

National Science Review

10.1093/nsr/nwab028

2021

Cited By ~ 3

Author(s):

Weigang Chen

Mingzhe Han

Jianting Zhou

Qi Ge

Panpan Wang

...

Keyword(s):

Data Storage

De Novo

Video Clip

Data Retrieval

Artificial Chromosome

Information Storage

Artificial Chromosomes

Design And Synthesis

The Stability

Abstract: DNA digital storage provides an alternative for information storage with high density and long-term stability. Here, we report the de novo design and synthesis of an artificial chromosome that encodes two pictures and a video clip. The encoding paradigm, which superimposes sparsified error-correction codewords on pseudo-random sequences, tolerates base insertions/deletions and is well suited to error-prone nanopore sequencing for data retrieval. The entire 254 kb sequence was 95.27% occupied by encoded data. The Transformation-Associated Recombination method was used to construct this chromosome from DNA fragments and the necessary autonomous replication sequences. The stability was demonstrated by transmitting the data-carrying chromosome to the 100th generation. This study demonstrates a data storage method using encoded artificial chromosomes via in vivo assembly for write-once and stable replication for multiple retrievals, similar to a compact disc, with potential for economical, massive data distribution.
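The idea of superimposing sparse codewords on a pseudo-random sequence can be illustrated with a toy model: represent bases as digits 0 to 3, add a mostly-zero data vector to a seeded pseudo-random base sequence modulo 4, and recover it by subtracting the same sequence. This is a deliberately simplified sketch, not the authors' encoding; it omits the error-correction code entirely, and the seed-based generator is an assumption made for illustration.

```python
import random

BASES = "ACGT"


def pseudo_random_digits(length: int, seed: int) -> list[int]:
    """Reproducible pseudo-random base sequence (digits 0-3) shared by writer and reader."""
    rng = random.Random(seed)
    return [rng.randrange(4) for _ in range(length)]


def superimpose(sparse: list[int], seed: int) -> str:
    """Add a mostly-zero data vector to the pseudo-random background, modulo 4."""
    background = pseudo_random_digits(len(sparse), seed)
    return "".join(BASES[(b + s) % 4] for b, s in zip(background, sparse))


def extract(strand: str, seed: int) -> list[int]:
    """Subtract the same background, modulo 4, to recover the sparse vector."""
    background = pseudo_random_digits(len(strand), seed)
    return [(BASES.index(base) - b) % 4 for base, b in zip(strand, background)]


sparse = [0] * 20
sparse[3], sparse[11], sparse[17] = 2, 1, 3
strand = superimpose(sparse, seed=7)
assert extract(strand, seed=7) == sparse
```

Because most positions carry 0, the strand matches the known background almost everywhere; that property is what could let a decoder align a noisy read against the background and locate insertions or deletions, a step this toy does not implement.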

Blueprints for green biotech: development and application of standards for plant synthetic biology

Biochemical Society Transactions

10.1042/bst20160044

2016

Vol 44 (3)

pp. 702-708

Cited By ~ 6

Author(s):

Nicola J. Patron

Keyword(s):

Synthetic Biology

DNA Synthesis

Genome Editing

DNA Sequences

Large Scale

De Novo

Plant Science

Custom Made

Science Community

Plant Synthetic Biology

Synthetic biology aims to apply engineering principles to the design and modification of biological systems and to the construction of biological parts and devices. The ability to programme cells by providing new instructions written in DNA is a foundational technology of the field. Large-scale de novo DNA synthesis has accelerated synthetic biology by offering custom-made molecules at ever-decreasing costs. However, for large fragments and for experiments in which libraries of DNA sequences are assembled in different combinations, assembly in the laboratory is still desirable. Biological assembly standards allow DNA parts, even those from multiple laboratories and experiments, to be assembled together using the same reagents and protocols. The adoption of such standards for plant synthetic biology has had a cohesive effect on the plant science community, facilitating the application of genome editing technologies to plant systems and streamlining progress in large-scale, multi-laboratory bioengineering projects.

NOREC4DNA: using near-optimal rateless erasure codes for DNA storage

BMC Bioinformatics

10.1186/s12859-021-04318-x

2021

Vol 22 (1)

Author(s):

Peter Michael Schwarz

Bernd Freisleben

Keyword(s):

Data Storage

DNA Sequences

Storage Systems

High Capacity

Digital Data

Erasure Codes

Software Framework

Digital Information

DNA Storage

DNA Strands

Abstract: Background: DNA is a promising storage medium for high-density, long-term digital data storage. Since DNA synthesis and sequencing are still relatively expensive, the coding methods used to store digital data in DNA should correct errors and avoid unstable or error-prone DNA sequences. Near-optimal rateless erasure codes, also called fountain codes, are particularly interesting for realizing high-capacity, low-error DNA storage systems, as shown by Erlich and Zielinski in their approach based on the Luby transform (LT) code. Since LT is the most basic fountain code, there is large untapped potential for improvement in using near-optimal erasure codes for DNA storage.

Results: We present NOREC4DNA, a software framework to use, test, compare, and improve near-optimal rateless erasure codes (NORECs) for DNA storage systems. These codes can effectively be used to store digital information in DNA and cope with the restrictions of the DNA medium. Additionally, they can adapt to variable lengths of DNA strands and have nearly zero overhead. We describe the design and implementation of NOREC4DNA. Furthermore, we present experimental results demonstrating that NOREC4DNA can flexibly be used to evaluate the use of NORECs in DNA storage systems. In particular, we show that NORECs that apparently have not yet been used for DNA storage, such as Raptor and Online codes, can achieve significant improvements over the LT codes used in previous work. NOREC4DNA is available at https://github.com/umr-ds/NOREC4DNA.

Conclusion: NOREC4DNA is a flexible and extensible software framework for using, evaluating, and comparing NORECs for DNA storage systems.
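Since the abstract singles out the Luby transform (LT) code as the baseline fountain code, here is a compact sketch of LT encoding and peeling decoding over integer-valued chunks. It uses the ideal soliton degree distribution for brevity; that distribution is known to be fragile in practice, which is why practical LT codes use the robust soliton distribution, and Raptor and Online codes add a precoding stage. Storing the seed with each droplet, so the decoder can regenerate which chunks were combined, is an assumption of this sketch. It is not NOREC4DNA's implementation.

```python
import random
from typing import Optional


def ideal_soliton(k: int, rng: random.Random) -> int:
    """Sample a degree d in 1..k with P(1) = 1/k and P(d) = 1/(d(d-1)) for d >= 2."""
    u = rng.random()
    cumulative = 1.0 / k
    if u < cumulative:
        return 1
    for d in range(2, k + 1):
        cumulative += 1.0 / (d * (d - 1))
        if u < cumulative:
            return d
    return k  # guards against floating-point round-off in the cumulative sum


def droplet_indices(seed: int, k: int) -> set[int]:
    """Chunk indices combined in a droplet; regenerated from the seed alone."""
    rng = random.Random(seed)
    degree = ideal_soliton(k, rng)
    return set(rng.sample(range(k), degree))


def lt_encode(chunks: list[int], seed: int) -> tuple[int, int]:
    """One droplet: (seed, XOR of the chunks selected by that seed)."""
    value = 0
    for i in droplet_indices(seed, len(chunks)):
        value ^= chunks[i]
    return seed, value


def lt_decode(droplets: list[tuple[int, int]], k: int) -> Optional[list[int]]:
    """Peeling decoder: resolve degree-1 droplets, XOR known chunks out of the rest, repeat."""
    pending = [[droplet_indices(seed, k), value] for seed, value in droplets]
    recovered: dict[int, int] = {}
    progress = True
    while progress and len(recovered) < k:
        progress = False
        for entry in pending:
            indices, value = entry
            for i in [j for j in indices if j in recovered]:
                indices.discard(i)
                value ^= recovered[i]
            entry[1] = value
            if len(indices) == 1:
                (i,) = indices
                if i not in recovered:
                    recovered[i] = value
                    progress = True
    if len(recovered) < k:
        return None
    return [recovered[i] for i in range(k)]


# Usage: keep generating droplets (rateless) until the peeling decoder succeeds.
rng = random.Random(99)
chunks = [rng.getrandbits(32) for _ in range(10)]
droplets: list[tuple[int, int]] = []
decoded = None
seed = 0
while decoded is None and seed < 10_000:
    droplets.append(lt_encode(chunks, seed))
    seed += 1
    decoded = lt_decode(droplets, len(chunks))
print(f"recovered {len(chunks)} chunks from {len(droplets)} droplets: {decoded == chunks}")
```

The number of droplets needed exceeds the number of chunks; that overhead is exactly what better degree distributions and precoded codes such as Raptor aim to shrink.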
