Towards Practical and Robust DNA-based Data Archiving by Codec System Named ‘Yin-Yang’

10.1101/829721 ◽

2019 ◽

Cited By ~ 1

Author(s):

Zhi Ping ◽

Shihong Chen ◽

Guangyu Zhou ◽

Xiaoluo Huang ◽

Sha Joe Zhu ◽

...

Keyword(s):

Data Storage ◽

Storage Capacity ◽

GC Content ◽

Data Archiving ◽

Synthetic DNA ◽

Coding Schemes ◽

Efficient Storage ◽

File Formats ◽

Yin Yang ◽

Practical Feasibility

Abstract. Motivation: DNA has been reported as a promising medium of data storage for its remarkable durability and space-efficient storage capacity. Here, we propose a robust DNA-based data storage method based on a new codec algorithm, namely ‘Yin-Yang’. Results: Using this strategy, we successfully stored different file formats in a single synthetic DNA oligonucleotide pool. Compared to most well-established DNA-based data storage coding schemes presented to date, this codec system can achieve a variety of user goals (e.g. reduce homopolymer length to 3 or 4 at most, maintain balanced GC content between 40% and 60% and simple secondary structure with the Gibbs free energy above −30 kcal/mol). It also shows enhanced robustness in transcoding of different data structures and practical feasibility. We tested this codec with an end-to-end experiment including encoding, DNA synthesis, sequencing and decoding. Through successful retrieval of 3 files totaling 2.02 Megabits after sequencing and decoding, our strategy exhibits great qualities of achieving high storing capacity per nucleotide (427.1 PB/gram) and high fidelity of data recovery.
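
The sequence constraints named in this abstract (homopolymer runs of at most 3 or 4 bases, GC content between 40% and 60%) can be checked with a few lines of code. The following is a minimal sketch, not the authors' implementation; the function names and default thresholds are assumptions chosen for illustration, and the Gibbs free energy criterion is left out because it needs a secondary-structure predictor.

```python
def max_homopolymer_run(seq: str) -> int:
    """Return the length of the longest run of identical consecutive bases."""
    longest = current = 0
    previous = None
    for base in seq:
        # Extend the current run if the base repeats, otherwise start a new run.
        current = current + 1 if base == previous else 1
        longest = max(longest, current)
        previous = base
    return longest


def gc_fraction(seq: str) -> float:
    """Return the fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def meets_constraints(seq: str, max_run: int = 4,
                      gc_min: float = 0.40, gc_max: float = 0.60) -> bool:
    """Hypothetical screening helper: True if both constraints hold."""
    return (max_homopolymer_run(seq) <= max_run
            and gc_min <= gc_fraction(seq) <= gc_max)
```

A candidate oligo such as `"ACGTACGTAC"` passes both checks, while a sequence containing `"AAAAA"` fails the run-length check under the default `max_run=4`.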

Towards Practical and Robust DNA-Based Data Archiving Using ‘Yin-Yang Codec’ System

10.21203/rs.3.rs-536997/v1 ◽

2021 ◽

Author(s):

Zhi Ping ◽

Shihong Chen ◽

Guangyu Zhou ◽

Xiaoluo Huang ◽

Sha Joe Zhu ◽

...

Keyword(s):

Data Storage ◽

DNA Sequences ◽

Recovery Rate ◽

Yeast Cells ◽

Data Types ◽

Data Archiving ◽

Information Density ◽

Yin Yang

Abstract. DNA is a promising data storage medium due to its remarkable durability and space-efficient storage. Early bit-to-base transcoding schemes have primarily pursued information density, at the expense however of introducing biocompatibility challenges or at the risk of decoding failure. Here, we propose a robust transcoding algorithm named the “Yin-Yang Codec” (YYC), using two rules to encode two binary bits into one nucleotide, to generate DNA sequences highly compatible with synthesis and sequencing technologies. We encoded two representative file formats and stored them in vitro as 200-nt oligo pools and in vivo as an ~54-kb DNA fragment in yeast cells. Sequencing results show that YYC exhibits high robustness and reliability for a wide variety of data types, with an average recovery rate of 99.94% at 10^4 molecule copies and an achieved recovery rate of 87.53% at 100 copies. In addition, the in vivo storage demonstration achieved for the first time an experimentally measured physical information density of 198.8 EB per gram of DNA (44% of the theoretical maximum for DNA).
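
The idea of using two rules to map two binary bits onto one nucleotide can be illustrated with a small sketch. The rule tables below are hypothetical and chosen only so that each (bit, bit, previous base) combination selects exactly one nucleotide; they are not taken from the paper, and the starting base `START` is likewise an assumption.

```python
# Hypothetical first rule: one bit selects a pair of candidate bases.
YANG = {"A": 0, "C": 0, "G": 1, "T": 1}

# Hypothetical second rule: given the previous base, the other bit
# picks one base out of the pair selected by the first rule.
YIN = {
    "A": {"A": 0, "C": 1, "G": 0, "T": 1},
    "C": {"A": 1, "C": 0, "G": 1, "T": 0},
    "G": {"A": 0, "C": 1, "G": 1, "T": 0},
    "T": {"A": 1, "C": 0, "G": 0, "T": 1},
}

START = "A"  # assumed virtual base preceding the first position


def encode(upper: str, lower: str) -> str:
    """Merge two equal-length bit strings into one DNA string."""
    if len(upper) != len(lower):
        raise ValueError("bit segments must have equal length")
    prev, out = START, []
    for a, b in zip(upper, lower):
        # Exactly one base satisfies both rules for a given previous base.
        base = next(n for n in "ACGT"
                    if YANG[n] == int(a) and YIN[prev][n] == int(b))
        out.append(base)
        prev = base
    return "".join(out)


def decode(seq: str) -> tuple:
    """Recover the two bit strings from a DNA string."""
    prev, upper, lower = START, [], []
    for base in seq:
        upper.append(str(YANG[base]))
        lower.append(str(YIN[prev][base]))
        prev = base
    return "".join(upper), "".join(lower)
```

With these illustrative tables, `encode("0110", "1001")` gives `"CTGC"`, and `decode("CTGC")` returns the original pair of bit strings. Each nucleotide carries two bits, one from each input segment.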

Codes With Run-Length and GC-Content Constraints for DNA-Based Data Storage

IEEE Communications Letters ◽

10.1109/lcomm.2018.2866566 ◽

2018 ◽

Vol 22 (10) ◽

pp. 2004-2007 ◽

Cited By ~ 15

Author(s):

Wentu Song ◽

Kui Cai ◽

Mu Zhang ◽

Chau Yuen

Keyword(s):

Data Storage ◽

GC Content ◽

Run Length

Analysis of a Secure Virtual Desktop Infrastructure System

International Journal of Digital Crime and Forensics ◽

10.4018/ijdcf.2015010104 ◽

2015 ◽

Vol 7 (1) ◽

pp. 69-84 ◽

Cited By ~ 2

Author(s):

Yi Jie Tong ◽

Wei Qi Yan ◽

Jin Yu

Keyword(s):

Information Security ◽

Data Storage ◽

Storage Capacity ◽

Cost Effective ◽

Energy Costs ◽

Campus Network ◽

Effective Solution ◽

Virtual Desktop ◽

Infrastructure System ◽

Virtual Desktop Infrastructure

With an increasing number of personal computers introduced in schools, enterprises and other large organizations, the workload of system administrators has been rising because of issues related to energy costs, IT expenses, PC replacement expenditures, data storage capacity, and information security. Application Virtualization (AV), however, has proven to be a cost-effective solution to these problems. In this paper, we analyze a Virtual Desktop Infrastructure (VDI) system for a campus network. We introduce and justify the system we developed and explain the rationale for its improvements.

Estimation of Lifetime Data Storage Capacity for Human Senses

The Journal of the Korea Contents Association ◽

10.5392/jkca.2009.9.1.023 ◽

2009 ◽

Vol 9 (1) ◽

pp. 23-29

Author(s):

Young-Gap You ◽

Young-Jun Song ◽

Dong-Woo Kim

Keyword(s):

Data Storage ◽

Storage Capacity ◽

Lifetime Data

Multilevel Optical Data Storage in Eu2+/Ho3+ doped Ba2SiO4 Phosphor with Linear Mapping between Ultraviolet Excitation and Thermoluminescence/Photostimulated Luminescence Response

Journal of Materials Chemistry C ◽

10.1039/d1tc05254c ◽

2021 ◽

Author(s):

Mingliang Pan ◽

Yi Zhong ◽

Hui Lin ◽

Hongran Bao ◽

Lulu Zheng ◽

...

Keyword(s):

Data Storage ◽

Storage Capacity ◽

Optical Storage ◽

Linear Mapping ◽

Optical Data Storage ◽

Optical Data ◽

Photostimulated Luminescence ◽

Persistent Luminescence ◽

Ultraviolet Excitation ◽

Storage Media

Persistent luminescence phosphors are regarded as one of the promising candidates for optical storage media. However, most optical storages using phosphors can only realize single-bit-data recording, limiting the storage capacity....

Efficient Storage and Temporal Query Evaluation in Hierarchical Data Archiving Systems

Lecture Notes in Computer Science - Scientific and Statistical Database Management ◽

10.1007/978-3-642-22351-8_7 ◽

2011 ◽

pp. 109-128 ◽

Cited By ~ 2

Author(s):

Hui Wang ◽

Ruilin Liu ◽

Dimitri Theodoratos ◽

Xiaoying Wu

Keyword(s):

Query Evaluation ◽

Hierarchical Data ◽

Data Archiving ◽

Efficient Storage ◽

Temporal Query

Genetic association testing using the GENESIS R/Bioconductor package

Bioinformatics ◽

10.1093/bioinformatics/btz567 ◽

2019 ◽

Cited By ~ 20

Author(s):

Stephanie M Gogarten ◽

Tamar Sofer ◽

Han Chen ◽

Chaoyu Yu ◽

Jennifer A Brody ◽

...

Keyword(s):

Data Storage ◽

Genomic Analysis ◽

Supplementary Information ◽

Storage And Retrieval ◽

Association Testing ◽

Link Functions ◽

Efficient Storage ◽

Genetic Association Testing ◽

Analysis Workflow ◽

Complete Genomic

Abstract. Summary: The Genomic Data Storage (GDS) format provides efficient storage and retrieval of genotypes measured by microarrays and sequencing. We developed GENESIS to perform various single- and aggregate-variant association tests using genotype data stored in GDS format. GENESIS implements highly flexible mixed models, allowing for different link functions, multiple variance components and phenotypic heteroskedasticity. GENESIS integrates cohesively with other R/Bioconductor packages to build a complete genomic analysis workflow entirely within the R environment. Availability and implementation: https://bioconductor.org/packages/GENESIS; vignettes included. Supplementary information: Supplementary data are available at Bioinformatics online.

A Novel Approach for Enhancing Data Storage Capacity in Quick Response Code Using Multiplexing and Data Compression Technique

2015 International Conference on Computational Intelligence and Communication Networks (CICN) ◽

10.1109/cicn.2015.214 ◽

2015 ◽

Author(s):

Mona M. Umaria ◽

G.B. Jethava

Keyword(s):

Data Compression ◽

Data Storage ◽

Storage Capacity ◽

Quick Response ◽

Compression Technique ◽

Response Code ◽

Quick Response Code ◽

Novel Approach

Evaluation of lossless and lossy algorithms for the compression of scientific datasets in NetCDF-4 or HDF5 formatted files

10.5194/gmd-2018-250 ◽

2018 ◽

Author(s):

Xavier Delaunay ◽

Aurélie Courtois ◽

Flavien Gouillon

Keyword(s):

Data Storage ◽

Compression Ratio ◽

Reduction Method ◽

High Compression Ratio ◽

Compression Speed ◽

File Formats ◽

Scientific Datasets ◽

Bounded Data ◽

Data Reduction Method ◽

Rounding Algorithm

Abstract. The increasing volume of scientific datasets imposes the use of compression to reduce the data storage or transmission costs, specifically for the oceanography or meteorological datasets generated by Earth observation mission ground segments. These data are mostly produced in NetCDF formatted files. Indeed, the NetCDF-4/HDF5 file formats are widely spread in the global scientific community because of the nice features they offer. Particularly, the HDF5 offers the dynamically loaded filter plugin functionality allowing users to write filters, such as compression/decompression filters, to process the data before reading or writing it on the disk. In this work, we evaluate the performance of lossy and lossless compression/decompression methods through NetCDF-4 and HDF5 tools on analytical and real scientific floating-point datasets. We also introduce the Digit Rounding algorithm, a new relative error bounded data reduction method inspired by the Bit Grooming algorithm. The Digit Rounding algorithm allows high compression ratio while preserving a given number of significant digits in the dataset. It achieves higher compression ratio than the Bit Grooming algorithm while keeping similar compression speed.
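
To see why reducing floating-point precision helps a lossless compressor, the sketch below zeroes the low mantissa bits of float64 values and compares zlib output sizes. This is plain mantissa truncation, used here as a stand-in; it is not the Digit Rounding or Bit Grooming algorithm from the abstract. The function names, the `keep_bits` parameter and the synthetic sine data are all assumptions made for illustration.

```python
import math
import struct
import zlib


def shave_mantissa(value: float, keep_bits: int) -> float:
    """Zero all but the top `keep_bits` of the 52 explicit float64 mantissa bits."""
    if not 0 <= keep_bits <= 52:
        raise ValueError("keep_bits must be in 0..52")
    # Reinterpret the float as a 64-bit unsigned integer.
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    # Mask that clears the lowest (52 - keep_bits) mantissa bits.
    mask = ~((1 << (52 - keep_bits)) - 1) & 0xFFFFFFFFFFFFFFFF
    (shaved,) = struct.unpack("<d", struct.pack("<Q", bits & mask))
    return shaved


def compressed_size(values) -> int:
    """Size in bytes of the zlib-compressed little-endian float64 buffer."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return len(zlib.compress(raw, 6))


# Synthetic smooth signal standing in for a scientific dataset.
data = [2.0 + math.sin(0.01 * i) for i in range(2000)]
shaved = [shave_mantissa(x, keep_bits=20) for x in data]
```

For normal (non-subnormal) values, keeping `keep_bits` mantissa bits bounds the relative error by 2^-keep_bits, and the zeroed trailing bytes make the buffer far more compressible. Comparing `compressed_size(data)` with `compressed_size(shaved)` shows the effect.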

Reading mixtures of uniform sequence-defined macromolecules to increase data storage capacity

Communications Chemistry ◽

10.1038/s42004-020-00431-9 ◽

2020 ◽

Vol 3 (1) ◽

Author(s):

Maximiliane Frölich ◽

Dennis Hofheinz ◽

Michael A. R. Meier

Keyword(s):

Data Storage ◽

Storage Capacity ◽

Data Interpretation ◽

Molecular Data ◽

Research Topic ◽

Side Chains ◽

ESI-MS ◽

Python Script ◽

High Data ◽

Synthetic Macromolecules

Abstract. In recent years, the field of molecular data storage has emerged from a niche to a vibrant research topic. Herein, we describe a simultaneous and automated read-out of data stored in mixtures of sequence-defined oligomers. Therefore, twelve different sequence-defined tetramers and three hexamers with different mass markers and side chains are successfully synthesised via iterative Passerini three-component reactions and subsequent deprotection steps. By programming a straightforward Python script for ESI-MS/MS analysis, it is possible to automatically sequence and thus read-out the information stored in these oligomers within one second. Most importantly, we demonstrate that the use of mass-markers as starting compounds eases MS/MS data interpretation and furthermore allows the unambiguous reading of sequences of mixtures of sequence-defined oligomers. Thus, high data storage capacity considering the field of synthetic macromolecules (up to 64.5 bit in our examples) can be obtained without the need of synthesizing long sequences, but by mixing and simultaneously analysing shorter sequence-defined oligomers.
