Towards Practical and Robust DNA-based Data Archiving by Codec System Named ‘Yin-Yang’

2019 ◽  
Author(s):  
Zhi Ping ◽  
Shihong Chen ◽  
Guangyu Zhou ◽  
Xiaoluo Huang ◽  
Sha Joe Zhu ◽  
...  

Abstract. Motivation: DNA has been reported as a promising medium of data storage for its remarkable durability and space-efficient storage capacity. Here, we propose a robust DNA-based data storage method based on a new codec algorithm, namely ‘Yin-Yang’. Results: Using this strategy, we successfully stored different file formats in a single synthetic DNA oligonucleotide pool. Compared to most well-established DNA-based data storage coding schemes presented to date, this codec system can achieve a variety of user goals (e.g. reduce homopolymer length to 3 or 4 at most, maintain balanced GC content between 40% and 60%, and keep secondary structure simple, with the Gibbs free energy above −30 kcal/mol). It also shows enhanced robustness in transcoding different data structures, as well as practical feasibility. We tested this codec with an end-to-end experiment including encoding, DNA synthesis, sequencing and decoding. Through successful retrieval of 3 files totaling 2.02 Megabits after sequencing and decoding, our strategy achieves high storage capacity per nucleotide (427.1 PB/gram) and high fidelity of data recovery.
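The sequence constraints named in this abstract (maximum homopolymer length and a GC content window) can be checked with a few lines of Python. This is a minimal sketch, not the authors' code: the function names and default thresholds are assumptions chosen to mirror the figures quoted above, and the Gibbs free energy criterion is left out because it needs a secondary-structure prediction tool.

```python
from itertools import groupby


def max_homopolymer(seq):
    """Length of the longest run of one repeated nucleotide (0 for an empty sequence)."""
    return max((sum(1 for _ in run) for _, run in groupby(seq)), default=0)


def gc_content(seq):
    """Fraction of G and C nucleotides in the sequence (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_constraints(seq, max_run=4, gc_min=0.40, gc_max=0.60):
    """True if the sequence respects the homopolymer limit and the GC window.

    The defaults are illustrative assumptions taken from the abstract's
    "3 or 4 at most" and "between 40% and 60%".
    """
    return max_homopolymer(seq) <= max_run and gc_min <= gc_content(seq) <= gc_max
```

A candidate sequence that fails the check would be rejected or re-encoded before synthesis; how the codec itself does this is not described in the abstract.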

2021 ◽  
Author(s):  
Zhi Ping ◽  
Shihong Chen ◽  
Guangyu Zhou ◽  
Xiaoluo Huang ◽  
Sha Joe Zhu ◽  
...  

Abstract DNA is a promising data storage medium due to its remarkable durability and space-efficient storage. Early bit-to-base transcoding schemes have primarily pursued information density, at the expense of introducing biocompatibility challenges or risking decoding failure. Here, we propose a robust transcoding algorithm named the “Yin-Yang Codec” (YYC), using two rules to encode two binary bits into one nucleotide, to generate DNA sequences highly compatible with synthesis and sequencing technologies. We encoded two representative file formats and stored them in vitro as 200-nt oligo pools and in vivo as an ~54-kb DNA fragment in yeast cells. Sequencing results show that YYC exhibits high robustness and reliability for a wide variety of data types, with an average recovery rate of 99.94% at 10^4 molecule copies and an achieved recovery rate of 87.53% at 100 copies. In addition, the in vivo storage demonstration achieved for the first time an experimentally measured physical information density of 198.8 EB per gram of DNA (44% of the theoretical maximum for DNA).
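The idea of "two rules to encode two binary bits into one nucleotide" can be illustrated with a small sketch. The two tables below are hypothetical and chosen only so that the mapping is reversible; they are not the rule tables used by the authors, and the `start` base that seeds the context is likewise an assumption. One rule (`YANG`) assigns a bit to each base on its own, the other (`YIN`) assigns a bit that depends on the previous base, and the encoder picks the single base that satisfies both.

```python
# Hypothetical rule 1: each base carries one bit regardless of context.
YANG = {"A": 0, "T": 0, "C": 1, "G": 1}

# Hypothetical rule 2: the bit carried by a base depends on the previous base.
# Within each YANG class ({A, T} and {C, G}) the two bases get different bits,
# so exactly one base satisfies any (bit_a, bit_b) pair.
YIN = {
    "A": {"A": 0, "T": 1, "C": 0, "G": 1},
    "C": {"A": 0, "T": 1, "C": 0, "G": 1},
    "G": {"A": 1, "T": 0, "C": 1, "G": 0},
    "T": {"A": 1, "T": 0, "C": 1, "G": 0},
}


def encode(bits_a, bits_b, start="A"):
    """Merge two equal-length bit lists into one nucleotide string."""
    if len(bits_a) != len(bits_b):
        raise ValueError("both bit segments must have the same length")
    prev, out = start, []
    for a, b in zip(bits_a, bits_b):
        base = next(n for n in "ACGT" if YANG[n] == a and YIN[prev][n] == b)
        out.append(base)
        prev = base
    return "".join(out)


def decode(seq, start="A"):
    """Recover the two bit lists from a nucleotide string."""
    prev, bits_a, bits_b = start, [], []
    for base in seq:
        bits_a.append(YANG[base])
        bits_b.append(YIN[prev][base])
        prev = base
    return bits_a, bits_b
```

With these illustrative tables, `encode([0, 1, 1, 0], [1, 0, 1, 1])` yields `"TGCT"`, and `decode("TGCT")` returns the two original bit lists. A practical codec would also screen the output against synthesis and sequencing constraints, which this sketch does not attempt.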


2018 ◽  
Vol 22 (10) ◽  
pp. 2004-2007 ◽  
Author(s):  
Wentu Song ◽  
Kui Cai ◽  
Mu Zhang ◽  
Chau Yuen

2015 ◽  
Vol 7 (1) ◽  
pp. 69-84 ◽  
Author(s):  
Yi Jie Tong ◽  
Wei Qi Yan ◽  
Jin Yu

With an increasing number of personal computers introduced in schools, enterprises and other large organizations, the workloads of system administrators have been rising because of issues related to energy costs, IT expenses, PC replacement expenditures, data storage capacity, and information security. Application Virtualization (AV), however, has proven to be a successful, cost-effective solution to these problems. In this paper, the analytics of a Virtual Desktop Infrastructure (VDI) system for a campus network are considered. Our developed system is introduced and justified, and the rationale for these improvements is presented.


2009 ◽  
Vol 9 (1) ◽  
pp. 23-29
Author(s):  
Young-Gap You ◽  
Young-Jun Song ◽  
Dong-Woo Kim

Author(s):  
Mingliang Pan ◽  
Yi Zhong ◽  
Hui Lin ◽  
Hongran Bao ◽  
Lulu Zheng ◽  
...  

Persistent luminescence phosphors are regarded as one of the promising candidates for optical storage media. However, most optical storages using phosphors can only realize single-bit-data recording, limiting the storage capacity....


Author(s):  
Stephanie M Gogarten ◽  
Tamar Sofer ◽  
Han Chen ◽  
Chaoyu Yu ◽  
Jennifer A Brody ◽  
...  

Abstract. Summary: The Genomic Data Storage (GDS) format provides efficient storage and retrieval of genotypes measured by microarrays and sequencing. We developed GENESIS to perform various single- and aggregate-variant association tests using genotype data stored in GDS format. GENESIS implements highly flexible mixed models, allowing for different link functions, multiple variance components and phenotypic heteroskedasticity. GENESIS integrates cohesively with other R/Bioconductor packages to build a complete genomic analysis workflow entirely within the R environment. Availability and implementation: https://bioconductor.org/packages/GENESIS; vignettes included. Supplementary information: Supplementary data are available at Bioinformatics online.


2018 ◽  
Author(s):  
Xavier Delaunay ◽  
Aurélie Courtois ◽  
Flavien Gouillon

Abstract. The increasing volume of scientific datasets requires the use of compression to reduce data storage and transmission costs, especially for the oceanography and meteorological datasets generated by Earth observation mission ground segments. These data are mostly produced in NetCDF formatted files. The NetCDF-4/HDF5 file formats are widely used in the global scientific community because of the features they offer. In particular, HDF5 offers dynamically loaded filter plugins, which allow users to write filters, such as compression/decompression filters, that process the data before it is read from or written to disk. In this work, we evaluate the performance of lossy and lossless compression/decompression methods through NetCDF-4 and HDF5 tools on analytical and real scientific floating-point datasets. We also introduce the Digit Rounding algorithm, a new relative error bounded data reduction method inspired by the Bit Grooming algorithm. The Digit Rounding algorithm achieves a high compression ratio while preserving a given number of significant digits in the dataset. It achieves a higher compression ratio than the Bit Grooming algorithm while keeping a similar compression speed.
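The general principle behind relative error bounded reduction of floating-point data can be sketched as follows. This is neither the Digit Rounding nor the Bit Grooming algorithm from the paper; it is plain trailing-bit truncation of float32 mantissas, shown only to illustrate why discarding insignificant bits helps a lossless compressor. The function names and the rule of thumb of `ceil(nsd * log2(10))` mantissa bits are assumptions.

```python
import math
import struct
import zlib


def keep_mantissa_bits(value, nsd):
    """Zero the trailing float32 mantissa bits beyond roughly nsd decimal digits.

    Keeps ceil(nsd * log2(10)) of the 23 explicit mantissa bits (an assumed
    rule of thumb); the relative error against the float32 input is below
    2 ** -keep.
    """
    keep = min(23, math.ceil(nsd * math.log2(10)))
    mask = (0xFFFFFFFF << (23 - keep)) & 0xFFFFFFFF
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (out,) = struct.unpack("<f", struct.pack("<I", raw & mask))
    return out


def compressed_size(values):
    """Size in bytes of the values packed as float32 and compressed with zlib."""
    return len(zlib.compress(struct.pack(f"<{len(values)}f", *values)))
```

Comparing `compressed_size(data)` with `compressed_size([keep_mantissa_bits(v, 3) for v in data])` on a real dataset gives a rough idea of the gain; in an HDF5 workflow the same transformation would typically run inside a filter ahead of the lossless compressor.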


2020 ◽  
Vol 3 (1) ◽  
Author(s):  
Maximiliane Frölich ◽  
Dennis Hofheinz ◽  
Michael A. R. Meier

Abstract In recent years, the field of molecular data storage has grown from a niche into a vibrant research topic. Herein, we describe a simultaneous and automated read-out of data stored in mixtures of sequence-defined oligomers. To this end, twelve different sequence-defined tetramers and three hexamers with different mass markers and side chains are successfully synthesised via iterative Passerini three-component reactions and subsequent deprotection steps. By programming a straightforward Python script for ESI-MS/MS analysis, it is possible to automatically sequence, and thus read out, the information stored in these oligomers within one second. Most importantly, we demonstrate that the use of mass markers as starting compounds eases MS/MS data interpretation and furthermore allows the unambiguous reading of sequences in mixtures of sequence-defined oligomers. Thus, a high data storage capacity for the field of synthetic macromolecules (up to 64.5 bit in our examples) can be obtained without the need to synthesise long sequences, by instead mixing and simultaneously analysing shorter sequence-defined oligomers.

