Towards Practical and Robust DNA-based Data Archiving by Codec System Named ‘Yin-Yang’
Abstract

Motivation
DNA has been reported as a promising medium for data storage because of its remarkable durability and space-efficient storage capacity. Here, we propose a robust DNA-based data storage method based on a new codec algorithm named ‘Yin-Yang’.

Results
Using this strategy, we successfully stored files of different formats in a single synthetic DNA oligonucleotide pool. Compared with most well-established DNA-based data storage coding schemes presented to date, this codec system can meet a variety of user goals (e.g. limiting homopolymer length to at most 3 or 4, keeping GC content balanced between 40% and 60%, and maintaining a simple secondary structure with a Gibbs free energy above −30 kcal/mol). It also shows enhanced robustness when transcoding different data structures, as well as practical feasibility. We tested this codec in an end-to-end experiment comprising encoding, DNA synthesis, sequencing and decoding. With the successful retrieval of 3 files totaling 2.02 Megabits after sequencing and decoding, our strategy achieves a high storage density (427.1 PB/gram) and high fidelity of data recovery.
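
The sequence-level goals named in the abstract (a bounded homopolymer length and a GC content between 40% and 60%) can be illustrated with a minimal screening sketch in Python. This is not the Yin-Yang codec itself: the function names (`max_homopolymer_run`, `gc_content`, `meets_constraints`) and the default thresholds are illustrative assumptions, and the Gibbs free energy criterion is left out because it would require an external secondary-structure prediction tool.

```python
def max_homopolymer_run(seq: str) -> int:
    """Return the length of the longest run of identical consecutive bases."""
    longest = 0
    run = 0
    prev = None
    for base in seq:
        # Extend the current run if the base repeats, otherwise start a new run.
        run = run + 1 if base == prev else 1
        longest = max(longest, run)
        prev = base
    return longest


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def meets_constraints(seq: str, max_run: int = 4,
                      gc_min: float = 0.40, gc_max: float = 0.60) -> bool:
    """Check a candidate sequence against the homopolymer and GC goals.

    The defaults mirror the abstract's figures (homopolymer length of at
    most 4, GC content between 40% and 60%); they are assumptions of this
    sketch, not parameters taken from the authors' implementation.
    """
    seq = seq.upper()
    return (max_homopolymer_run(seq) <= max_run
            and gc_min <= gc_content(seq) <= gc_max)
```

A check of this kind would typically be applied to each candidate oligonucleotide after encoding, so that sequences violating the chosen thresholds can be rejected or re-encoded; passing `max_run=3` applies the stricter homopolymer limit.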