Improved read/write cost tradeoff in DNA-based data storage using LDPC codes

2019
Author(s):  
Shubham Chandak ◽  
Kedar Tatwawadi ◽  
Billy Lau ◽  
Jay Mardia ◽  
Matthew Kubit ◽  
...  

Abstract: With the amount of data being stored increasing rapidly, there is significant interest in exploring alternative storage technologies. In this context, DNA-based storage systems can offer significantly higher storage densities (petabytes/gram) and durability (thousands of years) than current technologies. Specifically, DNA has been found to be stable over extended periods of time, as demonstrated by the analysis of organisms long since extinct. Recent advances in DNA sequencing and synthesis pipelines have made DNA-based storage a promising candidate for the storage technology of the future.

Recently, there have been multiple efforts in this direction, focusing on aspects such as error correction for synthesis/sequencing errors and erasure correction for handling missing sequences. The typical approach is to use separate codes for handling errors and erasures, but there is limited understanding of the efficiency of this framework. Furthermore, the existing techniques use short block-length codes and rely heavily on read consensus, both of which are known to be suboptimal in coding theory.

In this work, we study the tradeoff between the writing and reading costs involved in DNA-based storage and propose a practical scheme to achieve an improved tradeoff between these quantities. Our scheme breaks with the traditional separation framework and instead uses a single large block-length LDPC code for both erasure and error correction. We also introduce novel techniques to handle insertion and deletion errors introduced by the synthesis process. For a range of writing costs, the proposed scheme achieves 30-40% lower reading costs than state-of-the-art techniques on experimental data obtained using array synthesis and Illumina sequencing.

The code, data, and Supplementary Material are available at https://github.com/shubhamchandak94/LDPC_DNA_storage.
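As a concrete illustration of the modulation layer such schemes build on, here is a minimal sketch that packs binary data into DNA bases at two bits per nucleotide. The mapping and function names are assumptions for illustration only; the authors' actual encoder (including the LDPC parity computation and insertion/deletion handling) is in the linked repository.

# Minimal sketch: pack bits into DNA bases at 2 bits/base and back.
# Illustrative only; a real pipeline would first add LDPC parity bits
# and then protect against synthesis/sequencing errors.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def bits_to_dna(bits):
    """Encode an even-length bitstring as a DNA sequence."""
    assert len(bits) % 2 == 0
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def dna_to_bits(seq):
    """Decode a DNA sequence back to the original bitstring."""
    return "".join(BASE_TO_BITS[base] for base in seq)

assert dna_to_bits(bits_to_dna("01110010")) == "01110010"  # round trip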

2021
Vol 12 (1)
Author(s):  
George D. Dickinson ◽  
Golam Md Mortuza ◽  
William Clay ◽  
Luca Piantanida ◽  
Christopher M. Green ◽  
...  

Abstract: DNA is a compelling alternative to non-volatile information storage technologies due to its information density, stability, and energy efficiency. Previous studies have used artificially synthesized DNA to store data and automated next-generation sequencing to read it back. Here, we report digital Nucleic Acid Memory (dNAM) for applications that require a limited amount of data to have high information density, redundancy, and copy number. In dNAM, data is encoded by selecting combinations of single-stranded DNA with (1) or without (0) docking-site domains. When self-assembled with scaffold DNA, staple strands form DNA origami breadboards. Information encoded into the breadboards is read by monitoring the binding of fluorescent imager probes using DNA-PAINT super-resolution microscopy. To enhance data retention, a multi-layer error correction scheme that combines fountain and bi-level parity codes is used. As a prototype, fifteen origami encoded with ‘Data is in our DNA!\n’ are analyzed. Each origami encodes unique data-droplet, index, orientation, and error-correction information. The error-correction algorithms fully recover the message when individual docking sites, or entire origami, are missing. Unlike other approaches to DNA-based data storage, reading dNAM does not require sequencing. As such, it offers an additional path to explore the advantages and disadvantages of DNA as an emerging memory material.
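The "bi-level parity" layer lends itself to a small worked example. The sketch below (matrix size and names are assumptions, not the paper's actual layout) shows how row and column parities over a bit matrix jointly locate and correct a single flipped bit; dNAM combines this idea with fountain codes and per-origami indexing.

# Toy illustration of bi-level (row/column) parity on a bit matrix.
# Row and column parities together locate and flip a single bit error.
def parities(matrix):
    rows = [sum(r) % 2 for r in matrix]
    cols = [sum(c) % 2 for c in zip(*matrix)]
    return rows, cols

data = [[1, 0, 1],
        [0, 1, 1],
        [1, 1, 0]]
row_p, col_p = parities(data)

# Simulate a single bit flip during readout.
corrupted = [row[:] for row in data]
corrupted[1][2] ^= 1

# Mismatched parities identify the flipped bit's row and column.
new_row_p, new_col_p = parities(corrupted)
bad_r = next(i for i, (a, b) in enumerate(zip(row_p, new_row_p)) if a != b)
bad_c = next(j for j, (a, b) in enumerate(zip(col_p, new_col_p)) if a != b)
corrupted[bad_r][bad_c] ^= 1
assert corrupted == data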


2019
Author(s):  
Md. Jakaria ◽  
Kowshika Sarker ◽  
Mostofa Rafid Uddin ◽  
Md. Mohaiminul Islam ◽  
Trisha Das ◽  
...  

Abstract: The propitious developments in molecular biology and next-generation sequencing have enabled the possibility of DNA storage technologies. However, the power of the genomic revolution has not been fully utilized in clinical medicine, given the lack of transition from research to real-world clinical practice. This points to an increasing need for an operating system that allows for the transition from research to clinical use. We present eMED-DNA, an in silico operating system for archiving and managing all forms of electronic health records (EHRs) within one's own copy of the sequenced genome, to aid in the application and integration of genomic medicine within real-world clinical practice. We incorporated an efficient and sophisticated in-DNA file management system for the lossless management of EHRs within a genome. This represents the first in silico integrative system to bring closer the utopian ideal of integrating genotypic data with phenotypic clinical data for future medical practice.


Author(s):  
Rohitkumar R Upadhyay

Abstract: Hamming codes are the first nontrivial family of error-correcting codes: they can correct one error in a block of binary symbols. In this paper we extend the notion of error correction to error reduction and present several decoding methods with the goal of improving the error-reducing capabilities of Hamming codes. First, the error-reducing properties of Hamming codes with standard decoding are demonstrated and explored. We show a lower bound on the average number of errors present in a decoded message when two errors are introduced by the channel, for general Hamming codes. Other decoding algorithms are investigated experimentally, and it is found that these algorithms improve the error-reduction capabilities of Hamming codes beyond the aforementioned lower bound of standard decoding.

Keywords: coding theory, Hamming codes, Hamming distance
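For reference, a minimal sketch of the standard Hamming(7,4) encoder and syndrome decoder that the paper's error-reduction analysis builds on (variable and function names are illustrative):

# Hamming(7,4) sketch: encode 4 data bits, correct any single bit error.
# Positions are 1-indexed; parity bits sit at positions 1, 2, and 4.
def encode(d):  # d = [d1, d2, d3, d4]
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def decode(c):  # returns the corrected data bits
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-indexed position of the error
    if syndrome:
        c = c[:]
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

word = encode([1, 0, 1, 1])
word[4] ^= 1  # channel flips one bit
assert decode(word) == [1, 0, 1, 1]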


2011
Vol 341-342
pp. 700-704
Author(s):  
Bai Yi Huang

Flash-based solid state disks (SSDs) are a data storage technology that exploits flash memory to implement data storage, in contrast to mechanical data storage technologies. It has been argued in both theory and practice that SSD devices outperform mechanical devices. To improve the efficiency of a flash-memory SSD device, it is important that the device be designed to support parallel operations.
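A hypothetical sketch of what channel-level parallelism means in practice: a write is striped across flash channels that can be programmed concurrently. The channel count, function names, and in-memory stand-ins below are assumptions for illustration, not a real controller design.

# Hypothetical sketch of channel-level parallelism in an SSD controller:
# a write is striped across independent flash channels that can be
# programmed concurrently. The dicts stand in for real flash packages.
from concurrent.futures import ThreadPoolExecutor

NUM_CHANNELS = 4
PAGE_SIZE = 4096
channels = [dict() for _ in range(NUM_CHANNELS)]  # channel -> {page: data}

def program_page(channel_id, page_no, data):
    channels[channel_id][page_no] = data  # stands in for a flash program op

def striped_write(start_page, buf):
    pages = [buf[i:i + PAGE_SIZE] for i in range(0, len(buf), PAGE_SIZE)]
    with ThreadPoolExecutor(max_workers=NUM_CHANNELS) as pool:
        for k, page in enumerate(pages):
            page_no = start_page + k
            pool.submit(program_page, page_no % NUM_CHANNELS,
                        page_no // NUM_CHANNELS, page)

striped_write(0, b"x" * (PAGE_SIZE * 8))  # 8 pages land on 4 channels, 2 each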


Author(s):  
Anupama C. Raman

Unstructured data is growing exponentially. Present-day storage infrastructures like Storage Area Networks and Network Attached Storage are not well suited to storing huge volumes of unstructured data. This has led to the development of new types of storage technologies like object-based storage. Huge amounts of structured and unstructured data that need to be made available in real time for analytical insights are referred to as Big Data. On account of the distinct nature of big data, the storage infrastructures for storing it should possess some specific features. In this chapter, the authors examine the various storage technology options that are available today and their suitability for storing big data. The chapter also provides a bird's-eye view of cloud storage technology, which is widely used for big data storage.
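A minimal sketch of the object-storage abstraction discussed above, assuming a toy in-memory store: objects live in a flat namespace, addressed by key, with metadata kept alongside the data rather than in a file-system hierarchy. Real object stores add replication, erasure coding, and distributed placement.

# Toy object store: flat key namespace, data plus metadata per object.
class ObjectStore:
    def __init__(self):
        self._objects = {}

    def put(self, key, data, metadata=None):
        self._objects[key] = (bytes(data), dict(metadata or {}))

    def get(self, key):
        data, metadata = self._objects[key]
        return data, metadata

store = ObjectStore()
store.put("logs/2021/01/events.json", b'{"clicks": 42}',
          {"content-type": "application/json"})
data, meta = store.get("logs/2021/01/events.json")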


Author(s):  
Julian Ray

This chapter identifies and discusses issues associated with integrating technologies for storing spatial data into business information technology frameworks. A new taxonomy of spatial data storage systems is developed, differentiating storage systems by the system architectures used to enable interaction between client applications and physical spatial data stores, and by the methods used by client applications to query and return spatial data. Five distinct storage models are identified and discussed, along with current examples of vendor implementations. Building on this initial discussion, the chapter identifies a variety of issues pertaining to spatial data storage systems that affect three distinct aspects of technology adoption: systems design, systems implementation, and management of completed systems. Current issues associated with each of these three aspects are described and illustrated, along with a discussion of emerging trends in spatial data storage technologies. As spatial data and the technologies designed to store and manipulate it become more prevalent, understanding the potential impacts these technologies may have on other technology decisions within an organization becomes increasingly important. Furthermore, understanding how these technologies can introduce security risks and other vulnerabilities into a computing framework is critical to successful implementation.
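As a toy illustration of the "query and return spatial data" axis of the taxonomy, the sketch below filters point features by a bounding box on the client side (the function name and data are made up); production spatial stores push this predicate into the database via spatial indexes such as R-trees.

# Simplest spatial query: client-side bounding-box filter over points.
def bbox_query(points, min_x, min_y, max_x, max_y):
    """Return the points falling inside the axis-aligned bounding box."""
    return [(x, y) for (x, y) in points
            if min_x <= x <= max_x and min_y <= y <= max_y]

stores = [(-71.06, 42.36), (-73.99, 40.73), (-87.63, 41.88)]
print(bbox_query(stores, -75.0, 40.0, -70.0, 43.0))  # the two eastern points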


Author(s):  
D. Chakraborty ◽  
G. Chakraborty ◽  
N. Shiratori

The advancement in optical fiber and switching technologies has resulted in a new generation of high-speed networks that can achieve speeds of up to a few gigabits per second. Also, progress in audio, video, and data storage technologies has given rise to new distributed real-time applications. These applications may involve multimedia, which requires low end-to-end delay. The applications' requirements, such as end-to-end delay, delay jitter, and loss rate, are expressed as QoS parameters, which must be guaranteed. In addition, many of these new applications involve multiple users, hence the importance of multicast communication. Multimedia applications are becoming increasingly important, as networks are now capable of carrying continuous media traffic, such as voice and video, to the end user. When a large amount of information must be transmitted to a subset of hosts, multicast is the most efficient way to deliver it. This article addresses different multicast routing algorithms and protocols. We also discuss QoS multicast routing and conclude the article with mobile multicasting.
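One concrete multicast routing approach such surveys cover is the source-based shortest-path tree; the sketch below builds one centrally with Dijkstra's algorithm. The topology and names are made up, and real protocols (e.g. DVMRP or PIM) compute comparable trees in a distributed fashion.

# Source-based multicast tree via Dijkstra: each receiver reaches the
# source along shortest-path parent pointers.
import heapq

def shortest_path_tree(graph, source):
    """graph: {node: [(neighbor, cost), ...]} -> {node: parent} tree."""
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                parent[v] = u
                heapq.heappush(heap, (d + w, v))
    return parent

net = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("C", 5)],
       "B": [("C", 1)], "C": []}
tree = shortest_path_tree(net, "S")  # C <- B <- A <- S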


Author(s):  
Mehdi Asheghi

The magnetic data storage industry has followed a density (and data rate) improvement curve similar to that of semiconductor technology (Moore's Law) for the past decade. However, whether storage densities will continue to increase at this rate and keep up with improvements in processor technology is under near-term threat from the fundamental physics upon which hard disk drives are based. It is expected that novel, more unconventional technological solutions will become necessary to overcome these limitations; however, many of these technologies rely heavily on heating and energy transport at extremely short time and length scales. It is widely believed that further advances in high-technology data storage systems will be difficult, if not impossible, without rigorous treatment of nano-scale energy transport. The nano-scale heat transfer research effort at the Data Storage Systems Center (DSSC) has focused on three interwoven areas: thermal design, failure analysis, and metrology of micro/nano-devices and structures relevant to data storage technologies. In this presentation, the underlying physics and fundamentals of heat transport at the nanoscale will be discussed. In addition, applications of nanoscale heat transfer to the thermal analysis of magnetic and phase-change optical data storage technologies will be presented.
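As a point of contrast for the nanoscale regime, the sketch below integrates the classical 1-D Fourier heat-diffusion baseline with explicit finite differences (all parameters are illustrative); the abstract's point is precisely that at nanometer and nanosecond scales this continuum picture breaks down and sub-continuum treatments are needed.

# Classical 1-D heat diffusion, explicit finite differences.
def diffuse(T, alpha, dx, dt, steps):
    """March dT/dt = alpha * d2T/dx2 forward in time (fixed ends)."""
    r = alpha * dt / dx**2
    assert r <= 0.5, "explicit scheme stability limit"
    T = list(T)
    for _ in range(steps):
        T = [T[0]] + [T[i] + r * (T[i+1] - 2*T[i] + T[i-1])
                      for i in range(1, len(T) - 1)] + [T[-1]]
    return T

# A hot spot relaxing in a cold wire:
profile = [300.0] * 20
profile[10] = 400.0
print(diffuse(profile, alpha=1e-6, dx=1e-4, dt=4e-3, steps=100))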

