A Novel Compression Algorithm for High-Throughput DNA Sequence Based on Huffman Coding Method

Author(s):  
Chuan He ◽  
Huaiqiu Zhu
Cancer Cell ◽  
2007 ◽  
Vol 12 (6) ◽  
pp. 501-513 ◽  
Author(s):  
Stefan Fröhling ◽  
Claudia Scholl ◽  
Ross L. Levine ◽  
Marc Loriaux ◽  
Titus J. Boggon ◽  
...  

2013 ◽  
Vol 842 ◽  
pp. 712-716
Author(s):  
Qi Hong ◽  
Xiao Lei Lu

As a lossless data compression scheme, Huffman coding is widely used in text compression. Nevertheless, the traditional approach has some deficiencies. For example, applying the same compression to all characters may overlook the particularity of keywords and special statements, as well as the regularity of some statements. To address this, a new data compression algorithm based on semantic analysis is proposed in this paper. The new method, which takes C language keywords as the basic element, is designed for compressing C language source files. Experimental results show that the compression ratio is improved by roughly 150 percent in this way. The method can be extended to text compression for other constrained languages.
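The idea of treating keywords as basic elements can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword subset `C_KEYWORDS`, the tokenizer, and the function names are assumptions made for illustration. Whole C keywords become single Huffman symbols, and all other text is coded character by character.

```python
import heapq
import re
from collections import Counter

# Illustrative subset of C keywords (assumption; the paper's list is not given here).
C_KEYWORDS = {"int", "return", "if", "else", "for", "while", "char", "void"}

def tokenize(source):
    """Split source into symbols: whole C keywords, otherwise single characters."""
    tokens = []
    for piece in re.findall(r"[A-Za-z_]\w*|.", source, flags=re.DOTALL):
        if piece in C_KEYWORDS:
            tokens.append(piece)   # keyword kept as one symbol
        else:
            tokens.extend(piece)   # identifier or other text, one char each
    return tokens

def huffman_codes(symbols):
    """Build a Huffman code table {symbol: bitstring} from a symbol sequence."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tiebreak, {symbol: code built so far}).
    heap = [(w, k, {s: ""}) for k, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def encode(source):
    """Return the bitstring for source and the code table used."""
    tokens = tokenize(source)
    codes = huffman_codes(tokens)
    return "".join(codes[t] for t in tokens), codes

def decode(bits, codes):
    """Invert encode() using the same code table."""
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)
```

Because a frequent keyword such as `return` receives one codeword instead of six, the encoded stream can be shorter than with per-character coding. How much shorter depends on the input and on the cost of storing the code table, which this sketch ignores.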


Genome ◽  
2001 ◽  
Vol 44 (4) ◽  
pp. 523-528 ◽  
Author(s):  
Raja Kota ◽  
Markus Wolf ◽  
Wolfgang Michalek ◽  
Andreas Graner

Recent advances in DNA sequence analysis and the establishment of high-throughput assays have provided the framework for large-scale discovery and analysis of DNA sequence variation. In this context, single nucleotide polymorphisms (SNPs) are of particular interest. To initiate a systematic approach to develop an SNP map of barley (Hordeum vulgare L.), we have employed denaturing high-performance liquid chromatography (DHPLC) to analyse segregating SNP patterns in a doubled-haploid (DH) mapping population. To this end, SNPs between the parental genotypes were identified using a direct sequencing approach. Once an SNP was established between the parents, the optimal melting temperature of the PCR fragment containing the SNP was predicted for its analysis by DHPLC. Following the detection of the optimal temperature, the DH lines were analysed for the presence of either of the alleles. To test the utility of the analysis, data from previously mapped RFLP markers from which these SNPs were derived were compared. Results from these experiments indicate that DHPLC can be efficiently employed in analysing SNPs on a high-throughput scale.

Key words: denaturing high performance liquid chromatography, doubled-haploid lines, restriction fragment length polymorphism, genetic mapping, molecular markers.


2015 ◽  
Vol 5 (4) ◽  
pp. 73-85 ◽  
Author(s):  
Subhankar Roy ◽  
Akash Bhagot ◽  
Kumari Annapurna Sharma ◽  
Sunirmal Khatua

2016 ◽  
Vol 2016 ◽  
pp. 1-7 ◽  
Author(s):  
Pamela Vinitha Eric ◽  
Gopakumar Gopalakrishnan ◽  
Muralikrishnan Karunakaran

This paper proposes a seed-based lossless compression algorithm for DNA sequences that uses a substitution method similar to the Lempel-Ziv compression scheme. The proposed method exploits the repetition structures that are inherent in DNA sequences by creating an offline dictionary which contains all such repeats along with the details of mismatches. By ensuring that only promising mismatches are allowed, the method achieves a compression ratio that is on par with or better than that of existing lossless DNA sequence compression algorithms.
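The general idea of seed-based substitution with recorded mismatches can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it uses greedy back-references into the already-processed prefix rather than an offline dictionary, and the names `compress`, `decompress`, `seed_len`, and `max_mismatches` are assumptions.

```python
def compress(seq, seed_len=4, max_mismatches=1):
    """Greedy parse of seq into ('lit', base) and
    ('rep', start, length, [(offset, base), ...]) operations."""
    ops, i, n = [], 0, len(seq)
    while i < n:
        best = None
        seed = seq[i:i + seed_len]
        # Look for earlier, non-overlapping exact copies of the seed.
        start = seq.find(seed) if len(seed) == seed_len else -1
        while start != -1 and start + seed_len <= i:
            length, mismatches = seed_len, []
            # Extend the match, tolerating a bounded number of mismatches.
            while i + length < n and start + length < i:
                if seq[start + length] != seq[i + length]:
                    if len(mismatches) == max_mismatches:
                        break
                    mismatches.append((length, seq[i + length]))
                length += 1
            if best is None or length > best[2]:
                best = ("rep", start, length, mismatches)
            start = seq.find(seed, start + 1)
        if best:
            ops.append(best)
            i += best[2]
        else:
            ops.append(("lit", seq[i]))
            i += 1
    return ops

def decompress(ops):
    """Rebuild the sequence by replaying literals and patched repeats."""
    out = []
    for op in ops:
        if op[0] == "lit":
            out.append(op[1])
        else:
            _, start, length, mismatches = op
            chunk = out[start:start + length]
            for offset, base in mismatches:
                chunk[offset] = base   # apply the recorded mismatch
            out.extend(chunk)
    return "".join(out)
```

For example, `compress("AACCGGTTACAACCGATTAC")` ends with the operation `("rep", 0, 10, [(5, "A")])`: the second half is a copy of the first ten bases with a single substitution. A practical scheme would also decide which mismatches are worth allowing and would entropy-code the operations, which this sketch omits.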


Gut Microbes ◽  
2013 ◽  
Vol 4 (2) ◽  
pp. 125-135 ◽  
Author(s):  
Matthew J. Hamilton ◽  
Alexa R. Weingarden ◽  
Tatsuya Unno ◽  
Alexander Khoruts ◽  
Michael J. Sadowsky

2016 ◽  
Vol 78 (6-4) ◽  
Author(s):  
Muhamad Azlan Daud ◽  
Muhammad Rezal Kamel Ariffin ◽  
S. Kularajasingam ◽  
Che Haziqah Che Hussin ◽  
Nurliyana Juhan ◽  
...  

A new compression algorithm is proposed to ensure that a modified Baptista symmetric cryptosystem, which is based on a chaotic dynamical system, is applicable. The Baptista symmetric cryptosystem is able to produce various ciphers in response to the same message input. This modified Baptista-type cryptosystem suffers from message expansion, which goes against the conventional methodology of a symmetric cryptosystem. A new lossless data compression algorithm based on ideas from Huffman coding for data transmission is proposed. This new compression mechanism does not face the problem of mapping elements from a domain which is much larger than its range. Our new algorithm circumvents this problem via a pre-defined codeword list. The proposed algorithm has fast encoding and decoding mechanisms and is proven analytically to be a lossless data compression technique.
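The abstract does not describe the codeword list itself. As a rough sketch of what coding against a pre-defined list can look like, the example below uses a hypothetical fixed, prefix-free table shared by sender and receiver. The table `CODEWORDS` and all function names are assumptions and are not taken from the paper.

```python
# Hypothetical fixed codeword list known to both parties (not from the paper).
CODEWORDS = {"a": "0", "b": "10", "c": "110", "d": "111"}

def is_prefix_free(codes):
    """True if no codeword is a prefix of another (needed for unique decoding)."""
    words = sorted(codes.values())
    # In sorted order, a word that prefixes any other also prefixes its successor.
    return all(not words[k + 1].startswith(words[k]) for k in range(len(words) - 1))

def encode(message, codes=CODEWORDS):
    """Concatenate the fixed codeword of each symbol."""
    return "".join(codes[ch] for ch in message)

def decode(bits, codes=CODEWORDS):
    """Read bits left to right, emitting a symbol whenever a codeword completes."""
    inverse = {w: s for s, w in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)
```

With a fixed table, no code tree has to be built or transmitted per message, and the prefix-free property is what makes decoding unambiguous and therefore lossless.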

