Practical Efficient String Mining

2012 ◽  
Vol 24 (4) ◽  
pp. 735-744 ◽  
Author(s):  
Jasbir Dhaliwal ◽  
Simon J. Puglisi ◽  
Andrew Turpin
2010 ◽  
Vol 20-23 ◽  
pp. 653-658
Author(s):  
Zhan Xi Guo ◽  
Zhi Xin Ma ◽  
Yu Sheng Xu ◽  
Li Liu

Given m databases D1, ..., Dm of strings, the goal of frequent string mining is to find all strings that satisfy given constraints across all of the databases. This paper proposes a data structure for constructing the suffix and LCP tables that reduces the total space consumption of string mining. We demonstrate its use by optimizing the algorithm proposed by A. Kügel et al. [7] and present the improved algorithm. The space consumption of our algorithm is proportional to the length of the largest string across all databases. A comprehensive set of performance experiments shows that the processing rate improves because the new data structure holds fewer items.
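For readers unfamiliar with the suffix and LCP tables mentioned in this abstract, the following is a minimal sketch of the two tables for a single string. It is not the space-saving data structure the paper proposes, whose details the abstract does not give. The function names are hypothetical, the suffix array is built by naive sorting for brevity, and the LCP table uses the well-known linear-time method of Kasai et al.

```python
def suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order.

    Naive construction by sorting suffixes; fine for illustration,
    too slow and too memory-hungry for large inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_table(text, sa):
    """Return lcp, where lcp[r] is the length of the longest common prefix
    of the suffixes at sa[r - 1] and sa[r], and lcp[0] is 0 (Kasai et al.)."""
    n = len(text)
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r

    lcp = [0] * n
    h = 0  # length of the common prefix carried over from the previous step
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix that precedes suffix i in sorted order
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h - 1 characters
        else:
            h = 0
    return lcp


sa = suffix_array("banana")
lcp = lcp_table("banana", sa)
```

In this picture, a substring of length at most lcp[r] that is a prefix of the suffix at sa[r] also occurs at sa[r - 1], which is why runs of large LCP values in the table correspond to repeated, and therefore potentially frequent, strings.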


2014 ◽  
Vol 7 (23) ◽  
pp. 5063-5067
Author(s):  
K. Geetha Rani ◽  
Shobhanjaly P. Nair ◽  
P. Visu ◽  
S. Koteeswaran

2016 ◽  
Author(s):  
John A. Lees ◽  
Minna Vehkala ◽  
Niko Välimäki ◽  
Simon R. Harris ◽  
Claire Chewapreecha ◽  
...  

Bacterial genomes vary extensively in terms of both gene content and gene sequence – this plasticity hampers the use of traditional SNP-based methods for identifying all genetic associations with phenotypic variation. Here we introduce a computationally scalable and widely applicable statistical method (SEER) for the identification of sequence elements that are significantly enriched in a phenotype of interest. SEER is applicable to even tens of thousands of genomes by counting variable-length k-mers using a distributed string-mining algorithm. Robust options are provided for association analysis that also correct for the clonal population structure of bacteria. Using large collections of genomes of the major human pathogens Streptococcus pneumoniae and Streptococcus pyogenes, SEER identifies relevant previously characterised resistance determinants for several antibiotics and discovers potential novel factors related to the invasiveness of S. pyogenes. We thus demonstrate that our method can answer important biologically and medically relevant questions.
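The counting step described in this abstract can be illustrated with a much simpler stand-in. The sketch below is not SEER's implementation: it assumes a single fixed k rather than variable-length k-mers, uses an in-memory dictionary rather than a distributed string-mining algorithm, and applies no statistical test or population-structure correction. The function names and toy sequences are hypothetical.

```python
from collections import Counter


def kmer_presence(genomes, k):
    """Count, for each k-mer, how many genomes contain it at least once."""
    counts = Counter()
    for seq in genomes:
        # A set ensures each genome contributes at most once per k-mer.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def presence_by_group(cases, controls, k):
    """Map each k-mer to (number of case genomes, number of control genomes)
    in which it occurs."""
    in_cases = kmer_presence(cases, k)
    in_controls = kmer_presence(controls, k)
    return {
        kmer: (in_cases[kmer], in_controls[kmer])
        for kmer in in_cases.keys() | in_controls.keys()
    }


# Toy sequences, invented for illustration only.
table = presence_by_group(cases=["ACGT", "ACGA"], controls=["TCGA"], k=3)
```

A table of this shape is the kind of input an association test would consume: a k-mer present in most case genomes and few control genomes is a candidate for enrichment in the phenotype.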


Author(s):  
Mohamed Abouelhoda ◽  
Moustafa Ghanem
