string mining Latest Research Papers

Accelerating pattern-based time series classification: a linear time and space string mining approach

Knowledge and Information Systems ◽

10.1007/s10115-019-01378-7 ◽

2019 ◽

Vol 62 (3) ◽

pp. 1113-1141 ◽

Cited By ~ 2

Author(s):

Atif Raza ◽

Stefan Kramer

Keyword(s):

Time Series ◽

Linear Time ◽

Time Series Classification ◽

Time And Space ◽

String Mining

Detecting new Chinese words from massive domain texts with word embedding

Journal of Information Science ◽

10.1177/0165551518786676 ◽

2018 ◽

Vol 45 (2) ◽

pp. 196-211 ◽

Cited By ~ 2

Author(s):

Yu Qian ◽

Yang Du ◽

Xiongwen Deng ◽

Baojun Ma ◽

Qiongwei Ye ◽

...

Keyword(s):

Word Embedding ◽

New Words ◽

Pruning Strategy ◽

String Mining ◽

Novel Method ◽

Correlated Information ◽

N Gram ◽

The Relationship ◽

Embedding Methods ◽

Word String

Textual information retrieval (TIR) is based on the relationship between word units. Traditional word segmentation techniques attempt to discern the word units accurately from texts; however, they are unable to appropriately and efficiently identify all new words. Identification of new words, especially in languages such as Chinese, remains a challenge. In recent years, word embedding methods have used numerical word vectors to retain the semantic and correlated information between words in a corpus. In this article, we propose the word-embedding-based method (WEBM), a novel method that combines word embedding and frequent n-gram string mining for discovering new words from domain corpora. First, we mapped all word units in a domain corpus to a high-dimension word vector space. Second, we used a frequent n-gram word string mining method to identify a set of candidates for new words. We designed a pruning strategy based on the word vectors to quantify the possibility of a word string being a new word, thereby allowing the evaluation of candidates based on the similarity of word units in the same string. In a comparative study, our experimental results revealed that WEBM had a great advantage in detecting new words from massive Chinese corpora.
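The two-stage idea in this abstract (mine frequent n-gram word strings, then prune candidates using word vectors) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the thresholds `min_count` and `min_sim`, and the scoring rule (mean pairwise cosine similarity of the units in a candidate) are hypothetical choices made here, not the WEBM formulation from the paper. The vectors are assumed to come from some pre-trained embedding and are passed in as a plain dictionary.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def frequent_ngrams(sentences, max_n=3, min_count=2):
    """Count contiguous word-unit n-grams (n >= 2) and keep the frequent ones."""
    counts = Counter()
    for units in sentences:
        for n in range(2, max_n + 1):
            for i in range(len(units) - n + 1):
                counts[tuple(units[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def prune_by_similarity(candidates, vectors, min_sim=0.5):
    """Keep candidates whose units are, on average, close in vector space.

    The mean pairwise cosine score is an assumption made for this sketch,
    not the pruning rule defined in the paper.
    """
    kept = {}
    for gram, count in candidates.items():
        if any(unit not in vectors for unit in gram):
            continue  # no vector available, so the candidate cannot be scored
        pairs = list(combinations(gram, 2))
        score = sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)
        if score >= min_sim:
            kept[gram] = (count, score)
    return kept
```

A caller would segment the corpus into word units first, pass the resulting token lists to `frequent_ngrams`, and then hand the surviving candidates and an embedding lookup to `prune_by_similarity`.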

Sequence element enrichment analysis to determine the genetic basis of bacterial phenotypes

10.1101/038463 ◽

2016 ◽

Cited By ~ 4

Author(s):

John A. Lees ◽

Minna Vehkala ◽

Niko Välimäki ◽

Simon R. Harris ◽

Claire Chewapreecha ◽

...

Keyword(s):

Enrichment Analysis ◽

Genetic Associations ◽

Human Pathogens ◽

Bacterial Genomes ◽

Sequence Element ◽

Clonal Population ◽

Sequence Elements ◽

String Mining ◽

Clonal Population Structure ◽

Element Enrichment

Bacterial genomes vary extensively in terms of both gene content and gene sequence – this plasticity hampers the use of traditional SNP-based methods for identifying all genetic associations with phenotypic variation. Here we introduce a computationally scalable and widely applicable statistical method (SEER) for the identification of sequence elements that are significantly enriched in a phenotype of interest. SEER is applicable to even tens of thousands of genomes by counting variable-length k-mers using a distributed string-mining algorithm. Robust options are provided for association analysis that also correct for the clonal population structure of bacteria. Using large collections of genomes of the major human pathogens Streptococcus pneumoniae and Streptococcus pyogenes, SEER identifies relevant previously characterised resistance determinants for several antibiotics and discovers potential novel factors related to the invasiveness of S. pyogenes. We thus demonstrate that our method can answer important biologically and medically relevant questions.
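As a rough illustration of the k-mer enrichment idea, the sketch below records which samples contain each k-mer and computes a one-sided hypergeometric tail probability for the overlap with a phenotype group. It is a deliberate simplification and not SEER itself: it uses a single fixed k rather than variable-length k-mers, runs on one machine, and applies no correction for population structure. The names `kmer_presence` and `enrichment_p` are hypothetical.

```python
from math import comb


def kmer_presence(genomes, k):
    """Map each k-mer to the set of sample names whose sequence contains it."""
    presence = {}
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            presence.setdefault(seq[i:i + k], set()).add(name)
    return presence


def enrichment_p(with_kmer, cases, n_samples):
    """One-sided hypergeometric tail.

    Probability that at least the observed number of case samples carry the
    k-mer if carriers were drawn at random from all samples.
    """
    n_with = len(with_kmer)
    n_cases = len(cases)
    observed = len(with_kmer & cases)
    total = comb(n_samples, n_with)
    tail = sum(
        comb(n_cases, x) * comb(n_samples - n_cases, n_with - x)
        for x in range(observed, min(n_cases, n_with) + 1)
    )
    return tail / total
```

In practice the resulting p-values would also need multiple-testing correction, since the number of distinct k-mers is large.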

MIST: Top-k Approximate Sub-string Mining Using Triplet Statistical Significance

Lecture Notes in Computer Science - Advances in Information Retrieval ◽

10.1007/978-3-319-16354-3_31 ◽

2015 ◽

pp. 284-290 ◽

Cited By ~ 1

Author(s):

Sourav Dutta

Keyword(s):

Statistical Significance ◽

String Mining

Expertised String Mining in Outsized Databases and Hefty Files

Research Journal of Applied Sciences Engineering and Technology ◽

10.19026/rjaset.7.900 ◽

2014 ◽

Vol 7 (23) ◽

pp. 5063-5067

Author(s):

K. Geetha Rani ◽

Shobhanjaly P. Nair ◽

P. Visu ◽

S. Koteeswaran

Keyword(s):

String Mining

Practical Efficient String Mining

IEEE Transactions on Knowledge and Data Engineering ◽

10.1109/tkde.2010.242 ◽

2012 ◽

Vol 24 (4) ◽

pp. 735-744 ◽

Cited By ~ 7

Author(s):

Jasbir Dhaliwal ◽

Simon J. Puglisi ◽

Andrew Turpin

Keyword(s):

String Mining

Distributed String Mining for High-Throughput Sequencing Data

Lecture Notes in Computer Science - Algorithms in Bioinformatics ◽

10.1007/978-3-642-33122-0_35 ◽

2012 ◽

pp. 441-452 ◽

Cited By ~ 4

Author(s):

Niko Välimäki ◽

Simon J. Puglisi

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

String Mining

An Optimized LCP Table Based Algorithm for Frequent String Mining

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.20-23.653 ◽

2010 ◽

Vol 20-23 ◽

pp. 653-658

Author(s):

Zhan Xi Guo ◽

Zhi Xin Ma ◽

Yu Sheng Xu ◽

Li Liu

Keyword(s):

Data Structure ◽

Total Space ◽

Processing Rate ◽

Comprehensive Performance ◽

String Mining ◽

String Databases ◽

Improved Algorithm

Given m databases D1,...,Dm of strings, the purpose of frequent string mining is to find all strings that fulfill certain constraints across all string databases. In this paper, a data structure is proposed for constructing the suffix and LCP tables that reduces the total space consumption of string mining. We demonstrate the use of this data structure by optimizing the algorithm proposed by A. Kügel et al. [7] and present the improved algorithm. The space consumption of our algorithm is proportional to the length of the largest string of all databases. A set of comprehensive performance experiments shows that the processing rate is enhanced because the number of items is reduced in the new data structure.
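For readers unfamiliar with the suffix and LCP tables mentioned here, the following sketch shows the textbook versions on a single string and how the LCP table exposes substrings that meet a minimum frequency. It is not the optimized data structure from the paper, and it ignores the multi-database setting: the suffix array is built by naive sorting, the LCP table by Kasai's algorithm, and `frequent_substrings` is a hypothetical helper written for this example.

```python
def suffix_array(text):
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_table(text, sa):
    """Kasai's algorithm: lcp[r] is the common prefix length of suffixes sa[r-1] and sa[r]."""
    n = len(text)
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def frequent_substrings(text, length, min_count):
    """Substrings of the given length that occur at least min_count times."""
    sa = suffix_array(text)
    lcp = lcp_table(text, sa)
    result = {}
    run_start = 0
    # A maximal run of adjacent suffixes with lcp >= length shares one prefix.
    for r in range(1, len(sa) + 1):
        if r == len(sa) or lcp[r] < length:
            count = r - run_start
            first = sa[run_start]
            if count >= min_count and first + length <= len(text):
                result[text[first:first + length]] = count
            run_start = r
    return result
```

For the string "banana", the suffix array is [5, 3, 1, 0, 4, 2] and the LCP table is [0, 1, 3, 0, 0, 2], so the substrings of length 2 that occur at least twice are "an" and "na".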

String Mining in Bioinformatics

Scientific Data Mining and Knowledge Discovery ◽

10.1007/978-3-642-02788-8_9 ◽

2009 ◽

pp. 207-247 ◽

Cited By ~ 3

Author(s):

Mohamed Abouelhoda ◽

Moustafa Ghanem

Keyword(s):

String Mining

Space Efficient String Mining under Frequency Constraints

2008 Eighth IEEE International Conference on Data Mining ◽

10.1109/icdm.2008.32 ◽

2008 ◽

Cited By ~ 14

Author(s):

Johannes Fischer ◽

Veli Mäkinen ◽

Niko Välimäki

Keyword(s):

Frequency Constraints ◽

String Mining
