Efficiently Detecting Frequent Patterns in Biological Sequences

Mining Frequent Patterns with Wildcards from Biological Sequences

2007 IEEE International Conference on Information Reuse and Integration ◽

10.1109/iri.2007.4296642 ◽

2007 ◽

Cited By ~ 7

Author(s):

Yu He ◽

Xindong Wu ◽

Xingquan Zhu ◽

Abdullah N. Arslan

Keyword(s):

Frequent Patterns ◽

Biological Sequences


Frequent patterns mining in multiple biological sequences

Computers in Biology and Medicine ◽

10.1016/j.compbiomed.2013.07.009 ◽

2013 ◽

Vol 43 (10) ◽

pp. 1444-1452 ◽

Cited By ~ 11

Author(s):

Ling Chen ◽

Wei Liu

Keyword(s):

Frequent Patterns ◽

Biological Sequences


Genetic Similarity Analysis Based on Positive and Negative Sequence Patterns of DNA

Symmetry ◽

10.3390/sym12122090 ◽

2020 ◽

Vol 12 (12) ◽

pp. 2090

Author(s):

Yue Lu ◽

Long Zhao ◽

Zhao Li ◽

Xiangjun Dong

Keyword(s):

DNA Sequences ◽

Sequence Similarity ◽

Sequential Patterns ◽

Similarity Analysis ◽

Frequent Patterns ◽

Biological Sequences ◽

Biological Sequence ◽

Genetic Characteristics ◽

Missing Gene ◽

Sequence Similarity Analysis

Similarity analysis of DNA sequences can clarify the homology between sequences and predict their structure and the relationships between them. At the same time, the frequent patterns of biological sequences not only explain the genetic characteristics of the organism but also serve as relevant markers for certain events in biological sequences. However, most existing biological sequence similarity analysis methods target the entire sequential pattern and ignore missing gene fragments that may induce potential disease. The similarity analysis of sequences containing a missing gene item remains unexplored. Consequently, some sequences with missing bases are ignored or not effectively analyzed. Thus, this paper presents a new method for DNA sequence similarity analysis. Using this method, we first mined not only positive sequential patterns but also sequential patterns that were missing some of the base terms (collectively referred to as negative sequential patterns). Subsequently, we used these frequent patterns for similarity analysis on a two-dimensional plane. Several experiments were conducted to verify the effectiveness of this algorithm. The experimental results demonstrated that the algorithm can obtain various results through the selection of frequent sequential patterns and that accuracy and time efficiency were improved.
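The distinction between positive and negative sequential patterns can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes that a pattern matches a sequence when its bases occur in order (gaps allowed), and that a negative pattern "before, not missing, after" matches when the sequence contains before+after but not before+missing+after. The function names and the toy sequences are hypothetical, and the sequence list is assumed to be non-empty.

```python
from typing import Sequence


def is_subsequence(pattern: str, seq: str) -> bool:
    """True if the characters of pattern occur in seq in order (gaps allowed)."""
    it = iter(seq)
    # "ch in it" advances the iterator, so order is enforced.
    return all(ch in it for ch in pattern)


def support(pattern: str, sequences: Sequence[str]) -> float:
    """Fraction of sequences that contain the positive pattern."""
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


def negative_support(before: str, missing: str, after: str,
                     sequences: Sequence[str]) -> float:
    """Fraction of sequences matching <before, NOT missing, after>.

    Assumed definition: the sequence contains before+after as a
    subsequence but does not contain before+missing+after.
    """
    hits = sum(
        is_subsequence(before + after, s)
        and not is_subsequence(before + missing + after, s)
        for s in sequences
    )
    return hits / len(sequences)


# Toy data (hypothetical): four short DNA fragments.
dna = ["ACGT", "AGT", "ACT", "TTT"]
print(support("AT", dna))                     # positive pattern <A, T>
print(negative_support("A", "C", "T", dna))   # negative pattern <A, not C, T>
```

Patterns whose support exceeds a chosen threshold would count as frequent under this simplified definition; how the paper then maps them onto a two-dimensional plane is not specified in the abstract and is not sketched here.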


Frequent Patterns Algorithm of Biological Sequences based on Pattern Prefix-tree

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2019.4.3607 ◽

2019 ◽

Vol 14 (4) ◽

pp. 574-589

Author(s):

Linyan Xue ◽

Xiaoke Zhang ◽

Fei Xie ◽

Shuang Liu ◽

Peng Lin

Keyword(s):

Pattern Mining ◽

Sequence Data ◽

Biological Significance ◽

Frequent Pattern ◽

Frequent Patterns ◽

Biological Sequences ◽

Biological Sequence ◽

Protein Database ◽

Sequence Pattern ◽

Multiple Sequences

In bioinformatics applications, existing algorithms cannot directly and efficiently perform sequence pattern mining. This paper proposes two fast and efficient biological sequence pattern mining algorithms, one for a single biological sequence and one for multiple sequences. The concept of the basic pattern is introduced, and, after frequent basic patterns are mined, frequent patterns are mined by constructing prefix trees for the frequent basic patterns. The proposed algorithms thus rapidly mine frequent patterns of biological sequences based on pattern prefix trees. In the experiments, family sequence data from the Pfam protein database are used to verify the performance of the proposed algorithms. The results confirm that the proposed algorithms not only obtain mining results with effective biological significance but also improve the running-time efficiency of biological sequence pattern mining.
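The idea of storing frequent patterns in a prefix tree can be sketched in Python as follows. This is a simplified illustration, not the paper's algorithms: it assumes fixed-length substrings as a stand-in for "basic patterns", a plain occurrence count as the frequency measure, and a nested-dictionary trie. All function names, the `#count` marker key, and the toy sequence are hypothetical.

```python
from collections import Counter


def frequent_kmers(seq: str, k: int, min_count: int) -> dict:
    """Count every length-k substring and keep those seen at least min_count times."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {p: c for p, c in counts.items() if c >= min_count}


def build_prefix_tree(patterns: dict) -> dict:
    """Insert each pattern into a nested-dict trie; "#count" stores its frequency."""
    root = {}
    for pattern, count in patterns.items():
        node = root
        for ch in pattern:
            node = node.setdefault(ch, {})
        node["#count"] = count
    return root


def patterns_with_prefix(tree: dict, prefix: str) -> list:
    """Return (pattern, count) pairs for all stored patterns starting with prefix."""
    node = tree
    for ch in prefix:
        if ch not in node:
            return []
        node = node[ch]
    found, stack = [], [(prefix, node)]
    while stack:
        path, cur = stack.pop()
        for key, child in cur.items():
            if key == "#count":
                found.append((path, child))
            else:
                stack.append((path + key, child))
    return sorted(found)


# Toy single sequence (hypothetical).
tree = build_prefix_tree(frequent_kmers("ACGACTACGACT", k=3, min_count=2))
print(patterns_with_prefix(tree, "AC"))
```

Sharing prefixes in the tree means that patterns with a common beginning are stored and traversed once, which is the usual motivation for prefix-tree-based mining.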
