An Affinity Propagation-Based DNA Motif Discovery Algorithm

BioMed Research International ◽

10.1155/2015/853461 ◽

2015 ◽

Vol 2015 ◽

pp. 1-10 ◽

Cited By ~ 5

Author(s):

Chunxiao Sun ◽

Hongwei Huo ◽

Qiang Yu ◽

Haitao Guo ◽

Zhigang Sun

Keyword(s):

DNA Sequences ◽

Motif Discovery ◽

Simulated Data ◽

Biological Data ◽

Affinity Propagation ◽

Local Optimum ◽

Data Sets ◽

DNA Motif ◽

Challenging Tasks ◽

DNA Motif Discovery

The planted (l, d) motif search (PMS) is one of the fundamental problems in bioinformatics and plays an important role in locating transcription factor binding sites (TFBSs) in DNA sequences. Identifying weak motifs and reducing the effect of local optima remain important but challenging tasks for motif discovery. To address these tasks, we propose a new algorithm, APMotif, which first applies Affinity Propagation (AP) clustering to DNA sequences to produce informative candidate motifs and then employs Expectation Maximization (EM) refinement to obtain the optimal motifs from those candidates. Experimental results on both simulated and real biological data sets show that APMotif usually outperforms four other widely used algorithms in prediction accuracy.
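To make the planted (l, d) motif definition concrete, the following minimal Python sketch checks whether a candidate l-mer occurs in every input sequence with at most d mismatches. It illustrates the problem statement only, not APMotif itself; the function names `hamming`, `min_distance`, and `is_planted_motif` are hypothetical and do not come from the paper.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(motif: str, sequence: str) -> int:
    """Smallest Hamming distance between motif and any l-mer of sequence.

    Returns len(motif) + 1 when the sequence is shorter than the motif,
    so that such a sequence can never satisfy a mismatch bound d <= l.
    """
    l = len(motif)
    return min(
        (hamming(motif, sequence[i:i + l])
         for i in range(len(sequence) - l + 1)),
        default=l + 1,
    )


def is_planted_motif(motif: str, sequences: list[str], d: int) -> bool:
    """True if every sequence holds an l-mer within d mismatches of motif."""
    return all(min_distance(motif, s) <= d for s in sequences)


if __name__ == "__main__":
    seqs = ["ACGTACGT", "TTACGAAA", "GGGACTTC"]
    print(is_planted_motif("ACGT", seqs, 1))
    print(is_planted_motif("ACGT", seqs, 0))
```

This exhaustive check only verifies a given candidate. Finding the motif requires searching over candidates, which is where the clustering and refinement steps described in the abstract come in.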

Comparative Analysis of DNA Motif Discovery Algorithms: A Systemic Review

Current Cancer Therapy Reviews ◽

10.2174/1573394714666180417161728 ◽

2019 ◽

Vol 15 (1) ◽

pp. 4-26

Author(s):

Fatma A. Hashim ◽

Mai S. Mabrouk ◽

Walid A.L. Atabany

Keyword(s):

DNA Sequences ◽

Motif Discovery ◽

Probabilistic Approach ◽

Biological Data ◽

Systemic Review ◽

Local Optimum ◽

DNA Motif ◽

Functional Features ◽

DNA Motif Discovery ◽

Discovery Algorithms

Background: Bioinformatics is an interdisciplinary field that combines biology and information technology to study how to handle biological data. DNA motif discovery is a major challenge in genome biology, and its importance grows with the sequencing technologies that produce large amounts of data. A DNA motif is a repeated portion of DNA sequence of major biological interest, with important structural and functional features. Motif discovery plays a vital role in antibody-biomarker identification, which is useful for disease diagnosis, and in identifying Transcription Factor Binding Sites (TFBSs), which help in learning the mechanisms that regulate gene expression. Recently, scientists discovered that the TFs have a mutation rate five times higher than the flanking sequences, so motif discovery also has a crucial role in cancer discovery. Methods: Over the past decades, many attempts have used different algorithms to design fast and accurate motif discovery tools. These algorithms are generally classified as consensus or probabilistic approaches. Results: Many DNA motif discovery algorithms are time-consuming and easily trapped in a local optimum. Conclusion: Nature-inspired algorithms and many combinatorial algorithms have recently been proposed to overcome the problems of the consensus and probabilistic approaches. This paper presents a general classification of motif discovery algorithms with new sub-categories, along with a summary comparison between them.

Efficient DNA Motif Discovery Using Modified Genetic Algorithm

International Journal of Computational Intelligence and Applications ◽

10.1142/s146902681350017x ◽

2013 ◽

Vol 12 (03) ◽

pp. 1350017

Author(s):

Essam Al Daoud

Keyword(s):

Genetic Algorithm ◽

DNA Sequences ◽

Motif Discovery ◽

Consensus Algorithms ◽

Gene Position ◽

DNA Motif ◽

Implementation Time ◽

DNA Motif Discovery ◽

Standard Genetic Algorithm ◽

New Distribution

In this study, a new genetic algorithm was developed to discover the best motifs in a set of DNA sequences. The main steps were: finding the potential positions in each sequence using a few voters (1–5 sequences), constructing the chromosomes from the potential positions, evaluating the fitness of each gene (position) and of each chromosome, calculating the new random distribution, and using the new distribution to generate the next generation. To verify the effectiveness of the proposed algorithm, several real and artificial datasets were used; the results were compared with those of the standard genetic algorithm and of the Gibbs, MEME, and consensus algorithms. Although all the algorithms have low correlation with the correct motifs, the new algorithm exhibits higher accuracy without sacrificing implementation time.

PairMotifChIP: A Fast Algorithm for Discovery of Patterns Conserved in Large ChIP-seq Data Sets

BioMed Research International ◽

10.1155/2016/4986707 ◽

2016 ◽

Vol 2016 ◽

pp. 1-10 ◽

Cited By ~ 3

Author(s):

Qiang Yu ◽

Hongwei Huo ◽

Dazheng Feng

Keyword(s):

DNA Sequences ◽

Motif Discovery ◽

High Throughput Sequencing ◽

Hamming Distance ◽

Simulated Data ◽

Real Data ◽

Identification Accuracy ◽

Data Sets ◽

Sequencing Data ◽

Data Set

Identifying conserved patterns in DNA sequences, namely, motif discovery, is an important and challenging computational task. A high-throughput sequencing data set, which contains hundreds of sequences or more, helps improve the identification accuracy of motif discovery but demands even higher computing performance. To efficiently identify motifs in large DNA data sets, a new algorithm called PairMotifChIP is proposed, which extracts and combines pairs of l-mers in the input that have a relatively small Hamming distance. In particular, a method for rapidly extracting pairs of l-mers is designed, which can be used not only for PairMotifChIP but also for other DNA data mining tasks with the same demand. Experimental results on simulated data show that the proposed algorithm can find motifs successfully and runs faster than state-of-the-art motif discovery algorithms. Furthermore, the validity of the proposed algorithm has been verified on real data.
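The idea of collecting pairs of l-mers with small Hamming distance can be sketched as follows. This is a naive, quadratic-time illustration in Python and is not the rapid extraction method the abstract refers to; the names `lmers` and `close_pairs`, and the choice to pair only l-mers from different sequences, are assumptions made for this example.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def lmers(sequence: str, l: int) -> list[tuple[int, str]]:
    """All (start position, l-mer) pairs of a sequence."""
    return [(i, sequence[i:i + l]) for i in range(len(sequence) - l + 1)]


def close_pairs(sequences: list[str], l: int, max_dist: int):
    """Pairs of l-mers from different sequences within max_dist mismatches.

    Each l-mer is reported as (sequence index, start position, l-mer).
    Brute force: every l-mer of one sequence is compared with every
    l-mer of every other sequence.
    """
    pairs = []
    for (a, seq_a), (b, seq_b) in combinations(enumerate(sequences), 2):
        for i, x in lmers(seq_a, l):
            for j, y in lmers(seq_b, l):
                if hamming(x, y) <= max_dist:
                    pairs.append(((a, i, x), (b, j, y)))
    return pairs


if __name__ == "__main__":
    for pair in close_pairs(["AACGT", "ACGTT"], l=4, max_dist=0):
        print(pair)
```

The cost of this brute-force comparison grows with the square of the total number of l-mers, which suggests why a faster extraction method matters for data sets with hundreds of sequences or more.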

Evaluation of Convolutionary Neural Networks Modeling of DNA Sequences using Ordinal versus one-hot Encoding Method

10.1101/186965 ◽

2017 ◽

Cited By ~ 4

Author(s):

Allen Chieng Hoon Choong ◽

Nung Kion Lee

Keyword(s):

DNA Sequences ◽

Motif Discovery ◽

Matrix Representation ◽

Training Time ◽

Motif Prediction ◽

DNA Motif ◽

Sequence Encoding ◽

DNA Motif Discovery ◽

Encoding Method ◽

The One

Convolutional neural networks (CNNs) are a popular choice for supervised DNA motif prediction due to their excellent performance. To employ a CNN, the input DNA sequences must be encoded as numerical values and represented as either vectors or multi-dimensional matrices. This paper evaluates a simple and more compact ordinal encoding method against the popular one-hot encoding for DNA sequences. We compare the performance of both encoding methods using three sets of datasets enriched with DNA motifs. We found that the ordinal encoding performs comparably to the one-hot method but with a significant reduction in training time. In addition, the performance of the one-hot encoding is rather consistent across datasets but requires a suitable CNN configuration to perform well. The ordinal encoding with matrix representation performs best on some of the evaluated datasets. This study implies that the performance of CNNs for DNA motif discovery depends on a suitable design of the sequence encoding and representation. The good performance of the ordinal encoding method demonstrates that there is still room for improvement in the one-hot encoding method.
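The difference in compactness between the two encodings can be seen in a short Python sketch. The one-hot mapping is the conventional one (one indicator per base); the specific ordinal values (A = 0.25, C = 0.5, G = 0.75, T = 1.0, anything else 0) are an assumption for illustration, since the abstract does not state the mapping it uses.

```python
# Conventional one-hot mapping: four indicator values per base.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

# Assumed ordinal mapping: a single number per base (illustrative values).
ORDINAL = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}


def one_hot_encode(sequence: str) -> list[list[int]]:
    """Encode a DNA string as an L x 4 matrix; unknown bases become zeros."""
    return [list(ONE_HOT.get(base, (0, 0, 0, 0))) for base in sequence.upper()]


def ordinal_encode(sequence: str) -> list[float]:
    """Encode a DNA string as a length-L vector; unknown bases become 0.0."""
    return [ORDINAL.get(base, 0.0) for base in sequence.upper()]


if __name__ == "__main__":
    seq = "ACGTN"
    print(one_hot_encode(seq))   # 5 rows of 4 values each
    print(ordinal_encode(seq))   # 5 values in total
```

For a sequence of length L, the one-hot form holds 4L values while the ordinal form holds L, which is consistent with the abstract's description of the ordinal encoding as more compact.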

Solving DNA motif discovery problem using improved Clonal selection algorithm with tournament selection operator

Proceedings of the International Conference on Advanced Information Science and System ◽

10.1145/3373477.3373480 ◽

2019 ◽

Author(s):

Ezgi Deniz Ülker

Keyword(s):

Motif Discovery ◽

Clonal Selection ◽

Clonal Selection Algorithm ◽

Selection Algorithm ◽

DNA Motif ◽

Tournament Selection ◽

DNA Motif Discovery ◽

Selection Operator

CpGmotifs: a tool to discover DNA motifs associated to CpG methylation events

BMC Bioinformatics ◽

10.1186/s12859-021-04191-8 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Giovanni Scala ◽

Antonio Federico ◽

Dario Greco

Keyword(s):

DNA Methylation ◽

Motif Discovery ◽

CpG Methylation ◽

Functional Interpretation ◽

DNA Motifs ◽

Molecular Alterations ◽

Link Type ◽

DNA Motif ◽

Quantitative Manner ◽

DNA Motif Discovery

Background: The investigation of molecular alterations associated with the conservation and variation of DNA methylation in eukaryotes is gaining interest in the biomedical research community. Among the different determinants of methylation stability, the DNA composition of the regions surrounding CpG sites has been shown to have a crucial role in the maintenance and establishment of methylation statuses. This aspect has previously been characterized in a quantitative manner by inspecting the nucleotide composition in the region. Research in this field still lacks a qualitative perspective, linked to the identification of certain sequences (or DNA motifs) related to particular DNA methylation phenomena. Results: Here we present a novel computational strategy based on short DNA motif discovery to characterize sequence patterns related to aberrant CpG methylation events. We provide our framework as a user-friendly, shiny-based application, CpGmotifs, to easily retrieve and characterize DNA patterns related to CpG methylation in the human genome. Our tool supports the functional interpretation of deregulated methylation events by predicting transcription factor binding sites (TFBSs) encompassing the identified motifs. Conclusions: CpGmotifs is open-source software. Its source code is available on GitHub at https://github.com/Greco-Lab/CpGmotifs and a ready-to-use Docker image is provided on DockerHub at https://hub.docker.com/r/grecolab/cpgmotifs.

A modified algorithm for variable length DNA motif discovery

2013 IEEE International Conference on Smart Instrumentation, Measurement and Applications (ICSIMA) ◽

10.1109/icsima.2013.6717960 ◽

2013 ◽

Cited By ~ 1

Author(s):

S. M. Samiul Islam ◽

Md. Rashed Asger ◽

Md. Abid Hasan ◽

M. Abdul Mottalib

Keyword(s):

Motif Discovery ◽

Variable Length ◽

DNA Motif ◽

DNA Motif Discovery ◽

Modified Algorithm

DNA motif discovery using chemical reaction optimization

Evolutionary Intelligence ◽

10.1007/s12065-020-00444-2 ◽

2020 ◽

Author(s):

Sumit Kumar Saha ◽

Md. Rafiqul Islam ◽

Mredul Hasan

Keyword(s):

Chemical Reaction ◽

Motif Discovery ◽

Chemical Reaction Optimization ◽

DNA Motif ◽

Reaction Optimization ◽

DNA Motif Discovery

A Clustering Approach for Motif Discovery in ChIP-Seq Dataset

Entropy ◽

10.3390/e21080802 ◽

2019 ◽

Vol 21 (8) ◽

pp. 802

Author(s):

Chun-xiao Sun ◽

Yu Yang ◽

Hua Wang ◽

Wen-hu Wang

Keyword(s):

DNA Sequences ◽

Motif Discovery ◽

Simulated Data ◽

Data Set ◽

Genome Wide ◽

A Genome ◽

Wide Scale ◽

Clustering Approach ◽

AP Clustering ◽

Generation Sequencing

Chromatin immunoprecipitation combined with next-generation sequencing (ChIP-Seq) technology has enabled the identification of transcription factor binding sites (TFBSs) on a genome-wide scale. To effectively and efficiently discover TFBSs in the thousand or more DNA sequences of a ChIP-Seq data set, we propose a new algorithm named AP-ChIP. First, we set two thresholds based on probabilistic analysis to construct and further filter the cluster subsets. Then, we apply Affinity Propagation (AP) clustering to the candidate cluster subsets to find the potential motifs. Experimental results on simulated data show that the AP-ChIP algorithm predicts TFBSs almost accurately in a reasonable time. The validity of the AP-ChIP algorithm is also tested on a real ChIP-Seq data set.

DeepFinder: An integration of feature-based and deep learning approach for DNA motif discovery

Biotechnology & Biotechnological Equipment ◽

10.1080/13102818.2018.1438209 ◽

2018 ◽

Vol 32 (3) ◽

pp. 759-768 ◽

Cited By ~ 5

Author(s):

Nung Kion Lee ◽

Farah Liyana Azizan ◽

Yu Shiong Wong ◽

Norshafarina Omar

Keyword(s):

Deep Learning ◽

Motif Discovery ◽

Learning Approach ◽

DNA Motif ◽

Feature Based ◽

DNA Motif Discovery
