scholarly journals Evaluation of Convolutionary Neural Networks Modeling of DNA Sequences using Ordinal versus one-hot Encoding Method

2017 ◽  
Author(s):  
Allen Chieng Hoon Choong ◽  
Nung Kion Lee

AbstractConvolutionary neural network (CNN) is a popular choice for supervised DNA motif prediction due to its excellent performances. To employ CNN, the input DNA sequences are required to be encoded as numerical values and represented as either vectors or multi-dimensional matrices. This paper evaluates a simple and more compact ordinal encoding method versus the popular one-hot encoding for DNA sequences. We compare the performances of both encoding methods using three sets of datasets enriched with DNA motifs. We found that the ordinal encoding performs comparable to the one-hot method but with significant reduction in training time. In addition, the one-hot encoding performances are rather consistent across various datasets but would require suitable CNN configuration to perform well. The ordinal encoding with matrix representation performs best in some of the evaluated datasets. This study implies that the performances of CNN for DNA motif discovery depends on the suitable design of the sequence encoding and representation. The good performances of the ordinal encoding method demonstrates that there are still rooms for improvement for the one-hot encoding method.

Author(s):  
ESSAM AL DAOUD

In this study, a new genetic algorithm was developed to discover the best motifs in a set of DNA sequences. The main steps were: finding the potential positions in each sequence by using few voters (1–5 sequences), constructing the chromosomes from the potential positions, evaluating the fitness for each gene (position) and for each chromosome, calculating the new random distribution, and using the new distribution to generate the next generation. To verify the effectiveness of the proposed algorithm, several real and artificial datasets were used; the results are compared to the standard genetic algorithm, and Gibbs, MEME, and consensus algorithms. Although all the algorithms have low correlation with the correct motifs, the new algorithm exhibits higher accuracy, without sacrificing implementation time.


2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Chunxiao Sun ◽  
Hongwei Huo ◽  
Qiang Yu ◽  
Haitao Guo ◽  
Zhigang Sun

The planted(l,d)motif search (PMS) is one of the fundamental problems in bioinformatics, which plays an important role in locating transcription factor binding sites (TFBSs) in DNA sequences. Nowadays, identifying weak motifs and reducing the effect of local optimum are still important but challenging tasks for motif discovery. To solve the tasks, we propose a new algorithm, APMotif, which first applies the Affinity Propagation (AP) clustering in DNA sequences to produce informative and good candidate motifs and then employs Expectation Maximization (EM) refinement to obtain the optimal motifs from the candidate motifs. Experimental results both on simulated data sets and real biological data sets show that APMotif usually outperforms four other widely used algorithms in terms of high prediction accuracy.


2019 ◽  
Vol 15 (1) ◽  
pp. 4-26
Author(s):  
Fatma A. Hashim ◽  
Mai S. Mabrouk ◽  
Walid A.L. Atabany

Background: Bioinformatics is an interdisciplinary field that combines biology and information technology to study how to deal with the biological data. The DNA motif discovery problem is the main challenge of genome biology and its importance is directly proportional to increasing sequencing technologies which produce large amounts of data. DNA motif is a repeated portion of DNA sequences of major biological interest with important structural and functional features. Motif discovery plays a vital role in the antibody-biomarker identification which is useful for diagnosis of disease and to identify Transcription Factor Binding Sites (TFBSs) that help in learning the mechanisms for regulation of gene expression. Recently, scientists discovered that the TFs have a mutation rate five times higher than the flanking sequences, so motif discovery also has a crucial role in cancer discovery. Methods: Over the past decades, many attempts use different algorithms to design fast and accurate motif discovery tools. These algorithms are generally classified into consensus or probabilistic approach. Results: Many of DNA motif discovery algorithms are time-consuming and easily trapped in a local optimum. Conclusion: Nature-inspired algorithms and many of combinatorial algorithms are recently proposed to overcome the problems of consensus and probabilistic approaches. This paper presents a general classification of motif discovery algorithms with new sub-categories. It also presents a summary comparison between them.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Giovanni Scala ◽  
Antonio Federico ◽  
Dario Greco

Abstract Background The investigation of molecular alterations associated with the conservation and variation of DNA methylation in eukaryotes is gaining interest in the biomedical research community. Among the different determinants of methylation stability, the DNA composition of the CpG surrounding regions has been shown to have a crucial role in the maintenance and establishment of methylation statuses. This aspect has been previously characterized in a quantitative manner by inspecting the nucleotidic composition in the region. Research in this field still lacks a qualitative perspective, linked to the identification of certain sequences (or DNA motifs) related to particular DNA methylation phenomena. Results Here we present a novel computational strategy based on short DNA motif discovery in order to characterize sequence patterns related to aberrant CpG methylation events. We provide our framework as a user-friendly, shiny-based application, CpGmotifs, to easily retrieve and characterize DNA patterns related to CpG methylation in the human genome. Our tool supports the functional interpretation of deregulated methylation events by predicting transcription factors binding sites (TFBS) encompassing the identified motifs. Conclusions CpGmotifs is an open source software. Its source code is available on GitHub https://github.com/Greco-Lab/CpGmotifs and a ready-to-use docker image is provided on DockerHub at https://hub.docker.com/r/grecolab/cpgmotifs.


2018 ◽  
Vol 32 (3) ◽  
pp. 759-768 ◽  
Author(s):  
Nung Kion Lee ◽  
Farah Liyana Azizan ◽  
Yu Shiong Wong ◽  
Norshafarina Omar

Sign in / Sign up

Export Citation Format

Share Document