CNN Model With Hilbert Curve Representation of DNA Sequence For Enhancer Prediction

2019 ◽  
Author(s):  
Monowar Md. Anjum ◽  
Ibrahim Asadullah Tahmid ◽  
M. Sohel Rahman

Abstract. Motivation: Enhancers are distal cis-acting regulatory regions that play a vital role in gene transcription. However, because enhancers lie at irregular linear distances from the genes they affect while remaining spatially close to them, systematically predicting enhancers has been a challenging task. Although several computational predictor models based on epigenetic marker analysis and on sequence-based analysis have been proposed, they depend on hand-crafted features and generalize poorly across different enhancer datasets. On the other hand, the recent proliferation of deep learning methods has opened previously unknown avenues for sequence analysis tasks, eliminating feature dependency and achieving greater generalization. Therefore, harnessing deep learning based sequence analysis techniques to develop a more generalized model than earlier ones for predicting enhancer regions in a DNA sequence is a topic of interest in bioinformatics. Results: In this study, we develop the predictor model CHilEnPred, which is trained on a visual representation of DNA sequences obtained with the Hilbert curve. We report our computational prediction results on the FANTOM5 dataset, where CHilEnPred achieves an accuracy of 94.97% and an AUC of 0.987 on test data. Availability: Our CHilEnPred model can be freely accessed at https://github.com/iatahmid/[email protected]
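The abstract above does not spell out how the Hilbert curve representation is built, so the following is only a minimal sketch of the general idea: each nucleotide is placed at successive positions along a Hilbert curve, so that bases that are close in the sequence stay close in the 2D grid. The function names `d2xy` and `hilbert_grid`, the padding symbol, and the choice of one base per cell are assumptions made for illustration; they are not taken from CHilEnPred.

```python
def d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2^order x 2^order grid.

    This is the standard iterative index-to-coordinate conversion.
    """
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the current quadrant so sub-curves join up.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_grid(seq, pad="N"):
    """Lay a DNA string out on the smallest Hilbert grid that can hold it.

    Unused cells are filled with `pad` (a hypothetical padding symbol).
    """
    order = 0
    while 4 ** order < len(seq):
        order += 1
    n = 1 << order
    grid = [[pad] * n for _ in range(n)]
    for d, base in enumerate(seq):
        x, y = d2xy(order, d)
        grid[y][x] = base
    return grid


if __name__ == "__main__":
    for row in hilbert_grid("ACGTACGTACGTACGT"):
        print(" ".join(row))
```

A grid like this could then be one-hot encoded per cell and passed to a 2D CNN as an image; whether the published model encodes single bases or k-mers per cell is not stated in the abstract.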

Genetics ◽  
1999 ◽  
Vol 152 (3) ◽  
pp. 943-952
Author(s):  
James F Theis ◽  
Chen Yang ◽  
Christopher B Schaefer ◽  
Carol S Newlon

Abstract ARS elements of Saccharomyces cerevisiae are the cis-acting sequences required for the initiation of chromosomal DNA replication. Comparisons of the DNA sequences of unrelated ARS elements from different regions of the genome have revealed no significant DNA sequence conservation. We have compared the sequences of seven pairs of homologous ARS elements from two Saccharomyces species, S. cerevisiae and S. carlsbergensis. In all but one case, the ARS308-ARS308carl pair, significant blocks of homology were detected. In the cases of ARS305, ARS307, and ARS309, previously identified functional elements were found to be conserved in their S. carlsbergensis homologs. Mutation of the conserved sequences in the S. carlsbergensis ARS elements revealed that the homologous sequences are required for function. These observations suggested that the sequences important for ARS function would be conserved in other ARS elements. Sequence comparisons aided in the identification of the essential matches to the ARS consensus sequence (ACS) of ARS304, ARS306, and ARS310carl, though not of ARS310.


Bioinformatics, now a well-known field of study, originated in the context of biological sequence analysis. Recently, graphical representation has come into use in research on DNA sequences. Research on biological sequences is mainly based on their function and structure. Bioinformatics finds a wide range of applications, specifically in molecular biology, which focuses on the analysis of molecules such as DNA, RNA, and proteins. In this review, we mainly deal with similarity analysis between sequences and the graphical representation of DNA sequences.


2019 ◽  
Author(s):  
Héctor Rodríguez-Pérez ◽  
Tamara Hernández-Beeftink ◽  
José M. Lorenzo-Salazar ◽  
José L. Roda-García ◽  
Carlos J. Pérez-González ◽  
...  

Abstract. Background: The Oxford Nanopore Technologies (ONT) MinION portable sequencer makes it possible to use cutting-edge genomic technologies in the field and the academic classroom. Results: We present NanoDJ, a Jupyter notebook integration of tools for simplified manipulation and assembly of DNA sequences produced by ONT devices. It integrates basecalling, read trimming and quality control, simulation and plotting routines with a variety of widely used aligners and assemblers, including procedures for hybrid assembly. Conclusions: Through Jupyter-facilitated access to self-explanatory application contents, interactive visualization of results, and distribution as a Docker software container, NanoDJ aims to simplify ONT DNA sequence analysis and make it more reproducible. The NanoDJ package code, documentation and installation instructions are freely available at https://github.com/genomicsITER/NanoDJ.


2021 ◽  
Author(s):  
Robin David Smissen

Scleranthus is a genus of about 12 species of herbaceous flowering plants or small shrubs with a disjunct Eurasian/Australasian distribution. Monophyly of the genus is supported by the close similarity of gynoecial development of all species and is consistent with nuclear ITS DNA sequence analysis. Traditionally the genus had been divided into two sections, section Scleranthus and section Mniarum. Section Mniarum is exclusively Australasian, while section Scleranthus has been circumscribed to contain exclusively European species or a combination of European and Australasian species. Pollen and floral characters align the species into Australasian and Eurasian groups that are also supported by nuclear ITS DNA sequence analysis. Section Scleranthus as more broadly defined (i.e., sensu West and Garnock-Jones, 1986) is therefore at least paraphyletic or at worst polyphyletic. Phylogenetic reconstructions based on morphological characters differ from those based on ITS sequences in supporting different relationships within the Australasian species of Scleranthus. Hybridisation and introgression within the genus are discussed and suggested as the cause of discordance between morphology and DNA sequence based trees. Low sequence divergence among Scleranthus ITS sequences suggests that the European and Australasian clades within the genus diverged within the last 10 million years. Biogeographic implications of this dating and competing hypotheses explaining the disjunct North-South distribution of the genus are discussed. Nuclear ITS and chloroplast ndhF DNA sequences both suggest that Scleranthus belongs to a clade within the family Caryophyllaceae consisting of members of subfamilies Alsinoideae and Caryophylloideae.
Phylogenetic relationships between genera belonging to the three subfamilies of Caryophyllaceae (Alsinoideae, Caryophylloideae, and Paronychioideae) are addressed in this thesis through ndhF sequence analysis, which provides no support for the monophyly of traditionally recognised groups. Morphological character data sets are likely to always encompass multiple incongruent data partitions (sensu Bull et al. 1993). It may therefore be appropriate to combine data from DNA sequence and morphology for parsimony analysis even where the two are significantly incongruent.


2015 ◽  
Vol 08 (06) ◽  
pp. 1550080 ◽  
Author(s):  
Richa Thapliyal ◽  
H. C. Taneja

In this paper we consider a generalized dynamic entropy measure and prove that this measure characterizes the distribution function uniquely. We also propose the cumulative residual Rényi entropy of order statistics and prove that it, too, determines the distribution function uniquely. Applications of entropy concepts to DNA sequence analysis, the ultimate support for biological systems, have been widely explored by researchers. The entropy measures discussed here can be applied to the analysis of ordered DNA sequences.
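As a rough illustration of how an entropy measure can be applied to a DNA sequence, the sketch below computes the standard discrete Rényi entropy of order α, H_α = log2(Σ p_i^α) / (1 − α), over nucleotide frequencies, falling back to Shannon entropy at α = 1. This is not the cumulative residual measure proposed in the paper, which is defined on survival functions of order statistics; the function name `renyi_entropy` is an assumption made for this example.

```python
import math
from collections import Counter


def renyi_entropy(seq, alpha):
    """Discrete Renyi entropy (in bits) of the symbol frequencies in seq."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    counts = Counter(seq)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    if math.isclose(alpha, 1.0):
        # The limit alpha -> 1 is the Shannon entropy.
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** alpha for p in probs)) / (1 - alpha)


if __name__ == "__main__":
    for a in (0.5, 1.0, 2.0):
        print(a, renyi_entropy("ACGTTTGACA", a))
```

A sequence with uniform base composition reaches the maximum of 2 bits for every order, while a homopolymer gives 0, so the value can serve as a simple composition-complexity score.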


Author(s):  
Jeremy Charlier ◽  
Robert Nadon ◽  
Vladimir Makarenkov

Abstract. Motivation: Off-target predictions are crucial in gene editing research. Recently, significant progress has been made in the prediction of off-target mutations, particularly with CRISPR-Cas9 data, thanks to the use of deep learning. CRISPR-Cas9 is a gene editing technique which allows manipulation of DNA fragments. The encoding of sgRNA-DNA (single guide RNA-DNA) sequences for deep neural networks, however, has a strong impact on prediction accuracy. We propose a novel encoding of sgRNA-DNA sequences that aggregates sequence data with no loss of information. Results: In our experiments, we compare the proposed sgRNA-DNA sequence encoding applied in a deep learning prediction framework with state-of-the-art encoding and prediction methods. We demonstrate the superior accuracy of our approach in a simulation study involving Feedforward Neural Networks (FNNs), Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) as well as the traditional Random Forest (RF), Naive Bayes (NB) and Logistic Regression (LR) classifiers. We highlight the quality of our results by building several FNNs, CNNs and RNNs with various layer depths and performing predictions on two popular gene editing data sets, CRISPOR and GUIDE-seq. In all our experiments, the new encoding led to more accurate off-target prediction results, improving the area under the Receiver Operating Characteristic (ROC) curve by up to 35%. Availability: The code and data used in this study are available at: https://github.com/dagrate/dl-offtarget


2017 ◽  
Vol 9 (2) ◽  
pp. 91
Author(s):  
Sunarno Sunarno ◽  
Yuanita Mulyastuti ◽  
Nelly Puspandari ◽  
Kambang Sariadji

BACKGROUND: The dtxR gene is a global regulator that can be used as a marker for detection of Corynebacterium diphtheriae (C. diphtheriae), and it is also a representative tool for mapping (molecular typing) of this bacterium. The aim of this study was to analyze the DNA sequences of the partial dtxR gene of C. diphtheriae causing diphtheria in some regions of Indonesia. DNA sequence analysis was used to verify the accuracy of the in-house multiplex polymerase chain reaction (PCR) method that is used for detection of C. diphtheriae in clinical specimens, as well as a preliminary study to determine the strain diversity of C. diphtheriae circulating in Indonesia. METHODS: Ten PCR products targeting the dtxR gene that had previously been detected as positive for C. diphtheriae by in-house multiplex PCR were used as samples in this study. DNA sequencing was carried out by the Sanger method, and the sequence data were analyzed offline with BioEdit software and online with the Basic Local Alignment Search Tool (BLAST). RESULTS: All DNA sequences analyzed in this study were similar or identical to the dtxR gene sequence data of C. diphtheriae registered in GenBank. Within the 162 nucleotides (bases 150-311) of the dtxR gene that were analyzed, at least 2 clonal types were found among the 10 samples. Substitutions of 2 nucleotides (bases 225 and 273) were detected; both were silent mutations. CONCLUSION: The ten partial DNA sequences of dtxR genes in this study verify the accuracy of the in-house multiplex PCR used to identify the bacteria causing diphtheria in clinical specimens. The DNA sequences also represent the existing diversity of the bacteria causing diphtheria circulating in Indonesia. KEYWORDS: dtxR, C. diphtheriae, diphtheria, Indonesia


2015 ◽  
Vol 2 (1) ◽  
pp. 592
Author(s):  
Lucia Kusumawati ◽  
Ruben Wahyudi ◽  
Reinhard Pinontoan ◽  
Maria Gorreti Lily Panggabean

Phytoplankton have a high level of biodiversity. In previous years phytoplankton were identified by their morphological characters. However, their morphology might change in different environments. This difficulty can be overcome by comparing their 18S rDNA sequences. This research aims to verify the identity of Melosira sp., Dunaliella sp., Isochrysis sp. and Porphyridium sp. Here, PCR was used to amplify 18S rDNA sequences. Three primer pairs were used, i.e. 18S-F and 18S-R; 501F and 1700R; 18S-2F and 18S-2R. PCR products were sequenced. MEGA5 was used to construct a phylogenetic tree. Genus verification for Isochrysis sp., Dunaliella sp. and Melosira sp. was conducted successfully using BLAST and the phylogenetic tree. The 18S rDNA sequence of Porphyridium sp. shows an interesting result and needs further verification.
Keywords: Phytoplankton, Melosira sp., Dunaliella sp., Isochrysis sp., Porphyridium sp.


2020 ◽  
Vol 15 ◽  
Author(s):  
Syed Adnan Shah Bukhari ◽  
Abdul Razzaq ◽  
Javeria Jabeen ◽  
Shaheer Khan ◽  
Zulqurnain Khan

Background: With the rapid development of sequencing methods in recent years, binding sites have been systematically identified in such projects as Nested-MICA and MEME. Prediction of DNA motifs with higher accuracy and precision has been a very important task for bioinformaticians. Nevertheless, experimental approaches are still time-consuming for big data sets, making computational identification of binding sites indispensable. Objective: To facilitate the identification of binding sites, we propose a deep learning architecture, named Deep-BSC (Deep-Learning Binary Search Classification), to predict binding sites in a raw DNA sequence with more precision and accuracy. Methods: Our proposed architecture relies purely on the raw DNA sequence to predict protein binding sites by using a convolutional neural network (CNN). We trained our deep learning model on binding sites at the nucleotide level. The DNA sequence of A. thaliana is used in this study because it is a model plant. Results: The results demonstrate the effectiveness and efficiency of our method in classifying binding sites against random sequences using deep learning. We construct a CNN with different layers and filters to show the usefulness of the max-pooling technique in the proposed method. To make our approach interpretable, we further visualized binding sites in a saliency map and successfully identified similar motifs in the raw sequence. The proposed computational framework is time and resource efficient. Conclusion: Deep-BSC enables the identification of binding sites in DNA sequences via a highly accurate CNN. The proposed computational framework can also be applied to problems such as operators, repeats in the genome, DNA markers, and recognition sites for enzymes, thereby promoting the use of the Deep-BSC method in life sciences.
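To make the idea of a CNN over raw DNA with max-pooling more concrete, here is a minimal sketch, assuming a one-hot encoding over A, C, G, T and a single hand-written convolution filter. It is not the Deep-BSC architecture: the names `one_hot`, `conv_scores`, and the use of a motif string as the filter are assumptions made for illustration, whereas a trained CNN learns many real-valued filters.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).

    Unknown symbols such as N become an all-zero row.
    """
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in BASES:
            row[BASES.index(base)] = 1
        rows.append(row)
    return rows


def conv_scores(seq, motif):
    """Slide a one-hot filter built from `motif` along `seq`.

    Each output value is the dot product of the filter with one window,
    which here equals the number of matching positions.
    """
    x = one_hot(seq)
    w = one_hot(motif)
    k = len(w)
    scores = []
    for start in range(len(x) - k + 1):
        total = 0
        for i in range(k):
            total += sum(a * b for a, b in zip(x[start + i], w[i]))
        scores.append(total)
    return scores


if __name__ == "__main__":
    scores = conv_scores("TTACGTT", "ACG")
    print(scores)       # per-position filter responses
    print(max(scores))  # global max-pooling keeps the strongest response
```

Global max-pooling reduces the per-position responses to a single value, so the filter reports whether its pattern occurs anywhere in the sequence regardless of position, which is one reason max-pooling suits binding-site classification.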

