CNN Model With Hilbert Curve Representation of DNA Sequence For Enhancer Prediction

DNA Sequence and Functional Analysis of Homologous ARS Elements of Saccharomyces cerevisiae and S. carlsbergensis

Genetics ◽

10.1093/genetics/152.3.943 ◽

1999 ◽

Vol 152 (3) ◽

pp. 943-952

Author(s):

James F Theis ◽

Chen Yang ◽

Christopher B Schaefer ◽

Carol S Newlon

Keyword(s):

Saccharomyces Cerevisiae ◽

Dna Sequence ◽

Dna Sequences ◽

Consensus Sequence ◽

Chromosomal Dna ◽

Functional Elements ◽

Sequence Comparisons ◽

Cis Acting ◽

Homologous Sequences ◽

Ars Elements

Abstract ARS elements of Saccharomyces cerevisiae are the cis-acting sequences required for the initiation of chromosomal DNA replication. Comparisons of the DNA sequences of unrelated ARS elements from different regions of the genome have revealed no significant DNA sequence conservation. We have compared the sequences of seven pairs of homologous ARS elements from two Saccharomyces species, S. cerevisiae and S. carlsbergensis. In all but one case, the ARS308-ARS308carl pair, significant blocks of homology were detected. In the cases of ARS305, ARS307, and ARS309, previously identified functional elements were found to be conserved in their S. carlsbergensis homologs. Mutation of the conserved sequences in the S. carlsbergensis ARS elements revealed that the homologous sequences are required for function. These observations suggested that the sequences important for ARS function would be conserved in other ARS elements. Sequence comparisons aided in the identification of the essential matches to the ARS consensus sequence (ACS) of ARS304, ARS306, and ARS310carl, though not of ARS310.

Estimation of Similarity between DNA Sequences and Its Graphical Representation

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f9389.059120 ◽

2020 ◽

Vol 9 (1) ◽

pp. 43-51

Keyword(s):

Sequence Analysis ◽

Molecular Biology ◽

Dna Sequence ◽

Dna Sequences ◽

Graphical Representation ◽

Similarity Analysis ◽

Biological Sequence ◽

Field Of Study ◽

Biological Sequence Analysis ◽

Wide Range

Bioinformatics, now a well-known field of study, originated in the context of biological sequence analysis. Graphical representation has recently been adopted in research on DNA sequences. Research on biological sequences is mainly based on their function and structure. Bioinformatics finds a wide range of applications, specifically in molecular biology, which focuses on the analysis of molecules such as DNA, RNA and proteins. In this review, we mainly deal with similarity analysis between sequences and the graphical representation of DNA sequences.

NanoDJ: A Dockerized Jupyter Notebook for Interactive Oxford Nanopore MinION Sequence Manipulation and Genome Assembly

10.1101/586842 ◽

2019 ◽

Author(s):

Héctor Rodríguez-Pérez ◽

Tamara Hernández-Beeftink ◽

José M. Lorenzo-Salazar ◽

José L. Roda-García ◽

Carlos J. Pérez-González ◽

...

Keyword(s):

Quality Control ◽

Sequence Analysis ◽

Dna Sequence ◽

Dna Sequences ◽

Genome Assembly ◽

Interactive Visualization ◽

Hybrid Assembly ◽

Genomic Technologies ◽

Oxford Nanopore ◽

Oxford Nanopore Technologies

Abstract Background: The Oxford Nanopore Technologies (ONT) MinION portable sequencer makes it possible to use cutting-edge genomic technologies in the field and the academic classroom. Results: We present NanoDJ, a Jupyter notebook integration of tools for simplified manipulation and assembly of DNA sequences produced by ONT devices. It integrates basecalling, read trimming and quality control, simulation and plotting routines with a variety of widely used aligners and assemblers, including procedures for hybrid assembly. Conclusions: With the use of Jupyter-facilitated access to self-explanatory contents of applications and the interactive visualization of results, as well as by its distribution into a Docker software container, NanoDJ is aimed to simplify and make more reproducible ONT DNA sequence analysis. The NanoDJ package code, documentation and installation instructions are freely available at https://github.com/genomicsITER/NanoDJ.

Systematics of Scleranthus (Caryophyllaceae)

10.26686/wgtn.16958875.v1 ◽

2021 ◽

Author(s):

Robin David Smissen

Keyword(s):

Sequence Analysis ◽

Dna Sequence ◽

Dna Sequences ◽

Sequence Divergence ◽

Morphological Characters ◽

Dna Sequence Analysis ◽

Its Sequences ◽

Data Sets ◽

Nuclear Its ◽

Floral Characters

Scleranthus is a genus of about 12 species of herbaceous flowering plants or small shrubs with a disjunct Eurasian/Australasian distribution. Monophyly of the genus is supported by the close similarity of gynoecial development of all species and consistent with nuclear ITS DNA sequence analysis. Traditionally the genus had been divided into two sections, section Scleranthus and section Mniarum. Section Mniarum is exclusively Australasian while section Scleranthus has been circumscribed to contain exclusively European species or a combination of European and Australasian species. Pollen and floral characters align the species into Australasian and Eurasian groups also supported by nuclear ITS DNA sequence analysis. Section Scleranthus as more broadly defined (i.e., sensu West and Garnock-Jones, 1986) is therefore at least paraphyletic or at worst polyphyletic. Phylogenetic reconstructions based on morphological characters differ from those based on ITS sequences in supporting different relationships within the Australasian species of Scleranthus. Hybridisation and introgression within the genus are discussed and suggested as the cause of discordance between morphology and DNA sequence based trees. Low sequence divergence among Scleranthus ITS sequences suggests that the European and Australasian clades within the genus diverged within the last 10 million years. Biogeographic implications of these dating and competing hypotheses explaining the disjunct North-South distribution of the genus are discussed. Nuclear ITS and chloroplast ndhF DNA sequences both suggest that Scleranthus belongs to a clade within the family Caryophyllaceae consisting of members of subfamilies Alsinoideae and Caryophylloideae.
Phylogenetic relationships between genera belonging to the three subfamilies of Caryophyllaceae (Alsinoideae, Caryophylloideae, and Paronychioideae) are addressed in this thesis through ndhF sequence analysis, which provides no support for the monophyly of traditionally recognised groups. Morphological character data sets are likely to always encompass multiple incongruent data partitions (sensu Bull et al. 1993). It may therefore be appropriate to combine data from DNA sequence and morphology for parsimony analysis even where the two are significantly incongruent.

On Rényi entropies of order statistics

International Journal of Biomathematics ◽

10.1142/s1793524515500801 ◽

2015 ◽

Vol 08 (06) ◽

pp. 1550080 ◽

Cited By ~ 2

Author(s):

Richa Thapliyal ◽

H. C. Taneja

Keyword(s):

Distribution Function ◽

Sequence Analysis ◽

Order Statistics ◽

Dna Sequence ◽

Dna Sequences ◽

Biological Systems ◽

Dna Sequence Analysis ◽

Entropy Measure ◽

Entropy Measures ◽

Cumulative Residual

In this paper we consider a generalized dynamic entropy measure and prove that this measure characterizes the distribution function uniquely. We also propose the cumulative residual Rényi entropy of order statistics and prove that it also determines the distribution function uniquely. Applications of entropy concepts to DNA sequence analysis, the ultimate support for biological systems, have been widely explored by researchers. The entropy measures discussed here can be applied to the analysis of ordered DNA sequences.
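For orientation, the classical discrete Rényi entropy of order α for a probability vector p is H_α(p) = log(Σ p_i^α) / (1 - α), which tends to the Shannon entropy as α approaches 1. The sketch below applies that basic quantity to the nucleotide frequencies of a DNA string. It is only an illustration and is not the generalized dynamic or cumulative residual measure for order statistics studied in the paper; the function name `renyi_entropy` and the use of base-2 logarithms are assumptions made here.

```python
import math
from collections import Counter


def renyi_entropy(sequence: str, alpha: float) -> float:
    """Classical Rényi entropy (in bits) of the symbol frequencies in `sequence`.

    Hypothetical helper for illustration; not the measure defined in the paper.
    """
    if not sequence:
        raise ValueError("sequence must be non-empty")
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    if math.isclose(alpha, 1.0):
        # The limit alpha -> 1 is the Shannon entropy.
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)
```

For a sequence in which all four nucleotides are equally frequent, the value is 2 bits for every order α, while skewed compositions give smaller values that decrease as α grows.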

Accurate deep learning off-target prediction with novel sgRNA-DNA sequence encoding in CRISPR-Cas9 gene editing

Bioinformatics ◽

10.1093/bioinformatics/btab112 ◽

2021 ◽

Author(s):

Jeremy Charlier ◽

Robert Nadon ◽

Vladimir Makarenkov

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Dna Sequence ◽

Dna Sequences ◽

Gene Editing ◽

Sequence Data ◽

Target Prediction ◽

Feedforward Neural Networks ◽

Strong Impact ◽

Sequence Encoding

Abstract Motivation: Off-target predictions are crucial in gene editing research. Recently, significant progress has been made in the field of prediction of off-target mutations, particularly with CRISPR-Cas9 data, thanks to the use of deep learning. CRISPR-Cas9 is a gene editing technique which allows manipulation of DNA fragments. The sgRNA-DNA (single guide RNA-DNA) sequence encoding for deep neural networks, however, has a strong impact on the prediction accuracy. We propose a novel encoding of sgRNA-DNA sequences that aggregates sequence data with no loss of information. Results: In our experiments, we compare the proposed sgRNA-DNA sequence encoding applied in a deep learning prediction framework with state-of-the-art encoding and prediction methods. We demonstrate the superior accuracy of our approach in a simulation study involving Feedforward Neural Networks (FNNs), Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) as well as the traditional Random Forest (RF), Naive Bayes (NB) and Logistic Regression (LR) classifiers. We highlight the quality of our results by building several FNNs, CNNs and RNNs with various layer depths and performing predictions on two popular CRISPOR and GUIDE-seq gene editing data sets. In all our experiments, the new encoding led to more accurate off-target prediction results, providing an improvement of the area under the Receiver Operating Characteristic (ROC) curve up to 35%. Availability: The code and data used in this study are available at: https://github.com/dagrate/dl-offtarget

DNA Sequence Analysis of dtxR Gene (Partial) of Corynebacterium diphtheriae Causing Diphtheria in Jawa and Kalimantan Islands, Indonesia

The Indonesian Biomedical Journal ◽

10.18585/inabj.v9i2.268 ◽

2017 ◽

Vol 9 (2) ◽

pp. 91

Author(s):

Sunarno Sunarno ◽

Yuanita Mulyastuti ◽

Nelly Puspandari ◽

Kambang Sariadji

Keyword(s):

Sequence Analysis ◽

Multiplex Pcr ◽

Dna Sequence ◽

Dna Sequences ◽

Sequence Data ◽

Clinical Specimen ◽

Corynebacterium Diphtheriae ◽

Dna Sequence Analysis ◽

Local Alignment ◽

Pcr Products

BACKGROUND: The dtxR gene is a global regulator that can be used as a marker for detection of Corynebacterium diphtheriae (C. diphtheriae), and it is also a representative tool for mapping (molecular typing) of this bacterium. The aim of this study was to analyze the DNA sequences of the partial dtxR gene of C. diphtheriae causing diphtheria in some regions of Indonesia. DNA sequence analysis was used to verify the accuracy of the in-house multiplex polymerase chain reaction (PCR) method used for detection of C. diphtheriae in clinical specimens, and as a preliminary study to determine the strain diversity of C. diphtheriae circulating in Indonesia. METHODS: Ten PCR products targeting the dtxR gene, previously detected as positive for C. diphtheriae by in-house multiplex PCR, were used as samples in this study. DNA sequencing was carried out by the Sanger method, and the sequence data were analyzed offline with BioEdit software and online with the Basic Local Alignment Search Tool (BLAST). RESULTS: All DNA sequences analyzed in this study were similar or identical to the dtxR gene sequence data of C. diphtheriae registered in GenBank. Within the 162 nucleotides (bases 150-311) of the dtxR gene that were analyzed, at least 2 clones were found among the 10 samples. Substitutions of 2 nucleotides (bases 225 and 273) were detected; both were silent mutations. CONCLUSION: The ten partial DNA sequences of dtxR genes in this study verify the accuracy of the in-house multiplex PCR used to identify the bacteria causing diphtheria in clinical specimens. The DNA sequences also represent the existing diversity of the bacteria causing diphtheria circulating in Indonesia. KEYWORDS: dtxR, C. diphtheriae, diphtheria, Indonesia

Systematics of Scleranthus (Caryophyllaceae)

10.26686/wgtn.16958875 ◽

2021 ◽

Author(s):

Robin David Smissen

Keyword(s):

Sequence Analysis ◽

Dna Sequence ◽

Dna Sequences ◽

Sequence Divergence ◽

Morphological Characters ◽

Dna Sequence Analysis ◽

Its Sequences ◽

Data Sets ◽

Nuclear Its ◽

Floral Characters

Scleranthus is a genus of about 12 species of herbaceous flowering plants or small shrubs with a disjunct Eurasian/Australasian distribution. Monophyly of the genus is supported by the close similarity of gynoecial development of all species and consistent with nuclear ITS DNA sequence analysis. Traditionally the genus had been divided into two sections, section Scleranthus and section Mniarum. Section Mniarum is exclusively Australasian while section Scleranthus has been circumscribed to contain exclusively European species or a combination of European and Australasian species. Pollen and floral characters align the species into Australasian and Eurasian groups also supported by nuclear ITS DNA sequence analysis. Section Scleranthus as more broadly defined (i.e., sensu West and Garnock-Jones, 1986) is therefore at least paraphyletic or at worst polyphyletic. Phylogenetic reconstructions based on morphological characters differ from those based on ITS sequences in supporting different relationships within the Australasian species of Scleranthus. Hybridisation and introgression within the genus are discussed and suggested as the cause of discordance between morphology and DNA sequence based trees. Low sequence divergence among Scleranthus ITS sequences suggests that the European and Australasian clades within the genus diverged within the last 10 million years. Biogeographic implications of these dating and competing hypotheses explaining the disjunct North-South distribution of the genus are discussed. Nuclear ITS and chloroplast ndhF DNA sequences both suggest that Scleranthus belongs to a clade within the family Caryophyllaceae consisting of members of subfamilies Alsinoideae and Caryophylloideae.
Phylogenetic relationships between genera belonging to the three subfamilies of Caryophyllaceae (Alsinoideae, Caryophylloideae, and Paronychioideae) are addressed in this thesis through ndhF sequence analysis, which provides no support for the monophyly of traditionally recognised groups. Morphological character data sets are likely to always encompass multiple incongruent data partitions (sensu Bull et al. 1993). It may therefore be appropriate to combine data from DNA sequence and morphology for parsimony analysis even where the two are significantly incongruent.

SEQUENCE ANALYSIS OF 18s DNA OF Melosira sp., Dunaliella sp., Isochrysis sp. AND Porphyridium sp.

KnE Life Sciences ◽

10.18502/kls.v2i1.224 ◽

2015 ◽

Vol 2 (1) ◽

pp. 592

Author(s):

Lucia Kusumawati ◽

Ruben Wahyudi ◽

Reinhard Pinontoan ◽

Maria Gorreti Lily Panggabean

Keyword(s):

Sequence Analysis ◽

Phylogenetic Tree ◽

Dna Sequence ◽

Dna Sequences ◽

Morphological Characters ◽

Rdna Sequences ◽

Pcr Products ◽

18S Rdna Sequences ◽

Pcr Method ◽

High Level

Phytoplankton has a high level of biodiversity. In previous years, phytoplankton were identified by their morphological characters. However, their morphology might change in different environments. These difficulties can be overcome by comparing their 18S rDNA sequences. This research aimed to verify the identity of Melosira sp., Dunaliella sp., Isochrysis sp. and Porphyridium sp. Here, the PCR method was used to amplify 18S DNA sequences. Three primer pairs were used, i.e. 18S-F and 18S-R; 501F and 1700R; 18S-2F and 18S-2R. PCR products were sequenced. MEGA5 was used to construct a phylogenetic tree. Genus verification for Isochrysis sp., Dunaliella sp. and Melosira sp. was conducted successfully using BLAST and the phylogenetic tree. The 18S DNA sequence of Porphyridium sp. shows an interesting result and needs further verification. Keywords: Phytoplankton, Melosira sp., Dunaliella sp., Isochrysis sp., Porphyridium sp.

Deep-BSC: Predicting Raw DNA Binding Pattern in Arabidopsis thaliana

Current Bioinformatics ◽

10.2174/1574893615999200707142852 ◽

2020 ◽

Vol 15 ◽

Author(s):

Syed Adnan Shah Bukhari ◽

Abdul Razzaq ◽

Javeria Jabeen ◽

Shaheer Khan ◽

Zulqurnain Khan

Keyword(s):

Deep Learning ◽

Dna Sequence ◽

Dna Sequences ◽

Binding Sites ◽

Rapid Development ◽

Saliency Map ◽

Computational Framework ◽

Data Set ◽

Accuracy And Precision ◽

Experimental Approaches

Background: With the rapid development of sequencing methods in recent years, binding sites have been systematically identified in such projects as Nested-MICA and MEME. Prediction of DNA motifs with higher accuracy and precision has been a very important task for bioinformaticians. Nevertheless, experimental approaches are still time-consuming for big data sets, making computational identification of binding sites indispensable. Objective: To facilitate the identification of binding sites, we proposed a deep learning architecture, named Deep-BSC (Deep-Learning Binary Search Classification), to predict binding sites in a raw DNA sequence with more precision and accuracy. Methods: Our proposed architecture relies purely on the raw DNA sequence to predict the binding sites for proteins by using a convolutional neural network (CNN). We trained our deep learning model on binding sites at the nucleotide level. The DNA sequence of A. thaliana is used in this study because it is a model plant. Results: The results demonstrate the effectiveness and efficiency of our method in the classification of binding sites against random sequences, using deep learning. We construct a CNN with different layers and filters to show the usefulness of the max-pooling technique in the proposed method. To gain interpretability of our approach, we further visualized binding sites in the saliency map and successfully identified similar motifs in the raw sequence. The proposed computational framework is time and resource efficient. Conclusion: Deep-BSC enables the identification of binding sites in DNA sequences via a highly accurate CNN. The proposed computational framework can also be applied to problems such as operators, repeats in the genome, DNA markers, and recognition sites for enzymes, thereby promoting the use of the Deep-BSC method in life sciences.
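A CNN that works on raw DNA typically receives each sequence as a one-hot matrix with one channel per nucleotide. The sketch below shows that common preprocessing step. The abstract does not spell out the exact input encoding used by Deep-BSC, so the function name `one_hot_encode`, the A/C/G/T channel order, and the all-zero row for ambiguous bases are assumptions made for illustration.

```python
from typing import List

NUCLEOTIDES = "ACGT"  # assumed channel order, not taken from the paper


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Return a len(sequence) x 4 matrix; unknown bases (e.g. N) become all zeros."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    matrix = []
    for base in sequence.upper():
        row = [0] * len(NUCLEOTIDES)
        if base in index:
            row[index[base]] = 1
        matrix.append(row)
    return matrix
```

The resulting matrix can be converted to an array and passed to a one-dimensional convolution that slides along the sequence axis, with max-pooling applied to the filter responses.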
