scholarly journals Understanding sequence conservation with deep learning

2017 ◽  
Author(s):  
Yi Li ◽  
Daniel Quang ◽  
Xiaohui Xie

AbstractMotivationComparing the human genome to the genomes of closely related mammalian species has been a powerful tool for discovering functional elements in the human genome. Millions of conserved elements have been discovered. However, understanding the functional roles of these elements still remain a challenge, especially in noncoding regions. In particular, it is still unclear why these elements are evolutionarily conserved and what kind of functional elements are encoded within these sequences.ResultsWe present a deep learning framework, called DeepCons, to uncover potential functional elements within conserved sequences. DeepCons is a convolutional neural net (CNN) that receives a short segment of DNA sequence as input and outputs the probability of the sequence of being evolutionary conserved. DeepCons utilizes hundreds of convolution kernels to detect features within DNA sequences, and automatically learns these kernels after training the CNN model using 887,577 conserved elements and a similar number of nonconserved elements in the human genome. On a balanced test dataset, DeepCons can achieve an accuracy of 75% in determining whether a sequence element is conserved or not, and the area under the ROC curve of 0.83, based on information from the human genome alone. We further investigate the properties of the learned kernels. Some kernels are directly related to well-known regulatory motifs corresponding to transcription factors. Many kernels show positional biases relative to transcriptional start sites or transcription end sites. But most of discovered kernels do not correspond to any known functional element, suggesting that they might represent unknown categories of functional elements. We also utilize DeepCons to annotate how changes at each individual nucleotide might impact the conservation properties of the surrounding sequences.AvailabilityThe source code of DeepCons and all the learned convolution kernels in motif format is publicly available online athttps://github.com/uci-cbcl/[email protected]

2019 ◽  
Vol 63 (6) ◽  
pp. 757-771 ◽  
Author(s):  
Claire Francastel ◽  
Frédérique Magdinier

Abstract Despite the tremendous progress made in recent years in assembling the human genome, tandemly repeated DNA elements remain poorly characterized. These sequences account for the vast majority of methylated sites in the human genome and their methylated state is necessary for this repetitive DNA to function properly and to maintain genome integrity. Furthermore, recent advances highlight the emerging role of these sequences in regulating the functions of the human genome and its variability during evolution, among individuals, or in disease susceptibility. In addition, a number of inherited rare diseases are directly linked to the alteration of some of these repetitive DNA sequences, either through changes in the organization or size of the tandem repeat arrays or through mutations in genes encoding chromatin modifiers involved in the epigenetic regulation of these elements. Although largely overlooked so far in the functional annotation of the human genome, satellite elements play key roles in its architectural and topological organization. This includes functions as boundary elements delimitating functional domains or assembly of repressive nuclear compartments, with local or distal impact on gene expression. Thus, the consideration of satellite repeats organization and their associated epigenetic landmarks, including DNA methylation (DNAme), will become unavoidable in the near future to fully decipher human phenotypes and associated diseases.


2020 ◽  
Vol 26 ◽  
Author(s):  
Xiaoping Min ◽  
Fengqing Lu ◽  
Chunyan Li

: Enhancer-promoter interactions (EPIs) in the human genome are of great significance to transcriptional regulation which tightly controls gene expression. Identification of EPIs can help us better deciphering gene regulation and understanding disease mechanisms. However, experimental methods to identify EPIs are constrained by the fund, time and manpower while computational methods using DNA sequences and genomic features are viable alternatives. Deep learning methods have shown promising prospects in classification and efforts that have been utilized to identify EPIs. In this survey, we specifically focus on sequence-based deep learning methods and conduct a comprehensive review of the literatures of them. We first briefly introduce existing sequence-based frameworks on EPIs prediction and their technique details. After that, we elaborate on the dataset, pre-processing means and evaluation strategies. Finally, we discuss the challenges these methods are confronted with and suggest several future opportunities.


2021 ◽  
pp. 1-12
Author(s):  
Mukul Kumar ◽  
Nipun Katyal ◽  
Nersisson Ruban ◽  
Elena Lyakso ◽  
A. Mary Mekala ◽  
...  

Over the years the need for differentiating various emotions from oral communication plays an important role in emotion based studies. There have been different algorithms to classify the kinds of emotion. Although there is no measure of fidelity of the emotion under consideration, which is primarily due to the reason that most of the readily available datasets that are annotated are produced by actors and not generated in real-world scenarios. Therefore, the predicted emotion lacks an important aspect called authenticity, which is whether an emotion is actual or stimulated. In this research work, we have developed a transfer learning and style transfer based hybrid convolutional neural network algorithm to classify the emotion as well as the fidelity of the emotion. The model is trained on features extracted from a dataset that contains stimulated as well as actual utterances. We have compared the developed algorithm with conventional machine learning and deep learning techniques by few metrics like accuracy, Precision, Recall and F1 score. The developed model performs much better than the conventional machine learning and deep learning models. The research aims to dive deeper into human emotion and make a model that understands it like humans do with precision, recall, F1 score values of 0.994, 0.996, 0.995 for speech authenticity and 0.992, 0.989, 0.99 for speech emotion classification respectively.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yoko Kato ◽  
Harumi Katsumata ◽  
Ayumu Inutsuka ◽  
Akihiro Yamanaka ◽  
Tatsushi Onaka ◽  
...  

AbstractMultiple sequential actions, performed during parental behaviors, are essential elements of reproduction in mammalian species. We showed that neurons expressing melanin concentrating hormone (MCH) in the lateral hypothalamic area (LHA) are more active in rodents of both sexes when exhibiting parental nursing behavior. Genetic ablation of the LHA-MCH neurons impaired maternal nursing. The post-birth survival rate was lower in pups born to female mice with congenitally ablated MCH neurons under control of tet-off system, exhibiting reduced crouching behavior. Virgin female and male mice with ablated MCH neurons were less interested in pups and maternal care. Chemogenetic and optogenetic stimulation of LHA-MCH neurons induced parental nursing in virgin female and male mice. LHA-MCH GABAergic neurons project fibres to the paraventricular hypothalamic nucleus (PVN) neurons. Optogenetic stimulation of PVN induces nursing crouching behavior along with increasing plasma oxytocin levels. The hypothalamic MCH neural relays play important functional roles in parental nursing behavior in female and male mice.


2020 ◽  
Author(s):  
Agostini Federico ◽  
Zagalak Julian ◽  
Attig Jan ◽  
Ule Jernej ◽  
Nicholas M. Luscombe

AbstractBackgroundEukaryotic genomes undergo pervasive transcription, leading to the production of many types of stable and unstable RNAs. Transcription is not restricted to regions with annotated gene features but includes almost any genomic context. Currently, the source and function of most RNAs originating from intergenic regions in the human genome remains unclear.ResultsWe hypothesised that many intergenic RNA can be ascribed to the presence of as-yet unannotated genes or the ‘fuzzy’ transcription of known genes that extends beyond the annotated boundaries. To elucidate the contributions of these two sources, we assembled a dataset of >2.5 billion publicly available RNA-seq reads across 5 human cell lines and multiple cellular compartments to annotate transcriptional units in the human genome. About 80% of transcripts from unannotated intergenic regions can be attributed to the fuzzy transcription of existing genes; the remaining transcripts originate mainly from putative long non-coding RNA loci that are rarely spliced. We validated the transcriptional activity of these intergenic RNA using independent measurements, including transcriptional start sites, chromatin signatures, and genomic occupancies of RNA polymerase II in various phosphorylation states. We also analysed the nuclear localisation and sensitivities of intergenic transcripts to nucleases to illustrate that they tend to be rapidly degraded either ‘on-chromatin’ by XRN2 or ‘off-chromatin’ by the exosome.ConclusionsWe provide a curated atlas of intergenic RNAs that distinguishes between alternative processing of well annotated genes from independent transcriptional units based on the combined analysis of chromatin signatures, nuclear RNA localisation and degradation pathways.


2016 ◽  
Author(s):  
Shaun D Jackman ◽  
Benjamin P Vandervalk ◽  
Hamid Mohamadi ◽  
Justin Chu ◽  
Sarah Yeo ◽  
...  

AbstractThe assembly of DNA sequences de novo is fundamental to genomics research. It is the first of many steps towards elucidating and characterizing whole genomes. Downstream applications, including analysis of genomic variation between species, between or within individuals critically depends on robustly assembled sequences. In the span of a single decade, the sequence throughput of leading DNA sequencing instruments has increased drastically, and coupled with established and planned large-scale, personalized medicine initiatives to sequence genomes in the thousands and even millions, the development of efficient, scalable and accurate bioinformatics tools for producing high-quality reference draft genomes is timely.With ABySS 1.0, we originally showed that assembling the human genome using short 50 bp sequencing reads was possible by aggregating the half terabyte of compute memory needed over several computers using a standardized message-passing system (MPI). We present here its re-design, which departs from MPI and instead implements algorithms that employ a Bloom filter, a probabilistic data structure, to represent a de Bruijn graph and reduce memory requirements.We present assembly benchmarks of human Genome in a Bottle 250 bp Illumina paired-end and 6 kbp mate-pair libraries from a single individual, yielding a NG50 (NGA50) scaffold contiguity of 3.5 (3.0) Mbp using less than 35 GB of RAM, a modest memory requirement by today’s standard that is often available on a single computer. We also investigate the use of BioNano Genomics and 10x Genomics’ Chromium data to further improve the scaffold contiguity of this assembly to 42 (15) Mbp.


Sign in / Sign up

Export Citation Format

Share Document