Dilated Convolutions for Modeling Long-Distance Genomic Dependencies

2017 ◽  
Author(s):  
Ankit Gupta ◽  
Alexander M. Rush

Abstract We consider the task of detecting regulatory elements in the human genome directly from raw DNA. Past work has focused on small snippets of DNA, making it difficult to model long-distance dependencies that arise from DNA’s 3-dimensional conformation. To study long-distance dependencies, we develop and release a novel dataset for a larger-context modeling task. Using this new dataset, we model long-distance interactions with dilated convolutional neural networks and compare them to standard convolutions and recurrent neural networks. We show that dilated convolutions are effective at modeling the locations of regulatory markers in the human genome, such as transcription factor binding sites, histone modifications, and DNase hypersensitivity sites.
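As a rough illustration of why dilation helps with long-range context, the sketch below computes the receptive field of a stack of stride-1 1D convolutions and applies a single dilated convolution to a toy signal. The kernel size, the layer count, and the doubling dilation schedule are illustrative assumptions, not the architecture used in the paper, and the function names are hypothetical.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in input positions) of stacked stride-1 1D convolutions."""
    field = 1
    for dilation in dilations:
        # Each layer widens the field by (kernel_size - 1) * dilation positions.
        field += (kernel_size - 1) * dilation
    return field


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1D dilated convolution (cross-correlation form) over a list of numbers."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(len(kernel)))
        for i in range(len(signal) - span)
    ]


# Six layers with kernel size 3 (assumed values for illustration only).
standard = receptive_field(3, [1, 1, 1, 1, 1, 1])
dilated = receptive_field(3, [1, 2, 4, 8, 16, 32])
print(standard, dilated)  # 13 127

# A dilation of 2 makes the kernel taps skip every other input position.
print(dilated_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 0, -1], dilation=2))  # [-4, -4, -4]
```

With the same number of layers and parameters, the dilated stack in this toy setting covers roughly ten times as many input positions, which is the property that makes it attractive for long DNA contexts.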

Nature ◽  
2020 ◽  
Vol 583 (7818) ◽  
pp. 711-719 ◽  
Author(s):  
Eric L. Van Nostrand ◽  
Peter Freese ◽  
Gabriel A. Pratt ◽  
Xiaofeng Wang ◽  
Xintao Wei ◽  
...  

Abstract Many proteins regulate the expression of genes by binding to specific regions encoded in the genome¹. Here we introduce a new data set of RNA elements in the human genome that are recognized by RNA-binding proteins (RBPs), generated as part of the Encyclopedia of DNA Elements (ENCODE) project phase III. This class of regulatory elements functions only when transcribed into RNA, as they serve as the binding sites for RBPs that control post-transcriptional processes such as splicing, cleavage and polyadenylation, and the editing, localization, stability and translation of mRNAs. We describe the mapping and characterization of RNA elements recognized by a large collection of human RBPs in K562 and HepG2 cells. Integrative analyses using five assays identify RBP binding sites on RNA and chromatin in vivo, the in vitro binding preferences of RBPs, the function of RBP binding sites and the subcellular localization of RBPs, producing 1,223 replicated data sets for 356 RBPs. We describe the spectrum of RBP binding throughout the transcriptome and the connections between these interactions and various aspects of RNA biology, including RNA stability, splicing regulation and RNA localization. These data expand the catalogue of functional elements encoded in the human genome by the addition of a large set of elements that function at the RNA level by interacting with RBPs.


2020 ◽  
Vol 23 (2) ◽  
pp. 113-120
Author(s):  
A. Athanassiadou

Determination of the DNA sequence of the human genome, which revealed extensive genetic variation, together with the mapping of genes and the various regulatory elements of genome function within genomic DNA, has revolutionized the way we view health and disease in our time. The genetic complexity of the genome is manifested on different levels. The first level refers to the expression of protein-coding genes, as regulated by their individual promoters in linear proximity. The next level involves long-distance action by faraway enhancers, which interact with promoters through DNA looping. This 3-dimensional (3D) regulation is developed further by chromosome folding into the so-called transcription factories, for fully physiological expression. Chromosome folding, mediated by specific genetic elements (insulators), adds to the genetic complexity by facilitating movements of the chromatin of specific genomic regions, the so-called topologically associated domains (TADs), in support of transcription and other cellular functions. Further genetic complexity has emerged with the finding that over 75% of the genome is transcribed and that, in addition to the coding genes, a plethora of RNA transcripts is produced (the non-coding RNAs) that have important regulatory roles in gene expression. The great variation in genome sequence and in the regulatory elements of genome architecture is exploited in genome-wide association studies of disease, within the framework of Precision Medicine and of Genomic Medicine in general.


2020 ◽  
Author(s):  
Marat Sabirov ◽  
Olga Kyrchanova ◽  
Galina V. Pokholkova ◽  
Artem Bonchuk ◽  
Natalia Klimenko ◽  
...  

Abstract The architectural protein Pita is critical for Drosophila embryogenesis and predominantly binds to gene promoters and insulators. In particular, Pita is involved in the organization of boundaries between regulatory domains that control the expression of three hox genes in the Bithorax complex (BX-C). The best-characterized partner of Pita is the BTB/POZ-domain-containing protein CP190. Using in vitro pull-down analysis, we precisely mapped two unstructured regions of Pita that interact with the BTB domain of CP190. We then constructed transgenic lines expressing the wild-type Pita protein and mutant variants lacking the CP190-interacting regions. Expression of the mutant protein completely complemented the null pita mutation. ChIP-seq experiments with wild-type and mutant embryos showed that deletion of the CP190-interacting regions did not significantly affect the binding of the mutant Pita protein to most chromatin sites. However, the mutant Pita protein does not support the ability of multimerized Pita sites to prevent cross-talk between the iab-6 and iab-7 regulatory domains that activate the expression of Abdominal-B (Abd-B), one of the genes in the BX-C. Recruitment of a chimeric protein consisting of the DNA-binding domain of GAL4 and the CP190-interacting region of Pita to GAL4 binding sites on the polytene chromosomes of larvae induces the formation of a new interband, which is a consequence of the formation of open chromatin in this region. These results suggested that the interaction with CP190 is required for the primary Pita activities, but that other architectural proteins may also recruit CP190 in flies expressing only the mutant Pita protein.

Author Summary Pita is required for Drosophila development and binds specifically to a long motif in active promoters and insulators. Pita belongs to the Drosophila family of zinc-finger architectural proteins, which also includes Su(Hw) and CTCF, the latter conserved among higher eukaryotes. The architectural proteins maintain the active state of regulatory elements and the long-distance interactions between them. The CP190 protein is recruited to chromatin through interaction with the architectural proteins. Here we mapped two regions in Pita that are required for interaction with the CP190 protein. We demonstrated that the CP190-interacting region of Pita can maintain nucleosome-free open chromatin and is critical for Pita-mediated enhancer-blocking activity. At the same time, interaction with CP190 is not required for the in vivo function of the mutant Pita protein, which binds to the same regions of the genome as the wild-type protein. Unexpectedly, we found that CP190 was still associated with most of the genome regions bound by the mutant Pita protein, which suggested that other architectural proteins were continuing to recruit CP190 to these regions. These results support a model in which the regulatory elements are composed of combinations of binding sites that interact with several architectural proteins with similar functions.


2018 ◽  
Vol 7 (4) ◽  
pp. 30 ◽  
Author(s):  
Ioannis Voutsadakis

CTCF (CCCTC-binding factor) is a transcription regulator with hundreds of binding sites in the human genome. It functions mainly as an insulator protein, defining, together with cohesins, the boundaries of areas of the genome called topologically associating domains (TADs). TADs contain regulatory elements such as enhancers, which regulate the transcription of genes inside the boundaries of the TAD but are restricted from regulating genes outside these boundaries. This paper will examine the most common genetic lesions of CTCF, as well as of its related protein CTCFL (CTCF-like, also called BORIS), in cancer using publicly available data from published genomic studies. Cancer types where abnormalities in the two genes are more common will be examined for possible associations with underlying repair defects or other prevalent genetic lesions. The putative functional effects of CTCF and CTCFL lesions will also be explored.


Author(s):  
Jaap Brink ◽  
Wah Chiu

Crotoxin complex is the principal neurotoxin of the South American rattlesnake, Crotalus durissus terrificus, and has a molecular weight of 24 kDa. The protein is a heterodimer, with subunit A assigned a chaperone function. Subunit B carries the lethal activity, which is exerted on both sides of the neuro-muscular junction and is thought to involve binding to the acetylcholine receptor. Insight into the crotoxin complex’s mode of action can be gained from a 3 Å resolution structure obtained by electron crystallography. This abstract communicates our progress in merging the electron diffraction amplitudes into a 3-dimensional (3D) intensity data set close to completion. Since the thickness of crotoxin complex crystals varies from one crystal to the other, we chose to collect tilt series of electron diffraction patterns after determining their thickness. Furthermore, by making use of the symmetry present in these tilt data, intensities collected only from similar crystals will be merged.

Suitable crystals of glucose-embedded crotoxin complex were searched for in the defocussed diffraction mode with the goniometer tilted to 55° or higher in a JEOL4000 electron cryo-microscope operated at 400 kV, with the crystals kept at -120°C in a Gatan 626 cryo-holder. The crystal thickness was measured using the local contrast of the crystal relative to the supporting film from search-mode images acquired using a 1024 x 1024 slow-scan CCD camera (model 679, Gatan Inc.).


Sensors ◽  
2020 ◽  
Vol 21 (1) ◽  
pp. 11
Author(s):  
Domonkos Haffner ◽  
Ferenc Izsák

The localization of multiple scattering objects is performed using scattered waves. Following an up-to-date approach, neural networks are used to estimate the corresponding locations. In the scattering phenomenon under investigation, we assume known incident plane waves, fully reflecting balls with known diameters, and measurement data of the scattered wave on one fixed segment. The training data are constructed using the simulation package μ-diff in Matlab. The structure of the neural networks, which are widely used for similar purposes, is further developed. A complex locally connected layer is the main component of the proposed setup. With this and appropriate preprocessing of the training data set, the number of parameters can be kept at a relatively low level. As a result, using a relatively large training data set, the unknown locations of the objects can be estimated effectively.


BMC Biology ◽  
2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Alexandre Z. Daly ◽  
Lindsey A. Dudley ◽  
Michael T. Peel ◽  
Stephen A. Liebhaber ◽  
Stephen C. J. Parker ◽  
...  

Abstract

Background: The pituitary gland is a neuroendocrine organ containing diverse cell types specialized in secreting hormones that regulate physiology. Pituitary thyrotropes produce thyroid-stimulating hormone (TSH), a critical factor for growth and maintenance of metabolism. The transcription factors POU1F1 and GATA2 have been implicated in thyrotrope fate, but the transcriptomic and epigenomic landscapes of these neuroendocrine cells have not been characterized. The goal of this work was to discover transcriptional regulatory elements that drive thyrotrope fate.

Results: We identified the transcription factors and epigenomic changes in chromatin that are associated with differentiation of POU1F1-expressing progenitors into thyrotropes using cell lines that represent an undifferentiated Pou1f1 lineage progenitor (GHF-T1) and a committed thyrotrope line that produces TSH (TαT1). We compared RNA-seq, ATAC-seq, histone modification (H3K27Ac, H3K4Me1, and H3K27Me3), and POU1F1 binding in these cell lines. POU1F1 binding sites are commonly associated with bZIP transcription factor consensus binding sites in GHF-T1 cells and Helix-Turn-Helix (HTH) or basic Helix-Loop-Helix (bHLH) factors in TαT1 cells, suggesting that these classes of transcription factors may recruit or cooperate with POU1F1 binding at unique sites. We validated enhancer function of novel elements we mapped near Cga, Pitx1, Gata2, and Tshb by transfection in TαT1 cells. Finally, we confirmed that an enhancer element near Tshb can drive expression in thyrotropes of transgenic mice, and we demonstrate that GATA2 enhances Tshb expression through this element.

Conclusion: These results extend the ENCODE multi-omic profiling approach to the pituitary gland, which should be valuable for understanding pituitary development and disease pathogenesis.


2020 ◽  
Vol 6 ◽  
Author(s):  
Jaime de Miguel Rodríguez ◽  
Maria Eugenia Villafañe ◽  
Luka Piškorec ◽  
Fernando Sancho Caparrini

Abstract This work presents a methodology for the generation of novel 3D objects resembling wireframes of building types. These result from the reconstruction of interpolated locations within the learnt distribution of variational autoencoders (VAEs), a deep generative machine learning model based on neural networks. The data set used features a scheme for geometry representation based on a ‘connectivity map’ that is especially suited to express the wireframe objects that compose it. Additionally, the input samples are generated through ‘parametric augmentation’, a strategy proposed in this study that creates coherent variations among data by enabling a set of parameters to alter representative features on a given building type. In the experiments that are described in this paper, more than 150 k input samples belonging to two building types have been processed during the training of a VAE model. The main contribution of this paper has been to explore parametric augmentation for the generation of large data sets of 3D geometries, showcasing its problems and limitations in the context of neural networks and VAEs. Results show that the generation of interpolated hybrid geometries is a challenging task. Despite the difficulty of the endeavour, promising advances are presented.
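The "interpolated locations within the learnt distribution" mentioned above can be pictured as points on a path between two latent codes. The following is a minimal sketch assuming plain linear interpolation between two latent vectors; the abstract does not state which interpolation scheme the authors used, the two-dimensional latent vectors are made up, and the decoder that would turn each point into a wireframe is not shown.

```python
def interpolate_latents(z_start, z_end, num_points):
    """Return num_points latent vectors evenly spaced on the line from z_start to z_end."""
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same length")
    path = []
    for i in range(num_points):
        t = i / (num_points - 1)  # t runs from 0.0 (start) to 1.0 (end)
        path.append([a + t * (b - a) for a, b in zip(z_start, z_end)])
    return path


# Hypothetical latent codes of two building types.
z_type_a = [0.0, 2.0]
z_type_b = [1.0, 0.0]
for z in interpolate_latents(z_type_a, z_type_b, 3):
    print(z)  # each z would be passed to a trained decoder (not shown here)
```

Each intermediate vector corresponds to a candidate hybrid geometry once decoded; whether the decoded result is a coherent wireframe is the hard part that the abstract describes.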


Animals ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 50
Author(s):  
Jennifer Salau ◽  
Jan Henning Haas ◽  
Wolfgang Junge ◽  
Georg Thaller

Machine learning methods have become increasingly important in animal science, and the success of an automated application using machine learning often depends on the right choice of method for the respective problem and data set. The recognition of objects in 3D data is still a widely studied topic and is especially challenging when it comes to partitioning objects into predefined segments. In this study, two machine learning approaches were utilized for the recognition of body parts of dairy cows from 3D point clouds, i.e., sets of data points in space. The low-cost off-the-shelf depth sensor Microsoft Kinect V1 has been used in various studies related to dairy cows. The 3D data were gathered from a multi-Kinect recording unit which was designed to record Holstein Friesian cows from both sides in free walking from three different camera positions. For the determination of the body parts head, rump, back, legs and udder, five properties of the pixels in the depth maps (row index, column index, depth value, variance, mean curvature) were used as features in the training data set. For each camera position, a k-nearest neighbour classifier and a neural network were trained and compared afterwards. Both methods showed small Hamming losses (between 0.007 and 0.027 for k-nearest neighbour (kNN) classification and between 0.045 and 0.079 for neural networks) and could be considered successful regarding the classification of pixels to body parts. However, the kNN classifier was superior, reaching overall accuracies of 0.888 to 0.976, varying with the camera position. Precision and recall values associated with individual body parts ranged from 0.84 to 1 and from 0.83 to 1, respectively. Once trained, kNN classification is prone to higher runtime costs in terms of computational time and memory compared to the neural networks. The cost vs. accuracy ratio for each methodology needs to be taken into account when deciding which method should be implemented in the application.
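To make the two ingredients named above concrete, here is a minimal sketch of a k-nearest-neighbour vote over five-element pixel feature vectors and of a Hamming loss over binary indicator rows. The feature values, labels, and function names are made up for illustration, the features are left unscaled for brevity, and the abstract does not say exactly how the authors encoded labels when computing the Hamming loss, so the indicator-row form used here is an assumption.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    # Every prediction scans the whole stored training set, which is why kNN
    # is comparatively expensive in time and memory at runtime.
    neighbours = sorted(
        zip(train_features, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def hamming_loss(true_rows, pred_rows):
    """Fraction of label positions that differ between binary indicator rows."""
    total = sum(len(row) for row in true_rows)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(true_rows, pred_rows)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


# Made-up pixels: (row index, column index, depth value, variance, mean curvature)
train_features = [
    (10, 12, 1.9, 0.02, 0.10),
    (11, 13, 1.8, 0.03, 0.12),
    (12, 11, 1.9, 0.02, 0.11),
    (80, 40, 2.6, 0.05, 0.30),
    (82, 41, 2.7, 0.04, 0.28),
    (81, 39, 2.6, 0.05, 0.31),
]
train_labels = ["head", "head", "head", "udder", "udder", "udder"]

predicted = knn_predict(train_features, train_labels, (79, 40, 2.6, 0.05, 0.29))
print(predicted)  # udder

# Two pixels, three candidate body parts each; the second pixel is mislabelled.
loss = hamming_loss([[1, 0, 0], [0, 1, 0]], [[1, 0, 0], [0, 0, 1]])
print(round(loss, 3))  # 0.333
```

The linear scan inside `knn_predict` is the runtime cost referred to in the abstract: a trained neural network only needs its weights at inference, whereas kNN must keep and search the training pixels.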


Author(s):  
Harri Makkonen ◽  
Jorma J. Palvimo

Abstract Androgen receptor (AR) acts as a hormone-controlled transcription factor that conveys the messages of both natural and synthetic androgens to the level of genes and gene programs. Defective AR signaling leads to a wide array of androgen insensitivity disorders, and deregulated AR function, in particular overexpression of AR, is involved in the growth and progression of prostate cancer. Classic models of AR action view AR-binding sites as upstream regulatory elements in gene promoters or their proximity. However, recent wider genomic screens indicate that AR target genes are commonly activated through very distal chromatin-binding sites. This highlights the importance of long-range chromatin regulation of transcription by the AR, shifting the focus from linear gene models to three-dimensional models of AR target genes and gene programs. The capability of AR to regulate promoters from long distances in the chromatin is particularly important when evaluating the role of AR in the regulation of genes in malignant prostate cells, which frequently show striking genomic aberrations, especially gene fusions. Therefore, in addition to the mechanisms of DNA loop formation between the enhancer-bound ARs and the transcription apparatus at the target core promoter, the mechanisms that insulate distally bound ARs from promiscuously contacting and activating promoters other than those of their normal target genes are critical for proper physiological regulation and are thus currently under intense investigation. This review discusses the current knowledge about AR action in the context of gene aberrations and the three-dimensional chromatin landscape of prostate cancer cells.

