Denoising Genome-wide Histone ChIP-seq with Convolutional Neural Networks

2016
Author(s):
Pang Wei Koh
Emma Pierson
Anshul Kundaje

Abstract

Motivation: Chromatin immunoprecipitation sequencing (ChIP-seq) experiments are commonly used to obtain genome-wide profiles of histone modifications associated with different types of functional genomic elements. However, the quality of histone ChIP-seq data is affected by a myriad of experimental parameters such as the amount of input DNA, antibody specificity, ChIP enrichment, and sequencing depth. Making accurate inferences from chromatin profiling experiments that involve diverse experimental parameters is challenging.

Results: We introduce a convolutional denoising algorithm, Coda, that uses convolutional neural networks to learn a mapping from suboptimal to high-quality histone ChIP-seq data. This overcomes various sources of noise and variability, substantially enhancing and recovering signal when applied to low-quality chromatin profiling datasets across individuals, cell types, and species. Our method has the potential to improve data quality at reduced costs. More broadly, this approach – using a high-dimensional discriminative model to encode a generative noise process – is generally applicable to other biological domains where it is easy to generate noisy data but difficult to analytically characterize the noise or underlying data distribution.

Availability: https://github.com/kundajelab/
Contact: [email protected]
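As an illustrative sketch of the denoising idea (not Coda itself, which trains multi-layer CNNs on paired low-/high-quality experiments), the snippet below passes a noisy binned coverage track through a single fixed 1D filter standing in for a learned denoising kernel; the toy signal and all names are invented for illustration:

```python
import numpy as np

def conv1d_same(signal, kernel):
    """1D convolution with zero padding so the output length matches the input."""
    pad = len(kernel) // 2
    padded = np.pad(signal, pad)  # zero-padded borders
    return np.array([np.dot(padded[i:i + len(kernel)], kernel)
                     for i in range(len(signal))])

# Toy "noisy" binned coverage track: a clean peak plus read-count noise.
rng = np.random.default_rng(0)
clean = np.exp(-0.5 * ((np.arange(100) - 50) / 5.0) ** 2)
noisy = clean + rng.normal(0.0, 0.3, size=100)

# Stand-in for a learned denoising filter: a fixed smoothing kernel.
kernel = np.ones(9) / 9.0
denoised = conv1d_same(noisy, kernel)

# The denoised track should be closer to the clean signal than the noisy one.
err_noisy = np.mean((noisy - clean) ** 2)
err_denoised = np.mean((denoised - clean) ** 2)
```

In the actual method the kernel weights are learned from paired suboptimal and high-quality experiments rather than fixed by hand.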

2018
pp. 99-103
Author(s):
D. S. Kolesnikov
D. A. Kuznetsov

State-of-the-art convolutional neural networks achieve high accuracy on a wide range of problems, usually at the cost of significantly increased computational complexity and network parameters represented as single-precision floating-point numbers. However, because of limited resources, deploying such networks in real time on embedded systems and in mobile applications is problematic. One way to address this problem is to reduce the bit depth of the data and use integer arithmetic, which requires quantizing the network parameters. Quantization must be performed with minimal loss of recognition accuracy. The article proposes an optimal uniform quantizer with an adaptive step, where the step depends on the distribution function of the quantized parameters; this reduces the effect of quantization error on recognition accuracy. Approaches to improving the quality of quantization are also described. The proposed quantization method is evaluated on the CIFAR-10 database. It is shown that, with an 8-bit representation of the network parameters, the optimal uniform quantizer achieves the accuracy of the original trained network on CIFAR-10.
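A minimal sketch of one way such an adaptive-step uniform quantizer could look. Here the step is derived from a quantile of the empirical weight distribution so that rare outliers do not inflate it; the quantile rule is an assumption for illustration, not the paper's exact formula:

```python
import numpy as np

def uniform_quantize(weights, n_bits=8, coverage=0.999):
    """Uniform quantizer whose step adapts to the empirical weight distribution.

    The clipping range is a quantile of |w|, so the step reflects where most
    of the parameter mass lies rather than the extreme values."""
    clip = np.quantile(np.abs(weights), coverage)
    levels = 2 ** (n_bits - 1) - 1              # symmetric signed range, e.g. ±127
    step = clip / levels
    q = np.clip(np.round(weights / step), -levels, levels).astype(np.int8)
    return q, step

def dequantize(q, step):
    """Recover approximate float weights from integer codes."""
    return q.astype(np.float32) * step

# Quantize a toy layer's weights and measure the reconstruction error.
rng = np.random.default_rng(1)
w = rng.normal(0.0, 0.05, size=10000).astype(np.float32)
q, step = uniform_quantize(w)
w_hat = dequantize(q, step)
mse = np.mean((w - w_hat) ** 2)
```

Because the step tracks the distribution, narrowly concentrated weight tensors get a fine step while wide ones get a coarse one, keeping the relative quantization error roughly stable across layers.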


Author(s):
Josep Arús-Pous
Simon Johansson
Oleksii Prykhodko
Esben Jannik Bjerrum
Christian Tyrchan
...

Recurrent Neural Networks (RNNs) trained with a set of molecules represented as unique (canonical) SMILES strings have shown the capacity to create large chemical spaces of valid and meaningful structures. Herein we perform an extensive benchmark on models trained with subsets of GDB-13 of different sizes (1 million, 10,000 and 1,000), with different SMILES variants (canonical, randomized and DeepSMILES), with two different recurrent cell types (LSTM and GRU) and with different hyperparameter combinations. To guide the benchmarks, new metrics were developed that characterize the generated chemical space with respect to its uniformity, closedness and completeness. Results show that models using LSTM cells trained with 1 million randomized SMILES, a non-unique molecular string representation, generate larger chemical spaces than the other approaches and represent the target chemical space more accurately. Specifically, a model trained with randomized SMILES was able to generate almost all molecules from GDB-13 with a quasi-uniform probability. Models trained with smaller samples show an even bigger improvement when trained with randomized SMILES. Additionally, models were trained on molecules obtained from ChEMBL and illustrate again that training with randomized SMILES leads to models having a better representation of the drug-like chemical space. Namely, the model trained with randomized SMILES was able to generate at least double the number of unique molecules with the same distribution of properties compared to one trained with canonical SMILES.
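The notions of completeness and uniformity of a generated chemical space can be illustrated with toy definitions: completeness as the fraction of the target set produced at least once, and uniformity as the normalized entropy of the generation counts. These are simplified stand-ins for the paper's metrics, not its exact formulations:

```python
import math
from collections import Counter

def chemical_space_metrics(generated, target):
    """Toy completeness and uniformity scores for a molecular generator.

    completeness: fraction of the target space produced at least once.
    uniformity:   normalized entropy of generation counts over the target;
                  1.0 means every target molecule is sampled equally often."""
    counts = Counter(m for m in generated if m in target)
    completeness = len(counts) / len(target)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    uniformity = entropy / math.log(len(target)) if len(target) > 1 else 1.0
    return completeness, uniformity

# Toy target "chemical space" of four SMILES strings, sampled evenly twice each.
target = {"C", "CC", "CCC", "CCO"}
generated = ["C", "CC", "CCC", "CCO", "C", "CC", "CCC", "CCO"]
comp, unif = chemical_space_metrics(generated, target)
# A perfectly even sample of the full target gives (1.0, 1.0).
```

A model that repeatedly emits a few favourite molecules would keep completeness low and drive uniformity toward zero, which is the failure mode the benchmark is designed to expose.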


Author(s):
H. Albanwan
R. Qin

Abstract. Digital surface model (DSM) fusion is one of the ongoing challenges in enhancing the quality of 3D models, especially for complex regions with variable radiometric and geometric distortions such as satellite datasets. DSM generation using multi-view stereo (MVS) analysis is the most common cost-efficient approach to recovering elevations. Algorithms such as Census-based semi-global matching (SGM) and matching-cost convolutional neural networks (MC-CNN) have been successfully implemented to generate disparities and recover DSMs; however, their performance is limited when matching stereo pair images over regions that are ill-posed, low-texture, densely textured, occluded, or noisy, which can yield missing or incorrect elevation values in addition to fuzzy boundaries. DSM fusion algorithms have proven to tackle such problems, but their performance may vary with the quality of the input and the type of fusion, which can be classified as adaptive or non-adaptive. In this paper, we evaluate the performance of adaptive and non-adaptive fusion methods (median filter, adaptive median filter, K-median clustering fusion, weighted-average fusion, and adaptive spatiotemporal fusion) for DSMs generated using Census and MC-CNN. We perform our evaluation on nine test regions, using stereo pair images from the WorldView-3 satellite to generate DSMs with Census and MC-CNN. Our results show that adaptive fusion algorithms predict elevations more accurately than non-adaptive algorithms, owing to their ability to learn from temporal and contextual information. Our results also show that MC-CNN produces better fusion results, with a lower overall average RMSE, than Census.
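A minimal sketch contrasting two of the non-adaptive fusion strategies named above, per-pixel median fusion and weighted-average fusion, on a toy DSM stack. The array values and weighting scheme are invented for illustration:

```python
import numpy as np

def median_fusion(dsms):
    """Non-adaptive fusion: per-pixel median over a stack of DSMs."""
    return np.nanmedian(dsms, axis=0)

def weighted_average_fusion(dsms, weights):
    """Fusion with one confidence weight per DSM (e.g., from matching cost)."""
    w = np.asarray(weights, dtype=float)[:, None, None]
    return np.nansum(dsms * w, axis=0) / np.sum(w)

# Three toy 2x2 DSMs (elevations in metres); one has a gross blunder at (0, 0).
dsms = np.stack([
    np.array([[10.0, 12.0], [11.0, 13.0]]),
    np.array([[10.2, 12.1], [11.1, 13.2]]),
    np.array([[90.0, 11.9], [10.9, 12.8]]),   # matching blunder at (0, 0)
])
fused = median_fusion(dsms)
# The median suppresses the 90 m blunder, keeping (0, 0) near 10 m,
# whereas an equally weighted average is pulled far off by it.
```

This is also why weighted-average fusion needs informative weights: with equal weights it behaves like a plain mean and inherits every outlier, while down-weighting low-confidence pixels recovers robustness closer to the median's.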


2020
Vol 6
pp. e278
Author(s):
Ghazaleh Khodabandelou
Etienne Routhier
Julien Mozziconacci

The application of deep neural networks is a rapidly expanding field now reaching many disciplines, including genomics. In particular, convolutional neural networks have been exploited to identify the functional role of short genomic sequences. These approaches rely on gathering large sets of sequences with a known functional role, extracted from whole-genome annotations. These sets are then split into training, test and validation sets in order to train the networks. While the resulting networks perform well on validation sets, they often perform poorly when applied to whole genomes, in which the ratio of positive to negative examples can be very different from that in the training set. We address this issue here by assessing the genome-wide performance of networks trained with sets exhibiting different ratios of positive to negative examples. As a case study, we use sequences encompassing gene starts from the RefGene database as positive examples and random genomic sequences as negative examples. We then demonstrate that models trained on data from one organism can predict gene-start sites in a related species, provided the training sets yield good genome-wide performance. This cross-species application of convolutional neural networks provides a new way to annotate any genome from existing high-quality annotations in a related reference species. It also provides a way to determine whether the sequence motifs recognized by chromatin-associated proteins in different species are conserved.
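The core point, that good held-out metrics need not transfer to genome-wide deployment, can be worked through with a short base-rate calculation. The sensitivity and specificity values below are hypothetical:

```python
def genome_wide_precision(sensitivity, specificity, prevalence):
    """Precision of a classifier deployed at a given class prevalence.

    A model with strong held-out metrics can still produce mostly false
    positives genome-wide, where true gene starts are very rare."""
    tp = sensitivity * prevalence                 # true-positive rate mass
    fp = (1 - specificity) * (1 - prevalence)     # false-positive rate mass
    return tp / (tp + fp)

# The same model (95% sensitivity, 99% specificity) at two prevalences:
balanced = genome_wide_precision(0.95, 0.99, 0.5)     # balanced 1:1 test set
genome = genome_wide_precision(0.95, 0.99, 0.001)     # gene starts are rare
# Precision is near 0.99 on the balanced set but collapses genome-wide,
# because false positives from the huge negative class swamp the true hits.
```

This is why the abstract's strategy of tuning the positive-to-negative ratio of the training set against genome-wide performance, rather than validation-set accuracy alone, matters.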


Author(s):
Yao Lu
Guangming Lu
Bob Zhang
Yuanrong Xu
Jinxing Li

To construct small mobile networks without performance loss and to address the over-fitting caused by less abundant training datasets, this paper proposes a novel super sparse convolutional (SSC) kernel; the corresponding network is called SSC-Net. In an SSC kernel, every spatial kernel has only one non-zero parameter, and these non-zero spatial positions are all different. The SSC kernel effectively selects pixels from the feature maps according to its non-zero positions and operates on them. SSC can therefore preserve the general geometric characteristics and the differences between channels, preserving the quality of the retrieved features and meeting general accuracy requirements. Furthermore, SSC can be implemented entirely with “shift” and “group point-wise” convolutional operations, without any spatial kernels (e.g., 3×3). SSC is thus the first method to remove parameter redundancy in both the spatial and the channel extent, greatly decreasing the parameters and FLOPs and further reducing the img2col and col2img operations performed by low-level libraries. Meanwhile, SSC-Net improves sparsity and overcomes over-fitting more effectively than other mobile networks. Comparative experiments were performed on the less abundant CIFAR and low-resolution ImageNet datasets. The results showed that SSC-Nets significantly decrease the parameters and computational FLOPs without any performance loss, and that they better address the over-fitting problem on the more challenging, less abundant datasets.
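A rough numpy sketch of the shift-then-mix decomposition: a spatial kernel with a single non-zero tap per channel is equivalent to shifting each channel by its tap offset and then applying a point-wise (1×1) channel mix. The offsets and weights below are arbitrary stand-ins for learned parameters, and a real SSC layer would use grouped point-wise convolutions:

```python
import numpy as np

def shift_channels(x, offsets):
    """Shift each channel of x (C, H, W) by (dy, dx); zero-fill the border.

    Equivalent to convolving each channel with a kernel that has one
    non-zero tap, placed at a different position per channel."""
    out = np.zeros_like(x)
    _, h, w = x.shape
    for c, (dy, dx) in enumerate(offsets):
        ys0, ys1 = max(dy, 0), h + min(dy, 0)
        xs0, xs1 = max(dx, 0), w + min(dx, 0)
        out[c, ys0:ys1, xs0:xs1] = x[c, ys0 - dy:ys1 - dy, xs0 - dx:xs1 - dx]
    return out

def pointwise_conv(x, w):
    """1x1 convolution: mix channels at every pixel (w is C_out x C_in)."""
    c, h, wd = x.shape
    return (w @ x.reshape(c, -1)).reshape(w.shape[0], h, wd)

x = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
offsets = [(0, 1), (1, 0)]                 # one distinct tap per channel
w = np.array([[1.0, 0.0], [0.5, 0.5]])     # stand-in channel-mixing weights
y = pointwise_conv(shift_channels(x, offsets), w)
```

Because the shift carries no parameters and the 1×1 mix has C_out × C_in weights, all spatial-kernel parameters (e.g., the 9 per channel of a 3×3) disappear, which is the source of the parameter and FLOP savings the abstract describes.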


2020
Vol 9 (2)
pp. 1055-1059

Identifying the quality of fresh produce during procurement is a major task involving time and human effort in the retail industry. The main objective of this project is to identify and classify apples as fresh or rotten using convolutional neural networks. Our study achieved 97.92 percent accuracy over the two classes on approximately 5,031 images, first detecting apples using ResNet-50 and then classifying them with the proposed model.


2015
Author(s):
David R. Kelley
Jasper Snoek
John Rinn

Abstract

The complex language of eukaryotic gene expression remains incompletely understood. Despite the importance suggested by the many noncoding variants statistically associated with human disease, nearly all such variants have an unknown mechanism. Here, we address this challenge using an approach based on a recent machine learning advance: deep convolutional neural networks (CNNs). We introduce an open source package, Basset (https://github.com/davek44/Basset), to apply CNNs to learn the functional activity of DNA sequences from genomics data. We trained Basset on a compendium of accessible genomic sites mapped in 164 cell types by DNaseI-seq and demonstrate far greater predictive accuracy than previous methods. Basset's predicted changes in accessibility between variant alleles were far greater for GWAS SNPs that are likely to be causal than for nearby SNPs in linkage disequilibrium with them. With Basset, a researcher can perform a single sequencing assay in their cell type of interest and simultaneously learn that cell's chromatin accessibility code and annotate every mutation in the genome with its influence on present accessibility and latent potential for accessibility. Thus, Basset offers a powerful computational approach to annotate and interpret the noncoding genome.
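A toy sketch of the allele-scoring idea: one-hot encode the sequence around each allele and compare the model's predicted accessibility for the two versions. Here a single hand-made motif filter stands in for the trained CNN, and all sequences are invented for illustration:

```python
import numpy as np

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}

def one_hot(seq):
    """One-hot encode a DNA sequence as a 4 x L matrix."""
    x = np.zeros((4, len(seq)))
    for i, b in enumerate(seq):
        x[BASES[b], i] = 1.0
    return x

def accessibility_score(x, motif):
    """Stand-in for a trained model: best match of one motif filter along
    the sequence (a CNN like Basset stacks many learned filters)."""
    l = motif.shape[1]
    return max(float(np.sum(x[:, i:i + l] * motif))
               for i in range(x.shape[1] - l + 1))

# Score the change in predicted accessibility between two alleles.
motif = one_hot("TATA")                  # hypothetical filter
ref = "GGCTATAGGC"                       # reference allele carries the motif
alt = "GGCTACAGGC"                       # the variant disrupts it
delta = (accessibility_score(one_hot(alt), motif)
         - accessibility_score(one_hot(ref), motif))
# A negative delta flags the variant as reducing predicted accessibility.
```

Ranking variants by the magnitude of such deltas is what allows likely-causal GWAS SNPs to be separated from linked neighbours that do not change the predicted accessibility.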


2021
pp. 1-10
Author(s):
Halime Ergun

Fiber and vessel structures in the cross-section are anatomical features that play an important role in identifying tree species. To determine the microscopic anatomical structure of these cell types, each cell must be accurately segmented. In this study, a segmentation method based on deep convolutional neural networks is proposed for wood cell images. The network, developed by combining two-stage CNN structures, was trained using the Adam optimization algorithm. For evaluation, the method was compared with SegNet and U-Net architectures trained on the same dataset. The trained models were compared using IoU (Intersection over Union), accuracy, and BF-score measurements on the test data. The automatic identification of cells in wood images obtained with a microscope will provide a fast, inexpensive, and reliable tool for those working in this field.
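The IoU measurement used for the comparison above is straightforward to compute for binary segmentation masks; the toy masks below are illustrative:

```python
import numpy as np

def iou(pred, truth):
    """Intersection over Union between two binary segmentation masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return inter / union if union else 1.0

truth = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 0]])   # ground-truth cell mask
pred  = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 1]])   # model prediction
# intersection = 3 pixels, union = 5 pixels -> IoU = 0.6
```

Unlike plain pixel accuracy, IoU penalizes both the missed cell pixel and the spurious one, which is why it is the standard headline metric for segmentation benchmarks of this kind.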

