PEDLA: predicting enhancers with a deep learning-based algorithmic framework

10.1101/036129 ◽

2016 ◽

Cited By ~ 2

Author(s):

Feng Liu ◽

Hao Li ◽

Chao Ren ◽

Xiaochen Bo ◽

Wenjie Shu

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Cell Types ◽

Heterogeneous Data ◽

Superior Performance ◽

Enhancer Prediction ◽

A Genome ◽

Heterogeneous Features ◽

Independent Test ◽

Algorithmic Framework

Abstract: Transcriptional enhancers are non-coding segments of DNA that play a central role in the spatiotemporal regulation of gene expression programs. However, systematically and precisely predicting enhancers remains a major challenge. Although existing methods have achieved some success in enhancer prediction, they still suffer from many issues. We developed a deep learning-based algorithmic framework named PEDLA (https://github.com/wenjiegroup/PEDLA), which can directly learn an enhancer predictor from massively heterogeneous data and generalize in ways that are mostly consistent across various cell types/tissues. We first trained PEDLA with 1,114-dimensional heterogeneous features in H1 cells, and we demonstrated that the PEDLA framework integrates diverse heterogeneous features and gives state-of-the-art performance relative to five existing methods for enhancer prediction. We further extended PEDLA to iteratively learn from 22 training cell types/tissues. Our results showed that PEDLA performed consistently well on both training and independent test sets. On average, PEDLA achieved 95.0% accuracy and a 96.8% geometric mean (GM) across 22 training cell types/tissues, as well as 95.7% accuracy and a 96.8% GM across 20 independent test cell types/tissues. Together, our work illustrates the power of harnessing state-of-the-art deep learning techniques to consistently identify regulatory elements at a genome-wide scale from massively heterogeneous data across diverse cell types/tissues.
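
The abstract reports accuracy and a geometric mean (GM) but does not define GM. A minimal sketch follows, assuming the common definition of GM as the square root of sensitivity times specificity. The function name and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
import math


def accuracy_and_gm(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (accuracy, GM) from confusion-matrix counts.

    Assumption: GM = sqrt(sensitivity * specificity). The abstract does not
    state which definition the authors used.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Guard against classes with no examples.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, math.sqrt(sensitivity * specificity)


# Hypothetical counts, chosen only for illustration.
acc, gm = accuracy_and_gm(tp=90, fp=40, tn=860, fn=10)
print(f"accuracy={acc:.3f}, GM={gm:.3f}")
```

Under this definition GM stays low if either class is poorly recognised, which is why it is often reported alongside accuracy when enhancers are far rarer than background regions.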

Classification of Hyperspectral Image Based on Double-Branch Dual-Attention Mechanism Network

Remote Sensing ◽

10.3390/rs12030582 ◽

2020 ◽

Vol 12 (3) ◽

pp. 582 ◽

Cited By ~ 4

Author(s):

Rui Li ◽

Shunyi Zheng ◽

Chenxi Duan ◽

Yang Yang ◽

Xiqi Wang

Keyword(s):

Deep Learning ◽

Hyperspectral Image ◽

State Of The Art ◽

Attention Mechanism ◽

Superior Performance ◽

Feature Maps ◽

Spatial Features ◽

Training Samples ◽

Series Of Experiments

In recent years, researchers have paid increasing attention to hyperspectral image (HSI) classification using deep learning methods. To improve accuracy and reduce the number of training samples required, we propose a double-branch dual-attention mechanism network (DBDA) for HSI classification. Two branches are designed in DBDA to capture the rich spectral and spatial features contained in HSI. Furthermore, a channel attention block and a spatial attention block are applied to these two branches respectively, which enables DBDA to refine and optimize the extracted feature maps. A series of experiments on four hyperspectral datasets shows that the proposed framework outperforms the state-of-the-art algorithm, especially when training samples are severely lacking.

SeqEnhDL: sequence-based classification of cell type-specific enhancers using deep learning models

10.21203/rs.3.rs-94396/v1 ◽

2020 ◽

Author(s):

Yupeng Wang ◽

Rosario Jaime-Lara ◽

Abhrarup Roy ◽

Ying Sun ◽

Xinyue Liu ◽

...

Keyword(s):

Neural Network ◽

Deep Learning ◽

Cell Types ◽

Regulatory Elements ◽

Learning Models ◽

Cell Type ◽

Coding Sequences ◽

Sequence Features ◽

A Genome ◽

Cell Type Specific

Objective: Computational identification of cell type-specific regulatory elements on a genome-wide scale is very challenging. Results: We propose SeqEnhDL, a deep learning framework for classifying cell type-specific enhancers based on sequence features. DNA sequences of “strong enhancer” chromatin states in nine cell types from the ENCODE project were retrieved to build and test enhancer classifiers. For any DNA sequence, sequential k-mer (k=5, 7, 9 and 11) fold changes relative to randomly selected non-coding sequences were used as features for deep learning models. Three deep learning models were implemented: a multi-layer perceptron (MLP), a convolutional neural network (CNN) and a recurrent neural network (RNN). All models in SeqEnhDL outperform state-of-the-art enhancer classifiers, including gkm-SVM and DanQ, in distinguishing cell type-specific enhancers from randomly selected non-coding sequences. Moreover, SeqEnhDL is able to directly discriminate enhancers from different cell types, which has not been achieved by other enhancer classifiers. Our analysis suggests that both enhancers and their tissue-specificity can be accurately identified according to their sequence features. SeqEnhDL is publicly available at https://github.com/wyp1125/SeqEnhDL.
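
The k-mer fold-change features described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the SeqEnhDL implementation: the function names, the pseudocount, the neutral value of 1.0 for unseen k-mers, and the toy sequences are all hypothetical, and k=3 is used only to keep the example small (the abstract uses k=5, 7, 9 and 11).

```python
from collections import Counter


def kmer_frequencies(seqs, k):
    """Relative frequency of every k-mer seen in a collection of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        counts.update(s[i:i + k] for i in range(len(s) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def fold_change_table(enhancers, background, k, pseudo=1e-6):
    """Fold change of each k-mer's frequency in enhancers vs. background.

    The pseudocount (an assumption) avoids division by zero for k-mers
    that are absent from the background set.
    """
    fg = kmer_frequencies(enhancers, k)
    bg = kmer_frequencies(background, k)
    return {
        m: (fg.get(m, 0.0) + pseudo) / (bg.get(m, 0.0) + pseudo)
        for m in set(fg) | set(bg)
    }


def sequential_features(seq, table, k):
    """One fold-change value per sliding window, in sequence order."""
    seq = seq.upper()
    return [table.get(seq[i:i + k], 1.0) for i in range(len(seq) - k + 1)]


# Toy sequences, for illustration only.
enhancers = ["ACGTACGT", "ACGTTTGA"]
background = ["TTTTTTTT", "GGGACGTA"]
table = fold_change_table(enhancers, background, k=3)
features = sequential_features("ACGTAC", table, k=3)
print(features)
```

Because the values are kept in window order rather than summed, the resulting vector preserves positional information that a CNN or RNN can exploit.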

GraphProt2: A novel deep learning-based method for predicting binding sites of RNA-binding proteins

10.1101/850024 ◽

2019 ◽

Cited By ~ 2

Author(s):

Michael Uhl ◽

Van Dinh Tran ◽

Rolf Backofen

Keyword(s):

Gene Expression ◽

Neural Networks ◽

Deep Learning ◽

Binding Sites ◽

Binding Proteins ◽

RNA Binding ◽

RNA Binding Proteins ◽

State Of The Art ◽

Prediction Method ◽

Superior Performance

Abstract: CLIP-seq is the state-of-the-art technique to experimentally determine transcriptome-wide binding sites of RNA-binding proteins (RBPs). However, it relies on gene expression, which can be highly variable between conditions, and thus cannot provide a complete picture of the RBP binding landscape. This necessitates the use of computational methods to predict missing binding sites. Here we present GraphProt2, a computational RBP binding site prediction method based on graph convolutional neural networks (GCN). In contrast to current CNN methods, GraphProt2 supports variable-length input and can accurately predict nucleotide-wise binding profiles. We demonstrate its superior performance compared to GraphProt and a CNN-based method on single as well as combined CLIP-seq datasets.

Continuous Training and Deployment of Deep Learning Models

Datenbank-Spektrum ◽

10.1007/s13222-021-00386-8 ◽

2021 ◽

Author(s):

Ioannis Prapas ◽

Behrouz Derakhshan ◽

Alireza Rezaei Mahdiraji ◽

Volker Markl

Keyword(s):

Deep Learning ◽

Historical Data ◽

State Of The Art ◽

Streaming Data ◽

Superior Performance ◽

Learning Models ◽

Model Quality ◽

Continuous Training ◽

Training Time ◽

Machine Learning Methods

Abstract: Deep Learning (DL) has consistently surpassed other Machine Learning methods and achieved state-of-the-art performance in multiple cases. Several modern applications, such as financial and recommender systems, require models that are constantly updated with fresh data. The prominent approach for keeping a DL model fresh is to trigger full retraining from scratch when enough new data are available. However, retraining large and complex DL models is time-consuming and compute-intensive, which makes full retraining costly, wasteful, and slow. In this paper, we present an approach to continuously train and deploy DL models. First, we enable continuous training through proactive training, which combines samples of historical data with new streaming data. Second, we enable continuous deployment through gradient sparsification, which allows us to send a small percentage of the model updates per training iteration. Our experimental results with LeNet5 on MNIST and modern DL models on CIFAR-10 show that proactive training keeps models fresh with comparable, if not superior, performance to full retraining at a fraction of the time. Combined with gradient sparsification, sparse proactive training enables very fast updates of a deployed model with arbitrarily large sparsity, reducing communication per iteration by up to four orders of magnitude with minimal, if any, loss in model quality. Sparse training, however, comes at a price: it incurs overhead on the training that depends on the size of the model and increases the training time by factors ranging from 1.25 to 3 in our experiments. This is arguably a small price to pay for successfully enabling the continuous training and deployment of large DL models.
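
The two ideas in this abstract can be sketched in a few lines. The code below is a hedged illustration, not the authors' implementation: it assumes top-k magnitude selection as the sparsification rule and a plain random sample of historical data, neither of which the abstract specifies. All function and parameter names are hypothetical.

```python
import random


def proactive_batch(historical, fresh, n_historical, rng):
    """Mix a random sample of historical examples with all fresh examples."""
    sampled = rng.sample(historical, min(n_historical, len(historical)))
    return sampled + list(fresh)


def sparsify_top_k(gradient, keep_fraction):
    """Keep only the largest-magnitude gradient entries.

    Returns (sparse, residual): `sparse` maps index -> value for the entries
    that would be sent; `residual` holds the entries that were held back and
    could be accumulated locally (an assumption, common in top-k schemes).
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    k = max(1, int(len(gradient) * keep_fraction))
    ranked = sorted(range(len(gradient)), key=lambda i: abs(gradient[i]), reverse=True)
    kept = set(ranked[:k])
    sparse = {i: gradient[i] for i in kept}
    residual = [0.0 if i in kept else g for i, g in enumerate(gradient)]
    return sparse, residual


rng = random.Random(0)
batch = proactive_batch(historical=list(range(100)), fresh=[100, 101], n_historical=6, rng=rng)
sparse, residual = sparsify_top_k([0.1, -2.0, 0.05, 1.5], keep_fraction=0.5)
print(len(batch), sparse, residual)
```

Sending only the `sparse` dictionary is what reduces communication per iteration. The sort over all gradient entries hints at the kind of per-iteration overhead the abstract mentions.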

DNC4mC-Deep: Identification and Analysis of DNA N4-Methylcytosine Sites Based on Different Encoding Schemes By Using Deep Learning

Cells ◽

10.3390/cells9081756 ◽

2020 ◽

Vol 9 (8) ◽

pp. 1756 ◽

Cited By ~ 4

Author(s):

Abdul Wahab ◽

Omid Mahmoudi ◽

Jeehong Kim ◽

Kil To Chong

Keyword(s):

Deep Learning ◽

Protein Interactions ◽

State Of The Art ◽

Critical Role ◽

Regulation Of Gene Expression ◽

The State ◽

Superior Performance ◽

Training Dataset ◽

Conformation Stability ◽

Deep Learning Model

N4-methylcytosine (4mC), one kind of DNA modification, plays a critical role in altering genetic performance, such as protein interactions, conformation, and stability of DNA, as well as in the regulation of gene expression, cell development, and genomic imprinting. Several 4mC site identifiers have been proposed for various species. Herein, we propose a computational model, DNC4mC-Deep, which combines six encoding techniques with a deep learning model to predict 4mC sites in the genomes of F. vesca and R. chinensis and in a cross-species dataset. A 10-fold cross-validation test demonstrated its superior performance. DNC4mC-Deep obtained an MCC of 0.829 and 0.929 on the F. vesca and R. chinensis training datasets, respectively, and 0.814 on the cross-species dataset. This means the proposed method outperforms the state-of-the-art predictors by at least 0.284 and 0.265 on the F. vesca and R. chinensis training datasets, respectively. Furthermore, DNC4mC-Deep achieved an MCC of 0.635 and 0.565 on the F. vesca and R. chinensis independent datasets, respectively, and 0.562 on the cross-species dataset, which shows that it predicts 4mC sites better than the state-of-the-art predictor.
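
The MCC values quoted above refer to the Matthews correlation coefficient. As a reminder of how it is computed from a binary confusion matrix, a short sketch follows; the function name is hypothetical and returning 0.0 for a zero denominator is a common convention, not something the abstract specifies.

```python
import math


def matthews_corrcoef(tp: int, fp: int, tn: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumption): report 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Illustrative counts only: perfect, inverted, and chance-level predictions.
print(matthews_corrcoef(50, 0, 50, 0))
print(matthews_corrcoef(0, 50, 0, 50))
print(matthews_corrcoef(25, 25, 25, 25))
```

MCC ranges from -1 to 1, so a gain of 0.284 is a large difference on this scale.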

A Deep Learning Framework for Malware Classification

International Journal of Digital Crime and Forensics ◽

10.4018/ijdcf.2020010105 ◽

2020 ◽

Vol 12 (1) ◽

pp. 90-108

Author(s):

Mahmoud Kalash ◽

Mrigank Rochan ◽

Noman Mohammed ◽

Neil Bruce ◽

Yang Wang ◽

...

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Learning Algorithms ◽

Superior Performance ◽

Traditional Learning ◽

Security Threats ◽

Learning Approaches ◽

Learning Framework ◽

Malware Classification ◽

New Strategies

In this article, the authors propose a deep learning framework for malware classification. There has been a huge increase in the volume of malware in recent years which poses serious security threats to financial institutions, businesses, and individuals. In order to combat the proliferation of malware, new strategies are essential to quickly identify and classify malware samples. Nowadays, machine learning approaches are becoming popular for malware classification. However, most of these approaches are based on shallow learning algorithms (e.g. SVM). Recently, convolutional neural networks (CNNs), a deep learning approach, have shown superior performance compared to traditional learning algorithms, especially in tasks such as image classification. Inspired by this, the authors propose a CNN-based architecture to classify malware samples. They convert malware binaries to grayscale images and subsequently train a CNN for classification. Experiments on two challenging malware classification datasets, namely Malimg and Microsoft, demonstrate that their method outperforms competing state-of-the-art algorithms.
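
The step of converting a malware binary to a grayscale image can be illustrated as below. This is a minimal sketch under assumptions: each byte becomes one pixel intensity (0 to 255), the image width is fixed by the caller, and the last row is zero-padded. The abstract does not say how the authors choose the width or handle padding, and the function name is hypothetical.

```python
def bytes_to_grayscale(data: bytes, width: int) -> list[list[int]]:
    """Reshape raw bytes into rows of `width` pixels, one byte per pixel."""
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        # Assumption: pad the final, incomplete row with black pixels.
        row.extend([0] * (width - len(row)))
        rows.append(row)
    return rows


# Five illustrative bytes laid out as a 3 x 2 image.
image = bytes_to_grayscale(bytes([0, 255, 16, 32, 7]), width=2)
print(image)
```

In practice the bytes would be read from a file and the resulting array resized to the CNN's expected input shape.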

Automated recognition of ultrasound cardiac views based on deep learning with graph constraint

10.1101/2020.05.07.20094045 ◽

2020 ◽

Author(s):

Yanhua Gao ◽

Yuan Zhu ◽

Bo Liu ◽

Yue Hu ◽

Youmin Guo

Keyword(s):

Deep Learning ◽

Cardiac Cycle ◽

State Of The Art ◽

The State ◽

Automated Recognition ◽

Cardiac Image ◽

Independent Test ◽

The Mean ◽

Shape Changes ◽

Generalization Accuracy

Objective: In transthoracic echocardiographic (TTE) examination, it is essential to identify the cardiac views accurately. Computer-aided recognition is expected to improve the accuracy of the TTE examination. Methods: This paper proposes a new method for automatic recognition of cardiac views based on deep learning, comprising three strategies. First, a spatial transform network is used to learn cardiac shape changes during the cardiac cycle, which reduces intra-class variability. Second, a channel attention mechanism is introduced to adaptively recalibrate channel-wise feature responses. Finally, unlike conventional deep learning methods, which learn from each input image individually, structured signals are supplied by a graph of similarities among images. These signals are transformed into graph-based image embeddings, which act as unsupervised regularization constraints to improve the generalization accuracy. Results: The proposed method was trained and tested on 171792 cardiac images from 584 subjects. Compared with the known state-of-the-art result, the overall accuracy of the proposed method on cardiac image classification is 99.10% vs. 91.7%, and the mean AUC is 99.36%. Moreover, the overall accuracy is 98.15% and the mean AUC is 98.96% on an independent test set with 34211 images from 100 subjects. Conclusion: The proposed method achieves state-of-the-art results and is expected to serve as an automated tool for cardiac view recognition. The work confirms the potential of deep learning in ultrasound medicine.
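
The phrase "recalibrate channel-wise feature responses" matches the usual description of squeeze-and-excitation style channel attention. The sketch below assumes that form, since the abstract does not give the exact architecture. The weight matrices `w1` and `w2` stand in for learned parameters, and all names are hypothetical.

```python
import math


def channel_recalibrate(feature_maps, w1, w2):
    """Rescale each channel by a gate computed from global channel statistics.

    feature_maps: list of C channels, each an H x W list of floats.
    w1: hidden x C weight matrix; w2: C x hidden weight matrix.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # Excitation: small bottleneck with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * z for w, z in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    return [[[g * v for v in row] for row in ch] for ch, g in zip(feature_maps, gates)]


# Two 2x2 channels; zero weights give a gate of 0.5 for every channel.
maps = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
out = channel_recalibrate(maps, w1=[[0.0, 0.0]], w2=[[0.0], [0.0]])
print(out)
```

With trained weights the gates differ per channel, so informative channels are emphasised and less useful ones are suppressed.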

MetaRNN: Differentiating Rare Pathogenic and Rare Benign Missense SNVs and InDels Using Deep Learning

10.1101/2021.04.09.438706 ◽

2021 ◽

Author(s):

Chang Li ◽

Degui Zhi ◽

Kai Wang ◽

Xiaoming Liu

Keyword(s):

Deep Learning ◽

Prediction Models ◽

State Of The Art ◽

Single Nucleotide Variants ◽

Score Distribution ◽

Single Nucleotide ◽

Pathogenicity Prediction ◽

New Models ◽

Independent Test

We present the pathogenicity prediction models MetaRNN and MetaRNN-indel to help identify and prioritize rare nonsynonymous single nucleotide variants (nsSNVs) and non-frameshift insertion/deletions (nfINDELs) using deep learning and context annotations. Employing independent test datasets, we demonstrate that these new models outperform state-of-the-art competitors and achieve a more interpretable score distribution. MetaRNN executables and precomputed scores are available at http://www.liulab.science/MetaRNN.

MarkerCapsule: Explainable Single Cell Typing using Capsule Networks

10.1101/2020.09.22.307512 ◽

2020 ◽

Author(s):

Sumanta Ray ◽

Alexander Schönhuth

Keyword(s):

Single Cell ◽

State Of The Art ◽

Activity Patterns ◽

Cell Types ◽

Heterogeneous Data ◽

The State ◽

Manual Annotation ◽

Human Knowledge ◽

Typing Methods ◽

Cell Typing

Abstract: Many single cell typing methods require manual annotation, which poses problems with respect to the resolution of (sub-)types, manpower resources, and bias towards existing human knowledge. The integration of heterogeneous data and the biologically meaningful interpretation of results are further key challenges. We introduce MarkerCapsule, which brings the landmark advantages that capsule networks achieved in their original applications to single cell typing. Beyond outperforming the state of the art in basic typing accuracy, its exemplary strengths are the small amount of labeled data required and the naturally arising, biologically meaningful interpretation of cell types in terms of characteristic gene activity patterns. MarkerCapsule is available at: https://github.com/sumantaray/MarkerCapsule.

Sample-Efficient Deep Learning for COVID-19 Diagnosis Based on CT Scans

10.1101/2020.04.13.20063941 ◽

2020 ◽

Cited By ~ 5

Author(s):

Xuehai He ◽

Xingyi Yang ◽

Shanghang Zhang ◽

Jinyu Zhao ◽

Yichen Zhang ◽

...

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Superior Performance ◽

CT Scans ◽

Learning To Learn ◽

Learning Methods ◽

Feature Representations ◽

Medical Tests ◽

Diagnosis Accuracy ◽

CT Data

Abstract: Coronavirus disease 2019 (COVID-19) has infected more than 1.3 million individuals all over the world and caused more than 106,000 deaths. One major hurdle in controlling the spread of this disease is the inefficiency and shortage of medical tests. There have been increasing efforts to develop deep learning methods to diagnose COVID-19 based on CT scans. However, these works are difficult to reproduce and adopt since the CT data used in their studies are not publicly available. Besides, these works require a large number of CTs to train accurate diagnosis models, which are difficult to obtain. In this paper, we aim to address these two problems. We build a publicly available dataset containing hundreds of CT scans positive for COVID-19 and develop sample-efficient deep learning methods that can achieve high diagnosis accuracy of COVID-19 from CT scans even when the number of training CT images is limited. Specifically, we propose a Self-Trans approach, which synergistically integrates contrastive self-supervised learning with transfer learning to learn powerful and unbiased feature representations for reducing the risk of overfitting. Extensive experiments demonstrate the superior performance of our proposed Self-Trans approach compared with several state-of-the-art baselines. Our approach achieves an F1 of 0.85 and an AUC of 0.94 in diagnosing COVID-19 from CT scans, even though the number of training CTs is just a few hundred.
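
Contrastive self-supervised learning trains an encoder so that two views of the same image score higher than views of different images. The abstract does not state which loss Self-Trans uses, so the sketch below assumes one widely used objective (InfoNCE) purely for illustration. The function name, the toy embedding vectors, and the default temperature are hypothetical.

```python
import math


def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for one query embedding.

    Assumptions: embeddings are plain lists of floats (ideally L2-normalised)
    and similarity is the dot product divided by a temperature.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # The positive pair's logit comes first, followed by all negatives.
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


# Toy 2-D embeddings: an aligned positive gives a lower loss than a misaligned one.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]], temperature=1.0)
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]], temperature=1.0)
print(aligned, misaligned)
```

Minimising a loss of this kind needs no diagnosis labels, which is what allows pretraining on unlabeled CT images before fine-tuning on the few hundred labeled scans.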
