Probe Efficient Feature Representation of Gapped K-mer Frequency Vectors from Sequences using Deep Neural Networks

2017 ◽  
Author(s):  
Zhen Cao ◽  
Shihua Zhang

Abstract How to extract informative features from genome sequences is a challenging issue. Gapped k-mer frequency vectors (gkm-fvs) have been presented as a new type of feature in the last few years. Coupled with a support vector machine (gkm-SVM), gkm-fvs have been used to achieve effective sequence-based predictions. However, the heavy computation of a large kernel matrix prevents gkm-SVM from scaling to large amounts of data, and it is unclear how to combine gkm-fvs with other data sources in the context of string kernels. On the other hand, the high dimensionality, collinearity and sparsity of gkm-fvs hinder the use of many traditional machine learning methods without a kernel trick. Therefore, we proposed a flexible and scalable framework, gkm-DNN, to learn feature representations from high-dimensional gkm-fvs using deep neural networks (DNNs). We first proposed a more concise version of gkm-fvs that significantly reduces their dimension. Then we implemented, for the first time, an efficient method to calculate the gkm-fv of a given sequence. Finally, we adopted a DNN model with gkm-fvs as inputs to learn efficient feature representations and perform a prediction task. Here, we took transcription factor binding site prediction as an illustrative application. We applied gkm-DNN to 467 small and 69 big human ENCODE ChIP-seq datasets to demonstrate its performance and compared it with the state-of-the-art method gkm-SVM. We demonstrated that gkm-DNN not only alleviates the limitations of high dimensionality, collinearity and sparsity of gkm-fvs, but also achieves overall performance comparable to gkm-SVM using the same gkm-fvs. In addition, we used gkm-DNN to explore the representation power of gkm-fvs and provided more explanation of how gkm-fvs work.
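To make the feature type concrete, the following minimal Python sketch counts gapped k-mers in a DNA sequence: each length-l window contributes one count for every one of its C(l, k) patterns that keeps k informative positions and masks the other l - k positions as gaps ('-'). It is a naive counter written for readability, not the concise representation or the efficient gkm-fv algorithm described in the abstract; the function names `gapped_kmer_counts` and `full_dimension` and the parameter values are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from math import comb


def gapped_kmer_counts(seq: str, l: int, k: int) -> Counter:
    """Return a sparse gapped k-mer frequency vector for `seq` (hypothetical helper).

    Each length-l window yields C(l, k) patterns: k positions keep their
    base, the remaining l - k positions are replaced by '-' (gaps).
    Naive illustration, roughly O(n * C(l, k) * l) time.
    """
    if not 0 < k <= l:
        raise ValueError("require 0 < k <= l")
    seq = seq.upper()
    counts: Counter = Counter()
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        # Every choice of k informative positions defines one gapped k-mer.
        for keep in combinations(range(l), k):
            kept = set(keep)
            pattern = "".join(
                base if i in kept else "-" for i, base in enumerate(window)
            )
            counts[pattern] += 1
    return counts


def full_dimension(l: int, k: int, alphabet_size: int = 4) -> int:
    """Number of possible gapped k-mers: choose k positions, then k bases."""
    return comb(l, k) * alphabet_size ** k


fv = gapped_kmer_counts("ACGTACGT", l=4, k=2)
print(fv["AC--"], sum(fv.values()))  # 2 30
print(full_dimension(10, 6))         # 860160
```

Even the toy setting l = 4, k = 2 has C(4, 2) · 4^2 = 96 possible coordinates, and l = 10, k = 6 already gives 860,160, while any single sequence touches only a small fraction of them. This illustrates the high dimensionality and sparsity that the abstract highlights.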

SLEEP ◽  
2021 ◽  
Vol 44 (Supplement_2) ◽  
pp. A164-A164
Author(s):  
Pahnwat Taweesedt ◽  
JungYoon Kim ◽  
Jaehyun Park ◽  
Jangwoon Park ◽  
Munish Sharma ◽  
...  

Abstract Introduction Obstructive sleep apnea (OSA) is a common sleep-related breathing disorder affecting an estimated one billion people. Full-night polysomnography is considered the gold standard for OSA diagnosis. However, it is time-consuming, expensive and not readily available in many parts of the world. Many screening questionnaires and scores have been proposed for OSA prediction, but they tend to have high sensitivity and low specificity. The present study aims to develop models using various machine learning techniques to predict the severity of OSA by incorporating features from multiple questionnaires. Methods Subjects who underwent full-night polysomnography at the Torr sleep center, Texas, and completed 5 OSA screening questionnaires/scores were included. OSA was diagnosed using an Apnea-Hypopnea Index ≥ 5. We trained five different machine learning models: Deep Neural Networks with scaled principal component analysis (DNN-PCA), Random Forest (RF), Adaptive Boosting classifier (ABC), K-Nearest Neighbors classifier (KNC) and Support Vector Machine Classifier (SVMC). A training:testing subject ratio of 65:35 was used. All features, including demographic data, body measurements, and snoring and sleepiness history, were obtained from the 5 OSA screening questionnaires/scores (STOP-BANG questionnaire, Berlin questionnaire, NoSAS score, NAMES score and No-Apnea score). Performance metrics were used to compare the machine learning models. Results Of 180 subjects, 51.5% were male, with a mean (SD) age of 53.6 (15.1) years. One hundred and nineteen subjects were diagnosed with OSA. The Area Under the Receiver Operating Characteristic Curve (AUROC) values of DNN-PCA, RF, ABC, KNC, SVMC, STOP-BANG questionnaire, Berlin questionnaire, NoSAS score, NAMES score, and No-Apnea score were 0.85, 0.68, 0.52, 0.74, 0.75, 0.61, 0.63, 0.61, 0.58 and 0.58, respectively. DNN-PCA showed the highest AUROC, with a sensitivity of 0.79, specificity of 0.67, positive predictive value of 0.93, F1 score of 0.86, and accuracy of 0.77. Conclusion Our results show that DNN-PCA outperforms OSA screening questionnaires, scores and other machine learning models.
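The evaluation metrics reported above follow standard definitions. As a hedged sketch, the following Python code computes them from a binary confusion matrix and estimates AUROC as the probability that a randomly chosen positive subject scores higher than a randomly chosen negative one (ties count one half). The counts and scores in the usage lines are made-up values, not data from the study, and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Confusion:
    tp: int  # OSA subjects predicted as OSA
    fp: int  # non-OSA subjects predicted as OSA
    tn: int  # non-OSA subjects predicted as non-OSA
    fn: int  # OSA subjects predicted as non-OSA


def screening_metrics(c: Confusion) -> Dict[str, float]:
    sensitivity = c.tp / (c.tp + c.fn)   # true positive rate (recall)
    specificity = c.tn / (c.tn + c.fp)   # true negative rate
    ppv = c.tp / (c.tp + c.fp)           # positive predictive value (precision)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and recall
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "f1": f1, "accuracy": accuracy}


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise (Mann-Whitney) AUROC estimate; labels are 1 = OSA, 0 = no OSA."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


m = screening_metrics(Confusion(tp=8, fp=4, tn=6, fn=2))
print(m)
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

Because F1 combines PPV and sensitivity only, it ignores true negatives; that is why specificity and AUROC are reported alongside it when comparing screening tools.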


2018 ◽  
Vol 184 (1) ◽  
pp. 36-43 ◽  
Author(s):  
Gal Amit ◽  
Hanan Datz

Abstract We present here, for the first time, a fast and reliable automatic algorithm based on artificial neural networks for anomaly detection in thermoluminescence dosemeter (TLD) glow curves (GCs), and compare its performance with a previously developed support vector machine (SVM) method. The GC shape depends on numerous physical parameters that can alter it significantly. When integrated into a dosimetry laboratory, this automatic algorithm can route ‘anomalous’ GCs (those having any kind of anomaly) to manual review and ‘regular’ (acceptable) GCs to automatic analysis. The performance of the new algorithm is compared with two kinds of previously developed SVM classifiers, regular and weighted, using three different metrics. Results show an accuracy of 97% in correctly classifying TLD GCs into either of the two classes.


2021 ◽  
Author(s):  
Guojun Huang ◽  
Cheng Wang ◽  
Xi Fu

Aims: Individualized patient profiling is instrumental for personalized management in hepatocellular carcinoma (HCC). This study built a model based on bidirectional deep neural networks (BiDNNs), an unsupervised machine-learning approach, to integrate multi-omics data and predict survival in HCC. Methods: DNA methylation and mRNA expression data for HCC samples from the TCGA database were integrated using BiDNNs. With optimal clusters as labels, a support vector machine model was developed to predict survival. Results: Using the BiDNN-based model, samples were clustered into two survival subgroups. The survival subgroup classification was an independent prognostic factor. BiDNNs were superior to multimodal autoencoders. Conclusion: This study constructed and validated a BiDNN-based model for predicting prognosis in HCC, with implications for individualized therapies in HCC.


2020 ◽  
Vol 12 (15) ◽  
pp. 2353
Author(s):  
Henning Heiselberg

Classifying ships and icebergs in Arctic satellite images is an important problem. We study how to train deep neural networks to improve the discrimination of ships and icebergs in multispectral satellite images. We also analyze synthetic-aperture radar (SAR) images for comparison. The annotated datasets of ships and icebergs are collected from multispectral Sentinel-2 data and taken from the C-CORE dataset of Sentinel-1 SAR images. Convolutional Neural Networks with a range of hyperparameters are tested and optimized. Classification accuracies are considerably better for deep neural networks than for support vector machines. Deeper neural nets improve the accuracy per epoch, but at the cost of longer processing time. Extending the datasets with semi-supervised data from Greenland improves the accuracy considerably, whereas data augmentation by rotating and flipping the images has little effect. The resulting classification accuracies for ships and icebergs are 86% for the SAR data and 96% for the MSI data, the higher MSI accuracy being due to its better resolution and larger number of multispectral bands. The size and quality of the datasets are essential for training the deep neural networks, and methods to improve them are discussed. The reduced false alarm rates and exploitation of multisensor data are important for Arctic search and rescue services.
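The abstract reports that augmenting the data by rotating and flipping the images had little effect. For readers unfamiliar with this kind of augmentation, the minimal Python sketch below generates the eight rotations and reflections (the symmetries of a square) of an image chip stored as a list of rows, where each pixel can be a scalar or a tuple of band values. The representation and the function names are assumptions for illustration, not the paper's pipeline.

```python
from typing import Any, List

Image = List[List[Any]]  # rows of pixels; a pixel may be a tuple of band values


def rotate90(img: Image) -> Image:
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img: Image) -> Image:
    """Mirror an image left to right."""
    return [row[::-1] for row in img]


def dihedral_augmentations(img: Image) -> List[Image]:
    """Return the 4 rotations of `img`, each with and without a horizontal flip."""
    variants: List[Image] = []
    current = img
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


chip = [[1, 2], [3, 4]]
for v in dihedral_augmentations(chip):
    print(v)
```

Since ships and icebergs have no preferred orientation in overhead imagery, these eight variants are usually treated as label-preserving, which makes them a natural first augmentation to try. They add no new information about the objects themselves, which is one plausible reason such augmentation can help less than adding genuinely new data.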


2020 ◽  
Author(s):  
Simon Nachtergaele ◽  
Johan De Grave

Abstract. Artificial intelligence techniques such as deep neural networks and computer vision have been developed for fission track recognition and included in a computer program for the first time. These deep neural networks use the YOLOv3 object detection algorithm, currently one of the most powerful and fastest object recognition algorithms. They are used in new software called AI-Track-tive. The program successfully finds most of the fission tracks in the microscope images; however, the user still needs to supervise the automatic counting. The success rates of the automatic recognition range from 70% to 100%, depending on the areal track densities in apatite and the (muscovite) external detector. The success rate generally decreases for images with high areal track densities, because overlapping tracks are harder for computer vision techniques to recognize.
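A generic mechanism that can make overlapping objects harder to count is the post-processing step most object detectors apply: non-maximum suppression (NMS), which discards a candidate box whose intersection-over-union (IoU) with a higher-scoring box exceeds a threshold. The minimal Python sketch below implements greedy IoU-based NMS. The box format, the 0.5 threshold and the example boxes are assumptions, and this is not AI-Track-tive's code.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: Sequence[Box], scores: Sequence[float],
                        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes in descending score order and keep a box unless
    it overlaps an already-kept box with IoU >= iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in kept):
            kept.append(i)
    return kept


# Two strongly overlapping tracks plus one isolated track (made-up boxes).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

If the first two boxes really are two distinct tracks, the second is suppressed (IoU = 81/119 ≈ 0.68) and the count comes out one short; denser images produce more such overlaps, which is consistent with the lower success rates reported at high areal track densities.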


2021 ◽  
Vol 14 ◽  
Author(s):  
Hyojin Bae ◽  
Sang Jeong Kim ◽  
Chang-Eop Kim

One of the central goals in systems neuroscience is to understand how information is encoded in the brain, and the standard approach is to identify the relation between a stimulus and a neural response. However, the features of a stimulus are typically defined by the researcher's hypothesis, which may bias the research conclusions. To demonstrate potential biases, we simulate four likely scenarios using deep neural networks trained on the image classification dataset CIFAR-10 and demonstrate the possibility of selecting suboptimal or irrelevant features, or of overestimating the network feature representation or noise correlation. Additionally, we present studies investigating neural coding principles in biological neural networks to which our points can be applied. This study aims not only to highlight the importance of careful assumptions and interpretations regarding the neural response to stimulus features, but also to suggest that comparative studies of deep and biological neural networks from the perspective of machine learning can be an effective strategy for understanding the coding principles of the brain.


2021 ◽  
Author(s):  
Guangyuan Pan ◽  
Chen Qili ◽  
Fu Liping ◽  
Yu Ming ◽  
Muresan Matthew

Deep neural networks have been successfully used in many areas of traffic engineering, such as crash prediction, intelligent signal optimization and real-time road surface condition monitoring. Deep neural networks are often uniquely suited to certain problems and can improve performance compared with traditional methods. In collision prediction, uncertainty estimation is a critical area that can benefit from their application, and accurate information on the reliability of a model’s predictions can increase public confidence in such models. Applications of deep neural networks to this problem that account for these effects have not previously been studied. This paper develops a Bayesian deep neural network for crash prediction and examines the reliability of the model using three key methods: layer-wise greedy unsupervised learning, Bayesian regularization and adapted marginalization. An uncertainty equation for the model is also proposed for this domain for the first time. To test the performance, eight years of car collision data collected from Highway 401, Canada, are used, and three experiments are designed.
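The abstract does not state the proposed uncertainty equation, so the sketch below does not reproduce it. It only illustrates a common, generic way of summarizing predictive uncertainty in a Bayesian neural network: given T crash probabilities for one input, each from a network whose weights were sampled from an (approximate) posterior, report their mean, their variance, the predictive entropy, and the mutual information between the prediction and the weights. The function names and sample values are hypothetical.

```python
import math
from statistics import fmean, pvariance
from typing import Dict, Sequence


def binary_entropy(p: float, eps: float = 1e-12) -> float:
    """Entropy (in nats) of a Bernoulli(p) outcome, clamped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def mc_uncertainty(prob_samples: Sequence[float]) -> Dict[str, float]:
    """Summarize sampled crash probabilities p_1..p_T for a single input."""
    mean_p = fmean(prob_samples)                 # Monte Carlo predictive probability
    var_p = pvariance(prob_samples, mu=mean_p)   # disagreement between sampled networks
    total = binary_entropy(mean_p)               # predictive entropy (total uncertainty)
    expected = fmean([binary_entropy(p) for p in prob_samples])  # mean per-sample entropy
    return {
        "mean": mean_p,
        "variance": var_p,
        "predictive_entropy": total,
        "mutual_information": total - expected,  # model (epistemic) uncertainty
    }


print(mc_uncertainty([0.5, 0.5, 0.5, 0.5]))  # sampled networks agree
print(mc_uncertainty([0.1, 0.9]))            # sampled networks disagree
```

Both example inputs have the same mean probability of 0.5, but only the second shows disagreement between sampled networks; the variance and mutual information separate these cases, whereas a single point prediction would not.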


Processes ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 2286
Author(s):  
Ammar Amjad ◽  
Lal Khan ◽  
Hsien-Tsung Chang

Recognizing emotions in spontaneous speech databases has recently become a complex and demanding area of study. This research presents an entirely new approach for recognizing semi-natural and spontaneous speech emotions using multiple feature fusion and deep neural networks (DNN). The proposed framework extracts the most discriminative features from hybrid acoustic feature sets. However, these feature sets may contain duplicate and irrelevant information, leading to inadequate emotion identification. Therefore, a support vector machine (SVM) algorithm is used to identify the most discriminative audio feature map after obtaining the relevant features learned by the fusion approach. We evaluated our approach on the eNTERFACE05 and BAUM-1s benchmark databases and observed a significant identification accuracy of 76% for a speaker-independent experiment with SVM and 59% accuracy with, respectively. Furthermore, experiments on the eNTERFACE05 and BAUM-1s datasets indicate that the proposed framework outperforms current state-of-the-art techniques on the semi-natural and spontaneous datasets.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Lvxing Zhu ◽  
Haoran Zheng

Abstract Background Biomedical event extraction is a fundamental and in-demand technology that has attracted substantial interest from many researchers. Previous works have relied heavily on manually designed features and external NLP packages, making feature engineering large and complex. Additionally, most existing works use a pipeline process that breaks the task down into simple sub-tasks but ignores the interactions between them. To overcome these limitations, we propose a novel event combination strategy based on hybrid deep neural networks that solves the task in a joint, end-to-end manner. Results We adapted our method to several annotated corpora of biomedical event extraction tasks. Our method achieved state-of-the-art performance, with a noticeable improvement in overall F1 score over existing methods on all of these corpora. Conclusions The experimental results demonstrate that our method is effective for biomedical event extraction. The combination strategy can reconstruct complex events from the output of deep neural networks, while the deep neural networks effectively capture feature representations from the raw text. The biomedical event extraction implementation is available online at http://www.predictor.xin/event_extraction.

