Genetic variant effect prediction by supervised nonnegative matrix tri-factorization

2021 ◽  
Author(s):  
Asieh Amousoltani Arani ◽  
Mohammadreza Sehhati ◽  
Mohammad Amin Tabatabaiefar

A new feature space, which can discriminate deleterious variants, was constructed by the integration of various input data using the proposed supervised nonnegative matrix tri-factorization (sNMTF) algorithm.

2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Philipp Rentzsch ◽  
Max Schubach ◽  
Jay Shendure ◽  
Martin Kircher

Abstract Background Splicing of genomic exons into mRNAs is a critical prerequisite for the accurate synthesis of human proteins. Genetic variants impacting splicing underlie a substantial proportion of genetic disease, but are challenging to identify beyond those occurring at donor and acceptor dinucleotides. To address this, various methods aim to predict variant effects on splicing. Recently, deep neural networks (DNNs) have been shown to achieve better results in predicting splice variants than other strategies. Methods It has been unclear how best to integrate such process-specific scores into genome-wide variant effect predictors. Here, we use a recently published experimental data set to compare several machine learning methods that score variant effects on splicing. We integrate the best of those approaches into general variant effect prediction models and observe the effect on classification of known pathogenic variants. Results We integrate two specialized splicing scores into CADD (Combined Annotation Dependent Depletion; cadd.gs.washington.edu), a widely used tool for genome-wide variant effect prediction that we previously developed to weight and integrate diverse collections of genomic annotations. With this new model, CADD-Splice, we show that inclusion of splicing DNN effect scores substantially improves predictions across multiple variant categories, without compromising overall performance. Conclusions While splice effect scores show superior performance on splice variants, specialized predictors cannot compete with other variant scores in general variant interpretation, as the latter account for nonsense and missense effects that do not alter splicing. Although only shown here for splice scores, we believe that the applied approach will generalize to other specific molecular processes, providing a path for the further improvement of genome-wide variant effect prediction.


Author(s):  
I.O. KOZLOV

The article is devoted to the development of laser Doppler flowmetry and analysis of the recorded signal to study the distribution of perfusion over the frequencies of Doppler broadening of laser radiation. The processing algorithm and the necessary technical conditions for the correct registration of the signal are shown. As examples of the proposed method implementation, the data are obtained from a healthy volunteer and a patient with diabetes mellitus type 2 and analyzed. According to the proposed method, processing of recorded data provides a new feature space for data analysis of laser Doppler flowmetry signal.


2021 ◽  
pp. 159-179
Author(s):  
Ariel José Berenstein ◽  
Franco Gino Brunello ◽  
Adrian Turjanski ◽  
Marcelo A. Martì

2019 ◽  
Vol 2019 ◽  
pp. 1-11
Author(s):  
Yuntao Zhao ◽  
Chunyu Xu ◽  
Bo Bo ◽  
Yongxin Feng

The increasing sophistication of malware variants such as encryption, polymorphism, and obfuscation calls for the new detection and classification technology. In this paper, MalDeep, a novel malware classification framework of deep learning based on texture visualization, is proposed against malicious variants. Through code mapping, texture partitioning, and texture extracting, we can study malware classification in a new feature space of image texture representation without decryption and disassembly. Furthermore, we built a malware classifier on convolutional neural network with two convolutional layers, two downsampling layers, and many full connection layers. We adopt the dataset, from Microsoft Malware Classification Challenge including 9 categories of malware families and 10868 variant samples, to train the model. The experiment results show that the established MalDeep has a higher accuracy rate for malware classification. In particular, for some backdoor families, the classification accuracy of the model reaches over 99%. Moreover, compared with other main antivirus software, MalDeep also outperforms others in the average accuracy for the variants from different families.


2014 ◽  
Vol 493 ◽  
pp. 337-342 ◽  
Author(s):  
Achmad Widodo ◽  
I. Haryanto ◽  
T. Prahasto

This paper deals with implementation of intelligent system for fault diagnostics of rolling element bearing. In this work, the proposed intelligent system was basically created using support vector machine (SVM) due to its excellent performance in classification task. Moreover, SVM was modified by introducing wavelet function as kernel for mapping input data into feature space. Input data were vibration signals acquired from bearings through standard data acquisition process. Statistical features were then calculated from bearing signals, and extraction of salient features was conducted using component analysis. Results of fault diagnostics are shown by observing classification of bearing conditions which gives plausible accuracy in testing of the proposed system.


Author(s):  
GULDEN UCHYIGIT ◽  
KEITH CLARK

Text classification is the problem of classifying a set of documents into a pre-defined set of classes. A major problem with text classification problems is the high dimensionality of the feature space. Only a small subset of these words are feature words which can be used in determining a document's class, while the rest adds noise and can make the results unreliable and significantly increase computational time. A common approach in dealing with this problem is feature selection where the number of words in the feature space are significantly reduced. In this paper we present the experiments of a comparative study of feature selection methods used for text classification. Ten feature selection methods were evaluated in this study including the new feature selection method, called the GU metric. The other feature selection methods evaluated in this study are: Chi-Squared (χ2) statistic, NGL coefficient, GSS coefficient, Mutual Information, Information Gain, Odds Ratio, Term Frequency, Fisher Criterion, BSS/WSS coefficient. The experimental evaluations show that the GU metric obtained the best F1 and F2 scores. The experiments were performed on the 20 Newsgroups data sets with the Naive Bayesian Probabilistic Classifier.


2021 ◽  
Author(s):  
Ha Young Kim ◽  
Woosung Jeon ◽  
Dongsup Kim

Abstract The development of an accurate and reliable variant effect prediction tool is important for research in human genetic diseases. A large number of predictors have been developed towards this goal, yet many of these predictors suffer from the problem of data circularity. Here we present MTBAN (Mutation effect predictor using the Temporal convolutional network and the Born-Again Networks), a method for predicting the deleteriousness of variants. We apply a form of knowledge distillation technique known as the Born-Again Networks (BAN) to a previously developed deep autoregressive generative model, mutationTCN, to achieve an improved performance in variant effect prediction. As the model is fully unsupervised and trained only on the evolutionarily related sequences of a protein, it does not suffer from the problem of data circularity which is common across supervised predictors. When evaluated on a test dataset consisting of deleterious and benign human protein variants, MTBAN shows an outstanding predictive ability compared to other well-known variant effect predictors. We also offer a user-friendly web server to predict variant effects using MTBAN, freely accessible at http://mtban.kaist.ac.kr. To our knowledge, MTBAN is the first variant effect prediction tool based on a deep generative model that provides a user-friendly web server for the prediction of deleteriousness of variants.


Entropy ◽  
2021 ◽  
Vol 23 (10) ◽  
pp. 1316
Author(s):  
Kuiyong Song ◽  
Lianke Zhou ◽  
Hongbin Wang

Vigilance estimation of drivers is a hot research field of current traffic safety. Wearable devices can monitor information regarding the driver’s state in real time, which is then analyzed by a data analysis model to provide an estimation of vigilance. The accuracy of the data analysis model directly affects the effect of vigilance estimation. In this paper, we propose a deep coupling recurrent auto-encoder (DCRA) that combines electroencephalography (EEG) and electrooculography (EOG). This model uses a coupling layer to connect two single-modal auto-encoders to construct a joint objective loss function optimization model, which consists of single-modal loss and multi-modal loss. The single-modal loss is measured by Euclidean distance, and the multi-modal loss is measured by a Mahalanobis distance of metric learning, which can effectively reflect the distance between different modal data so that the distance between different modes can be described more accurately in the new feature space based on the metric matrix. In order to ensure gradient stability in the long sequence learning process, a multi-layer gated recurrent unit (GRU) auto-encoder model was adopted. The DCRA integrates data feature extraction and feature fusion. Relevant comparative experiments show that the DCRA is better than the single-modal method and the latest multi-modal fusion. The DCRA has a lower root mean square error (RMSE) and a higher Pearson correlation coefficient (PCC).


Sign in / Sign up

Export Citation Format

Share Document