Genetic variant effect prediction by supervised nonnegative matrix tri-factorization

CADD-Splice—improving genome-wide variant effect prediction using deep learning-derived splice scores

Genome Medicine ◽

10.1186/s13073-021-00835-9 ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Philipp Rentzsch ◽

Max Schubach ◽

Jay Shendure ◽

Martin Kircher

Keyword(s):

Prediction Models ◽

Splice Variants ◽

Superior Performance ◽

Data Set ◽

Pathogenic Variants ◽

Genome Wide ◽

Donor And Acceptor ◽

Human Proteins ◽

Variant Effect ◽

Variant Effect Prediction

Abstract

Background: Splicing of genomic exons into mRNAs is a critical prerequisite for the accurate synthesis of human proteins. Genetic variants impacting splicing underlie a substantial proportion of genetic disease, but are challenging to identify beyond those occurring at donor and acceptor dinucleotides. To address this, various methods aim to predict variant effects on splicing. Recently, deep neural networks (DNNs) have been shown to achieve better results in predicting splice variants than other strategies.

Methods: It has been unclear how best to integrate such process-specific scores into genome-wide variant effect predictors. Here, we use a recently published experimental data set to compare several machine learning methods that score variant effects on splicing. We integrate the best of those approaches into general variant effect prediction models and observe the effect on classification of known pathogenic variants.

Results: We integrate two specialized splicing scores into CADD (Combined Annotation Dependent Depletion; cadd.gs.washington.edu), a widely used tool for genome-wide variant effect prediction that we previously developed to weight and integrate diverse collections of genomic annotations. With this new model, CADD-Splice, we show that inclusion of splicing DNN effect scores substantially improves predictions across multiple variant categories, without compromising overall performance.

Conclusions: While splice effect scores show superior performance on splice variants, specialized predictors cannot compete with other variant scores in general variant interpretation, as the latter account for nonsense and missense effects that do not alter splicing. Although only shown here for splice scores, we believe that the applied approach will generalize to other specific molecular processes, providing a path for the further improvement of genome-wide variant effect prediction.

METHOD FOR ANALYSIS OF THE DOPPLER-BROADENED SPECTRA OF LASER RADIATION FOR THE ASSESSMENT OF BLOOD MICROCIRCULATION DISTURBANCES IN DIABETES MELLITUS TYPE 2

Fundamental and Applied Problems of Engineering and Technology ◽

10.33979/2073-7408-2020-342-4-1-80-87 ◽

2020 ◽

Vol 4 (1) ◽

pp. 80-87

Author(s):

I.O. KOZLOV

Keyword(s):

Diabetes Mellitus ◽

Laser Radiation ◽

Laser Doppler Flowmetry ◽

Diabetes Mellitus Type 2 ◽

Feature Space ◽

Diabetes Mellitus Type ◽

Laser Doppler ◽

Doppler Flowmetry ◽

New Feature

This article develops a laser Doppler flowmetry method that analyzes the recorded signal to study how perfusion is distributed over the Doppler-broadening frequencies of laser radiation. It presents the processing algorithm and the technical conditions required for correct signal registration. As examples of the proposed method, data from a healthy volunteer and from a patient with type 2 diabetes mellitus are obtained and analyzed. Processing the recorded data with the proposed method provides a new feature space for the analysis of laser Doppler flowmetry signals.

Erratum to: Evotuning protocols for Transformer-based variant effect prediction on multi-domain proteins

Briefings in Bioinformatics ◽

10.1093/bib/bbab287 ◽

2021 ◽

Author(s):

Hideki Yamaguchi ◽

Yutaka Saito

Keyword(s):

Variant Effect ◽

Variant Effect Prediction

Bioinformatics of Variant Effect Prediction

10.1201/9781003005926-8 ◽

2021 ◽

pp. 159-179

Author(s):

Ariel José Berenstein ◽

Franco Gino Brunello ◽

Adrian Turjanski ◽

Marcelo A. Martì

Keyword(s):

Variant Effect ◽

Variant Effect Prediction

Nonnegative Matrix Factorization in Polynomial Feature Space

IEEE Transactions on Neural Networks ◽

10.1109/tnn.2008.2000162 ◽

2008 ◽

Vol 19 (6) ◽

pp. 1090-1100 ◽

Cited By ~ 51

Author(s):

I. Buciu ◽

N. Nikolaidis ◽

I. Pitas

Keyword(s):

Matrix Factorization ◽

Nonnegative Matrix Factorization ◽

Nonnegative Matrix ◽

Feature Space

MalDeep: A Deep Learning Classification Framework against Malware Variants Based on Texture Visualization

Security and Communication Networks ◽

10.1155/2019/4895984 ◽

2019 ◽

Vol 2019 ◽

pp. 1-11

Author(s):

Yuntao Zhao ◽

Chunyu Xu ◽

Bo Bo ◽

Yongxin Feng

Keyword(s):

Deep Learning ◽

Classification Accuracy ◽

Feature Space ◽

Image Texture ◽

Accuracy Rate ◽

Texture Representation ◽

Malware Classification ◽

Classification Framework ◽

Average Accuracy ◽

New Feature

The increasing sophistication of malware variants, which use techniques such as encryption, polymorphism, and obfuscation, calls for new detection and classification technology. In this paper, we propose MalDeep, a novel deep learning framework that classifies malware variants based on texture visualization. Through code mapping, texture partitioning, and texture extraction, we can study malware classification in a new feature space of image texture representation, without decryption or disassembly. Furthermore, we built a malware classifier on a convolutional neural network with two convolutional layers, two downsampling layers, and many fully connected layers. We train the model on the dataset from the Microsoft Malware Classification Challenge, which includes 9 categories of malware families and 10868 variant samples. The experimental results show that MalDeep has a higher accuracy rate for malware classification. In particular, for some backdoor families, the classification accuracy of the model exceeds 99%. Moreover, MalDeep also outperforms other mainstream antivirus software in average accuracy on variants from different families.
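
The code mapping step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact mapping, so this assumes the common approach of treating each byte of a binary as one grayscale pixel; the function name `bytes_to_grayscale`, the row width, and the zero padding are all illustrative choices, not MalDeep's actual procedure.

```python
def bytes_to_grayscale(data: bytes, width: int = 16) -> list[list[int]]:
    """Map raw bytes to rows of pixel intensities in the range 0-255.

    Assumption: one byte becomes one pixel, and the last row is padded
    with zeros. The paper's own mapping may differ.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        # Iterating over bytes yields integers, which serve as intensities.
        rows.append(list(chunk) + [0] * (width - len(chunk)))
    return rows


# Five bytes laid out as a 2 x 4 image; the final row is zero-padded.
image = bytes_to_grayscale(bytes([0, 255, 16, 32, 7]), width=4)
```

A classifier such as the convolutional network described above would then operate on images built this way rather than on disassembled code.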

Intelligent Bearing Diagnostics Using Wavelet Support Vector Machine

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.493.337 ◽

2014 ◽

Vol 493 ◽

pp. 337-342 ◽

Cited By ~ 1

Author(s):

Achmad Widodo ◽

I. Haryanto ◽

T. Prahasto

Keyword(s):

Support Vector Machine ◽

Input Data ◽

Intelligent System ◽

Feature Space ◽

Rolling Element Bearing ◽

Support Vector ◽

Fault Diagnostics ◽

Standard Data ◽

Rolling Element

This paper describes the implementation of an intelligent system for fault diagnostics of rolling element bearings. The proposed system is built on a support vector machine (SVM), chosen for its excellent performance in classification tasks. The SVM was modified by introducing a wavelet function as the kernel that maps input data into the feature space. The input data were vibration signals acquired from bearings through a standard data acquisition process. Statistical features were then calculated from the bearing signals, and salient features were extracted using component analysis. The fault diagnostics results are assessed through the classification of bearing conditions, which shows plausible accuracy when testing the proposed system.

A NEW FEATURE SELECTION METHOD FOR TEXT CLASSIFICATION

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001407005466 ◽

2007 ◽

Vol 21 (02) ◽

pp. 423-438 ◽

Cited By ~ 9

Author(s):

GULDEN UCHYIGIT ◽

KEITH CLARK

Keyword(s):

Feature Selection ◽

Text Classification ◽

Information Gain ◽

Feature Selection Method ◽

Feature Space ◽

Selection Method ◽

Computational Time ◽

Small Subset ◽

Selection Methods ◽

New Feature

Text classification is the problem of classifying a set of documents into a pre-defined set of classes. A major difficulty in text classification is the high dimensionality of the feature space. Only a small subset of the words in the documents are feature words that can be used to determine a document's class, while the rest add noise, can make the results unreliable, and significantly increase computational time. A common approach to this problem is feature selection, in which the number of words in the feature space is significantly reduced. In this paper we present the experiments of a comparative study of feature selection methods used for text classification. Ten feature selection methods were evaluated in this study, including a new feature selection method called the GU metric. The other feature selection methods evaluated are: Chi-Squared (χ2) statistic, NGL coefficient, GSS coefficient, Mutual Information, Information Gain, Odds Ratio, Term Frequency, Fisher Criterion, and BSS/WSS coefficient. The experimental evaluations show that the GU metric obtained the best F1 and F2 scores. The experiments were performed on the 20 Newsgroups data sets with the Naive Bayesian Probabilistic Classifier.
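
As an illustration of one of the baseline methods listed above, the following is a minimal sketch of the Chi-Squared statistic for a single term and class, using the standard 2 x 2 contingency form. The function names and the toy documents are hypothetical, and the GU metric itself is not reproduced here because the abstract does not define it.

```python
def term_class_counts(docs, labels, term, target):
    """Count documents by whether they contain `term` and belong to `target`."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1  # term present, in class
        elif has_term:
            b += 1  # term present, other class
        elif in_class:
            c += 1  # term absent, in class
        else:
            d += 1  # term absent, other class
    return a, b, c, d


def chi_squared(a: int, b: int, c: int, d: int) -> float:
    """Chi-squared score N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D))."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence of association
    return n * (a * d - c * b) ** 2 / denom


# Toy corpus (hypothetical): each document is a set of tokens.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "election"}, {"vote", "team"}]
labels = ["sport", "sport", "politics", "politics"]

score_goal = chi_squared(*term_class_counts(docs, labels, "goal", "sport"))
score_team = chi_squared(*term_class_counts(docs, labels, "team", "sport"))
```

Here "goal" appears only in the sport documents and scores high, while "team" is spread evenly across both classes and scores zero, so a selector that keeps the top-scoring terms would retain the former and discard the latter.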

MTBAN: An Enhanced Variant Effect Predictor Based on a Deep Generative Model

10.21203/rs.3.rs-649705/v1 ◽

2021 ◽

Author(s):

Ha Young Kim ◽

Woosung Jeon ◽

Dongsup Kim

Keyword(s):

Genetic Diseases ◽

Web Server ◽

Predictive Ability ◽

Generative Model ◽

Prediction Tool ◽

Convolutional Network ◽

Variant Effect ◽

Born Again ◽

User Friendly ◽

Variant Effect Prediction

Abstract The development of an accurate and reliable variant effect prediction tool is important for research in human genetic diseases. A large number of predictors have been developed towards this goal, yet many of these predictors suffer from the problem of data circularity. Here we present MTBAN (Mutation effect predictor using the Temporal convolutional network and the Born-Again Networks), a method for predicting the deleteriousness of variants. We apply a form of knowledge distillation technique known as the Born-Again Networks (BAN) to a previously developed deep autoregressive generative model, mutationTCN, to achieve an improved performance in variant effect prediction. As the model is fully unsupervised and trained only on the evolutionarily related sequences of a protein, it does not suffer from the problem of data circularity which is common across supervised predictors. When evaluated on a test dataset consisting of deleterious and benign human protein variants, MTBAN shows an outstanding predictive ability compared to other well-known variant effect predictors. We also offer a user-friendly web server to predict variant effects using MTBAN, freely accessible at http://mtban.kaist.ac.kr. To our knowledge, MTBAN is the first variant effect prediction tool based on a deep generative model that provides a user-friendly web server for the prediction of deleteriousness of variants.

Deep Coupling Recurrent Auto-Encoder with Multi-Modal EEG and EOG for Vigilance Estimation

Entropy ◽

10.3390/e23101316 ◽

2021 ◽

Vol 23 (10) ◽

pp. 1316

Author(s):

Kuiyong Song ◽

Lianke Zhou ◽

Hongbin Wang

Keyword(s):

Data Analysis ◽

Traffic Safety ◽

Feature Fusion ◽

Pearson Correlation ◽

Metric Learning ◽

Feature Space ◽

Research Field ◽

Function Optimization ◽

Analysis Model ◽

New Feature

Driver vigilance estimation is an active research area in traffic safety. Wearable devices can monitor the driver's state in real time, and a data analysis model then analyzes this information to estimate vigilance. The accuracy of the data analysis model directly affects the quality of the vigilance estimate. In this paper, we propose a deep coupling recurrent auto-encoder (DCRA) that combines electroencephalography (EEG) and electrooculography (EOG). The model uses a coupling layer to connect two single-modal auto-encoders and optimizes a joint objective loss function that consists of a single-modal loss and a multi-modal loss. The single-modal loss is measured by Euclidean distance, and the multi-modal loss is measured by a Mahalanobis distance obtained through metric learning, so that the distance between data from different modalities can be described more accurately in the new feature space defined by the metric matrix. To ensure gradient stability when learning long sequences, a multi-layer gated recurrent unit (GRU) auto-encoder model was adopted. The DCRA integrates data feature extraction and feature fusion. Comparative experiments show that the DCRA outperforms the single-modal method and the latest multi-modal fusion methods, with a lower root mean square error (RMSE) and a higher Pearson correlation coefficient (PCC).
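
The two distances named in the abstract can be compared with a short sketch. It assumes the usual metric-learning form of the Mahalanobis distance, sqrt((x - y)^T M (x - y)) with a positive semidefinite metric matrix M; the function names and the example matrix are illustrative, and the way DCRA learns M is not shown here.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def mahalanobis(x, y, m):
    """Distance sqrt((x - y)^T M (x - y)) under a metric matrix M.

    Assumption: `m` is square, matches the vector length, and is positive
    semidefinite, so the value under the square root is non-negative.
    """
    diff = [a - b for a, b in zip(x, y)]
    # Matrix-vector product M @ diff, computed row by row.
    m_diff = [sum(m_ij * d_j for m_ij, d_j in zip(row, diff)) for row in m]
    return math.sqrt(sum(d_i * v for d_i, v in zip(diff, m_diff)))


x, y = [1.0, 2.0], [4.0, 6.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[4.0, 0.0], [0.0, 1.0]]  # hypothetical metric weighting axis 0

d_euclid = euclidean(x, y)
d_identity = mahalanobis(x, y, identity)
d_stretched = mahalanobis(x, y, stretched)
```

With the identity matrix the Mahalanobis distance reduces to the Euclidean distance, while a non-identity metric reweights the feature axes, which is the property a learned metric matrix exploits when comparing data from different modalities.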
