KCML: a machine‐learning framework for inference of multi‐scale gene functions from genetic perturbation screens

Heba Z Sailem; Jens Rittscher; Lucas Pelkmans

doi:10.15252/msb.20199083

KDML: a machine-learning framework for inference of multi-scale gene functions from genetic perturbation screens

10.1101/761106 ◽

2019 ◽

Cited By ~ 1

Author(s):

Heba Z. Sailem ◽

Jens Rittscher ◽

Lucas Pelkmans

Keyword(s):

Colorectal Cancer ◽

Machine Learning ◽

Large Scale ◽

Ad Hoc ◽

Olfactory Receptors ◽

Functional Enrichment ◽

Learning Framework ◽

Gene Functions ◽

Health And Disease ◽

Colorectal Cancer Patients

Abstract. Characterising context-dependent gene functions is crucial for understanding the genetic bases of health and disease. To date, inference of gene functions from large-scale genetic perturbation screens is based on ad-hoc analysis pipelines involving unsupervised clustering and functional enrichment. We present Knowledge-Driven Machine Learning (KDML), a framework that systematically predicts multiple functions for a given gene based on the similarity of its perturbation phenotype to those with known function. As proof of concept, we test KDML on three datasets describing phenotypes at the molecular, cellular and population levels, and show that it outperforms traditional analysis pipelines. In particular, KDML identified an abnormal multicellular organisation phenotype associated with the depletion of olfactory receptors and TGFβ and WNT signalling genes in colorectal cancer cells. We validate these predictions in colorectal cancer patients and show that olfactory receptor expression is predictive of worse patient outcome. These results highlight KDML as a systematic framework for discovering novel scale-crossing and clinically relevant gene functions. KDML is highly generalizable and applicable to various large-scale genetic perturbation screens.
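The idea of assigning functions by phenotypic similarity can be illustrated with a small sketch. This is not the KDML implementation: it swaps in a plain cosine-similarity nearest-neighbour vote, and the function names (`cosine`, `predict_functions`), the parameters `k` and `min_votes`, and the toy profiles and labels are all assumptions made up for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length phenotype profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_functions(query, annotated, k=3, min_votes=2):
    """Return every function label shared by at least `min_votes` of the
    k annotated genes whose profiles are most similar to `query`.

    `annotated` is a list of (profile, set_of_function_labels) pairs.
    Hypothetical helper, not part of any published KDML code.
    """
    ranked = sorted(annotated, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, labels in ranked[:k] for label in labels)
    return sorted(label for label, n in votes.items() if n >= min_votes)


# Toy perturbation phenotypes (three made-up features per gene).
annotated = [
    ([1.0, 0.1, 0.0], {"cell adhesion", "WNT signalling"}),
    ([0.9, 0.2, 0.1], {"cell adhesion"}),
    ([0.0, 1.0, 0.9], {"cell cycle"}),
    ([0.1, 0.9, 1.0], {"cell cycle", "DNA repair"}),
]
query = [0.95, 0.15, 0.05]

strict = predict_functions(query, annotated, k=2, min_votes=2)
lenient = predict_functions(query, annotated, k=2, min_votes=1)
print(strict)   # labels supported by both nearest neighbours
print(lenient)  # labels supported by at least one nearest neighbour
```

Because a gene can collect votes for several labels at once, the sketch returns a list of functions rather than a single class, which mirrors the multi-function framing of the abstract.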


6mAPred-MSFF: A Deep Learning Model for Predicting DNA N6-Methyladenine Sites across Species Based on a Multi-Scale Feature Fusion Mechanism

Applied Sciences ◽

10.3390/app11167731 ◽

2021 ◽

Vol 11 (16) ◽

pp. 7731

Author(s):

Rao Zeng ◽

Minghong Liao

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Prediction Accuracy ◽

Cross Validation ◽

Feature Fusion ◽

Experimental Comparison ◽

Scale Feature ◽

Learning Framework ◽

Multi Scale ◽

Fold Cross Validation

DNA methylation is one of the most extensive epigenetic modifications. DNA N6-methyladenine (6mA) plays a key role in many biological regulation processes. Accurate and reliable genome-wide identification of 6mA sites is crucial for systematically understanding its biological functions. Some machine learning tools can identify 6mA sites, but their limited prediction accuracy and lack of robustness restrict their usability in epigenetic studies, which points to a clear need for new computational methods for this problem. In this paper, we developed a novel computational predictor, 6mAPred-MSFF, a deep learning framework based on a multi-scale feature fusion mechanism that identifies 6mA sites across different species. In the predictor, we integrate the inverted residual block and a multi-scale attention mechanism to build lightweight and deep neural networks. Compared with existing predictors that use traditional machine learning, our deep learning framework needs no prior knowledge of 6mA or manually crafted sequence features, and it captures the characteristics of 6mA sites more effectively. In benchmarking comparisons, our deep learning method outperforms the state-of-the-art methods in the 5-fold cross-validation test on the seven datasets of six species, demonstrating that the proposed 6mAPred-MSFF is more effective and generic. Specifically, 6mAPred-MSFF achieves a sensitivity and specificity of 97.88% and 94.64%, respectively, in 5-fold cross-validation on the 6mA-rice-Lv dataset. Our model trained with the rice data predicts well the 6mA sites of five other species: Arabidopsis thaliana, Fragaria vesca, Rosa chinensis, Homo sapiens, and Drosophila melanogaster with a prediction accuracy of 98.51%, 93.02%, and 91.53%, respectively. Moreover, via experimental comparison, we explored the performance impact of training and testing our proposed model under different encoding schemes and feature descriptors.
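For readers unfamiliar with the metrics quoted above, the following minimal sketch shows how sensitivity and specificity are computed from binary predictions, and how a dataset can be cut into five folds. It is not taken from 6mAPred-MSFF; the helper names `k_fold_indices` and `sensitivity_specificity` and the toy labels are assumptions for illustration only.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Labels are 1 (site) and 0 (non-site). A metric whose denominator is
    zero is returned as NaN.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy labels: 4 positive sites and 6 negatives (made-up data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
folds = list(k_fold_indices(len(y_true), k=5))
print(sens, spec, len(folds))
```

In a real cross-validation run the samples would be shuffled (and usually stratified by class) before splitting, and the two metrics would be computed on each held-out fold.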


DeepEP: a deep learning framework for identifying essential proteins

BMC Bioinformatics ◽

10.1186/s12859-019-3076-y ◽

2019 ◽

Vol 20 (S16) ◽

Cited By ~ 9

Author(s):

Min Zeng ◽

Min Li ◽

Fang-Xiang Wu ◽

Yaohang Li ◽

Yi Pan

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Convolutional Neural Networks ◽

Sampling Method ◽

Ppi Network ◽

Essential Proteins ◽

Learning Framework ◽

Multi Scale ◽

Topological Features

Abstract. Background: Essential proteins are crucial for cellular life, so identifying them is an important topic and a challenging problem for researchers. Recently, many computational approaches have been proposed to handle this problem. However, traditional centrality methods cannot fully represent the topological features of biological networks. In addition, identifying essential proteins is an imbalanced learning problem, but few current shallow machine learning-based methods are designed to handle the imbalanced characteristics. Results: We develop DeepEP based on a deep learning framework that uses the node2vec technique, multi-scale convolutional neural networks and a sampling technique to identify essential proteins. In DeepEP, the node2vec technique is applied to automatically learn topological and semantic features for each protein in the protein-protein interaction (PPI) network. Gene expression profiles are treated as images and multi-scale convolutional neural networks are applied to extract their patterns. In addition, DeepEP uses a sampling method to alleviate the imbalanced characteristics. The sampling method draws the same number of majority and minority samples in each training epoch, so that training is not biased toward either class. The experimental results show that DeepEP outperforms traditional centrality methods. Moreover, DeepEP is better than shallow machine learning-based methods. Detailed analyses show that the dense vectors generated by the node2vec technique contribute substantially to the improved performance. It is clear that the node2vec technique effectively captures the topological and semantic properties of the PPI network. The sampling method also improves the performance of identifying essential proteins. Conclusion: We demonstrate that DeepEP improves the prediction performance by integrating multiple deep learning techniques and a sampling method. DeepEP is more effective than existing methods.
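The per-epoch balanced sampling described in the abstract could look roughly like the sketch below. It is an assumption-laden illustration, not DeepEP's code: the function name `balanced_epoch` is hypothetical, and the choice to keep every minority sample while redrawing an equal-sized random subset of the majority class each epoch is one plausible reading of the description.

```python
import random


def balanced_epoch(labels, rng):
    """Return shuffled sample indices with equal counts of both classes.

    All minority-class indices are kept; an equally sized subset of the
    majority class is drawn without replacement. Calling this once per
    epoch exposes the model to different majority samples over time.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    epoch = minority + rng.sample(majority, len(minority))
    rng.shuffle(epoch)
    return epoch


# Toy labels: 3 essential (1) and 17 non-essential (0) proteins.
labels = [1] * 3 + [0] * 17
rng = random.Random(0)  # fixed seed so the sketch is reproducible
epochs = [balanced_epoch(labels, rng) for _ in range(3)]
for epoch in epochs:
    print(epoch, sum(labels[i] for i in epoch))
```

Each epoch here contains six indices, three from each class, while the full dataset stays untouched between epochs.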


Cellular intelligence: dynamic specialization through non-equilibrium multi-scale compartmentalization

10.1101/2021.06.25.449951 ◽

2021 ◽

Author(s):

Rémy V Tuyéras ◽

Leandro Z Agudelo ◽

Soumya P Ram ◽

Anjanet R Loon ◽

Burak Kutlu ◽

...

Keyword(s):

Machine Learning ◽

Data Mining ◽

Learning Algorithm ◽

Living Systems ◽

Emergent Properties ◽

New Approach ◽

Learning Framework ◽

Multi Scale ◽

Reference Machine ◽

Non Equilibrium

Intelligence is usually associated with the ability to perceive, retain and use information to adapt to changes in one's environment. In this context, systems of living cells can be thought of as intelligent entities. Here, we show that the concepts of non-equilibrium tuning and compartmentalization are sufficient to model manifestations of cellular intelligence such as specialization, division, fusion and communication using the language of operads. We implement our framework as an unsupervised learning algorithm, IntCyt, which we show is able to memorize, organize and abstract reference machine-learning datasets through generative and self-supervised tasks. Overall, our learning framework captures emergent properties programmed in living systems, and provides a powerful new approach for data mining.


Hexagonal Image Processing in the Context of Machine Learning: Conception of a Biologically Inspired Hexagonal Deep Learning Framework

2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA) ◽

10.1109/icmla.2019.00300 ◽

2019 ◽

Cited By ~ 1

Author(s):

Tobias Schlosser ◽

Michael Friedrich ◽

Danny Kowerko

Keyword(s):

Machine Learning ◽

Image Processing ◽

Deep Learning ◽

Biologically Inspired ◽

Learning Framework ◽

Learning Conception ◽

Hexagonal Image Processing


Non-Intrusive Parametric Model Order Reduction with Error Correction Modeling for Changing Well Locations Using a Machine Learning Framework

10.2118/199042-ms ◽

2020 ◽

Author(s):

Hardikkumar Zalavadia ◽

Eduardo Gildin

Keyword(s):

Machine Learning ◽

Error Correction ◽

Model Order Reduction ◽

Order Reduction ◽

Parametric Model ◽

Model Order ◽

Parametric Model Order Reduction ◽

Learning Framework ◽

Error Correction Modeling


Precipitation Modeling for Extreme Weather Based on Sparse Hybrid Machine Learning and Markov Chain Random Field in a Multi-Scale Subspace

Water ◽

10.3390/w13091241 ◽

2021 ◽

Vol 13 (9) ◽

pp. 1241

Author(s):

Ming-Hsi Lee ◽

Yenming J. Chen

Keyword(s):

Machine Learning ◽

Markov Chain ◽

Random Field ◽

Long Range ◽

Weather Conditions ◽

Extreme Weather ◽

Prediction Algorithm ◽

Learning Method ◽

Multi Scale ◽

Hybrid Machine

This paper proposes to apply a Markov chain random field conditioning method with a hybrid machine learning method to provide long-range precipitation predictions under increasingly extreme weather conditions. Existing precipitation models are limited in time-span, and long-range simulations cannot predict rainfall distribution for a specific year. This paper proposes a hybrid (ensemble) learning method to perform forecasting on a multi-scaled, conditioned functional time series over a sparse l1 space. On the basis of this method, a long-range prediction algorithm is developed for applications such as agriculture or construction works. Our findings show that the conditioning method and multi-scale decomposition in the sparse l1 space prove useful in resisting statistical variation due to increasingly extreme weather conditions. Because the predictions are year-specific, we verify our prediction accuracy for the year we are interested in, but not for other years.


SCOUR: a stepwise machine learning framework for predicting metabolite-dependent regulatory interactions

BMC Bioinformatics ◽

10.1186/s12859-021-04281-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Justin Y. Lee ◽

Britney Nguyen ◽

Carlos Orosco ◽

Mark P. Styczynski

Keyword(s):

Machine Learning ◽

Metabolic Networks ◽

Sampling Frequency ◽

Low Noise ◽

Training Data ◽

High Noise ◽

Regulatory Interactions ◽

Learning Framework ◽

Metabolic Systems ◽

Noise Data

Abstract. Background: The topology of metabolic networks is both well-studied and remarkably well-conserved across many species. The regulation of these networks, however, is much more poorly characterized, though it is known to be divergent across organisms, two characteristics that make it difficult to model metabolic networks accurately. While many computational methods have been built to unravel transcriptional regulation, there have been few approaches developed for systems-scale analysis and study of metabolic regulation. Here, we present a stepwise machine learning framework that applies established algorithms to identify regulatory interactions in metabolic systems based on metabolic data: stepwise classification of unknown regulation, or SCOUR. Results: We evaluated our framework on both noiseless and noisy data, using several models of varying sizes and topologies to show that our approach is generalizable. We found that, when testing on data under the most realistic conditions (low sampling frequency and high noise), SCOUR could identify reaction fluxes controlled only by the concentration of a single metabolite (its primary substrate) with high accuracy. The positive predictive value (PPV) for identifying reactions controlled by the concentration of two metabolites ranged from 32 to 88% for noiseless data, 9.2 to 49% for either low sampling frequency/low noise or high sampling frequency/high noise data, and 6.6 to 27% for low sampling frequency/high noise data, with results typically sufficiently high for lab validation to be a practical endeavor. While the PPVs for reactions controlled by three metabolites were lower, they were still in most cases significantly better than random classification. Conclusions: SCOUR uses a novel approach to synthetically generate the training data needed to identify regulators of reaction fluxes in a given metabolic system, enabling metabolomics and fluxomics data to be leveraged for regulatory structure inference. By identifying and triaging the most likely candidate regulatory interactions, SCOUR can drastically reduce the amount of time needed to identify and experimentally validate metabolic regulatory interactions. As high-throughput experimental methods for testing these interactions are further developed, SCOUR will provide critical impact in the development of predictive metabolic models in new organisms and pathways.
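The positive predictive value reported above is the fraction of predicted interactions that turn out to be real, PPV = TP / (TP + FP). A minimal sketch of that calculation follows; the function name `positive_predictive_value` and the toy (reaction, metabolite) pairs are assumptions for illustration and do not come from SCOUR.

```python
def positive_predictive_value(predicted, actual):
    """PPV = TP / (TP + FP) over sets of predicted and true interactions.

    Returns NaN when nothing was predicted, since the ratio is undefined.
    """
    if not predicted:
        return float("nan")
    true_positives = len(predicted & actual)
    false_positives = len(predicted - actual)
    return true_positives / (true_positives + false_positives)


# Toy regulatory interactions as (reaction, metabolite) pairs (made up).
predicted = {("v1", "A"), ("v1", "B"), ("v2", "C"), ("v3", "A")}
actual = {("v1", "A"), ("v2", "C"), ("v4", "D")}

ppv = positive_predictive_value(predicted, actual)
print(f"PPV = {ppv:.2f}")
```

Note that PPV ignores true interactions that were never predicted (here `("v4", "D")`), which is why it suits the triage use case in the abstract: it measures how many candidates handed to the lab are worth validating.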


An Efficient Machine Learning Framework for Stress Prediction via Sensor Integrated Keyboard Data

IEEE Access ◽

10.1109/access.2021.3094334 ◽

2021 ◽

pp. 1-1

Author(s):

P.B. Pankajavalli ◽

G.S. Karthick ◽

R. Sakthivel

Keyword(s):

Machine Learning ◽

Learning Framework ◽

Stress Prediction ◽

Efficient Machine


A digital-twin and machine-learning framework for the design of multiobjective agrophotovoltaic solar farms

Computational Mechanics ◽

10.1007/s00466-021-02035-z ◽

2021 ◽

Author(s):

T. I. Zohdi

Keyword(s):

Machine Learning ◽

Digital Twin ◽

Learning Framework

