MKL-GRNI: A parallel multiple kernel learning approach for supervised inference of large-scale gene regulatory networks

PeerJ Computer Science ◽

10.7717/peerj-cs.363 ◽

2021 ◽

Vol 7 ◽

pp. e363

Author(s):

Nisar Wani ◽

Khalid Raza

Keyword(s):

Large Scale ◽

Multiple Kernel Learning ◽

Genomic Data ◽

Classification Problem ◽

Kernel Learning ◽

Biological Knowledge ◽

Inference Models ◽

Multiple Kernel ◽

Heterogeneous Datasets ◽

Gene Regulatory

High-throughput multi-omics data generation, coupled with heterogeneous genomic data fusion, is defining new ways to build computational inference models. These models are scalable and can support very large genome sizes, with the added advantage of exploiting additional biological knowledge from the integration framework. However, such an arrangement incurs a huge computational cost when learning from very large datasets in a sequential execution environment. To overcome this issue, we present a multiple kernel learning (MKL) based gene regulatory network (GRN) inference approach in which multiple heterogeneous datasets are fused using the MKL paradigm. We formulate GRN learning as a supervised classification problem, whereby genes regulated by a specific transcription factor are separated from non-regulated genes. A parallel execution architecture is devised to learn a large-scale GRN by decomposing the initial classification problem into a number of subproblems that run as multiple processes on a multi-processor machine. We evaluate the approach in terms of speedup and inference potential using genomic data from Escherichia coli, Saccharomyces cerevisiae and Homo sapiens. The results demonstrate that the proposed method achieves better classification accuracy and greater speedup than other state-of-the-art methods when learning large-scale GRNs from multiple heterogeneous datasets.
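The decomposition described in this abstract (one binary subproblem per transcription factor, solved concurrently over a fused kernel) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy kernels, the fixed kernel weights (a real MKL method would learn them), and the similarity-based scorer, which stands in for whatever classifier the authors use. Threads are used so the sketch runs anywhere; swapping in `ProcessPoolExecutor` would give the multi-process execution the abstract describes.

```python
from concurrent.futures import ThreadPoolExecutor

def combine_kernels(kernels, weights):
    """Fuse same-sized kernel matrices as a weighted sum (weights assumed given)."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for K, w in zip(kernels, weights))
             for j in range(n)] for i in range(n)]

def score_targets(task):
    """One subproblem: score every unlabeled gene for a single TF.

    Stand-in scorer (hypothetical): mean similarity to known targets
    minus mean similarity to known non-targets.
    """
    tf, positives, negatives, K = task
    scores = {}
    for g in range(len(K)):
        if g in positives or g in negatives:
            continue
        pos = sum(K[g][p] for p in positives) / len(positives)
        neg = sum(K[g][q] for q in negatives) / len(negatives)
        scores[g] = pos - neg
    return tf, scores

def infer_grn(tasks, workers=2):
    """Run the per-TF subproblems concurrently and collect their scores."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(score_targets, tasks))

# Toy data: 4 genes, two made-up kernels (e.g. expression and sequence similarity).
K_expr = [[1.0, 0.9, 0.1, 0.8], [0.9, 1.0, 0.2, 0.7],
          [0.1, 0.2, 1.0, 0.3], [0.8, 0.7, 0.3, 1.0]]
K_seq = [[1.0, 0.8, 0.2, 0.6], [0.8, 1.0, 0.1, 0.9],
         [0.2, 0.1, 1.0, 0.2], [0.6, 0.9, 0.2, 1.0]]
K = combine_kernels([K_expr, K_seq], [0.5, 0.5])

# Each task: (TF name, known target genes, known non-target genes, fused kernel).
tasks = [("TF_A", {0}, {2}, K), ("TF_B", {2}, {0}, K)]
network = infer_grn(tasks)
```

Because each task carries its own labels and touches no shared state, the subproblems are independent, which is what makes this kind of decomposition straightforward to parallelize.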

Modelling and Recognition of Protein Contact Networks by Multiple Kernel Learning and Dissimilarity Representations

Entropy ◽

10.3390/e22070794 ◽

2020 ◽

Vol 22 (7) ◽

pp. 794

Author(s):

Alessio Martino ◽

Enrico De Santis ◽

Alessandro Giuliani ◽

Antonello Rizzi

Keyword(s):

Knowledge Discovery ◽

Classification System ◽

Multiple Kernel Learning ◽

Classification Problem ◽

Kernel Functions ◽

Kernel Learning ◽

Biological Knowledge ◽

Training Procedure ◽

Kernel Weights ◽

Multiple Kernel

Multiple kernel learning is a paradigm which employs a properly constructed chain of kernel functions able to simultaneously analyse different data or different representations of the same data. In this paper, we propose a hybrid classification system based on a linear combination of multiple kernels defined over multiple dissimilarity spaces. The core of the training procedure is the joint optimisation of kernel weights and representatives selection in the dissimilarity spaces. This equips the system with a two-fold knowledge discovery phase: by analysing the weights, it is possible to check which representations are more suitable for solving the classification problem, whereas the pivotal patterns selected as representatives can give further insights into the modelled system, possibly with the help of field experts. The proposed classification system is tested on real proteomic data in order to predict proteins’ functional role starting from their folded structure: specifically, a set of eight representations is drawn from the graph-based description of the folded protein. The proposed multiple kernel-based system has also been benchmarked against a clustering-based classification system that can likewise exploit multiple dissimilarities simultaneously. Computational results show remarkable classification capabilities, and the knowledge discovery analysis is in line with current biological knowledge, suggesting the reliability of the proposed system.
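The idea of a linear combination of kernels defined over dissimilarity spaces can be sketched as follows. This is only an illustration under stated assumptions: the toy patterns, the two dissimilarity measures, the choice of an RBF kernel, and the fixed weights and representatives are all hypothetical. In the paper, the weights and the representatives are optimised jointly during training, which this sketch does not attempt.

```python
import math

def dissimilarity_embedding(patterns, representatives, dissim):
    """Represent each pattern by its dissimilarities to the representatives."""
    return [[dissim(x, r) for r in representatives] for x in patterns]

def rbf_kernel(vectors, gamma=1.0):
    """Gaussian kernel between embedded vectors (an assumed kernel choice)."""
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [[math.exp(-gamma * sqdist(u, v)) for v in vectors] for u in vectors]

def combine(kernels, weights):
    """Linear combination of same-sized kernel matrices."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for K, w in zip(kernels, weights))
             for j in range(n)] for i in range(n)]

# Toy patterns with two hypothetical descriptors each.
patterns = [(0.0, 5.0), (0.2, 1.0), (3.0, 4.8), (2.9, 1.1)]
representatives = [patterns[0], patterns[3]]  # fixed here, learned in the paper

# One dissimilarity measure per representation (both made up for the example).
dissimilarities = [
    lambda x, r: abs(x[0] - r[0]),
    lambda x, r: abs(x[1] - r[1]),
]
weights = [0.7, 0.3]  # fixed here, learned in the paper

kernels = [rbf_kernel(dissimilarity_embedding(patterns, representatives, d))
           for d in dissimilarities]
K = combine(kernels, weights)
```

Inspecting `weights` after training is what would indicate which representation matters most, and the chosen `representatives` are the pivotal patterns the abstract refers to.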

Computational and Comparative Study on Multiple Kernel Learning Approaches for the Classification Problem of Alzheimer’s Disease

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering - Nature of Computation and Communication ◽

10.1007/978-3-319-46909-6_30 ◽

2016 ◽

pp. 334-341

Author(s):

Ahlam Mallak ◽

Jeonghwan Gwak ◽

Jong-In Song ◽

Sang-Woong Lee

Keyword(s):

Alzheimer's Disease ◽

Comparative Study ◽

Multiple Kernel Learning ◽

Classification Problem ◽

Kernel Learning ◽

Learning Approaches ◽

Multiple Kernel

Multi-Nyström Method Based on Multiple Kernel Learning for Large Scale Imbalanced Classification

Computational Intelligence and Neuroscience ◽

10.1155/2021/9911871 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Ling Wang ◽

Hongqiao Wang ◽

Guangyuan Fu

Keyword(s):

Kernel Methods ◽

Large Scale ◽

Multiple Kernel Learning ◽

Nonlinear Problems ◽

Low Rank ◽

Kernel Learning ◽

Low Rank Approximation ◽

Nyström Method ◽

Multiple Kernel

Extensions of kernel methods for class imbalance problems have been extensively studied. Although they cope well with nonlinear problems, their high computation and memory costs severely limit their application to real-world imbalanced tasks. The Nyström method is an effective technique for scaling kernel methods. However, the standard Nyström method needs to sample a sufficiently large number of landmark points to ensure an accurate approximation, which seriously affects its efficiency. In this study, we propose a multi-Nyström method based on mixtures of Nyström approximations to avoid the explosion of the subkernel matrix, while the optimization of the mixture weights is embedded into the model training process by multiple kernel learning (MKL) algorithms to yield a more accurate low-rank approximation. Moreover, we select subsets of landmark points according to the imbalance distribution to reduce the model’s sensitivity to skewness. We also provide a kernel stability analysis of our method and show that the model solution error is bounded by weighted approximation errors, which can help us improve the learning process. Extensive experiments on several large-scale datasets show that our method can achieve higher classification accuracy and a dramatic speedup of MKL algorithms.
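For reference, the standard Nyström approximation and one plausible reading of the mixture described in this abstract can be written as below. The notation (K, C_p, W_p, mu_p, P) is chosen here for illustration, and the nonnegativity constraint on the mixture weights is an assumption, not something stated in the abstract.

```latex
% Standard Nystrom approximation of an n x n kernel matrix K from m landmarks:
% C holds the m landmark columns of K, W is the m x m landmark-landmark block,
% and W^{+} is its pseudo-inverse.
\tilde{K} = C \, W^{+} C^{\top},
\qquad C \in \mathbb{R}^{n \times m},\; W \in \mathbb{R}^{m \times m}

% Assumed mixture form: P small approximations, each built from its own
% landmark subset, combined with weights \mu_p learned by MKL.
\tilde{K}_{\mathrm{mix}} = \sum_{p=1}^{P} \mu_p \, C_p W_p^{+} C_p^{\top},
\qquad \mu_p \ge 0
```

Under this reading, each term only requires an n x m_p block and an m_p x m_p block, so several small landmark sets can replace one large one without ever forming the full n x n matrix.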

A multitask multiple kernel learning formulation for discriminating early- and late-stage cancers

Bioinformatics ◽

10.1093/bioinformatics/btaa168 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3766-3772 ◽

Cited By ~ 1

Author(s):

Arezou Rahimi ◽

Mehmet Gönen

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Late Stage ◽

Multiple Kernel Learning ◽

Genomic Data ◽

Kernel Learning ◽

Disease Stage ◽

Support Vector ◽

Genomic Information ◽

Multiple Kernel

Motivation: Genomic information is increasingly being used in diagnosis, prognosis and treatment of cancer. The severity of the disease is usually measured by the tumor stage. Therefore, identifying pathways playing an important role in progression of the disease stage is of great interest. Given that there are similarities in the underlying mechanisms of different cancers, in addition to the considerable correlation in the genomic data, there is a need for machine learning methods that can take these aspects of genomic data into account. Furthermore, using machine learning for studying multiple cancer cohorts together with a collection of molecular pathways creates an opportunity for knowledge extraction.

Results: We studied the problem of discriminating early- and late-stage tumors of several cancers using genomic information while enforcing interpretability on the solutions. To this end, we developed a multitask multiple kernel learning (MTMKL) method with a co-clustering step based on a cutting-plane algorithm to identify the relationships between the input tasks and kernels. We tested our algorithm on 15 cancer cohorts and observed that, in most cases, MTMKL outperforms other algorithms (including random forests, support vector machine and single-task multiple kernel learning) in terms of predictive power. Using the aggregate results from multiple replications, we also derived similarity matrices between cancer cohorts, which are, in many cases, in agreement with available relationships reported in the relevant literature.

Availability and implementation: Our implementations of support vector machine and multiple kernel learning algorithms in R are available at https://github.com/arezourahimi/mtgsbc together with the scripts that replicate the reported experiments.

Supplementary information: Supplementary data are available at Bioinformatics online.

Multiple-kernel learning for genomic data mining and prediction

BMC Bioinformatics ◽

10.1186/s12859-019-2992-1 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 10

Author(s):

Christopher M. Wilson ◽

Kaiqiao Li ◽

Xiaoqing Yu ◽

Pei-Fen Kuan ◽

Xuefeng Wang

Keyword(s):

Data Mining ◽

Multiple Kernel Learning ◽

Genomic Data ◽

Kernel Learning ◽

Multiple Kernel ◽

Genomic Data Mining

CLASS-PAIR-GUIDED MULTIPLE KERNEL LEARNING OF INTEGRATING HETEROGENEOUS FEATURES FOR CLASSIFICATION

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-3-w3-195-2017 ◽

2017 ◽

Vol XLII-3/W3 ◽

pp. 195-200

Author(s):

Q. Wang ◽

Y. Gu ◽

T. Liu ◽

H. Liu ◽

X. Jin

Keyword(s):

Multiple Kernel Learning ◽

Real Data ◽

Classification Problem ◽

Kernel Learning ◽

Data Sets ◽

Specific Class ◽

Multiple Kernel ◽

Remote Sensing Image Classification ◽

Heterogeneous Features ◽

Class Pair

In recent years, many studies on remote sensing image classification have shown that using multiple features from different data sources can effectively improve classification accuracy. Multiple kernel learning (MKL) is a powerful learning approach that can conveniently embed a variety of features. The conventional combined kernel learned by MKL can be regarded as a compromise among all basic kernels for all classes in the classification. It is the best overall, but not optimal for each specific class. To address this problem, this paper proposes a class-pair-guided MKL method to integrate the heterogeneous features (HFs) from multispectral image (MSI) and light detection and ranging (LiDAR) data. In particular, the "one-against-one" strategy is adopted, which converts the multiclass classification problem into a set of two-class classification problems. Then, we select the best kernel from a pre-constructed set of basic kernels for each class pair by kernel alignment (KA) in the process of classification. The advantage of the proposed method is that only the best kernel for the classification of any two classes is retained, which leads to greatly enhanced discriminability. Experiments are conducted on two real data sets, and the experimental results show that the proposed method achieves the best classification accuracy in integrating the HFs when compared with several state-of-the-art algorithms.
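The per-class-pair kernel selection described in this abstract can be sketched with the usual kernel-target alignment score, A(K, y) = <K, yy^T>_F / (||K||_F ||yy^T||_F). The sketch below is an illustration only: the function names, the toy block kernels named `msi` and `lidar`, and the labels are hypothetical, and the paper's exact alignment variant may differ.

```python
import math
from itertools import combinations

def alignment(K, y):
    """Kernel-target alignment for labels y in {-1, +1}."""
    n = len(y)
    num = sum(K[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    k_norm = math.sqrt(sum(v * v for row in K for v in row))
    return num / (k_norm * n)  # ||y y^T||_F equals n for +/-1 labels

def best_kernel_per_pair(kernels, labels):
    """One-against-one: for every class pair, keep the best-aligned kernel."""
    best = {}
    for a, b in combinations(sorted(set(labels)), 2):
        idx = [i for i, c in enumerate(labels) if c in (a, b)]
        y = [1 if labels[i] == a else -1 for i in idx]

        def score(name):
            K = kernels[name]
            sub = [[K[i][j] for j in idx] for i in idx]  # samples of this pair only
            return alignment(sub, y)

        best[(a, b)] = max(kernels, key=score)
    return best

def block_kernel(groups):
    """Toy kernel: 1 when two samples fall in the same group, else 0."""
    return [[1.0 if g == h else 0.0 for h in groups] for g in groups]

labels = [0, 0, 1, 1, 2, 2]
kernels = {
    "msi": block_kernel([0, 0, 1, 1, 0, 0]),    # separates class 1 from the rest
    "lidar": block_kernel([0, 0, 0, 0, 1, 1]),  # separates class 2 from the rest
}
best = best_kernel_per_pair(kernels, labels)
```

In this toy setup the pair (0, 1) keeps `msi` and the pair (0, 2) keeps `lidar`. For the pair (1, 2) both toy kernels separate the classes equally well, so `max` simply keeps the first one it sees.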

Bilinear Formulated Multiple Kernel Learning for Multi-class Classification Problem

Lecture Notes in Computer Science - Neural Information Processing. Models and Applications ◽

10.1007/978-3-642-17534-3_13 ◽

2010 ◽

pp. 99-107

Author(s):

Takumi Kobayashi ◽

Nobuyuki Otsu

Keyword(s):

Multiple Kernel Learning ◽

Classification Problem ◽

Kernel Learning ◽

Multiple Kernel ◽

Multi Class Classification

Integrating genomic and infrared spectral data improves the prediction of milk protein composition in dairy cattle

Genetics Selection Evolution ◽

10.1186/s12711-021-00620-7 ◽

2021 ◽

Vol 53 (1) ◽

Author(s):

Toshimi Baba ◽

Sara Pegolo ◽

Lucio F. M. Mota ◽

Francisco Peñagaricano ◽

Giovanni Bittante ◽

...

Keyword(s):

Spectral Data ◽

Whey Protein ◽

Milk Protein ◽

Multiple Kernel Learning ◽

Genomic Data ◽

FTIR Spectra ◽

Kernel Learning ◽

Pedigree Data ◽

Multiple Kernel ◽

On Farm

Background: Over the past decade, Fourier transform infrared (FTIR) spectroscopy has been used to predict novel milk protein phenotypes. Genomic data might help predict these phenotypes when integrated with milk FTIR spectra. The objective of this study was to investigate prediction accuracy for milk protein phenotypes when heterogeneous on-farm, genomic, and pedigree data were integrated with the spectra. To this end, we used the records of 966 Italian Brown Swiss cows with milk FTIR spectra, on-farm information, medium-density genetic markers, and pedigree data. True and total whey protein, five casein, and two whey protein traits were analyzed. Multiple kernel learning constructed from spectral and genomic (pedigree) relationship matrices and multilayer BayesB assigning separate priors for FTIR and markers were benchmarked against a baseline partial least squares (PLS) regression. Seven combinations of covariates were considered, and their predictive abilities were evaluated by repeated random sub-sampling and herd cross-validations (CV).

Results: Addition of on-farm effects such as herd, days in milk, and parity to spectral data improved predictions compared with those obtained using the spectra alone. Integrating genomics and/or the top three markers with a large effect further enhanced the predictions. Pedigree data also improved prediction, but to a lesser extent than genomic data. Multiple kernel learning and multilayer BayesB increased predictive performance, whereas PLS did not. Overall, multilayer BayesB provided better predictions than multiple kernel learning, and lower prediction performance was observed in herd CV than in repeated random sub-sampling CV.

Conclusions: Integration of genomic information with milk FTIR spectra can enhance milk protein trait predictions by 25% and 7% on average for repeated random sub-sampling and herd CV, respectively. Multiple kernel learning and multilayer BayesB outperformed PLS when used to integrate heterogeneous data for phenotypic predictions.