MKL-GRNI: A parallel multiple kernel learning approach for supervised inference of large-scale gene regulatory networks

2021 ◽  
Vol 7 ◽  
pp. e363
Author(s):  
Nisar Wani ◽  
Khalid Raza

High-throughput multi-omics data generation, coupled with heterogeneous genomic data fusion, is defining new ways to build computational inference models. These models are scalable and can support very large genome sizes, with the added advantage of exploiting additional biological knowledge from the integration framework. However, the limitation of such an arrangement is the huge computational cost of learning from very large datasets in a sequential execution environment. To overcome this issue, we present a multiple kernel learning (MKL) based gene regulatory network (GRN) inference approach in which multiple heterogeneous datasets are fused using the MKL paradigm. We formulate GRN learning as a supervised classification problem, whereby genes regulated by a specific transcription factor are separated from other, non-regulated genes. A parallel execution architecture is devised to learn a large-scale GRN by decomposing the initial classification problem into a number of subproblems that run as multiple processes on a multi-processor machine. We evaluate the approach in terms of speedup and inference potential using genomic data from Escherichia coli, Saccharomyces cerevisiae and Homo sapiens. The results demonstrate that the proposed method achieves better classification accuracy and greater speedup than other state-of-the-art methods when learning large-scale GRNs from multiple heterogeneous datasets.
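The per-transcription-factor decomposition described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' MKL-GRNI code: the base kernels are assumed to be precomputed gene-by-gene matrices, the kernel weights are fixed rather than learned, and kernel ridge regression is swapped in for the SVM normally used with MKL. Each transcription factor (TF) defines an independent binary subproblem (regulated genes +1, all others -1), so the subproblems can be sent to a worker pool. The function names are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def combine_kernels(kernels, weights):
    """Weighted sum of precomputed gene-by-gene kernels, one per data source."""
    return sum(w * K for w, K in zip(weights, kernels))


def fit_one_tf(K, y, lam):
    """Solve one TF subproblem: regulated (+1) vs. non-regulated (-1) genes.

    Kernel ridge regression stands in for an SVM here (an assumption):
    alpha = (K + lam * I)^-1 y, decision values f = K @ alpha.
    """
    n = K.shape[0]
    alpha = np.linalg.solve(K + lam * np.eye(n), y)
    return np.sign(K @ alpha)  # in-sample predicted labels


def infer_network(kernels, weights, labels_per_tf, lam=1.0, workers=4):
    """Fuse the data sources once, then train one classifier per TF in parallel.

    Threads keep the sketch portable; a ProcessPoolExecutor would mirror the
    multi-process layout described in the abstract more closely.
    """
    K = combine_kernels(kernels, weights)
    tfs = list(labels_per_tf)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fit_one_tf, K, labels_per_tf[tf], lam) for tf in tfs]
        return {tf: f.result() for tf, f in zip(tfs, futures)}
```

Because every subproblem shares the same fused kernel but has its own label vector, the work scales with the number of TFs and parallelises without communication between workers. A real evaluation would score held-out genes rather than the in-sample predictions returned here.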

Entropy ◽  
2020 ◽  
Vol 22 (7) ◽  
pp. 794
Author(s):  
Alessio Martino ◽  
Enrico De Santis ◽  
Alessandro Giuliani ◽  
Antonello Rizzi

Multiple kernel learning is a paradigm which employs a properly constructed chain of kernel functions able to simultaneously analyse different data or different representations of the same data. In this paper, we propose a hybrid classification system based on a linear combination of multiple kernels defined over multiple dissimilarity spaces. The core of the training procedure is the joint optimisation of the kernel weights and the selection of representatives in the dissimilarity spaces. This equips the system with a two-fold knowledge discovery phase: by analysing the weights, it is possible to check which representations are more suitable for solving the classification problem, while the pivotal patterns selected as representatives can give further insight into the modelled system, possibly with the help of field experts. The proposed classification system is tested on real proteomic data in order to predict the functional role of proteins from their folded structure: specifically, eight representations are derived from the graph-based description of the folded protein. The proposed multiple kernel-based system has also been benchmarked against a clustering-based classification system that can likewise exploit multiple dissimilarities simultaneously. Computational results show remarkable classification capabilities, and the knowledge discovery analysis is in line with current biological knowledge, suggesting the reliability of the proposed system.
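A linear combination of kernels over dissimilarity spaces can be sketched as below. This is a minimal illustration, not the authors' system: it assumes each representation is given as a precomputed n x n dissimilarity matrix, that one shared set of representative indices is used for every space, and that a Gaussian kernel is applied to each dissimilarity-space embedding. The joint optimisation of weights and representatives described in the abstract is not shown; weights and representatives are inputs here, and the function names are hypothetical.

```python
import numpy as np


def embed(D, rep_idx):
    """Dissimilarity-space embedding: describe every pattern by its
    dissimilarities to the selected representatives (columns of D)."""
    return D[:, rep_idx]


def rbf_kernel(E, gamma):
    """Gaussian kernel on the rows of an embedding matrix E."""
    sq = ((E[:, None, :] - E[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)


def multi_dissimilarity_kernel(dissimilarity_matrices, rep_idx, weights, gamma=1.0):
    """Linear combination of one kernel per dissimilarity space.

    Weights are normalised to sum to 1, so a larger weight marks a
    representation that contributes more to the combined kernel.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    kernels = [rbf_kernel(embed(D, rep_idx), gamma) for D in dissimilarity_matrices]
    return sum(wi * K for wi, K in zip(w, kernels))
```

Inspecting the normalised weights after training is what supports the first knowledge discovery step mentioned above; the chosen `rep_idx` patterns support the second.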


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Ling Wang ◽  
Hongqiao Wang ◽  
Guangyuan Fu

Extensions of kernel methods for class imbalance problems have been extensively studied. Although they cope well with nonlinear problems, their high computation and memory costs severely limit their application to real-world imbalanced tasks. The Nyström method is an effective technique for scaling kernel methods. However, the standard Nyström method needs to sample a sufficiently large number of landmark points to ensure an accurate approximation, which seriously affects its efficiency. In this study, we propose a multi-Nyström method based on mixtures of Nyström approximations to avoid an explosion in the size of the subkernel matrix, while the optimization of the mixture weights is embedded into the model training process through multiple kernel learning (MKL) algorithms to yield a more accurate low-rank approximation. Moreover, we select subsets of landmark points according to the imbalance distribution to reduce the model’s sensitivity to skewness. We also provide a kernel stability analysis of our method and show that the model solution error is bounded by weighted approximation errors, which can help us improve the learning process. Extensive experiments on several large-scale datasets show that our method can achieve higher classification accuracy and a dramatic speedup of MKL algorithms.
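The standard Nyström approximation and a weighted mixture of several small ones can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: it uses a Gaussian kernel, fixes the mixture weights instead of learning them with MKL, and draws an equal number of landmarks from each class as one hypothetical way to account for class imbalance. Function names are hypothetical.

```python
import numpy as np


def rbf(A, B, gamma=0.5):
    """Gaussian kernel between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)


def nystrom(X, landmarks, gamma=0.5):
    """Standard Nystrom approximation K ~= C W^+ C^T from m landmark indices."""
    C = rbf(X, X[landmarks], gamma)  # n x m block of the full kernel matrix
    W = C[landmarks]                 # m x m landmark-landmark block
    return C @ np.linalg.pinv(W) @ C.T


def multi_nystrom(X, landmark_sets, weights, gamma=0.5):
    """Weighted mixture of several small Nystrom approximations."""
    return sum(w * nystrom(X, idx, gamma) for w, idx in zip(weights, landmark_sets))


def stratified_landmarks(y, per_class, rng):
    """Draw the same number of landmarks from every class (illustrative choice)."""
    picks = [rng.choice(np.flatnonzero(y == c), size=per_class, replace=False)
             for c in np.unique(y)]
    return np.concatenate(picks)
```

Each component only inverts an m x m matrix, so several small landmark sets stay cheap compared with one large set of the same total size; the weights passed to `multi_nystrom` are the quantities the abstract describes as being optimised by MKL.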


2020 ◽  
Vol 36 (12) ◽  
pp. 3766-3772 ◽  
Author(s):  
Arezou Rahimi ◽  
Mehmet Gönen

Abstract Motivation Genomic information is increasingly being used in diagnosis, prognosis and treatment of cancer. The severity of the disease is usually measured by the tumor stage. Therefore, identifying pathways playing an important role in progression of the disease stage is of great interest. Given that there are similarities in the underlying mechanisms of different cancers, in addition to the considerable correlation in the genomic data, there is a need for machine learning methods that can take these aspects of genomic data into account. Furthermore, using machine learning for studying multiple cancer cohorts together with a collection of molecular pathways creates an opportunity for knowledge extraction. Results We studied the problem of discriminating early- and late-stage tumors of several cancers using genomic information while enforcing interpretability on the solutions. To this end, we developed a multitask multiple kernel learning (MTMKL) method with a co-clustering step based on a cutting-plane algorithm to identify the relationships between the input tasks and kernels. We tested our algorithm on 15 cancer cohorts and observed that, in most cases, MTMKL outperforms other algorithms (including random forests, support vector machine and single-task multiple kernel learning) in terms of predictive power. Using the aggregate results from multiple replications, we also derived similarity matrices between cancer cohorts, which are, in many cases, in agreement with available relationships reported in the relevant literature. Availability and implementation Our implementations of support vector machine and multiple kernel learning algorithms in R are available at https://github.com/arezourahimi/mtgsbc together with the scripts that replicate the reported experiments. Supplementary information Supplementary data are available at Bioinformatics online.
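Pathway-restricted kernels, combined with weights shared by tasks in the same cluster, can be sketched as below. This is a simplified illustration, not the authors' R implementation: it assumes a samples x genes expression matrix per cohort, a Gaussian kernel per pathway scaled by pathway size, and a fixed task-to-cluster assignment with given per-cluster weights. The cutting-plane co-clustering that learns these quantities in MTMKL is not shown, and all names are hypothetical.

```python
import numpy as np


def pathway_kernels(X, pathways, gamma=1.0):
    """One Gaussian kernel per pathway, computed only on that pathway's genes.

    X: samples x genes matrix for one cohort; pathways: name -> column indices.
    """
    kernels = {}
    for name, cols in pathways.items():
        Z = X[:, cols]
        sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
        kernels[name] = np.exp(-gamma * sq / len(cols))  # scale by pathway size
    return kernels


def task_kernel(kernels, cluster_weights, task_cluster, task):
    """Combine a cohort's pathway kernels with the weight vector shared by
    every task assigned to the same cluster."""
    w = cluster_weights[task_cluster[task]]
    return sum(w[name] * K for name, K in kernels.items())
```

Because the weights are attached to pathways, a trained model can be read off as a ranking of pathways per cluster of cohorts, which is the kind of interpretability the abstract emphasises.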


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Christopher M. Wilson ◽  
Kaiqiao Li ◽  
Xiaoqing Yu ◽  
Pei-Fen Kuan ◽  
Xuefeng Wang

Author(s):  
Q. Wang ◽  
Y. Gu ◽  
T. Liu ◽  
H. Liu ◽  
X. Jin

In recent years, many studies on remote sensing image classification have shown that using multiple features from different data sources can effectively improve classification accuracy. Multiple kernel learning (MKL), a powerful learning approach, can conveniently incorporate a variety of features. However, the combined kernel learned by conventional MKL can be regarded as a compromise among all base kernels across all classes: it is optimal overall, but not for each specific class. To address this problem, this paper proposes a class-pair-guided MKL method to integrate the heterogeneous features (HFs) from multispectral image (MSI) and light detection and ranging (LiDAR) data. In particular, the "one-against-one" strategy is adopted, which converts the multiclass classification problem into multiple two-class classification problems. Then, for each class pair, the best kernel is selected from a pre-constructed set of base kernels by kernel alignment (KA) during classification. The advantage of the proposed method is that only the best kernel for discriminating each pair of classes is retained, which greatly enhances discriminability. Experiments are conducted on two real data sets, and the results show that the proposed method achieves the best classification accuracy in integrating the HFs when compared with several state-of-the-art algorithms.
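Selecting one base kernel per class pair by kernel alignment can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the base kernels are precomputed n x n matrices over the training samples and uses the standard alignment between a kernel K and the ideal kernel y yᵀ, that is ⟨K, y yᵀ⟩_F / (‖K‖_F ‖y yᵀ‖_F). Function names are hypothetical.

```python
import numpy as np
from itertools import combinations


def alignment(K, y):
    """Kernel alignment between K and the ideal kernel y y^T (y in {-1, +1})."""
    Y = np.outer(y, y)
    return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))


def best_kernel_per_pair(kernels, labels):
    """One-against-one: for each pair of classes, keep the index of the base
    kernel whose alignment with that pair's ideal kernel is highest."""
    choice = {}
    for a, b in combinations(np.unique(labels), 2):
        idx = np.flatnonzero((labels == a) | (labels == b))
        y = np.where(labels[idx] == a, 1.0, -1.0)
        scores = [alignment(K[np.ix_(idx, idx)], y) for K in kernels]
        choice[(a, b)] = int(np.argmax(scores))
    return choice
```

Each binary classifier in the one-against-one ensemble would then be trained on the submatrix of its selected kernel, so different class pairs can rely on different feature sources.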


2021 ◽  
Vol 53 (1) ◽  
Author(s):  
Toshimi Baba ◽  
Sara Pegolo ◽  
Lucio F. M. Mota ◽  
Francisco Peñagaricano ◽  
Giovanni Bittante ◽  
...  

Abstract Background Over the past decade, Fourier transform infrared (FTIR) spectroscopy has been used to predict novel milk protein phenotypes. Genomic data might help predict these phenotypes when integrated with milk FTIR spectra. The objective of this study was to investigate prediction accuracy for milk protein phenotypes when heterogeneous on-farm, genomic, and pedigree data were integrated with the spectra. To this end, we used the records of 966 Italian Brown Swiss cows with milk FTIR spectra, on-farm information, medium-density genetic markers, and pedigree data. True and total whey protein, as well as five casein and two whey protein traits, were analyzed. Multiple kernel learning, constructed from spectral and genomic (or pedigree) relationship matrices, and multilayer BayesB, which assigns separate priors to FTIR and marker data, were benchmarked against baseline partial least squares (PLS) regression. Seven combinations of covariates were considered, and their predictive abilities were evaluated by repeated random sub-sampling and herd cross-validation (CV). Results Adding on-farm effects such as herd, days in milk, and parity to the spectral data improved predictions compared with those obtained using the spectra alone. Integrating genomic data and/or the top three large-effect markers further enhanced the predictions. Pedigree data also improved prediction, but to a lesser extent than genomic data. Multiple kernel learning and multilayer BayesB increased predictive performance, whereas PLS did not. Overall, multilayer BayesB provided better predictions than multiple kernel learning, and lower prediction performance was observed in herd CV than in repeated random sub-sampling CV. Conclusions Integrating genomic information with milk FTIR spectra can enhance milk protein trait predictions by 25% and 7% on average for repeated random sub-sampling and herd CV, respectively.
Multiple kernel learning and multilayer BayesB outperformed PLS when used to integrate heterogeneous data for phenotypic predictions.
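Building the two relationship matrices that a multiple kernel model combines can be sketched as below. This is an illustration under stated assumptions, not the study's code: the abstract does not specify the constructions, so the genomic matrix follows the widely used VanRaden form G = Z Zᵀ / (2 Σ pⱼ(1 − pⱼ)) with markers coded 0/1/2, the spectral matrix is a linear kernel on standardised spectra, and the two weights are supplied rather than estimated. Function names are hypothetical.

```python
import numpy as np


def genomic_relationship(M):
    """VanRaden-style genomic relationship matrix from 0/1/2 marker
    genotypes (rows = animals, columns = markers)."""
    p = M.mean(axis=0) / 2.0  # allele frequency per marker
    Z = M - 2.0 * p           # centre genotypes by twice the frequency
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def spectral_relationship(S):
    """Linear kernel on standardised FTIR spectra (rows = milk samples).
    Assumes no wavenumber column is constant (zero standard deviation)."""
    Sc = (S - S.mean(axis=0)) / S.std(axis=0)
    return Sc @ Sc.T / S.shape[1]


def combined_relationship(M, S, w_genomic, w_spectral):
    """Weighted sum of the two relationship matrices, as fed to an MKL model."""
    return w_genomic * genomic_relationship(M) + w_spectral * spectral_relationship(S)
```

In a full multiple kernel model the two weights would be estimated jointly with the regression; a pedigree-based relationship matrix could replace the genomic one in the same slot.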


Author(s):  
Guo ◽  
Xiaoqian Zhang ◽  
Zhigui Liu ◽  
Xuqian Xue ◽  
Qian Wang ◽  
...  
