DEMLP: DeepWalk Embedding in MLP for miRNA-Disease Association Prediction

2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Xun Wang ◽  
Fuyu Wang ◽  
Xinzeng Wang ◽  
Sibo Qiao ◽  
Yu Zhuang

miRNAs significantly affect a wide range of biological processes involved in human disease. Biological experiments are expensive and time-consuming. Given this cost and difficulty, many efficient computational methods have been developed to predict potential miRNA-disease associations, based on a network generated from a miRNA-disease association dataset. However, many challenges remain. Firstly, the association between miRNAs and diseases is intricate, so these methods should consider the influence of each node's neighborhood in the network. Secondly, how to measure whether there is an association between two nodes of the network is also an important problem. In our study, we integrate graph node embedding with a multilayer perceptron and propose a method, DEMLP. To begin with, we construct a miRNA-disease network from the miRNA-disease adjacency matrix (MDA). Then, low-dimensional embedding vectors of the nodes are learned from the miRNA-disease network by DeepWalk. Finally, we use these embedding vectors as input to train the multilayer perceptron. Experiments show that our proposed method, which uses only the miRNA-disease association information, can effectively predict miRNA-disease associations. To evaluate the effectiveness of DEMLP on a miRNA-disease network from HMDD v3.2, we apply fivefold cross-validation. DEMLP achieves a ROC-AUC of 0.943 and a PR-AUC of 0.937. Compared with other state-of-the-art methods, our method shows good performance using only the miRNA-disease interaction network.
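The first two steps described above (building the network from the adjacency matrix and sampling random walks for DeepWalk) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy matrix, the node labels (`m0`, `d0`, ...), the function names, and the walk parameters are all hypothetical. The skip-gram training that turns walks into embeddings and the multilayer perceptron are not shown.

```python
import random


def build_bipartite_graph(mda):
    """Turn a 0/1 miRNA-by-disease adjacency matrix into an adjacency list.

    Node labels are illustrative: "m<i>" for miRNA i, "d<j>" for disease j.
    """
    graph = {}
    for i, row in enumerate(mda):
        graph.setdefault(f"m{i}", [])
        for j, linked in enumerate(row):
            graph.setdefault(f"d{j}", [])
            if linked:
                graph[f"m{i}"].append(f"d{j}")
                graph[f"d{j}"].append(f"m{i}")
    return graph


def random_walk(graph, start, walk_length, rng):
    """Uniform random walk; stops early only at a node with no neighbours."""
    walk = [start]
    while len(walk) < walk_length:
        neighbours = graph[walk[-1]]
        if not neighbours:
            break
        walk.append(rng.choice(neighbours))
    return walk


def generate_walks(graph, walks_per_node, walk_length, seed=0):
    """Start `walks_per_node` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(graph, node, walk_length, rng))
    return walks


# Toy adjacency matrix (assumed data): 2 miRNAs x 3 diseases.
toy_mda = [
    [1, 0, 1],
    [0, 1, 0],
]
graph = build_bipartite_graph(toy_mda)
walks = generate_walks(graph, walks_per_node=2, walk_length=5)
```

In a DeepWalk-style pipeline, each walk is then treated as a sentence of node tokens and passed to a skip-gram model to obtain one vector per node.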

2018 ◽  
Vol 2018 ◽  
pp. 1-10 ◽  
Author(s):  
Bo Wang ◽  
Jing Zhang

Long noncoding RNAs (lncRNAs) play an important role in various life processes of the body, especially in cancer. Current methods for predicting lncRNA–disease associations ignore the analysis of disease prognosis. In this study, a multiple linear regression model for lncRNA–disease association prediction based on clinical prognosis data (MlrLDAcp) was constructed, integrating clinical prognosis data for cancer with lncRNA transcript expression levels. MlrLDAcp can perform not only cancer survival prediction but also lncRNA–disease association prediction. Ultimately, the 60 lncRNAs most closely related to prostate cancer survival were selected from 481 candidate lncRNAs. The multiple linear regression relationship between the survival of 176 patients with prostate cancer and these 60 lncRNAs was then given. Compared with previous studies, MlrLDAcp had superior survival prediction ability and could effectively predict lncRNA–disease associations. MlrLDAcp had an area under the curve (AUC) value of 0.875 for survival prediction and an AUC value of 0.872 for lncRNA–disease association prediction. It could be an effective method for biomedical research.


Genes ◽  
2019 ◽  
Vol 10 (8) ◽  
pp. 608 ◽  
Author(s):  
Yan Li ◽  
Junyi Li ◽  
Naizheng Bian

Identifying associations between lncRNAs and diseases can help understand disease-related lncRNAs and facilitate disease diagnosis and treatment. The dual-network integrated logistic matrix factorization (DNILMF) model has been used for drug–target interaction prediction with good results. We applied DNILMF to lncRNA–disease association prediction for the first time (DNILMF-LDA). We combined different similarity kernel matrices of lncRNAs and diseases by nonlinear fusion to extract the most important information in the fused matrices. Then, lncRNA–disease association networks and similarity networks were built simultaneously. Finally, the Gaussian process mutual information (GP-MI) algorithm of Bayesian optimization was adopted to optimize the model parameters. The 10-fold cross-validation result showed that the area under the receiver operating characteristic (ROC) curve (AUC) of DNILMF-LDA was 0.9202, and the area under the precision-recall (PR) curve (AUPR) was 0.5610. Compared with LRLSLDA, SIMCLDA, BiwalkLDA, and TPGLDA, the AUC value of our method increased by 38.81%, 13.07%, 8.35%, and 6.75%, respectively, and the AUPR value increased by 52.66%, 40.05%, 37.01%, and 44.25%, respectively. These results indicate that DNILMF-LDA is an effective method for predicting the associations between lncRNAs and diseases.


2020 ◽  
Vol 49 (D1) ◽  
pp. D86-D91
Author(s):  
Bailing Zhou ◽  
Baohua Ji ◽  
Kui Liu ◽  
Guodong Hu ◽  
Fei Wang ◽  
...  

Abstract Long non-coding RNAs (lncRNAs) play important functional roles in many diverse biological processes. However, not all expressed lncRNAs are functional. Thus, it is necessary to manually collect all experimentally validated functional lncRNAs (EVlncRNA) with their sequences, structures, and functions annotated in a central database. The first release of such a database (EVLncRNAs) was made using the literature prior to 1 May 2016. Since then (up to 15 May 2020), 19 245 articles related to lncRNAs have been published. In EVLncRNAs 2.0, these articles were manually examined for a major expansion of the data collected. Specifically, the numbers of annotated EVlncRNAs, associated diseases, lncRNA-disease associations, and interaction records increased by 260%, 320%, 484% and 537%, respectively. Moreover, the database has added several new categories: 8 lncRNA structures, 33 exosomal lncRNAs, 188 circular RNAs, and 1079 drug-resistant, chemoresistant, and stress-resistant lncRNAs. All records have been checked against known retracted and fake articles. This release also comes with a highly interactive visual interaction network that helps users track the underlying relations among lncRNAs, miRNAs, proteins, genes and other functional elements. Furthermore, it provides links to four new bioinformatics tools with improved data browsing and searching functionality. EVLncRNAs 2.0 is freely available at https://www.sdklab-biophysics-dzu.net/EVLncRNAs2/.


Author(s):  
Nan Sheng ◽  
Hui Cui ◽  
Tiangang Zhang ◽  
Ping Xuan

Abstract As the abnormalities of long non-coding RNAs (lncRNAs) are closely related to various human diseases, identifying disease-related lncRNAs is important for understanding the pathogenesis of complex diseases. Most current data-driven methods for disease-related lncRNA candidate prediction are based on diseases and lncRNAs. Those methods, however, fail to consider the deeply embedded node attributes of lncRNA–disease pairs, which contain multiple relations and representations across lncRNAs, diseases and miRNAs. Moreover, the low-dimensional feature distribution at the pairwise level has not been taken into account. We propose a prediction model, VADLP, to extract, encode and adaptively integrate multi-level representations. Firstly, a triple-layer heterogeneous graph is constructed with weighted inter-layer and intra-layer edges to integrate the similarities and correlations among lncRNAs, diseases and miRNAs. We then define three representations: node attributes, pairwise topology and feature distribution. Node attributes are derived from the graph by an embedding strategy to represent the lncRNA–disease associations, which are inferred via their common lncRNAs, diseases and miRNAs. Pairwise topology is formulated by a random walk algorithm and encoded by a convolutional autoencoder to represent the hidden topological structural relations between an lncRNA and a disease. The new feature distribution is modeled by a variance autoencoder to reveal the underlying lncRNA–disease relationship. Finally, an attentional representation-level integration module is constructed to adaptively fuse the three representations for lncRNA–disease association prediction. The proposed model is tested on a public dataset with a comprehensive list of evaluations. Our model outperforms six state-of-the-art lncRNA–disease prediction models with statistical significance. The ablation study showed the important contributions of the three representations. In particular, the improved recall rates under different top $k$ values demonstrate that our model is powerful in discovering true disease-related lncRNAs among the top-ranked candidates. Case studies of three cancers further demonstrated the capacity of our model to discover potential disease-related lncRNAs.


2020 ◽  
Author(s):  
Bo-Ya Ji ◽  
Zhu-Hong You ◽  
Zhan-Heng Chen ◽  
Leon Wong ◽  
Hai-Cheng Yi

Abstract Background As an important class of non-coding RNA discovered in recent years, microRNA (miRNA) plays an important role in a series of life processes and is closely associated with a variety of human diseases. Hence, the identification of potential miRNA-disease associations can make great contributions to the research and treatment of human diseases. However, to our knowledge, many of the existing state-of-the-art computational methods use only a single type of known association information between miRNAs and diseases to predict their potential associations, without considering their interactions or associations with other types of molecules. Results In this paper, a network embedding-based method built on the tripartite miRNA-protein-disease network (NEMPD) was proposed for the prediction of miRNA-disease associations. Firstly, a tripartite miRNA-protein-disease network is created by integrating known miRNA-protein and protein-disease associations. Then, we use the network representation method Learning Graph Representations with Global Structural Information (GraRep) to obtain the behavior information (associations with proteins in the network) of miRNAs and diseases. Secondly, this behavior information is combined with the attribute information of miRNAs and diseases (disease semantic similarity and miRNA sequence information) to represent miRNA-disease pairs. Thirdly, the prediction model is built from the known miRNA-disease pairs using the Random Forest algorithm. Under five-fold cross-validation, the average prediction accuracy, sensitivity, and AUC of NEMPD are 85.41%, 80.96%, and 91.58%, respectively. Furthermore, the performance of NEMPD was also validated by case studies. Among the top 50 predicted disease-related miRNAs, 48 (breast neoplasms), 47 (colon neoplasms), and 47 (lung neoplasms) were confirmed by two other databases. Conclusions NEMPD performs well in predicting the potential associations between miRNAs and diseases and has great potential in the field of miRNA-disease association prediction.


2021 ◽  
Vol 21 ◽  
Author(s):  
Biao Du ◽  
Lin Tang ◽  
Lin Liu ◽  
Wei Zhou

Background: A growing body of research reveals that long non-coding RNAs (lncRNAs) play an important role in various biological processes of human diseases. Nonetheless, only a handful of lncRNA-disease associations have been experimentally verified. Computational prediction of lncRNA-disease associations provides a preliminary basis for biological experiments and thereby reduces the high cost of wet lab experiments. Objective: This study aims to learn the real distribution of lncRNA-disease associations from a limited number of known lncRNA-disease association data. This paper proposes a new lncRNA-disease association prediction model called LDA-GAN, based on a generative adversarial network (GAN). Method: To address the slow convergence, training instability, and inapplicability to discrete data of the traditional GAN, LDA-GAN uses the Gumbel-softmax technique to construct a differentiable process that simulates discrete sampling. Meanwhile, the generator and the discriminator of LDA-GAN are integrated to establish an overall optimization goal based on a pairwise loss function. Results: Experiments on standard datasets demonstrate that LDA-GAN not only achieves high stability and high efficiency in adversarial learning but also exploits the semi-supervised learning advantage of the generative adversarial framework on unlabeled data, which further improves the prediction accuracy of lncRNA-disease associations. In addition, case studies show that LDA-GAN can accurately generate potential diseases for several lncRNAs.
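The Gumbel-softmax technique mentioned in the Method section can be illustrated with a short sketch. This shows only the generic relaxation (add Gumbel noise to the logits, divide by a temperature, apply a softmax); how LDA-GAN wires it into its generator is not described in the abstract, so the function name, the example logits, and the temperatures below are assumptions for illustration.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Return a relaxed one-hot sample over the categories in `logits`.

    Lower temperatures push the output closer to a one-hot vector;
    higher temperatures make it smoother.
    """
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
logits = [2.0, 0.5, -1.0]  # assumed scores for three candidate diseases
soft_sample = gumbel_softmax(logits, temperature=1.0, rng=rng)
sharp_sample = gumbel_softmax(logits, temperature=0.1, rng=rng)
```

Because the output is a smooth function of the logits, gradients can flow through the sampling step in an automatic-differentiation framework, which a hard categorical draw would not allow.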


2019 ◽  
Vol 20 (S18) ◽  
Author(s):  
Jialu Hu ◽  
Yiqun Gao ◽  
Jing Li ◽  
Yan Zheng ◽  
Jingru Wang ◽  
...  

Abstract Background There is evidence to suggest that lncRNAs are associated with distinct and diverse biological processes. The dysfunction or mutation of lncRNAs is implicated in a wide range of diseases. An accurate computational model can benefit the diagnosis of diseases and help us gain a better understanding of the molecular mechanism. Although many related algorithms have been proposed, there is still much room to improve their accuracy. Results We developed a novel algorithm, BiWalkLDA, to predict disease-related lncRNAs in three real datasets, which have 528 lncRNAs, 545 diseases and 1216 interactions in total. To compare performance with other algorithms, the leave-one-out validation test was performed for BiWalkLDA and three other existing algorithms, SIMCLDA, LDAP and LRLSLDA. Additional tests were carefully designed to analyze the effects of parameters such as α, β, l and r, which could help users select the best values of these parameters for their own applications. In a case study of prostate cancer, eight of the top ten disease-related lncRNAs reported by BiWalkLDA were previously confirmed in the literature. Conclusions In this paper, we develop an algorithm, BiWalkLDA, to predict lncRNA-disease associations by using bi-random walks. It constructs a lncRNA-disease network by integrating interaction profile and gene ontology information, and it solves the cold-start problem by using neighbors' interaction profile information. Bi-random walks were then applied to three real biological datasets. Results show that our method outperforms other algorithms in predicting lncRNA-disease associations in terms of both accuracy and specificity. Availability https://github.com/screamer/BiwalkLDA


2020 ◽  
Vol 21 (11) ◽  
pp. 1078-1084
Author(s):  
Ruizhi Fan ◽  
Chenhua Dong ◽  
Hu Song ◽  
Yixin Xu ◽  
Linsen Shi ◽  
...  

Recently, an increasing number of biological and clinical reports have demonstrated that imbalance of the microbial community plays an important role in several complex diseases affecting human health. Discovering potential microbe-disease relationships provides a better understanding of issues such as disease pathology and further improves disease diagnostics and prognostics. Nevertheless, few computational approaches can meet the need for large-scale microbe-disease association discovery. In this work, we propose the EHAI model (Enhanced Human microbe-disease Association Identification). EHAI uses the known microbe-disease associations, and Gaussian interaction profile kernel similarity is then used to enhance the basic microbe-disease associations. In practice, only some microbe-disease associations are known, and a large number of associations are still missing from the datasets. The ‘super-microbe’ and ‘super-disease’ were therefore employed to enhance the model. Computational results demonstrated that such super-classes improve the performance of EHAI. Therefore, it is anticipated that EHAI can serve as an important biological tool in this field.
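Gaussian interaction profile (GIP) kernel similarity, as named in the abstract above, can be sketched as follows. This uses the commonly cited formulation, in which the similarity of two entities is exp(-γ · squared distance between their interaction profiles) and γ is a bandwidth parameter divided by the mean squared profile norm. Whether EHAI uses exactly this normalisation is an assumption, and the toy association matrix, function name, and `gamma_prime` default are hypothetical.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    Each row is the 0/1 interaction profile of one entity (for example,
    one microbe across all diseases).
    """
    sq_norms = [sum(x * x for x in p) for p in profiles]
    mean_sq_norm = sum(sq_norms) / len(profiles)
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    # Bandwidth normalised by the average squared profile norm.
    gamma = gamma_prime / mean_sq_norm
    n = len(profiles)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Toy association matrix (assumed data): rows are microbes, columns are diseases.
association = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
microbe_similarity = gip_kernel(association)
# Disease profiles are the columns of the same matrix.
disease_profiles = [list(col) for col in zip(*association)]
disease_similarity = gip_kernel(disease_profiles)
```

In this toy example, microbes 0 and 1 share a disease, so their similarity is higher than that of microbes 0 and 2, which share none.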


2009 ◽  
Vol 7 (44) ◽  
pp. 423-437 ◽  
Author(s):  
Tijana Milenković ◽  
Vesna Memišević ◽  
Anand K. Ganesan ◽  
Nataša Pržulj

Many real-world phenomena have been described in terms of large networks. Networks have been invaluable models for the understanding of biological systems. Since proteins carry out most biological processes, we focus on analysing protein–protein interaction (PPI) networks. Proteins interact to perform a function. Thus, PPI networks reflect the interconnected nature of biological processes and analysing their structural properties could provide insights into biological function and disease. We have already demonstrated, by using a sensitive graph theoretic method for comparing topologies of node neighbourhoods called ‘graphlet degree signatures’, that proteins with similar surroundings in PPI networks tend to perform the same functions. Here, we explore whether the involvement of genes in cancer suggests the similarity of their topological ‘signatures’ as well. By applying a series of clustering methods to proteins' topological signature similarities, we demonstrate that the obtained clusters are significantly enriched with cancer genes. We apply this methodology to identify novel cancer gene candidates, validating 80 per cent of our predictions in the literature. We also validate predictions biologically by identifying cancer-related negative regulators of melanogenesis identified in our siRNA screen. This is encouraging, since we have done this solely from PPI network topology. We provide clear evidence that PPI network structure around cancer genes is different from the structure around non-cancer genes. Understanding the underlying principles of this phenomenon is an open question, with a potential for increasing our understanding of complex diseases.

