scholarly journals LOMDA: Linear optimization for miRNA-disease association prediction

2019 ◽  
Author(s):  
Yan-Li Lee ◽  
Ratha Pech ◽  
Maryna Po ◽  
Dong Hao ◽  
Tao Zhou

AbstractMicroRNAs (miRNAs) have been playing a crucial role in many important biological processes e.g., pathogenesis of diseases. Currently, the validated associations between miRNAs and diseases are insufficient comparing to the hidden associations. Testing all these hidden associations by biological experiments is expensive, laborious, and time consuming. Therefore, computationally inferring hidden associations from biological datasets for further laboratory experiments has attracted increasing interests from different communities ranging from biological to computational science. In this work, we propose an effective and efficient method to predict associations between miRNAs and diseases, namely linear optimization (LOMDA). The proposed method uses the heterogenous matrix incorporating of miRNA functional similarity information, disease similarity information and known miRNA-disease associations. Compared with the other methods, LOMDA performs best in terms of AUC (0.970), precision (0.566), and accuracy (0.971) in average over 15 diseases in local 5-fold cross-validation. Moreover, LOMDA has also been applied to two types of case studies. In the first case study, 30 predictions from breast neoplasms, 24 from colon neoplasms, and 26 from kidney neoplasms among top 30 predicted miRNAs are confirmed. In the second case study, for new diseases without any known associations, top 30 predictions from hepatocellular carcinoma and 29 from lung neoplasms among top 30 predicted miRNAs are confirmed.Author summaryIdentifying associations between miRNAs and diseases is significant in investigation of pathogenesis, diagnosis, treatment and preventions of related diseases. Employing computational methods to predict the hidden associations based on known associations and focus on those predicted associations can sharply reduce the experimental costs. We developed a computational method LOMDA based on the linear optimization technique to predict the hidden associations. In addition to the observed associations, LOMDA also can employ the auxiliary information (diseases and miRNAs similarity information) flexibly and effectively. Numerical experiments on global 5-fold cross validation show that the use of the auxiliary information can greatly improve the prediction performance. Meanwhile, the result on local 5-fold cross validation shows that LOMDA performs best among the seven related methods. We further test the prediction performance of LOMDA for two types of diseases based on HDMMv2.0 (2014), including (i) diseases with all the known associations, and (ii) new diseases without known associations. Three independent or updated databases (dbDEMC, 2010; miR2Disease, 2009; HDMMv3.2, 2019) are introduced to evaluate the prediction results. As a result, most miRNAs for target diseases are confirmed by at least one of the three databases. So, we believe that LOMDA can guide experiments to identify the hidden miRNA-disease associations.

2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Yubin Xiao ◽  
Zheng Xiao ◽  
Xiang Feng ◽  
Zhiping Chen ◽  
Linai Kuang ◽  
...  

Abstract Background Accumulating evidence has demonstrated that long non-coding RNAs (lncRNAs) are closely associated with human diseases, and it is useful for the diagnosis and treatment of diseases to get the relationships between lncRNAs and diseases. Due to the high costs and time complexity of traditional bio-experiments, in recent years, more and more computational methods have been proposed by researchers to infer potential lncRNA-disease associations. However, there exist all kinds of limitations in these state-of-the-art prediction methods as well. Results In this manuscript, a novel computational model named FVTLDA is proposed to infer potential lncRNA-disease associations. In FVTLDA, its major novelty lies in the integration of direct and indirect features related to lncRNA-disease associations such as the feature vectors of lncRNA-disease pairs and their corresponding association probability fractions, which guarantees that FVTLDA can be utilized to predict diseases without known related-lncRNAs and lncRNAs without known related-diseases. Moreover, FVTLDA neither relies solely on known lncRNA-disease nor requires any negative samples, which guarantee that it can infer potential lncRNA-disease associations more equitably and effectively than traditional state-of-the-art prediction methods. Additionally, to avoid the limitations of single model prediction techniques, we combine FVTLDA with the Multiple Linear Regression (MLR) and the Artificial Neural Network (ANN) for data analysis respectively. Simulation experiment results show that FVTLDA with MLR can achieve reliable AUCs of 0.8909, 0.8936 and 0.8970 in 5-Fold Cross Validation (fivefold CV), 10-Fold Cross Validation (tenfold CV) and Leave-One-Out Cross Validation (LOOCV), separately, while FVTLDA with ANN can achieve reliable AUCs of 0.8766, 0.8830 and 0.8807 in fivefold CV, tenfold CV, and LOOCV respectively. Furthermore, in case studies of gastric cancer, leukemia and lung cancer, experiment results show that there are 8, 8 and 8 out of top 10 candidate lncRNAs predicted by FVTLDA with MLR, and 8, 7 and 8 out of top 10 candidate lncRNAs predicted by FVTLDA with ANN, having been verified by recent literature. Comparing with the representative prediction model of KATZLDA, comparison results illustrate that FVTLDA with MLR and FVTLDA with ANN can achieve the average case study contrast scores of 0.8429 and 0.8515 respectively, which are both notably higher than the average case study contrast score of 0.6375 achieved by KATZLDA. Conclusion The simulation results show that FVTLDA has good prediction performance, which is a good supplement to future bioinformatics research.


2020 ◽  
Author(s):  
Yubin Xiao ◽  
Zheng Xiao ◽  
Xiang Feng ◽  
Zhiping Chen ◽  
Linai Kuang ◽  
...  

Abstract Background: Accumulating evidence has demonstrated that long non-coding RNAs (lncRNAs) are closely associated with human diseases, and it is useful for the diagnosis and treatment of diseases to get the relationships between lncRNAs and diseases. Due to the high costs and time complexity of traditional bio-experiments, in recent years, more and more computational methods have been proposed by researchers to infer potential lncRNA-disease associations. However, there exist all kinds of limitations in these state-of-the-art prediction methods as well.Results: In this manuscript, a novel computational model named FVTLDA is proposed to infer potential lncRNA-disease associations. In FVTLDA, its major novelty lies in the integration of direct and indirect features related to lncRNA-disease associations such as the feature vectors of lncRNA-disease pairs and their corresponding association probability fractions, which guarantees that FVTLDA can be utilized to predict diseases without known related-lncRNAs and lncRNAs without known related-diseases. Moreover, FVTLDA neither relies solely on known lncRNA-disease nor requires any negative samples, which guarantee that it can infer potential lncRNA-disease associations more equitably and effectively than traditional state-of-the-art prediction methods. Additionally, to avoid the limitations of single model prediction techniques, we combine FVTLDA with the Multiple Linear Regression (MLR) and the Artificial Neural Network (ANN) for data analysis respectively. Simulation experiment results show that FVTLDA with MLR can achieve reliable AUCs of 0.8909, 0.8936 and 0.8970 in 5-Fold Cross Validation (5-fold CV), 10-Fold Cross Validation (10-fold CV) and Leave-One-Out Cross Validation (LOOCV), separately, while FVTLDA with ANN can achieve reliable AUCs of 0.8766, 0.8830 and 0.8807 in 5-fold CV, 10-fold CV, and LOOCV respectively. Furthermore, in case studies of gastric cancer, leukemia and lung cancer, experiment results show that there are 8, 8 and 8 out of top 10 candidate lncRNAs predicted by FVTLDA with MLR, and 8, 7 and 8 out of top 10 candidate lncRNAs predicted by FVTLDA with ANN, having been verified by recent literature. Comparing with the representative prediction model of KATZLDA, comparison results illustrate that FVTLDA with MLR and FVTLDA with ANN can achieve the average case study contrast scores of 0.8429 and 0.8515 respectively, which are both notably higher than the average case study contrast score of 0.6375 achieved by KATZLDA.Conclusion: The simulation results show that FVTLDA has good prediction performance, which is a good supplement to future bioinformatics research.


2019 ◽  
Author(s):  
Yubin Xiao ◽  
Zheng Xiao ◽  
Xiang Feng ◽  
Zhiping Chen ◽  
Linai Kuang ◽  
...  

Abstract BackgroundAccumulating evidence has demonstrated that lncRNAs are closely associated with human diseases, and it is helpful for the diagnosis and treatment of diseases to get the relationships between lncRNAs and diseases. Due to the high costs and time complexity of traditional bio-experiments, in recent years, more and more computational methods have been proposed by researchers to infer potential lncRNA-disease associations. However, there exist all kinds of limitations in these prediction methods as well. ResultsIn this manuscript, a novel computational model named FVTLDA is proposed to infer potential lncRNA-disease associations. In FVTLDA, its major novelty lies in the integration of direct and indirect features related to lncRNA-disease associations such as the feature vectors of lncRNA-disease pairs and their corresponding association probability fractions, which guarantees that FVTLDA can be utilized to predict diseases without known related-lncRNAs and lncRNAs without known related-diseases. Moreover, FVTLDA neither relies solely on known lncRNA-disease nor requires any negative samples, which guarantee that it can infer potential lncRNA-disease associations more equitably and effectively than traditional methods. Additionally, to avoid the limitations of single model prediction techniques, we combine FVTLDA with the Multiple Linear Regression (MLR) and the Artificial Neural Network (ANN) for data analysis respectively. Simulation experiment results show that FVTLDA with MLR can achieve reliable AUCs of 0.8909, 0.8936 and 0.8970 in 5-Fold Cross Validation, 10-Fold Cross Validation and Leave-One-Out Cross Validation, separately, while FVTLDA with ANN can achieve reliable AUCs of 0.8766, 0.8830 and 0.8807 in 5-fold CV, 10-fold CV, and LOOCV respectively. Furthermore, in case studies of gastric cancer, leukemia and lung cancer, experiment results show that there are 8, 8 and 8 out of top 10 candidate lncRNAs predicted by FVTLDA with MLR, and 8, 7 and 8 out of top 10 candidate lncRNAs predicted by FVTLDA with ANN, having been verified by recent literature. Moreover, comparing with the representative prediction model of KATZLDA, results illustrate that FVTLDA with MLR and FVTLDA with ANN can achieve the average case study contrast scores of 0.8429 and 0.8515 respectively, which are both higher than the average case study contrast score of 0.6375 achieved by KATZLDA.


2021 ◽  
Vol 12 ◽  
Author(s):  
Cunmei Ji ◽  
Yutian Wang ◽  
Jiancheng Ni ◽  
Chunhou Zheng ◽  
Yansen Su

In recent years, more and more evidence has shown that microRNAs (miRNAs) play an important role in the regulation of post-transcriptional gene expression, and are closely related to human diseases. Many studies have also revealed that miRNAs can be served as promising biomarkers for the potential diagnosis and treatment of human diseases. The interactions between miRNA and human disease have rarely been demonstrated, and the underlying mechanism of miRNA is not clear. Therefore, computational approaches has attracted the attention of researchers, which can not only save time and money, but also improve the efficiency and accuracy of biological experiments. In this work, we proposed a Heterogeneous Graph Attention Networks (GAT) based method for miRNA-disease associations prediction, named HGATMDA. We constructed a heterogeneous graph for miRNAs and diseases, introduced weighted DeepWalk and GAT methods to extract features of miRNAs and diseases from the graph. Moreover, a fully-connected neural networks is used to predict correlation scores between miRNA-disease pairs. Experimental results under five-fold cross validation (five-fold CV) showed that HGATMDA achieved better prediction performance than other state-of-the-art methods. In addition, we performed three case studies on breast neoplasms, lung neoplasms and kidney neoplasms. The results showed that for the three diseases mentioned above, 50 out of top 50 candidates were confirmed by the validation datasets. Therefore, HGATMDA is suitable as an effective tool to identity potential diseases-related miRNAs.


2020 ◽  
Author(s):  
Yubin Xiao ◽  
Zheng Xiao ◽  
Xiang Feng ◽  
Zhiping Chen ◽  
Linai Kuang ◽  
...  

Abstract Background: Accumulating evidence has demonstrated that long non-coding RNAs (lncRNAs) are closely associated with human diseases, and it is helpful for the diagnosis and treatment of diseases to get the relationships between lncRNAs and diseases. Due to the high costs and time complexity of traditional bio-experiments, in recent years, more and more computational methods have been proposed by researchers to infer potential lncRNA-disease associations. However, there exist all kinds of limitations in these state-of-the-art prediction methods as well.Results: In this manuscript, a novel computational model named FVTLDA is proposed to infer potential lncRNA-disease associations. In FVTLDA, its major novelty lies in the integration of direct and indirect features related to lncRNA-disease associations such as the feature vectors of lncRNA-disease pairs and their corresponding association probability fractions, which guarantees that FVTLDA can be utilized to predict diseases without known related-lncRNAs and lncRNAs without known related-diseases. Moreover, FVTLDA neither relies solely on known lncRNA-disease nor requires any negative samples, which guarantee that it can infer potential lncRNA-disease associations more equitably and effectively than traditional state-of-the-art prediction methods. Additionally, to avoid the limitations of single model prediction techniques, we combine FVTLDA with the Multiple Linear Regression (MLR) and the Artificial Neural Network (ANN) for data analysis respectively. Simulation experiment results show that FVTLDA with MLR can achieve reliable AUCs of 0.8909, 0.8936 and 0.8970 in 5-Fold Cross Validation (5-fold CV), 10-Fold Cross Validation (10-fold CV) and Leave-One-Out Cross Validation (LOOCV), separately, while FVTLDA with ANN can achieve reliable AUCs of 0.8766, 0.8830 and 0.8807 in 5-fold CV, 10-fold CV, and LOOCV respectively. Furthermore, in case studies of gastric cancer, leukemia and lung cancer, experiment results show that there are 8, 8 and 8 out of top 10 candidate lncRNAs predicted by FVTLDA with MLR, and 8, 7 and 8 out of top 10 candidate lncRNAs predicted by FVTLDA with ANN, having been verified by recent literature. Moreover, comparing with the representative prediction model of KATZLDA, comparison results illustrate that FVTLDA with MLR and FVTLDA with ANN can achieve the average case study contrast scores of 0.8429 and 0.8515 respectively, which are both notably higher than the average case study contrast score of 0.6375 achieved by KATZLDA.Conclusion: The simulation results show that FVTLDA has good prediction performance, which is a good supplement to future bioinformatics research.


2018 ◽  
Vol 19 (10) ◽  
pp. 3052 ◽  
Author(s):  
Bi Zhao ◽  
Bin Xue

Using computational techniques to identify intrinsically disordered residues is practical and effective in biological studies. Therefore, designing novel high-accuracy strategies is always preferable when existing strategies have a lot of room for improvement. Among many possibilities, a meta-strategy that integrates the results of multiple individual predictors has been broadly used to improve the overall performance of predictors. Nonetheless, a simple and direct integration of individual predictors may not effectively improve the performance. In this project, dual-threshold two-step significance voting and neural networks were used to integrate the predictive results of four individual predictors, including: DisEMBL, IUPred, VSL2, and ESpritz. The new meta-strategy has improved the prediction performance of intrinsically disordered residues significantly, compared to all four individual predictors and another four recently-designed predictors. The improvement was validated using five-fold cross-validation and in independent test datasets.


Land ◽  
2020 ◽  
Vol 9 (6) ◽  
pp. 174
Author(s):  
Desheng Wang ◽  
A-Xing Zhu

Digital soil mapping (DSM) is currently the primary framework for predicting the spatial variation of soil information (soil type or soil properties). Random forests and similarity-based methods have been used widely in DSM. However, the accuracy of the similarity-based approach is limited, and the performance of random forests is affected by the quality of the feature set. The objective of this study was to present a method for soil mapping by integrating the similarity-based approach and the random forests method. The Heshan area (Heilongjiang province, China) was selected as the case study for mapping soil subgroups. The results of the regular validation samples showed that the overall accuracy of the integrated method (71.79%) is higher than that of a similarity-based approach (58.97%) and random forests (66.67%). The results of the 5-fold cross-validation showed that the overall accuracy of the integrated method, similarity-based approach, and random forests range from 55% to 72.73%, 43.48% to 69.57%, and 54.17% to 70.83%, with an average accuracy of 66.61%, 57.39%, and 59.62%, respectively. These results suggest that the proposed method can produce a high-quality covariate set and achieve a better performance than either the random forests or similarity-based approach alone.


Author(s):  
Mahendra Awale ◽  
Jean-Louis Reymond

<div>Here we report PPB2 as a target prediction tool assigning targets to a query molecule based on ChEMBL data. PPB2 computes ligand similarities using molecular fingerprints encoding composition (MQN), molecular shape and pharmacophores (Xfp), and substructures (ECfp4), and features an unprecedented combination of nearest neighbor (NN) searches and Naïve Bayes (NB) machine learning, together with simple NN searches, NB and Deep Neural Network (DNN) machine learning models as further options. Although NN(ECfp4) gives the best results in terms of recall in a 10-fold cross-validation study, combining NN searches with NB machine learning provides superior precision statistics, as well as better results in a case study predicting off-targets of a recently reported TRPV6 calcium channel inhibitor, illustrating the value of this combined approach. PPB2 is available to assess possible off-targets of small molecule drug-like compounds by public access at ppb2.gdb.tools.</div>


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Rong Zhu ◽  
Yong Wang ◽  
Jin-Xing Liu ◽  
Ling-Yun Dai

Abstract Background Identifying lncRNA-disease associations not only helps to better comprehend the underlying mechanisms of various human diseases at the lncRNA level but also speeds up the identification of potential biomarkers for disease diagnoses, treatments, prognoses, and drug response predictions. However, as the amount of archived biological data continues to grow, it has become increasingly difficult to detect potential human lncRNA-disease associations from these enormous biological datasets using traditional biological experimental methods. Consequently, developing new and effective computational methods to predict potential human lncRNA diseases is essential. Results Using a combination of incremental principal component analysis (IPCA) and random forest (RF) algorithms and by integrating multiple similarity matrices, we propose a new algorithm (IPCARF) based on integrated machine learning technology for predicting lncRNA-disease associations. First, we used two different models to compute a semantic similarity matrix of diseases from a directed acyclic graph of diseases. Second, a characteristic vector for each lncRNA-disease pair is obtained by integrating disease similarity, lncRNA similarity, and Gaussian nuclear similarity. Then, the best feature subspace is obtained by applying IPCA to decrease the dimension of the original feature set. Finally, we train an RF model to predict potential lncRNA-disease associations. The experimental results show that the IPCARF algorithm effectively improves the AUC metric when predicting potential lncRNA-disease associations. Before the parameter optimization procedure, the AUC value predicted by the IPCARF algorithm under 10-fold cross-validation reached 0.8529; after selecting the optimal parameters using the grid search algorithm, the predicted AUC of the IPCARF algorithm reached 0.8611. Conclusions We compared IPCARF with the existing LRLSLDA, LRLSLDA-LNCSIM, TPGLDA, NPCMF, and ncPred prediction methods, which have shown excellent performance in predicting lncRNA-disease associations. The compared results of 10-fold cross-validation procedures show that the predictions of the IPCARF method are better than those of the other compared methods.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Feng Zhou ◽  
Meng-Meng Yin ◽  
Cui-Na Jiao ◽  
Zhen Cui ◽  
Jing-Xiu Zhao ◽  
...  

Abstract Background With the rapid development of various advanced biotechnologies, researchers in related fields have realized that microRNAs (miRNAs) play critical roles in many serious human diseases. However, experimental identification of new miRNA–disease associations (MDAs) is expensive and time-consuming. Practitioners have shown growing interest in methods for predicting potential MDAs. In recent years, an increasing number of computational methods for predicting novel MDAs have been developed, making a huge contribution to the research of human diseases and saving considerable time. In this paper, we proposed an efficient computational method, named bipartite graph-based collaborative matrix factorization (BGCMF), which is highly advantageous for predicting novel MDAs. Results By combining two improved recommendation methods, a new model for predicting MDAs is generated. Based on the idea that some new miRNAs and diseases do not have any associations, we adopt the bipartite graph based on the collaborative matrix factorization method to complete the prediction. The BGCMF achieves a desirable result, with AUC of up to 0.9514 ± (0.0007) in the five-fold cross-validation experiments. Conclusions Five-fold cross-validation is used to evaluate the capabilities of our method. Simulation experiments are implemented to predict new MDAs. More importantly, the AUC value of our method is higher than those of some state-of-the-art methods. Finally, many associations between new miRNAs and new diseases are successfully predicted by performing simulation experiments, indicating that BGCMF is a useful method to predict more potential miRNAs with roles in various diseases.


Sign in / Sign up

Export Citation Format

Share Document