Comparison and integration of computational methods for deleterious synonymous mutation prediction

2019 ◽  
Vol 21 (3) ◽  
pp. 970-981 ◽  
Author(s):  
Na Cheng ◽  
Menglu Li ◽  
Le Zhao ◽  
Bo Zhang ◽  
Yuhua Yang ◽  
...  

Abstract Synonymous mutations do not change the encoded amino acids but may alter the structure or function of an mRNA in ways that impact gene function. Advances in next-generation sequencing technologies have enabled the detection of numerous synonymous mutations in the human genome. Several computational models have been proposed to predict deleterious synonymous mutations, which have greatly facilitated the development of this important field. Consequently, there is an urgent need to assess the state-of-the-art computational methods for deleterious synonymous mutation prediction to further advance the existing methodologies and to improve performance. In this regard, we systematically compared a total of 10 computational methods (including methods specific to deleterious synonymous mutations and general methods for single-nucleotide mutations) in terms of the algorithms used, calculated features, performance evaluation and software usability. In addition, we constructed two carefully curated independent test datasets and accordingly assessed the robustness and scalability of these different computational methods for the identification of deleterious synonymous mutations. In an effort to improve predictive performance, we established an ensemble model, named Prediction of Deleterious Synonymous Mutation (PrDSM), which averages the ratings generated by the three most accurate predictors. Our benchmark tests demonstrated that the ensemble model PrDSM outperformed the reviewed tools for the prediction of deleterious synonymous mutations. Using the ensemble model, we developed an accessible online predictor, PrDSM, available at http://bioinfo.ahu.edu.cn:8080/PrDSM/. We hope that this comprehensive survey and the proposed strategy for building more accurate models can serve as a useful guide for inspiring future developments of computational methods for deleterious synonymous mutation prediction.
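
The averaging step described in this abstract can be illustrated with a minimal sketch. It is not the PrDSM implementation: the predictor names, the example scores, the assumption that all scores share a 0-1 scale, and the 0.5 decision threshold are all hypothetical and chosen only for illustration.

```python
from statistics import fmean


def average_scores(per_predictor_scores):
    """Return the unweighted mean of one variant's predictor scores."""
    values = list(per_predictor_scores.values())
    if not values:
        raise ValueError("at least one predictor score is required")
    return fmean(values)


def classify(per_predictor_scores, threshold=0.5):
    """Label a variant deleterious when the averaged score reaches the threshold.

    The threshold is an assumption; the abstract does not state one.
    """
    return average_scores(per_predictor_scores) >= threshold


# Hypothetical scores for one synonymous variant (not real tool output).
example = {"predictor_a": 0.9, "predictor_b": 0.6, "predictor_c": 0.3}
ensemble = average_scores(example)
label = classify(example)
```

An unweighted mean gives each predictor the same influence, which is only meaningful if the individual scores are first brought onto a comparable scale.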

Author(s):  
Chun-Chun Wang ◽  
Yan Zhao ◽  
Xing Chen

Abstract Effective drugs are urgently needed to overcome complex human diseases. However, the research and development of novel drugs takes a long time and is costly. Traditional drug discovery follows the rule of one drug, one target, while some studies have demonstrated that drugs generally act by affecting related pathways rather than a single target. Thus, a new strategy of drug discovery, namely pathway-based drug discovery, has been proposed. Identifying associations between drugs and pathways plays a key role in the development of pathway-based drug discovery. Revealing drug-pathway associations by experimental methods is time-consuming and costly. Therefore, some computational models have been established to predict potential drug-pathway associations. In this review, we first introduce the background of drugs and the concept of drug-pathway associations. Then, some publicly accessible databases and web servers about drug-pathway associations are listed. Next, we summarize some state-of-the-art computational methods from recent years for inferring drug-pathway associations and divide these methods into three classes, namely Bayesian sparse factor-based, matrix decomposition-based and other machine learning methods. In addition, we introduce several evaluation strategies to estimate the predictive performance of various computational models. Finally, we discuss the advantages and limitations of existing computational methods and provide some suggestions about future directions for data collection and the development of computational models.


2020 ◽  
Vol 27 ◽  
Author(s):  
Gabriela Bitencourt-Ferreira ◽  
Camila Rizzotto ◽  
Walter Filgueira de Azevedo Junior

Background: Analysis of atomic coordinates of protein-ligand complexes can provide three-dimensional data to generate computational models to evaluate binding affinity and thermodynamic state functions. Application of machine learning techniques can create models to assess protein-ligand potential energy and binding affinity. These methods show superior predictive performance when compared with classical scoring functions available in docking programs. Objective: Our purpose here is to review the development and application of the program SAnDReS. We describe the creation of machine learning models to assess the binding affinity of protein-ligand complexes. Method: SAnDReS implements machine learning methods available in the scikit-learn library. This program is available for download at https://github.com/azevedolab/sandres. SAnDReS uses crystallographic structures, binding, and thermodynamic data to create targeted scoring functions. Results: Recent applications of the program SAnDReS to drug targets such as Coagulation factor Xa, cyclin-dependent kinases, and HIV-1 protease were able to create targeted scoring functions to predict inhibition of these proteins. These targeted models outperform classical scoring functions. Conclusion: Here, we reviewed the development of machine learning scoring functions to predict binding affinity through the application of the program SAnDReS. Our studies show the superior predictive performance of the SAnDReS-developed models when compared with classical scoring functions available in the programs such as AutoDock4, Molegro Virtual Docker, and AutoDock Vina.


2019 ◽  
Vol 20 (5) ◽  
pp. 565-578 ◽  
Author(s):  
Lidong Wang ◽  
Ruijun Zhang

Ubiquitination is an important post-translational modification (PTM) process for the regulation of protein functions, which is associated with cancer, cardiovascular and other diseases. Recent initiatives have focused on the detection of potential ubiquitination sites with the aid of physicochemical test approaches in conjunction with the application of computational methods. The identification of ubiquitination sites using laboratory tests is especially susceptible to the temporality and reversibility of the ubiquitination processes, and is also costly and time-consuming. It has been demonstrated that computational methods are effective in extracting potential rules or inferences from biological sequence collections. Up to the present, the computational strategy has been one of the critical research approaches that have been applied for the identification of ubiquitination sites, and currently, there are numerous state-of-the-art computational methods that have been developed from machine learning and statistical analysis to undertake such work. In the present study, the construction of benchmark datasets is summarized, together with feature representation methods, feature selection approaches and the classifiers involved in several previous publications. In an attempt to explore pertinent development trends for the identification of ubiquitination sites, an independent test dataset was constructed and the predicting results obtained from five prediction tools are reported here, together with some related discussions.


Author(s):  
Erandi Lakshika ◽  
Michael Barlow

Computational aesthetics is an area of research that attempts to develop computational methods that can perform human-like aesthetic judgements. Aesthetic judgements are often subjective, and as such, the development of computational models of aesthetics is highly challenging. This chapter summarizes the advancements in the area of computational aesthetics and how computational intelligence techniques are applied in art and aesthetics ranging from simple classification problems to more advanced problems such as automatic generation of art artefacts, stories, and simulations. The chapter concludes by summarizing major challenges that need to be addressed, and future directions that need to be undertaken in order to make significant advancements in the area of computational aesthetics and its applications.


2020 ◽  
Vol 21 (4) ◽  
pp. 1508
Author(s):  
Yi Zhang ◽  
Min Chen ◽  
Ang Li ◽  
Xiaohui Cheng ◽  
Hong Jin ◽  
...  

Long non-coding RNAs (long ncRNAs, lncRNAs) of all kinds have been implicated in a range of cell developmental processes and diseases, although they are not translated into proteins. Inferring disease-associated lncRNAs by computational methods can help to understand the pathogenesis of diseases, but current computational methods still have not achieved remarkable predictive performance, owing to problems such as the inaccurate construction of similarity networks and inadequate numbers of known lncRNA–disease associations. In this research, we proposed a lncRNA–disease association inference method based on integrated space projection scores (LDAI-ISPS), composed of the following key steps: changing the Boolean network of known lncRNA–disease associations into weighted networks by combining all the global information (e.g., disease semantic similarities, lncRNA functional similarities, and known lncRNA–disease associations); and obtaining the space projection scores via vector projections of the weighted networks to form the final prediction scores without biases. The leave-one-out cross validation (LOOCV) results showed that, compared with other methods, LDAI-ISPS had a higher accuracy, with an area-under-the-curve (AUC) value of 0.9154 for inferring diseases, an AUC value of 0.8865 for inferring new lncRNAs (whose associations with diseases are unknown), and an AUC value of 0.7518 for inferring isolated diseases (whose associations with lncRNAs are unknown). A case study also confirmed the predictive performance of LDAI-ISPS as an aid to traditional biological experiments in inferring potential lncRNA–disease associations and isolated diseases.
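
The AUC values quoted in this abstract summarize how well prediction scores rank known associations above unknown ones. A minimal sketch of the metric follows, using its pairwise (Mann-Whitney) definition. It does not reproduce LDAI-ISPS or its LOOCV protocol; the function name and the example scores and labels are hypothetical.

```python
def pairwise_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive item receives the higher score; ties count as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical prediction scores and true labels (1 = known association).
example_scores = [0.9, 0.8, 0.4, 0.3, 0.6]
example_labels = [1, 0, 1, 0, 1]
example_auc = pairwise_auc(example_scores, example_labels)
```

In a leave-one-out setting, each known association would be hidden in turn and scored against the unlabelled candidates before a ranking metric of this kind is computed.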


2017 ◽  
Vol 2017 ◽  
pp. 1-5
Author(s):  
Tongda Zhang ◽  
Yiran Wu ◽  
Zhangzhang Lan ◽  
Quan Shi ◽  
Ying Yang ◽  
...  

Background. A synonymous mutation is a single nucleotide change that does not cause an amino acid change but can affect the rate and efficiency of translation. Recent advances in our knowledge have revealed a substantial contribution of synonymous mutations to human disease risk and other complex traits. Nevertheless, few prediction methods for synonymous mutations exist. Methods. Nonsynonymous and synonymous coding SNPs show similar likelihood and effect size of human disease association. Here we defined synonymous and missense variation as single nucleotide substitution variation. We then evaluated the intolerance of genic transcripts to single nucleotide substitution variation based on 123136 gnomAD individuals. After regressing all variations on common variations, we defined the residuals of the regression model as the intolerance scores of each genomic region. Results. We constructed a total of 24799 nonoverlapping region-based intolerance scores reflecting intolerance to single nucleotide substitution variation (Syntool). The results show that the Syntool score can discriminate synonymous disease-causing mutations in the Human Gene Mutation Database (HGMD Professional) and the ClinVar database much better than other scores. Taken together, this study provides a novel prediction system for synonymous mutations, called Syntool, which could be helpful in identifying candidate synonymous disease-causing mutations.
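
The residual-based scoring described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the Syntool code: it assumes a simple one-variable ordinary least squares fit of each region's count of all variants on its count of common variants, uses the raw residual as the score, and the region counts are invented.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residual_scores(common_counts, all_counts):
    """Return observed minus fitted counts for each region.

    Regression direction (all variants on common variants) follows the
    abstract's wording; any further scaling of residuals is not modelled.
    """
    slope, intercept = ols_fit(common_counts, all_counts)
    return [a - (intercept + slope * c)
            for c, a in zip(common_counts, all_counts)]


# Invented per-region counts, for illustration only.
common = [1, 2, 3]
total = [2, 4, 7]
scores = residual_scores(common, total)
```

With an intercept in the model the residuals sum to zero, so each score expresses how far a region departs from the trend across all regions rather than an absolute level of variation.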


Author(s):  
Xi Tang ◽  
Tao Zhang ◽  
Na Cheng ◽  
Huadong Wang ◽  
Chun-Hou Zheng ◽  
...  

F1000Research ◽  
2014 ◽  
Vol 3 ◽  
pp. 289 ◽  
Author(s):  
Konrad Hinsen

Computational models and methods take an ever more important place in modern scientific research. At the same time, they are becoming ever more complex, to the point that many such models and methods can no longer be adequately described in the narrative of a traditional journal article. Often they exist only as part of scientific software tools, which causes two important problems: (1) software tools are much more complex than the models and methods they embed, making the latter unnecessarily difficult to understand, (2) software tools depend on minute details of the computing environment they were written for, making them difficult to deploy and often completely unusable after a few years. This article addresses the second problem, based on the experience gained from the development and use of a platform specifically designed to facilitate the integration of computational methods into the scientific record.


2021 ◽  
Author(s):  
Camilo Matus-Olivares ◽  
Jaime Carrasco ◽  
José Luis Yela ◽  
Paula Meli ◽  
Andres Weintraub ◽  
...  

Abstract Aim Applying wide and effective sampling of animal communities is rarely possible due to the associated costs and the use of techniques that are not always efficient. Thus, many areas have a hidden faunistic diversity that we denote Animal Dark Diversity (ADD), defined as the diversity that is present but not yet detected plus the diversity defined by Pärtel et al. (2011) that is not (yet) present despite the area’s favourable habitat conditions. We evaluated different species distribution model types (SDM techniques) on the basis of three requirements for ADD estimate reliability: 1) estimated spatial patterns of ADD do not differ significantly from other SDM techniques; 2) good predictive performances; and 3) low overfitting. Location Iberian Peninsula. Taxon Chiroptera and Noctuoidea (Lepidoptera). Methods We used distribution data for 25 species of bats and 352 species of moths. We evaluated eleven SDM techniques using the biomod2 package implemented in the R software environment. We fitted the various SDM techniques to the data for each species and compared the resulting ADD estimates for the two animal groups under three threshold types. Results The results demonstrated that estimated ADD spatial patterns vary significantly between SDM techniques and depend on the threshold type. They also showed that SDM techniques with overfitting tend to generate smaller ADD sizes, thus reducing the possible species presence estimates. Among the SDMs studied, the ensemble models delivered ADD geographic patterns more like the other techniques while also presenting a high predictive performance for both faunal groups. However, the Ensemble Model Committee Average (ECA) performed much better on the sensitivity metric than all other techniques under any of the thresholds tested. In addition, ECA stood out clearly from the other ensemble model techniques in displaying low-medium overfitting.
Main conclusions SDM techniques should not differ among each other in their ADD estimations, should have good predictive performances and should exhibit low overfitting. Furthermore, to reduce estimate uncertainty it is suggested that the threshold type be one that transforms high values of presence probabilities into binary information and that the SDM technique have a sensitivity bias, as otherwise the estimates will perform better for species absence in cases where it is not in fact known whether a species is truly absent.
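
The idea behind a committee-average ensemble can be sketched in a few lines: each model's predicted presence probability is converted to a binary vote with that model's threshold, and the votes are averaged. The sketch is in Python rather than R and does not use or reproduce the biomod2 API; the function name, probabilities and thresholds are hypothetical.

```python
def committee_average(probabilities, thresholds):
    """Average the binary presence votes of several models at one site.

    Each model votes 1 (presence) when its probability reaches its own
    threshold and 0 otherwise; the result is the fraction of models
    voting for presence.
    """
    if not probabilities or len(probabilities) != len(thresholds):
        raise ValueError("need one threshold per model probability")
    votes = [1 if p >= t else 0 for p, t in zip(probabilities, thresholds)]
    return sum(votes) / len(votes)


# Hypothetical outputs of four models for a single site and species.
site_probabilities = [0.8, 0.4, 0.55, 0.2]
model_thresholds = [0.5, 0.3, 0.6, 0.5]
site_score = committee_average(site_probabilities, model_thresholds)
```

A score near 0 or 1 indicates agreement among the models, while a score near 0.5 indicates that the models disagree about presence at that site.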

