Comparison and integration of computational methods for deleterious synonymous mutation prediction

Na Cheng; Menglu Li; Le Zhao; Bo Zhang; Yuhua Yang; Chun-Hou Zheng; Junfeng Xia

doi:10.1093/bib/bbz047

Briefings in Bioinformatics ◽

10.1093/bib/bbz047 ◽

2019 ◽

Vol 21 (3) ◽

pp. 970-981 ◽

Cited By ~ 15

Author(s):

Na Cheng ◽

Menglu Li ◽

Le Zhao ◽

Bo Zhang ◽

Yuhua Yang ◽

...

Keyword(s):

Computational Methods ◽

Computational Models ◽

Predictive Performance ◽

Specific Method ◽

Ensemble Model ◽

Synonymous Mutation ◽

Synonymous Mutations ◽

Sequencing Technologies ◽

Nucleotide Mutation ◽

Mutation Prediction

Synonymous mutations do not change the encoded amino acids but may alter the structure or function of an mRNA in ways that impact gene function. Advances in next-generation sequencing technologies have detected numerous synonymous mutations in the human genome. Several computational models have been proposed to predict deleterious synonymous mutations, which have greatly facilitated the development of this important field. Consequently, there is an urgent need to assess the state-of-the-art computational methods for deleterious synonymous mutation prediction to further advance the existing methodologies and to improve performance. In this regard, we systematically compared a total of 10 computational methods (including methods specific to deleterious synonymous mutations and general methods for single nucleotide mutations) in terms of the algorithms used, calculated features, performance evaluation and software usability. In addition, we constructed two carefully curated independent test datasets and accordingly assessed the robustness and scalability of these different computational methods for the identification of deleterious synonymous mutations. In an effort to improve predictive performance, we established an ensemble model, named Prediction of Deleterious Synonymous Mutation (PrDSM), which averages the ratings generated by the three most accurate predictors. Our benchmark tests demonstrated that the ensemble model PrDSM outperformed the reviewed tools for the prediction of deleterious synonymous mutations. Using the ensemble model, we developed an accessible online predictor, PrDSM, available at http://bioinfo.ahu.edu.cn:8080/PrDSM/. We hope that this comprehensive survey and the proposed strategy for building more accurate models can serve as a useful guide for inspiring future developments of computational methods for deleterious synonymous mutation prediction.
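The averaging step behind an ensemble of this kind can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the variant identifiers, the 0.5 cut-off and the assumption that every predictor's rating is already scaled to [0, 1] are all hypothetical.

```python
from statistics import mean


def ensemble_score(scores):
    """Average the per-predictor scores reported for one variant."""
    if not scores:
        raise ValueError("need at least one predictor score")
    return mean(scores)


def classify(scores, threshold=0.5):
    """Call a variant deleterious if its averaged score reaches the threshold.

    The 0.5 default is an assumption made for this sketch only.
    """
    return ensemble_score(scores) >= threshold


# Hypothetical ratings from three predictors, assumed to lie in [0, 1].
variant_scores = {
    "var_a": [0.9, 0.8, 0.7],
    "var_b": [0.2, 0.4, 0.3],
}
ensemble = {name: ensemble_score(s) for name, s in variant_scores.items()}
```

Plain averaging only makes sense when the component scores share a scale, so in practice each predictor's output would need to be normalized before this step.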

Drug-pathway association prediction: from experimental results to computational models

Briefings in Bioinformatics ◽

10.1093/bib/bbaa061 ◽

2020 ◽

Author(s):

Chun-Chun Wang ◽

Yan Zhao ◽

Xing Chen

Keyword(s):

Drug Discovery ◽

Computational Methods ◽

Computational Models ◽

Matrix Decomposition ◽

Predictive Performance ◽

Single Target ◽

New Strategy ◽

Traditional Drug ◽

Long Time ◽

Effective Drugs

Effective drugs are urgently needed to overcome complex human diseases. However, the research and development of a novel drug takes a long time and costs a great deal of money. Traditional drug discovery follows the rule of one drug, one target, while some studies have demonstrated that drugs generally act by affecting related pathways rather than by targeting a single target. Thus, a new strategy of drug discovery, namely pathway-based drug discovery, has been proposed. Identifying associations between drugs and pathways plays a key role in the development of pathway-based drug discovery. Revealing drug-pathway associations by experimental methods is time-consuming and costly. Therefore, some computational models were established to predict potential drug-pathway associations. In this review, we first introduced the background of drugs and the concept of drug-pathway associations. Then, some publicly accessible databases and web servers about drug-pathway associations were listed. Next, we summarized some state-of-the-art computational methods from recent years for inferring drug-pathway associations and divided these methods into three classes, namely Bayesian sparse factor-based, matrix decomposition-based and other machine learning methods. In addition, we introduced several evaluation strategies to estimate the predictive performance of various computational models. In the end, we discussed the advantages and limitations of existing computational methods and provided some suggestions about future directions for data collection and the development of calculation models.

Machine Learning-Based Scoring Functions. Development and Applications with SAnDReS.

Current Medicinal Chemistry ◽

10.2174/0929867327666200515101820 ◽

2020 ◽

Vol 27 ◽

Author(s):

Gabriela Bitencourt-Ferreira ◽

Camila Rizzotto ◽

Walter Filgueira de Azevedo Junior

Keyword(s):

Machine Learning ◽

Binding Affinity ◽

Drug Targets ◽

Computational Models ◽

Factor Xa ◽

Coagulation Factor ◽

Predictive Performance ◽

Machine Learning Techniques ◽

Scoring Functions ◽

Molegro Virtual Docker

Background: Analysis of atomic coordinates of protein-ligand complexes can provide three-dimensional data to generate computational models to evaluate binding affinity and thermodynamic state functions. Application of machine learning techniques can create models to assess protein-ligand potential energy and binding affinity. These methods show superior predictive performance when compared with classical scoring functions available in docking programs. Objective: Our purpose here is to review the development and application of the program SAnDReS. We describe the creation of machine learning models to assess the binding affinity of protein-ligand complexes. Method: SAnDReS implements machine learning methods available in the scikit-learn library. This program is available for download at https://github.com/azevedolab/sandres. SAnDReS uses crystallographic structures, binding, and thermodynamic data to create targeted scoring functions. Results: Recent applications of the program SAnDReS to drug targets such as Coagulation factor Xa, cyclin-dependent kinases, and HIV-1 protease were able to create targeted scoring functions to predict inhibition of these proteins. These targeted models outperform classical scoring functions. Conclusion: Here, we reviewed the development of machine learning scoring functions to predict binding affinity through the application of the program SAnDReS. Our studies show the superior predictive performance of the SAnDReS-developed models when compared with classical scoring functions available in programs such as AutoDock4, Molegro Virtual Docker, and AutoDock Vina.

Towards Computational Models of Identifying Protein Ubiquitination Sites

Current Drug Targets ◽

10.2174/1389450119666180924150202 ◽

2019 ◽

Vol 20 (5) ◽

pp. 565-578 ◽

Cited By ~ 1

Author(s):

Lidong Wang ◽

Ruijun Zhang

Keyword(s):

Computational Methods ◽

Computational Models ◽

Feature Representation ◽

Biological Sequence ◽

Post Translational Modification ◽

Test Dataset ◽

Protein Ubiquitination ◽

Protein Functions ◽

Independent Test Dataset ◽

Benchmark Datasets

Ubiquitination is an important post-translational modification (PTM) process for the regulation of protein functions, which is associated with cancer, cardiovascular and other diseases. Recent initiatives have focused on the detection of potential ubiquitination sites with the aid of physicochemical test approaches in conjunction with the application of computational methods. The identification of ubiquitination sites using laboratory tests is especially susceptible to the temporality and reversibility of the ubiquitination processes, and is also costly and time-consuming. It has been demonstrated that computational methods are effective in extracting potential rules or inferences from biological sequence collections. To date, the computational strategy has been one of the critical research approaches applied to the identification of ubiquitination sites, and numerous state-of-the-art computational methods based on machine learning and statistical analysis have been developed to undertake such work. In the present study, the construction of benchmark datasets is summarized, together with feature representation methods, feature selection approaches and the classifiers involved in several previous publications. In an attempt to explore pertinent development trends for the identification of ubiquitination sites, an independent test dataset was constructed and the prediction results obtained from five prediction tools are reported here, together with some related discussions.

Quantitative ROESY analysis of computational models: structural studies of citalopram and β-cyclodextrin complexes by 1H-NMR and computational methods

Magnetic Resonance in Chemistry ◽

10.1002/mrc.4250 ◽

2015 ◽

Vol 53 (7) ◽

pp. 526-535 ◽

Cited By ~ 10

Author(s):

Syed Mashhood Ali ◽

Shazia Shamim

Keyword(s):

Computational Methods ◽

Computational Models ◽

Structural Studies ◽

Cyclodextrin Complexes ◽

H Nmr ◽

Quantitative Roesy

Computational Intelligence Approaches to Computational Aesthetics

Advanced Methodologies and Technologies in Artificial Intelligence, Computer Simulation, and Human-Computer Interaction - Advances in Computer and Electrical Engineering ◽

10.4018/978-1-5225-7368-5.ch007 ◽

2019 ◽

pp. 81-92

Author(s):

Erandi Lakshika ◽

Michael Barlow

Keyword(s):

Computational Methods ◽

Computational Intelligence ◽

Computational Models ◽

Automatic Generation ◽

Classification Problems ◽

Future Directions ◽

Simple Classification ◽

Computational Aesthetics

Computational aesthetics is an area of research that attempts to develop computational methods that can perform human-like aesthetic judgements. Aesthetic judgements are often subjective, and as such, the development of computational models of aesthetics is highly challenging. This chapter summarizes the advancements in the area of computational aesthetics and how computational intelligence techniques are applied in art and aesthetics ranging from simple classification problems to more advanced problems such as automatic generation of art artefacts, stories, and simulations. The chapter concludes by summarizing major challenges that need to be addressed, and future directions that need to be undertaken in order to make significant advancements in the area of computational aesthetics and its applications.

LDAI-ISPS: LncRNA–Disease Associations Inference Based on Integrated Space Projection Scores

International Journal of Molecular Sciences ◽

10.3390/ijms21041508 ◽

2020 ◽

Vol 21 (4) ◽

pp. 1508

Author(s):

Yi Zhang ◽

Min Chen ◽

Ang Li ◽

Xiaohui Cheng ◽

Hong Jin ◽

...

Keyword(s):

Computational Methods ◽

Area Under The Curve ◽

Predictive Performance ◽

Weighted Networks ◽

Disease Associations ◽

Space Projection ◽

Auc Value ◽

Leave One Out ◽

Key Steps

Long non-coding RNAs (long ncRNAs, lncRNAs) of all kinds have been implicated in a range of cell developmental processes and diseases, although they are not translated into proteins. Inferring disease-associated lncRNAs by computational methods can help in understanding the pathogenesis of diseases, but current computational methods still have not achieved remarkable predictive performance, owing to problems such as the inaccurate construction of similarity networks and inadequate numbers of known lncRNA–disease associations. In this research, we proposed lncRNA–disease associations inference based on integrated space projection scores (LDAI-ISPS), composed of the following key steps: changing the Boolean network of known lncRNA–disease associations into weighted networks by combining all the global information (e.g., disease semantic similarities, lncRNA functional similarities, and known lncRNA–disease associations); and obtaining the space projection scores via vector projections of the weighted networks to form the final prediction scores without biases. The leave-one-out cross validation (LOOCV) results showed that, compared with other methods, LDAI-ISPS had a higher accuracy, with an area-under-the-curve (AUC) value of 0.9154 for inferring diseases, 0.8865 for inferring new lncRNAs (whose associations with diseases are unknown), and 0.7518 for inferring isolated diseases (whose associations with lncRNAs are unknown). A case study also confirmed the predictive performance of LDAI-ISPS as an aid to traditional biological experiments in inferring potential lncRNA–disease associations and isolated diseases.

Syntool: A Novel Region-Based Intolerance Score to Single Nucleotide Substitution for Synonymous Mutations Predictions Based on 123,136 Individuals

BioMed Research International ◽

10.1155/2017/5096208 ◽

2017 ◽

Vol 2017 ◽

pp. 1-5

Author(s):

Tongda Zhang ◽

Yiran Wu ◽

Zhangzhang Lan ◽

Quan Shi ◽

Ying Yang ◽

...

Keyword(s):

Human Disease ◽

Complex Traits ◽

Nucleotide Substitution ◽

Disease Risk ◽

Substantial Contribution ◽

Synonymous Mutation ◽

Single Nucleotide Substitution ◽

Single Nucleotide ◽

Synonymous Mutations ◽

Single Nucleotide Change

Background. A synonymous mutation is a single nucleotide change that does not cause an amino acid change but can affect the rate and efficiency of translation. Recent increases in our knowledge have revealed a substantial contribution of synonymous mutations to human disease risk and other complex traits. Nevertheless, there are still few synonymous mutation prediction methods. Methods. Nonsynonymous and synonymous coding SNPs show similar likelihood and effect size of human disease association. Here we defined synonymous and missense variation as single nucleotide substitution variation. We then evaluated the intolerance of genic transcripts to single nucleotide substitution variation based on 123,136 individuals from gnomAD. After regressing all variations on common variations, we defined the residuals of the regression model as the intolerance score of each genomic region. Results. We constructed a total of 24,799 non-overlapping region-based intolerance scores according to their intolerance to single nucleotide substitution variation (Syntool). The results show that the Syntool score can discriminate synonymous disease-causing mutations in the Human Gene Mutation Database (HGMD Professional) and the ClinVar database much better than other methods. Taken together, this study provides a novel prediction system for synonymous mutations, called Syntool, which could be helpful in identifying candidate synonymous disease-causing mutations.
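A residual-based score of the kind described in the Methods can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the regression details, so the sketch reads "regressing all variations on common variations" literally (total count as the response, common count as the predictor), uses ordinary least squares, and works on made-up per-region counts. The function names are hypothetical, and how the sign of a residual maps to tolerance is not stated in the abstract.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual_scores(common_counts, all_counts):
    """Return one residual per region: observed minus fitted total count."""
    slope, intercept = fit_line(common_counts, all_counts)
    return [
        y - (intercept + slope * x)
        for x, y in zip(common_counts, all_counts)
    ]


# Made-up counts for four regions (assumption, not data from the study).
common = [1, 2, 3, 4]
total = [2, 4, 6, 9]
scores = residual_scores(common, total)
```

Because the fit includes an intercept, the residuals sum to zero, so each score expresses how far a region departs from the trend across all regions rather than an absolute count.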

Erratum: usDSM: a novel method for deleterious synonymous mutation prediction using undersampling scheme

Briefings in Bioinformatics ◽

10.1093/bib/bbab247 ◽

2021 ◽

Author(s):

Xi Tang ◽

Tao Zhang ◽

Na Cheng ◽

Huadong Wang ◽

Chun-Hou Zheng ◽

...

Keyword(s):

Synonymous Mutation ◽

Novel Method ◽

Mutation Prediction

Platforms for publishing and archiving computer-aided research

F1000Research ◽

10.12688/f1000research.5773.1 ◽

2014 ◽

Vol 3 ◽

pp. 289 ◽

Cited By ~ 4

Author(s):

Konrad Hinsen

Keyword(s):

Computational Methods ◽

Computational Models ◽

Journal Article ◽

Scientific Research ◽

Software Tools ◽

Computing Environment ◽

Scientific Software ◽

Important Place ◽

Computer Aided

Computational models and methods take an ever more important place in modern scientific research. At the same time, they are becoming ever more complex, to the point that many such models and methods can no longer be adequately described in the narrative of a traditional journal article. Often they exist only as part of scientific software tools, which causes two important problems: (1) software tools are much more complex than the models and methods they embed, making the latter unnecessarily difficult to understand, (2) software tools depend on minute details of the computing environment they were written for, making them difficult to deploy and often completely unusable after a few years. This article addresses the second problem, based on the experience gained from the development and use of a platform specifically designed to facilitate the integration of computational methods into the scientific record.

Evaluation of species distribution models for estimating animal dark diversity

10.21203/rs.3.rs-1100842/v1 ◽

2021 ◽

Author(s):

Camilo Matus-Olivares ◽

Jaime Carrasco ◽

José Luis Yela ◽

Paula Meli ◽

Andres Weintraub ◽

...

Keyword(s):

Spatial Patterns ◽

Species Distribution ◽

Species Distribution Model ◽

Predictive Performance ◽

The Other ◽

Distribution Model ◽

Distribution Data ◽

Ensemble Model ◽

Distribution Models ◽

Software Environment

Aim: Applying wide and effective sampling of animal communities is rarely possible due to the associated costs and the use of techniques that are not always efficient. Thus, many areas have a hidden faunistic diversity that we denote Animal Dark Diversity (ADD), defined as the diversity that is present but not yet detected plus the diversity defined by Pärtel et al. (2011) that is not (yet) present despite the area’s favourable habitat conditions. We evaluated different species distribution model types (SDM techniques) on the basis of three requirements for ADD estimate reliability: 1) estimated spatial patterns of ADD do not differ significantly from other SDM techniques; 2) good predictive performances; and 3) low overfitting. Location: Iberian Peninsula. Taxon: Chiroptera and Noctuoidea (Lepidoptera). Methods: We used distribution data for 25 species of bats and 352 species of moths. We evaluated eleven SDM techniques using the biomod2 package implemented in the R software environment. We fitted the various SDM techniques to the data for each species and compared the resulting ADD estimates for the two animal groups under three threshold types. Results: The results demonstrated that estimated ADD spatial patterns vary significantly between SDM techniques and depend on the threshold type. They also showed that SDM techniques with overfitting tend to generate smaller ADD sizes, thus reducing the possible species presence estimates. Among the SDMs studied, the ensemble models delivered ADD geographic patterns more similar to those of the other techniques while also presenting a high predictive performance for both faunal groups. However, the Ensemble Model Committee Average (ECA) performed much better on the sensitivity metric than all other techniques under any of the thresholds tested. In addition, ECA stood out clearly from the other ensemble model techniques in displaying low-medium overfitting.
Main conclusions: SDM techniques should not differ from each other in their ADD estimations, should have good predictive performances and should exhibit low overfitting. Furthermore, to reduce estimate uncertainty it is suggested that the threshold type be one that transforms high values of presence probabilities into binary information and that the SDM technique have a sensitivity bias, as otherwise the estimates will perform better for species absence in cases where it is not in fact known whether a species is truly absent.
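Committee averaging, the idea behind an ensemble such as ECA, can be sketched as follows: each model's presence probability is turned into a binary vote using that model's threshold, and the votes are averaged. This is an illustrative sketch only, not code from biomod2 or from the study; the function name, the probabilities and the thresholds are hypothetical.

```python
def committee_average(probabilities, thresholds):
    """Average the binary presence votes of several models for one site.

    Each model votes 1 (presence) when its predicted probability reaches
    its own threshold, otherwise 0 (absence).
    """
    if not probabilities or len(probabilities) != len(thresholds):
        raise ValueError("need one threshold per model probability")
    votes = [1 if p >= t else 0 for p, t in zip(probabilities, thresholds)]
    return sum(votes) / len(votes)


# Hypothetical predictions from three models for a single site.
model_probs = [0.9, 0.4, 0.7]
model_thresholds = [0.5, 0.5, 0.5]
agreement = committee_average(model_probs, model_thresholds)
```

The result is the fraction of models voting for presence, so values near 0 or 1 indicate agreement among the models and values near 0.5 indicate disagreement.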
