Drug-Target Interaction Prediction Based on Drug Fingerprint Information and Protein Sequence

Yang Li; Yu-An Huang; Zhu-Hong You; Li-Ping Li; Zheng Wang

doi:10.3390/molecules24162999

Molecules ◽

10.3390/molecules24162999 ◽

2019 ◽

Vol 24 (16) ◽

pp. 2999 ◽

Cited By ~ 4

Author(s):

Yang Li ◽

Yu-An Huang ◽

Zhu-Hong You ◽

Li-Ping Li ◽

Zheng Wang

Keyword(s):

Drug Target ◽

Large Scale ◽

Protein Sequences ◽

Performance Comparison ◽

Target Pair ◽

Evolutionary Information ◽

Support Vector ◽

Sequence Information ◽

Rotation Forest ◽

Comparison Results

The identification of drug-target interactions (DTIs) is a critical step in drug development. Experimental methods that are based on clinical trials to discover DTIs are time-consuming, expensive, and challenging. Therefore, as a complement to them, developing new computational methods for predicting novel DTIs is of great significance for saving cost and shortening the development period. In this paper, we present a novel computational model for predicting DTIs, which uses the sequence information of proteins and a rotation forest classifier. Specifically, all of the target protein sequences are first converted to a position-specific scoring matrix (PSSM) to retain evolutionary information. We then use local phase quantization (LPQ) descriptors to extract evolutionary information from the PSSM. On the other hand, substructure fingerprint information is utilized to extract the features of the drug. We finally combine the features of drugs and proteins to represent the features of each drug-target pair and use a rotation forest classifier to calculate the scores of interaction possibility, for a global DTI prediction. The experimental results indicate that the proposed model is effective, achieving average accuracies of 89.15%, 86.01%, 82.20%, and 71.67% on four datasets (i.e., enzyme, ion channel, G protein-coupled receptors (GPCR), and nuclear receptor), respectively. In addition, we compared the prediction performance of the rotation forest classifier with another popular classifier, support vector machine, on the same datasets. Several types of methods previously proposed are also implemented on the same datasets for performance comparison. The comparison results demonstrate the superiority of the proposed method to the others. We anticipate that the proposed method can be used as an effective tool for predicting drug-target interactions on a large scale, given the information of protein sequences and drug fingerprints.
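The step in which drug and protein features are joined to represent one drug-target pair can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the function name `pair_features`, the toy vector lengths, and the values are hypothetical, and neither LPQ extraction nor the rotation forest classifier is implemented here.

```python
from typing import List, Sequence


def pair_features(drug_fingerprint: Sequence[int],
                  protein_descriptor: Sequence[float]) -> List[float]:
    """Concatenate a binary drug fingerprint with a protein descriptor.

    The result is one feature vector per drug-target pair, which is the
    kind of input a downstream classifier would be trained on.
    """
    if any(bit not in (0, 1) for bit in drug_fingerprint):
        raise ValueError("fingerprint bits must be 0 or 1")
    drug_part = [float(bit) for bit in drug_fingerprint]
    protein_part = [float(value) for value in protein_descriptor]
    return drug_part + protein_part


# Toy inputs (hypothetical sizes; real fingerprints and descriptors are longer).
drug = [1, 0, 0, 1]
protein = [0.12, 0.80, 0.05]
features = pair_features(drug, protein)
print(len(features))  # 7
```

Simple concatenation keeps the drug and protein parts at fixed positions in the vector, so every pair has the same layout regardless of which drug or target it involves.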

Ensemble Learning Prediction of Drug-Target Interactions Using GIST Descriptor Extracted from PSSM-Based Evolutionary Information

BioMed Research International ◽

10.1155/2020/4516250 ◽

2020 ◽

Vol 2020 ◽

pp. 1-10

Author(s):

Xinke Zhan ◽

Zhuhong You ◽

Changqing Yu ◽

Liping Li ◽

Jie Pan

Keyword(s):

Drug Target ◽

Large Scale ◽

Target Pair ◽

Evolutionary Information ◽

Support Vector ◽

Svm Classifier ◽

New Drug ◽

Golden Standard ◽

Scoring Matrix ◽

G Protein Coupled

Identifying drug-target interactions (DTIs) plays an essential role in new drug development. However, knowledge of DTIs is still limited, and a significant number of DTI pairs remain unknown. Moreover, traditional experimental methods have inevitable disadvantages, such as being costly and time-consuming. Therefore, developing computational methods for predicting DTIs is attracting more and more attention. In this study, we report a novel computational approach for predicting DTIs using the GIST feature, position-specific scoring matrix (PSSM), and rotation forest (RF). Specifically, each target protein is first converted into a PSSM to retain evolutionary information. Then, the GIST feature is extracted from the PSSM, and substructure fingerprint information is adopted to extract the features of the drug. Finally, the protein and drug features are combined to represent each drug-target pair, which is used as the input feature for the RF classifier. In the experiment, the proposed method achieves high average accuracies of 89.25%, 85.93%, 82.36%, and 73.89% on enzyme, ion channel, G protein-coupled receptors (GPCRs), and nuclear receptor, respectively. To further evaluate the prediction performance of the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the same gold standard dataset. These promising results illustrate that the proposed method is more effective and stable than other methods. We expect the proposed method to be a useful tool for predicting large-scale DTIs.

A Sparse Feature Extraction Method with Elastic Net for Drug-Target Interaction Identification

Scientific Programming ◽

10.1155/2021/6686409 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Zheng-Yang Zhao ◽

Wen-Zhun Huang ◽

Jie Pan ◽

Yu-An Huang ◽

Shan-Wen Zhang ◽

...

Keyword(s):

Feature Extraction ◽

Extraction Method ◽

Drug Target ◽

Elastic Net ◽

Support Vector ◽

Svm Classifier ◽

Rotation Forest ◽

Feature Extraction Method ◽

Proposed Model ◽

Comparison Results

The identification of drug-target interactions (DTIs) plays a crucial role in drug discovery. However, the traditional high-throughput techniques based on clinical trials are costly, cumbersome, and time-consuming for identifying DTIs. Hence, new intelligent computational methods are urgently needed to overcome these defects in predicting DTIs. In this paper, we propose a novel computational method that combines position-specific scoring matrix (PSSM), elastic net based sparse feature extraction, and a rotation forest (RF) classifier. Specifically, we convert each protein primary sequence into a PSSM, which contains biological evolutionary information. Then we extract the hidden sparse feature descriptors in the PSSM with the elastic net based sparse feature extraction method (ESFE). After that, we fuse them with the features of the drug, which are represented by molecular fingerprints. Finally, the rotation forest classifier is used to detect potential drug-target interactions. In fivefold cross-validation (CV) experiments on the enzyme, ion channel, G protein-coupled receptors (GPCRs), and nuclear receptor datasets, the method achieves average accuracies of 90.32%, 88.91%, 80.65%, and 79.73%, respectively. We also compared the proposed model with the state-of-the-art support vector machine (SVM) classifier and other effective methods on the same datasets. The comparison results clearly indicate that the proposed model predicts DTIs efficiently and robustly. We expect that the new model will be effective for large-scale DTI prediction.
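The fivefold cross-validation protocol behind the reported average accuracies can be sketched as follows. This is a generic illustration, not the authors' implementation: the helper names `k_fold_splits` and `mean_cv_accuracy`, the fixed seed, the toy labels, and the placeholder classifier are all assumptions, and a real experiment would plug in a trained model such as a rotation forest.

```python
import random
from typing import Callable, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def k_fold_splits(n_samples: int, k: int = 5, seed: int = 0) -> List[Split]:
    """Shuffle sample indices and return k (train, test) index pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("need 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def mean_cv_accuracy(labels: Sequence[int],
                     fit_predict: Callable[[List[int], List[int]], List[int]],
                     k: int = 5) -> float:
    """Average the per-fold accuracy of fit_predict(train_idx, test_idx)."""
    accuracies = []
    for train, test in k_fold_splits(len(labels), k):
        predictions = fit_predict(train, test)
        correct = sum(labels[idx] == pred for idx, pred in zip(test, predictions))
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Toy labels and a placeholder "classifier" that always predicts class 1.
toy_labels = [1] * 15 + [0] * 5


def always_one(train: List[int], test: List[int]) -> List[int]:
    return [1] * len(test)


print(mean_cv_accuracy(toy_labels, always_one))
```

Each sample lands in exactly one test fold, so the averaged figure reflects predictions on data the model did not see during training.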

Prediction of Drug–Target Interactions by Combining Dual-Tree Complex Wavelet Transform with Ensemble Learning Method

Molecules ◽

10.3390/molecules26175359 ◽

2021 ◽

Vol 26 (17) ◽

pp. 5359

Author(s):

Jie Pan ◽

Li-Ping Li ◽

Zhu-Hong You ◽

Chang-Qing Yu ◽

Zhong-Hao Ren ◽

...

Keyword(s):

Wavelet Transform ◽

Drug Discovery ◽

Protein Sequence ◽

Drug Target ◽

New Drugs ◽

Evolutionary Information ◽

Support Vector ◽

Sequence Information ◽

Complex Wavelet Transform ◽

Complex Wavelet

Identification of drug–target interactions (DTIs) is vital for drug discovery. However, traditional biological approaches have some unavoidable shortcomings, such as being time-consuming and expensive. Therefore, there is an urgent need to develop novel and effective computational methods to predict DTIs in order to shorten the development cycles of new drugs. In this study, we present a novel computational approach to identify DTIs, which uses protein sequence information and the dual-tree complex wavelet transform (DTCWT). More specifically, a position-specific scoring matrix (PSSM) was computed for each target protein sequence to obtain its evolutionary information. Then, DTCWT was used to extract representative features from the PSSM, which were then combined with the drug fingerprint features to form the feature descriptors. Finally, these descriptors were sent to the Rotation Forest (RoF) model for classification. A 5-fold cross validation (CV) was adopted on four datasets (Enzyme, Ion Channel, GPCRs (G-protein-coupled receptors), and NRs (Nuclear Receptors)) to validate the proposed model; our method yielded high average accuracies of 89.21%, 85.49%, 81.02%, and 74.44%, respectively. To further verify the performance of our model, we compared the RoF classifier with two state-of-the-art algorithms: the support vector machine (SVM) and the k-nearest neighbor (KNN) classifier. We also compared it with some other published methods. Moreover, the prediction results for the independent dataset further indicated that our method is effective for predicting potential DTIs. Thus, we believe that our method is suitable for facilitating drug discovery and development.

An efficient computational method for predicting drug-target interactions using weighted extreme learning machine and speed up robot features

BioData Mining ◽

10.1186/s13040-021-00242-1 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Ji-Yong An ◽

Fan-Rong Meng ◽

Zi-Ji Yan

Keyword(s):

Ion Channel ◽

Extreme Learning Machine ◽

Nuclear Receptor ◽

Drug Target ◽

Computational Method ◽

Evolutionary Information ◽

Support Vector ◽

Weighted Extreme Learning Machine ◽

Speed Up ◽

Learning Machine

Abstract Background: Prediction of novel Drug–Target interactions (DTIs) plays an important role in discovering new drug candidates and finding new proteins to target. Because experimental methods are time-consuming and expensive, developing efficient computational approaches that accurately predict potential associations between drugs and targets is a challenging task. Results: In the paper, we propose a novel computational method called WELM-SURF, based on drug fingerprints and protein evolutionary information, for identifying DTIs. More specifically, to exploit protein sequence features, a Position Specific Scoring Matrix (PSSM) is applied to capture protein evolutionary information, and Speed up robot features (SURF) is employed to extract key sequence features from the PSSM. For drug fingerprints, the chemical structure of molecular substructure fingerprints is used to represent each drug as a feature vector. Because the Weighted Extreme Learning Machine (WELM) has a short training time, good generalization ability, and, most importantly, the ability to execute classification efficiently by optimizing the loss function of the weight matrix, the WELM classifier is used to carry out classification based on the extracted features for predicting DTIs. The performance of the WELM-SURF model was evaluated by experimental validation on the enzyme, ion channel, GPCR, and nuclear receptor datasets using a fivefold cross-validation test. WELM-SURF obtained average accuracies of 93.54%, 90.58%, 85.43%, and 77.45% on the enzyme, ion channel, GPCR, and nuclear receptor datasets, respectively. We also compared its performance with the Extreme Learning Machine (ELM) and the state-of-the-art Support Vector Machine (SVM) on the enzyme and ion channel datasets, and with other existing methods on the four datasets. The experimental comparison shows that the performance of WELM-SURF is significantly better than that of ELM, SVM, and other previous methods in the domain. Conclusion: The results demonstrate that the proposed WELM-SURF model is competent for predicting DTIs with high accuracy and robustness. It is anticipated that the WELM-SURF method will be a useful computational tool to facilitate a wide range of bioinformatics studies related to DTI prediction.

Robust and accurate prediction of protein–protein interactions by exploiting evolutionary information

Scientific Reports ◽

10.1038/s41598-021-96265-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Yang Li ◽

Zheng Wang ◽

Li-Ping Li ◽

Zhu-Hong You ◽

Wen-Zhun Huang ◽

...

Keyword(s):

Protein Interactions ◽

Protein Sequence ◽

Large Scale ◽

False Positive Rate ◽

Computational Method ◽

Evolutionary Information ◽

Local Alignment ◽

Protein Interaction Data ◽

Sequence Information ◽

Protein Protein Interactions

Abstract: Various biochemical functions of organisms are performed by protein–protein interactions (PPIs). Therefore, recognition of protein–protein interactions is very important for understanding most life activities, such as DNA replication and transcription, protein synthesis and secretion, signal transduction and metabolism. Although high-throughput technology makes it possible to generate large-scale PPI data, it is costly in both time and labor and carries a risk of a high false positive rate. In order to formulate a more ingenious solution, the biology community is looking for computational methods to quickly and efficiently discover massive protein interaction data. In this paper, we propose a computational method for predicting PPIs based on a fresh idea of combining orthogonal locality preserving projections (OLPP) and rotation forest (RoF) models, using protein sequence information. Specifically, the protein sequence is first converted into position-specific scoring matrices (PSSMs) containing protein evolutionary information by using the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST). Then we characterize a protein as a fixed-length feature vector by applying OLPP to PSSMs. Finally, we train an RoF classifier for the purpose of identifying non-interacting and interacting protein pairs. The proposed method yielded significantly better results than existing methods, with 90.07% and 96.09% prediction accuracy on the Yeast and Human datasets. Our experiments show the proposed method can serve as a useful tool to accelerate the process of solving key problems in proteomics.
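A PSSM has one row per residue, so proteins of different lengths yield matrices of different heights, while a classifier needs inputs of one fixed size. The sketch below illustrates that idea only. It uses plain column averaging as a stand-in and is not OLPP, which the abstract names as the actual technique; the function name and toy matrices are hypothetical.

```python
from typing import List, Sequence


def pssm_column_means(pssm: Sequence[Sequence[float]]) -> List[float]:
    """Collapse a (sequence length x columns) PSSM to one mean per column.

    The output length depends only on the number of columns, not on the
    protein length, so every protein maps to a vector of the same size.
    """
    if not pssm:
        raise ValueError("PSSM must have at least one row")
    n_cols = len(pssm[0])
    if any(len(row) != n_cols for row in pssm):
        raise ValueError("all PSSM rows must have the same number of columns")
    n_rows = len(pssm)
    return [sum(row[j] for row in pssm) / n_rows for j in range(n_cols)]


# Toy matrices with 3 columns for brevity (a real PSSM has 20, one per amino acid).
short_protein = [[1, -2, 0], [3, 0, -4]]
long_protein = [[2, 2, 2], [0, -1, 5], [-2, 1, 1], [4, 2, 0]]
print(pssm_column_means(short_protein))  # [2.0, -1.0, -2.0]
print(len(pssm_column_means(long_protein)))  # 3
```

Column averaging discards positional information, which is one reason published methods use richer descriptors instead.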

Large-scale clinical interpretation of genetic variants using evolutionary data and deep learning

10.1101/2020.12.21.423785 ◽

2020 ◽

Author(s):

Jonathan Frazer ◽

Pascal Notin ◽

Mafalda Dias ◽

Aidan Gomez ◽

Kelly Brock ◽

...

Keyword(s):

Genetic Variants ◽

Large Scale ◽

Protein Sequences ◽

Evolutionary Model ◽

Generative Models ◽

Evolutionary Information ◽

Disease Genes ◽

Independent Evidence ◽

Variants Of Unknown Significance ◽

Protein Variants

Abstract: Quantifying the pathogenicity of protein variants in human disease-related genes would have a profound impact on clinical decisions, yet the overwhelming majority (over 98%) of these variants still have unknown consequences [1–3]. In principle, computational methods could support the large-scale interpretation of genetic variants. However, prior methods [4–7] have relied on training machine learning models on available clinical labels. Since these labels are sparse, biased, and of variable quality, the resulting models have been considered insufficiently reliable [8]. By contrast, our approach leverages deep generative models to predict the clinical significance of protein variants without relying on labels. The natural distribution of protein sequences we observe across organisms is the result of billions of evolutionary experiments [9,10]. By modeling that distribution, we implicitly capture constraints on the protein sequences that maintain fitness. Our model EVE (Evolutionary model of Variant Effect) not only outperforms computational approaches that rely on labelled data, but also performs on par with, if not better than, high-throughput assays, which are increasingly used as strong evidence for variant classification [11–23]. After thorough validation on clinical labels, we predict the pathogenicity of 11 million variants across 1,081 disease genes, and assign high-confidence reclassification for 72k Variants of Unknown Significance [8]. Our work suggests that models of evolutionary information can provide a strong source of independent evidence for variant interpretation and that the approach will be widely useful in research and clinical settings.

iTrade

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2015010104 ◽

2015 ◽

Vol 11 (1) ◽

pp. 66-83 ◽

Cited By ~ 4

Author(s):

Yong Hu ◽

Xiangzhou Zhang ◽

Bin Feng ◽

Kang Xie ◽

Mei Liu

Keyword(s):

Stock Market ◽

Large Scale ◽

Concept Drift ◽

Composite Index ◽

Performance Comparison ◽

Mobile App ◽

Data Driven ◽

Support Vector ◽

Individual Investors ◽

Chinese Stock Market

Among all investors in the Chinese stock market, more than 95% are non-professional individual investors. These individual investors are in great need of mobile apps that can provide professional and handy trading analysis and decision support everywhere. However, financial data is challenging to analyze because of its large-scale, non-linear and noisy characteristics in a varying stock environment. This paper develops a Mobile Data-Driven Stock Trading System (iTrade), which is a mobile app system based on Client-Server architecture and various data mining techniques. iTrade is characterized by 1) a data-driven intelligent learning model, which can provide further insight compared to empirical technical analysis, 2) a concept drift adaptation process, which facilitates the model adaptation to market structure changes, and 3) a rigorous benchmark analysis, including the Buy-and-Hold strategy and the strategies of three world-famous master investors (e.g., Warren E. Buffett). Technologies used in iTrade include the Least Absolute Shrinkage and Selection Operator (Lasso) algorithm, Support Vector Machine (SVM) and risk-adjusted portfolio optimization. An application case of iTrade is presented, which is based on a seven-year (2005-2011) back-testing. Evaluation results indicated that iTrade could gain a much higher cumulative return than the benchmark (Shanghai Composite Index). To the best of our knowledge, this is the first study and mobile app system that emphasizes and investigates the concept drift phenomenon in the stock market, as well as the performance comparison between a data-driven intelligent model and the strategies of master investors.

Prediction of kinase–substrate relations based on heterogeneous networks

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720015420032 ◽

2015 ◽

Vol 13 (06) ◽

pp. 1542003 ◽

Cited By ~ 7

Author(s):

Haichun Li ◽

Minghui Wang ◽

Xiaoyi Xu

Keyword(s):

Heterogeneous Networks ◽

Protein Sequence ◽

Comprehensive Evaluation ◽

Protein Sequences ◽

Machine Learning Algorithms ◽

Sequence Information ◽

Kinase Substrate ◽

Heterogeneous Information Networks ◽

Comparison Results ◽

Relevance Measure

Protein phosphorylation catalyzed by kinases plays essential roles in various intracellular processes. With an increasing number of phosphorylation sites verified experimentally by high-throughput technologies and assigned as substrates of specific kinases, prediction of potential kinase–substrate relations (KSRs) attracts increasing attention. Although a large number of computational methods have been designed, most of them only focus on local protein sequence information. A few KSR prediction approaches integrate protein–protein interaction and protein sequence information into existing machine learning algorithms at the cost of high feature dimensions or reduced sensitivity. In this work, we introduce two novel heterogeneous networks, HetNet-PPI and HetNet-SEQ, by incorporating PPI and similarity of protein sequences into the kinase–substrate heterogeneous networks, respectively. Based on these two heterogeneous networks, we further propose two new KSR prediction methods, HeteSim-PPI and HeteSim-SEQ, by adopting the HeteSim algorithm, which is recently proposed for relevance measure in heterogeneous information networks. Comprehensive evaluation results of the two methods show that similarity of protein sequences is more effective in improving KSR prediction performance as HeteSim-SEQ outperforms HeteSim-PPI in most cases. Further comparison results demonstrate that HeteSim-SEQ is superior to existing methods including BDT, SVM and iGPS, suggesting the effectiveness of the proposed network-based method in predicting potential KSRs.

Prediction of microRNA-binding residues in protein using a Laplacian support vector machine based on sequence information

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720018400097 ◽

2018 ◽

Vol 16 (03) ◽

pp. 1840009 ◽

Cited By ~ 2

Author(s):

Xin Ma ◽

Jing Guo ◽

Xiao Sun

Keyword(s):

Support Vector Machine ◽

Evolutionary Information ◽

Support Vector ◽

Sequence Information ◽

Protein Residues ◽

Research Areas ◽

Binding Residue ◽

Structure Conservation ◽

Binding Residues ◽

Binding Residue Prediction

The identification of microRNA (miRNA)-binding protein residues significantly impacts several research areas, including gene regulation and expression. We propose a method, PmiRBR, which combines a novel hybrid feature with the Laplacian support vector machine (LapSVM) algorithm to predict miRNA-binding residues in protein sequences. The hybrid feature is constituted by secondary structure, conservation scores, and a novel feature, which includes evolutionary information combined with the physicochemical properties of amino acids. Performance comparisons of the various features indicate that our novel feature contributes the most to prediction improvement. Our results demonstrate that PmiRBR can achieve 85.96% overall accuracy, with 43.89% sensitivity and 90.56% specificity. PmiRBR significantly outperforms other approaches at miRNA-binding residue prediction.
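The gap between the reported accuracy, sensitivity, and specificity is what one would expect on a class-imbalanced problem, and the relation among the three can be worked through in a short sketch. The function names and toy counts below are hypothetical, and the final calculation assumes the three reported percentages were measured on the same set of residues, which the abstract does not state.

```python
from typing import Dict


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # fraction of true positives recovered
        "specificity": tn / (tn + fp),  # fraction of true negatives recovered
    }


def implied_positive_fraction(accuracy: float,
                              sensitivity: float,
                              specificity: float) -> float:
    """Solve accuracy = p * sensitivity + (1 - p) * specificity for p."""
    return (specificity - accuracy) / (specificity - sensitivity)


# Toy counts (not from the paper): 100 positives and 900 negatives.
print(binary_metrics(tp=40, fn=60, tn=810, fp=90))

# Figures quoted in the abstract, expressed as fractions.
p = implied_positive_fraction(0.8596, 0.4389, 0.9056)
print(round(p, 3))  # 0.099
```

Under that assumption, roughly one residue in ten would be a binding residue, which is why overall accuracy sits close to specificity and far from sensitivity.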

DTIRF: Predicting Drug-Target Interactions Based on Improved Rotation Forest from Drug Molecular Structure and Protein Sequence

10.21203/rs.2.15799/v2 ◽

2019 ◽

Author(s):

Lei Wang ◽

Zhu-Hong You ◽

Li-Ping Li ◽

Xin Yan ◽

Wei Zhang ◽

...

Keyword(s):

In Silico ◽

Drug Target ◽

Cross Validation ◽

Position Specific Score Matrix ◽

Support Vector ◽

Data Sets ◽

Rotation Forest ◽

Cost Constraints ◽

Score Matrix ◽

Specific Score

Abstract Background: The identification and prediction of Drug-Target Interactions (DTIs) is the basis for screening drug candidates, which plays a vital role in the development of innovative drugs. However, due to the time and cost constraints of biological experimental methods, traditional drug target identification technologies are often difficult to apply on a large scale. Therefore, in silico methods are urgently needed to predict drug-target interactions in a genome-wide manner. Results: In this article, we design a new in silico approach, named DTIRF, to predict DTIs by combining a feature weighted Rotation Forest (FwRF) classifier with protein amino acid information. More specifically, we first use the Position-Specific Score Matrix (PSSM) to numerically convert protein sequences and utilize the Pseudo Position-Specific Score Matrix (PsePSSM) to extract their features. Then a unified digital descriptor is formed by combining them with molecular fingerprints representing drug information. Finally, the feature weighted rotation forest is applied to the Enzyme, Ion Channel, GPCR, and Nuclear Receptor data sets. The results of the five-fold cross-validation experiment show that the prediction accuracy of this approach reaches 91.68%, 88.11%, 84.72% and 78.33% on the four benchmark data sets, respectively. To further validate the performance of DTIRF, we compare it with other excellent methods and the Support Vector Machine (SVM) model. In addition, 7 of the 10 highest predictive scores in predicting novel DTIs were validated by relevant databases. Conclusions: The experimental results of cross-validation indicate that DTIRF is feasible in predicting the relationship between drugs and targets, and can provide help for the discovery of new candidate drugs.
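Checking the highest-scoring novel predictions against known interactions, as in the "7 of the 10" validation mentioned above, amounts to a ranking step followed by a set lookup. The sketch below is illustrative only: the function names, drug and target identifiers, scores, and the `known` set are invented toy values, not data from the paper or from any database.

```python
import heapq
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (drug id, target id)


def top_k_pairs(scores: Dict[Pair, float], k: int = 10) -> List[Pair]:
    """Return the k drug-target pairs with the highest predicted scores."""
    return heapq.nlargest(k, scores, key=scores.get)


def count_confirmed(candidates: List[Pair], known: Set[Pair]) -> int:
    """Count how many candidate pairs appear in a set of known interactions."""
    return sum(1 for pair in candidates if pair in known)


# Toy scores and a toy set of database-confirmed pairs (all hypothetical).
scores = {
    ("D1", "T1"): 0.9,
    ("D2", "T1"): 0.4,
    ("D3", "T2"): 0.7,
    ("D4", "T3"): 0.2,
}
known = {("D3", "T2")}

best = top_k_pairs(scores, k=2)
print(best)  # [('D1', 'T1'), ('D3', 'T2')]
print(count_confirmed(best, known))  # 1
```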