Ensemble Learning Prediction of Drug-Target Interactions Using GIST Descriptor Extracted from PSSM-Based Evolutionary Information

BioMed Research International ◽

10.1155/2020/4516250 ◽

2020 ◽

Vol 2020 ◽

pp. 1-10

Author(s):

Xinke Zhan ◽

Zhuhong You ◽

Changqing Yu ◽

Liping Li ◽

Jie Pan

Keyword(s):

Drug Target ◽

Large Scale ◽

Target Pair ◽

Evolutionary Information ◽

Support Vector ◽

Svm Classifier ◽

New Drug ◽

Golden Standard ◽

Scoring Matrix ◽

G Protein Coupled

Identifying drug-target interactions (DTIs) plays an essential role in new drug development. However, knowledge of DTIs is still limited, and a significant number of DTI pairs remain unknown. Moreover, traditional experimental methods have inevitable disadvantages such as high cost and long turnaround time. Therefore, developing computational methods for predicting DTIs is attracting more and more attention. In this study, we report a novel computational approach for predicting DTIs using the GIST feature, the position-specific scoring matrix (PSSM), and rotation forest (RF). Specifically, each target protein is first converted into a PSSM to retain evolutionary information. Then, the GIST feature is extracted from the PSSM, and substructure fingerprint information is adopted to extract the features of the drug. Finally, the protein and drug features are combined to represent each drug-target pair, which is used as the input feature for the RF classifier. In the experiment, the proposed method achieves high average accuracies of 89.25%, 85.93%, 82.36%, and 73.89% on the enzyme, ion channel, G protein-coupled receptor (GPCR), and nuclear receptor datasets, respectively. To further evaluate the prediction performance of the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the same gold standard dataset. These promising results illustrate that the proposed method is more effective and stable than other methods. We expect the proposed method to be a useful tool for predicting large-scale DTIs.
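
The last step of the pipeline described above, joining a protein descriptor and a drug fingerprint into a single input vector, can be sketched as follows. This is only a minimal illustration, not the authors' code: the function name `make_pair_features` and the toy vector lengths are assumptions, and extracting the GIST descriptor from a PSSM is not shown.

```python
from typing import List, Sequence


def make_pair_features(protein_desc: Sequence[float],
                       drug_fingerprint: Sequence[int]) -> List[float]:
    """Concatenate a protein descriptor with a binary drug fingerprint.

    Hypothetical helper: the abstract only says the two feature sets
    are combined; plain concatenation is assumed here.
    """
    if any(bit not in (0, 1) for bit in drug_fingerprint):
        raise ValueError("fingerprint must contain only 0/1 bits")
    protein_part = [float(x) for x in protein_desc]
    drug_part = [float(bit) for bit in drug_fingerprint]
    return protein_part + drug_part


# Toy values, far shorter than real descriptors or fingerprints.
pair = make_pair_features([0.12, 0.80, 0.33], [1, 0, 0, 1])
print(pair)  # [0.12, 0.8, 0.33, 1.0, 0.0, 0.0, 1.0]
```

Each such vector would then be one row of the training matrix handed to the classifier, with a label marking whether the pair is a known interaction.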


Drug-Target Interaction Prediction Based on Drug Fingerprint Information and Protein Sequence

Molecules ◽

10.3390/molecules24162999 ◽

2019 ◽

Vol 24 (16) ◽

pp. 2999 ◽

Cited By ~ 4

Author(s):

Yang Li ◽

Yu-An Huang ◽

Zhu-Hong You ◽

Li-Ping Li ◽

Zheng Wang

Keyword(s):

Drug Target ◽

Large Scale ◽

Protein Sequences ◽

Performance Comparison ◽

Target Pair ◽

Evolutionary Information ◽

Support Vector ◽

Sequence Information ◽

Rotation Forest ◽

Comparison Results

The identification of drug-target interactions (DTIs) is a critical step in drug development. Experimental methods that are based on clinical trials to discover DTIs are time-consuming, expensive, and challenging. Therefore, as a complement to them, developing new computational methods for predicting novel DTIs is of great significance with regard to saving cost and shortening the development period. In this paper, we present a novel computational model for predicting DTIs, which uses the sequence information of proteins and a rotation forest classifier. Specifically, all of the target protein sequences are first converted to position-specific scoring matrices (PSSMs) to retain evolutionary information. We then use local phase quantization (LPQ) descriptors to extract evolutionary information from the PSSMs. In parallel, substructure fingerprint information is used to extract the features of the drug. Finally, we combine the drug and protein features to represent each drug-target pair and use a rotation forest classifier to calculate interaction scores for global DTI prediction. The experimental results indicate that the proposed model is effective, achieving average accuracies of 89.15%, 86.01%, 82.20%, and 71.67% on four datasets (i.e., enzyme, ion channel, G protein-coupled receptors (GPCRs), and nuclear receptor), respectively. In addition, we compared the prediction performance of the rotation forest classifier with another popular classifier, the support vector machine, on the same dataset. Several previously proposed methods are also implemented on the same datasets for performance comparison. The comparison results demonstrate the superiority of the proposed method over the others. We anticipate that the proposed method can be used as an effective tool for predicting drug-target interactions on a large scale, given protein sequences and drug fingerprints.


An Ensemble Learning-Based Method for Inferring Drug-Target Interactions Combining Protein Sequences and Drug Fingerprints

BioMed Research International ◽

10.1155/2021/9933873 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Zheng-Yang Zhao ◽

Wen-Zhun Huang ◽

Xin-Ke Zhan ◽

Jie Pan ◽

Yu-An Huang ◽

...

Keyword(s):

Drug Target ◽

Large Scale ◽

Biological Evolution ◽

Gradient Boosting ◽

Support Vector ◽

Data Sets ◽

Standard Data ◽

Light Gradient ◽

Golden Standard ◽

Drug Reposition

Identifying drug-target interactions (DTIs) is central to related areas including drug discovery and drug repositioning. Although high-throughput biotechnologies have made tremendous progress, the indispensable clinical trials remain expensive, laborious, and intricate. Therefore, convenient and reliable computer-aided methods for inferring DTIs have become a focus of research. In this research, we propose a novel computational model integrating a pyramid histogram of oriented gradients (PHOG), position-specific scoring matrix (PSSM), and rotation forest (RF) classifier for identifying DTIs. Specifically, protein primary sequences are first converted into PSSMs to describe the potential biological evolution information. After that, PHOG is employed to mine highly representative features of the PSSM at multiple pyramid levels, and the complete descriptors of drug-target pairs are generated by combining the molecular substructure fingerprints and PHOG features. Finally, we feed the complete descriptors into the RF classifier for prediction. Experiments with 5-fold cross-validation (CV) yield mean accuracies of 88.96%, 86.37%, 82.88%, and 76.92% on four gold standard data sets (enzyme, ion channel, G protein-coupled receptors (GPCRs), and nuclear receptor, respectively). Moreover, the paper also evaluates the state-of-the-art light gradient boosting machine (LGBM) and support vector machine (SVM) to further verify the performance of the proposed model. The experimental outcomes substantiate that the established model is feasible and reliable for predicting DTIs. The model shows good prospects as an efficient tool for predicting DTIs on a large scale.
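
The 5-fold cross-validation protocol behind the reported mean accuracies can be sketched in plain Python. This is a generic illustration under stated assumptions, not the paper's evaluation code: `k_fold_indices`, `mean_cv_accuracy`, and the majority-class stand-in classifier are all hypothetical names, and the real experiments would plug in the RF, LGBM, or SVM models instead.

```python
import random
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 5,
                   seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) for each of k shuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    # Striding keeps the fold sizes within one sample of each other.
    folds = [order[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def mean_cv_accuracy(labels: Sequence[int],
                     fit_predict: Callable[[List[int], List[int]], List[int]],
                     k: int = 5) -> float:
    """Average the per-fold accuracy of a train-then-predict callable."""
    accuracies = []
    for train, test in k_fold_indices(len(labels), k):
        predicted = fit_predict(train, test)
        correct = sum(p == labels[i] for p, i in zip(predicted, test))
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Stand-in classifier: always predicts the majority class of the training fold.
labels = [1, 0] * 10


def majority_baseline(train: List[int], test: List[int]) -> List[int]:
    ones = sum(labels[i] for i in train)
    guess = 1 if 2 * ones >= len(train) else 0
    return [guess] * len(test)


print(mean_cv_accuracy(labels, majority_baseline))
```

Every sample is used for testing exactly once, so the mean over folds uses the whole data set without ever scoring a model on data it was trained on.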


An efficient computational method for predicting drug-target interactions using weighted extreme learning machine and speed up robot features

BioData Mining ◽

10.1186/s13040-021-00242-1 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Ji-Yong An ◽

Fan-Rong Meng ◽

Zi-Ji Yan

Keyword(s):

Ion Channel ◽

Extreme Learning Machine ◽

Nuclear Receptor ◽

Drug Target ◽

Computational Method ◽

Evolutionary Information ◽

Support Vector ◽

Weighted Extreme Learning Machine ◽

Speed Up ◽

Learning Machine

Background: Prediction of novel drug-target interactions (DTIs) plays an important role in discovering new drug candidates and finding new proteins to target. Because experimental methods are time-consuming and expensive, developing efficient computational approaches that accurately predict potential associations between drugs and targets is an important but challenging task. Results: In this paper, we propose a novel computational method called WELM-SURF, based on drug fingerprints and protein evolutionary information, for identifying DTIs. More specifically, to exploit protein sequence features, the position-specific scoring matrix (PSSM) is applied to capture protein evolutionary information, and speeded-up robust features (SURF) is employed to extract key sequence features from the PSSM. For drugs, molecular substructure fingerprints of the chemical structure are used to represent each drug as a feature vector. The weighted extreme learning machine (WELM) has a short training time, good generalization ability, and, most importantly, the ability to perform classification efficiently by optimizing the loss function of the weight matrix. Therefore, the WELM classifier is used to carry out classification on the extracted features for predicting DTIs. The performance of the WELM-SURF model was evaluated on the enzyme, ion channel, GPCR, and nuclear receptor datasets using fivefold cross-validation. WELM-SURF obtained average accuracies of 93.54%, 90.58%, 85.43%, and 77.45% on the enzyme, ion channel, GPCR, and nuclear receptor datasets, respectively. We also compared its performance with the extreme learning machine (ELM) and the state-of-the-art support vector machine (SVM) on the enzyme and ion channel datasets, and with other existing methods on all four datasets. The comparison shows that WELM-SURF performs significantly better than ELM, SVM, and other previous methods in the domain. Conclusion: The results demonstrate that the proposed WELM-SURF model is competent for predicting DTIs with high accuracy and robustness. It is anticipated that the WELM-SURF method will be a useful computational tool to facilitate a wide range of bioinformatics studies related to DTI prediction.


A Network Integration Approach for Drug-Target Interaction Prediction and Computational Drug Repositioning from Heterogeneous Information

10.1101/100305 ◽

2017 ◽

Cited By ~ 4

Author(s):

Yunan Luo ◽

Xinbin Zhao ◽

Jingtian Zhou ◽

Jinglin Yang ◽

Yanqing Zhang ◽

...

Keyword(s):

Heterogeneous Network ◽

Drug Target ◽

Large Scale ◽

Molecular Mechanisms ◽

Inflammatory Diseases ◽

Drug Repositioning ◽

Heterogeneous Data ◽

Heterogeneous Information ◽

Cox Inhibitors ◽

New Drug

The emergence of large-scale genomic, chemical and pharmacological data provides new opportunities for drug discovery and repositioning. Systematic integration of these heterogeneous data not only serves as a promising tool for identifying new drug-target interactions (DTIs), which is an important step in drug development, but also provides a more complete understanding of the molecular mechanisms of drug action. In this work, we integrate diverse drug-related information, including drugs, proteins, diseases and side-effects, together with their interactions, associations or similarities, to construct a heterogeneous network with 12,015 nodes and 1,895,445 edges. We then develop a new computational pipeline, called DTINet, to predict novel drug-target interactions from the constructed heterogeneous network. Specifically, DTINet focuses on learning a low-dimensional vector representation of features for each node, which accurately explains the topological properties of individual nodes in the heterogeneous network, and then predicts the likelihood of a new DTI based on these representations via a vector space projection scheme. DTINet achieves substantial performance improvement over other state-of-the-art methods for DTI prediction. Moreover, we have experimentally validated the novel interactions between three drugs and the cyclooxygenase (COX) protein family predicted by DTINet, and demonstrated the new potential applications of these identified COX inhibitors in preventing inflammatory diseases. These results indicate that DTINet can provide a practically useful tool for integrating heterogeneous information to predict new drug-target interactions and repurpose existing drugs. The source code of DTINet and the input heterogeneous network data can be downloaded from http://github.com/luoyunan/DTINet.
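
To make the idea of scoring a drug-target pair through a vector space projection concrete, the sketch below assumes a bilinear form: a projection matrix maps a drug embedding into the target embedding space, and the inner product with a target embedding gives the score. The bilinear form, the function names, and all numbers are assumptions for illustration; the abstract does not specify these details, and learning the embeddings or the projection matrix is not shown.

```python
from typing import Dict, List, Sequence


def projection_score(drug_vec: Sequence[float],
                     target_vec: Sequence[float],
                     projection: Sequence[Sequence[float]]) -> float:
    """Return the bilinear score drug_vec^T * projection * target_vec."""
    if len(projection) != len(drug_vec) or any(
            len(row) != len(target_vec) for row in projection):
        raise ValueError("projection must be len(drug_vec) x len(target_vec)")
    # Map the drug embedding into the target embedding space.
    projected = [sum(drug_vec[i] * projection[i][j] for i in range(len(drug_vec)))
                 for j in range(len(target_vec))]
    # Higher score = stronger predicted interaction under this sketch.
    return sum(p * t for p, t in zip(projected, target_vec))


def rank_targets(drug_vec: Sequence[float],
                 targets: Dict[str, Sequence[float]],
                 projection: Sequence[Sequence[float]]) -> List[str]:
    """Order target names from highest to lowest projection score."""
    return sorted(targets,
                  key=lambda name: projection_score(drug_vec, targets[name], projection),
                  reverse=True)


# Toy embeddings: a 2-dimensional drug space and a 3-dimensional target space.
drug = [1.0, 0.5]
projection = [[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]]
targets = {"target_a": [0.9, 0.1, 0.0], "target_b": [0.0, 0.2, 0.7]}
print(rank_targets(drug, targets, projection))  # ['target_a', 'target_b']
```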


A Sparse Feature Extraction Method with Elastic Net for Drug-Target Interaction Identification

Scientific Programming ◽

10.1155/2021/6686409 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Zheng-Yang Zhao ◽

Wen-Zhun Huang ◽

Jie Pan ◽

Yu-An Huang ◽

Shan-Wen Zhang ◽

...

Keyword(s):

Feature Extraction ◽

Extraction Method ◽

Drug Target ◽

Elastic Net ◽

Support Vector ◽

Svm Classifier ◽

Rotation Forest ◽

Feature Extraction Method ◽

Proposed Model ◽

Comparison Results

The identification of drug-target interactions (DTIs) plays a crucial role in drug discovery. However, the traditional high-throughput techniques based on clinical trials are costly, cumbersome, and time-consuming for identifying DTIs. Hence, new intelligent computational methods are urgently needed to overcome these shortcomings in predicting DTIs. In this paper, we propose a novel computational method that combines the position-specific scoring matrix (PSSM), elastic-net-based sparse feature extraction, and a rotation forest (RF) classifier. Specifically, we convert each protein primary sequence into a PSSM, which contains biological evolutionary information. We then extract the hidden sparse feature descriptors from the PSSM using the elastic-net-based sparse feature extraction (ESFE) method. After that, we fuse them with the drug features, which are represented by molecular fingerprints. Finally, a rotation forest classifier is used to detect potential drug-target interactions. In fivefold cross-validation (CV) experiments on the enzyme, ion channel, G protein-coupled receptor (GPCR), and nuclear receptor datasets, the method achieves average accuracies of 90.32%, 88.91%, 80.65%, and 79.73%, respectively. We also compared the proposed model with the state-of-the-art support vector machine (SVM) classifier and other effective methods on the same datasets. The comparison results clearly indicate that the proposed model predicts DTIs efficiently and robustly. We expect that the new model will be useful for predicting DTIs on a large scale.


Event-Centered Data Segmentation in Accelerometer-Based Fall Detection Algorithms

Sensors ◽

10.3390/s21134335 ◽

2021 ◽

Vol 21 (13) ◽

pp. 4335

Author(s):

Goran Šeketa ◽

Lovro Pavlaković ◽

Dominik Džaja ◽

Igor Lacković ◽

Ratko Magjarević

Keyword(s):

Large Scale ◽

Detection System ◽

Low Cost ◽

Fall Detection ◽

Support Vector ◽

Svm Classifier ◽

Detection Accuracy ◽

Detection Systems ◽

Data Segmentation ◽

Post Impact

Automatic fall detection systems ensure that elderly people get prompt assistance after experiencing a fall. Fall detection systems based on accelerometer measurements are widely used because of their portability and low cost. However, the ability of these systems to differentiate falls from Activities of Daily Living (ADL) is still not acceptable for everyday usage at a large scale. More work is still needed to raise the performance of these systems. In our research, we explored an essential but often neglected part of accelerometer-based fall detection systems—data segmentation. The aim of our work was to explore how different configurations of windows for data segmentation affect detection accuracy of a fall detection system and to find the best-performing configuration. For this purpose, we designed a testing environment for fall detection based on a Support Vector Machine (SVM) classifier and evaluated the influence of the number and duration of segmentation windows on the overall detection accuracy. Thereby, an event-centered approach for data segmentation was used, where windows are set relative to a potential fall event detected in the input data. Fall and ADL data records from three publicly available datasets were utilized for the test. We found that a configuration of three sequential windows (pre-impact, impact, and post-impact) provided the highest detection accuracy on all three datasets. The best results were obtained when either a 0.5 s or a 1 s long impact window was used, combined with pre- and post-impact windows of 3.5 s or 3.75 s.
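
The three-window, event-centered segmentation can be sketched as below. The abstract does not say how the potential fall event is located or how the impact window is aligned, so this sketch assumes the event is the sample with the largest acceleration magnitude and centers the impact window on it; the function names and the 10 Hz toy signal are likewise hypothetical. The 1 s and 3.5 s defaults are taken from the durations reported above.

```python
from typing import Dict, List, Sequence


def find_event_index(magnitude: Sequence[float]) -> int:
    """Assumed event detector: index of the largest acceleration magnitude."""
    return max(range(len(magnitude)), key=lambda i: magnitude[i])


def segment_around_event(magnitude: Sequence[float],
                         sample_rate_hz: float,
                         event_index: int,
                         pre_s: float = 3.5,
                         impact_s: float = 1.0,
                         post_s: float = 3.5) -> Dict[str, List[float]]:
    """Cut pre-impact, impact, and post-impact windows around one event."""
    half_impact = int(round(impact_s * sample_rate_hz / 2))
    # Impact window is centered on the event and clipped to the recording.
    impact_start = max(0, event_index - half_impact)
    impact_end = min(len(magnitude), event_index + half_impact)
    pre_start = max(0, impact_start - int(round(pre_s * sample_rate_hz)))
    post_end = min(len(magnitude), impact_end + int(round(post_s * sample_rate_hz)))
    return {
        "pre_impact": list(magnitude[pre_start:impact_start]),
        "impact": list(magnitude[impact_start:impact_end]),
        "post_impact": list(magnitude[impact_end:post_end]),
    }


# Toy recording: 10 s at 10 Hz with a single spike at sample 50.
signal = [1.0] * 100
signal[50] = 6.0
windows = segment_around_event(signal, 10.0, find_event_index(signal))
print({name: len(w) for name, w in windows.items()})
# {'pre_impact': 35, 'impact': 10, 'post_impact': 35}
```

Features for the classifier would then be computed per window, so the model sees what happened before, during, and after the candidate impact separately.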


Ecological Environment Changes of Mining Areas Around Nansi Lake With Remote Sensing Monitoring

10.21203/rs.3.rs-186720/v1 ◽

2021 ◽

Author(s):

Hu Liu ◽

Yan Jiang ◽

Rafal Misa ◽

Junhai Gao ◽

Mingyu Xia ◽

...

Keyword(s):

Remote Sensing ◽

Coal Mining ◽

Large Scale ◽

Underground Mining ◽

Water Area ◽

Ecological Environment ◽

Support Vector ◽

Svm Classifier ◽

Nansi Lake ◽

The Impact

Underground mining has been carried out for more than 100 years in the Nansi Lake area. Coal mining not only supports local social and economic development but also has a significant impact on the ecological environment of the region. Landsat series remote sensing data (1988-2019) are used to study the impact of coal mining on the ecological environment of Nansi Lake. A support vector machine (SVM) classifier is applied to extract the water area of the upstream lake from 1988 to 2019, and the ecological environment and its spatiotemporal variation characteristics are analyzed using the Remote Sensing Ecology Index (RSEI). The results illustrate that the change in water area is associated with annual precipitation. Compared with 2009, the ecological quality of the lake was worse in 2019, a change attributed to large-scale underground mining. Therefore, the coal mines in the nature reserve may need to be closed or restricted to the mining boundary to protect the lake's ecological environment.


FWHT-RF: A Novel Computational Approach to Predict Plant Protein-Protein Interactions via an Ensemble Learning Method

Scientific Programming ◽

10.1155/2021/1607946 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Jie Pan ◽

Li-Ping Li ◽

Chang-Qing Yu ◽

Zhu-Hong You ◽

Zhong-Hao Ren ◽

...

Keyword(s):

Protein Interactions ◽

Nearest Neighbor ◽

Protein Sequences ◽

Evolutionary Information ◽

Support Vector ◽

Protein Protein Interactions ◽

K Nearest Neighbor ◽

Novel Approach ◽

Knn Classifier ◽

Scoring Matrix

Protein-protein interactions (PPIs) in plants are crucial for understanding biological processes. Although high-throughput techniques have produced valuable information for identifying PPIs in plants, they are usually expensive, inefficient, and extremely time-consuming. Hence, there is an urgent need to develop novel computational methods to predict PPIs in plants. In this article, we propose a novel approach to predict PPIs in plants using only protein sequence information. Specifically, plant protein sequences are first converted into position-specific scoring matrices (PSSMs); then, the fast Walsh–Hadamard transform (FWHT) algorithm is used to extract feature vectors from the PSSMs to capture the evolutionary information of plant proteins. Lastly, the rotation forest (RF) classifier is trained for prediction and produces a series of evaluation results. We name this approach FWHT-RF because FWHT and RF are used for feature extraction and classification, respectively. When FWHT-RF was applied to three plant PPI datasets, Maize, Rice, and Arabidopsis thaliana (Arabidopsis), it achieved average accuracies of 95.20%, 94.42%, and 83.85%, respectively, under 5-fold cross-validation. To further evaluate the predictive power of FWHT-RF, we compared it with the state-of-the-art support vector machine (SVM) and K-nearest neighbor (KNN) classifiers in different aspects. The experimental results demonstrate that FWHT-RF can be a useful supplementary method to predict potential PPIs in plants.
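
For readers unfamiliar with the transform, the standard in-place butterfly form of the fast Walsh–Hadamard transform is sketched below (unnormalized, natural Hadamard ordering). It is a generic implementation, not the paper's feature extractor: how the authors apply it to a PSSM is not stated in the abstract, and because the transform needs a power-of-two length, a 20-column PSSM row would first have to be padded or resized, which is an assumption left out of this sketch.

```python
from typing import List, Sequence


def fwht(values: Sequence[float]) -> List[float]:
    """Unnormalized fast Walsh-Hadamard transform of a power-of-two-length input."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Combine pairs that are h apart inside each block of 2*h samples.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


print(fwht([1, 0, 1, 0, 0, 1, 1, 0]))  # [4, 2, 0, -2, 0, 2, 0, 2]
```

The transform uses only additions and subtractions and runs in O(n log n) time; applying it twice returns the input scaled by n.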


Support Vector Machine optimization with fractional gradient descent for data classification

Journal of Applied Sciences, Management and Engineering Technology ◽

10.31284/j.jasmet.2021.v2i1.1467 ◽

2021 ◽

Vol 2 (1) ◽

pp. 1-6

Author(s):

Dian Puspita Hapsari ◽

Imam Utoyo ◽

Santi Wulan Purnami

Keyword(s):

Gradient Descent ◽

Large Scale ◽

Computing Time ◽

Data Classification ◽

Optimization Method ◽

Descent Method ◽

Computational Time ◽

Support Vector ◽

Svm Classifier ◽

Gradient Descent Method

Data classification faces several problems, one of which is that a large amount of data increases computing time. SVM is a reliable classifier for linear or non-linear data, but for large-scale data it faces computational time constraints. The fractional gradient descent method is an unconstrained optimization algorithm for training support vector machine classifiers, whose training problem is convex. Compared to the classic integer-order model, a model built with fractional calculus has a significant advantage in accelerating computing time. This research investigates the current state of this new optimization method based on fractional derivatives, which can be implemented in the classifier algorithm. The SVM classifier with fractional gradient descent optimization reaches a convergence point approximately 50 iterations fewer than SVM-SGD. The model update step is smaller in the fractional method because the multiplier value is less than 1, that is, a fraction. The SVM-Fractional SGD algorithm is shown to be an effective method for rainfall forecast decisions.
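
As a point of reference for the comparison, the integer-order baseline (a linear SVM trained by stochastic gradient descent on the regularized hinge loss) can be sketched as follows. This shows only the conventional SVM-SGD update; the fractional-order update is not reproduced because the abstract does not define it. The function names, hyperparameter values, and toy data are assumptions.

```python
from typing import List, Sequence, Tuple


def train_linear_svm_sgd(samples: Sequence[Sequence[float]],
                         labels: Sequence[int],
                         epochs: int = 50,
                         learning_rate: float = 0.1,
                         reg: float = 0.01) -> Tuple[List[float], float]:
    """Train a linear SVM with plain SGD on the L2-regularized hinge loss.

    Labels must be +1 or -1. Returns the weight vector and the bias.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            # Gradient step for the L2 regularizer (weight decay).
            weights = [w - learning_rate * reg * w for w in weights]
            if margin < 1:
                # Hinge loss is active: move the boundary toward this sample's side.
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Classify one sample by the sign of the decision function."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else -1


# Linearly separable toy data.
toy_x = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
toy_y = [1, 1, -1, -1]
w, b = train_linear_svm_sgd(toy_x, toy_y)
print([predict(w, b, x) for x in toy_x])  # [1, 1, -1, -1]
```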


HYBRID DECISION TREE ARCHITECTURE UTILIZING LOCAL SVMs FOR EFFICIENT MULTI-LABEL LEARNING

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s021800141351004x ◽

2013 ◽

Vol 27 (07) ◽

pp. 1351004 ◽

Cited By ~ 3

Author(s):

DEJAN GJORGJEVIKJ ◽

GJORGJI MADJAROV ◽

SAŠO DŽEROSKI

Keyword(s):

Decision Tree ◽

Text Categorization ◽

Large Scale ◽

Semantic Annotation ◽

Predictive Performance ◽

Tree Architecture ◽

Support Vector ◽

Svm Classifier ◽

Strong Impact ◽

Classification Problems

Multi-label learning (MLL) problems abound in many areas, including text categorization, protein function classification, and semantic annotation of multimedia. The applicability of many current machine learning approaches to MLL is severely limited by large-scale problems, which have a strong impact on the computational complexity of learning. These problems are especially pronounced for approaches that transform MLL problems into a set of binary classification problems for which Support Vector Machines (SVMs) are used. On the other hand, the most efficient approaches to MLL, based on decision trees, have clearly lower predictive performance. We propose a hybrid decision tree architecture, where the leaves do not give multi-label predictions directly, but rather utilize local SVM-based classifiers giving multi-label predictions. A binary relevance architecture is employed in the leaves, where a binary SVM classifier is built for each of the labels relevant to that particular leaf. We use a broad range of multi-label datasets with a variety of evaluation measures to evaluate the proposed method against related and state-of-the-art methods, both in terms of predictive performance and time complexity. On almost every large classification problem, our hybrid architecture outperforms the competing approaches in terms of predictive performance, while its computational efficiency is significantly improved as a result of the integrated decision tree.
