iMPTCE-Hnetwork: A Multilabel Classifier for Identifying Metabolic Pathway Types of Chemicals and Enzymes with a Heterogeneous Network

Computational and Mathematical Methods in Medicine ◽

10.1155/2021/6683051 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Yuanyuan Zhu ◽

Bin Hu ◽

Lei Chen ◽

Qi Dai

Keyword(s):

Metabolic Pathway ◽

Metabolic Pathways ◽

Heterogeneous Network ◽

Cross Validation ◽

Polynomial Kernel ◽

Support Vector ◽

Exact Match ◽

Living Organisms ◽

A Chain ◽

Fold Cross Validation

Metabolic pathways are an important type of biological pathway. They produce the essential molecules and energy that maintain the life of living organisms. Each metabolic pathway consists of a chain of chemical reactions, which always require enzymes. Thus, chemicals and enzymes are the two major components of each metabolic pathway. Although several metabolic pathways have been uncovered, the metabolic pathway system is still far from complete: some chemicals or enzymes in a given metabolic pathway remain undiscovered. Besides traditional experiments to detect hidden chemicals or enzymes, an alternative is to design efficient computational methods. In this study, we proposed a multilabel classifier, called iMPTCE-Hnetwork, to uniformly assign chemicals and enzymes to the metabolic pathway types reported in KEGG. The classifier adopted embedding features derived from a heterogeneous network, which defined chemicals and enzymes as nodes and the interactions between chemicals and enzymes as edges, through the network embedding algorithm Mashup. The RAndom k-labELsets (RAKEL) algorithm was employed to construct the classifier, with a support vector machine (polynomial kernel) as the base classifier. Ten-fold cross-validation indicated that the classifier performed well, with accuracy higher than 0.800 and exact match higher than 0.750. Several comparisons indicated the superiority of iMPTCE-Hnetwork.
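The ensemble step of RAKEL mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly described scheme in which each ensemble member covers a random subset of k labels and a label is assigned when more than half of the members covering it vote for it. The function names, the 0.5 threshold, and the example values are illustrative assumptions, not details taken from the paper; the base classifier (an SVM in the paper) is left out.

```python
import math
import random
from typing import List, Sequence, Set, Tuple


def draw_labelsets(n_labels: int, k: int, m: int, seed: int = 0) -> List[Tuple[int, ...]]:
    """Draw m distinct random k-subsets of the label indices 0..n_labels-1."""
    if not 1 <= k <= n_labels:
        raise ValueError("k must be between 1 and n_labels")
    if m > math.comb(n_labels, k):
        raise ValueError("m exceeds the number of distinct k-labelsets")
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < m:
        chosen.add(tuple(sorted(rng.sample(range(n_labels), k))))
    return sorted(chosen)


def rakel_vote(
    labelsets: Sequence[Tuple[int, ...]],
    member_predictions: Sequence[Set[int]],
    n_labels: int,
    threshold: float = 0.5,
) -> List[int]:
    """Combine member outputs: member_predictions[i] is the part of
    labelsets[i] that member i predicts as present for one sample."""
    votes = [0] * n_labels
    seen = [0] * n_labels
    for labelset, predicted in zip(labelsets, member_predictions):
        for label in labelset:
            seen[label] += 1
            if label in predicted:
                votes[label] += 1
    # A label never covered by any labelset is reported as absent.
    return [int(seen[j] > 0 and votes[j] / seen[j] > threshold) for j in range(n_labels)]


# Hypothetical example: two members over four labels.
example = rakel_vote([(0, 1, 2), (1, 2, 3)], [{0, 1}, {1}], n_labels=4)
```

In a full pipeline each member would be a multi-class classifier trained on the label combinations restricted to its labelset; only the sampling and voting logic is shown here.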


Rice Yield Forecasting using Support Vector Machine

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d7236.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 2588-2593

Keyword(s):

Cross Validation ◽

Rice Yield ◽

Polynomial Kernel ◽

Support Vector ◽

Classification Models ◽

Average Increase ◽

Vector Machines ◽

Computing Support ◽

Multi Classification ◽

Fold Cross Validation

In the domain of Soft Computing, Support Vector Machines (SVMs) have acquired considerable significance. They are widely used for making predictions, owing to their ability to generalize. This paper is about the development of SVM-based classification models for the prediction of rice yield in India. Experiments have been conducted involving the one-against-one multi-class classification method, k-fold cross validation, and the polynomial kernel function for SVM training. Rice production data for India have been sourced from the Directorate of Economics and Statistics, Ministry of Agriculture, Government of India, for this work. The best prediction accuracy for the 4-year relative average increase, 75.06%, has been achieved using the 4-fold cross validation method. MATLAB software has been used for experimentation in this work.
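As a rough illustration of the k-fold cross validation used in this abstract, the sketch below partitions sample indices into k contiguous folds and yields one train/test split per fold. It is written in Python rather than the MATLAB the authors used, the function name is hypothetical, and it does not shuffle; in practice the data are usually shuffled (or stratified by class) before splitting.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds.

    Fold sizes differ by at most one; each sample is tested exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Hypothetical example: 10 samples, 4 folds.
folds = list(k_fold_indices(10, 4))
```

The reported score is then typically the average of the k per-fold test accuracies.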


iMPT-FRAKEL: A Simple Multi-label Web-server that Only Uses Fingerprints to Identify which Metabolic Pathway Types Compounds can Participate In

The Open Bioinformatics Journal ◽

10.2174/1875036202013010083 ◽

2020 ◽

Vol 13 (1) ◽

pp. 83-91

Author(s):

Yanjuan Jia ◽

Lei Chen ◽

Jian-Peng Zhou ◽

Min Liu

Keyword(s):

Metabolic Pathway ◽

Metabolic Pathways ◽

Learning Algorithm ◽

Web Server ◽

Support Vector ◽

Binary Relevance ◽

Comparison Results ◽

Rbf Kernel ◽

Living Organisms ◽

Set Up

Background: Metabolic pathways are among the most basic biological pathways in living organisms. Each consists of a series of chemical reactions and provides the molecules and energy that organisms need. To date, many metabolic pathways have been detected. However, because of the complexity and diversity of metabolic pathways, some still have hidden participants (compounds and enzymes). It is necessary to develop a quick, reliable, and non-animal-involved prediction model to recognize the metabolic pathways of any compound. Methods: In this study, a multi-label classifier, namely iMPT-FRAKEL, was developed to identify the metabolic pathway types in which compounds can participate. Compounds and 12 metabolic pathway types were retrieved from KEGG. Each compound was represented by its fingerprints, the most widely used form for representing compounds, which can be extracted from its SMILES format. A popular multi-label classification scheme, the Random k-Labelsets (RAKEL) algorithm, was adopted to build the classifier. A classic machine learning algorithm, Support Vector Machine (SVM) with RBF kernel, was selected as the base classification algorithm. Ten-fold cross-validation was used to evaluate the performance of iMPT-FRAKEL. In addition, a web-server version of the classifier was set up, which can be accessed at http://cie.shmtu.edu.cn/impt/index. Results: iMPT-FRAKEL yielded an accuracy of 0.804, an exact match of 0.745, and a hamming loss of 0.039. Comparison results indicated that the classifier was superior to other models, including models with Binary Relevance (BR) or other classification algorithms. Conclusion: The proposed classifier employs limited prior knowledge of compounds but gives satisfying performance in recognizing the metabolic pathways of compounds.
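The three measures reported in this abstract can be sketched as follows. The abstract does not define them, so the code assumes the usual example-based multi-label definitions: exact match is the fraction of samples whose predicted label set equals the true set, hamming loss is the fraction of individual label decisions that are wrong, and accuracy is the mean Jaccard overlap between predicted and true sets. The function names and the toy data are illustrative.

```python
from typing import Sequence, Set


def exact_match(y_true: Sequence[Set[int]], y_pred: Sequence[Set[int]]) -> float:
    """Fraction of samples whose predicted label set is exactly right."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def hamming_loss(y_true: Sequence[Set[int]], y_pred: Sequence[Set[int]], n_labels: int) -> float:
    """Fraction of (sample, label) decisions that disagree with the truth."""
    wrong = sum(len(t ^ p) for t, p in zip(y_true, y_pred))
    return wrong / (len(y_true) * n_labels)


def accuracy(y_true: Sequence[Set[int]], y_pred: Sequence[Set[int]]) -> float:
    """Mean Jaccard overlap; two empty sets count as a perfect match."""
    scores = [len(t & p) / len(t | p) if (t | p) else 1.0 for t, p in zip(y_true, y_pred)]
    return sum(scores) / len(scores)


# Hypothetical toy data: three samples, four possible labels.
truth = [{0, 1}, {2}, {1, 3}]
preds = [{0, 1}, {2, 3}, {1}]
em = exact_match(truth, preds)
hl = hamming_loss(truth, preds, n_labels=4)
acc = accuracy(truth, preds)
```

Under these definitions exact match is the strictest measure, which is consistent with it being the lowest of the two "higher is better" scores in the abstract.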


Combination of Support Vector Machine and K-Fold cross-validation for prediction of long-term degradation of the compressive strength of marine concrete

International Journal of Computational Physics Series ◽

10.29167/a1i1p120-130 ◽

2018 ◽

Vol 1 (1) ◽

pp. 120-130 ◽

Cited By ~ 1

Author(s):

Chunxiang Qian ◽

Wence Kang ◽

Hao Ling ◽

Hua Dong ◽

Chengyao Liang ◽

...

Keyword(s):

Support Vector Machine ◽

Environmental Factors ◽

Cross Validation ◽

Concrete Strength ◽

Simulation Method ◽

Support Vector ◽

Svm Model ◽

Artificial Neural Network Ann ◽

Influence Degree ◽

Fold Cross Validation

A Support Vector Machine (SVM) model optimized by K-Fold cross-validation was built to predict and evaluate the degradation of concrete strength in a complicated marine environment. Several other models, such as Artificial Neural Network (ANN) and Decision Tree (DT), were also built and compared with the SVM to determine which could make the most accurate predictions. Both the material factors and the environmental factors that influence the results were considered. The material factors mainly involved the original concrete strength and the amount of cement replaced by fly ash and slag. The environmental factors consisted of the concentrations of Mg²⁺, SO₄²⁻, and Cl⁻, temperature, and exposure time. The prediction results showed that the optimized SVM model performed better than the other models in predicting the concrete strength. Based on the SVM model, a simulation method of variable limitation was used to determine the sensitivity of the various factors and the degree to which they influence the degradation of concrete strength.


Pengenalan Wajah Manusia berbasis Algoritma Local Binary Pattern

Emitor: Jurnal Teknik Elektro ◽

10.23917/emitor.v17i2.6232 ◽

2017 ◽

Vol 17 (2) ◽

pp. 29-38

Author(s):

Ratih Purwati ◽

Gunawan Ariyanto

Keyword(s):

Computer Vision ◽

Support Vector Machine ◽

Face Recognition ◽

Local Binary Pattern ◽

Cross Validation ◽

Support Vector ◽

Fold Cross Validation

Face recognition is a computer technology for identifying human faces from digital images stored in a database. A human face changes shape according to its expression. Human facial expressions resemble one another, so determining whose expression it is can be somewhat difficult. Face recognition remains an active topic in computer vision research. Human faces are frequently used in the features of social media applications such as Snapchat, Snapgram from Instagram, and many other social media applications that use this technology. This study analyzes human facial expression recognition using Local Binary Pattern algorithm features and seeks the most optimal development of the basic Local Binary Pattern algorithm by combining Histogram Equalization, Support Vector Machine, and K-fold cross validation, so as to improve human face image recognition to the best result. The study feeds several human face databases into the system: JAFFE, which contains face images of 10 Japanese women with 7 emotional expressions (angry, sad, happy, disgusted, surprised, afraid, and neutral); YALE, which contains face images of Americans; and the CALTECH dataset, which consists of 450 images of 896 x 592 pixels stored in JPEG format. The data are then adjusted to the facial texture shape of each.
From the combination of the three methods above and the experiments carried out, the most optimal result in human face recognition was obtained using the JAFFE dataset at a resolution of 92 x 112 pixels; a high level of processor usage can affect the computation time when running the system, thereby producing more accurate predictions.


Abstract 473: Identification of Apolipoproteins Using Feature Selection Technique

Arteriosclerosis Thrombosis and Vascular Biology ◽

10.1161/atvb.36.suppl_1.473 ◽

2016 ◽

Vol 36 (suppl_1) ◽

Author(s):

Hua Tang ◽

Hao Lin

Keyword(s):

Support Vector Machine ◽

Cross Validation ◽

Support Vector ◽

Feature Subset ◽

Risk Markers ◽

Dipeptide Composition ◽

Accurate Identification ◽

Feature Selection Technique ◽

Physiological Importance ◽

Fold Cross Validation

Objective: Apolipoproteins are of great physiological importance and are associated with different diseases such as dyslipidemia, thrombogenesis and angiocardiopathy. Apolipoproteins have therefore emerged as key risk markers and important research targets, yet the types of apolipoproteins have not been fully elucidated. Accurate identification of apolipoproteins is crucial to the comprehension of cardiovascular diseases and to drug design. The aim of this study is to develop a powerful model to precisely identify apolipoproteins. Approach and Results: We manually collected from UniProt a non-redundant dataset of 53 apolipoproteins and 136 non-apolipoproteins with sequence identity of less than 40%. After formulating the protein sequence samples with g-gap dipeptide composition (here g = 1 to 10), analysis of variance (ANOVA) was adopted to find the feature subset that achieves the best accuracy. Support Vector Machine (SVM) was then used to perform classification. The predictive model was evaluated using five-fold cross-validation, which yielded a sensitivity of 96.2%, a specificity of 99.3%, and an accuracy of 98.4%. The study indicated that the proposed method could be a feasible means of conducting preliminary analyses of apolipoproteins. Conclusion: We demonstrated that apolipoproteins can be predicted from their primary sequences. We also discovered a special dipeptide distribution in apolipoproteins. These findings open new perspectives for improving apolipoprotein prediction by considering the specific dipeptides. We expect that these findings will help to improve drug development for angiocardiopathy. Key words: Apolipoproteins; Angiocardiopathy; Support Vector Machine
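The g-gap dipeptide composition named in this abstract can be sketched as a 400-dimensional frequency vector over ordered residue pairs. The sketch assumes that g counts the residues lying between the two paired positions (so the pair is formed from positions i and i + g + 1) and that pairs containing non-standard residues are skipped; the abstract does not state either convention, and the function name is hypothetical.

```python
from itertools import product
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def g_gap_dipeptide_composition(sequence: str, g: int) -> Dict[str, float]:
    """Frequency of each ordered residue pair separated by g residues."""
    if g < 0:
        raise ValueError("g must be non-negative")
    pairs = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(sequence) - g - 1):
        pair = sequence[i] + sequence[i + g + 1]
        if pair in counts:  # skip pairs with non-standard symbols
            counts[pair] += 1
            total += 1
    if total == 0:
        return {pair: 0.0 for pair in pairs}
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical toy sequence: with g = 1 the pairs are A_D and C_A.
features = g_gap_dipeptide_composition("ACDA", g=1)
```

A feature-selection step such as the ANOVA ranking described in the abstract would then operate on these 400 values per sequence.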


The Animal Classification: An Evaluation of Different Transfer Learning Pipeline

Mekatronika ◽

10.15282/mekatronika.v3i1.6680 ◽

2021 ◽

Vol 3 (1) ◽

pp. 27-31

Author(s):

Ken-ji Ee ◽

Ahmad Fakhri Bin Ab. Nasir ◽

Anwar P. P. Abdul Majeed ◽

Mohd Azraai Mohd Razman ◽

Nur Hafieza Ismail

Keyword(s):

Transfer Learning ◽

Classification System ◽

Cross Validation ◽

Support Vector ◽

Svm Classifier ◽

Average Classification Accuracy ◽

Validation Technique ◽

Search Approach ◽

Fold Cross Validation

The animal classification system is a technology that classifies the animal class (type) automatically and is useful in many applications. Many types of learning models have recently been applied to this technology. Nonetheless, it is worth noting that the extraction and classification of animal features is non-trivial, particularly in the deep learning approach, for a successful animal classification system. The use of Transfer Learning (TL) has been demonstrated to be a powerful tool for extracting essential features. However, the employment of such a method in animal classification applications is somewhat limited. The present study aims to determine a suitable TL-conventional classifier pipeline for animal classification. VGG16 and VGG19 were used to extract features and were then coupled with either a k-Nearest Neighbour (k-NN) or a Support Vector Machine (SVM) classifier. Prior to that, a total of 4000 images were gathered, covering five classes: cows, goats, buffalos, dogs, and cats. The data were split in the ratio 80:20 for training and testing. The classifiers' hyperparameters were tuned by a grid search approach that utilises five-fold cross-validation. The study demonstrated that the best TL pipeline is VGG16 with an optimised SVM, as it yielded an average classification accuracy of 0.975. The findings of the present investigation could facilitate animal classification applications, e.g. for monitoring animals in wildlife.


Predictor Selection for Bacterial Vaginosis Diagnosis Using Decision Tree and Relief Algorithms

Applied Sciences ◽

10.3390/app10093291 ◽

2020 ◽

Vol 10 (9) ◽

pp. 3291

Author(s):

Jesús F. Pérez-Gómez ◽

Juana Canul-Reich ◽

José Hernández-Torruco ◽

Betania Hernández-Ocaña

Keyword(s):

Feature Selection ◽

Decision Tree ◽

Bacterial Vaginosis ◽

Cross Validation ◽

Performance Comparison ◽

Support Vector ◽

Ongoing Research ◽

Selection For ◽

Comparison Of The Results ◽

Fold Cross Validation

Requiring only a few relevant characteristics from patients when diagnosing bacterial vaginosis is highly useful for physicians as it makes it less time consuming to collect these data. This would result in having a dataset of patients that can be more accurately diagnosed using only a subset of informative or relevant features in contrast to using the entire set of features. As such, this is a feature selection (FS) problem. In this work, decision tree and Relief algorithms were used as feature selectors. Experiments were conducted on a real dataset for bacterial vaginosis with 396 instances and 252 features/attributes. The dataset was obtained from universities located in Baltimore and Atlanta. The FS algorithms utilized feature rankings, from which the top fifteen features formed a new dataset that was used as input for both support vector machine (SVM) and logistic regression (LR) algorithms for classification. For performance evaluation, averages of 30 runs of 10-fold cross-validation were reported, along with balanced accuracy, sensitivity, and specificity as performance measures. A performance comparison of the results was made between using the total number of features against using the top fifteen. These results found similar attributes from our rankings compared to those reported in the literature. This study is part of ongoing research that is investigating a range of feature selection and classification methods.


Analisis Sentimen Pada Maskapai Penerbangan di Platform Twitter Menggunakan Algoritma Support Vector Machine (SVM)

Teknika ◽

10.34148/teknika.v10i1.311 ◽

2021 ◽

Vol 10 (1) ◽

pp. 18-26

Author(s):

Hendry Cipta Husada ◽

Adi Suryaputra Paramita

Keyword(s):

Machine Learning ◽

Social Media ◽

Support Vector Machine ◽

Cross Validation ◽

Support Vector ◽

Learning Approach ◽

Social Media Platform ◽

Machine Learning Approach ◽

Media Platform ◽

Fold Cross Validation

Current technological developments have made it easy for many people to obtain and spread information on various social media platforms. Twitter is one medium often used to express opinions as a person's reaction to something. Opinions on Twitter can be used by airlines as a key parameter for gauging public satisfaction and as evaluation material for the company. Accordingly, a method is needed that can automatically classify opinions into positive, negative, or neutral categories through sentiment analysis. The sentiment analysis process consists of data preprocessing, word weighting using the TF-IDF method, application of the algorithm, and discussion of the classification results. Opinions are classified with a machine learning approach using the multi-class Support Vector Machine (SVM) algorithm. The data used in this study are English-language opinions from Twitter users about airlines. Based on the tests performed, the best classification results were obtained using an SVM with RBF kernel at parameter values C (complexity) = 10 and γ (gamma) = 1, with an accuracy of 84.37%, and 80.41% when using 10-fold cross validation.
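The TF-IDF word weighting mentioned in this abstract can be sketched in a few lines. Several TF-IDF variants exist and the abstract does not say which one was used; the sketch assumes relative term frequency multiplied by the unsmoothed logarithm of the inverse document frequency. The function name and the two toy documents are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(documents: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by (count / doc length) * log(N / document frequency)."""
    n_docs = len(documents)
    doc_freq: Counter = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        length = len(tokens)
        weights.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenised tweets after preprocessing.
docs = [["good", "flight"], ["bad", "flight"]]
w = tf_idf(docs)
```

A term that appears in every document, such as "flight" here, receives weight zero, so the classifier sees only the words that distinguish one opinion from another.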


Comparison between fuzzy kernel k-medoids using radial basis function kernel and polynomial kernel function in hepatitis classification

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v10.i1.pp60-65 ◽

2021 ◽

Vol 10 (1) ◽

pp. 60

Author(s):

Glori Stephani Saragih ◽

Sri Hartini ◽

Zuherman Rustam

Keyword(s):

Radial Basis Function ◽

Kernel Function ◽

Basis Function ◽

Cross Validation ◽

Kernel Functions ◽

Polynomial Kernel ◽

Radial Basis ◽

Rbf Kernel ◽

Rbf Kernel Function ◽

Fold Cross Validation

This paper compares fuzzy kernel k-medoids using the radial basis function (RBF) and polynomial kernel functions in hepatitis classification. These two kernel functions were chosen because of their popularity in kernel-based machine learning methods for solving classification tasks. A hepatitis dataset was then used to evaluate the performance of both methods, which were expected to provide an accurate diagnosis so that patients can obtain treatment at an early phase. The data were obtained from two hospitals in Indonesia and consist of 89 hepatitis-B and 31 hepatitis-C samples. The data were analyzed using several cases of k-fold cross-validation, and the performances were compared according to accuracy, sensitivity, precision, F1-Score, and running time. From the experiments, it was concluded that fuzzy kernel k-medoids using the RBF kernel function is better than using the polynomial kernel function, with a 6% increase in accuracy, a 13% increase in sensitivity, and a 5% improvement in F1-Score. On the other hand, the precision of fuzzy kernel k-medoids using the polynomial kernel function is 2% higher than with the RBF kernel function. According to the results, the choice of RBF or polynomial kernel function in fuzzy kernel k-medoids can be made according to the primary goal of the classification.
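For reference, the two kernel functions compared in this abstract can be written out directly. The sketch uses the standard textbook forms, K(x, y) = exp(-gamma * ||x - y||^2) for RBF and K(x, y) = (gamma * <x, y> + coef0)^degree for the polynomial kernel; the parameter names and default values are assumptions, since the abstract does not report the settings used.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(
    x: Sequence[float],
    y: Sequence[float],
    degree: int = 2,
    gamma: float = 1.0,
    coef0: float = 1.0,
) -> float:
    """Polynomial kernel: (gamma * dot(x, y) + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


# Hypothetical feature vectors.
same_point = rbf_kernel([1.0, 0.0], [1.0, 0.0], gamma=0.5)
far_point = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)
poly_value = polynomial_kernel([1.0, 2.0], [3.0, 4.0], degree=2)
```

The RBF value depends only on the distance between the two points and lies in (0, 1], whereas the polynomial value grows with the inner product, which is one reason the two kernels can rank the same data differently.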


Analisis Sentimen Twitter terhadap Tokoh Publik dengan Algoritma Naive Bayes dan Support Vector Machine

Simetris Jurnal Teknik Mesin Elektro dan Ilmu Komputer ◽

10.24176/simet.v11i2.4568 ◽

2021 ◽

Vol 11 (2) ◽

pp. 626-636

Author(s):

Tanthy Tawaqalia Widowati ◽

Mujiono Sadikin

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Cross Validation ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Fold Cross Validation

Twitter is one of the growing social media platforms. Twitter makes it easy for people to express opinions freely through posts commonly called tweets. Netizens freely convey their personal opinions on any topic, including perceptions of public figures. This article presents the results of research and analysis of public (netizen) sentiment towards a public figure, Nadiem Makariem, as the new Minister of Education and Culture. The study uses data mining techniques with the aim of comparing classification results for public opinions written on Twitter. The dataset comes from tweets with the keywords "nadiem makariem", "kemendikbud", and "pak nadiem". The RapidMiner tool is used to assist the pre-processing stage and classification with two methods, Naive Bayes and Support Vector Machine, evaluated with k-fold cross-validation. The experiments show that, for the case studied, the Naive Bayes method performs better, with accuracy 91.48%, precision 89.28%, and recall 91.58%.
