Efficacy of Integrating a Novel 16-Gene Biomarker Panel and Intelligence Classifiers for Differential Diagnosis of Rheumatoid Arthritis and Osteoarthritis

Nguyen Phuoc Long; Seongoh Park; Nguyen Hoang Anh; Jung Eun Min; Sang Jun Yoon; Hyung Min Kim; Tran Diem Nghi; Dong Kyu Lim; Jeong Hill Park; Johan Lim; Sung Won Kwon

doi:10.3390/jcm8010050

Journal of Clinical Medicine

2019

Vol 8 (1)

pp. 50

Cited By ~ 6

Keyword(s):

Rheumatoid Arthritis

Naive Bayes

Meta Analysis

Area Under The Curve

Gene Signature

Naïve Bayes

Clinical Samples

Support Vector

K Nearest Neighbors

Synovial Tissues

Introducing novel biomarkers for accurately detecting and differentiating rheumatoid arthritis (RA) and osteoarthritis (OA) using clinical samples is essential. In the current study, we searched for a novel data-driven gene signature of synovial tissue to differentiate RA from OA patients. Fifty-three RA, 41 OA, and 25 normal microarray-based transcriptome samples were used. The area under the curve (AUC)-based random forest (RF) variable importance measure was applied to identify the most influential genes differentiating RA from OA. Five algorithms, namely RF, k-nearest neighbors (kNN), support vector machines (SVM), naïve Bayes, and a tree-based method, were employed for classification. We found a 16-gene signature that could effectively differentiate RA from OA, comprising TMOD1, POP7, SGCA, KLRD1, ALOX5, RAB22A, ANK3, PTPN3, GZMK, CLU, GZMB, FBXL7, TNFRSF4, IL32, MXRA7, and CD8A. The externally validated accuracy of the RF model was 0.96 (sensitivity = 1.00, specificity = 0.90). Likewise, the accuracies of kNN, SVM, naïve Bayes, and the decision tree were 0.96, 0.96, 0.96, and 0.91, respectively. A functional meta-analysis revealed the distinct pathological processes of RA and OA and suggested promising targets for further mechanistic and therapeutic studies. In conclusion, the proposed gene signature combined with sophisticated classification methods may improve the diagnosis and management of RA patients.
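The externally validated accuracy, sensitivity, and specificity quoted above all derive from a binary confusion matrix. The minimal Python sketch below assumes RA is the positive class; the counts in the usage lines are hypothetical and are not the study's validation data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix; RA is assumed to be the positive class."""
    tp: int  # RA samples predicted as RA
    fn: int  # RA samples predicted as OA
    tn: int  # OA samples predicted as OA
    fp: int  # OA samples predicted as RA

    def accuracy(self) -> float:
        # Share of all samples that were classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    def sensitivity(self) -> float:
        # True positive rate: share of RA samples recognized as RA.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # True negative rate: share of OA samples recognized as OA.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts for illustration only (not the study's validation data).
cm = ConfusionMatrix(tp=15, fn=0, tn=9, fp=1)
print(f"accuracy={cm.accuracy():.2f} "
      f"sensitivity={cm.sensitivity():.2f} "
      f"specificity={cm.specificity():.2f}")
```

Accuracy is a prevalence-weighted average of sensitivity and specificity, so a model with perfect sensitivity but imperfect specificity can still post high accuracy when positives make up most of the test set.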

Predicting Student’s Performance Using Machine Learning Algorithm

International Journal of Advanced Research in Science, Communication and Technology

10.48175/ijarsct-1209

2021

pp. 53-58

Author(s):

Sheela Rani P

Dhivya S

Dharshini Priya M

Dharmila Chowdary A

Keyword(s):

Machine Learning

Support Vector Machine

Prediction Model

Naive Bayes

Learning Algorithm

Naïve Bayes

Machine Learning Algorithms

Support Vector

Learning Approaches

K Nearest Neighbors

Machine learning is an emerging research discipline that uses data to improve learning, optimizing the learning process and the environment in which learning takes place. There are two kinds of machine learning approaches, supervised and unsupervised, which are used to extract knowledge that helps decision-makers take appropriate action in the future. This paper presents a model for predicting students' academic performance that uses supervised machine learning algorithms, namely support vector machine (SVM), k-nearest neighbors (KNN), naïve Bayes, and logistic regression. The results of the algorithms are compared, showing that SVM and naïve Bayes achieve higher accuracy than the other algorithms. The final prediction model in this paper has fairly high prediction accuracy. The objective is not only to predict students' future performance but also to identify the best technique for finding the features that most strongly influence students while studying.
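The abstract names naïve Bayes among the best performers but gives no implementation details. As an illustration only, here is a minimal Gaussian naive Bayes classifier in pure Python; the Gaussian likelihood, the variance floor, and the two hypothetical features (weekly study hours and attendance rate) are assumptions, not the paper's setup.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes: features are treated as independent
    normal variables given the class."""

    def __init__(self, var_floor=1e-9):
        self.var_floor = var_floor  # keeps variances positive for constant features
        self.priors, self.means, self.vars = {}, {}, {}

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        for label, rows in groups.items():
            self.priors[label] = len(rows) / len(X)
            cols = list(zip(*rows))  # one tuple of values per feature
            means = [sum(c) / len(c) for c in cols]
            self.means[label] = means
            self.vars[label] = [
                max(sum((v - m) ** 2 for v in c) / len(c), self.var_floor)
                for c, m in zip(cols, means)
            ]
        return self

    def _log_posterior(self, row, label):
        # log P(class) + sum of log N(x | mean, var); the evidence term P(x)
        # is omitted because it is the same for every class.
        logp = math.log(self.priors[label])
        for x, m, v in zip(row, self.means[label], self.vars[label]):
            logp -= 0.5 * math.log(2 * math.pi * v) + (x - m) ** 2 / (2 * v)
        return logp

    def predict(self, X):
        return [max(self.priors, key=lambda c: self._log_posterior(row, c)) for row in X]


# Hypothetical features: [weekly study hours, attendance rate].
X = [[2.0, 0.60], [3.0, 0.70], [2.5, 0.65], [8.0, 0.95], [7.5, 0.90], [9.0, 0.92]]
y = ["fail", "fail", "fail", "pass", "pass", "pass"]
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([[2.2, 0.62], [8.5, 0.93]]))
```

Working in log space avoids numerical underflow when many small per-feature likelihoods are multiplied together.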

Machine Learning Techniques Used for Text Mining (Técnicas de aprendizaje de máquina utilizadas para la minería de texto)

Investigación Bibliotecológica Archivonomía Bibliotecología e Información

10.22201/iibi.0187358xp.2017.71.57812

2017

Vol 31 (71)

pp. 103

Author(s):

Ángel Freddy Godoy Viera

Keyword(s):

Support Vector Machine

Naive Bayes

Nearest Neighbors

Naïve Bayes

Support Vector

K Nearest Neighbors

Self Organizing Maps

Self Organizing

Machine learning techniques remain widely used for text mining. For this article, a literature review was carried out on scientific journals published in 2010 and 2011, with the aim of identifying the main forms of machine learning used for text mining. Descriptive statistics were used to organize, summarize, and analyze the data found, and a summary description of the main techniques found was presented. The articles analyzed yielded 13 techniques applied to text mining; 83% of the articles mentioned 1 to 3 machine learning techniques, and the main techniques used by the authors of the articles studied were support vector machine (SVM), k-means (k-m), k-nearest neighbors (k-NN), naive Bayes (NB), and self-organizing maps (SOM). The pairs that appear most frequently are SVM/NB, SVM/k-NN, and SVM/decision tree.
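The review's descriptive statistics (share of articles mentioning 1 to 3 techniques, most frequent technique pairs) come down to simple counting. A minimal Python sketch, assuming each article has already been reduced to a list of technique names; the article lists below are hypothetical, not the review's data.

```python
from collections import Counter
from itertools import combinations


def pair_frequencies(articles):
    """Count how often each unordered pair of techniques appears in the same article."""
    counts = Counter()
    for techniques in articles:
        # set() ignores repeated mentions; sorted() makes ("nb", "svm") and
        # ("svm", "nb") the same key.
        for pair in combinations(sorted(set(techniques)), 2):
            counts[pair] += 1
    return counts


def share_mentioning(articles, low=1, high=3):
    """Fraction of articles that mention between `low` and `high` distinct techniques."""
    return sum(low <= len(set(a)) <= high for a in articles) / len(articles)


# Hypothetical per-article technique lists (illustration only, not the review's data).
articles = [
    ["svm", "nb"],
    ["svm", "k-nn", "nb"],
    ["svm", "decision tree"],
    ["k-means"],
    ["svm", "nb", "som", "k-means"],
]
print(pair_frequencies(articles).most_common(3))
print(share_mentioning(articles))
```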

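A Student Achievement Prediction Model Based on Learning Evaluation Using a Data Science Approach (Model Prediksi Prestasi Mahasiswa Berdasarkan Evaluasi Pembelajaran Menggunakan Pendekatan Data Science)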

Data Sciences Indonesia (DSI)

10.47709/dsi.v1i1.1168

2021

Vol 1 (1)

pp. 14-20

Author(s):

Tommy Tommy

Amir Mahmud Husein

Keyword(s):

Support Vector Machine

Logistic Regression

Data Science

Naive Bayes

Nearest Neighbors

Naïve Bayes

Support Vector

K Nearest Neighbors

A higher education institution is a unit that provides higher education, the level that follows secondary education in the formal education pathway. Learning achievement is one of the aspects by which a university's success in the learning process is assessed. This paper presents an analysis of the relationship between teaching and student achievement, carried out using a data science approach. The data analysis identified three important indicators for assessing learning achievement: pedagogy, professionalism, and personality. These three features were used as predictor (independent) variables to predict learning achievement. The decision tree algorithm achieved better accuracy than the k-nearest neighbors (KNN), logistic regression, support vector machine, and naive Bayes models, with an accuracy of 68%, followed by KNN at 66%, while each of the other proposed algorithms reached 55%.

A Clinical Decision Support Tool to Detect Invasive Ductal Carcinoma in Histopathological Images Using Support Vector Machines, Naïve-Bayes, and K-Nearest Neighbor Classifiers

Machine Learning and Artificial Intelligence - Frontiers in Artificial Intelligence and Applications

10.3233/faia200765

2020

Author(s):

Kyra Mikaela M. Lopez

Ma. Sheila A. Magboo

Keyword(s):

Support Vector Machines

Invasive Ductal Carcinoma

Naive Bayes

Ductal Carcinoma

Naïve Bayes

Machine Learning Techniques

Support Vector

K Nearest Neighbors

Support Tool

Vector Machines

This study aims to describe a model that applies image processing and traditional machine learning techniques, specifically Support Vector Machines, Naïve Bayes, and k-Nearest Neighbors, to identify whether a given breast histopathological image shows Invasive Ductal Carcinoma (IDC). The dataset consisted of 54,811 breast cancer image patches of 50 × 50 pixels: 39,148 IDC-negative and 15,663 IDC-positive. Feature extraction was accomplished using Oriented FAST and Rotated BRIEF (ORB) descriptors. Feature scaling was performed using min-max normalization, while k-means clustering on the ORB descriptors was used to generate the visual codebook. Automatic hyperparameter tuning using grid search cross-validation was implemented, although the tool also accepts user-supplied hyperparameter values for the SVM, Naïve Bayes, and k-NN models for experimentation. In addition to accuracy, the area under the precision-recall curve (AUPRC) and the Matthews correlation coefficient (MCC) were used to address the dataset imbalance. The results showed that SVM had the best overall performance, with accuracy = 0.7490, AUPRC = 0.5536, and MCC = 0.2924.
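Min-max normalization and MCC are both short formulas. The minimal Python sketch below implements them and uses the class counts stated in the abstract to show why accuracy alone is misleading here: a trivial model that labels every patch IDC-negative already reaches about 0.714 accuracy on the full dataset, yet its MCC is 0. The function names are illustrative, the zero-denominator convention for MCC is an assumption (a common one), and none of this is the authors' code.

```python
import math


def min_max_scale(values):
    """Rescale numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when any marginal total is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Class counts from the abstract: 15,663 IDC-positive and 39,148 IDC-negative patches.
pos, neg = 15_663, 39_148
# A trivial model that predicts "IDC-negative" for every patch:
print(f"always-negative accuracy = {neg / (pos + neg):.3f}")  # 0.714
print(f"always-negative MCC = {mcc(tp=0, tn=neg, fp=0, fn=pos):.3f}")  # 0.000
print(min_max_scale([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```

Against that 0.714 baseline, the reported SVM accuracy of 0.7490 is a modest gain, which is what the MCC of 0.2924 reflects.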

On the Analysis of Machine Learning Classifiers to Detect Traffic Congestion in Vehicular Networks

10.5753/eniac.2019.9290

2019

Author(s):

Lucas Carvalho

Maycon Silva

Edimilson Santos

Daniel Guidoni

Keyword(s):

Machine Learning

Traffic Congestion

Vehicular Networks

Naive Bayes

Naïve Bayes

Machine Learning Techniques

Support Vector

K Nearest Neighbors

Applied Machine Learning

Routing Methods

Problems related to traffic congestion and management have become common in many cities. Thus, vehicle re-routing methods have been proposed to minimize congestion. Some of these methods apply machine learning techniques, more specifically classifiers, to verify road conditions and detect congestion. However, better results may be obtained by applying a classifier better suited to the domain. To this end, this paper evaluates different classifiers applied to identifying the level of road congestion. Our main goal is to analyze the characteristics of each classifier in this task. The classifiers used in the experiments are Multilayer Perceptron (MLP), K-Nearest Neighbors (KNN), Decision Trees (J48), Support Vector Machines (SVM), Naive Bayes, and Tree-Augmented Naive Bayes.

Classification of Aggressive Movements Using Smartwatches

Sensors

10.3390/s20216377

2020

Vol 20 (21)

pp. 6377

Author(s):

Franck Tchuente

Natalie Baddour

Edward D. Lemaire

Keyword(s):

Machine Learning

Random Forest

Aggressive Behavior

Naive Bayes

Poor Performance

Naïve Bayes

Classification Model

Support Vector

Care Providers

K Nearest Neighbors

Recognizing aggressive movements is a challenging task in human activity recognition. Wearable smartwatch technology with machine learning may be a viable approach for human aggressive behavior classification. This research identified a viable classification model and feature selector (CM-FS) combination for separating aggressive from non-aggressive movements using smartwatch data and determined if only one smartwatch is sufficient for this task. A ranking method was used to select relevant CM-FS models across accuracy, sensitivity, specificity, precision, F-score, and Matthews correlation coefficient (MCC). The Waikato environment for knowledge analysis (WEKA) was used to run 6 machine learning classifiers (random forest, k-nearest neighbors (kNN), multilayer perceptron neural network (MP), support vector machine, naïve Bayes, decision tree) coupled with three feature selectors (ReliefF, InfoGain, Correlation). Microsoft Band 2 accelerometer and gyroscope data were collected during an activity circuit that included aggressive (punching, shoving, slapping, shaking) and non-aggressive (clapping hands, waving, handshaking, opening/closing a door, typing on a keyboard) tasks. A combination of kNN and ReliefF was the best CM-FS model for separating aggressive actions from non-aggressive actions, with 99.6% accuracy, 98.4% sensitivity, 99.8% specificity, 98.9% precision, 0.987 F-score, and 0.984 MCC. kNN and random forest classifiers, combined with any of the feature selectors, generated the top models. Models with naïve Bayes or support vector machines had poor performance for sensitivity, F-score, and MCC. Wearing the smartwatch on the dominant wrist produced the best single-watch results. The kNN and ReliefF combination demonstrated that this smartwatch-based approach is a viable solution for identifying aggressive behavior. 
This wrist-based wearable sensor approach could be used by care providers in settings where people suffer from dementia or mental health disorders, where random aggressive behaviors often occur.
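ReliefF, the feature selector in the best model above, ranks features by how well they separate nearby samples of different classes. To show the underlying idea, here is the simpler two-class Relief weight update in pure Python. ReliefF generalizes it with k nearest hits and misses and multi-class handling, so this is not WEKA's implementation. Visiting every sample instead of sampling at random, Manhattan distance, and the two hypothetical features (a scaled peak acceleration that separates the classes and an uninformative one) are assumptions.

```python
def relief_weights(X, y):
    """Two-class Relief: a feature gains weight when it differs on the nearest
    miss (other class) and loses weight when it differs on the nearest hit
    (same class). Assumes features scaled to [0, 1] and >= 2 samples per class."""
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def dist(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))  # Manhattan distance

    for i in range(n):  # every sample is visited once (no random sampling)
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            weights[f] += (abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])) / n
    return weights


# Hypothetical scaled features: [peak acceleration, uninformative feature].
X = [[0.90, 0.1], [0.80, 0.7], [0.95, 0.4], [0.10, 0.2], [0.20, 0.8], [0.15, 0.5]]
y = ["aggressive"] * 3 + ["non-aggressive"] * 3
w = relief_weights(X, y)
ranking = sorted(range(len(w)), key=lambda f: w[f], reverse=True)
print(w, ranking)
```

The informative feature ends up with a positive weight and the noise feature with a negative one, so a selector would keep the first and drop the second.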

Prediction of Hepatitis Disease Using K-Nearest Neighbors, Naive Bayes, Support Vector Machine, Multi-Layer Perceptron and Random Forest

2021 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD)

10.1109/icict4sd50815.2021.9397013

2021

Author(s):

Md. Julker Nayeem

Sohel Rana

Farjana Alam

Md. Ataur Rahman

Keyword(s):

Support Vector Machine

Random Forest

Naive Bayes

Nearest Neighbors

Naïve Bayes

Support Vector

Multi Layer Perceptron

K Nearest Neighbors

Text Mining for the Analysis of the Emotional Profile of Users of an Empathetic Game (Mineração de Texto para a Análise do Perfil Emocional de Usuários de Jogo Empático)

10.14210/cotb.v12.p370-377

2021

Author(s):

Leonardo Dias Martins

Fabíola Pantoja Oliveira Araújo

Keyword(s):

Naive Bayes

Nearest Neighbors

Naïve Bayes

Support Vector

The Internet

Classification Algorithms

K Nearest Neighbors

The One

Radial Kernel

Every day, large amounts of data circulate on the Internet in the form of images, videos, and text, and this information needs to be analyzed and extracted automatically. This work therefore presents a case study that applies text mining to extract emotional and sentiment profiles from comments by users of the game Last Day of June, and presents the results and the information extracted through sentiment analysis. Three classification algorithms, Naive Bayes, Support Vector Machine (SVM), and K-Nearest Neighbors (KNN), were used to predict the class of each comment according to the emotions or sentiments identified in the analysis. SVM with a radial kernel achieved the best accuracy (79%), followed by KNN with the 3 nearest neighbors (75%) and Naive Bayes (62%).
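The abstract reports that KNN worked best with the 3 nearest neighbors. A minimal pure-Python k-nearest-neighbors classifier with majority voting is sketched below; Euclidean distance, the tie-breaking rule, and the hypothetical 2-D vectors (standing in for whatever text features the authors extracted, which the abstract does not specify) are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Training indices sorted by Euclidean distance to the query (math.dist needs Python 3.8+).
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # On a tied vote, the label encountered first (the closer neighbor's) wins.
    return votes.most_common(1)[0][0]


# Hypothetical 2-D comment vectors, e.g. [positive-word score, negative-word score].
train_X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]
train_y = ["positive"] * 3 + ["negative"] * 3
print(knn_predict(train_X, train_y, [0.85, 0.15], k=3))
```

An odd k such as 3 avoids tied votes in two-class problems, which is one practical reason small odd values are common choices.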

Comparison of Support Vector Machine, Naïve Bayes and Logistic Regression for Assessing the Necessity for Coronary Angiography

International Journal of Environmental Research and Public Health

10.3390/ijerph17186449

2020

Vol 17 (18)

pp. 6449

Author(s):

Parastoo Golpour

Majid Ghayour-Mobarhan

Azadeh Saki

Habibollah Esmaily

Ali Taghipour

...

Keyword(s):

Machine Learning

Decision Making

Support Vector Machine

Logistic Regression

Coronary Angiography

Naive Bayes

Area Under The Curve

Naïve Bayes

Support Vector

Bayes Model

(1) Background: Coronary angiography is considered the most reliable method for diagnosing cardiovascular disease. However, angiography is an invasive procedure that carries a risk of complications; hence, it would be preferable to apply an appropriate method to determine whether angiography is necessary. The objective of this study was to compare support vector machine, naïve Bayes, and logistic regression models to determine the diagnostic factors that can predict the need for coronary angiography. These models are machine learning algorithms. Machine learning is considered a branch of artificial intelligence; its aim is to design and develop algorithms that allow computers to improve their performance in data analysis and decision making. The process involves analyzing past experience to find practical and helpful regularities and patterns that a human may overlook. (2) Materials and Methods: This cross-sectional study was performed on 1187 candidates for angiography referred to Ghaem Hospital, Mashhad, Iran, from 2011 to 2012. Logistic regression, naïve Bayes, and support vector machine models were applied to determine whether they could predict the results of angiography. The sensitivity, specificity, positive and negative predictive values, AUC (area under the curve), and accuracy of all three models were then computed in order to compare them. All analyses were performed using R 3.4.3 software (R Core Team; Auckland, New Zealand) with the help of other software packages including receiver operating characteristic (ROC), caret, e1071, and rminer. (3) Results: The areas under the curve for logistic regression, naïve Bayes, and support vector machine were similar: 0.76, 0.74, and 0.75, respectively.
Thus, in terms of model parsimony and simplicity of application, the naïve Bayes model with three variables performed best compared with the logistic regression model with seven variables and the support vector machine with six variables. (4) Conclusions: Gender, age, and fasting blood glucose (FBG) were found to be the most important factors for predicting the result of coronary angiography. The naïve Bayes model performed well using these three variables alone, and they are also important variables in the other two models. Given their acceptable predictive performance, the models can be used as pragmatic, cost-effective, and valuable methods to support physicians in decision making.
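The three models are compared mainly by AUC. The study itself used R; as an independent illustration, here is a minimal Python sketch that computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, counting ties as one half (this equals the normalized Mann-Whitney U statistic). The scores and labels in the usage lines are hypothetical.

```python
def auc(scores, labels):
    """AUC = P(score of a random positive > score of a random negative), ties count 0.5.

    Pairwise O(P * N) version, adequate for small samples. `labels` uses 1 for a
    positive case (e.g. angiography indicated) and 0 for a negative one.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores and true labels (illustration only).
scores = [0.90, 0.80, 0.40, 0.35, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]
print(f"AUC = {auc(scores, labels):.3f}")  # 8 of 9 pairs ranked correctly
```

Because AUC depends only on the ranking of scores, it lets models with different numbers of variables, like the three above, be compared on the same footing.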

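Sentiment Analysis of the Dewan Perwakilan Rakyat (Indonesian House of Representatives) Using Particle Swarm Optimization-Based Classification Algorithms (Analisis Sentimen Dewan Perwakilan Rakyat Dengan Algoritma Klasifikasi Berbasis Particle Swarm Optimization)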

JOINTECS (Journal of Information Technology and Computer Science)

10.31328/jointecs.v5i2.1362

2020

Vol 5 (2)

pp. 61

Author(s):

Anas Faisal

Yuris Alkhalifi

Achmad Rifai

Windu Gata

Keyword(s):

Support Vector Machine

Particle Swarm Optimization

Cross Validation

Naive Bayes

Particle Swarm

Area Under The Curve

Naïve Bayes

Support Vector

Swarm Optimization

Fold Cross Validation

The use of the internet, especially social media, has become part of the nation's civic life. One reason is that many members of the House of Representatives of the Republic of Indonesia (DPR RI) share ideas and policies, or comment on government policies, through social media. This study was conducted to measure opinion by separating positive from negative sentiment toward the DPR RI. The data used in this study were obtained by crawling the social media platform Twitter. Two algorithms were used, Support Vector Machine (SVM) and Naive Bayes (NB), and each was optimized using Particle Swarm Optimization (PSO). In k-fold cross-validation, SVM and NB achieved accuracies of 71.04% and 70.69%, with Area Under the Curve (AUC) values of 0.817 and 0.661. With PSO, k-fold cross-validation gave SVM and NB accuracies of 75.03% and 73.49%, with AUC values of 0.808 and 0.719. PSO thus increased the accuracy of SVM by 3.99 percentage points and that of NB by 2.8 percentage points. Of the two algorithms, the highest accuracy was achieved by SVM with PSO, at 75.03%.
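The abstract does not say which SVM or NB parameters PSO tuned, so the following is only a generic sketch of global-best PSO in pure Python, minimizing a stand-in objective (the sphere function). The inertia weight `w`, acceleration coefficients `c1` and `c2`, swarm size, and iteration count are assumed typical values; in a tuning setting the objective would instead be something like the negative cross-validated accuracy of the classifier at the candidate parameters.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization over the box [lo, hi]^dim."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # each particle's best position so far
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position found by the swarm

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    return sum(v * v for v in x)  # minimum 0 at the origin


best, best_val = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, best_val)
```

A fixed seed makes runs reproducible, which matters when comparing a PSO-tuned model against an untuned baseline as the study does.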
