The Impact of Simulated Spectral Noise on Random Forest and Oblique Random Forest Classification Performance

2018 ◽  
Vol 2018 ◽  
pp. 1-8 ◽  
Author(s):  
Na’eem Hoosen Agjee ◽  
Onisimo Mutanga ◽  
Kabir Peerbhay ◽  
Riyad Ismail

Hyperspectral datasets contain spectral noise, which adversely affects a classifier's ability to generalize accurately. Although machine learning algorithms are regarded as robust classifiers that generalize well under unfavourable noisy conditions, the extent of this robustness is poorly understood. This study aimed to evaluate the influence of simulated spectral noise (10%, 20%, and 30%) on random forest (RF) and oblique random forest (oRF) classification performance using two node-splitting models (ridge regression (RR) and support vector machines (SVM)) to discriminate healthy and low infested water hyacinth plants. Results showed that RF was slightly influenced by simulated noise, with classification accuracies decreasing for week one and week two with the addition of 30% noise. In comparison to RF, oRF-RR and oRF-SVM yielded higher test accuracies (oRF-RR: 5.36%–7.15%; oRF-SVM: 3.58%–5.36%) and test kappa coefficients (oRF-RR: 10.72%–14.29%; oRF-SVM: 7.15%–10.72%). Notably, oRF-RR test accuracies and kappa coefficients remained consistent irrespective of simulated noise level for week one and week two, while similar results were achieved for week three using oRF-SVM. Overall, this study demonstrated that oRF-RR can be regarded as a robust classification algorithm that is not influenced by noisy spectral conditions.
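The abstract above reports both test accuracy and the kappa coefficient. As a minimal sketch of how the two relate, the following computes Cohen's kappa from scratch using only the Python standard library. The function name `cohen_kappa` and the toy labels are assumptions for illustration and are not taken from the study. When the true classes of a two-class problem are balanced, chance agreement is 0.5 and kappa reduces to 2 × accuracy − 1, which may explain why the reported kappa gains are roughly twice the accuracy gains.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed agreement p_o is the plain classification accuracy.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement p_e comes from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one class is present")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical balanced two-class test set (labels are illustrative only).
y_true = ["healthy"] * 4 + ["infested"] * 4
y_pred = ["healthy", "healthy", "healthy", "infested",
          "infested", "infested", "infested", "healthy"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
kappa = cohen_kappa(y_true, y_pred)
print(f"accuracy={accuracy:.2f} kappa={kappa:.2f}")
```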

2019 ◽  
Vol 11 (11) ◽  
pp. 3222 ◽  
Author(s):  
Pascal Schirmer ◽  
Iosif Mporas

In this paper we evaluate several well-known and widely used machine learning algorithms for regression in the energy disaggregation task. Specifically, the Non-Intrusive Load Monitoring approach was considered and the K-Nearest-Neighbours, Support Vector Machines, Deep Neural Networks and Random Forest algorithms were evaluated across five datasets using seven different sets of statistical and electrical features. The experimental results demonstrated the importance of selecting both appropriate features and regression algorithms. Analysis on device level showed that linear devices can be disaggregated using statistical features, while for non-linear devices the use of electrical features significantly improves the disaggregation accuracy, as non-linear appliances have non-sinusoidal current draw and thus cannot be well parametrized only by their active power consumption. The best performance in terms of energy disaggregation accuracy was achieved by the Random Forest regression algorithm.


2018 ◽  
Vol 5 (2) ◽  
pp. 175-185
Author(s):  
Akhmad Syukron ◽  
Agus Subekti

Abstract: Credit scoring has become one of the main ways for a financial institution to assess credit risk, improve cash flow, reduce the possibility of risk, and make managerial decisions. One of the problems faced in credit scoring is imbalance in the class distribution of the dataset. Class imbalance can be addressed with resampling methods such as oversampling, undersampling, and hybrids that combine both sampling approaches. The method proposed in this study applies Random Over-Under Sampling with Random Forest to improve classification accuracy for credit scoring on the German Credit dataset. The test results show that classification without resampling yields an average accuracy of 70% across all classifiers. Random Forest achieves better accuracy than the other methods compared, with an accuracy of 0.76 (76%). Applying Random Over-Under Sampling with Random Forest improves accuracy by 14.1 percentage points, to 0.901 (90.1%). The results show that resampling with Random Over-Under Sampling in the Random Forest algorithm can effectively improve accuracy on imbalanced classification for credit scoring on the German Credit dataset. Keywords: Class Imbalance, Credit Scoring, Random Forest, Classification, Resampling, Random Over-Under Sampling
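The hybrid resampling idea described in this abstract can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation. The function name `random_over_under_sample`, the `target_per_class` parameter, and the toy 700/300 data are assumptions made for the example.

```python
import random
from collections import defaultdict


def random_over_under_sample(rows, labels, target_per_class, seed=0):
    """Resample every class to exactly target_per_class examples.

    Classes at or above the target are undersampled without replacement;
    smaller classes keep all originals and add random duplicates.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) >= target_per_class:
            # Undersample: draw a subset without replacement.
            chosen = rng.sample(members, target_per_class)
        else:
            # Oversample: keep every original, then draw duplicates.
            extra = rng.choices(members, k=target_per_class - len(members))
            chosen = members + extra
        out_rows.extend(chosen)
        out_labels.extend([label] * target_per_class)
    return out_rows, out_labels


# Hypothetical imbalanced data: 700 "good" and 300 "bad" examples.
rows = [[i] for i in range(1000)]
labels = ["good"] * 700 + ["bad"] * 300
balanced_rows, balanced_labels = random_over_under_sample(
    rows, labels, target_per_class=500
)
print(balanced_labels.count("good"), balanced_labels.count("bad"))
```

In practice, resampling of this kind is usually applied to the training split only, so that the test accuracy still reflects the original class distribution.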


2020 ◽  
Vol 492 (4) ◽  
pp. 5075-5088 ◽  
Author(s):  
R M Arnason ◽  
P Barmby ◽  
N Vulic

ABSTRACT Identifying X-ray binary (XRB) candidates in nearby galaxies requires distinguishing them from possible contaminants including foreground stars and background active galactic nuclei. This work investigates the use of supervised machine learning algorithms to identify high-probability XRB candidates. Using a catalogue of 943 Chandra X-ray sources in the Andromeda galaxy, we trained and tested several classification algorithms using the X-ray properties of 163 sources with previously known types. Amongst the algorithms tested, we find that random forest classifiers give the best performance and work better in a binary classification (XRB/non-XRB) context compared to the use of multiple classes. Evaluating our method by comparing with classifications from visible-light and hard X-ray observations as part of the Panchromatic Hubble Andromeda Treasury, we find compatibility at the 90 per cent level, although we caution that the number of sources in common is rather small. The estimated probability that an object is an XRB agrees well between the random forest binary and multiclass approaches and we find that the classifications with the highest confidence are in the XRB class. The most discriminating X-ray bands for classification are the 1.7–2.8, 0.5–1.0, 2.0–4.0, and 2.0–7.0 keV photon flux ratios. Of the 780 unclassified sources in the Andromeda catalogue, we identify 16 new high-probability XRB candidates and tabulate their properties for follow-up.


2018 ◽  
Author(s):  
Artur Bąk ◽  
Jakub Segen ◽  
Kamil Wereszczyński ◽  
Pawel Mielnik ◽  
Marcin Fojcik ◽  
...  

Identifying separate parts of ultrasound images, such as bone and skin, plays a crucial role in the synovitis detection task. This paper presents a detector of bone and skin regions in the form of a classifier trained on a set of annotated images. Selected regions are labelled skin, bone, or none. The feature vectors used by the classifier are assigned to image pixels by passing the image through a bank of linear and nonlinear filters. The filters include a Gaussian blurring filter, its first- and second-order derivatives, and the Laplacian, as well as positive and negative threshold operations applied to the filtered images. We compared multiple supervised learning classifiers, including Naive Bayes, k-Nearest Neighbour, Decision Trees, Random Forest, AdaBoost, and Support Vector Machines (SVM) with various kernels, using four classification performance scores and computation time. The Random Forest classifier was selected for final use, as it gives the best overall evaluation results.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Tom Elliot ◽  
Robert Morse ◽  
Duane Smythe ◽  
Ashley Norris

Abstract: It is 50 years since Sieveking et al. published their pioneering research in Nature on the geochemical analysis of artefacts from Neolithic flint mines in southern Britain. In the decades since, geochemical techniques to source stone artefacts have flourished globally, with a renaissance in recent years from new instrumentation, data analysis, and machine learning techniques. Despite the interest in these latter approaches, there has been variation in the quality with which these methods have been applied. Using the case study of flint artefacts and geological samples from England, we present a robust and objective evaluation of three popular techniques, Random Forest, K-Nearest-Neighbour, and Support Vector Machines, and present a pipeline for their appropriate use. When evaluated correctly, the results establish high model classification performance, with Random Forest leading with an average accuracy of 85% (measured through F1 Scores), and with Support Vector Machines following closely. The methodology developed in this paper demonstrates the potential to significantly improve on previous approaches, particularly in removing bias, and providing greater means of evaluation than previously utilised.
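The abstract measures performance through F1 scores but does not say how they are averaged across classes. As a sketch under that caveat, the following computes a macro-averaged F1 (the unweighted mean of per-class F1) with the Python standard library. The choice of macro averaging, the function name `macro_f1`, and the source labels "A", "B", and "C" are assumptions for illustration.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical geological sources for six samples (illustrative only).
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "B", "C"]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.4f}")
```

Macro averaging weights every class equally, which can be a reasonable choice when some sources have far fewer samples than others.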


2020 ◽  
Vol 6 (1) ◽  
pp. 7-14
Author(s):  
Achmad Udin Zailani ◽  
Nugraha Listiana Hanun

Credit is the provision of money or equivalent claims, based on a loan agreement between a bank and another party, that requires the borrower to repay the debt with interest after a certain period of time. The Mitra Sejahtera Cooperative faces problems with borrowers' payments on credit arrears. Before the cooperative approves the credit proposed by a debtor, it conducts a credit analysis of the borrower to decide whether the application is approved or rejected. This study aims to predict creditworthiness by applying the Random Forest classification algorithm, in order to provide a solution for determining creditworthiness. The research method is absolute experimental research, which examines the impact resulting from experiments on applying a decision tree model using the Random Forest classification algorithm approach. The results show that the Random Forest classification algorithm is able to distinguish problem credit from non-problem debtors with an accuracy of 87.88%. In addition, the decision tree model was able to improve accuracy in analyzing the creditworthiness of prospective borrowers.


Today the world is gripped by fear of a highly infectious disease caused by a newly discovered coronavirus and termed COVID-19. Coronaviruses are a large group of viruses that can severely affect humans. The world bears testimony to the contagious nature of the disease and the rapidity of its spread. 50l people were infected and 30l people died worldwide due to this pandemic, which has led people to fear the epidemic around them. The death rate is higher among men than among women. News of the pandemic has caught the world's attention and gained momentum on almost all media platforms. A great deal of both true and fake news about COVID-19 has been created and spread on social media, which has become a major concern for the general public who access it. Spreading such trending news on social media has become a new way of gaining popularity and a fan base. At the same time, it is undeniable that the spread of such fake news creates considerable confusion and fear among the public. To stop such rumours, the detection of fake news has become of utmost importance. Machine learning classification algorithms can be an appropriate means of building a model that effectively detects fake news on social media. In the context of the COVID-19 pandemic, we collected training data and trained machine learning models to automatically detect fake news about the coronavirus. The algorithms used in this investigation are the Naïve Bayes classifier and the Random Forest classification algorithm. A separate model is created for each classifier after data preparation and feature extraction. The results obtained are compared to identify the more accurate model. Our experiments on a benchmark dataset showed promising results for the random forest classification model, with an overall accuracy of 94.06%. This experimental evaluation can help the general public avoid unnecessary fear and understand the impact of fast-spreading, misleading fake news.


2021 ◽  
Vol 2095 (1) ◽  
pp. 012058
Author(s):  
Xiaoyu Xian ◽  
Haichuan Tang ◽  
Yin Tian ◽  
Qi Liu ◽  
Yuming Fan

Abstract: This paper addresses electric motor fault diagnosis using supervised machine learning classification. A total of 15 distinct fault types are classified, and multi-label strategies are used to classify concurrent faults. We explored, developed, and compared the performance of different types of binary (fault/non-fault), multi-class (fault type), and multi-label (single fault versus combination fault) classifiers. To evaluate the effectiveness of fault identification and classification, we used different supervised machine learning methods, including random forest classification, support vector machines, and neural network classification. Through experiments, we compared these methods over 4 classification regimes and summarize the most suitable machine learning algorithms for different aspects of health diagnosis in the traction motor domain.
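To make the distinction between the binary, multi-class, and multi-label regimes concrete, the sketch below derives binary and multi-label targets from the same fault annotations. It is an illustration only and does not reproduce the paper's setup. The three fault names stand in for the 15 fault types, which the abstract does not list, and the function name `encode_multilabel` is hypothetical.

```python
def encode_multilabel(fault_sets, fault_types):
    """Encode each observation's set of faults as a 0/1 indicator vector."""
    index = {name: i for i, name in enumerate(fault_types)}
    encoded = []
    for faults in fault_sets:
        row = [0] * len(fault_types)
        for fault in faults:
            row[index[fault]] = 1
        encoded.append(row)
    return encoded


# Hypothetical fault types and observations (illustrative only).
fault_types = ["bearing", "rotor_bar", "stator_winding"]
observations = [
    set(),                          # healthy motor
    {"bearing"},                    # single fault
    {"bearing", "stator_winding"},  # concurrent (combination) fault
]

# Multi-label target: one indicator per fault type, several may be active.
multilabel = encode_multilabel(observations, fault_types)
# Binary target: fault versus non-fault.
binary = [int(any(row)) for row in multilabel]
# Flag for combination faults: more than one active indicator.
combination = [int(sum(row) > 1) for row in multilabel]
print(multilabel, binary, combination)
```

A multi-class target, by contrast, would assign exactly one fault type per observation, so it cannot represent the concurrent case in the last row without defining a separate class for every combination.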

