On-line gradient learning algorithms for K-nearest neighbor classifiers

Author(s):  
Sergio Bermejo ◽  
Joan Cabestany

Knowledge extraction within a healthcare field is a very challenging task since we are having many problems such as noise and imbalanced datasets. They are obtained from clinical studies where uncertainty and variability are popular. Lately, a wide number of machine learning algorithms are considered and evaluated to check their validity of being used in the medical field. Usually, the classification algorithms are compared against medical experts who are specialized in certain disease diagnoses and provide an effective methodological evaluation of classifiers by applying performance metrics. The performance metrics contain four criteria: accuracy, sensitivity, and specificity forming the confusion matrix of each used algorithm. We have utilized eight different well-known machine learning algorithms to evaluate their performances in six different medical datasets. Based on the experimental results we conclude that the XGBoost and K-Nearest Neighbor classifiers were the best overall among the used datasets and signs can be used for diagnosing various diseases.


Author(s):  
Sandy C. Lauguico ◽  
◽  
Ronnie S. Concepcion II ◽  
Jonnel D. Alejandrino ◽  
Rogelio Ruzcko Tobias ◽  
...  

The arising problem on food scarcity drives the innovation of urban farming. One of the methods in urban farming is the smart aquaponics. However, for a smart aquaponics to yield crops successfully, it needs intensive monitoring, control, and automation. An efficient way of implementing this is the utilization of vision systems and machine learning algorithms to optimize the capabilities of the farming technique. To realize this, a comparative analysis of three machine learning estimators: Logistic Regression (LR), K-Nearest Neighbor (KNN), and Linear Support Vector Machine (L-SVM) was conducted. This was done by modeling each algorithm from the machine vision-feature extracted images of lettuce which were raised in a smart aquaponics setup. Each of the model was optimized to increase cross and hold-out validations. The results showed that KNN having the tuned hyperparameters of n_neighbors=24, weights='distance', algorithm='auto', leaf_size = 10 was the most effective model for the given dataset, yielding a cross-validation mean accuracy of 87.06% and a classification accuracy of 91.67%.


2012 ◽  
Vol 468-471 ◽  
pp. 2504-2509
Author(s):  
Qiang Da Yang ◽  
Zhen Quan Liu

The on-line estimation of some key hard-to-measure process variables by using soft-sensor technique has received extensive concern in industrial production process. The precision of on-line estimation is closely related to the accuracy of soft-sensor model, while the accuracy of soft-sensor model depends strongly on the accuracy of modeling data. Aiming at the special character of the definition for outliers in soft-sensor modeling process, an outlier detection method based on k-nearest neighbor (k-NN) is proposed in this paper. The proposed method can be realized conveniently from data without priori knowledge and assumption of the process. The simulation result and practical application show that the proposed outlier detection method based on k-NN has good detection effect and high application value.


2010 ◽  
Vol 5 (2) ◽  
pp. 133-137 ◽  
Author(s):  
Mohammed J. Islam ◽  
Q. M. Jonathan Wu ◽  
Majid Ahmadi ◽  
Maher A. SidAhmed

2019 ◽  
Vol 11 (8) ◽  
pp. 976
Author(s):  
Nicholas M. Enwright ◽  
Lei Wang ◽  
Hongqing Wang ◽  
Michael J. Osland ◽  
Laura C. Feher ◽  
...  

Barrier islands are dynamic environments because of their position along the marine–estuarine interface. Geomorphology influences habitat distribution on barrier islands by regulating exposure to harsh abiotic conditions. Researchers have identified linkages between habitat and landscape position, such as elevation and distance from shore, yet these linkages have not been fully leveraged to develop predictive models. Our aim was to evaluate the performance of commonly used machine learning algorithms, including K-nearest neighbor, support vector machine, and random forest, for predicting barrier island habitats using landscape position for Dauphin Island, Alabama, USA. Landscape position predictors were extracted from topobathymetric data. Models were developed for three tidal zones: subtidal, intertidal, and supratidal/upland. We used a contemporary habitat map to identify landscape position linkages for habitats, such as beach, dune, woody vegetation, and marsh. Deterministic accuracy, fuzzy accuracy, and hindcasting were used for validation. The random forest algorithm performed best for intertidal and supratidal/upland habitats, while the K-nearest neighbor algorithm performed best for subtidal habitats. A posteriori application of expert rules based on theoretical understanding of barrier island habitats enhanced model results. For the contemporary model, deterministic overall accuracy was nearly 70%, and fuzzy overall accuracy was over 80%. For the hindcast model, deterministic overall accuracy was nearly 80%, and fuzzy overall accuracy was over 90%. We found machine learning algorithms were well-suited for predicting barrier island habitats using landscape position. Our model framework could be coupled with hydrodynamic geomorphologic models for forecasting habitats with accelerated sea-level rise, simulated storms, and restoration actions.


2019 ◽  
Vol 108 (12) ◽  
pp. 2087-2111 ◽  
Author(s):  
Eric Bax ◽  
Lingjie Weng ◽  
Xu Tian

Diagnostics ◽  
2019 ◽  
Vol 9 (3) ◽  
pp. 104 ◽  
Author(s):  
Ahmed ◽  
Yigit ◽  
Isik ◽  
Alpkocak

Leukemia is a fatal cancer and has two main types: Acute and chronic. Each type has two more subtypes: Lymphoid and myeloid. Hence, in total, there are four subtypes of leukemia. This study proposes a new approach for diagnosis of all subtypes of leukemia from microscopic blood cell images using convolutional neural networks (CNN), which requires a large training data set. Therefore, we also investigated the effects of data augmentation for an increasing number of training samples synthetically. We used two publicly available leukemia data sources: ALL-IDB and ASH Image Bank. Next, we applied seven different image transformation techniques as data augmentation. We designed a CNN architecture capable of recognizing all subtypes of leukemia. Besides, we also explored other well-known machine learning algorithms such as naive Bayes, support vector machine, k-nearest neighbor, and decision tree. To evaluate our approach, we set up a set of experiments and used 5-fold cross-validation. The results we obtained from experiments showed that our CNN model performance has 88.25% and 81.74% accuracy, in leukemia versus healthy and multiclass classification of all subtypes, respectively. Finally, we also showed that the CNN model has a better performance than other wellknown machine learning algorithms.


Sign in / Sign up

Export Citation Format

Share Document