The Effect of Block Size, Training Set and K-Value in the Classification of Food Grains Using HSI Color Model

Optimization of K Value in KNN Algorithm for Spam and Ham Email Classification

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i2.1845 ◽

2020 ◽

Vol 4 (2) ◽

pp. 377-383

Author(s):

Eko Laksono ◽

Achmad Basuki ◽

Fitra Bachtiar

Keyword(s):

Frequency Distribution ◽

Confusion Matrix ◽

High Accuracy ◽

K Value ◽

A Value ◽

Optimal Value ◽

Classification Evaluation ◽

Email Spam ◽

Email Classification

There are many cases of email abuse that have the potential to harm others. This abuse is commonly known as spam, which contains advertisements, phishing scams, and even malware. This study aims to classify emails as spam or ham using the KNN method in an effort to reduce the amount of spam. KNN classifies an email as spam or ham, and the study checks this classification with different K values. In the classification evaluation using a confusion matrix, KNN with K = 1 had the highest accuracy, 91.4%. The study finds that optimizing the K value in KNN using frequency distribution clustering yields an accuracy of 100%, while k-means clustering yields an accuracy of 99%. Based on these accuracy values, frequency distribution clustering and k-means clustering can be used to find the optimal K value for KNN in spam email classification.
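To make the role of K concrete, the following is a minimal sketch of a KNN classifier scored at several K values. The two-dimensional feature vectors, the labels, and the candidate K values are made-up assumptions for illustration; the paper's email features and its clustering-based K optimization are not reproduced here.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Return the majority label among the k training points nearest to query."""
    # train is a list of (feature_vector, label) pairs.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def accuracy(train, test, k):
    """Fraction of test points classified correctly for a given k."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)

# Toy data (hypothetical): two well-separated groups.
train = [((0.0, 0.0), "ham"), ((0.1, 0.2), "ham"), ((0.2, 0.1), "ham"),
         ((1.0, 1.0), "spam"), ((0.9, 1.1), "spam"), ((1.1, 0.9), "spam")]
test = [((0.05, 0.1), "ham"), ((1.0, 0.95), "spam")]

# Score each candidate K and keep the first one with the highest accuracy.
scores = {k: accuracy(train, test, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
```

On real data the accuracy usually varies with K, which is why the choice of K is treated as an optimization problem.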

UAV Image Multi-Labeling with Data-Efficient Transformers

Applied Sciences ◽

10.3390/app11093974 ◽

2021 ◽

Vol 11 (9) ◽

pp. 3974

Author(s):

Laila Bashmal ◽

Yakoub Bazi ◽

Mohamad Mahmoud Al Rahhal ◽

Haikel Alhichri ◽

Naif Al Ajlan

Keyword(s):

Data Augmentation ◽

Feature Representation ◽

Aerial Image ◽

Remote Sensing Images ◽

Training Set ◽

Proposed Model ◽

Class Labels ◽

Using Data ◽

Uav Image

In this paper, we present an approach for the multi-label classification of remote sensing images based on data-efficient transformers. During the training phase, we generated a second view for each image from the training set using data augmentation. Then, both the image and its augmented version were reshaped into a sequence of flattened patches and fed to the transformer encoder. The latter extracts a compact feature representation from each image with the help of a self-attention mechanism, which can handle the global dependencies between different regions of the high-resolution aerial image. On top of the encoder, we mounted two classifiers, a token classifier and a distiller classifier. During training, we minimized a global loss consisting of two terms, each corresponding to one of the two classifiers. In the test phase, we took the average of the two classifiers' outputs to produce the final class labels. Experiments on two datasets acquired over the cities of Trento and Civezzano with a ground resolution of two centimeters demonstrated the effectiveness of the proposed model.
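The test-time fusion of the two classifiers can be sketched as follows. This is an illustration under stated assumptions: the abstract does not say whether probabilities or raw scores are averaged, so the sketch averages per-class sigmoid probabilities, and the 0.5 decision threshold and the function name `fuse_heads` are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fuse_heads(token_logits, distill_logits, threshold=0.5):
    """Average the per-class probabilities of two heads, then threshold.

    Each class is decided independently, as is usual in multi-label
    classification. The threshold value is an assumption.
    """
    probs = [(sigmoid(a) + sigmoid(b)) / 2.0
             for a, b in zip(token_logits, distill_logits)]
    return [int(p >= threshold) for p in probs]

# Made-up scores for three classes from the two heads.
labels = fuse_heads([2.0, -1.0, 0.3], [1.5, -2.0, -0.4])
```

In the third class the two heads disagree, and the averaged probability falls just below the threshold, so the label is not assigned.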

Classification of multiwavelength transients with Machine Learning

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/staa3873 ◽

2020 ◽

Author(s):

K Sooknunan ◽

M Lochner ◽

Bruce A Bassett ◽

H V Peiris ◽

R Fender ◽

...

Keyword(s):

Machine Learning ◽

Small Sample ◽

Light Curves ◽

Machine Learning Techniques ◽

Optical Data ◽

Test Time ◽

Test Accuracy ◽

Training Set ◽

The Impact

With the advent of powerful telescopes such as the Square Kilometer Array and the Vera C. Rubin Observatory, we are entering an era of multiwavelength transient astronomy that will lead to a dramatic increase in data volume. Machine learning techniques are well suited to address this data challenge and rapidly classify newly detected transients. We present a multiwavelength classification algorithm consisting of three steps: (1) interpolation and augmentation of the data using Gaussian processes; (2) feature extraction using wavelets; (3) classification with random forests. Augmentation provides improved performance at test time by balancing the classes and adding diversity into the training set. In the first application of machine learning to the classification of real radio transient data, we apply our technique to the Green Bank Interferometer and other radio light curves. We find we are able to accurately classify most of the eleven classes of radio variables and transients after just eight hours of observations, achieving an overall test accuracy of 78%. We fully investigate the impact of the small sample size of 82 publicly available light curves and use data augmentation techniques to mitigate the effect. We also show that on a significantly larger simulated representative training set the algorithm achieves an overall accuracy of 97%, illustrating that the method is likely to provide excellent performance on future surveys. Finally, we demonstrate the effectiveness of simultaneous multiwavelength observations by showing how incorporating just one optical data point into the analysis improves the accuracy of the worst performing class by 19%.

Study on Consistency Analysis in Text Categorization

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.539.181 ◽

2014 ◽

Vol 539 ◽

pp. 181-184

Author(s):

Wan Li Zuo ◽

Zhi Yan Wang ◽

Ning Ma ◽

Hong Liang

Keyword(s):

Text Categorization ◽

Training Data ◽

Experimental Result ◽

Final Decision ◽

Consistency Analysis ◽

Training Set ◽

Weak Classifier ◽

Data Set ◽

Basic Premise

Accurate classification of text is a basic premise of efficiently extracting various types of information from the Web and properly utilizing network resources. In this paper, a new text classification method is proposed. The consistency analysis method is an iterative algorithm that trains different classifiers (weak classifiers) on the same training set and then gathers these classifiers to test how consistently the various classification methods treat the same text, thereby exposing the knowledge of each type of classifier. It mainly determines the weight of each sample according to whether that sample was classified correctly in each training set, as well as the accuracy of the last overall classification, and then sends the reweighted data set to the next classifier for training. In the end, the classifiers obtained in training are integrated into the final decision classifier. The classifier with consistency analysis can eliminate some unnecessary training data characteristics and place the emphasis on key training data. According to the experimental results, the average accuracy of this method is 91.0%, while the average recall rate is 88.1%.
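The sample reweighting described above resembles boosting. As a rough illustration, here is one round of the standard AdaBoost weight update, which is a swapped-in, well-known technique and not necessarily the authors' exact rule; the function name `reweight` and the toy outcome vector are assumptions.

```python
import math

def reweight(weights, correct):
    """One AdaBoost-style round: raise the weight of misclassified samples.

    weights: current sample weights summing to 1.
    correct: booleans, True where the weak classifier was right.
    Returns the renormalised weights and the classifier's vote weight alpha.
    """
    # Weighted error of the weak classifier, clamped away from 0 and 1.
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    err = min(max(err, 1e-10), 1 - 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)
    # Shrink correctly classified samples, grow misclassified ones.
    new = [w * math.exp(-alpha if ok else alpha)
           for w, ok in zip(weights, correct)]
    total = sum(new)
    return [w / total for w in new], alpha

# Four equally weighted samples; the weak classifier misses the last one.
weights, alpha = reweight([0.25] * 4, [True, True, True, False])
```

After this round the misclassified sample carries half of the total weight, so the next classifier is pushed to get it right.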

A Low-Light Sensor Image Enhancement Algorithm Based on HSI Color Model

Sensors ◽

10.3390/s18103583 ◽

2018 ◽

Vol 18 (10) ◽

pp. 3583 ◽

Cited By ~ 8

Author(s):

Shiping Ma ◽

Hongqiang Ma ◽

Yuelei Xu ◽

Shuai Li ◽

Chao Lv ◽

...

Keyword(s):

Image Enhancement ◽

Color Space ◽

Color Model ◽

Low Light ◽

Image Brightness ◽

Low Contrast ◽

Light Sensor ◽

Hsi Color Model ◽

Art Research ◽

Enhancement Algorithm

Images captured by sensors in adverse environments such as low-illumination conditions are usually degraded, which means low visibility, low brightness, and low contrast. To improve such images, this paper proposes a low-light sensor image enhancement algorithm based on the HSI color model. First, we propose a dataset generation method based on the Retinex model to overcome the shortage of sample data. Then, the original low-light image is transformed from RGB to HSI color space. The segmentation exponential method is used to process the saturation (S), and a specially designed Deep Convolutional Neural Network is applied to enhance the intensity component (I). Finally, we convert back to the original RGB space to get the final improved image. Experimental results show that the proposed algorithm not only enhances the image brightness and contrast significantly, but also avoids color distortion and over-enhancement in comparison with some other state-of-the-art research papers. It therefore effectively improves the quality of sensor images.
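The RGB-to-HSI step can be sketched for a single pixel with the commonly used textbook conversion formulas. This is an illustration only: it assumes channel values in [0, 1], the function name `rgb_to_hsi` is hypothetical, and the paper's saturation processing and network-based intensity enhancement are not shown.

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert one RGB pixel (channels in [0, 1]) to (hue in degrees, S, I)."""
    i = (r + g + b) / 3.0
    s = 0.0 if i == 0 else 1.0 - min(r, g, b) / i
    num = 0.5 * ((r - g) + (r - b))
    # The radicand is non-negative in exact arithmetic; guard against rounding.
    den = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if den == 0:
        h = 0.0  # hue is undefined for grey pixels; 0 is a convention
    else:
        h = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        if b > g:
            h = 360.0 - h  # lower half of the hue circle
    return h, s, i

red = rgb_to_hsi(1.0, 0.0, 0.0)
green = rgb_to_hsi(0.0, 1.0, 0.0)
blue = rgb_to_hsi(0.0, 0.0, 1.0)
```

The three primaries land at hues of roughly 0, 120, and 240 degrees, each with full saturation and an intensity of one third.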

Classification of Cancer Recurrence with Alpha-Beta BAM

Mathematical Problems in Engineering ◽

10.1155/2009/680212 ◽

2009 ◽

Vol 2009 ◽

pp. 1-14 ◽

Cited By ~ 2

Author(s):

María Elena Acevedo ◽

Marco Antonio Acevedo ◽

Federico Felipe

Keyword(s):

Breast Cancer ◽

Stable State ◽

Cancer Recurrence ◽

Breast Cancer Surgery ◽

Associative Memories ◽

Training Set ◽

Perfect Recall ◽

Alpha Beta ◽

Leave One Out

Bidirectional Associative Memories (BAMs) based on the first model proposed by Kosko do not have perfect recall of the training set, and their algorithm must iterate until it reaches a stable state. In this work, we use the Alpha-Beta BAM model to automatically classify cancer recurrence in female patients with previous breast cancer surgery. Alpha-Beta BAM has perfect recall of all the training patterns and a one-shot algorithm; these advantages make Alpha-Beta BAM a suitable tool for classification. We use data from the Haberman database, and the leave-one-out algorithm was applied to analyze the performance of our model as a classifier. We obtain a classification rate of 99.98%.
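Leave-one-out evaluation can be sketched generically as below. The classifier plugged in here is a one-dimensional nearest-neighbour rule used purely as a placeholder; it is not the Alpha-Beta BAM, and the sample values are made up rather than taken from the Haberman data.

```python
def leave_one_out_accuracy(samples, labels, fit_predict):
    """Hold out each sample once, train on the rest, and score the prediction."""
    hits = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += fit_predict(train_x, train_y, samples[i]) == labels[i]
    return hits / len(samples)

def nearest_neighbour(train_x, train_y, query):
    """Placeholder classifier: label of the closest training value."""
    best = min(range(len(train_x)), key=lambda j: abs(train_x[j] - query))
    return train_y[best]

# Toy one-dimensional data with two clearly separated classes.
samples = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
labels = [0, 0, 0, 1, 1, 1]
acc = leave_one_out_accuracy(samples, labels, nearest_neighbour)
```

Because every sample is tested exactly once by a model that never saw it, the procedure uses all available data while keeping training and test points separate.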

Mean-Based Breakpoint Selection on Circular Histogram

Mathematical Problems in Engineering ◽

10.1155/2021/5966463 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Jiulun Fan ◽

Jipeng Yang

Keyword(s):

Selection Criteria ◽

Color Image ◽

Selection Criterion ◽

Threshold Value ◽

Simple Expression ◽

Circular Data ◽

Cumulative Distribution ◽

Color Model ◽

Hsi Color Model ◽

Selection Of

A circular histogram represents the statistical distribution of circular data; the H component histogram of the HSI color model is a typical example. When using the H component to segment a color image, a feasible approach is to transform the circular histogram into a linear histogram and then apply mature gray-image thresholding methods to the linear histogram to select the threshold value. Thus, reasonable selection of the breakpoint at which the circular histogram is linearized is the key. In this paper, based on the angular mean on the circular histogram and the line mean on the linear histogram, a simple breakpoint selection criterion is proposed, and the suitable range of this method is analyzed. Compared with the existing breakpoint selection criteria based on the Lorenz curve and cumulative distribution entropy, the proposed method has the advantages of simple expression and less calculation and does not depend on the direction of rotation.
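The two ingredients named above, the angular mean of a circular histogram and its linearization at a breakpoint, can be sketched as follows. The breakpoint rule used here (cut opposite the angular mean) is only an illustrative assumption and is not the criterion proposed in the paper; the bin counts are made up.

```python
import math

def circular_mean_deg(hist):
    """Mean direction in degrees of a circular histogram over [0, 360).

    Bins are assumed to have equal width and are represented by their centres.
    """
    width = 360.0 / len(hist)
    x = sum(c * math.cos(math.radians((k + 0.5) * width))
            for k, c in enumerate(hist))
    y = sum(c * math.sin(math.radians((k + 0.5) * width))
            for k, c in enumerate(hist))
    return math.degrees(math.atan2(y, x)) % 360.0

def linearize(hist, breakpoint_bin):
    """Cut the circle at breakpoint_bin and unroll it into a linear histogram."""
    return hist[breakpoint_bin:] + hist[:breakpoint_bin]

# Eight 45-degree hue bins whose mass wraps around the 0/360 boundary.
hist = [6, 2, 0, 0, 0, 0, 0, 4]
mean = circular_mean_deg(hist)

# Assumed heuristic: place the cut in the bin opposite the angular mean.
break_bin = int(((mean + 180.0) % 360.0) // (360.0 / len(hist)))
linear = linearize(hist, break_bin)
```

After the cut, the mass that straddled the 0/360 boundary sits contiguously in the middle of the linear histogram, where an ordinary thresholding method can handle it.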

1D conditional generative adversarial network for spectrum-to-spectrum translation of simulated chemical reflectance signatures

Journal of Spectral Imaging ◽

10.1255/jsi.2021.a2 ◽

2021 ◽

Author(s):

Cara Murphy ◽

John Kerekes

Keyword(s):

Classification Accuracy ◽

Domain Adaptation ◽

Real Data ◽

Training Set ◽

Generative Adversarial Network ◽

Average Classification Accuracy ◽

Adversarial Network ◽

Chemical Residues ◽

Reflectance Data

The classification of trace chemical residues through active spectroscopic sensing is challenging due to the lack of physics-based models that can accurately predict spectra. To overcome this challenge, we leveraged the field of domain adaptation to translate data from the simulated to the measured domain for training a classifier. We developed the first 1D conditional generative adversarial network (GAN) to perform spectrum-to-spectrum translation of reflectance signatures. We applied the 1D conditional GAN to a library of simulated spectra and quantified the improvement in classification accuracy on real data using the translated spectra for training the classifier. Using the GAN-translated library, the average classification accuracy increased from 0.622 to 0.723 on real chemical reflectance data, including data from chemicals not included in the GAN training set.

Retrained Classification of Tyrosinase Inhibitors and “In Silico” Potency Estimation by Using Atom-Type Linear Indices

Methodologies and Applications for Chemoinformatics and Chemical Engineering ◽

10.4018/978-1-4666-4010-8.ch021 ◽

2013 ◽

pp. 322-427

Keyword(s):

External Validation ◽

Correlation Coefficients ◽

Classification Models ◽

Training Set ◽

Linear Discriminant ◽

Oecd Principles ◽

Qsar Models ◽

Validation Set ◽

Global Accuracy

In this paper, the authors present an effort to increase the applicability domain (AD) by retraining models using a database of 701 highly dissimilar molecules presenting anti-tyrosinase activity and 728 drugs with other uses. Atom-based linear indices and best subset linear discriminant analysis (LDA) were used to develop individual classification models. Eighteen individual classification-based QSAR models for the tyrosinase inhibitory activity were obtained, with global accuracy ranging from 88.15% to 91.60% in the training set and Matthews correlation coefficients (C) ranging from 0.76 to 0.82. The external validation set shows global classification accuracy above 85.99% and C above 0.72. All individual models were validated and fulfil the OECD principles. A brief analysis of the AD for the training set of 478 compounds and the new active compounds included in the retraining was carried out. Various assembled multiclassifier systems containing the eighteen models and using different selection criteria were obtained, which makes it possible to select the best strategy for a particular problem. The assembled multiclassifier systems also estimated the potency of the compounds identified as active. Eighteen potency models validated according to the OECD principles were used.
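The two reported quality measures, global accuracy and the Matthews correlation coefficient, can both be computed from the four confusion-matrix counts. The sketch below uses the standard definitions; the counts are invented for illustration and are not taken from the paper.

```python
import math

def global_accuracy(tp, tn, fp, fn):
    """Share of all compounds classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, 1]; 0 if undefined."""
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if den == 0 else (tp * tn - fp * fn) / den

# Hypothetical counts: 100 actives and 100 inactives.
tp, tn, fp, fn = 90, 85, 15, 10
acc = global_accuracy(tp, tn, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
```

With these counts the accuracy is 0.875 while the coefficient is about 0.75, which shows why the coefficient is reported alongside accuracy: it penalizes both kinds of error and stays informative when classes are unbalanced.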

Preliminary classification of mass spectral patterns using a simplified learning machine

Australian Journal of Chemistry ◽

10.1071/ch9731955 ◽

1973 ◽

Vol 26 (9) ◽

pp. 1955 ◽

Cited By ~ 1

Author(s):

RJ Mathews

Keyword(s):

Mass Spectra ◽

Functional Group ◽

Training Process ◽

Training Set ◽

Mass Spectral ◽

Learning Machine ◽

Spectral Patterns ◽

Feature Pattern ◽

Preliminary Classification

A simplified learning machine technique is presented which is suitable for forming preliminary classification of patterns. This technique allows preliminary compression of data prior to the training process, and generates a reliable classifier even when there are linearly inseparable data in the training set. This method has been used to form an eight-feature pattern classifier which identifies, directly from their mass spectra, compounds of the structure (RO)2P(=X)Y where R is H, Me, or Et; X is O or S; and Y is any functional group.
