Accuracy Assessment of Satellite Image Classification Depending on Training Sample

Georg Ruppert; Mushtaq Hussain; Heimo Müller

doi:10.17713/ajs.v28i4.522

Austrian Journal of Statistics ◽

2016 ◽

Vol 28 (4) ◽

Cited By ~ 3

Author(s):

Georg Ruppert ◽

Mushtaq Hussain ◽

Heimo Müller

Keyword(s):

Classification Accuracy ◽

Accuracy Assessment ◽

Satellite Image ◽

Remote Sensing Data ◽

Ground Truth ◽

Training Sample ◽

Training Set ◽

Sampling Plans ◽

Training Sets

The paper presents a method of predicting the classification accuracy of remote sensing data by means of training set analysis. Various sampling plans were applied to a satellite image and its complete ground truth to derive different training sets. The quality of these training sets was determined by quantifying the similarity of the training set distributions to those of the entire satellite image. Each training set was then used to train a classifier. The paper shows how the accuracy of the classifications carried out with these classifiers depends on the quality of the corresponding training sets.
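
The abstract does not name the similarity measure used. As a minimal sketch, assuming per-band histogram intersection as the measure (our choice, not necessarily the authors'), the agreement between a training set's band values and those of the whole image could be scored as below; all function names are hypothetical.

```python
from collections import Counter


def normalized_histogram(values, n_bins=16, max_value=256):
    """Bin integer band values (e.g. 8-bit DNs) and normalize the counts to sum to 1."""
    bin_width = max_value / n_bins
    counts = Counter(min(int(v / bin_width), n_bins - 1) for v in values)
    total = len(values)
    return [counts.get(b, 0) / total for b in range(n_bins)]


def histogram_intersection(a, b):
    """Similarity in [0, 1]: 1 for identical normalized histograms, 0 for no overlap."""
    return sum(min(x, y) for x, y in zip(a, b))


def training_set_quality(train_bands, image_bands, n_bins=16):
    """Average per-band histogram intersection between training pixels and all image pixels.

    train_bands / image_bands: one list of pixel values per band (hypothetical layout).
    """
    scores = [
        histogram_intersection(normalized_histogram(t, n_bins), normalized_histogram(i, n_bins))
        for t, i in zip(train_bands, image_bands)
    ]
    return sum(scores) / len(scores)


# Hypothetical single-band example: a training set that covers only dark pixels.
image_band = [10, 40, 90, 150, 200, 250]
train_band = [10, 40]
print(training_set_quality([train_band], [image_band]))
```

A training set drawn only from part of the value range scores low, which is the kind of mismatch the paper relates to lower classification accuracy.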

Lossy Compression of Multichannel Remote Sensing Images with Quality Control

Remote Sensing ◽

10.3390/rs12223840 ◽

2020 ◽

Vol 12 (22) ◽

pp. 3840

Author(s):

Vladimir Lukin ◽

Irina Vasilyeva ◽

Sergey Krivenko ◽

Fangfang Li ◽

Sergey Abramov ◽

...

Keyword(s):

Remote Sensing ◽

Classification Accuracy ◽

Real Life ◽

Remote Sensing Data ◽

Original Data ◽

Lossy Compression ◽

Compressed Images ◽

Training Methodology ◽

Neural Network Classifiers

Lossy compression is widely used to reduce the size of multichannel remote sensing data. Alongside this positive effect, lossy compression may have a negative one: it can worsen image classification. Thus, where possible, lossy compression should be carried out carefully, controlling the quality of the compressed images. This paper studies the dependence between the classification accuracy of maximum likelihood and neural network classifiers, applied to three-channel test and real-life images, and the quality of the compressed images as characterized by standard and visual quality metrics. The following is demonstrated. First, classification accuracy starts to decrease faster once image quality, which degrades as the compression ratio increases, reaches the distortion visibility threshold. Second, classes with a wider distribution of features start to “take pixels” from classes with narrower distributions of features. Third, classification accuracy may depend substantially on the training methodology, i.e., on whether features are determined from the original data or from the compressed images. Finally, the drawbacks of pixel-wise classification are shown and some recommendations on how to improve classification accuracy are given.
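
PSNR is one of the standard quality metrics such studies rely on. The sketch below uses plain uniform quantization as a hypothetical stand-in for the codecs studied in the paper, only to show how PSNR falls as compression becomes coarser; it does not reproduce the paper's classifiers or visual metrics.

```python
import math


def psnr(original, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((o - d) ** 2 for o, d in zip(original, distorted)) / len(original)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def quantize(pixels, step, peak=255):
    """Crude stand-in for lossy compression: uniform quantization, rounding half up."""
    return [min(int(p / step + 0.5) * step, peak) for p in pixels]


# A synthetic 8-bit gradient "image" (hypothetical test data).
image = list(range(256))
for step in (2, 8, 32):
    print(step, round(psnr(image, quantize(image, step)), 2))
```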

Classification Algorithm for Person Identification and Gesture Recognition Based on Hand Gestures with Small Training Sets

Sensors ◽

10.3390/s20247279 ◽

2020 ◽

Vol 20 (24) ◽

pp. 7279

Author(s):

Krzysztof Rzecki

Keyword(s):

Gesture Recognition ◽

Error Rate ◽

Classification Accuracy ◽

Classification Algorithm ◽

Machine Learning Algorithms ◽

Training Data ◽

Training Set ◽

Person Identification ◽

Hand Gestures ◽

Training Sets

Classification algorithms require training data labelled by class to build a model that can then classify new data. The amount and diversity of training data affect classification quality; usually, the larger the training set, the better the classification accuracy. In many applications, only small amounts of training data are available. This article presents a new time series classification algorithm for problems with small training sets. The algorithm was tested on hand gesture recordings in person identification and gesture recognition tasks. It provides significantly better classification accuracy than other machine learning algorithms. For 22 different hand gestures performed by 10 people, with a training set of 5 gesture execution records per class, the error rate of the proposed algorithm is 37% to 75% lower than that of the other algorithms compared. When the training set consists of only one sample per class, the new algorithm achieves a 45% to 95% lower error rate. The experiments indicate that the algorithm outperforms state-of-the-art methods in classification accuracy for person identification and gesture recognition.
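
The abstract does not describe the new algorithm itself. For context, a common baseline for time series classification with very few labelled records per class is the 1-nearest-neighbour classifier under dynamic time warping (DTW). The sketch below shows that baseline on one-dimensional signals (real gesture recordings would be multichannel); it is not the article's method, and the template data are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences (absolute-difference cost)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify_1nn(query, training_set):
    """Return the label of the training record closest to `query` under DTW.

    training_set: list of (label, sequence) pairs; one record per class is enough.
    """
    label, _ = min(training_set, key=lambda item: dtw_distance(query, item[1]))
    return label


# One hypothetical template per class, i.e. the one-sample-per-class setting.
templates = [("up", [0, 1, 2, 3, 4]), ("down", [4, 3, 2, 1, 0])]
print(classify_1nn([0, 0, 1, 2, 3, 4], templates))
```

DTW tolerates the differences in execution speed that make a plain Euclidean comparison of gesture recordings brittle.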

Combining Supervised and Unsupervised Machine Learning Methods for Phenotypic Functional Genomics Screening

SLAS DISCOVERY Advancing Life Sciences ◽

10.1177/2472555220919345 ◽

2020 ◽

Vol 25 (6) ◽

pp. 655-664

Author(s):

Wienand A. Omta ◽

Roy G. van Heesbeen ◽

Ian Shen ◽

Jacob de Nobel ◽

Desmond Robers ◽

...

Keyword(s):

Machine Learning ◽

Training Set ◽

Data Set ◽

Genome Wide ◽

Machine Learning Model ◽

Exploratory Data ◽

Interfering RNA ◽

Insight Into ◽

Training Sets

There has been an increase in the use of machine learning and artificial intelligence (AI) for the analysis of image-based cellular screens. The accuracy of these analyses, however, is greatly dependent on the quality of the training sets used for building the machine learning models. We propose that unsupervised exploratory methods should first be applied to the data set to gain a better insight into the quality of the data. This improves the selection and labeling of data for creating training sets before the application of machine learning. We demonstrate this using a high-content genome-wide small interfering RNA screen. We perform an unsupervised exploratory data analysis to facilitate the identification of four robust phenotypes, which we subsequently use as a training set for building a high-quality random forest machine learning model to differentiate four phenotypes with an accuracy of 91.1% and a kappa of 0.85. Our approach enhanced our ability to extract new knowledge from the screen when compared with the use of unsupervised methods alone.
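
The reported accuracy (91.1%) and kappa (0.85) are two summaries of the same confusion matrix. As a reminder of how they relate, the sketch computes both from a hypothetical two-class matrix; kappa discounts the agreement expected by chance, which is estimated from the row and column totals.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix (list of rows)."""
    total = sum(sum(row) for row in confusion)
    k = len(confusion)
    observed = sum(confusion[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts (not the screen's data).
print(accuracy_and_kappa([[20, 5], [10, 15]]))
```

Here 70% raw agreement corresponds to a kappa of only about 0.4, because half of the agreement would be expected by chance.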

Analogy-preserving functions: A way to extend Boolean samples

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/218 ◽

2017 ◽

Cited By ~ 10

Author(s):

Miguel Couceiro ◽

Nicolas Hug ◽

Henri Prade ◽

Gilles Richard

Keyword(s):

Machine Learning ◽

Theoretical Result ◽

Boolean Functions ◽

Analogical Reasoning ◽

Empirical Investigation ◽

Ground Truth ◽

Training Set ◽

Free Extension ◽

Training Sets

Training set extension is an important issue in machine learning. Indeed, when only a limited number of examples is available, the performance of standard classifiers may decrease significantly, and it can be helpful to build additional examples. In this paper, we consider the use of analogical reasoning, and more particularly of analogical proportions, for extending training sets. Here the ground truth labels are considered to be given by a (partially known) function. We examine the conditions required for such functions to ensure an error-free extension in a Boolean setting. To this end, we introduce the notion of Analogy Preserving (AP) functions, and we prove that their class is the class of affine Boolean functions. This noteworthy theoretical result is complemented with an empirical investigation of approximate AP functions, which suggests that they remain suitable for training set extension.
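
The AP property can be checked mechanically for small n. The sketch below assumes the usual arithmetic reading of the Boolean analogical proportion (a:b::c:d holds iff a − b = c − d) and, as we read the definition, brute-forces the condition: whenever x:y::z:t holds componentwise and f(x):f(y)::f(z):? has a Boolean solution, that solution must equal f(t). It illustrates the definition only and is not the paper's proof; the example functions are hypothetical.

```python
from itertools import product


def proportion_holds(a, b, c, d):
    """Boolean analogical proportion a:b::c:d, read as a - b == c - d on {0, 1}."""
    return a - b == c - d


def solve(a, b, c):
    """Solve a:b::c:x over {0, 1}; return None when no Boolean solution exists."""
    x = b + c - a
    return x if x in (0, 1) else None


def is_analogy_preserving(f, n):
    """Brute-force check over {0,1}^n that f never contradicts an analogical inference."""
    points = list(product((0, 1), repeat=n))
    for x, y, z, t in product(points, repeat=4):
        if all(proportion_holds(*p) for p in zip(x, y, z, t)):
            guess = solve(f(x), f(y), f(z))
            if guess is not None and guess != f(t):
                return False
    return True


def affine_example(v):
    return 1 ^ v[0] ^ v[2]  # affine: constant XOR a subset of the variables


def conjunction(v):
    return v[0] & v[1]  # not affine


print(is_analogy_preserving(affine_example, 3), is_analogy_preserving(conjunction, 3))
```

For the conjunction, x = (0, 0, 0), y = (1, 0, 0), z = (0, 1, 0), t = (1, 1, 0) is a counterexample: the inferred label is 0 while f(t) = 1.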

The t-SNE Algorithm as a Tool to Improve the Quality of Reference Data Used in Accurate Mapping of Heterogeneous Non-Forest Vegetation

Remote Sensing ◽

10.3390/rs12010039 ◽

2019 ◽

Vol 12 (1) ◽

pp. 39 ◽

Cited By ~ 2

Author(s):

Anna Halladin-Dąbrowska ◽

Adam Kania ◽

Dominik Kopeć

Keyword(s):

Remote Sensing ◽

Reference Data ◽

Remote Sensing Data ◽

Ground Truth ◽

Forest Vegetation ◽

Visual Interpretation ◽

Reference Dataset ◽

Visual Evaluation ◽

Sensing Data

Supervised classification methods, used for many applications including vegetation mapping, require accurate “ground truth” to be effective. Nevertheless, the quality of these data is commonly poorly verified before they are used to train and validate classification models. Failure to remove noisy or erroneous parts of the reference dataset is usually justified by the relatively high resistance of some algorithms to errors. The objective of this study was to demonstrate the rationale for cleaning the reference dataset used for the classification of heterogeneous non-forest vegetation, and to present a workflow based on the t-distributed stochastic neighbor embedding (t-SNE) algorithm for better integrating reference data with remote sensing data in order to improve outcomes. The proposed analysis is a new application of the t-SNE algorithm. The effectiveness of this workflow was tested by classifying three heterogeneous non-forest Natura 2000 habitats: Molinia meadows (Molinion caeruleae; code 6410), species-rich Nardus grassland (code 6230) and dry heaths (code 4030), employing two commonly used algorithms, random forest (RF) and AdaBoost (AB), which, according to the literature, differ in their resistance to errors in reference datasets. Polygons collected in the field (on-ground reference data) in 2016 and 2017, containing no intentional errors, were used as the on-ground reference dataset. The remote sensing data used in the classification were obtained in 2017, during the peak growing season, by a HySpex sensor consisting of two imaging spectrometers covering spectral ranges of 0.4–0.9 μm (VNIR-1800) and 0.9–2.5 μm (SWIR-384). The on-ground reference dataset was gradually cleaned by verifying candidate polygons selected through visual interpretation of t-SNE plots. Around 40–50% of candidate polygons were ultimately found to contain errors. Altogether, 15% of reference polygons were removed.
As a result, the quality of the final map, as assessed by the Kappa and F1 accuracy measures as well as by visual evaluation, was significantly improved. The global map accuracy increased by about 6% (in Kappa coefficient), relative to the baseline classification obtained using random removal of the same number of reference polygons.
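
In the study, candidate polygons were chosen by visually interpreting t-SNE plots. A minimal sketch of an automated stand-in for that step (our assumption, not the authors' workflow) flags points whose nearest neighbours in the 2-D embedding mostly carry a different class label. The embedding itself would come from a t-SNE implementation such as scikit-learn's `TSNE`, which is not shown; the coordinates below are hypothetical.

```python
import math


def flag_candidates(embedding, labels, k=3):
    """Flag points whose k nearest neighbours in a 2-D embedding mostly carry other labels.

    embedding: list of (x, y) coordinates, e.g. from t-SNE (computed elsewhere).
    labels: class label for each point. Returns indices of suspect points.
    """
    flagged = []
    for i, (xi, yi) in enumerate(embedding):
        neighbours = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(embedding)
            if j != i
        )[:k]
        disagreeing = sum(1 for _, j in neighbours if labels[j] != labels[i])
        if disagreeing > k / 2:
            flagged.append(i)
    return flagged


# Two hypothetical habitat clusters plus one "A" polygon sitting inside the "B" cluster.
embedding = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B", "A"]
print(flag_candidates(embedding, labels))
```

As in the study, flagged polygons are only candidates: each would still be verified before removal.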

Object Based and Pixel Based Classification Using Rapideye Satellite Imagery of ETI-OSA, Lagos, Nigeria

Geoinformatics FCE CTU ◽

10.14311/gi.15.2.5 ◽

2016 ◽

Vol 15 (2) ◽

pp. 59-70 ◽

Cited By ~ 6

Author(s):

Esther Oluwafunmilayo Makinde ◽

Ayobami Taofeek Salami ◽

James Bolarinwa Olaleye ◽

Oluwapelumi Comfort Okewusi

Keyword(s):

Spatial Information ◽

Accuracy Assessment ◽

Satellite Image ◽

Remote Sensing Data ◽

Bare Soil ◽

Spectral Angle Mapper ◽

Object Based ◽

Object Oriented Approach ◽

Nearest Neighbour Classifier ◽

Source Of Information

Several studies have been carried out to find an appropriate method for classifying remote sensing data. Traditional classification approaches are all pixel-based and do not use the spatial information within an object, which is an important source of information for image classification. This study therefore compared pixel-based and object-based classification algorithms using a RapidEye satellite image of Eti-Osa LGA, Lagos. In the object-oriented approach, the image was segmented into homogeneous areas using suitable parameters such as scale parameter, compactness and shape. Classification based on segments was done with a nearest neighbour classifier. In the pixel-based classification, the spectral angle mapper was used to classify the images. The user's accuracies for the object-based classification were 98.31% for waterbody, 92.31% for vegetation, 86.67% for bare soil and 90.57% for built-up, while those for the pixel-based classification were 98.28% for waterbody, 84.06% for vegetation, 86.36% for bare soil and 79.41% for built-up. Both techniques were subjected to accuracy assessment: the overall accuracy of the object-based classification was 94.47%, while that of the pixel-based classification was 86.64%. The results of classification and accuracy assessment show that the object-based approach gave more accurate and satisfying results.
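
User's accuracy and overall accuracy, as reported above, are read off an error (confusion) matrix. The sketch assumes that rows hold the classified (map) labels and columns the reference labels (conventions vary between texts) and uses made-up counts; producer's accuracy is included for completeness.

```python
def accuracy_report(confusion, classes):
    """Overall, user's and producer's accuracy from a confusion matrix.

    Convention assumed here: rows = classified (map) labels, columns = reference labels.
    """
    total = sum(sum(row) for row in confusion)
    diagonal = [confusion[i][i] for i in range(len(classes))]
    overall = sum(diagonal) / total
    # User's accuracy: correct / all pixels mapped as that class (row total).
    users = {c: diagonal[i] / sum(confusion[i]) for i, c in enumerate(classes)}
    # Producer's accuracy: correct / all reference pixels of that class (column total).
    producers = {
        c: diagonal[i] / sum(row[i] for row in confusion) for i, c in enumerate(classes)
    }
    return overall, users, producers


# Hypothetical two-class example (not the study's data).
overall, users, producers = accuracy_report([[50, 2], [3, 45]], ["water", "vegetation"])
print(overall, users["water"], producers["water"])
```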

Effectiveness of Training Sample and Features for Random Forest on Road Extraction from Unmanned Aerial Vehicle-Based Point Cloud

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/03611981211029645 ◽

2021 ◽

pp. 036119812110296

Author(s):

Serkan Biçici ◽

Mustafa Zeybek

Keyword(s):

Random Forest ◽

Unmanned Aerial Vehicle ◽

Classification Accuracy ◽

Point Cloud ◽

Accuracy Assessment ◽

Training Sample ◽

Road Surface ◽

Learning Stage ◽

Training Samples ◽

Aerial Vehicle

The accuracy of random forest (RF) classification depends on several inputs. In this study, two primary inputs, the training sample and the features, are evaluated for road classification from an unmanned aerial vehicle-based point cloud. Training sample selection is a challenging step because the learning stage of RF classification depends heavily on it; an imbalanced training sample, for example, might dramatically decrease classification accuracy. Various criteria are defined to generate different types of training samples and evaluate their effectiveness. Several point features can be used in RF classification under different circumstances. More features might increase classification accuracy, but they also increase processing time. Point features such as RGB (red/green/blue), surface normals, curvature, omnivariance, planarity, linearity, surface variance, anisotropy, verticality, and ground/non-ground class are investigated in this study. Different training samples and feature sets are used in the RF to extract the road surface. The experiment is conducted on a local road without a raised curb located on a relatively steep hill. Accuracy is assessed by comparing the model classification results with a manually extracted road surface point cloud. Accuracy increases by up to around 4%–13%, and an overall accuracy of 95% was obtained when using suitable training samples and features.
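
Several of the point features listed above (linearity, planarity, anisotropy, omnivariance, surface variation) are commonly derived from the eigenvalues λ1 ≥ λ2 ≥ λ3 of the covariance matrix of a point's local neighbourhood. The abstract does not give the authors' exact definitions, so the sketch below uses one common set of formulas and leaves out the neighbourhood search, surface normals and verticality; all function names are hypothetical.

```python
import math


def covariance(points):
    """Population 3x3 covariance matrix of a list of (x, y, z) points."""
    n = len(points)
    mean = [sum(p[k] for p in points) / n for k in range(3)]
    return [[sum((p[r] - mean[r]) * (p[c] - mean[c]) for p in points) / n
             for c in range(3)] for r in range(3)]


def sym3_eigenvalues(m):
    """Eigenvalues of a symmetric 3x3 matrix in descending order (closed-form trigonometric method)."""
    p1 = m[0][1] ** 2 + m[0][2] ** 2 + m[1][2] ** 2
    if p1 == 0:  # matrix is already diagonal
        return sorted((m[0][0], m[1][1], m[2][2]), reverse=True)
    q = (m[0][0] + m[1][1] + m[2][2]) / 3
    p2 = sum((m[i][i] - q) ** 2 for i in range(3)) + 2 * p1
    p = math.sqrt(p2 / 6)
    b = [[(m[i][j] - (q if i == j else 0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2))  # clamp against rounding error
    phi = math.acos(r) / 3
    e1 = q + 2 * p * math.cos(phi)
    e3 = q + 2 * p * math.cos(phi + 2 * math.pi / 3)
    return [e1, 3 * q - e1 - e3, e3]


def eigen_features(points):
    """One common set of eigenvalue-based neighbourhood features (definitions vary)."""
    l1, l2, l3 = (max(v, 0.0) for v in sym3_eigenvalues(covariance(points)))
    if l1 == 0:
        raise ValueError("degenerate neighbourhood: all points coincide")
    return {
        "linearity": (l1 - l2) / l1,
        "planarity": (l2 - l3) / l1,
        "anisotropy": (l1 - l3) / l1,
        "omnivariance": (l1 * l2 * l3) ** (1 / 3),
        "surface_variation": l3 / (l1 + l2 + l3),
    }


print(eigen_features([(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]))
```

A flat road patch scores high on planarity, while points along a curb or pole edge score high on linearity, which is why such features help an RF separate the road surface.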

A NEW THINKING OF LULC CLASSIFICATION ACCURACY ASSESSMENT

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-2-w13-1207-2019 ◽

2019 ◽

Vol XLII-2/W13 ◽

pp. 1207-1211

Author(s):

K. S. Cheng ◽

J. Y. Ling ◽

T. W. Lin ◽

Y. T. Liu ◽

Y. C. Shen ◽

...

Keyword(s):

Classification Accuracy ◽

Accuracy Assessment ◽

Confusion Matrix ◽

Reference Sample ◽

Training Sample ◽

Training Data ◽

Data Uncertainty ◽

Bootstrap Simulation ◽

New Thinking ◽

Global Accuracy

Most studies involving remote sensing land use/land cover (LULC) classification have assessed classification accuracy without considering training data uncertainty. In this study, we present new concepts of LULC classification accuracy, namely the training-sample-based global accuracy and the classifier global accuracy, together with a general expression of different measures of classification accuracy in terms of the sample dataset for classifier training and the sample dataset for evaluating classification results. Through stochastic simulation of a two-feature, two-class case, we demonstrate that the training-sample confusion matrix should replace the commonly adopted reference-sample confusion matrix for evaluating LULC classification results. We then propose a bootstrap-simulation approach for establishing 95% confidence intervals of classifier global accuracies.
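
The paper's bootstrap approach targets the classifier global accuracy and involves the training sample; the abstract gives no further detail. As a simpler, generic illustration, the sketch below builds a percentile bootstrap 95% confidence interval for overall accuracy by resampling the per-sample correct/incorrect outcomes of a single evaluation set. It is not the authors' procedure, and the outcome data are hypothetical.

```python
import random


def bootstrap_accuracy_ci(outcomes, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap CI for accuracy.

    outcomes: 1 (correctly classified) or 0 (misclassified) per evaluation sample.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample the evaluation set with replacement and recompute accuracy each time.
    estimates = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot))
    k = int(n_boot * alpha / 2)
    return sum(outcomes) / n, (estimates[k], estimates[n_boot - 1 - k])


# Hypothetical evaluation: 170 of 200 samples classified correctly.
point, (lo, hi) = bootstrap_accuracy_ci([1] * 170 + [0] * 30)
print(point, lo, hi)
```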

Multichannel satellite image application for water surface objects identification

E3S Web of Conferences ◽

10.1051/e3sconf/202021007005 ◽

2020 ◽

Vol 210 ◽

pp. 07005

Author(s):

Natalia Panasenko ◽

Marina Ganzhur ◽

Alexey Ganzhur ◽

Vladimir Fathi

Keyword(s):

Shallow Water ◽

Satellite Image ◽

Water Reservoir ◽

Remote Sensing Data ◽

Satellite Observation ◽

Observation Data ◽

Implementation Method ◽

Biological Kinetics ◽

Kinetics Of

The paper is devoted to the analysis of methods for using satellite observation data to obtain the information needed to develop and verify mathematical models of the hydrodynamics and biological kinetics of shallow water reservoirs. To accumulate this information, we consider the use of remote sensing data. The aim of the paper is to identify the best implementation method for software tools in order to improve the quality of assimilation of Earth remote sensing data relating to hydrobiological processes in a shallow water reservoir.

A No-Reference CNN-Based Super-Resolution Method for KOMPSAT-3 Using Adaptive Image Quality Modification

Remote Sensing ◽

10.3390/rs13163301 ◽

2021 ◽

Vol 13 (16) ◽

pp. 3301

Author(s):

Yeonju Choi ◽

Sanghyuck Han ◽

Yongwoo Kim

Keyword(s):

Image Quality ◽

Satellite Images ◽

Satellite Image ◽

Super Resolution ◽

Ground Truth ◽

Perceptual Quality ◽

High Quality ◽

Degradation Model ◽

SR Method

In recent years, research on increasing the spatial resolution and enhancing the quality of satellite images using deep learning-based super-resolution (SR) methods has been actively conducted. In the remote sensing field, conventional SR methods require high-quality satellite images as the ground truth. However, in most cases, high-quality satellite images are difficult to acquire because many image distortions occur owing to various imaging conditions. To address this problem, we propose an adaptive image quality modification method to improve SR image quality for the KOrea Multi-Purpose Satellite-3 (KOMPSAT-3). KOMPSAT-3 is a high-performance optical satellite that provides 0.7-m ground sampling distance (GSD) panchromatic and 2.8-m GSD multispectral images for various applications. We propose an SR method with a scale factor of 2 for the panchromatic and pan-sharpened images of KOMPSAT-3. The proposed SR method comprises a degradation model that generates low-quality images for training and a method for improving the quality of the raw satellite image. The proposed degradation model for low-resolution input image generation is based on Gaussian noise and a blur kernel. In addition, top-hat and bottom-hat transformations are applied to the original satellite image to generate an enhanced satellite image with improved edge sharpness or image clarity. Using this enhanced satellite image as the ground truth, an SR network is then trained. The performance of the proposed method was evaluated by comparing it with other SR methods in multiple ways, such as edge extraction, visual inspection, qualitative analysis, and object detection performance. Experimental results show that the proposed SR method achieves improved reconstruction results and perceptual quality compared to conventional SR methods.
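
The abstract does not specify how the two transforms are combined or which structuring element is used. A widely used recipe, assumed here, adds the white top-hat (image minus its opening) and subtracts the black top-hat (closing minus image), which boosts small bright details and deepens small dark ones. The pure-Python sketch uses a hypothetical 3 × 3 square window and does not clip results to the sensor's value range.

```python
def _morph(image, size, pick):
    """Grayscale erosion (pick=min) or dilation (pick=max) with a square window clipped at the borders."""
    h, w, r = len(image), len(image[0]), size // 2
    return [[pick(image[y][x]
                  for y in range(max(0, i - r), min(h, i + r + 1))
                  for x in range(max(0, j - r), min(w, j + r + 1)))
             for j in range(w)] for i in range(h)]


def erode(image, size=3):
    return _morph(image, size, min)


def dilate(image, size=3):
    return _morph(image, size, max)


def tophat_enhance(image, size=3):
    """Contrast enhancement: image + white top-hat - black top-hat."""
    opened = dilate(erode(image, size), size)  # opening removes small bright details
    closed = erode(dilate(image, size), size)  # closing fills small dark details
    h, w = len(image), len(image[0])
    return [[image[i][j] + (image[i][j] - opened[i][j]) - (closed[i][j] - image[i][j])
             for j in range(w)] for i in range(h)]


img = [[0] * 5 for _ in range(5)]
img[2][2] = 10  # a single bright pixel on a dark background
print(tophat_enhance(img)[2][2])  # → 20
```

Flat regions pass through unchanged, so the enhancement concentrates on fine edges and small features, the details an SR network is meant to reconstruct.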
