Accelerating Relevance-Vector-Machine-Based Classification of Hyperspectral Image with Parallel Computing

2012 ◽  
Vol 2012 ◽  
pp. 1-13
Author(s):  
Chao Dong ◽  
Lianfang Tian

Benefiting from the kernel trick and its sparsity property, the relevance vector machine (RVM) acquires a sparse solution with generalization ability comparable to that of the support vector machine. The sparse solution requires much less time in prediction, making the RVM a promising candidate for classifying large-scale hyperspectral images. However, the RVM is not widely used because of its slow training procedure. To address this problem, this paper accelerates RVM-based classification of hyperspectral images with parallel computing techniques. Parallelization is examined from three aspects: the multiclass strategy, the ensemble of multiple weak classifiers, and the matrix operations. The parallel RVMs are implemented in C using the parallel routines of linear algebra packages and the Message Passing Interface (MPI) library. The proposed methods are evaluated on the AVIRIS Indian Pines data set using a Beowulf cluster and multicore platforms. The results show that the parallel RVMs markedly accelerate the training procedure.
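The "matrix operations" axis of the parallelization is the easiest to picture: the kernel design matrix between pixels and relevance vectors decomposes into independent row blocks, one per MPI rank. A minimal numpy sketch of that decomposition (function names, the RBF kernel choice, and all sizes are illustrative, not taken from the paper):

```python
import numpy as np

def rbf_design_matrix(X, RV, gamma=0.5):
    # Phi[i, j] = exp(-gamma * ||x_i - rv_j||^2), pixels vs. relevance vectors
    d2 = ((X[:, None, :] - RV[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def blocked_design_matrix(X, RV, n_blocks=4, gamma=0.5):
    # Row blocks are independent, so each could be computed on a separate rank.
    blocks = np.array_split(X, n_blocks, axis=0)
    return np.vstack([rbf_design_matrix(b, RV, gamma) for b in blocks])

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))   # toy "pixels" (spectra)
RV = X[:5]                     # a sparse subset standing in for relevance vectors
assert np.allclose(blocked_design_matrix(X, RV), rbf_design_matrix(X, RV))
```

Stacking the per-block results reproduces the full design matrix exactly, which is why this axis parallelizes without any approximation.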

2019 ◽  
Vol 12 (1) ◽  
pp. 96 ◽  
Author(s):  
James Brinkhoff ◽  
Justin Vardanega ◽  
Andrew J. Robson

Land cover mapping of intensive cropping areas facilitates an enhanced regional response to biosecurity threats and to natural disasters such as drought and flooding. Such maps also provide information for natural resource planning and analysis of the temporal and spatial trends in crop distribution and gross production. In this work, 10 m resolution land cover maps were generated over a 6200 km2 area of the Riverina region in New South Wales (NSW), Australia, with a focus on locating the most important perennial crops in the region. The maps discriminated between 12 classes, including nine perennial crop classes. A satellite image time series (SITS) of freely available Sentinel-1 synthetic aperture radar (SAR) and Sentinel-2 multispectral imagery was used. A segmentation technique grouped spectrally similar adjacent pixels together, to enable object-based image analysis (OBIA). K-means unsupervised clustering was used to filter training points and classify some map areas, which improved supervised classification of the remaining areas. The support vector machine (SVM) supervised classifier with radial basis function (RBF) kernel gave the best results among several algorithms trialled. The accuracies of maps generated using several combinations of the multispectral and radar bands were compared to assess the relative value of each combination. An object-based post-classification refinement step was developed, enabling optimization of the trade-off between producers’ accuracy and users’ accuracy. Accuracy was assessed against randomly sampled segments, and the final map achieved an overall count-based accuracy of 84.8% and area-weighted accuracy of 90.9%. Producers’ accuracies for the perennial crop classes ranged from 78 to 100%, and users’ accuracies ranged from 63 to 100%. This work develops methods to generate detailed and large-scale maps that accurately discriminate between many perennial crops and can be updated frequently.
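The k-means filtering step the abstract mentions is standard Lloyd's iteration; a self-contained numpy sketch (the toy "spectral" features and the deterministic seeding are illustrative assumptions, not from the paper):

```python
import numpy as np

def kmeans(X, k, iters=10):
    # Lloyd's algorithm; centres seeded from the first/last points for determinism
    centers = X[[0, len(X) - 1]].copy() if k == 2 else X[:k].copy()
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.array([X[labels == j].mean(0) for j in range(k)])
    return labels, centers

# Two well-separated synthetic "spectral" clusters stand in for crop segments.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.2, (30, 4)), rng.normal(3, 0.2, (30, 4))])
labels, _ = kmeans(X, 2)
# each synthetic blob ends up in a single cluster
assert len(set(labels[:30].tolist())) == 1 and len(set(labels[30:].tolist())) == 1
```

In the paper's pipeline the cluster assignments are used to discard mislabelled training points before the supervised SVM stage.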


2021 ◽  
Author(s):  
Mohammad Hassan Almaspoor ◽  
Ali Safaei ◽  
Afshin Salajegheh ◽  
Behrouz Minaei-Bidgoli

Abstract Classification is one of the most important and widely used tasks in machine learning; its purpose is to create a rule for assigning data to pre-existing categories based on a training set. Employed successfully in many scientific and engineering areas, the Support Vector Machine (SVM) is among the most promising classification methods in machine learning. With the advent of big data, many machine learning methods have been challenged by big-data characteristics. The standard SVM was proposed for batch learning, in which all data are available at the same time. The SVM has a high time complexity: increasing the number of training samples intensifies the need for computational resources and memory. Hence, many attempts have been made to adapt the SVM to online learning conditions and large-scale data. This paper focuses on the analysis, identification, and classification of existing methods for SVM compatibility with online conditions and large-scale data. These methods can be employed to classify big data, and the paper proposes research areas for future studies. Considering its advantages, the SVM can be among the first options for big-data classification. For this purpose, appropriate techniques should be developed to convert data into a form suitable for learning. Existing frameworks for parallel and distributed processing should also be employed so that SVMs can be made scalable and properly online, able to handle big data.
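One well-known route to an online, large-scale SVM is stochastic sub-gradient descent on the primal hinge-loss objective (the Pegasos scheme); a minimal numpy sketch that consumes a stream of mini-batches with O(dim) memory (the toy data and batch sizes are illustrative):

```python
import numpy as np

def pegasos_stream(batches, dim, lam=0.01):
    # Primal SVM via stochastic sub-gradient descent: each mini-batch is
    # seen once, so memory stays O(dim) regardless of the stream length.
    w, t = np.zeros(dim), 0
    for X, y in batches:
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)          # standard Pegasos step size
            w *= (1 - eta * lam)           # shrink (regularization step)
            if yi * (xi @ w) < 1:          # margin violation -> hinge sub-gradient
                w += eta * yi * xi
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = np.sign(X @ np.ones(5))                # linearly separable toy labels
batches = [(X[i:i + 50], y[i:i + 50]) for i in range(0, 400, 50)]
w = pegasos_stream(batches, 5)
acc = np.mean(np.sign(X @ w) == y)
assert acc > 0.85
```

Because each sample is touched once and discarded, the same loop works whether the batches come from a stream, a distributed file system, or a message-passing worker.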


2021 ◽  
Vol 87 (6) ◽  
pp. 445-455
Author(s):  
Yi Ma ◽  
Zezhong Zheng ◽  
Yutang Ma ◽  
Mingcang Zhu ◽  
Ran Huang ◽  
...  

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix of size N×N, where N is the number of data points, so the memory complexity of the analysis is at least O(N²). We present in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained from the training data set. A local curvature variation algorithm is used to sample a subset of data points as landmarks, and a manifold skeleton is then identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k-nearest-neighbor classifier and achieving the second-best performance with a support vector machine.
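The memory saving from landmarks is easy to quantify: an N×m matrix of distances to m landmarks replaces the N×N similarity matrix. A numpy sketch (random landmark selection is a stand-in here; the paper selects landmarks by local curvature variation):

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, d = 1000, 50, 10
X = rng.normal(size=(N, d))

# Landmark selection (random here; the paper uses local curvature variation).
landmarks = X[rng.choice(N, m, replace=False)]

# N x m distances to landmarks instead of the full N x N similarity matrix.
D = np.sqrt(((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(-1))
assert D.shape == (N, m)
# memory ratio versus the full pairwise matrix: m / N
assert D.size / (N * N) == m / N
```

With m ≪ N the quadratic memory bottleneck disappears, which is what makes the incremental approach viable on large hyperspectral scenes.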


Author(s):  
Denali Molitor ◽  
Deanna Needell

Abstract In today’s data-driven world, storing, processing and gleaning insights from large-scale data are major challenges. Data compression is often required in order to store large amounts of high-dimensional data, and thus, efficient inference methods for analyzing compressed data are necessary. Building on a recently designed simple framework for classification using binary data, we demonstrate that one can improve classification accuracy of this approach through iterative applications whose output serves as input to the next application. As a side consequence, we show that the original framework can be used as a data preprocessing step to improve the performance of other methods, such as support vector machines. For several simple settings, we showcase the ability to obtain theoretical guarantees for the accuracy of the iterative classification method. The simplicity of the underlying classification framework makes it amenable to theoretical analysis.
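The underlying framework classifies from one-bit (sign) measurements of the data; a minimal numpy sketch of a single application, using a nearest-centroid rule in the binary feature space (the measurement matrix, class geometry, and classifier rule are illustrative assumptions, not the authors' exact construction):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 200, 20, 60
# two Gaussian classes in d dimensions
X = np.vstack([rng.normal(-1, 1, (n // 2, d)), rng.normal(1, 1, (n // 2, d))])
y = np.repeat([0, 1], n // 2)

A = rng.normal(size=(m, d))   # random measurement matrix (assumed)
B = np.sign(X @ A.T)          # one-bit compressed data, n x m

# nearest-centroid classifier in the binary feature space
c0, c1 = B[y == 0].mean(0), B[y == 1].mean(0)
pred = (np.linalg.norm(B - c1, axis=1) < np.linalg.norm(B - c0, axis=1)).astype(int)
acc = np.mean(pred == y)
assert acc > 0.85
```

The iterative scheme in the paper feeds the output of one such application into the next; even this single pass shows that sign measurements retain enough geometry for accurate classification.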


2019 ◽  
Vol 11 (4) ◽  
pp. 405
Author(s):  
Xuan Feng ◽  
Haoqiu Zhou ◽  
Cai Liu ◽  
Yan Zhang ◽  
Wenjing Liang ◽  
...  

The subsurface target classification of ground penetrating radar (GPR) is a popular topic in the field of geophysics. Among the existing classification methods, geometrical features and polarimetric attributes of targets are primarily used. As polarimetric attributes contain more information about targets, polarimetric decomposition methods, such as H-Alpha decomposition, have been developed for GPR target classification in recent years. However, the classification template used in H-Alpha classification is preset based on experience from synthetic aperture radar (SAR); therefore, it may not be suitable for GPR. Moreover, many existing classification methods require excessive manual operation, particularly when outliers exist in the sample (the data set containing the features of targets); therefore, they are neither efficient nor intelligent. We herein propose a new machine learning method based on sample centers, i.e., the particle center supported plane (PCSP). The sample center is defined as the point with the smallest sum of distances to all points in the same sample, which represents the sample better because it is less affected by outliers. In the proposed method, particle swarm optimization (PSO) is performed to obtain the sample centers, yielding a new criterion for subsurface target classification. We applied this algorithm to full-polarimetric GPR data measured in the laboratory and outdoors. The results indicate that, compared with support vector machine (SVM) and classical H-Alpha classification, the new method is more efficient and its accuracy is relatively high.
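The sample center defined above (smallest sum of distances to all points in the sample) is the geometric median, and PSO can search for it directly. A compact global-best PSO sketch in numpy (swarm size, inertia/acceleration coefficients, and the toy data are illustrative assumptions):

```python
import numpy as np

def sample_center_pso(sample, n_particles=30, iters=60, seed=0):
    # Minimise f(c) = sum_i ||x_i - c|| with a basic global-best PSO.
    rng = np.random.default_rng(seed)
    f = lambda C: np.linalg.norm(sample[None] - C[:, None], axis=-1).sum(-1)
    P = sample[rng.choice(len(sample), n_particles)] \
        + rng.normal(scale=0.1, size=(n_particles, sample.shape[1]))
    V = np.zeros_like(P)
    pbest, pbest_f = P.copy(), f(P)
    g = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(P.shape), rng.random(P.shape)
        V = 0.7 * V + 1.5 * r1 * (pbest - P) + 1.5 * r2 * (g - P)
        P = P + V
        fP = f(P)
        better = fP < pbest_f
        pbest[better], pbest_f[better] = P[better], fP[better]
        g = pbest[pbest_f.argmin()].copy()
    return g

rng = np.random.default_rng(1)
sample = np.vstack([rng.normal(0, 0.3, (40, 2)), [[5.0, 5.0]]])  # one outlier
center = sample_center_pso(sample)
# the sum-of-distances center stays near the bulk, unlike the mean
assert np.linalg.norm(center) < np.linalg.norm(sample.mean(axis=0))
```

The assertion illustrates the robustness claim: a single outlier drags the mean noticeably but barely moves the sum-of-distances center.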


Author(s):  
Yu-Cheng Chou ◽  
Harry H. Cheng

Message Passing Interface (MPI) is a standardized library specification designed for message-passing parallel programming on large-scale distributed systems. A number of MPI libraries have been implemented to allow users to develop portable programs using the scientific programming languages Fortran, C, and C++. Ch is an embeddable C/C++ interpreter that provides an interpretive environment for C/C++ based scripts and programs. Combining Ch with any MPI C/C++ library provides the functionality for rapid development of MPI C/C++ programs without compilation. In this article, the method of interfacing Ch scripts with MPI C implementations is introduced, using the MPICH2 C library as an example. The MPICH2-based Ch MPI package provides users with the ability to interpretively run MPI C programs based on the MPICH2 C library. Running MPI programs through the MPICH2-based Ch MPI package across heterogeneous platforms consisting of Linux and Windows machines is illustrated. Comparisons of the bandwidth, latency, and parallel computation speedup between C MPI, Ch MPI, and MPI for Python in an Ethernet-based environment comprising identical Linux machines are presented. A Web-based example is given to demonstrate the use of Ch and MPICH2 in C based CGI scripting to facilitate the development of Web-based applications for parallel computing.
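The send/receive pattern at the heart of MPI programs can be mimicked with Python's standard library as a conceptual stand-in (this is not Ch or MPICH2 code; in an MPI C program the two ends would be ranks calling MPI_Send and MPI_Recv):

```python
from multiprocessing import get_context

def worker(conn):
    # "rank 1": receive a task, send back a result (MPI_Recv / MPI_Send analog)
    data = conn.recv()
    conn.send(sum(data))
    conn.close()

ctx = get_context("fork")           # fork start method, as on typical Linux clusters
parent, child = ctx.Pipe()
p = ctx.Process(target=worker, args=(child,))
p.start()
parent.send([1, 2, 3, 4])           # "rank 0" sends a task
result = parent.recv()              # ...and receives the reduced result
p.join()
assert result == 10
```

The point of the Ch MPI package is that the equivalent C code runs interpretively, without a compile step, while still linking against the real MPICH2 library.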


2019 ◽  
Vol 214 ◽  
pp. 06025
Author(s):  
Jean-Roch Vlimant ◽  
Felice Pantaleo ◽  
Maurizio Pierini ◽  
Vladimir Loncar ◽  
Sofia Vallecorsa ◽  
...  

In recent years, several studies have demonstrated the benefit of using deep learning to solve typical tasks related to high energy physics data taking and analysis. In particular, generative adversarial networks are a good candidate to supplement the simulation of the detector response in a collider environment. Training of neural network models has been made tractable with the improvement of optimization methods and the advent of GP-GPUs well adapted to the highly parallelizable task of training neural nets. Despite these advancements, training large models over large data sets can take days to weeks. Even more so, finding the best model architecture and settings can take many expensive trials. To get the best out of this new technology, it is important to scale up the available network-training resources and, consequently, to provide tools for optimal large-scale distributed training. In this context, we describe the development of a new training workflow that scales on multi-node/multi-GPU architectures, with an eye to deployment on high-performance computing machines. We describe the integration of hyperparameter optimization with a distributed training framework using Message Passing Interface, for models defined in keras [12] or pytorch [13]. We present results on the speedup of training generative adversarial networks trained on a data set composed of the energy deposition from electrons, photons, and charged and neutral hadrons in a fine-grained digital calorimeter.
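The key identity behind data-parallel distributed training is that averaging per-worker gradients over equal shards reproduces the full-batch gradient exactly; a numpy sketch simulating what an MPI allreduce would compute (the linear model and shard layout are illustrative):

```python
import numpy as np

def grad_mse(w, X, y):
    # gradient of mean squared error for a linear model
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(96, 4))
w = rng.normal(size=4)
y = X @ np.ones(4)

# Each of 4 "workers" holds an equal shard; an MPI allreduce would average
# the shard gradients, which equals the full-batch gradient exactly.
shards = np.split(np.arange(96), 4)
g_avg = np.mean([grad_mse(w, X[s], y[s]) for s in shards], axis=0)
assert np.allclose(g_avg, grad_mse(w, X, y))
```

This exactness is why scaling out over nodes changes the wall-clock time per step but not the optimization trajectory (up to effective batch-size effects).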


Author(s):  
FATEMA N. JULIA ◽  
KHAN M. IFTEKHARUDDIN ◽  
ATIQ U. ISLAM

Dialog act (DA) classification is useful to understand the intentions of a human speaker. An effective classification of DAs can be exploited for realistic implementation of expert systems. In this work, we investigate DA classification using both acoustic and discourse information for the HCRC MapTask data. We extract several different acoustic features and exploit these features using a Hidden Markov Model (HMM) network to classify acoustic information. For discourse feature extraction, we propose a novel parts-of-speech (POS) tagging technique that effectively reduces the dimensionality of discourse features. To classify discourse information, we employ two classifiers, an HMM and a Support Vector Machine (SVM). We further obtain classifier fusion between the HMM and SVM to improve discourse classification. Finally, we perform an efficient decision-level classifier fusion for both acoustic and discourse information to classify 12 different DAs in the MapTask data. We obtain 65.2% and 55.4% DA classification rates using acoustic and discourse information, respectively. Furthermore, we obtain a combined accuracy of 68.6% for DA classification using both acoustic and discourse information. These accuracy rates are either comparable to or better than previously reported results for the same data set. For average precision and recall, we obtain accuracy rates of 74.89% and 69.83%, respectively. Therefore, we obtain much better precision and recall rates for most of the classified DAs when compared to existing works on the same HCRC MapTask data set.
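Decision-level fusion of two modalities often reduces to combining their per-class posteriors before the argmax; a minimal numpy sketch (the weight, the toy posteriors, and the three-class setup are hypothetical illustrations, not the paper's fusion rule or data):

```python
import numpy as np

def fuse(p_acoustic, p_discourse, w=0.6):
    # weighted sum of per-class posteriors from the two modalities
    return w * p_acoustic + (1 - w) * p_discourse

# Toy posteriors over 3 dialog acts for 4 utterances (hypothetical numbers).
p_a = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.4, 0.4, 0.2], [0.1, 0.2, 0.7]])
p_d = np.array([[0.5, 0.4, 0.1], [0.1, 0.7, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]])
pred = fuse(p_a, p_d).argmax(axis=1)
assert pred.tolist() == [0, 1, 1, 2]
```

Note the third utterance: the acoustic model is tied between classes 0 and 1, and the discourse posterior breaks the tie, which is exactly the complementarity fusion aims to exploit.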


2021 ◽  
Vol 3 (3) ◽  
pp. 63-72
Author(s):  
Wanjun Zhao ◽  

Background: We aimed to establish a novel diagnostic model for kidney diseases by combining artificial intelligence with complete mass spectrum information from urinary proteomics. Methods: We enrolled 134 patients (IgA nephropathy, membranous nephropathy, and diabetic kidney disease) and 68 healthy participants as controls, with a total of 610,102 mass spectra from their urinary proteomic profiles. The training data set (80%) was used to create a diagnostic model using XGBoost, random forest (RF), a support vector machine (SVM), and artificial neural networks (ANNs). The diagnostic accuracy was evaluated using a confusion matrix with a test dataset (20%). We also constructed receiver operating characteristic, Lorenz, and gain curves to evaluate the diagnostic model. Results: Compared with the RF, SVM, and ANNs, the modified XGBoost model, called Kidney Disease Classifier (KDClassifier), showed the best performance. The accuracy of the XGBoost diagnostic model was 96.03%. The area under the curve of the extreme gradient boosting (XGBoost) model was 0.952 (95% confidence interval, 0.9307–0.9733). The Kolmogorov-Smirnov (KS) value of the Lorenz curve was 0.8514. The Lorenz and gain curves showed the strong robustness of the developed model. Conclusions: The KDClassifier achieved high accuracy and robustness and thus provides a potential tool for the classification of kidney diseases.
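XGBoost's core principle is gradient boosting: each round fits a small tree to the residuals of the current prediction. A bare-bones numpy illustration with depth-1 trees (stumps) and squared loss, without XGBoost's regularization or second-order terms (all data and hyperparameters here are illustrative):

```python
import numpy as np

def fit_stump(X, r):
    # best single-feature threshold split minimising squared error on residual r
    best = (np.inf, 0, 0.0, r.mean(), r.mean())
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            err = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if err < best[0]:
                best = (err, j, t, lv, rv)
    return best[1:]

def boost(X, y, rounds=20, lr=0.3):
    pred, stumps = np.zeros(len(y)), []
    for _ in range(rounds):
        j, t, lv, rv = fit_stump(X, y - pred)        # fit the residuals
        pred += lr * np.where(X[:, j] <= t, lv, rv)  # shrunken additive update
        stumps.append((j, t, lv, rv))
    return pred, stumps

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
y = (X[:, 0] > 0).astype(float)        # a step function of one feature
pred, _ = boost(X, y)
mse = np.mean((pred - y) ** 2)
assert mse < 0.05
```

Each round multiplies the residual by (1 - lr) on this separable toy problem, so the training error decays geometrically; the full XGBoost algorithm adds shrinkage-aware regularization and second-order gradient information on top of this scheme.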

