Empirical Evaluation on Utilizing CNN-Features for Seismic Patch Classification

Chunxia Zhang; Xiaoli Wei; Sang-Woon Kim

doi:10.3390/app12010197

Applied Sciences ◽

10.3390/app12010197 ◽

2021 ◽

Vol 12 (1) ◽

pp. 197

Author(s):

Chunxia Zhang ◽

Xiaoli Wei ◽

Sang-Woon Kim

Keyword(s):

Transfer Learning ◽

Empirical Evaluation ◽

Classification Performance ◽

Support Vector ◽

Target Domain ◽

Complexity Measures ◽

Processing Times ◽

Vector Machines ◽

Patch Image ◽

Learning Data

This paper empirically evaluates two kinds of features, extracted respectively with traditional statistical methods and with convolutional neural networks (CNNs), in order to improve the performance of seismic patch image classification. In the latter case, feature vectors, named “CNN-features”, were extracted from a trained CNN model and then used to train existing classifiers, such as support vector machines. To train the CNN model, a transfer learning technique was applied, using synthetic seismic patch data in the source domain and real-world patch data in the target domain. The experimental results show that CNN-features lead to some improvements in classification performance. An analysis of data complexity measures shows that the CNN-features have the strongest discriminant capabilities. Furthermore, the transfer learning technique alleviates the problems of long processing times and a lack of training data.
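The abstract does not name the data complexity measures it used. As a hedged illustration only, one widely used measure is the maximum Fisher's discriminant ratio (F1 in Ho and Basu's set of complexity measures): for each feature, the squared gap between the class means divided by the sum of the class variances, maximized over features; larger values mean the classes are easier to separate. The function name `fisher_ratio` and the toy "handcrafted" and "CNN" feature values below are made up for illustration.

```python
from statistics import fmean, pvariance


def fisher_ratio(class_a, class_b):
    """Maximum Fisher's discriminant ratio over features (higher = easier to separate).

    class_a, class_b: lists of feature vectors (lists of floats) of equal length.
    """
    best = 0.0
    for j in range(len(class_a[0])):
        a = [row[j] for row in class_a]
        b = [row[j] for row in class_b]
        gap = (fmean(a) - fmean(b)) ** 2
        spread = pvariance(a) + pvariance(b)
        if spread == 0:
            ratio = float("inf") if gap > 0 else 0.0  # perfectly separated constant features
        else:
            ratio = gap / spread
        best = max(best, ratio)
    return best


# Hypothetical toy features for two patch classes.
handcrafted_a = [[0.9, 1.0], [1.1, 1.2], [1.0, 0.8]]
handcrafted_b = [[1.2, 1.1], [1.4, 0.9], [1.3, 1.0]]
cnn_a = [[0.0, 5.0], [0.2, 5.1], [0.1, 4.9]]
cnn_b = [[2.0, 5.0], [2.1, 5.2], [1.9, 4.8]]

print(fisher_ratio(handcrafted_a, handcrafted_b))  # about 6.75
print(fisher_ratio(cnn_a, cnn_b))                  # about 270.75
```

A multi-class problem would need a generalization (for example, averaging over class pairs); the two-class form keeps the idea visible.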

Impact of Dataset Size on Classification Performance: An Empirical Evaluation in the Medical Domain

Applied Sciences ◽

10.3390/app11020796 ◽

2021 ◽

Vol 11 (2) ◽

pp. 796

Author(s):

Alhanoof Althnian ◽

Duaa AlSaeed ◽

Heyam Al-Baity ◽

Amani Samha ◽

Alanoud Bin Dris ◽

...

Keyword(s):

Empirical Evaluation ◽

Classification Performance ◽

Support Vector ◽

Robust Model ◽

Original Distribution ◽

C4.5 Decision Tree ◽

Dataset Size ◽

Overall Performance ◽

Medical Domain ◽

The Impact

Dataset size is considered a major concern in the medical domain, where lack of data is common. This study investigates the impact of dataset size on the overall performance of supervised classification models. We examined the performance of six models widely used in the medical field, namely support vector machine (SVM), neural networks (NN), C4.5 decision tree (DT), random forest (RF), AdaBoost (AB), and naïve Bayes (NB), on eighteen small medical UCI datasets. We further implemented three dataset size reduction scenarios on two large datasets and analyzed the performance of the models trained on each resulting dataset with respect to accuracy, precision, recall, F-score, specificity, and area under the ROC curve (AUC). Our results indicate that the overall performance of classifiers depends on how well a dataset represents the original distribution rather than on its size. Moreover, we found that the most robust models for limited medical data are AB and NB, followed by SVM, then RF and NN, while the least robust model is DT. Finally, an interesting observation is that a model that is robust to limited data does not necessarily provide the best performance compared with other models.
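The finding that performance tracks how well a reduced dataset represents the original distribution suggests shrinking data in a way that preserves class proportions. The abstract does not say how its three reduction scenarios were built, so the following is only a sketch of one such strategy, stratified subsampling; the function name `stratified_subsample` and the 30/70 toy labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample(labels, fraction, seed=0):
    """Return sorted indices of a subsample that keeps each class's share of the data.

    labels: list of class labels; fraction: target size as a share of the original, in (0, 1].
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for idxs in by_class.values():
        k = max(1, round(len(idxs) * fraction))  # keep at least one example per class
        chosen.extend(rng.sample(idxs, k))
    return sorted(chosen)


labels = ["pos"] * 30 + ["neg"] * 70
subset = stratified_subsample(labels, 0.2)
print(Counter(labels[i] for i in subset))  # Counter({'neg': 14, 'pos': 6})
```

A plain random subsample of the same size can drift away from the 30/70 split, which is exactly the kind of distribution mismatch the study points to.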

Creating a Chinese suicide dictionary for identifying suicide risk on social media

PeerJ ◽

10.7717/peerj.1455 ◽

2015 ◽

Vol 3 ◽

pp. e1455 ◽

Cited By ~ 10

Author(s):

Meizhen Lv ◽

Ang Li ◽

Tianli Liu ◽

Tingshao Zhu

Keyword(s):

Social Media ◽

Suicide Risk ◽

Classification Performance ◽

Support Vector ◽

Accurate Identification ◽

Vector Machines ◽

Social Media Service ◽

Linguistic Inquiry ◽

Suicide Prevention Programs ◽

Expert Ratings

Introduction. Suicide has become a serious worldwide epidemic. Early detection of individual suicide risk in the population is important for reducing suicide rates. Traditional methods are ineffective at identifying suicide risk in time, suggesting a need for novel techniques. This paper proposes detecting suicide risk on social media using a Chinese suicide dictionary.

Methods. To build the Chinese suicide dictionary, eight researchers were recruited to select initial words from 4,653 posts published on Sina Weibo (the largest social media service provider in China) and from two Chinese sentiment dictionaries (HowNet and NTUSD). Then, another three researchers were recruited to filter out irrelevant words. Finally, the remaining words were further expanded using a corpus-based method. After building the dictionary, we tested its performance in identifying suicide risk on Weibo. First, we compared the dictionary-based identifications with expert ratings, both for detecting suicidal expression in Weibo posts and for evaluating individual levels of suicide risk. Second, to differentiate between individuals with high and non-high scores on a self-rating measure of suicide risk (the Suicidal Possibility Scale, SPS), we built Support Vector Machine (SVM) models on the Chinese suicide dictionary and on the Simplified Chinese Linguistic Inquiry and Word Count (SCLIWC) program, respectively. We then compared the classification performance of the two types of SVM models.

Results and Discussion. Dictionary-based identifications were significantly correlated with expert ratings, both for detecting suicidal expression (r = 0.507) and for evaluating individual suicide risk (r = 0.455). For differentiating between individuals with high and non-high SPS scores, the Chinese suicide dictionary (t1: F1 = 0.48; t2: F1 = 0.56) produced more accurate identification than SCLIWC (t1: F1 = 0.41; t2: F1 = 0.48) across the different observation windows.

Conclusions. This paper confirms that social media can be used to monitor individual suicide risk in the population in real time. The results of this study may be useful for improving Chinese suicide prevention programs and may be insightful for other countries.
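To make the dictionary-based identification and the reported F1 scores concrete, here is a minimal sketch assuming posts are already segmented into words (Chinese text would first need a word segmenter) and a post is flagged when it contains at least one dictionary word. The English entries, the posts, the labels, and the one-hit threshold are hypothetical placeholders; the actual dictionary is Chinese, and the paper's SPS classification used SVM models rather than this simple rule.

```python
def dictionary_score(tokens, dictionary):
    """Count how many tokens of a pre-segmented post appear in the dictionary."""
    return sum(1 for tok in tokens if tok in dictionary)


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1), computed as 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical placeholder dictionary and labelled posts.
RISK_WORDS = {"hopeless", "goodbye", "burden"}
posts = [["i", "feel", "hopeless"], ["great", "day"], ["saying", "goodbye", "forever"], ["so", "tired"]]
y_true = [1, 0, 1, 1]
y_pred = [1 if dictionary_score(p, RISK_WORDS) >= 1 else 0 for p in posts]
print(y_pred, f1_score(y_true, y_pred))  # [1, 0, 1, 0] 0.8
```

Here two true positives, no false positives, and one missed post give F1 = 4 / 5 = 0.8, the same harmonic mean of precision and recall behind the t1/t2 figures above.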

Optimization of RBF-SVM hyperparameters using genetic algorithm for face recognition

Nigerian Journal of Technology ◽

10.4314/njt.v39i4.27 ◽

2021 ◽

Vol 39 (4) ◽

pp. 1190-1197

Author(s):

Y. Ibrahim ◽

E. Okafor ◽

B. Yahaya

Keyword(s):

Genetic Algorithm ◽

Deep Learning ◽

Face Recognition ◽

Regularization Parameter ◽

Local Binary Patterns ◽

Classification Performance ◽

Support Vector ◽

Vector Machines ◽

Linear Svm ◽

Learning Architectures

Manual grid-search tuning of machine learning hyperparameters is very time-consuming. To curb this problem, we propose using a genetic algorithm (GA) to select optimal hyperparameters for a radial-basis-function support vector machine (RBF-SVM): the regularization (cost) parameter C and the kernel parameter γ. The resulting optimal parameters were used to train face recognition models. To train the models, we independently extracted features from the ORL face image dataset using local binary patterns (handcrafted) and deep learning architectures (pretrained variants of VGGNet). The resulting features were passed as input to either a linear SVM or the optimized RBF-SVM. The results show that models combining the optimized RBF-SVM with either deep learning or handcrafted features outperformed models combining a linear SVM with the same features in most of the data splits. The study demonstrates that optimizing the hyperparameters of an SVM is worthwhile for obtaining the best classification performance. Keywords: Face Recognition, Feature Extraction, Local Binary Patterns, Transfer Learning, Genetic Algorithm, Support Vector Machines.
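The abstract names a GA search over C and γ but not its operators, so the following is a hedged sketch of one common design: a real-coded GA over (log2 C, log2 γ) with binary tournament selection, blend crossover, Gaussian mutation, and elitism. The search ranges follow common practice rather than the paper, and `stand_in_fitness` is a hypothetical placeholder; in practice it would be, for example, the mean of scikit-learn's `cross_val_score` for `SVC(kernel="rbf", C=..., gamma=...)` on the extracted features.

```python
import math
import random

# Search ranges for log2(C) and log2(gamma); an assumption from common practice.
BOUNDS = [(-5.0, 15.0), (-15.0, 3.0)]


def stand_in_fitness(genes):
    """Hypothetical placeholder for cross-validated RBF-SVM accuracy.

    Peaks at log2(C) = 3 and log2(gamma) = -4; replace with a real evaluation.
    """
    log_c, log_g = genes
    return math.exp(-((log_c - 3.0) ** 2 + (log_g + 4.0) ** 2) / 20.0)


def tournament(rng, population, fitness):
    """Binary tournament selection: the fitter of two random individuals wins."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def genetic_search(fitness, pop_size=20, generations=30, mutation_sd=1.0, seed=0):
    """Real-coded GA with blend crossover, Gaussian mutation, and elitism."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    history = []  # best fitness in each generation
    for _ in range(generations):
        elite = max(population, key=fitness)
        history.append(fitness(elite))
        next_gen = [elite]  # elitism: the best individual survives unchanged
        while len(next_gen) < pop_size:
            p1 = tournament(rng, population, fitness)
            p2 = tournament(rng, population, fitness)
            alpha = rng.random()  # blend crossover weight
            child = [alpha * x + (1 - alpha) * y for x, y in zip(p1, p2)]
            # Gaussian mutation, clipped back into the search ranges.
            child = [min(max(g + rng.gauss(0.0, mutation_sd), lo), hi)
                     for g, (lo, hi) in zip(child, BOUNDS)]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, history


best, history = genetic_search(stand_in_fitness)
print(f"C = {2 ** best[0]:.3g}, gamma = {2 ** best[1]:.3g}")
```

Searching in log2 space matches how C and γ are usually scanned. Because each real fitness call trains and cross-validates a model, caching scores per individual would be worthwhile in practice.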

ChemTok: A New Rule Based Tokenizer for Chemical Named Entity Recognition

BioMed Research International ◽

10.1155/2016/4248026 ◽

2016 ◽

Vol 2016 ◽

pp. 1-9 ◽

Cited By ~ 5

Author(s):

Abbas Akkasi ◽

Ekrem Varoğlu ◽

Nazife Dimililer

Keyword(s):

Conditional Random Fields ◽

Named Entity Recognition ◽

Classification Performance ◽

Entity Recognition ◽

Support Vector ◽

Learning Approaches ◽

Data Set ◽

Rule Based ◽

Named Entity ◽

Vector Machines

Named Entity Recognition (NER) from text is the first step in many text mining applications. The most important preliminary step for NER systems that use machine learning approaches is tokenization, where raw text is segmented into tokens. This study proposes an enhanced rule-based tokenizer, ChemTok, which uses rules extracted mainly from the training data set. The main novelty of ChemTok is the use of these extracted rules to merge tokens split in previous steps, producing longer and more discriminative tokens. ChemTok is compared with the tokenization methods used by ChemSpot and tmChem, with Support Vector Machines and Conditional Random Fields as the learning algorithms. The experimental results show that the classifiers trained on ChemTok's output outperform all classifiers trained on the output of the other two tokenizers, both in classification performance and in producing fewer incorrectly segmented entities.
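The abstract says ChemTok merges tokens split in earlier steps using rules extracted from training data, but does not give the rule format. The sketch below assumes, hypothetically, that a "rule" is simply an entity string seen in training annotations, and greedily merges the longest run of touching tokens (no whitespace between them) whose joined text matches one. The names `base_tokenize`, `merge_tokens`, and the rule set are illustrative, not ChemTok's actual rules or code.

```python
import re


def base_tokenize(text):
    """Split into alphanumeric runs and single punctuation marks, keeping character offsets."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]", text)]


def merge_tokens(tokens, rules, max_run=8):
    """Greedily merge the longest run of touching tokens whose joined text is a known rule."""
    merged, i = [], 0
    while i < len(tokens):
        best_j = i
        for j in range(min(len(tokens), i + max_run) - 1, i, -1):
            run = tokens[i:j + 1]
            touching = all(run[k][2] == run[k + 1][1] for k in range(len(run) - 1))
            if touching and "".join(t[0] for t in run) in rules:
                best_j = j
                break
        run = tokens[i:best_j + 1]
        merged.append(("".join(t[0] for t in run), run[0][1], run[-1][2]))
        i = best_j + 1
    return merged


# Hypothetical rule set, e.g. entity strings collected from training annotations.
rules = {"2-methylpropane", "Na+"}
text = "2-methylpropane and Na+ were detected"
print([t[0] for t in merge_tokens(base_tokenize(text), rules)])
# ['2-methylpropane', 'and', 'Na+', 'were', 'detected']
```

Keeping character offsets lets merged tokens map back onto the original text, which entity annotations typically require.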

On the parameter optimization of Support Vector Machines for binary classification

Journal of Integrative Bioinformatics ◽

10.1515/jib-2012-201 ◽

2012 ◽

Vol 9 (3) ◽

pp. 33-43 ◽

Cited By ~ 30

Author(s):

Paulo Gaspar ◽

Jaime Carbonell ◽

José Luís Oliveira

Keyword(s):

Support Vector Machines ◽

Binary Classification ◽

Classification Performance ◽

Biological Data ◽

Parameters Optimization ◽

Support Vector ◽

Minimal Risk ◽

Class Separation ◽

Vector Machines ◽

Analyse Data

Summary. Classifying biological data is a common task in the biomedical context. Predicting the class of new, unknown information allows researchers to gain insight and make decisions based on the available data. Using classification methods also often requires choosing the best parameters to obtain optimal class separation, and the number of parameters may be large for biological datasets. Support Vector Machines provide a well-established and powerful classification method for analysing data and finding the minimal-risk separation between classes. Finding that separation depends strongly on the available feature set and on the tuning of hyper-parameters. Techniques for feature selection and SVM parameter optimization are known to improve classification accuracy, and the literature on them is extensive. In this paper, we review the strategies used to improve the classification performance of SVMs and perform our own experiments to study the influence of features and hyper-parameters on the optimization process, using several known kernels.
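A common baseline for the hyper-parameter tuning discussed here is exhaustive grid search over exponentially spaced values, scored by cross-validation; the LIBSVM practical guide, for instance, suggests C = 2^-5, 2^-3, ..., 2^15 and γ = 2^-15, 2^-13, ..., 2^3 for the RBF kernel. The sketch below shows the grid and a k-fold index split; `toy_evaluate` is a hypothetical stand-in for the mean cross-validated accuracy of a real SVM, and none of this is claimed to be the authors' procedure.

```python
import itertools
import math


def log2_grid(start, stop, step):
    """Exponentially spaced values 2**start, 2**(start + step), ..., up to 2**stop."""
    return [2.0 ** e for e in range(start, stop + 1, step)]


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds (shuffle the data first in practice)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def grid_search(evaluate, c_values, gamma_values):
    """Return (best_score, C, gamma) maximizing evaluate(C, gamma) over the full grid."""
    return max((evaluate(c, g), c, g) for c, g in itertools.product(c_values, gamma_values))


def toy_evaluate(c, gamma):
    # Hypothetical stand-in for mean k-fold accuracy; peaks at C = 2**3, gamma = 2**-5.
    return 1.0 / (1.0 + abs(math.log2(c) - 3) + abs(math.log2(gamma) + 5))


c_values = log2_grid(-5, 15, 2)
gamma_values = log2_grid(-15, 3, 2)
print(grid_search(toy_evaluate, c_values, gamma_values))  # (1.0, 8.0, 0.03125)
```

A real `evaluate` would loop over `k_fold_indices`, train on each training split, and average accuracy on the held-out folds; a coarse grid followed by a finer grid around the best cell keeps the cost manageable.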

Influence of Dataset Character on Classification Performance of Support Vector Machines for Grain Analysis

Artificial Intelligence and Applications ◽

10.2316/p.2010.674-071 ◽

2010 ◽

Author(s):

K. Anding ◽

G. Linß ◽

P. Brückner

Keyword(s):

Support Vector Machines ◽

Classification Performance ◽

Support Vector ◽

Vector Machines

Classification of Hyperspectral In Vivo Brain Tissue Based on Linear Unmixing

Applied Sciences ◽

10.3390/app10165686 ◽

2020 ◽

Vol 10 (16) ◽

pp. 5686

Author(s):

Ines A. Cruz-Guerrero ◽

Raquel Leon ◽

Daniel U. Campos-Delgado ◽

Samuel Ortega ◽

Himar Fabelo ◽

...

Keyword(s):

Brain Tissue ◽

Classification Performance ◽

Training Dataset ◽

Support Vector ◽

Svm Classifier ◽

Tissue Classification ◽

Processing Times ◽

Main Challenge ◽

Linear Unmixing

Hyperspectral imaging is a multidimensional optical technique with the potential to provide fast and accurate tissue classification. The main challenge is processing the multidimensional information adequately, which is usually linked to long processing times and significant computational costs that require expensive hardware. In this study, we address the problem of tissue classification in intraoperative hyperspectral images of in vivo brain tissue. To this end, we introduce two methodologies that rely on a blind linear unmixing (BLU) scheme for practical tissue classification. Both methodologies use BLU to identify the characteristic end-members of the studied tissue classes from a training dataset and then classify the pixels by a minimum distance approach. The proposed methodologies are compared with a machine learning method based on a supervised support vector machine (SVM) classifier. The BLU-based methodologies achieve speedup factors of ~459× and ~429× over the SVM scheme while maintaining, and even slightly improving, the classification performance.
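The classification step described above assigns each pixel to the class of its nearest end-member. The sketch assumes the end-members have already been estimated (the BLU step itself is not shown) and uses Euclidean distance; the paper's exact distance may differ. The 4-band spectra and class names are hypothetical.

```python
import math

# Hypothetical 4-band end-member spectra; real ones would come from BLU on training data.
ENDMEMBERS = {
    "normal": [0.30, 0.35, 0.40, 0.42],
    "tumor": [0.20, 0.22, 0.30, 0.38],
    "blood_vessel": [0.05, 0.08, 0.25, 0.45],
}


def nearest_endmember(pixel, endmembers):
    """Minimum-distance rule: return the class whose end-member is closest (Euclidean)."""
    return min(endmembers, key=lambda label: math.dist(pixel, endmembers[label]))


def classify_pixels(pixels, endmembers):
    """Label every pixel spectrum independently; cost is O(pixels x classes x bands)."""
    return [nearest_endmember(p, endmembers) for p in pixels]


pixels = [[0.21, 0.24, 0.31, 0.37], [0.29, 0.36, 0.41, 0.41], [0.06, 0.09, 0.24, 0.44]]
print(classify_pixels(pixels, ENDMEMBERS))  # ['tumor', 'normal', 'blood_vessel']
```

Per pixel, this needs only one distance per class, whereas an RBF-SVM evaluates a kernel against every support vector, which plausibly accounts for much of the reported speedup.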

Improving SVM Performance in Multi-Label Domains: Threshold Adjustment

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213012500388 ◽

2013 ◽

Vol 22 (01) ◽

pp. 1250038 ◽

Cited By ~ 4

Author(s):

Peerapon Vateekul ◽

Sareewan Dendamrongvit ◽

Miroslav Kubat

Keyword(s):

Classification Performance ◽

Support Vector ◽

Binary Classifier ◽

Research Groups ◽

Computational Costs ◽

Vector Machines ◽

Threshold Adjustment ◽

Good Classification ◽

Good Classification Performance ◽

Suboptimal Behavior

In “multi-label domains”, where the same example can belong to two or more classes simultaneously, it is customary to induce a separate binary classifier for each class and then use them all in parallel. As a result, some of these classifiers are induced from imbalanced training sets in which one class outnumbers the other, a circumstance known to hurt some machine learning paradigms. In the case of Support Vector Machines (SVM), this suboptimal behavior is explained by the fact that SVM seeks to minimize the error rate, a criterion that is misleading in domains of this type. This is why several research groups have studied mechanisms to readjust the bias of the SVM's hyperplane. The best of these achieves very good classification performance at the price of impractically high computational costs. We propose an improvement that reduces these costs to a small fraction without significantly impairing classification.
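Readjusting the hyperplane's bias amounts to moving the decision threshold on the SVM's output scores. The abstract does not describe the authors' cheaper method, so the following sketches only the generic idea for one binary classifier: pick the threshold that maximizes F1 on validation scores instead of using the default 0. The scores and labels are hypothetical.

```python
def f1_from_counts(tp, fp, fn):
    """F1 computed as 2TP / (2TP + FP + FN); 0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(scores, labels):
    """Choose the threshold on decision scores that maximizes F1 on validation data.

    A sample is predicted positive when score >= threshold; candidates are the observed scores.
    """
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        f1 = f1_from_counts(tp, fp, fn)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical validation scores (signed distances to the hyperplane) and true labels.
scores = [-0.9, -0.4, -0.2, 0.1, 0.3, 0.8]
labels = [0, 0, 1, 0, 1, 1]
print(best_threshold(scores, labels))  # threshold -0.2 with F1 = 6/7
```

With the default threshold 0, the same scores yield F1 = 2/3; lowering it to -0.2 recovers the minority positive at -0.2 at the cost of one false positive. This brute-force scan costs O(n²) per class, which illustrates why a cheaper adjustment matters when there are many labels.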

Selection of Support Vector Candidates Using Relative Support Distance for Sustainability in Large-Scale Support Vector Machines

Applied Sciences ◽

10.3390/app10196979 ◽

2020 ◽

Vol 10 (19) ◽

pp. 6979

Author(s):

Minho Ryu ◽

Kichun Lee

Keyword(s):

Support Vector Machines ◽

Quadratic Programming ◽

Decision Trees ◽

Programming Problem ◽

Large Scale ◽

Classification Performance ◽

Quadratic Programming Problem ◽

Support Vector ◽

Training Time ◽

Vector Machines

Support vector machines (SVMs) are well-known classifiers, valued for their superior classification performance. An SVM is defined by a hyperplane that separates two classes with the largest margin. Computing this hyperplane, however, requires solving a quadratic programming problem, whose storage cost grows with the square of the number of training samples and whose time complexity is, in general, proportional to the cube of that number. It is therefore worth studying how to reduce the training time of SVMs without compromising performance, in order to support sustainability in large-scale SVM problems. In this paper, we propose a novel data reduction method that reduces training time by combining decision trees with relative support distance. We apply this new concept, relative support distance, to select good support vector candidates in each partition generated by the decision trees. The selected support vector candidates improve the training speed for large-scale SVM problems. In experiments, we demonstrate that our approach significantly reduces training time while maintaining good classification performance compared with existing approaches.
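The quadratic storage and cubic time growth quoted above can be made concrete with a little arithmetic. The sketch assumes a naive solver that stores the dense n × n kernel matrix in float64 and ignores symmetry; decomposition solvers such as SMO avoid holding the full matrix, so these figures are upper-bound intuition rather than measurements.

```python
def kernel_matrix_bytes(n_samples, bytes_per_entry=8):
    """Memory for a dense n x n kernel (Gram) matrix of float64 entries."""
    return n_samples * n_samples * bytes_per_entry


def relative_qp_time(n_new, n_old):
    """Growth factor of an O(n^3) solver's running time when n_old samples become n_new."""
    return (n_new / n_old) ** 3


print(kernel_matrix_bytes(100_000) / 1e9, "GB")  # 80.0 GB
print(relative_qp_time(20_000, 10_000))          # 8.0
```

Read in reverse, halving the training set through good support vector candidate selection cuts such a solver's memory by 4× and its time by roughly 8×.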

Detection of linear features including bone and skin areas in ultrasound images of joints

10.7287/peerj.preprints.3519v1 ◽

2018 ◽

Author(s):

Artur Bąk ◽

Jakub Segen ◽

Kamil Wereszczyński ◽

Pawel Mielnik ◽

Marcin Fojcik ◽

...

Keyword(s):

Random Forest ◽

Computation Time ◽

Classification Performance ◽

Support Vector ◽

Ultrasound Images ◽

Nearest Neighbour ◽

Vector Machines ◽

Overall Evaluation ◽

Gaussian Blurring ◽

Image Pixels

Identifying separate structures such as bone and skin in ultrasound images plays a crucial role in synovitis detection. This paper presents a detector of bone and skin regions in the form of a classifier trained on a set of annotated images. Selected regions are labeled skin, bone, or none. The feature vectors used by the classifier are assigned to image pixels by passing the image through a bank of linear and nonlinear filters. The filters include a Gaussian blurring filter, its first- and second-order derivatives, the Laplacian, and positive and negative threshold operations applied to the filtered images. We compared multiple supervised learning classifiers, including Naive Bayes, k-Nearest Neighbour, Decision Trees, Random Forest, AdaBoost, and Support Vector Machines (SVM) with various kernels, using four classification performance scores and computation time. The Random Forest classifier was selected for final use because it gave the best overall evaluation results.
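A per-pixel feature vector from such a filter bank might be computed as sketched below. The single scale (sigma = 1), clamped borders, central differences, the 4-neighbour Laplacian, and reading "positive and negative threshold operations" as max(v, 0) and max(-v, 0) are all assumptions; the individual second-order derivatives are omitted for brevity, and the classifier training step (Random Forest in the paper) is not shown.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights covering +/- 3 sigma."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def transpose(img):
    return [list(col) for col in zip(*img)]


def filter_rows(img, kernel):
    """Convolve each row with a symmetric kernel, clamping indices at the borders."""
    r, w = len(kernel) // 2, len(img[0])
    return [[sum(k * row[min(max(x + i - r, 0), w - 1)] for i, k in enumerate(kernel))
             for x in range(w)] for row in img]


def gaussian_blur(img, sigma):
    kernel = gaussian_kernel(sigma)  # separable: filter rows, then columns
    return transpose(filter_rows(transpose(filter_rows(img, kernel)), kernel))


def grad_x(img):
    """Central first-order difference along x (clamped at the borders)."""
    w = len(img[0])
    return [[(row[min(x + 1, w - 1)] - row[max(x - 1, 0)]) / 2.0 for x in range(w)] for row in img]


def laplacian(img):
    """4-neighbour Laplacian with clamped borders."""
    h, w = len(img), len(img[0])

    def px(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[px(y, x - 1) + px(y, x + 1) + px(y - 1, x) + px(y + 1, x) - 4 * px(y, x)
             for x in range(w)] for y in range(h)]


def pixel_features(img, sigma=1.0):
    """Per-pixel vector: blur, d/dx, d/dy, Laplacian, and the Laplacian's positive/negative parts."""
    blur = gaussian_blur(img, sigma)
    gx = grad_x(blur)
    gy = transpose(grad_x(transpose(blur)))
    lap = laplacian(blur)
    h, w = len(img), len(img[0])
    return [[blur[y][x], gx[y][x], gy[y][x], lap[y][x], max(lap[y][x], 0.0), max(-lap[y][x], 0.0)]
            for y in range(h) for x in range(w)]


# A vertical step edge: dark left half, bright right half.
img = [[0.0] * 3 + [1.0] * 3 for _ in range(5)]
feats = pixel_features(img)
print(len(feats), len(feats[0]))  # 30 6
```

Each pixel then becomes one training row labeled skin, bone, or none; in practice several sigma values would be stacked to capture structures of different widths.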
