The effect of representative training dataset selection on the classification performance of the promoter sequences

SEMANTIC SEGMENTATION OF BENTHIC COMMUNITIES FROM ORTHO-MOSAIC MAPS

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-2-w10-151-2019 ◽

2019 ◽

Vol XLII-2/W10 ◽

pp. 151-158 ◽

Cited By ~ 5

Author(s):

G. Pavoni ◽

M. Corsini ◽

M. Callieri ◽

M. Palma ◽

R. Scopigno

Keyword(s):

Visual Analysis ◽

Marine Organism ◽

Benthic Communities ◽

Semantic Segmentation ◽

Classification Performance ◽

Training Dataset ◽

Non Invasive ◽

Visual Sampling ◽

Organism Identification

Abstract. Visual sampling techniques represent a valuable resource for rapid, non-invasive data acquisition in underwater monitoring. Long-term monitoring projects usually require the collection of large quantities of data, and visual analysis by a human expert remains, in this context, a very time-consuming task. It has been estimated that only 1-2% of the acquired images are later analyzed by scientists (Beijbom et al., 2012). Strategies for the automatic recognition of benthic communities are required to effectively exploit all the information contained in visual data. Supervised learning methods, the most promising classification techniques in this field, are commonly affected by two recurring issues: the wide diversity of marine organisms and the small amount of labeled data. In this work, we discuss the advantages offered by annotated high-resolution ortho-mosaics of the seabed for classifying and segmenting the investigated specimens, and we suggest several strategies to obtain considerable per-pixel classification performance despite the use of a reduced training dataset composed of a single ortho-mosaic. The proposed methodology can be applied to a large number of different species, making marine organism identification a highly adaptable task.

Automated Detection of Paleoenvironmental Proxy, Eucampia Index, in a Microscopic Slide Using a Convolutional Neural Network System

10.21203/rs.3.rs-88945/v1 ◽

2020 ◽

Author(s):

Saki Ishino ◽

Takuya Itaki

Keyword(s):

Southern Ocean ◽

Large Scale ◽

Classification Performance ◽

Automated Detection ◽

Model Verification ◽

Training Dataset ◽

Test Dataset ◽

Counting Error ◽

Index Value ◽

Particle Images

Abstract The Eucampia Index, which is calculated from the valve ratio of varieties of the Antarctic diatom Eucampia antarctica, is expected to be a useful indicator of sea ice coverage and/or sea surface temperature variation in the Southern Ocean. To verify the relationship between the index value and the environmental factors, considerable effort is needed to classify and count valves of E. antarctica in a very large number of samples. In this study, to realize automated detection of the Eucampia Index, we constructed deep-learning-based models (deep learning being one of the learning methods of artificial intelligence) for identifying Eucampia valves among the various particles in a diatom slide. The microfossil Classification and Rapid Accumulation Device (miCRAD) system, which can scan a slide and crop images of particles automatically, was employed to collect images for the training dataset of the model and for the test dataset used in model verification. When the initial model "Eant_1000px_200616" classified the particle images in the test dataset, its accuracy was 78.8%. The Eucampia Index value prepared in the test dataset was 0.80, and the value predicted by the developed model from the same dataset was 0.76. The predicted value was within the range of the manual counting error. These results suggest that the classification performance of the model is similar to that of a human expert. This study showed for the first time that a model capable of detecting the ratio of two diatom species can be constructed using the miCRAD system. The miCRAD system connected with the model developed in this study can classify particle images automatically while capturing them, so the system can be applied to a large-scale analysis of the Eucampia Index in the Southern Ocean. Depending on how the classification categories are set, a similar method is relevant to investigators who have to process a large number of diatom samples, for example to detect specific species for biostratigraphic and paleoenvironmental studies.

An Efficient and Lightweight Convolutional Neural Network for Remote Sensing Image Scene Classification

Sensors ◽

10.3390/s20071999 ◽

2020 ◽

Vol 20 (7) ◽

pp. 1999 ◽

Cited By ~ 6

Author(s):

Donghang Yu ◽

Qing Xu ◽

Haitao Guo ◽

Chuan Zhao ◽

Yuzhun Lin ◽

...

Keyword(s):

Neural Network ◽

Remote Sensing ◽

Convolutional Neural Network ◽

Visual Recognition ◽

Feature Fusion ◽

Remote Sensing Image ◽

Classification Performance ◽

Image Features ◽

Training Dataset ◽

Scene Classification

Classifying remote sensing images is vital for interpreting image content. Presently, remote sensing image scene classification methods using convolutional neural networks (CNNs) have drawbacks, including excessive parameters and heavy calculation costs. More efficient and lightweight CNNs have fewer parameters and calculations, but their classification performance is generally weaker. We propose a more efficient and lightweight convolutional neural network method to improve classification accuracy with a small training dataset. Inspired by fine-grained visual recognition, this study introduces a bilinear convolutional neural network model for scene classification. First, the lightweight convolutional neural network, MobileNetv2, is used to extract deep and abstract image features. Each feature is then transformed into two features with two different convolutional layers. The transformed features are subjected to a Hadamard product operation to obtain an enhanced bilinear feature. Finally, the bilinear feature after pooling and normalization is used for classification. Experiments are performed on three widely used datasets: UC Merced, AID, and NWPU-RESISC45. Compared with other state-of-the-art methods, the proposed method has fewer parameters and calculations, while achieving higher accuracy. By including feature fusion with bilinear pooling, performance and accuracy for remote sensing scene classification can be greatly improved. This approach could be applied to any remote sensing image classification task.
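
The Hadamard-product step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two convolutional layers are stood in for by plain weight matrices applied per location (as a 1x1 convolution would be), and sum pooling, signed square root, and L2 normalization are assumed choices for the "pooling and normalization" step. All names and values are hypothetical.

```python
import math

def linear(weights, x):
    """Apply a weight matrix (list of rows) to one feature vector.
    Stands in for a 1x1 convolution at a single spatial location (assumption)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def bilinear_feature(feature_map, w_a, w_b):
    """feature_map: list of per-location feature vectors from a backbone."""
    dim = len(w_a)
    pooled = [0.0] * dim
    for x in feature_map:
        a = linear(w_a, x)          # first transformed feature
        b = linear(w_b, x)          # second transformed feature
        for i in range(dim):
            pooled[i] += a[i] * b[i]  # Hadamard product, sum-pooled over locations
    # Signed square root followed by L2 normalization (assumed normalization).
    rooted = [math.copysign(math.sqrt(abs(p)), p) for p in pooled]
    norm = math.sqrt(sum(r * r for r in rooted)) or 1.0
    return [r / norm for r in rooted]

# Hypothetical toy input: two spatial locations, two channels each.
feature_map = [[1.0, 2.0], [0.5, -1.0]]
w_a = [[1.0, 0.0], [0.0, 1.0]]
w_b = [[0.0, 1.0], [1.0, 0.0]]
z = bilinear_feature(feature_map, w_a, w_b)
```

The resulting vector `z` has unit length and would be passed to a classifier layer.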

Developing Support Vector Machine with New Fuzzy Selection for the Infringement of a Patent Rights Problem

Mathematics ◽

10.3390/math8081263 ◽

2020 ◽

Vol 8 (8) ◽

pp. 1263

Author(s):

Chih-Yao Chang ◽

Kuo-Ping Lin

Keyword(s):

Support Vector Machine ◽

Data Quality ◽

Classification Performance ◽

Patent Infringement ◽

Classification Method ◽

Training Dataset ◽

Support Vector ◽

Classification Problems ◽

Data Quality Management ◽

Patent Rights

Classification problems are very important issues in real enterprises. In the patent infringement issue, accurate classification could help enterprises understand court decisions and avoid patent infringement. However, general classification methods do not perform well on the patent infringement problem because there are too many complex variables. Therefore, this study attempts to develop a classification method, the support vector machine with new fuzzy selection (SVMFS), to judge the infringement of patent rights. The raw data are divided into training and testing sets. However, the data quality of the training set is not easy to evaluate. Effective data quality management requires a structural core that can support data operations. This study adopts a new fuzzy selection based on membership values, which are generated from fuzzy c-means clustering, to select appropriate data and enhance the classification performance of the support vector machine (SVM). An empirical example shows that the proposed SVMFS can obtain a superior accuracy rate. Moreover, the results verify that the new fuzzy selection can effectively select the training dataset.
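
As a rough illustration of membership-based selection, the sketch below computes standard fuzzy c-means membership values for given cluster centers and keeps only samples whose strongest membership passes a threshold. The clustering iterations are omitted, and the threshold rule, the value 0.8, and all data are assumptions; the abstract does not state the actual selection criterion.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of one point in each cluster,
    given fixed centers and fuzzifier m > 1."""
    dists = [math.dist(point, c) for c in centers]
    if any(d == 0.0 for d in dists):
        # The point coincides with a center: full membership there.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]

def select_training_samples(points, centers, threshold=0.8):
    """Keep samples that belong clearly to one cluster (hypothetical rule)."""
    return [p for p in points if max(fcm_memberships(p, centers)) >= threshold]

# Hypothetical centers and samples; the middle sample is ambiguous.
centers = [(0.0, 0.0), (10.0, 0.0)]
points = [(1.0, 0.0), (5.0, 0.0), (9.0, 0.0)]
selected = select_training_samples(points, centers)
```

The selected samples would then be used to train the SVM in place of the full training set.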

Classification of Hyperspectral In Vivo Brain Tissue Based on Linear Unmixing

Applied Sciences ◽

10.3390/app10165686 ◽

2020 ◽

Vol 10 (16) ◽

pp. 5686

Author(s):

Ines A. Cruz-Guerrero ◽

Raquel Leon ◽

Daniel U. Campos-Delgado ◽

Samuel Ortega ◽

Himar Fabelo ◽

...

Keyword(s):

Brain Tissue ◽

Classification Performance ◽

Training Dataset ◽

Support Vector ◽

Svm Classifier ◽

Tissue Classification ◽

Processing Times ◽

Main Challenge ◽

Linear Unmixing

Hyperspectral imaging is a multidimensional optical technique with the potential to provide fast and accurate tissue classification. The main challenge is the adequate processing of the multidimensional information, which is usually linked to long processing times and significant computational costs that require expensive hardware. In this study, we address the problem of tissue classification for intraoperative hyperspectral images of in vivo brain tissue. For this goal, two methodologies are introduced that rely on a blind linear unmixing (BLU) scheme for practical tissue classification. Both methodologies identify the characteristic end-members related to the studied tissue classes by BLU from a training dataset and classify the pixels by a minimum distance approach. The proposed methodologies are compared with a machine learning method based on a supervised support vector machine (SVM) classifier. The methodologies based on BLU achieve speedup factors of ~459× and ~429× compared to the SVM scheme, while maintaining or even slightly improving the classification performance.
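
The minimum distance step can be sketched as follows, assuming the end-members have already been estimated. Euclidean distance is an assumption here (the abstract does not name the metric), and the class names and spectra are hypothetical.

```python
import math

def classify_pixel(spectrum, endmembers):
    """Return the label of the end-member closest to the pixel spectrum.
    Euclidean distance is an assumed choice of metric."""
    return min(endmembers, key=lambda label: math.dist(spectrum, endmembers[label]))

# Hypothetical end-member spectra with three bands each.
endmembers = {
    "class_a": [0.2, 0.4, 0.6],
    "class_b": [0.7, 0.5, 0.3],
}
pixels = [[0.25, 0.42, 0.55], [0.65, 0.50, 0.35]]
labels = [classify_pixel(p, endmembers) for p in pixels]
```

Because each pixel needs only one distance per class, this step is cheap compared with evaluating a kernel classifier.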

Brain State Decoding Based on fMRI Using Semisupervised Sparse Representation Classifications

Computational Intelligence and Neuroscience ◽

10.1155/2018/3956536 ◽

2018 ◽

Vol 2018 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

Jing Zhang ◽

Chuncheng Zhang ◽

Li Yao ◽

Xiaojie Zhao ◽

Zhiying Long

Keyword(s):

Sparse Representation ◽

Supervised Learning ◽

Reconstruction Error ◽

Classification Performance ◽

Semisupervised Learning ◽

Fmri Data ◽

Training Dataset ◽

Brain State ◽

Classification Techniques ◽

Average Coefficient

Multivariate classification techniques have been widely applied to decode brain states using functional magnetic resonance imaging (fMRI). Due to variability in fMRI data and the limited availability of human fMRI data, it is not easy to train an efficient and robust supervised-learning classifier for fMRI data. Among various classification techniques, the sparse representation classifier (SRC) exhibits state-of-the-art classification performance in image classification. However, SRC has rarely been applied to fMRI-based decoding. This study aimed to improve SRC using unlabeled testing samples to allow it to be effectively applied to fMRI-based decoding. We proposed a semisupervised-learning SRC with an average coefficient (semiSRC-AVE) method that performed the classification using the average coefficient of each class instead of the reconstruction error and selectively updated the training dataset using new testing data with high confidence to improve the performance of SRC. Simulated and real fMRI experiments were performed to investigate the feasibility and robustness of semiSRC-AVE. The results of the simulated and real fMRI experiments showed that semiSRC-AVE significantly outperformed the supervised-learning SRC with an average coefficient (SRC-AVE) method and showed better performance than the other three semisupervised learning methods.

Prediction of Myoelectric Biomarkers in Post-Stroke Gait

Sensors ◽

10.3390/s21165334 ◽

2021 ◽

Vol 21 (16) ◽

pp. 5334

Author(s):

Iqram Hussain ◽

Se-Jin Park

Keyword(s):

Muscular Activity ◽

Classification Performance ◽

Lower Limbs ◽

Training Dataset ◽

Control Group ◽

Mean Power ◽

Predictive Tool ◽

Post Stroke ◽

Healthy Elderly ◽

Power Frequency

Electromyography (EMG) is sensitive to neuromuscular changes resulting from ischemic stroke and is considered a potential predictive tool for post-stroke gait and rehabilitation management. This study aimed to evaluate potential myoelectric biomarkers for distinguishing the stroke-impaired muscular activity of a stroke patient group from the muscular activity of a healthy control group. We also proposed an EMG-based gait monitoring system consisting of a portable EMG device, cloud-based data processing, data analytics, and a health advisor service. This system was investigated with 48 stroke patients (mean age 70.6 years, 65% male) admitted into the emergency unit of a hospital and 75 healthy elderly volunteers (mean age 76.3 years, 32% male). EMG was recorded during walking using the portable device at two muscle positions: the biceps femoris muscle and the lateral gastrocnemius muscle of both lower limbs. The statistical results showed that the mean power frequency (MNF), median power frequency (MDF), peak power frequency (PKF), and mean power (MNP) of the stroke group differed significantly from those of the healthy control group. In the machine learning analysis, the neural network model showed the highest classification performance on both the training dataset (precision: 88%, specificity: 89%, accuracy: 80%) and the testing dataset (precision: 72%, specificity: 74%, accuracy: 65%). This study will help in understanding stroke-impaired gait changes and in guiding post-stroke rehabilitation decisions.
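
For reference, the three reported metrics follow the usual confusion-matrix definitions, sketched below. The counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard definitions from binary confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),                   # correct among predicted positives
        "specificity": tn / (tn + fp),                 # correct among actual negatives
        "accuracy": (tp + tn) / (tp + fp + tn + fn),   # correct among all samples
    }

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=8, fp=2, tn=7, fn=3)
```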

Data Augmentation with Suboptimal Warping for Time-Series Classification

Sensors ◽

10.3390/s20010098 ◽

2019 ◽

Vol 20 (1) ◽

pp. 98 ◽

Cited By ~ 3

Author(s):

Krzysztof Kamycki ◽

Tomasz Kapuscinski ◽

Mariusz Oszust

Keyword(s):

Time Series ◽

Data Augmentation ◽

Nearest Neighbor ◽

Multivariate Time Series ◽

Metric Learning ◽

Classification Performance ◽

Training Dataset ◽

Time Series Classification ◽

Extensive Evaluation ◽

The Impact

In this paper, a novel data augmentation method for time-series classification is proposed. In the introduced method, a new time-series is obtained in warped space between suboptimally aligned input examples of different lengths. Specifically, the alignment is carried out by constraining the warping path and reducing its flexibility. It is shown that the resultant synthetic time-series can form new class boundaries and enrich the training dataset. In this work, a comparative evaluation of the proposed augmentation method against related techniques on representative multivariate time-series datasets is presented. The performance of the methods is examined using the nearest neighbor classifier with dynamic time warping (NN-DTW), LogDet divergence-based metric learning with triplet constraints (LDMLT), and the recently introduced time-series cluster kernel (NN-TCK). The impact of the augmentation on the classification performance is investigated, taking into account entire datasets and cases with a small number of training examples. The extensive evaluation reveals that the introduced method outperforms related augmentation algorithms in terms of the obtained classification accuracy.
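
To show what constraining a warping path means, here is a minimal sketch of dynamic time warping on univariate series with an optional band around the diagonal (a Sakoe-Chiba style window). This is generic DTW, not the paper's augmentation procedure; the window parameter and the toy series are assumptions.

```python
import math

def dtw_distance(a, b, window=None):
    """DTW cost between two univariate series.
    window limits |i - j| along the path; None means unconstrained."""
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide so the final cell stays reachable.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]

# Hypothetical series: the step occurs at different positions.
a = [0.0, 0.0, 0.0, 1.0]
b = [0.0, 1.0, 1.0, 1.0]
unconstrained = dtw_distance(a, b)
banded = dtw_distance(a, b, window=1)
```

A narrower band forbids some alignments, so the banded cost can never be lower than the unconstrained one; the alignment it yields is suboptimal in that sense.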

A Qualitative Transcriptional Signature for Predicting Microsatellite Instability Status of Right-sided Colon Cancer

10.21203/rs.2.10681/v2 ◽

2019 ◽

Author(s):

Yelin Fu ◽

Lishuang Qi ◽

Wenbing Guo ◽

Liangliang Jin ◽

Kai Song ◽

...

Keyword(s):

Colon Cancer ◽

Microsatellite Instability ◽

Selection Process ◽

Polymerase Chain Reaction Amplification ◽

Classification Performance ◽

Training Dataset ◽

Transcriptional Signature ◽

Gene Pairs ◽

Molecular Features ◽

Relative Expression Orderings

Abstract Background: Microsatellite instability (MSI) accounts for about 15% of colorectal cancer and is associated with prognosis. Today, MSI is usually detected by polymerase chain reaction amplification of specific microsatellite markers. However, the instability is identified by comparing the length of microsatellite repeats in tumor and normal samples. In this work, we developed a qualitative transcriptional signature to individually predict MSI status for right-sided colon cancer (RCC) based on tumor samples. Results: Using RCC samples, based on the relative expression orderings (REOs) of gene pairs, we extracted a signature consisting of 10 gene pairs (10-GPS) to predict MSI status for RCC through a feature selection process. A sample is predicted as MSI when the gene expression orderings of at least 7 gene pairs vote for MSI; otherwise, it is predicted as microsatellite stable (MSS). The classification performance reached the largest F-score in the training dataset. This signature was verified in four independent datasets of RCCs with F-scores of 1, 0.9630, 0.9412 and 0.8798, respectively. Additionally, the hierarchical clustering analyses and molecular features also supported the correctness of the reclassifications of the MSI status by 10-GPS. Conclusions: The qualitative transcriptional signature can be used to classify MSI status of RCC samples at the individualized level.
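
The voting rule stated in the Results can be sketched as below. The gene names are placeholders, and the convention that a pair (a, b) votes for MSI when gene a is expressed above gene b is an assumption; the abstract does not list the pairs or their orientation.

```python
def predict_msi(expression, gene_pairs, min_votes=7):
    """Count pairs whose within-sample expression ordering votes for MSI.
    A pair (a, b) votes MSI when expression[a] > expression[b] (assumed convention)."""
    votes = sum(1 for a, b in gene_pairs if expression[a] > expression[b])
    return "MSI" if votes >= min_votes else "MSS"

# Ten placeholder gene pairs; real signature genes are not given in the abstract.
gene_pairs = [(f"A{i}", f"B{i}") for i in range(10)]

def make_sample(n_msi_votes):
    """Build a hypothetical sample in which the first n pairs vote for MSI."""
    sample = {}
    for i, (a, b) in enumerate(gene_pairs):
        sample[a], sample[b] = (2.0, 1.0) if i < n_msi_votes else (1.0, 2.0)
    return sample

calls = [predict_msi(make_sample(k), gene_pairs) for k in (6, 7)]
```

Because only the ordering within one sample matters, the rule needs no normalization against other samples, which is what allows prediction at the individual level.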

Hyperspectral Image Classification Based on a Shuffled Group Convolutional Neural Network with Transfer Learning

Remote Sensing ◽

10.3390/rs12111780 ◽

2020 ◽

Vol 12 (11) ◽

pp. 1780 ◽

Cited By ~ 2

Author(s):

Yao Liu ◽

Lianru Gao ◽

Chenchao Xiao ◽

Ying Qu ◽

Ke Zheng ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Convolutional Neural Network ◽

Transfer Learning ◽

Classification Accuracy ◽

Hyperspectral Image ◽

Classification Performance ◽

Training Dataset ◽

Hyperspectral Image Classification ◽

Different Levels

Convolutional neural networks (CNNs) have been widely applied in hyperspectral imagery (HSI) classification. However, their classification performance might be limited by the scarcity of labeled data available for training and validation. In this paper, we propose a novel lightweight shuffled group convolutional neural network (abbreviated as SG-CNN) to achieve efficient training with a limited training dataset in HSI classification. SG-CNN consists of SG conv units that employ conventional and atrous convolution in different groups, followed by a channel shuffle operation and a shortcut connection. In this way, SG-CNNs have fewer trainable parameters, while they can still be accurately and efficiently trained with fewer labeled samples. Transfer learning between different HSI datasets is also applied to the SG-CNN to further improve the classification accuracy. To evaluate the effectiveness of SG-CNNs for HSI classification, experiments were conducted on three public HSI datasets, with models pretrained on HSIs from different sensors. SG-CNNs with different levels of complexity were tested, and their classification results were compared with fine-tuned ShuffleNet2, ResNeXt, and their original counterparts. The experimental results demonstrate that SG-CNNs can achieve competitive classification performance when the amount of labeled training data is limited, while efficiently providing satisfactory classification results.
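
The channel shuffle operation mentioned above can be sketched on a plain list of channel indices. This follows the common reshape-transpose-flatten formulation used with grouped convolutions; whether SG-CNN uses exactly this variant is an assumption.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups: view the list as (groups, per_group),
    transpose to (per_group, groups), then flatten."""
    n = len(channels)
    if n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + k]
            for k in range(per_group)
            for g in range(groups)]

# Six channels in two groups: [0, 1, 2] and [3, 4, 5].
shuffled = channel_shuffle(list(range(6)), groups=2)
```

After the shuffle, each group in the next grouped convolution receives channels from every previous group, so information can flow between groups.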
