The effect of representative training dataset selection on the classification performance of the promoter sequences

Author(s):  
Ayse Gul Yaman ◽  
Tolga Can
Author(s):  
G. Pavoni ◽  
M. Corsini ◽  
M. Callieri ◽  
M. Palma ◽  
R. Scopigno

<p><strong>Abstract.</strong> Visual sampling techniques represent a valuable resource for a rapid, non-invasive data acquisition for underwater monitoring purposes. Long-term monitoring projects usually requires the collection of large quantities of data, and the visual analysis of a human expert operator remains, in this context, a very time consuming task. It has been estimated that only the 1-2% of the acquired images are later analyzed by scientists (Beijbom et al., 2012). Strategies for the automatic recognition of benthic communities are required to effectively exploit all the information contained in visual data. Supervised learning methods, the most promising classification techniques in this field, are commonly affected by two recurring issues: the wide diversity of marine organism, and the small amount of labeled data.</p><p> In this work, we discuss the advantages offered by the use of annotated high resolution ortho-mosaics of seabed to classify and segment the investigated specimens, and we suggest several strategies to obtain a considerable per-pixel classification performance although the use of a reduced training dataset composed by a single ortho-mosaic. The proposed methodology can be applied to a large number of different species, making the procedure of marine organism identification an highly adaptable task.</p>


2020 ◽  
Author(s):  
Saki Ishino ◽  
Takuya Itaki

Abstract The Eucampia Index, which is calculated from valve ratio of Antarctic diatom Eucampia ainarctica varieties, has been expected to be a useful indicator of sea ice coverage or/and sea surface temperature variation in the Southern Ocean. To verify the relationship between the index value and the environmental factors, considerable effort is needed to classify and count valves of E. antarctica in a very large number of samples. In this study, to realize automated detection of the Eucampia Index, we constructed a deep-learning (one of the learning methods of artificial intelligence) based models for identifying Eucampia valves from various particles in a diatom slide. The microfossil Classification and Rapid Accumulation Device (miCRAD) system, which can be used for scanning a slide and cropping images of particles automatically, was employed to collect images in training dataset for the model and test dataset for model verification. As a result of classifying particle images in the test dataset by the initial model "Eant_1000px_200616", accuracy was 78.8%. The Eucampia Index value prepared in the test dataset was 0.80, and the value predicted using the developed model from the same dataset was 0.76. The predicted value was in the range of the manual counting error. These results suggest that the classification performance of the model is similar to that of a human expert. This study revealed that a model capable of detecting the ratio of two diatom species can be constructed using the miCRAD system for the first time. The miCRAD system connected with the developed model in this study is capable of automatically classifying particle images at the same time of capturing images so that the system can be applied to a large-scale analysis of the Eucampia index in the Southern Ocean. Depending on the setting of the classification category, similar method is relevant to investigators who have to process a large number of diatom samples such as for detecting specific species for biostratigraphic and paleoenvironmental studies.


Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 1999 ◽  
Author(s):  
Donghang Yu ◽  
Qing Xu ◽  
Haitao Guo ◽  
Chuan Zhao ◽  
Yuzhun Lin ◽  
...  

Classifying remote sensing images is vital for interpreting image content. Presently, remote sensing image scene classification methods using convolutional neural networks have drawbacks, including excessive parameters and heavy calculation costs. More efficient and lightweight CNNs have fewer parameters and calculations, but their classification performance is generally weaker. We propose a more efficient and lightweight convolutional neural network method to improve classification accuracy with a small training dataset. Inspired by fine-grained visual recognition, this study introduces a bilinear convolutional neural network model for scene classification. First, the lightweight convolutional neural network, MobileNetv2, is used to extract deep and abstract image features. Each feature is then transformed into two features with two different convolutional layers. The transformed features are subjected to Hadamard product operation to obtain an enhanced bilinear feature. Finally, the bilinear feature after pooling and normalization is used for classification. Experiments are performed on three widely used datasets: UC Merced, AID, and NWPU-RESISC45. Compared with other state-of-art methods, the proposed method has fewer parameters and calculations, while achieving higher accuracy. By including feature fusion with bilinear pooling, performance and accuracy for remote scene classification can greatly improve. This could be applied to any remote sensing image classification task.


Mathematics ◽  
2020 ◽  
Vol 8 (8) ◽  
pp. 1263
Author(s):  
Chih-Yao Chang ◽  
Kuo-Ping Lin

Classification problems are very important issues in real enterprises. In the patent infringement issue, accurate classification could help enterprises to understand court decisions to avoid patent infringement. However, the general classification method does not perform well in the patent infringement problem because there are too many complex variables. Therefore, this study attempts to develop a classification method, the support vector machine with new fuzzy selection (SVMFS), to judge the infringement of patent rights. The raw data are divided into training and testing sets. However, the data quality of the training set is not easy to evaluate. Effective data quality management requires a structural core that can support data operations. This study adopts new fuzzy selection based on membership values, which are generated from fuzzy c-means clustering, to select appropriate data to enhance the classification performance of the support vector machine (SVM). An empirical example based on the SVMFS shows that the proposed SVMFS can obtain a superior accuracy rate. Moreover, the new fuzzy selection also verifies that it can effectively select the training dataset.


2020 ◽  
Vol 10 (16) ◽  
pp. 5686
Author(s):  
Ines A. Cruz-Guerrero ◽  
Raquel Leon ◽  
Daniel U. Campos-Delgado ◽  
Samuel Ortega ◽  
Himar Fabelo ◽  
...  

Hyperspectral imaging is a multidimensional optical technique with the potential of providing fast and accurate tissue classification. The main challenge is the adequate processing of the multidimensional information usually linked to long processing times and significant computational costs, which require expensive hardware. In this study, we address the problem of tissue classification for intraoperative hyperspectral images of in vivo brain tissue. For this goal, two methodologies are introduced that rely on a blind linear unmixing (BLU) scheme for practical tissue classification. Both methodologies identify the characteristic end-members related to the studied tissue classes by BLU from a training dataset and classify the pixels by a minimum distance approach. The proposed methodologies are compared with a machine learning method based on a supervised support vector machine (SVM) classifier. The methodologies based on BLU achieve speedup factors of ~459× and ~429× compared to the SVM scheme, while keeping constant and even slightly improving the classification performance.


2018 ◽  
Vol 2018 ◽  
pp. 1-12 ◽  
Author(s):  
Jing Zhang ◽  
Chuncheng Zhang ◽  
Li Yao ◽  
Xiaojie Zhao ◽  
Zhiying Long

Multivariate classification techniques have been widely applied to decode brain states using functional magnetic resonance imaging (fMRI). Due to variabilities in fMRI data and the limitation of the collection of human fMRI data, it is not easy to train an efficient and robust supervised-learning classifier for fMRI data. Among various classification techniques, sparse representation classifier (SRC) exhibits a state-of-the-art classification performance in image classification. However, SRC has rarely been applied to fMRI-based decoding. This study aimed to improve SRC using unlabeled testing samples to allow it to be effectively applied to fMRI-based decoding. We proposed a semisupervised-learning SRC with an average coefficient (semiSRC-AVE) method that performed the classification using the average coefficient of each class instead of the reconstruction error and selectively updated the training dataset using new testing data with high confidence to improve the performance of SRC. Simulated and real fMRI experiments were performed to investigate the feasibility and robustness of semiSRC-AVE. The results of the simulated and real fMRI experiments showed that semiSRC-AVE significantly outperformed supervised learning SRC with an average coefficient (SRC-AVE) method and showed better performance than the other three semisupervised learning methods.


Sensors ◽  
2021 ◽  
Vol 21 (16) ◽  
pp. 5334
Author(s):  
Iqram Hussain ◽  
Se-Jin Park

Electromyography (EMG) is sensitive to neuromuscular changes resulting from ischemic stroke and is considered a potential predictive tool of post-stroke gait and rehabilitation management. This study aimed to evaluate the potential myoelectric biomarkers for the classification of stroke-impaired muscular activity of the stroke patient group and the muscular activity of the control healthy adult group. We also proposed an EMG-based gait monitoring system consisting of a portable EMG device, cloud-based data processing, data analytics, and a health advisor service. This system was investigated with 48 stroke patients (mean age 70.6 years, 65% male) admitted into the emergency unit of a hospital and 75 healthy elderly volunteers (mean age 76.3 years, 32% male). EMG was recorded during walking using the portable device at two muscle positions: the bicep femoris muscle and the lateral gastrocnemius muscle of both lower limbs. The statistical result showed that the mean power frequency (MNF), median power frequency (MDF), peak power frequency (PKF), and mean power (MNP) of the stroke group differed significantly from those of the healthy control group. In the machine learning analysis, the neural network model showed the highest classification performance (precision: 88%, specificity: 89%, accuracy: 80%) using the training dataset and highest classification performance (precision: 72%, specificity: 74%, accuracy: 65%) using the testing dataset. This study will be helpful to understand stroke-impaired gait changes and decide post-stroke rehabilitation.


Sensors ◽  
2019 ◽  
Vol 20 (1) ◽  
pp. 98 ◽  
Author(s):  
Krzysztof Kamycki ◽  
Tomasz Kapuscinski ◽  
Mariusz Oszust

In this paper, a novel data augmentation method for time-series classification is proposed. In the introduced method, a new time-series is obtained in warped space between suboptimally aligned input examples of different lengths. Specifically, the alignment is carried out constraining the warping path and reducing its flexibility. It is shown that the resultant synthetic time-series can form new class boundaries and enrich the training dataset. In this work, the comparative evaluation of the proposed augmentation method against related techniques on representative multivariate time-series datasets is presented. The performance of methods is examined using the nearest neighbor classifier with the dynamic time warping (NN-DTW), LogDet divergence-based metric learning with triplet constraints (LDMLT), and the recently introduced time-series cluster kernel (NN-TCK). The impact of the augmentation on the classification performance is investigated, taking into account entire datasets and cases with a small number of training examples. The extensive evaluation reveals that the introduced method outperforms related augmentation algorithms in terms of the obtained classification accuracy.


2019 ◽  
Author(s):  
Yelin Fu ◽  
Lishuang Qi ◽  
Wenbing Guo ◽  
Liangliang Jin ◽  
Kai Song ◽  
...  

Abstract Background: Microsatellite instability (MSI) accounts for about 15% of colorectal cancer and is associated with prognosis. Today, MSI is usually detected by polymerase chain reaction amplification of specific microsatellite markers. However, the instability is identified by comparing the length of microsatellite repeats in tumor and normal samples. In this work, we developed a qualitative transcriptional signature to individually predict MSI status for right-sided colon cancer (RCC) based on tumor samples. Results: Using RCC samples, based on the relative expression orderings (REOs) of gene pairs, we extracted a signature consisting of 10 gene pairs (10-GPS) to predict MSI status for RCC through a feature selection process. A sample is predicted as MSI when the gene expression orderings of at least 7 gene pairs vote for MSI; otherwise the microsatellite stability (MSS). The classification performance reached the largest F-score in the training dataset. This signature was verified in four independent datasets of RCCs with the F-scores of 1, 0.9630, 0.9412 and 0.8798, respectively. Additionally, the hierarchical clustering analyses and molecular features also supported the correctness of the reclassifications of the MSI status by 10-GPS. Conclusions: The qualitative transcriptional signature can be used to classify MSI status of RCC samples at the individualized level.


2020 ◽  
Vol 12 (11) ◽  
pp. 1780 ◽  
Author(s):  
Yao Liu ◽  
Lianru Gao ◽  
Chenchao Xiao ◽  
Ying Qu ◽  
Ke Zheng ◽  
...  

Convolutional neural networks (CNNs) have been widely applied in hyperspectral imagery (HSI) classification. However, their classification performance might be limited by the scarcity of labeled data to be used for training and validation. In this paper, we propose a novel lightweight shuffled group convolutional neural network (abbreviated as SG-CNN) to achieve efficient training with a limited training dataset in HSI classification. SG-CNN consists of SG conv units that employ conventional and atrous convolution in different groups, followed by channel shuffle operation and shortcut connection. In this way, SG-CNNs have less trainable parameters, whilst they can still be accurately and efficiently trained with fewer labeled samples. Transfer learning between different HSI datasets is also applied on the SG-CNN to further improve the classification accuracy. To evaluate the effectiveness of SG-CNNs for HSI classification, experiments have been conducted on three public HSI datasets pretrained on HSIs from different sensors. SG-CNNs with different levels of complexity were tested, and their classification results were compared with fine-tuned ShuffleNet2, ResNeXt, and their original counterparts. The experimental results demonstrate that SG-CNNs can achieve competitive classification performance when the amount of labeled data for training is poor, as well as efficiently providing satisfying classification results.


Sign in / Sign up

Export Citation Format

Share Document