The Impact of Pixel Resolution, Integration Scale, Preprocessing, and Feature Normalization on Texture Analysis for Mass Classification in Mammograms

International Journal of Optics

10.1155/2016/1370259

2016

Vol 2016

pp. 1-12

Cited By ~ 4

Author(s):

Mohamed Abdel-Nasser

Jaime Melendez

Antonio Moreno

Domenec Puig

Keyword(s):

Texture Analysis

Spatial Arrangement

Classification Performance

Support Vector

Forward Selection

Mass Detection

Pixel Resolution

Feature Normalization

Mass Classification

The Impact

Texture analysis methods are widely used to characterize breast masses in mammograms. Texture gives information about the spatial arrangement of the intensities in the region of interest. This information has been used in mammogram analysis applications such as mass detection, mass classification, and breast density estimation. In this paper, we study the effect of factors such as pixel resolution, integration scale, preprocessing, and feature normalization on the performance of these texture methods for mass classification. The classification performance was assessed using linear and nonlinear support vector machine classifiers. To find the best combination of the studied factors, we used three approaches: greedy search, sequential forward selection (SFS), and exhaustive search. On the basis of our study, we conclude that the studied factors affect the performance of texture methods, so the best combination of these factors should be determined to achieve the best performance with each texture method. SFS can be an appropriate way to approach the factor combination problem because it is less computationally intensive than the other methods.
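
Sequential forward selection, one of the three search strategies named above, can be sketched generically as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: `score` is a hypothetical callable standing in for, e.g., the cross-validated accuracy of an SVM trained under a candidate combination, and the candidates are abstract items. At each step SFS adds the single candidate that most improves the score and stops when no addition helps, so for n candidates it needs at most n(n+1)/2 score evaluations, whereas exhaustive search over all subsets needs on the order of 2^n.

```python
from typing import Callable, Hashable, List, Optional, Sequence, Tuple


def sequential_forward_selection(
    candidates: Sequence[Hashable],
    score: Callable[[List[Hashable]], float],
    max_size: Optional[int] = None,
) -> Tuple[List[Hashable], float]:
    """Greedily grow a subset, adding the best-scoring candidate at each step."""
    selected: List[Hashable] = []
    remaining = list(candidates)
    best_score = float("-inf")
    while remaining and (max_size is None or len(selected) < max_size):
        # Score every one-element extension of the current subset.
        step_score, step_choice = max(
            ((score(selected + [c]), c) for c in remaining), key=lambda t: t[0]
        )
        if step_score <= best_score:
            break  # no single addition improves the score, so stop
        selected.append(step_choice)
        remaining.remove(step_choice)
        best_score = step_score
    return selected, best_score


# Hypothetical per-candidate gains; "c" lowers the score and is never added.
gains = {"a": 0.3, "b": 0.2, "c": -0.1}
subset, best = sequential_forward_selection(
    ["a", "b", "c"], lambda s: 0.5 + sum(gains[x] for x in s)
)
print(subset, best)
```

Because the search is greedy, SFS can miss a combination whose parts only help jointly; that is the price paid for evaluating far fewer combinations than exhaustive search.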

Impact of Dataset Size on Classification Performance: An Empirical Evaluation in the Medical Domain

Applied Sciences

10.3390/app11020796

2021

Vol 11 (2)

pp. 796

Author(s):

Alhanoof Althnian

Duaa AlSaeed

Heyam Al-Baity

Amani Samha

Alanoud Bin Dris

...

Keyword(s):

Empirical Evaluation

Classification Performance

Support Vector

Robust Model

Original Distribution

C4.5 Decision Tree

Dataset Size

Overall Performance

Medical Domain

The Impact

Dataset size is considered a major concern in the medical domain, where lack of data is common. This study investigates the impact of dataset size on the overall performance of supervised classification models. We examined the performance of six models widely used in the medical field, namely support vector machine (SVM), neural networks (NN), C4.5 decision tree (DT), random forest (RF), AdaBoost (AB), and naïve Bayes (NB), on eighteen small medical UCI datasets. We further implemented three dataset size reduction scenarios on two large datasets and analyzed the performance of the models trained on each resulting dataset with respect to accuracy, precision, recall, F-score, specificity, and area under the ROC curve (AUC). Our results indicate that the overall performance of classifiers depends on how well a dataset represents the original distribution rather than on its size. Moreover, we found that the most robust models for limited medical data are AB and NB, followed by SVM, then RF and NN, while the least robust model is DT. Furthermore, an interesting observation is that a machine learning model that is robust to limited data does not necessarily provide the best performance compared with other models.
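
The threshold-based metrics listed above have standard definitions in terms of the binary confusion matrix (TP, FP, TN, FN). A minimal sketch, assuming a two-class problem with a designated positive label (the abstract does not say how multi-class datasets were scored); AUC is left out because it needs continuous scores rather than hard predictions. Reporting an undefined ratio (e.g. precision with no positive predictions) as 0.0 is a convention chosen here, not part of the definitions.

```python
from dataclasses import dataclass
from typing import Hashable, Sequence


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float       # also called sensitivity
    f_score: float      # harmonic mean of precision and recall (F1)
    specificity: float


def _ratio(num: float, den: float) -> float:
    # Convention used here: an undefined ratio (x / 0) is reported as 0.0.
    return num / den if den else 0.0


def binary_metrics(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable = 1
) -> BinaryMetrics:
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return BinaryMetrics(
        accuracy=_ratio(tp + tn, tp + tn + fp + fn),
        precision=precision,
        recall=recall,
        f_score=_ratio(2 * precision * recall, precision + recall),
        specificity=_ratio(tn, tn + fp),
    )


print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 1, 1, 0, 1]))
```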

Laplacian Embedded Infinite Kernel Model for Semi-Supervised Classification

International Journal of Pattern Recognition and Artificial Intelligence

10.1142/s0218001416500221

2016

Vol 30 (10)

pp. 1650022

Cited By ~ 1

Author(s):

Tao Yang

Dongmei Fu

Chunhong Wu

Keyword(s):

Objective Function

Kernel Function

Supervised Classification

Average Distance

Classification Performance

Support Vector

Multiple Features

Processing Data

Benchmark Datasets

The Impact

Owing to its convexity and low time complexity, the Laplacian embedded support vector regression (LapESVR) model based on manifold regularization (MR) has assumed an important role in semi-supervised classification. Conventionally, the LapESVR model is based on a single kernel function, which is intrinsically capable of describing only one feature mapping relation. However, when the data come from a complex dataset in which multiple features must be treated, the classification performance of a single-kernel LapESVR degrades substantially, indicating that the classification requirement in this case is beyond the capability of the LapESVR. In addition, the data being processed are often affected by abnormal samples; therefore, in practice, setting the kernel parameter of the LapESVR to a fixed value related to the average distance of the data is by no means optimal. To solve these problems, this paper proposes a Laplacian embedded infinite kernel regression (LapEIKR) model. The proposed model combines multiple kernels linearly to better characterize the data, as is typical in semi-supervised classification of complex datasets with multiple features. Further, parameter setting for the multiple kernels of the LapEIKR model is cast as an optimization problem by formulating a corresponding objective function to be minimized and an iterative algorithm, so that the parameter values can be obtained by calculation and are optimal with respect to the designed objective function. Comparative experiments on UCI datasets, benchmark datasets, and Caltech256 datasets show that the proposed LapEIKR model improves adaptivity and efficiency.
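
Two ingredients the abstract relies on, a linear combination of several kernels and a graph Laplacian for manifold regularization, can be illustrated in a few lines. This is a hedged sketch, not the LapEIKR model: the RBF base kernels, their widths `gammas`, and the fixed `weights` are hypothetical (LapEIKR instead obtains its kernel parameters by optimizing its objective function), and for brevity the affinity graph simply reuses the combined kernel matrix. The quantity f^T L f is the manifold-regularization penalty; it is small when the predictions f vary little between strongly connected samples.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def rbf_kernel(X: Sequence[Sequence[float]], gamma: float) -> Matrix:
    """K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj))) for xj in X]
        for xi in X
    ]


def combined_kernel(
    X: Sequence[Sequence[float]], gammas: Sequence[float], weights: Sequence[float]
) -> Matrix:
    """Convex (nonnegative, sum-to-one) linear combination of RBF kernels."""
    if len(gammas) != len(weights):
        raise ValueError("need one weight per kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for gamma, w in zip(gammas, weights):
        Km = rbf_kernel(X, gamma)
        for i in range(n):
            for j in range(n):
                K[i][j] += w * Km[i][j]
    return K


def graph_laplacian(W: Matrix) -> Matrix:
    """Unnormalized Laplacian L = D - W, with D the diagonal degree matrix."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]


def smoothness_penalty(L: Matrix, f: Sequence[float]) -> float:
    """Manifold-regularization term f^T L f."""
    n = len(f)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))


X = [[0.0], [1.0], [3.0]]  # hypothetical 1-D samples
K = combined_kernel(X, gammas=[0.5, 2.0], weights=[0.7, 0.3])
print(smoothness_penalty(graph_laplacian(K), [1.0, 0.5, -1.0]))
```

For a symmetric affinity matrix W, f^T L f equals one half of the sum over i, j of W_ij (f_i - f_j)^2, which is why minimizing it pushes nearby samples toward similar predictions.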

Discriminating Emotions in the Valence Dimension from Speech Using Timbre Features

Applied Sciences

10.3390/app9122470

2019

Vol 9 (12)

pp. 2470

Cited By ~ 7

Author(s):

Anvarjon Tursunov

Soonil Kwon

Hee-Suk Pang

Keyword(s):

Short Term Memory

Classification Performance

Support Vector

Emotional Speech

Acoustic Features

Discrete Emotions

Forward Selection

Mel Frequency Cepstral Coefficients

Speech Database

Emotional Speech Database

The most widely used and well-known acoustic features of a speech signal, the Mel frequency cepstral coefficients (MFCC), cannot sufficiently characterize emotions in speech when classifying both discrete emotions (i.e., anger, happiness, sadness, and neutral) and emotions in the valence dimension (positive and negative). The main reason is that some discrete emotions, such as anger and happiness, share similar acoustic features in the arousal dimension (high and low) but differ in the valence dimension. Timbre is a sound quality that can discriminate between two sounds even when they have the same pitch and loudness. In this paper, we analyzed timbre acoustic features to improve the classification performance of discrete emotions as well as emotions in the valence dimension. Sequential forward selection (SFS) was used to find the most relevant acoustic features among the timbre acoustic features. The experiments were carried out on the Berlin Emotional Speech Database and the Interactive Emotional Dyadic Motion Capture Database. A support vector machine (SVM) and a long short-term memory recurrent neural network (LSTM-RNN) were used to classify emotions. Significant classification performance improvements were achieved using a combination of the baseline features and the most relevant timbre acoustic features, which were found by applying SFS to the classification of emotions on the Berlin Emotional Speech Database. From extensive experiments, it was found that timbre acoustic features could sufficiently characterize emotions in speech in the valence dimension.

Optimization of Network Topology in Computer-Aided Detection Schemes Using Phased Searching with NEAT in a Time-Scaled Framework

Cancer Informatics

10.4137/cin.s13885

2014

Vol 13s1

pp. CIN.S13885

Author(s):

Maxine Tan

Jiantao Pu

Bin Zheng

Keyword(s):

Selection Process

Learning Task

Classification Performance

Optimal Topology

Support Vector

Mass Detection

Detection Scheme

Linear Discriminant

Computer Aided

Fixed Topology

In the field of computer-aided mammographic mass detection, many different features and classifiers have been tested. Frequently, the relevant features and the optimal topology for artificial neural network (ANN)-based approaches at the classification stage are unknown and are thus determined by trial-and-error experiments. In this study, we analyzed a classifier that evolves ANNs using genetic algorithms (GAs), combining feature selection with the learning task. The classifier, named “Phased Searching with NEAT in a Time-Scaled Framework”, was analyzed using a dataset of 800 malignant and 800 normal tissue regions in a 10-fold cross-validation framework. The classification performance, measured by the area under the receiver operating characteristic (ROC) curve, was 0.856 ± 0.029. The result was also compared with four other well-established classifiers: fixed-topology ANNs, support vector machines (SVMs), linear discriminant analysis (LDA), and bagged decision trees. The results show that Phased Searching outperformed the LDA and bagged decision tree classifiers and was significantly outperformed only by SVM. Furthermore, the Phased Searching method required fewer features and discarded superfluous structure or topology, thus requiring less feature computation and shorter training and validation times. Analyses of the network complexities evolved by Phased Searching indicate that it can evolve optimal network topologies through its complexification and simplification parameter selection process. From the results, the study also concluded that three classifiers perform comparably well in our mammographic mass detection scheme: SVM, fixed-topology ANN, and Phased Searching with NeuroEvolution of Augmenting Topologies (NEAT) in a Time-Scaled Framework.
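
The area under the ROC curve reported above equals the probability that a randomly chosen positive (here, malignant) region receives a higher classifier score than a randomly chosen negative (normal) one, with ties counted as one half. A minimal sketch of that pairwise (Mann-Whitney) estimate follows; it costs O(P·N) comparisons for P positives and N negatives, which is fine for a cross-validation fold of a few hundred regions. It is an illustration of the metric only; the abstract does not name the tool the study used.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Empirical ROC AUC from binary labels (1 = positive, 0 = negative)."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical scores for two malignant (1) and two normal (0) regions.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```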

Investigation of Heterogeneity Sources for Occupational Task Recognition via Transfer Learning

Sensors

10.3390/s21196677

2021

Vol 21 (19)

pp. 6677

Author(s):

Sahand Hajifar

Saeb Ragani Lamooki

Lora A. Cavuoto

Fadel M. Megahed

Hongyue Sun

Keyword(s):

Support Vector Machine

Activity Recognition

Domain Adaptation

Classification Performance

Support Vector

Standard Support Vector Machine

Occupational Tasks

Testing Data

Occupational Task

The Impact

Human activity recognition has been extensively used to classify occupational tasks. Existing activity recognition approaches perform well when training and testing data follow an identical distribution. In the real world, however, this condition may be violated by heterogeneities between training and testing data, which degrade classification performance. This study investigates the impact of four heterogeneity sources on classification performance: cross-sensor, cross-subject, joint cross-sensor and cross-subject, and cross-scenario heterogeneity. To that end, two experiments, called the separate task scenario and the mixed task scenario, were conducted to simulate the tasks of electrical line workers under various heterogeneity sources. A support vector machine classifier equipped with domain adaptation was used to classify the tasks and was benchmarked against a standard support vector machine baseline. Our results demonstrated that the support vector machine with domain adaptation outperformed the baseline in the cross-sensor, joint cross-subject and cross-sensor, and cross-subject cases, but did not outperform the baseline in the cross-scenario case. Therefore, it is important to investigate the impact of heterogeneity sources on classification performance and, if needed, to leverage domain adaptation methods to improve performance.
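
One common way to expose the cross-subject heterogeneity described above is to hold out all samples of one subject at a time, so that the test subject never appears in training. The helper below is a hypothetical sketch, not the study's protocol (the abstract does not describe its splits); passing a sensor identifier per sample instead of a subject identifier gives an analogous cross-sensor split.

```python
from typing import Hashable, Iterator, List, Sequence, Tuple


def leave_one_group_out(
    groups: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_group, train_indices, test_indices) for each group."""
    for held_out in sorted(set(groups), key=str):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


subjects = ["s1", "s1", "s2", "s3", "s2"]  # hypothetical subject ID per sample
for subject, train_idx, test_idx in leave_one_group_out(subjects):
    print(subject, train_idx, test_idx)
```

Comparing a classifier's accuracy under such a split with its accuracy under a random split of the same data is a simple way to gauge how much a given heterogeneity source costs.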

Comparison of methods for texture analysis of QUS parametric images in the characterization of breast lesions

PLoS ONE

10.1371/journal.pone.0244965

2020

Vol 15 (12)

pp. e0244965

Author(s):

Laurentius O. Osapoetra

William Chan

William Tran

Michael C. Kolios

Gregory J. Czarnota

Keyword(s):

Texture Analysis

Classification Performance

Support Vector

Diagnostic Model

Breast Lesions

Gray Level

Parametric Images

Analysis Methods

Tumour Margin

Purpose: Accurate and timely diagnosis of breast carcinoma is crucial because of its high incidence and high morbidity. Screening can improve overall prognosis by detecting the disease early. Biopsy remains the gold standard for pathological confirmation of malignancy and tumour grading. Diagnostic imaging techniques are needed as an alternative for the rapid and accurate characterization of breast masses, and quantitative ultrasound (QUS) spectroscopy is a modality well suited for this purpose. This study was carried out to evaluate different texture analysis methods applied to QUS spectral parametric images for the characterization of breast lesions.

Methods: Parametric images of mid-band-fit (MBF), spectral-slope (SS), spectral-intercept (SI), average scatterer diameter (ASD), and average acoustic concentration (AAC) were determined using QUS spectroscopy from 193 patients with breast lesions. Texture methods were used to quantify heterogeneities of the parametric images. Three statistical texture analysis approaches were evaluated: the Gray Level Co-occurrence Matrix (GLCM), Gray Level Run-length Matrix (GRLM), and Gray Level Size Zone Matrix (GLSZM) methods. QUS and texture parameters were determined from both the tumour core and a 5-mm tumour margin and were used, in comparison with histopathological analysis, to classify breast lesions as either benign or malignant. We developed a diagnostic model using different classification algorithms, including linear discriminant analysis (LDA), k-nearest neighbours (KNN), a support vector machine with a radial basis function kernel (SVM-RBF), and an artificial neural network (ANN). Model performance was evaluated using leave-one-out cross-validation (LOOCV) and hold-out validation.

Results: Classifier accuracies ranged from 73% to 91%, depending on tumour margin inclusion and classifier methodology. Using information from the tumour core alone, the ANN achieved the best classification performance, with 93% sensitivity, 88% specificity, 91% accuracy, and an AUC of 0.95, using QUS parameters and their GLSZM texture features.

Conclusions: A QUS-based framework and texture analysis methods enabled classification of breast lesions with >90% accuracy. The results suggest that optimizing the method for extracting discriminative textural features from QUS spectral parametric images can improve classification performance. Evaluation of the proposed technique on a larger cohort of patients with a proper validation technique demonstrated the robustness and generalization of the approach.
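
Of the three texture methods compared above, the GLCM is the simplest to illustrate: it counts how often gray level i occurs next to gray level j at a fixed pixel offset, and texture features are weighted sums over the normalized counts. A minimal sketch, assuming the parametric image has already been quantized to integer levels 0 to `levels - 1` and using a single offset; the study's quantization, offsets, and feature set are not given in the abstract, and some feature definitions (e.g. homogeneity) vary between implementations.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def glcm(
    image: Sequence[Sequence[int]], dx: int, dy: int, levels: int, symmetric: bool = True
) -> Matrix:
    """Normalized gray-level co-occurrence matrix for pixel offset (dx, dy)."""
    rows, cols = len(image), len(image[0])
    P = [[0.0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                P[i][j] += 1
                if symmetric:
                    P[j][i] += 1  # count the pair in both directions
    total = sum(map(sum, P))
    return [[v / total for v in row] for row in P] if total else P


def glcm_features(P: Matrix) -> Dict[str, float]:
    """A few classic Haralick-style features of a normalized GLCM."""
    n = len(P)
    cells = [(i, j, P[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p * (i - j) ** 2 for i, j, p in cells),
        "homogeneity": sum(p / (1 + (i - j) ** 2) for i, j, p in cells),
        "asm": sum(p * p for _, _, p in cells),  # angular second moment
    }


# Hypothetical 3x3 image quantized to 3 gray levels; horizontal offset (dx=1, dy=0).
image = [[0, 0, 1], [0, 1, 1], [2, 2, 2]]
print(glcm_features(glcm(image, dx=1, dy=0, levels=3)))
```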

The Impact of Simulated Spectral Noise on Random Forest and Oblique Random Forest Classification Performance

Journal of Spectroscopy

10.1155/2018/8316918

2018

Vol 2018

pp. 1-8

Cited By ~ 2

Author(s):

Na’eem Hoosen Agjee

Onisimo Mutanga

Kabir Peerbhay

Riyad Ismail

Keyword(s):

Random Forest

Classification Performance

Machine Learning Algorithms

Support Vector

Random Forest Classification

Node Splitting

Forest Classification

Vector Machines

Classifier Performance

The Impact

Hyperspectral datasets contain spectral noise, which adversely affects the ability of classifiers to generalize accurately. Although machine learning algorithms are regarded as robust classifiers that generalize well under unfavourable noisy conditions, the extent of this robustness is poorly understood. This study aimed to evaluate the influence of simulated spectral noise (10%, 20%, and 30%) on random forest (RF) and oblique random forest (oRF) classification performance using two node-splitting models (ridge regression (RR) and support vector machines (SVM)) to discriminate healthy and low-infested water hyacinth plants. Results showed that RF was slightly influenced by simulated noise, with classification accuracies decreasing for weeks one and two with the addition of 30% noise. Compared with RF, oRF-RR and oRF-SVM yielded higher test accuracies (oRF-RR: 5.36%–7.15%; oRF-SVM: 3.58%–5.36%) and test kappa coefficients (oRF-RR: 10.72%–14.29%; oRF-SVM: 7.15%–10.72%). Notably, oRF-RR test accuracies and kappa coefficients remained consistent irrespective of simulated noise level for weeks one and two, while similar results were achieved for week three using oRF-SVM. Overall, this study has demonstrated that oRF-RR can be regarded as a robust classification algorithm that is not influenced by noisy spectral conditions.
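
The kappa coefficient reported alongside accuracy corrects the observed agreement p_o between predicted and reference labels for the agreement p_e expected by chance from the class frequencies: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch, assuming the standard Cohen's kappa (the abstract does not give the formula); the class names in the usage lines are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Cohen's kappa: chance-corrected agreement between two label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: product of marginal class frequencies, summed over classes.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return math.nan  # undefined when both sequences use one identical class
    return (p_o - p_e) / (1.0 - p_e)


y_true = ["healthy"] * 5 + ["infested"] * 5
y_pred = ["healthy"] * 4 + ["infested"] * 5 + ["healthy"]
print(cohens_kappa(y_true, y_pred))
```

Here 8 of 10 labels agree (p_o = 0.8) while balanced classes give p_e = 0.5, so kappa is noticeably lower than accuracy; this is why the two measures can shift by different amounts in the results above.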

The Impact of Normalization Approaches to Automatically Detect Radiogenomic Phenotypes Characterizing Breast Cancer Receptors Status

Cancers

10.3390/cancers12020518

2020

Vol 12 (2)

pp. 518

Cited By ~ 4

Author(s):

Rossana Castaldo

Katia Pane

Emanuele Nicolai

Marco Salvatore

Monica Franzese

Keyword(s):

Breast Cancer

Triple Negative

The Cancer Genome Atlas

Machine Learning Techniques

Prognostic Indicators

Imaging Biomarker

Support Vector

Dynamic Contrast Enhancement

Feature Normalization

The Impact

In breast cancer studies, combining quantitative radiomic and genomic signatures can help identify and characterize radiogenomic phenotypes as a function of molecular receptor status. Biomedical image processing lacks standards for radiomic feature normalization, and neglecting feature normalization can strongly bias the overall analysis. This study evaluates the effect of several normalization techniques on predicting four clinical phenotypes from quantitative features: estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2), and triple-negative (TN) status. Radiomic features from The Cancer Imaging Archive (TCIA), extracted from 91 T1-weighted dynamic contrast-enhanced MRI scans of invasive breast cancers, were investigated in association with breast invasive carcinoma miRNA expression profiles from The Cancer Genome Atlas (TCGA). Three advanced machine learning techniques (support vector machine, random forest, and naïve Bayes) were investigated to distinguish between molecular prognostic indicators and achieved area under the ROC curve (AUC) values of 86%, 93%, 91%, and 91% for the prediction of ER+ versus ER−, PR+ versus PR−, HER2+ versus HER2−, and triple-negative status, respectively. In conclusion, radiomic features make it possible to discriminate the major breast cancer molecular subtypes and may yield a potential imaging biomarker for advancing precision medicine.
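
The abstract does not list the normalization techniques it compared; z-score standardization and min-max scaling are two common choices for radiomic features and are sketched below as illustrations only. The detail worth showing is that scaling parameters are estimated on the training features and then reused unchanged on test features, so no test information leaks into the normalization and test values may fall outside the training range.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_zscore(train: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, population std) estimated on training values only."""
    return statistics.fmean(train), statistics.pstdev(train)


def apply_zscore(values: Sequence[float], mean: float, std: float) -> List[float]:
    # A constant feature (std == 0) carries no information; map it to 0.0.
    return [(v - mean) / std if std > 0 else 0.0 for v in values]


def fit_minmax(train: Sequence[float]) -> Tuple[float, float]:
    """Return (min, max) of the training values."""
    return min(train), max(train)


def apply_minmax(values: Sequence[float], lo: float, hi: float) -> List[float]:
    span = hi - lo
    return [(v - lo) / span if span > 0 else 0.0 for v in values]


train_feature = [2.0, 4.0, 6.0]  # hypothetical values of one radiomic feature
test_feature = [8.0]
print(apply_zscore(test_feature, *fit_zscore(train_feature)))
print(apply_minmax(test_feature, *fit_minmax(train_feature)))  # above 1: exceeds training max
```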

Impact of Texture Information on Crop Classification with Machine Learning and UAV Images

Applied Sciences

10.3390/app9040643

2019

Vol 9 (4)

pp. 643

Cited By ~ 9

Author(s):

Geun-Ho Kwak

No-Wook Park

Keyword(s):

Time Series

Texture Features

Classification Performance

Support Vector

Texture Information

Multi Temporal

Crop Classification

UAV Images

UAV Image

The Impact

Unmanned aerial vehicle (UAV) images, which can provide thematic information at much higher spatial and temporal resolutions than satellite images, have great potential for crop classification. Because of the ultra-high spatial resolution of UAV images, spatial contextual information such as texture is often used for crop classification. From a data availability viewpoint, it is not always possible to acquire time-series UAV images because of limited accessibility to the study area. Thus, it is necessary to improve classification performance when only a single image or a minimal number of UAV images is available for crop classification. In this study, we investigate the potential of gray-level co-occurrence matrix (GLCM)-based texture information for crop classification with time-series UAV images and machine learning classifiers, including random forest and support vector machine. In particular, the impact of combining texture and spectral information on classification performance is evaluated for cases that use only one UAV image or multi-temporal images as input. A case study of crop classification in Anbandegi, Korea, was conducted for these comparisons. The best classification accuracy was achieved when multi-temporal UAV images, which can fully account for the growth cycles of crops, were combined with GLCM-based texture features. However, the impact of using texture information was not significant in that case. In contrast, when a single August UAV image was used for crop classification, the use of texture information significantly affected classification performance. Classification using texture features extracted from GLCMs with a larger kernel size significantly improved classification accuracy, by 7.72 percentage points in overall accuracy for the support vector machine classifier, compared with classification based solely on spectral information.
These results indicate the usefulness of texture information for the classification of ultra-high-spatial-resolution UAV images, particularly when acquisition of time-series UAV images is difficult and only one UAV image is used for crop classification.

Computer aided mass detection in mammography with temporal change analysis

Computer Science and Information Systems

10.2298/csis141230049m

2015

Vol 12 (4)

pp. 1255-1272

Cited By ~ 3

Author(s):

Fei Ma

Limin Yu

Gang Liu

Qiang Niu

Keyword(s):

Temporal Change

Graph Matching

Spatial Relations

Classification Performance

Temporal Changes

Image Features

Mass Detection

Change Analysis

Temporal Features

Mass Classification

This paper presents a method to extract change information from temporal mammogram pairs and to incorporate the temporal change information into malignant mass classification. In this method, a temporal mammogram registration framework based on spatial relations between regions of interest and graph matching was used to create correspondences between regions of the current mammogram and regions of the previous mammogram. Eighteen image features were then used to capture the differences (temporal changes) between the matched regions. To assess the contribution of temporal change information to mass detection, five methods were designed to combine mass classification based on image features measured on single regions with mass classification based on temporal features, in order to improve overall mass classification. The method was tested on 95 pairs of temporal mammograms using a k-fold cross-validation procedure. The experimental results showed that, when the two classification results were combined by linear combination or by taking the minimum value, the Az score of overall classification performance increased from 0.8843 to 0.8989 and 0.8863, respectively. The results demonstrated that registering temporal mammograms, measuring temporal changes from matched regions, and incorporating the change information into mass classification improve overall mass detection.
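
Two of the five combination rules are named above: a linear combination of the two classifiers' outputs and taking their minimum. A minimal sketch of both, assuming each classifier outputs a per-region malignancy score on a common 0 to 1 scale; the weight `w` is a hypothetical parameter, not a value from the paper.

```python
from typing import List, Sequence


def _check(single: Sequence[float], temporal: Sequence[float]) -> None:
    if len(single) != len(temporal):
        raise ValueError("need one temporal score per single-region score")


def fuse_linear(
    single: Sequence[float], temporal: Sequence[float], w: float = 0.5
) -> List[float]:
    """Weighted average: w * single + (1 - w) * temporal."""
    _check(single, temporal)
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [w * s + (1.0 - w) * t for s, t in zip(single, temporal)]


def fuse_min(single: Sequence[float], temporal: Sequence[float]) -> List[float]:
    """Minimum rule: keep the less suspicious of the two scores."""
    _check(single, temporal)
    return [min(s, t) for s, t in zip(single, temporal)]


single_scores = [0.8, 0.2, 0.6]    # hypothetical single-region classifier outputs
temporal_scores = [0.6, 0.4, 0.9]  # hypothetical temporal-change classifier outputs
print(fuse_linear(single_scores, temporal_scores))
print(fuse_min(single_scores, temporal_scores))
```

The minimum rule is the more conservative of the two: a region keeps a high score only when both the single-region and the temporal classifier rate it as suspicious.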