Multivariate Features Extraction and Effective Decision Making Using Machine Learning Approaches

Sondes Gharsellaoui; Majdi Mansouri; Shady S. Refaat; Haitham Abu-Rub; Hassani Messaoud

doi:10.3390/en13030609

Multivariate Features Extraction and Effective Decision Making Using Machine Learning Approaches

Energies ◽

10.3390/en13030609 ◽

2020 ◽

Vol 13 (3) ◽

pp. 609 ◽

Cited By ~ 1

Author(s):

Sondes Gharsellaoui ◽

Majdi Mansouri ◽

Shady S. Refaat ◽

Haitham Abu-Rub ◽

Hassani Messaoud

Keyword(s):

Machine Learning ◽

Air Conditioning ◽

Synthetic Data ◽

Simulated Data ◽

Principal Component ◽

Features Extraction ◽

Fault Detection And Isolation ◽

Fault Classification ◽

Learning Approaches ◽

Multiscale Representation

Fault Detection and Isolation (FDI) in Heating, Ventilation, and Air Conditioning (HVAC) systems is an important approach to guarantee the human safety of these systems. Therefore, the implementation of a FDI framework is required to reduce the energy needs for buildings and improving indoor environment quality. The main goal of this paper is to merge the benefits of multiscale representation, Principal Component Analysis (PCA), and Machine Learning (ML) classifiers to improve the efficiency of the detection and isolation of Air Conditioning (AC) systems. First, the multivariate statistical features extraction and selection is achieved using the PCA method. Then, the multiscale representation is applied to separate feature from noise and approximately decorrelate autocorrelation between available measurements. Third, the extracted and selected features are introduced to several machine learning classifiers for fault classification purposes. The effectiveness and higher classification accuracy of the developed Multiscale PCA (MSPCA)-based ML technique is demonstrated using two examples: synthetic data and simulated data extracted from Air Conditioning systems.

Download Full-text

Early Detection of Diabetic Retinopathy Using PCA-Firefly Based Deep Learning Model

Electronics ◽

10.3390/electronics9020274 ◽

2020 ◽

Vol 9 (2) ◽

pp. 274 ◽

Cited By ~ 30

Author(s):

Thippa Reddy Gadekallu ◽

Neelu Khare ◽

Sweta Bhattacharya ◽

Saurabh Singh ◽

Praveen Kumar Reddy Maddikunta ◽

...

Keyword(s):

Machine Learning ◽

Diabetic Retinopathy ◽

Deep Learning ◽

Early Detection ◽

Dimensionality Reduction ◽

Vision Loss ◽

Principal Component ◽

Screening Methods ◽

Learning Approaches ◽

Time Period

Diabetic Retinopathy is a major cause of vision loss and blindness affecting millions of people across the globe. Although there are established screening methods - fluorescein angiography and optical coherence tomography for detection of the disease but in majority of the cases, the patients remain ignorant and fail to undertake such tests at an appropriate time. The early detection of the disease plays an extremely important role in preventing vision loss which is the consequence of diabetes mellitus remaining untreated among patients for a prolonged time period. Various machine learning and deep learning approaches have been implemented on diabetic retinopathy dataset for classification and prediction of the disease but majority of them have neglected the aspect of data pre-processing and dimensionality reduction, leading to biased results. The dataset used in the present study is a diabetes retinopathy dataset collected from the UCI machine learning repository. At its inceptions, the raw dataset is normalized using the Standardscalar technique and then Principal Component Analysis (PCA) is used to extract the most significant features in the dataset. Further, Firefly algorithm is implemented for dimensionality reduction. This reduced dataset is fed into a Deep Neural Network Model for classification. The results generated from the model is evaluated against the prevalent machine learning models and the results justify the superiority of the proposed model in terms of Accuracy, Precision, Recall, Sensitivity and Specificity.

Download Full-text

MITRE: predicting host status from microbiota time-series data

10.1101/447250 ◽

2018 ◽

Author(s):

Elijah Bogart ◽

Richard Creswell ◽

Georg K. Gerber

Keyword(s):

Machine Learning ◽

Time Series ◽

Time Series Data ◽

Synthetic Data ◽

Black Box ◽

Series Data ◽

Learning Approaches ◽

Rule Engine ◽

Microbiome Composition ◽

Host Status

AbstractLongitudinal studies are crucial for discovering casual relationships between the microbiome and human disease. We present Microbiome Interpretable Temporal Rule Engine (MITRE), the first machine learning method specifically designed for predicting host status from microbiome time-series data. Our method maintains interpretability by learning predictive rules over automatically inferred time-periods and phylogenetically related microbes. We validate MITRE’s performance on semi-synthetic data, and five real datasets measuring microbiome composition over time in infant and adult cohorts. Our results demonstrate that MITRE performs on par or outperforms “black box” machine learning approaches, providing a powerful new tool enabling discovery of biologically interpretable relationships between microbiome and human host.

Download Full-text

A Novel PCA-Firefly Based XGBoost Classification Model for Intrusion Detection in Networks Using GPU

Electronics ◽

10.3390/electronics9020219 ◽

2020 ◽

Vol 9 (2) ◽

pp. 219 ◽

Cited By ~ 37

Author(s):

Sweta Bhattacharya ◽

Siva Rama Krishnan S ◽

Praveen Kumar Reddy Maddikunta ◽

Rajesh Kaluri ◽

Saurabh Singh ◽

...

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Comprehensive Evaluation ◽

Detection System ◽

Human Life ◽

Principal Component ◽

Cyber Attacks ◽

Classification Model ◽

Learning Approaches ◽

Machine Learning Model

The enormous popularity of the internet across all spheres of human life has introduced various risks of malicious attacks in the network. The activities performed over the network could be effortlessly proliferated, which has led to the emergence of intrusion detection systems. The patterns of the attacks are also dynamic, which necessitates efficient classification and prediction of cyber attacks. In this paper we propose a hybrid principal component analysis (PCA)-firefly based machine learning model to classify intrusion detection system (IDS) datasets. The dataset used in the study is collected from Kaggle. The model first performs One-Hot encoding for the transformation of the IDS datasets. The hybrid PCA-firefly algorithm is then used for dimensionality reduction. The XGBoost algorithm is implemented on the reduced dataset for classification. A comprehensive evaluation of the model is conducted with the state of the art machine learning approaches to justify the superiority of our proposed approach. The experimental results confirm the fact that the proposed model performs better than the existing machine learning models.

Download Full-text

A data-driven methodology for the classification of different liquids in artificial taste recognition applications with a pulse voltammetric electronic tongue

International Journal of Distributed Sensor Networks ◽

10.1177/1550147719881601 ◽

2019 ◽

Vol 15 (10) ◽

pp. 155014771988160 ◽

Cited By ~ 6

Author(s):

Jersson X Leon-Medina ◽

Leydi J Cardenas-Flechas ◽

Diego A Tibaduiza

Keyword(s):

Machine Learning ◽

Pattern Recognition ◽

Data Analysis ◽

Nearest Neighbor ◽

Electronic Tongue ◽

Sensor Arrays ◽

Principal Component ◽

Machine Learning Techniques ◽

Learning Approaches ◽

K Nearest Neighbor

Electronic tongue-type sensor arrays are devices used to determine the quality of substances and seek to imitate the main components of the human sense of taste. For this purpose, an electronic tongue-based system makes use of sensors, data acquisition systems, and a pattern recognition system. Particularly, in the latter, machine learning techniques are useful in data analysis and have been used to solve classification and regression problems. However, one of the problems in the use of this kind of device is associated with the development of reliable pattern recognition algorithms and robust data analysis. In this sense, this work introduces a taste recognition methodology, which is composed of several steps including unfolding data, data normalization, principal component analysis for compressing the data, and classification through different machine learning models. The proposed methodology is tested using data from an electronic tongue with 13 different liquid substances; this electronic tongue uses multifrequency large amplitude pulse signal voltammetry. Results show that the methodology is able to perform the classification accurately and the best results are obtained when it includes the use of K-nearest neighbor machine in terms of accuracy compared with other kinds of machine learning approaches. Besides, the comparison to evaluate the methodology is made with different classification performance measures that show the behavior of the process in a single number.

Download Full-text

Hidden Patterns of Anti-HLA Class I Alloreactivity Revealed Through Machine Learning

Frontiers in Immunology ◽

10.3389/fimmu.2021.670956 ◽

2021 ◽

Vol 12 ◽

Author(s):

Angeliki G. Vittoraki ◽

Asimina Fylaktou ◽

Katerina Tarassi ◽

Zafeiris Tsinaris ◽

Alexandra Siorenta ◽

...

Keyword(s):

Machine Learning ◽

Current Knowledge ◽

A Priori ◽

Principal Component ◽

Hla Class I ◽

Class I ◽

Learning Approaches ◽

Hla Class I Antigens ◽

Reactive Groups ◽

Antigenic Targets

Detection of alloreactive anti-HLA antibodies is a frequent and mandatory test before and after organ transplantation to determine the antigenic targets of the antibodies. Nowadays, this test involves the measurement of fluorescent signals generated through antibody–antigen reactions on multi-beads flow cytometers. In this study, in a cohort of 1,066 patients from one country, anti-HLA class I responses were analyzed on a panel of 98 different antigens. Knowing that the immune system responds typically to “shared” antigenic targets, we studied the clustering patterns of antibody responses against HLA class I antigens without any a priori hypothesis, applying two unsupervised machine learning approaches. At first, the principal component analysis (PCA) projections of intra-locus specific responses showed that anti-HLA-A and anti-HLA-C were the most distantly projected responses in the population with the anti-HLA-B responses to be projected between them. When PCA was applied on the responses against antigens belonging to a single locus, some already known groupings were confirmed while several new cross-reactive patterns of alloreactivity were detected. Anti-HLA-A responses projected through PCA suggested that three cross-reactive groups accounted for about 70% of the variance observed in the population, while anti-HLA-B responses were mainly characterized by a distinction between previously described Bw4 and Bw6 cross-reactive groups followed by several yet undocumented or poorly described ones. Furthermore, anti-HLA-C responses could be explained by two major cross-reactive groups completely overlapping with previously described C1 and C2 allelic groups. A second feature-based analysis of all antigenic specificities, projected as a dendrogram, generated a robust measure of allelic antigenic distances depicting bead-array defined cross reactive groups. Finally, amino acid combinations explaining major population specific cross-reactive groups were described. The interpretation of the results was based on the current knowledge of the antigenic targets of the antibodies as they have been characterized either experimentally or computationally and appear at the HLA epitope registry.

Download Full-text

EXPLAIN IT TO ME – FACING REMOTE SENSING CHALLENGES IN THE BIO- AND GEOSCIENCES WITH EXPLAINABLE MACHINE LEARNING

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-v-3-2020-817-2020 ◽

2020 ◽

Vol V-3-2020 ◽

pp. 817-824

Author(s):

R. Roscher ◽

B. Bohn ◽

M. F. Duarte ◽

J. Garcke

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Neural Networks ◽

Simulated Data ◽

High Accuracy ◽

Learning Approaches ◽

Learning Methods ◽

Machine Learning Methods ◽

Recent Advances ◽

Post Hoc

Abstract. For some time now, machine learning methods have been indispensable in many application areas. Especially with the recent development of efficient neural networks, these methods are increasingly used in the sciences to obtain scientific outcomes from observational or simulated data. Besides a high accuracy, a desired goal is to learn explainable models. In order to reach this goal and obtain explanation, knowledge from the respective domain is necessary, which can be integrated into the model or applied post-hoc. We discuss explainable machine learning approaches which are used to tackle common challenges in the bio- and geosciences, such as limited amount of labeled data or the provision of reliable and scientific consistent results. We show that recent advances in machine learning to enhance transparency, interpretability, and explainability are helpful in overcoming these challenges.

Download Full-text

Classification of CAD dataset by using principal component analysis and machine learning approaches

2018 5th International Conference on Electrical and Electronic Engineering (ICEEE) ◽

10.1109/iceee2.2018.8391358 ◽

2018 ◽

Author(s):

Ali Cuvitoglu ◽

Zerrin Isik

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Principal Component ◽

Component Analysis ◽

Learning Approaches

Download Full-text

Gear pitting fault diagnosis using disentangled features from unsupervised deep learning

Proceedings of the Institution of Mechanical Engineers Part O Journal of Risk and Reliability ◽

10.1177/1748006x18822447 ◽

2019 ◽

Vol 233 (5) ◽

pp. 719-730 ◽

Cited By ~ 2

Author(s):

Yongzhi Qu ◽

Yue Zhang ◽

Miao He ◽

David He ◽

Chen Jiao ◽

...

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Fault Diagnosis ◽

Principal Component ◽

Training Data ◽

Health State ◽

Learning Approaches ◽

Time Frequency ◽

Fault Diagnosis And Prognosis ◽

Machinery Fault Diagnosis

Effective feature extraction is critical for machinery fault diagnosis and prognosis. The use of time–frequency features for machinery fault diagnosis has prevailed in the last decade. However, more attentions have been drawn to machine learning–based features. While time–frequency domain features can be directly correlated to fault types and fault levels, data-driven features are typically abstract representations. Therefore, classical machine learning approaches require large amount of training data to classify these abstract features for fault diagnosis. This article proposed a fully unsupervised feature extraction method for “meaningful” feature mining, named disentangled tone mining. It is shown that disentangled tone mining can effectively extract the hidden “trend” associated with machinery health state, which can be used directly for online anomaly detection and prediction. Compared with wavelet transform and time domain statistics, disentangled tone mining can better extract fault-related features and reflect the fault degradation process. Shallow methods, such as principal component analysis, multidimensional scaling and single-layer sparse autoencoder, are shown to be inferior in terms of disentangled feature learning for machinery signals. Simulation analysis is also provided to demonstrate and explain the potential mechanism underlying the proposed method.

Download Full-text

Principal Component Analysis and Machine Learning Approaches for Photovoltaic Power Prediction: A Comparative Study

Applied Sciences ◽

10.3390/app11177943 ◽

2021 ◽

Vol 11 (17) ◽

pp. 7943

Author(s):

Souhaila Chahboun ◽

Mohamed Maaroufi

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Industrial Revolution ◽

Performance Metrics ◽

Principal Component ◽

Support Vector ◽

Full Potential ◽

Learning Approaches ◽

Power Prediction ◽

Intelligent Sensors

Nowadays, in the context of the industrial revolution 4.0, considerable volumes of data are being generated continuously from intelligent sensors and connected objects. The proper understanding and use of these amounts of data are crucial levers of performance and innovation. Machine learning is the technology that allows the full potential of big datasets to be exploited. As a branch of artificial intelligence, it enables us to discover patterns and make predictions from data based on statistics, data mining, and predictive analysis. The key goal of this study was to use machine learning approaches to forecast the hourly power produced by photovoltaic panels. A comparison analysis of various predictive models including elastic net, support vector regression, random forest, and Bayesian regularized neural networks was carried out to identify the models providing the best predicting results. The principal components analysis used to reduce the dimensionality of the input data revealed six main factor components that could explain up to 91.95% of the variation in all variables. Finally, performance metrics demonstrated that Bayesian regularized neural networks achieved the best results, giving an accuracy of R2 = 99.99% and RMSE = 0.002 kW.

Download Full-text

The use of principal component analysis to measure fundamental cognitive processes in neuropsychological data

10.1101/2021.11.10.468133 ◽

2021 ◽

Author(s):

Christoph Sperber

Keyword(s):

Principal Component Analysis ◽

Cognitive Functions ◽

High Reliability ◽

Synthetic Data ◽

Simulated Data ◽

Principal Component ◽

Component Analysis ◽

Filter Function ◽

Typical Structure ◽

Factor Rotation

For years, dissociation studies on neurological single cases were the dominant method to infer fundamental cognitive functions in neuropsychology. In contrast, the association between deficits was considered to be of less epistemological value and even misleading. Still, principal component analysis (PCA), an associational method for dimensionality reduction, recently became popular for the identification of fundamental functions. The current study evaluated the ability of PCA to identify the fundamental variables underlying a battery of measures. Synthetic data were simulated to resemble typical neuropsychological data, including varying dissociation patterns. In most experiments, PCA succeeded to measure the underlying target variables with high up to almost perfect precision. However, this success relied on additional factor rotation. Unroated PCA struggled with the dependence of data and often failed. On the other hand, the performance of rotated factor solutions required single measures that anchored the rotation. When no test scores existed that primarily and precisely measured each underlying target variable, rotated solutions also failed their intended purpose. Further, the dimensionality of the simulated data was consistently underestimated. Commonly used strategies to estimate the number of meaningful factors appear to be inappropriate for neuropsychological data. Finally, simulations suggested a high potential of PCA to denoise data, with factor rotation providing an additional filter function. This can be invaluable in neuropsychology, where measures are often inherently noisy, and PCA can be superior to common compound measures - such as the arithmetic mean - in the measurement of variables with high reliability. In summary, PCA appears to be a powerful tool in neuropsychology that is well capable to infer fundamental cognitive functions with high precision, but the typical structure of neuropsychological data places clear limitations and a risk of a complete methodological failure on the method.

Download Full-text