Multi-Label Classification Based on Random Forest Algorithm for Non-Intrusive Load Monitoring System

Processes ◽  
2019 ◽  
Vol 7 (6) ◽  
pp. 337 ◽  
Author(s):  
Xin Wu ◽  
Yuchen Gao ◽  
Dian Jiao

Non-intrusive load monitoring (NILM) is an effective method to optimize energy consumption patterns. Since the concept of NILM was proposed, extensive research has focused on energy disaggregation or load identification. The traditional method is to disaggregate mixed signals and then identify the independent loads. This paper proposes a multi-label classification method using Random Forest (RF) as the learning algorithm for non-intrusive load identification. Multi-label classification determines which categories a data sample belongs to, and can therefore identify the operating states of independent loads from mixed signals without disaggregation. The experiments are conducted both in a real environment and on a public data set. Several basic electrical features are selected as classification features to build the classification model, and feature-importance scores are used to compare them and select those most suitable for classification. The classification accuracy and F-score of the proposed method reach 0.97 and 0.98, respectively.
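To make the multi-label formulation concrete, the sketch below frames load identification as one binary "is this appliance on?" decision per load, with no disaggregation step. All data, feature choices and the nearest-centroid stand-in classifier are hypothetical illustrations, not the paper's Random Forest model.

```python
# Sketch of multi-label appliance-state identification from aggregate
# features. The paper trains a Random Forest; this dependency-free
# stand-in fits one nearest-centroid classifier per label purely to
# illustrate the multi-label formulation.

# Features per sample: (active power in W, 3rd-harmonic current in A).
X = [(5.0, 0.01), (2005.0, 0.02), (65.0, 0.90), (2065.0, 0.92)]
# Label vector per sample: (kettle_on, lamp_on).
Y = [(0, 0), (1, 0), (0, 1), (1, 1)]

def scale(X):
    """Min-max scale each feature to [0, 1] so power does not dominate."""
    lo = [min(col) for col in zip(*X)]
    hi = [max(col) for col in zip(*X)]
    return [tuple((v - l) / (h - l) for v, l, h in zip(x, lo, hi))
            for x in X], lo, hi

def centroid(points):
    return tuple(sum(col) / len(col) for col in zip(*points))

def fit(X, Y):
    Xs, lo, hi = scale(X)
    model = []
    for lab in range(len(Y[0])):            # one binary problem per label
        on  = centroid([x for x, y in zip(Xs, Y) if y[lab] == 1])
        off = centroid([x for x, y in zip(Xs, Y) if y[lab] == 0])
        model.append((on, off))
    return model, lo, hi

def predict(model, lo, hi, x):
    xs = tuple((v - l) / (h - l) for v, l, h in zip(x, lo, hi))
    d = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    return tuple(int(d(xs, on) < d(xs, off)) for on, off in model)

model, lo, hi = fit(X, Y)
print(predict(model, lo, hi, (2010.0, 0.02)))  # kettle only -> (1, 0)
print(predict(model, lo, hi, (2070.0, 0.93)))  # both on    -> (1, 1)
```

With a Random Forest the per-label structure stays the same; only the base classifier changes.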

2021 ◽  
Vol 11 (6) ◽  
pp. 1592-1598
Author(s):  
Xufei Liu

The early detection of cardiovascular diseases based on the electrocardiogram (ECG) is very important for the timely treatment of cardiovascular patients, as it increases their survival rate. The ECG is a visual representation of changes in cardiac bioelectricity and is the basis for assessing heart health. With the rise of edge machine learning and Internet of Things (IoT) technologies, small machine learning models have received attention. This study proposes an automatic ECG classification method based on IoT technology and an LSTM network to achieve early monitoring and prevention of cardiovascular diseases. Specifically, the paper first proposes a single-layer bidirectional LSTM network structure that makes full use of the temporal dependencies between preceding and subsequent sampling points to extract features automatically; the network is lightweight and its computational complexity is low. To verify the effectiveness of the proposed classification model, it is compared with relevant baseline algorithms on the public MIT-BIH data set. Secondly, the model is embedded in a wearable device to automatically classify the collected ECG. Finally, when an abnormality is detected, the user is alerted by an alarm. The experimental results show that the proposed model has a simple structure and a high classification accuracy, which can meet the needs of wearable devices for monitoring the ECG of patients.
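For readers unfamiliar with the building block, here is a minimal scalar LSTM cell and a bidirectional pass over a toy 1-D sequence. The weights are arbitrary fixed values (a real model learns them from multi-channel ECG); this only shows how the gates combine past state with the current sample in both directions.

```python
# Minimal scalar LSTM cell plus a bidirectional pass over a toy signal.
import math

sig = lambda z: 1 / (1 + math.exp(-z))

def lstm_step(x, h, c, w):
    """One LSTM step with scalar state; w holds per-gate (wx, wh, b)."""
    i = sig(w['i'][0] * x + w['i'][1] * h + w['i'][2])        # input gate
    f = sig(w['f'][0] * x + w['f'][1] * h + w['f'][2])        # forget gate
    o = sig(w['o'][0] * x + w['o'][1] * h + w['o'][2])        # output gate
    g = math.tanh(w['g'][0] * x + w['g'][1] * h + w['g'][2])  # candidate
    c = f * c + i * g
    return o * math.tanh(c), c

def run(seq, w):
    h = c = 0.0
    for x in seq:
        h, c = lstm_step(x, h, c, w)
    return h

w = {k: (0.5, 0.3, 0.1) for k in 'ifog'}   # arbitrary fixed weights
seq = [0.0, 0.2, 1.0, 0.3, 0.0]            # toy beat-like waveform
# Bidirectional: one pass forward, one backward, features concatenated.
feature = (run(seq, w), run(seq[::-1], w))
```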


2019 ◽  
Vol 11 (11) ◽  
pp. 3222 ◽  
Author(s):  
Pascal Schirmer ◽  
Iosif Mporas

In this paper we evaluate several well-known and widely used machine learning algorithms for regression in the energy disaggregation task. Specifically, the Non-Intrusive Load Monitoring approach was considered, and the K-Nearest-Neighbours, Support Vector Machines, Deep Neural Networks and Random Forest algorithms were evaluated across five datasets using seven different sets of statistical and electrical features. The experimental results demonstrated the importance of selecting both appropriate features and regression algorithms. Analysis at the device level showed that linear devices can be disaggregated using statistical features, while for non-linear devices the use of electrical features significantly improves the disaggregation accuracy, as non-linear appliances have a non-sinusoidal current draw and thus cannot be well parametrized by their active power consumption alone. The best performance in terms of energy disaggregation accuracy was achieved by the Random Forest regression algorithm.
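As a concrete (and heavily simplified) illustration of disaggregation as regression, the sketch below uses a k-nearest-neighbours regressor to estimate one appliance's power from two features of the aggregate signal; the data and feature choices are invented for the example and are not drawn from the evaluated datasets.

```python
# k-NN regression for energy disaggregation: estimate the fridge's power
# from features of the aggregate signal. All numbers are hypothetical.

train = [
    # (aggregate active power W, aggregate current A) -> fridge power W
    ((120.0, 0.6), 100.0),
    ((130.0, 0.7), 110.0),
    ((2140.0, 9.1), 105.0),   # fridge on together with a kettle
    ((40.0, 0.2), 0.0),
    ((2050.0, 8.9), 0.0),
]

def knn_regress(x, train, k=2):
    d = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    nearest = sorted(train, key=lambda s: d(s[0], x))[:k]
    return sum(target for _, target in nearest) / k

print(knn_regress((125.0, 0.65), train))  # -> 105.0
```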


2020 ◽  
Vol 10 (1) ◽  
pp. 1-11
Author(s):  
Arvind Shrivastava ◽  
Nitin Kumar ◽  
Kuldeep Kumar ◽  
Sanjeev Gupta

The paper applies Random Forest, a popular machine learning classification algorithm, to predict bankruptcy (distress) for Indian firms. Random Forest orders firms according to their propensity to default, or their likelihood of becoming distressed, and is also useful for explaining the association between the tendency of firm failure and its features. The results are analyzed vis-à-vis Tree Net. Both in-sample and out-of-sample estimations have been performed to compare Random Forest with Tree Net, a cutting-edge data mining tool known to provide satisfactory estimation results. An exhaustive data set comprising companies from varied sectors has been included in the analysis. It is found that the Tree Net procedure consistently provides better classification and predictive performance than the Random Forest methodology, and may be further utilized by industry analysts and researchers alike for predictive purposes.


Author(s):  
MUSTAPHA LEBBAH ◽  
YOUNÈS BENNANI ◽  
NICOLETA ROGOVSCHI

This paper introduces a probabilistic self-organizing map for topographic clustering, analysis and visualization of multivariate binary data, or of categorical data using binary coding. We propose a probabilistic formalism dedicated to binary data in which each cell is represented by a Bernoulli distribution: a cell is characterized by a prototype with the same binary coding as used in the data space, together with the probability of differing from this prototype. The proposed learning algorithm, a Bernoulli self-organizing map, is an application of the standard EM algorithm. We illustrate the power of this method with six data sets taken from a public repository. The results show good topological ordering and homogeneous clustering.
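The core of such a model is EM for Bernoulli components. The sketch below runs EM for a two-component Bernoulli mixture on made-up binary data; the full method additionally organizes the cells on a self-organizing-map grid with a neighborhood function, which is omitted here. The initialization is deterministic only to keep the example reproducible.

```python
# EM for a two-component Bernoulli mixture on binary vectors.
import math

# Two visibly separated groups of binary vectors (hypothetical data).
data = [(1, 1, 1, 0, 0, 0)] * 5 + [(1, 1, 0, 0, 0, 0)] * 3 + \
       [(0, 0, 0, 1, 1, 1)] * 5 + [(0, 0, 1, 1, 1, 0)] * 3
K, D = 2, 6

# Deterministic, mildly informative start (a real run would randomize).
theta = [[0.6, 0.6, 0.6, 0.4, 0.4, 0.4],
         [0.4, 0.4, 0.4, 0.6, 0.6, 0.6]]
pi = [0.5, 0.5]

for _ in range(30):
    # E-step: responsibility of each Bernoulli component for each sample.
    R = []
    for x in data:
        lik = [pi[k] * math.prod(t if xi else 1 - t
                                 for xi, t in zip(x, theta[k]))
               for k in range(K)]
        s = sum(lik)
        R.append([l / s for l in lik])
    # M-step: re-estimate mixing weights and per-bit Bernoulli parameters.
    for k in range(K):
        nk = sum(r[k] for r in R)
        pi[k] = nk / len(data)
        theta[k] = [sum(r[k] * x[d] for r, x in zip(R, data)) / nk
                    for d in range(D)]
```

After convergence, each component's parameter vector approaches the empirical bit frequencies of one group, which is exactly the "prototype plus probability of differing" reading in the abstract.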


2021 ◽  
Author(s):  
Marc Raphael ◽  
Michael Robitaille ◽  
Jeff Byers ◽  
Joseph Christodoulides

Abstract Machine learning algorithms hold the promise of greatly improving live cell image analysis by (1) analyzing far more imagery than can be achieved by traditional manual approaches and (2) eliminating the subjectivity of researchers and diagnosticians selecting the cells or cell features to be included in the analyzed data set. Currently, however, even the most sophisticated model-based or machine learning algorithms require user supervision, meaning the subjectivity problem is not removed but rather incorporated into the algorithm's initial training steps and then repeatedly applied to the imagery. To address this roadblock, we have developed a self-supervised machine learning algorithm that recursively trains itself directly on the live cell imagery, thus providing objective segmentation and quantification. The approach incorporates an optical flow algorithm component to self-label cell and background pixels for training, followed by the extraction of additional feature vectors for the automated generation of a cell/background classification model. Because it is self-trained, the software has no user-adjustable parameters and does not require curated training imagery. The algorithm was applied to automatically segment cells from their background for a variety of cell types and five commonly used imaging modalities: fluorescence, phase contrast, differential interference contrast (DIC), transmitted light and interference reflection microscopy (IRM). The approach is broadly applicable in that it enables completely automated cell segmentation for long-term live cell phenotyping applications, regardless of the input imagery's optical modality, magnification or cell type.
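The self-labeling step can be caricatured with frame differencing in place of optical flow: motion between two frames supplies the cell/background labels, and those labels then train an intensity classifier that segments new imagery. Everything below (the tiny grids, the thresholds, the midpoint classifier) is an invented stand-in for the paper's actual pipeline.

```python
# Self-supervised segmentation in miniature: self-label by motion, then
# train a trivial intensity classifier from those labels.

frame_a = [[10, 10, 10, 10],
           [10, 80, 85, 10],
           [10, 82, 88, 10],
           [10, 10, 10, 10]]
frame_b = [[10, 10, 10, 10],
           [10, 10, 80, 85],   # the bright "cell" shifted one pixel right
           [10, 10, 82, 88],
           [10, 10, 10, 10]]

# Self-label from motion: any pixel whose intensity changed is "cell".
labels = [[int(abs(a - b) > 5) for a, b in zip(ra, rb)]
          for ra, rb in zip(frame_a, frame_b)]

# Train: midpoint between mean intensities of the self-labeled classes.
cell = [frame_b[r][c] for r in range(4) for c in range(4) if labels[r][c]]
bg   = [frame_b[r][c] for r in range(4) for c in range(4) if not labels[r][c]]
threshold = (sum(cell) / len(cell) + sum(bg) / len(bg)) / 2

# Apply: segment a frame with the learned threshold, no human labels used.
mask = [[int(v > threshold) for v in row] for row in frame_b]
```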




2021 ◽  
Vol 8 (3) ◽  
pp. 209-221
Author(s):  
Li-Li Wei ◽  
Yue-Shuai Pan ◽  
Yan Zhang ◽  
Kai Chen ◽  
Hao-Yu Wang ◽  
...  

Abstract
Objective: To study the application of a machine learning algorithm for predicting gestational diabetes mellitus (GDM) in early pregnancy.
Methods: This study identified indicators related to GDM through a literature review and expert discussion. Pregnant women who had attended medical institutions for an antenatal examination from November 2017 to August 2018 were selected for analysis, and the collected indicators were retrospectively analyzed. Using Python, the indicators were classified and modeled with a random forest regression algorithm, and the performance of the prediction model was analyzed.
Results: We obtained 4806 analyzable records from 1625 pregnant women. Among these, 3265 samples with all 67 indicators were used to establish data set F1, and 4806 samples with 38 identical indicators were used to establish data set F2. Each of F1 and F2 was used to train the random forest algorithm. The overall predictive accuracy of the F1 model was 93.10%, the area under the receiver operating characteristic curve (AUC) was 0.66, and the predictive accuracy for GDM-positive cases was 37.10%. The corresponding values for the F2 model were 88.70%, 0.87, and 79.44%, so the F2 prediction model performed better than the F1 model. To explore the impact of the discarded indicators on GDM prediction, data set F3 was established using the 3265 samples of F1 with the 38 indicators of F2. After training, the overall predictive accuracy of the F3 model was 91.60%, the AUC was 0.58, and the predictive accuracy for positive cases was 15.85%.
Conclusions: In this study, a model for predicting GDM from several input variables (e.g., physical examination, past history, personal history, family history, and laboratory indicators) was established using a random forest regression algorithm. The trained prediction model exhibited good performance and is valuable as a reference for predicting GDM in women at an early stage of pregnancy. In addition, there are certain requirements for the proportions of negative and positive cases in the sample data sets when the random forest algorithm is applied to the early prediction of GDM.
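The gap the abstract reports between overall accuracy and positive-case accuracy is a standard class-imbalance effect; the made-up confusion matrix below (not the paper's data) reproduces the pattern.

```python
# High overall accuracy can coexist with poor positive-case recall on
# imbalanced data such as GDM screening. Numbers are hypothetical.

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def recall(tp, fn):
    return tp / (tp + fn)

# 1000 pregnancies, 100 GDM-positive; the model catches only 37 of them.
tp, fn, fp, tn = 37, 63, 10, 890
print(round(accuracy(tp, fp, tn, fn), 3))  # 0.927 overall accuracy
print(round(recall(tp, fn), 3))            # 0.37 positive-case recall
```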


2021 ◽  
Vol 21 (9) ◽  
pp. 2773-2789
Author(s):  
Jacob Hirschberg ◽  
Alexandre Badoux ◽  
Brian W. McArdell ◽  
Elena Leonarduzzi ◽  
Peter Molnar

Abstract. The prediction of debris flows is relevant because this type of natural hazard can pose a threat to humans and infrastructure. Debris-flow (and landslide) early warning systems often rely on rainfall intensity–duration (ID) thresholds. Multiple competing methods exist for the determination of such ID thresholds but have not been objectively and thoroughly compared at multiple scales, and a validation and uncertainty assessment is often missing in their formulation. As a consequence, updating, interpreting, generalizing and comparing rainfall thresholds is challenging. Using a 17-year record of rainfall and 67 debris flows in a Swiss Alpine catchment (Illgraben), we determined ID thresholds and associated uncertainties as a function of record duration. Furthermore, we compared two methods for ID-threshold determination based on linear regression and/or true-skill-statistic maximization. The main difference between these approaches and the well-known frequentist method is that non-triggering rainfall events were also considered for obtaining ID-threshold parameters. Depending on the method applied, the ID-threshold parameters and their uncertainties differed significantly. We found that 25 debris flows are sufficient to constrain uncertainties in ID-threshold parameters to ±30 % for our study site. We further demonstrated the change in predictive performance of the two methods if a regional landslide data set with a regional rainfall product was used instead of a local one with local rainfall measurements. Hence, an important finding is that the ideal method for ID-threshold determination depends on the available landslide and rainfall data sets. Furthermore, for the local data set we tested if the ID-threshold performance can be increased by considering other rainfall properties (e.g. antecedent rainfall, maximum intensity) in a multivariate statistical learning algorithm based on decision trees (random forest).
The highest predictive power was reached when the peak 30 min rainfall intensity was added to the ID variables, while no improvement was achieved by considering antecedent rainfall for debris-flow predictions in Illgraben. Although the increase in predictive performance with the random forest model over the classical ID threshold was small, such a framework could be valuable for future studies if more predictors are available from measured or modelled data.
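The true-skill-statistic approach can be sketched in a few lines: every candidate threshold is scored by hit rate minus false-alarm rate over both triggering and non-triggering events, and the maximizer is taken. For brevity the example uses a single intensity value per event (i.e. a fixed duration) and hypothetical numbers.

```python
# Choose a rainfall-intensity threshold by maximizing the true skill
# statistic, TSS = hit rate - false-alarm rate, using triggering AND
# non-triggering events. All values are made up.

# (mean rainfall intensity in mm/h, debris flow triggered?)
events = [(2.0, 0), (3.5, 0), (4.0, 1), (5.0, 0), (6.5, 1),
          (7.0, 1), (8.0, 1), (9.5, 1), (1.5, 0), (3.0, 0)]

def tss(threshold, events):
    tp = sum(1 for i, t in events if i >= threshold and t == 1)  # hits
    fn = sum(1 for i, t in events if i < threshold and t == 1)   # misses
    fp = sum(1 for i, t in events if i >= threshold and t == 0)  # false alarms
    tn = sum(1 for i, t in events if i < threshold and t == 0)
    return tp / (tp + fn) - fp / (fp + tn)

# Candidate thresholds: the observed intensities themselves.
best = max((i for i, _ in events), key=lambda thr: tss(thr, events))
print(best)  # -> 4.0
```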


Energies ◽  
2020 ◽  
Vol 13 (9) ◽  
pp. 2148 ◽  
Author(s):  
Pascal A. Schirmer ◽  
Iosif Mporas ◽  
Akbar Sheikh-Akbari

A data-driven methodology to improve the energy disaggregation accuracy during Non-Intrusive Load Monitoring is proposed. In detail, the method uses a two-stage scheme: in the first stage, classification models process the aggregated signal in parallel, each producing a binary device-detection score; in the second stage, fusion regression models estimate the power consumption of each electrical appliance. The accuracy of the proposed approach was tested on three datasets—ECO (Electricity Consumption & Occupancy), REDD (Reference Energy Disaggregation Data Set), and iAWE (Indian Dataset for Ambient Water and Energy)—which are available online, using four different classifiers. The presented approach improves the estimation accuracy by up to 4.1% with respect to a basic energy disaggregation architecture, while the improvement at device level was up to 10.1%. Analysis at device level showed significant improvement in power consumption estimation accuracy, especially for continuous and nonlinear appliances, across all evaluated datasets.
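Structurally, the two-stage scheme can be sketched as a bank of per-device detectors gating per-device power estimates. The detectors and regressors below are trivial threshold/constant stand-ins for the paper's classifiers and fusion regression models, and all device parameters are made up.

```python
# Two-stage NILM in miniature: stage 1 detects whether a device is on,
# stage 2 estimates its power only when the detector fires.

DEVICES = {
    # name: (detection threshold on aggregate power W, typical power W)
    "kettle": (1500.0, 2000.0),
    "fridge": (90.0, 100.0),
}

def detect(aggregate_w, threshold):
    """Stage 1: binary device-detection score."""
    return int(aggregate_w >= threshold)

def disaggregate(aggregate_w):
    """Stage 2: per-device power estimate, gated by the detector."""
    return {name: detect(aggregate_w, thr) * typical
            for name, (thr, typical) in DEVICES.items()}

print(disaggregate(2150.0))  # both detectors fire
print(disaggregate(110.0))   # only the fridge detector fires
```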


2013 ◽  
Vol 2013 ◽  
pp. 1-10 ◽  
Author(s):  
Jianwei Liu ◽  
Shuang Cheng Li ◽  
Xionglin Luo

Support vector machine (SVM) is an effective classification and regression method that uses machine learning theory to maximize predictive accuracy while avoiding overfitting. L2 regularization has been commonly used; if the training data set contains many noise variables, an L1-regularized SVM provides better performance. However, neither L1 nor L2 is the optimal regularization method when handling a large number of redundant variables with only a small number of data points useful for machine learning. We have therefore proposed an adaptive learning algorithm using an iteratively reweighted p-norm regularization support vector machine for 0 < p ≤ 2. A simulated data set was created to evaluate the algorithm, and it was shown that a p value of 0.8 produced a better feature selection rate with high accuracy. Four cancer data sets from public data banks were also used for the evaluation. All four evaluations show that the new adaptive algorithm achieved the optimal prediction error using a p value below that of the L1 norm. Moreover, we observe that the proposed Lp penalty is more robust to noise variables than the L1 and L2 penalties.
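The reweighting idea can be shown on a toy problem: an Lp penalty (here p = 0.8) is approximated at each iteration by a weighted ridge penalty whose per-coefficient weights come from the current estimate, so small coefficients are penalized ever more heavily and shrink toward zero. This is a least-squares stand-in on an orthogonal design, not the paper's SVM formulation, and all numbers are invented.

```python
# Iteratively reweighted Lp (p < 1) shrinkage on a toy least-squares fit:
# the noise coefficient is driven to ~0, the informative one survives.

p, lam, eps = 0.8, 1.0, 1e-12
x0 = [1, 1, 1, 1, -1, -1, -1, -1]      # informative feature
x1 = [1, -1, 1, -1, 1, -1, 1, -1]      # noise feature (orthogonal to x0)
y  = [2 * a + 0.1 * b for a, b in zip(x0, x1)]

dot = lambda u, v: sum(a * b for a, b in zip(u, v))
cols = [x0, x1]
w = [dot(x, y) / dot(x, x) for x in cols]           # ordinary LS start

for _ in range(25):
    # Reweighting: Lp penalty ~ sum_j r_j * w_j**2 with these weights.
    r = [(wj * wj + eps) ** ((p - 2) / 2) for wj in w]
    # Closed-form per-coordinate ridge update (valid: columns orthogonal).
    w = [dot(x, y) / (dot(x, x) + lam * rj) for x, rj in zip(cols, r)]

print([round(v, 3) for v in w])   # noise coefficient driven toward 0
```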

