Multi-Label Classification Based on Random Forest Algorithm for Non-Intrusive Load Monitoring System

Processes ◽  
2019 ◽  
Vol 7 (6) ◽  
pp. 337 ◽  
Author(s):  
Xin Wu ◽  
Yuchen Gao ◽  
Dian Jiao

Non-intrusive load monitoring (NILM) is an effective method to optimize energy consumption patterns. Since the concept of NILM was proposed, extensive research has focused on energy disaggregation or load identification. The traditional method is to disaggregate mixed signals and then identify the independent loads. This paper proposes a multi-label classification method using Random Forest (RF) as the learning algorithm for non-intrusive load identification. Multi-label classification determines which categories a data sample belongs to, and can therefore identify the operating states of independent loads from mixed signals without disaggregation. The experiments are conducted both in a real environment and on a public data set. Several basic electrical features are selected as classification features to build the classification model, and feature-importance scores are used to compare them and select those most suitable for classification. The classification accuracy and F-score of the proposed method reach 0.97 and 0.98, respectively.
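To make the multi-label formulation concrete, the sketch below frames load identification as one binary "is this appliance on?" decision per load, with no disaggregation step. All data, feature choices and the nearest-centroid stand-in classifier are hypothetical illustrations, not the paper's Random Forest model.

```python
# Sketch of multi-label appliance-state identification from aggregate
# features. The paper trains a Random Forest; this dependency-free
# stand-in fits one nearest-centroid classifier per label purely to
# illustrate the multi-label formulation.

# Features per sample: (active power in W, 3rd-harmonic current in A).
X = [(5.0, 0.01), (2005.0, 0.02), (65.0, 0.90), (2065.0, 0.92)]
# Label vector per sample: (kettle_on, lamp_on).
Y = [(0, 0), (1, 0), (0, 1), (1, 1)]

def scale(X):
    """Min-max scale each feature to [0, 1] so power does not dominate."""
    lo = [min(col) for col in zip(*X)]
    hi = [max(col) for col in zip(*X)]
    return [tuple((v - l) / (h - l) for v, l, h in zip(x, lo, hi))
            for x in X], lo, hi

def centroid(points):
    return tuple(sum(col) / len(col) for col in zip(*points))

def fit(X, Y):
    Xs, lo, hi = scale(X)
    model = []
    for lab in range(len(Y[0])):            # one binary problem per label
        on  = centroid([x for x, y in zip(Xs, Y) if y[lab] == 1])
        off = centroid([x for x, y in zip(Xs, Y) if y[lab] == 0])
        model.append((on, off))
    return model, lo, hi

def predict(model, lo, hi, x):
    xs = tuple((v - l) / (h - l) for v, l, h in zip(x, lo, hi))
    d = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    return tuple(int(d(xs, on) < d(xs, off)) for on, off in model)

model, lo, hi = fit(X, Y)
print(predict(model, lo, hi, (2010.0, 0.02)))  # kettle only -> (1, 0)
print(predict(model, lo, hi, (2070.0, 0.93)))  # both on    -> (1, 1)
```

With a Random Forest the per-label structure stays the same; only the base classifier changes.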

2021 ◽  
Vol 11 (6) ◽  
pp. 1592-1598
Author(s):  
Xufei Liu

The early detection of cardiovascular diseases based on the electrocardiogram (ECG) is very important for the timely treatment of cardiovascular patients, as it increases their survival rate. The ECG is a visual representation of changes in cardiac bioelectricity and is the basis for assessing heart health. With the rise of edge machine learning and Internet of Things (IoT) technologies, small machine learning models have received attention. This study proposes an automatic ECG classification method based on IoT technology and an LSTM network to achieve early monitoring and prevention of cardiovascular diseases. Specifically, the paper first proposes a single-layer bidirectional LSTM network structure that makes full use of the temporal dependencies between preceding and subsequent sampling points to extract features automatically; the network is lightweight and its computational complexity is low. To verify the effectiveness of the proposed classification model, it is compared with relevant baseline algorithms on the public MIT-BIH data set. Secondly, the model is embedded in a wearable device to automatically classify the collected ECG. Finally, when an abnormality is detected, the user is alerted by an alarm. The experimental results show that the proposed model has a simple structure and a high classification accuracy, which can meet the needs of wearable devices for monitoring the ECG of patients.
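For readers unfamiliar with the building block, here is a minimal scalar LSTM cell and a bidirectional pass over a toy 1-D sequence. The weights are arbitrary fixed values (a real model learns them from multi-channel ECG); this only shows how the gates combine past state with the current sample in both directions.

```python
# Minimal scalar LSTM cell plus a bidirectional pass over a toy signal.
import math

sig = lambda z: 1 / (1 + math.exp(-z))

def lstm_step(x, h, c, w):
    """One LSTM step with scalar state; w holds per-gate (wx, wh, b)."""
    i = sig(w['i'][0] * x + w['i'][1] * h + w['i'][2])        # input gate
    f = sig(w['f'][0] * x + w['f'][1] * h + w['f'][2])        # forget gate
    o = sig(w['o'][0] * x + w['o'][1] * h + w['o'][2])        # output gate
    g = math.tanh(w['g'][0] * x + w['g'][1] * h + w['g'][2])  # candidate
    c = f * c + i * g
    return o * math.tanh(c), c

def run(seq, w):
    h = c = 0.0
    for x in seq:
        h, c = lstm_step(x, h, c, w)
    return h

w = {k: (0.5, 0.3, 0.1) for k in 'ifog'}   # arbitrary fixed weights
seq = [0.0, 0.2, 1.0, 0.3, 0.0]            # toy beat-like waveform
# Bidirectional: one pass forward, one backward, features concatenated.
feature = (run(seq, w), run(seq[::-1], w))
```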


2019 ◽  
Vol 11 (11) ◽  
pp. 3222 ◽  
Author(s):  
Pascal Schirmer ◽  
Iosif Mporas

In this paper we evaluate several well-known and widely used machine learning algorithms for regression in the energy disaggregation task. Specifically, the Non-Intrusive Load Monitoring approach was considered, and the K-Nearest-Neighbours, Support Vector Machines, Deep Neural Networks and Random Forest algorithms were evaluated across five datasets using seven different sets of statistical and electrical features. The experimental results demonstrated the importance of selecting both appropriate features and regression algorithms. Analysis at the device level showed that linear devices can be disaggregated using statistical features, while for non-linear devices the use of electrical features significantly improves the disaggregation accuracy, as non-linear appliances have a non-sinusoidal current draw and thus cannot be well parametrized by their active power consumption alone. The best performance in terms of energy disaggregation accuracy was achieved by the Random Forest regression algorithm.
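As a concrete (and heavily simplified) illustration of disaggregation as regression, the sketch below uses a k-nearest-neighbours regressor to estimate one appliance's power from two features of the aggregate signal; the data and feature choices are invented for the example and are not drawn from the evaluated datasets.

```python
# k-NN regression for energy disaggregation: estimate the fridge's power
# from features of the aggregate signal. All numbers are hypothetical.

train = [
    # (aggregate active power W, aggregate current A) -> fridge power W
    ((120.0, 0.6), 100.0),
    ((130.0, 0.7), 110.0),
    ((2140.0, 9.1), 105.0),   # fridge on together with a kettle
    ((40.0, 0.2), 0.0),
    ((2050.0, 8.9), 0.0),
]

def knn_regress(x, train, k=2):
    d = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    nearest = sorted(train, key=lambda s: d(s[0], x))[:k]
    return sum(target for _, target in nearest) / k

print(knn_regress((125.0, 0.65), train))  # -> 105.0
```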


2020 ◽  
Vol 10 (1) ◽  
pp. 1-11
Author(s):  
Arvind Shrivastava ◽  
Nitin Kumar ◽  
Kuldeep Kumar ◽  
Sanjeev Gupta

The paper applies Random Forest, a popular machine learning classification algorithm, to predict bankruptcy (distress) for Indian firms. Random Forest orders firms according to their propensity to default, or their likelihood of becoming distressed, and is also useful for explaining the association between the tendency of firm failure and its features. The results are analyzed vis-à-vis Tree Net. Both in-sample and out-of-sample estimations have been performed to compare Random Forest with Tree Net, a cutting-edge data mining tool known to provide satisfactory estimation results. An exhaustive data set comprising companies from varied sectors has been included in the analysis. It is found that the Tree Net procedure consistently provides better classification and predictive performance than the Random Forest methodology, and may be further utilized by industry analysts and researchers alike for predictive purposes.


Author(s):  
MUSTAPHA LEBBAH ◽  
YOUNÈS BENNANI ◽  
NICOLETA ROGOVSCHI

This paper introduces a probabilistic self-organizing map for topographic clustering, analysis and visualization of multivariate binary data, or of categorical data using binary coding. We propose a probabilistic formalism dedicated to binary data in which each cell is represented by a Bernoulli distribution: a cell is characterized by a prototype with the same binary coding as used in the data space, together with the probability of differing from this prototype. The proposed learning algorithm, a Bernoulli self-organizing map, is an application of the standard EM algorithm. We illustrate the power of this method with six data sets taken from a public repository. The results show good topological ordering and homogeneous clustering.
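The core of such a model is EM for Bernoulli components. The sketch below runs EM for a two-component Bernoulli mixture on made-up binary data; the full method additionally organizes the cells on a self-organizing-map grid with a neighborhood function, which is omitted here. The initialization is deterministic only to keep the example reproducible.

```python
# EM for a two-component Bernoulli mixture on binary vectors.
import math

# Two visibly separated groups of binary vectors (hypothetical data).
data = [(1, 1, 1, 0, 0, 0)] * 5 + [(1, 1, 0, 0, 0, 0)] * 3 + \
       [(0, 0, 0, 1, 1, 1)] * 5 + [(0, 0, 1, 1, 1, 0)] * 3
K, D = 2, 6

# Deterministic, mildly informative start (a real run would randomize).
theta = [[0.6, 0.6, 0.6, 0.4, 0.4, 0.4],
         [0.4, 0.4, 0.4, 0.6, 0.6, 0.6]]
pi = [0.5, 0.5]

for _ in range(30):
    # E-step: responsibility of each Bernoulli component for each sample.
    R = []
    for x in data:
        lik = [pi[k] * math.prod(t if xi else 1 - t
                                 for xi, t in zip(x, theta[k]))
               for k in range(K)]
        s = sum(lik)
        R.append([l / s for l in lik])
    # M-step: re-estimate mixing weights and per-bit Bernoulli parameters.
    for k in range(K):
        nk = sum(r[k] for r in R)
        pi[k] = nk / len(data)
        theta[k] = [sum(r[k] * x[d] for r, x in zip(R, data)) / nk
                    for d in range(D)]
```

After convergence, each component's parameter vector approaches the empirical bit frequencies of one group, which is exactly the "prototype plus probability of differing" reading in the abstract.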


2021 ◽  
Author(s):  
Marc Raphael ◽  
Michael Robitaille ◽  
Jeff Byers ◽  
Joseph Christodoulides

Abstract Machine learning algorithms hold the promise of greatly improving live cell image analysis by (1) analyzing far more imagery than can be achieved by traditional manual approaches and (2) eliminating the subjectivity of researchers and diagnosticians selecting the cells or cell features to be included in the analyzed data set. Currently, however, even the most sophisticated model-based or machine learning algorithms require user supervision, meaning the subjectivity problem is not removed but rather incorporated into the algorithm's initial training steps and then repeatedly applied to the imagery. To address this roadblock, we have developed a self-supervised machine learning algorithm that recursively trains itself directly on the live cell imagery, thus providing objective segmentation and quantification. The approach incorporates an optical flow algorithm component to self-label cell and background pixels for training, followed by the extraction of additional feature vectors for the automated generation of a cell/background classification model. Because it is self-trained, the software has no user-adjustable parameters and does not require curated training imagery. The algorithm was applied to automatically segment cells from their background for a variety of cell types and five commonly used imaging modalities: fluorescence, phase contrast, differential interference contrast (DIC), transmitted light and interference reflection microscopy (IRM). The approach is broadly applicable in that it enables completely automated cell segmentation for long-term live cell phenotyping applications, regardless of the input imagery's optical modality, magnification or cell type.
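The self-labeling step can be caricatured with frame differencing in place of optical flow: motion between two frames supplies the cell/background labels, and those labels then train an intensity classifier that segments new imagery. Everything below (the tiny grids, the thresholds, the midpoint classifier) is an invented stand-in for the paper's actual pipeline.

```python
# Self-supervised segmentation in miniature: self-label by motion, then
# train a trivial intensity classifier from those labels.

frame_a = [[10, 10, 10, 10],
           [10, 80, 85, 10],
           [10, 82, 88, 10],
           [10, 10, 10, 10]]
frame_b = [[10, 10, 10, 10],
           [10, 10, 80, 85],   # the bright "cell" shifted one pixel right
           [10, 10, 82, 88],
           [10, 10, 10, 10]]

# Self-label from motion: any pixel whose intensity changed is "cell".
labels = [[int(abs(a - b) > 5) for a, b in zip(ra, rb)]
          for ra, rb in zip(frame_a, frame_b)]

# Train: midpoint between mean intensities of the self-labeled classes.
cell = [frame_b[r][c] for r in range(4) for c in range(4) if labels[r][c]]
bg   = [frame_b[r][c] for r in range(4) for c in range(4) if not labels[r][c]]
threshold = (sum(cell) / len(cell) + sum(bg) / len(bg)) / 2

# Apply: segment a frame with the learned threshold, no human labels used.
mask = [[int(v > threshold) for v in row] for row in frame_b]
```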




2021 ◽  
Vol 8 (3) ◽  
pp. 209-221
Author(s):  
Li-Li Wei ◽  
Yue-Shuai Pan ◽  
Yan Zhang ◽  
Kai Chen ◽  
Hao-Yu Wang ◽  
...  

Abstract
Objective: To study the application of a machine learning algorithm for predicting gestational diabetes mellitus (GDM) in early pregnancy.
Methods: This study identified indicators related to GDM through a literature review and expert discussion. Pregnant women who had attended medical institutions for an antenatal examination from November 2017 to August 2018 were selected for analysis, and the collected indicators were retrospectively analyzed. Using Python, the indicators were classified and modeled with a random forest regression algorithm, and the performance of the prediction model was analyzed.
Results: We obtained 4806 analyzable records from 1625 pregnant women. Among these, 3265 samples with all 67 indicators were used to establish data set F1, and 4806 samples with 38 identical indicators were used to establish data set F2. Each of F1 and F2 was used to train the random forest algorithm. The overall predictive accuracy of the F1 model was 93.10%, the area under the receiver operating characteristic curve (AUC) was 0.66, and the predictive accuracy for GDM-positive cases was 37.10%. The corresponding values for the F2 model were 88.70%, 0.87, and 79.44%, so the F2 prediction model performed better than the F1 model. To explore the impact of the discarded indicators on GDM prediction, data set F3 was established using the 3265 samples of F1 with the 38 indicators of F2. After training, the overall predictive accuracy of the F3 model was 91.60%, the AUC was 0.58, and the predictive accuracy for positive cases was 15.85%.
Conclusions: In this study, a model for predicting GDM from several input variables (e.g., physical examination, past history, personal history, family history, and laboratory indicators) was established using a random forest regression algorithm. The trained prediction model exhibited good performance and is valuable as a reference for predicting GDM in women at an early stage of pregnancy. In addition, there are certain requirements for the proportions of negative and positive cases in the sample data sets when the random forest algorithm is applied to the early prediction of GDM.
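The gap the abstract reports between overall accuracy and positive-case accuracy is a standard class-imbalance effect; the made-up confusion matrix below (not the paper's data) reproduces the pattern.

```python
# High overall accuracy can coexist with poor positive-case recall on
# imbalanced data such as GDM screening. Numbers are hypothetical.

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def recall(tp, fn):
    return tp / (tp + fn)

# 1000 pregnancies, 100 GDM-positive; the model catches only 37 of them.
tp, fn, fp, tn = 37, 63, 10, 890
print(round(accuracy(tp, fp, tn, fn), 3))  # 0.927 overall accuracy
print(round(recall(tp, fn), 3))            # 0.37 positive-case recall
```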


2021 ◽  
Vol 21 (9) ◽  
pp. 2773-2789
Author(s):  
Jacob Hirschberg ◽  
Alexandre Badoux ◽  
Brian W. McArdell ◽  
Elena Leonarduzzi ◽  
Peter Molnar

Abstract. The prediction of debris flows is relevant because this type of natural hazard can pose a threat to humans and infrastructure. Debris-flow (and landslide) early warning systems often rely on rainfall intensity–duration (ID) thresholds. Multiple competing methods exist for the determination of such ID thresholds but have not been objectively and thoroughly compared at multiple scales, and a validation and uncertainty assessment is often missing in their formulation. As a consequence, updating, interpreting, generalizing and comparing rainfall thresholds is challenging. Using a 17-year record of rainfall and 67 debris flows in a Swiss Alpine catchment (Illgraben), we determined ID thresholds and associated uncertainties as a function of record duration. Furthermore, we compared two methods for ID-threshold determination based on linear regression and/or true-skill-statistic maximization. The main difference between these approaches and the well-known frequentist method is that non-triggering rainfall events were also considered for obtaining ID-threshold parameters. Depending on the method applied, the ID-threshold parameters and their uncertainties differed significantly. We found that 25 debris flows are sufficient to constrain uncertainties in ID-threshold parameters to ±30 % for our study site. We further demonstrated the change in predictive performance of the two methods if a regional landslide data set with a regional rainfall product was used instead of a local one with local rainfall measurements. Hence, an important finding is that the ideal method for ID-threshold determination depends on the available landslide and rainfall data sets. Furthermore, for the local data set we tested if the ID-threshold performance can be increased by considering other rainfall properties (e.g. antecedent rainfall, maximum intensity) in a multivariate statistical learning algorithm based on decision trees (random forest).
The highest predictive power was reached when the peak 30 min rainfall intensity was added to the ID variables, while no improvement was achieved by considering antecedent rainfall for debris-flow predictions in Illgraben. Although the increase in predictive performance with the random forest model over the classical ID threshold was small, such a framework could be valuable for future studies if more predictors are available from measured or modelled data.
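The true-skill-statistic approach can be sketched in a few lines: every candidate threshold is scored by hit rate minus false-alarm rate over both triggering and non-triggering events, and the maximizer is taken. For brevity the example uses a single intensity value per event (i.e. a fixed duration) and hypothetical numbers.

```python
# Choose a rainfall-intensity threshold by maximizing the true skill
# statistic, TSS = hit rate - false-alarm rate, using triggering AND
# non-triggering events. All values are made up.

# (mean rainfall intensity in mm/h, debris flow triggered?)
events = [(2.0, 0), (3.5, 0), (4.0, 1), (5.0, 0), (6.5, 1),
          (7.0, 1), (8.0, 1), (9.5, 1), (1.5, 0), (3.0, 0)]

def tss(threshold, events):
    tp = sum(1 for i, t in events if i >= threshold and t == 1)  # hits
    fn = sum(1 for i, t in events if i < threshold and t == 1)   # misses
    fp = sum(1 for i, t in events if i >= threshold and t == 0)  # false alarms
    tn = sum(1 for i, t in events if i < threshold and t == 0)
    return tp / (tp + fn) - fp / (fp + tn)

# Candidate thresholds: the observed intensities themselves.
best = max((i for i, _ in events), key=lambda thr: tss(thr, events))
print(best)  # -> 4.0
```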


Energies ◽  
2020 ◽  
Vol 13 (9) ◽  
pp. 2148 ◽  
Author(s):  
Pascal A. Schirmer ◽  
Iosif Mporas ◽  
Akbar Sheikh-Akbari

A data-driven methodology to improve the energy disaggregation accuracy during Non-Intrusive Load Monitoring is proposed. In detail, the method uses a two-stage scheme: in the first stage, classification models process the aggregated signal in parallel, each producing a binary device-detection score; in the second stage, fusion regression models estimate the power consumption of each electrical appliance. The accuracy of the proposed approach was tested on three datasets—ECO (Electricity Consumption & Occupancy), REDD (Reference Energy Disaggregation Data Set), and iAWE (Indian Dataset for Ambient Water and Energy)—which are available online, using four different classifiers. The presented approach improves the estimation accuracy by up to 4.1% with respect to a basic energy disaggregation architecture, while the improvement at device level was up to 10.1%. Analysis at device level showed significant improvement in power consumption estimation accuracy, especially for continuous and nonlinear appliances, across all evaluated datasets.
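Structurally, the two-stage scheme can be sketched as a bank of per-device detectors gating per-device power estimates. The detectors and regressors below are trivial threshold/constant stand-ins for the paper's classifiers and fusion regression models, and all device parameters are made up.

```python
# Two-stage NILM in miniature: stage 1 detects whether a device is on,
# stage 2 estimates its power only when the detector fires.

DEVICES = {
    # name: (detection threshold on aggregate power W, typical power W)
    "kettle": (1500.0, 2000.0),
    "fridge": (90.0, 100.0),
}

def detect(aggregate_w, threshold):
    """Stage 1: binary device-detection score."""
    return int(aggregate_w >= threshold)

def disaggregate(aggregate_w):
    """Stage 2: per-device power estimate, gated by the detector."""
    return {name: detect(aggregate_w, thr) * typical
            for name, (thr, typical) in DEVICES.items()}

print(disaggregate(2150.0))  # both detectors fire
print(disaggregate(110.0))   # only the fridge detector fires
```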


2013 ◽  
Vol 2013 ◽  
pp. 1-10 ◽  
Author(s):  
Jianwei Liu ◽  
Shuang Cheng Li ◽  
Xionglin Luo

Support vector machine (SVM) is an effective classification and regression method that uses machine learning theory to maximize predictive accuracy while avoiding overfitting. L2 regularization has been commonly used; if the training data set contains many noise variables, an L1-regularized SVM provides better performance. However, neither L1 nor L2 is the optimal regularization method when handling a large number of redundant variables with only a small number of data points useful for machine learning. We have therefore proposed an adaptive learning algorithm using an iteratively reweighted p-norm regularization support vector machine for 0 < p ≤ 2. A simulated data set was created to evaluate the algorithm, and it was shown that a p value of 0.8 produced a better feature selection rate with high accuracy. Four cancer data sets from public data banks were also used for the evaluation. All four evaluations show that the new adaptive algorithm achieved the optimal prediction error using a p value below that of the L1 norm. Moreover, we observe that the proposed Lp penalty is more robust to noise variables than the L1 and L2 penalties.
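The reweighting idea can be shown on a toy problem: an Lp penalty (here p = 0.8) is approximated at each iteration by a weighted ridge penalty whose per-coefficient weights come from the current estimate, so small coefficients are penalized ever more heavily and shrink toward zero. This is a least-squares stand-in on an orthogonal design, not the paper's SVM formulation, and all numbers are invented.

```python
# Iteratively reweighted Lp (p < 1) shrinkage on a toy least-squares fit:
# the noise coefficient is driven to ~0, the informative one survives.

p, lam, eps = 0.8, 1.0, 1e-12
x0 = [1, 1, 1, 1, -1, -1, -1, -1]      # informative feature
x1 = [1, -1, 1, -1, 1, -1, 1, -1]      # noise feature (orthogonal to x0)
y  = [2 * a + 0.1 * b for a, b in zip(x0, x1)]

dot = lambda u, v: sum(a * b for a, b in zip(u, v))
cols = [x0, x1]
w = [dot(x, y) / dot(x, x) for x in cols]           # ordinary LS start

for _ in range(25):
    # Reweighting: Lp penalty ~ sum_j r_j * w_j**2 with these weights.
    r = [(wj * wj + eps) ** ((p - 2) / 2) for wj in w]
    # Closed-form per-coordinate ridge update (valid: columns orthogonal).
    w = [dot(x, y) / (dot(x, x) + lam * rj) for x, rj in zip(cols, r)]

print([round(v, 3) for v in w])   # noise coefficient driven toward 0
```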

