A Pattern-Recognition-Based Ensemble Data Imputation Framework for Sensors from Building Energy Systems

Liang Zhang

doi:10.3390/s20205947

Sensors ◽

10.3390/s20205947 ◽

2020 ◽

Vol 20 (20) ◽

pp. 5947

Author(s):

Liang Zhang

Keyword(s):

Pattern Recognition ◽

Missing Data ◽

Energy Systems ◽

Building Energy ◽

Imputation Method ◽

Validation Dataset ◽

Data Imputation ◽

Validation Data ◽

Imputation Methods ◽

Building Energy Systems

Building operation data are important for monitoring, analysis, modeling, and control of building energy systems. However, missing data is one of the major data quality issues, which makes data imputation techniques increasingly important. There are two key research gaps in missing sensor data imputation for buildings: the lack of a customized and automated imputation methodology, and the difficulty of validating data imputation methods. In this paper, a framework is developed to address these two gaps. First, a validation data generation module based on pattern recognition creates a validation dataset that quantifies the performance of data imputation methods. Second, a pool of data imputation methods is tested on the validation dataset to find the optimal single imputation method for each sensor, an approach termed an ensemble method. The method can reflect the specific mechanism and randomness of missing data from each sensor. The effectiveness of the framework is demonstrated on 18 sensors from a real campus building. The overall accuracy of data imputation for those sensors improves by 18.2% on average compared with the best single data imputation method.
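The per-sensor selection step described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the pattern-recognition-based validation module is replaced here by randomly placed artificial gaps, and the candidate pool (`mean_fill`, `forward_fill`, `linear_interp`), the function `select_method`, and the two synthetic sensors are all hypothetical names chosen for illustration.

```python
import numpy as np

def mean_fill(x):
    out = x.copy()
    out[np.isnan(out)] = np.nanmean(x)
    return out

def forward_fill(x):
    out = x.copy()
    # Index of the most recent observed value at each position.
    idx = np.where(~np.isnan(out), np.arange(len(out)), 0)
    np.maximum.accumulate(idx, out=idx)
    out = out[idx]
    # A series that starts with a gap has no previous value; fall back to the mean.
    out[np.isnan(out)] = np.nanmean(x)
    return out

def linear_interp(x):
    out = x.copy()
    nan = np.isnan(out)
    t = np.arange(len(out))
    out[nan] = np.interp(t[nan], t[~nan], out[~nan])
    return out

# Hypothetical candidate pool; the paper's actual pool is not listed in the abstract.
METHODS = {"mean": mean_fill, "ffill": forward_fill, "linear": linear_interp}

def select_method(series, gap_len=5, n_gaps=10, seed=0):
    """Hide known values in artificial gaps, score each method by RMSE,
    and return the best method name together with all scores.
    Assumption: random gaps stand in for the pattern-based validation data."""
    rng = np.random.default_rng(seed)
    hidden = np.zeros(len(series), dtype=bool)
    for start in rng.integers(1, len(series) - gap_len - 1, size=n_gaps):
        hidden[start:start + gap_len] = True
    hidden &= ~np.isnan(series)  # score only positions whose true value is known
    masked = series.copy()
    masked[hidden] = np.nan
    scores = {}
    for name, fill in METHODS.items():
        filled = fill(masked)
        scores[name] = float(np.sqrt(np.mean((filled[hidden] - series[hidden]) ** 2)))
    return min(scores, key=scores.get), scores

# Two synthetic "sensors" with different behaviour (illustrative data only).
t = np.arange(500)
sensors = {
    "smooth_temperature": 20 + 5 * np.sin(2 * np.pi * t / 96),
    "noisy_flow": np.random.default_rng(1).normal(10.0, 1.0, size=500),
}
chosen = {name: select_method(values)[0] for name, values in sensors.items()}
print(chosen)
```

The point of the sketch is that the winning method is chosen separately for each sensor, so a smooth signal and a noisy one need not end up with the same imputation method.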

Fault detection based on Bayesian network and missing data imputation for building energy systems

Applied Thermal Engineering ◽

10.1016/j.applthermaleng.2020.116051 ◽

2021 ◽

Vol 182 ◽

pp. 116051

Author(s):

Zhanwei Wang ◽

Lin Wang ◽

Yingying Tan ◽

Junfei Yuan

Keyword(s):

Missing Data ◽

Fault Detection ◽

Bayesian Network ◽

Energy Systems ◽

Building Energy ◽

Data Imputation ◽

Missing Data Imputation ◽

Building Energy Systems

Advanced methods for missing values imputation based on similarity learning

PeerJ Computer Science ◽

10.7717/peerj-cs.619 ◽

2021 ◽

Vol 7 ◽

pp. e619

Author(s):

Khaled M. Fouad ◽

Mahmoud M. Ismail ◽

Ahmad Taher Azar ◽

Mona M. Arafa

Keyword(s):

Missing Data ◽

Missing Values ◽

Imputation Accuracy ◽

Nearest Neighbors ◽

Imputation Method ◽

Data Imputation ◽

K Nearest Neighbors ◽

Missing Data Imputation ◽

K Value ◽

Imputation Methods

Real-world data analysis and processing with data mining techniques often face observations that contain missing values, and the existence of missing values is the main challenge in mining such datasets. The missing values in a dataset should be imputed to improve the accuracy and performance of data mining methods. Existing techniques use the k-nearest neighbors algorithm to impute missing values, but determining an appropriate k value can be challenging. Other existing imputation techniques are based on hard clustering algorithms. When records are not well separated, as in the case of missing data, hard clustering often provides a poor description. In general, imputation based on similar records is more accurate than imputation based on all the records in a dataset, so improving the similarity among records can improve imputation performance. This paper proposes two numerical missing data imputation methods. First, a hybrid missing data imputation method called KI is proposed, which incorporates k-nearest neighbors and iterative imputation algorithms. The best set of nearest neighbors for each record with missing values is found through record similarity by using the k-nearest neighbors algorithm (kNN). To improve the similarity, a suitable k value is estimated automatically for the kNN. The iterative imputation method is then used to impute the missing values of the incomplete records by using the global correlation structure among the selected records. Second, an enhanced hybrid missing data imputation method called FCKI is proposed as an extension of KI. It integrates fuzzy c-means, k-nearest neighbors, and iterative imputation algorithms to impute the missing data in a dataset. The fuzzy c-means algorithm is selected because records can belong to multiple clusters at the same time, which can further improve similarity. FCKI searches a cluster, instead of the whole dataset, to find the best k-nearest neighbors. It applies two levels of similarity to achieve a higher imputation accuracy. The performance of the proposed imputation techniques is assessed on fifteen datasets of different sizes with varying missing ratios for three types of missing data: MCAR, MAR, and MNAR. These missing data types are generated in this work. The proposed imputation techniques are compared with other missing data imputation methods by means of three measures: the root mean square error (RMSE), the normalized root mean square error (NRMSE), and the mean absolute error (MAE). The results show that the proposed methods achieve better imputation accuracy and require significantly less time than other missing data imputation methods.
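The three measures named in this abstract can be computed as in the sketch below, evaluated only at the positions that were missing. The abstract does not say how NRMSE is normalized; dividing by the range of the true values is an assumption made here, and the function name `imputation_errors` and the sample arrays are illustrative.

```python
import numpy as np

def imputation_errors(true_values, imputed_values, missing_mask):
    """Return RMSE, NRMSE, and MAE over the positions flagged as missing."""
    mask = np.asarray(missing_mask, dtype=bool)
    t = np.asarray(true_values, dtype=float)[mask]
    p = np.asarray(imputed_values, dtype=float)[mask]
    err = p - t
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    # Assumption: normalize RMSE by the range of the true values at the
    # missing positions; other conventions divide by the mean or the
    # standard deviation instead.
    value_range = float(t.max() - t.min())
    nrmse = rmse / value_range if value_range > 0 else float("nan")
    return {"RMSE": rmse, "NRMSE": nrmse, "MAE": mae}

# Illustrative data: positions 1 and 3 were missing and have been imputed.
true_values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
imputed = np.array([1.0, 2.5, 3.0, 3.0, 5.0])
mask = np.array([False, True, False, True, False])
scores = imputation_errors(true_values, imputed, mask)
print(scores)
```

Restricting the computation to the masked positions keeps observed values, which are reproduced exactly, from diluting the error.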

Missing Data Imputation Method for Autism Prediction

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d4551.018520 ◽

2020 ◽

Vol 8 (5) ◽

pp. 940-944

Keyword(s):

Machine Learning ◽

Missing Data ◽

Missing Values ◽

Imputation Method ◽

Support Vector ◽

Data Imputation ◽

Missing Data Imputation ◽

Imputation Methods ◽

Significant Difference ◽

Friedman's Test

Missing data imputation is an essential task because removing all records with missing values discards useful information from other attributes. This paper estimates prediction performance on an autism dataset with imputed missing values. Statistical imputation methods, such as mean imputation and imputation with zero or a constant, and machine learning imputation methods, such as k-nearest neighbour and chained-equation methods, were compared with the proposed deep learning imputation method. Predictions of patients with autistic spectrum disorder were measured using a support vector machine on the imputed dataset. Among the imputation methods, the deep learning algorithm outperformed the statistical and machine learning imputation methods. This result is validated by the significant difference in p values revealed by Friedman's test.

Comparison of Single and MICE Imputation Methods for Missing Values: A Simulation Study

Pertanika Journal of Science and Technology ◽

10.47836/pjst.29.2.15 ◽

2021 ◽

Vol 29 (2) ◽

Author(s):

Nurul Azifah Mohd Pauzi ◽

Yap Bee Wah ◽

Sayang Mohd Deni ◽

Siti Khatijah Nor Abdul Rahim ◽

Suhartono

Keyword(s):

Missing Data ◽

Simulation Study ◽

Missing Values ◽

Imputation Method ◽

Quality Data ◽

Data Imputation ◽

Sample Sizes ◽

Imputation Methods ◽

Mean Imputation ◽

Simulation Results

High-quality data is essential in every field of research for valid research findings. Missing data in a dataset is common and occurs for a variety of reasons, such as incomplete responses, equipment malfunction, and data entry error. Single and multiple imputation methods have been developed to impute missing values. This study investigated, via a simulation study, the performance of single imputation using the mean and of multiple imputation using Multivariate Imputation by Chained Equations (MICE). Missing completely at random (MCAR) data were generated for ten missing rates (proportion of missing data), from 5% to 50%, and for different sample sizes. Mean Square Error (MSE) was used to evaluate the performance of the imputation methods. The choice of data imputation method depends on the data type. Mean imputation is commonly used to impute missing values for continuous variables, while the MICE method can handle both continuous and categorical variables. The simulation results indicate that group mean imputation (GMI) performed better than overall mean imputation (OMI) and MICE, with the lowest MSE for all sample sizes and missing rates. The MSE of OMI, GMI, and MICE increases as the missing rate increases. The MICE method has the lowest performance (i.e. highest MSE) when the missing rate is more than 15%. Overall, GMI is superior to OMI and MICE for all missing rates and sample sizes under the MCAR mechanism. An application to a real dataset confirmed the findings of the simulation. The findings of this study can inform researchers and practitioners about which imputation method is more suitable when a dataset contains missing values.
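A minimal sketch of such a simulation, comparing overall mean imputation with group mean imputation under MCAR, might look as follows. It is not the study's code: the two-group synthetic data, the group means of 0 and 10, the sample size, and the function name `simulate` are all assumptions made for illustration, and MICE is left out.

```python
import numpy as np

def simulate(missing_rate, n=1000, seed=0):
    """Return (MSE of overall mean imputation, MSE of group mean imputation)
    for one synthetic dataset with MCAR missingness."""
    rng = np.random.default_rng(seed)
    group = rng.integers(0, 2, size=n)  # known group label for every record
    # Assumption: the two groups have clearly different means (0 and 10).
    y = rng.normal(loc=np.where(group == 0, 0.0, 10.0), scale=1.0)
    # MCAR: each value goes missing independently of y and of the group.
    missing = rng.random(n) < missing_rate
    observed = ~missing
    omi = np.full(n, y[observed].mean())  # overall mean imputation
    gmi = np.empty(n)  # group mean imputation
    for g in (0, 1):
        gmi[group == g] = y[observed & (group == g)].mean()

    def mse(estimate):
        return float(np.mean((estimate[missing] - y[missing]) ** 2))

    return mse(omi), mse(gmi)

# Ten missing rates from 5% to 50%, as in the abstract.
rates = np.arange(1, 11) * 0.05
results = {round(float(r), 2): simulate(r) for r in rates}
for rate, (mse_omi, mse_gmi) in results.items():
    print(f"rate={rate:.2f}  OMI MSE={mse_omi:.3f}  GMI MSE={mse_gmi:.3f}")
```

Group mean imputation has the advantage in this setup because the group label explains most of the variance; if the groups had similar means, the two methods would behave almost identically.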

A Two-stage Deep Autoencoder-based Missing Data Imputation Method for Wind Farm SCADA Data

IEEE Sensors Journal ◽

10.1109/jsen.2021.3061109 ◽

2021 ◽

pp. 1-1

Author(s):

Xin Liu ◽

Zijun Zhang

Keyword(s):

Missing Data ◽

Wind Farm ◽

Imputation Method ◽

Data Imputation ◽

Two Stage ◽

Missing Data Imputation

A real-time abnormal operation pattern detection method for building energy systems based on association rule bases

Building Simulation ◽

10.1007/s12273-021-0791-x ◽

2021 ◽

Author(s):

Chaobo Zhang ◽

Yang Zhao ◽

Yangze Zhou ◽

Xuejun Zhang ◽

Tingting Li

Keyword(s):

Real Time ◽

Association Rule ◽

Detection Method ◽

Energy Systems ◽

Building Energy ◽

Pattern Detection ◽

Operation Pattern ◽

Building Energy Systems ◽

Rule Bases ◽

Abnormal Operation

Generic visual data mining-based framework for revealing abnormal operation patterns in building energy systems

Automation in Construction ◽

10.1016/j.autcon.2021.103624 ◽

2021 ◽

Vol 125 ◽

pp. 103624

Author(s):

Chaobo Zhang ◽

Yang Zhao ◽

Tingting Li ◽

Xuejun Zhang ◽

Meriem Adnouni

Keyword(s):

Data Mining ◽

Energy Systems ◽

Building Energy ◽

Visual Data ◽

Visual Data Mining ◽

Building Energy Systems ◽

Abnormal Operation

A Discrete Missing Data Imputation Method Based on Improved Multi-layer Perceptron

10.1109/idaacs53288.2021.9661028 ◽

2021 ◽

Author(s):

Chunyan Yan ◽

Jianyu Yuan ◽

Zhiwei Ye ◽

Zhiyong Yang

Keyword(s):

Missing Data ◽

Imputation Method ◽

Data Imputation ◽

Multi Layer Perceptron ◽

Missing Data Imputation

Delta-T-based operational signatures for operation pattern and fault diagnosis of building energy systems

Energy and Buildings ◽

10.1016/j.enbuild.2021.111769 ◽

2021 ◽

pp. 111769

Author(s):

Taesung Lee ◽

Sungmin Yoon ◽

Kwanghee Won

Keyword(s):

Fault Diagnosis ◽

Energy Systems ◽

Building Energy ◽

Operation Pattern ◽

Building Energy Systems

Random Forest Missing Data Imputation Methods: Implications for Predicting At-Risk Students

Advances in Intelligent Systems and Computing - Intelligent Systems Design and Applications ◽

10.1007/978-3-030-49342-4_29 ◽

2020 ◽

pp. 298-308

Author(s):

Bevan I. Smith ◽

Charles Chimedza ◽

Jacoba H. Bührmann

Keyword(s):

At Risk ◽

Missing Data ◽

Random Forest ◽

At Risk Students ◽

Data Imputation ◽

Missing Data Imputation ◽

Imputation Methods
