AUTHOR=Shen Long , He Xin , Liu Mingqun , Qin Risheng , Guo Cheng , Meng Xian , Duan Ruimin TITLE=A Flexible Ensemble Algorithm for Big Data Cleaning of PMUs JOURNAL=Frontiers in Energy Research VOLUME=9 YEAR=2021 URL=https://www.frontiersin.org/journals/energy-research/articles/10.3389/fenrg.2021.695057 DOI=10.3389/fenrg.2021.695057 ISSN=2296-598X ABSTRACT=
With the increasing deployment of phasor measurement units (PMUs) in the smart grid, PMUs inevitably operate in severe conditions, which results in outliers and missing data. Conventional techniques, however, take excessive time to clean outliers and fill missing data because they lack support from a big data platform. This paper proposes a flexible ensemble algorithm that performs precise and scalable data cleaning on the existing big data platform Apache Spark. In the proposed scheme, an ensemble model based on soft voting combines principal component analysis with K-means, a Gaussian mixture model, and the isolation forest technique to detect outliers. After outlier detection, the scheme fills the data using a gradient boosting decision tree for each extracted PMU feature. Test results on simulated and real-world PMU data show that the proposed model achieves high accuracy and recall compared with the local outlier factor algorithm and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The mean absolute error, root mean square error, and R² score are used to validate the data filling results of the proposed method against contemporary techniques such as decision tree and linear regression algorithms.
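
The soft voting step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the detector outputs are normalized, weighted, or thresholded, so everything here is an assumption made for illustration: the min-max rescaling, the equal weights, the 0.5 threshold, the function names `min_max` and `soft_vote`, and the five sample scores are all hypothetical. The sketch uses plain Python rather than Apache Spark and assumes each detector (K-means distance to the nearest centroid, Gaussian mixture negative log-likelihood, isolation forest anomaly score) has already produced one score per sample, with higher values meaning more anomalous.

```python
def min_max(scores):
    """Rescale raw anomaly scores to [0, 1]; higher means more anomalous."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All samples look identical to this detector: no evidence of outliers.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def soft_vote(score_sets, weights=None, threshold=0.5):
    """Average normalized scores from several detectors and flag outliers.

    score_sets: one list of per-sample scores per detector (equal lengths).
    weights:    optional per-detector weights (assumed equal if omitted).
    threshold:  assumed cut-off on the combined score.
    """
    if weights is None:
        weights = [1.0] * len(score_sets)
    normalized = [min_max(s) for s in score_sets]
    total = sum(weights)
    n_samples = len(normalized[0])
    combined = [
        sum(w * col[i] for w, col in zip(weights, normalized)) / total
        for i in range(n_samples)
    ]
    flags = [c > threshold for c in combined]
    return combined, flags


# Hypothetical per-sample scores from three detectors (not data from the paper).
kmeans_dist = [0.2, 0.3, 0.25, 4.0, 0.22]       # distance to nearest centroid
gmm_neg_loglik = [1.1, 1.0, 1.3, 9.0, 1.2]      # negative log-likelihood
iforest_score = [0.40, 0.42, 0.41, 0.80, 0.70]  # isolation forest anomaly score

combined, flags = soft_vote([kmeans_dist, gmm_neg_loglik, iforest_score])
print(flags)
```

In this toy input only the fourth sample is flagged: all three detectors rank it highest, whereas the fifth sample is scored as suspicious by one detector alone, so its averaged score stays below the assumed threshold. That is the practical difference between soft voting and relying on a single detector.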