Improving the Quality of PV Plant Performance Analysis by Increasing Data Integrity and Reliability: a Data-Driven Approach Using Machine Learning Techniques
G. Oviedo Hernández, E. Capra, S. Lindig, P.V. Chiantore, D. Moser
Topic: PV Systems and Storage – Modelling, Design, Operation and Performance
Subtopic: Operation, Performance and Maintenance of PV Systems
Event: 37th European Photovoltaic Solar Energy Conference and Exhibition
Session: 5DO.2.5
ISBN: 3-936338-73-6
Document(s): presentation


In today’s highly competitive PV O&M market, where deploying and maintaining highly accurate networks of on-site sensors proves challenging, data-driven solutions play a leading role in turning raw field data into reliable, actionable insights. PV plant data from SCADA and monitoring systems is constantly subject to quality issues, and the associated uncertainty is directly reflected in the quality and reliability of the performance metrics (KPIs) derived from it. In this study, the impact of the quality of the most relevant input parameters (i.e. output energy and irradiation) on the calculation of PV plant KPIs is evaluated, and different data cleaning and imputation techniques are benchmarked. The main objective of this work is to improve the quality of PV performance analysis by minimizing the negative effects of using incomplete and/or corrupted time series as input for the calculation of PV plant KPIs (such as Performance Ratio and Availability). This objective is pursued by assessing data sources of differing intrinsic quality. First, raw data from on-site sensors is compared with satellite-derived data (two different sources are benchmarked). Special emphasis is given to irradiance sensors (usually pyranometers), since plane-of-array (POA) irradiance is one of the variables with the greatest impact on performance evaluation. Next, a consistent data quality control procedure is proposed to assess the sensors’ health status before proceeding with the corresponding cleaning procedure. At this stage, the concept of a ‘virtual sensor’ is introduced, which solves the problem of incomplete raw data by generating gap-free time series that efficiently combine on-site measurements with satellite data.
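The abstract does not specify how the virtual sensor merges its two sources. As a minimal sketch of the idea only, the following assumes the simplest possible rule: keep the on-site POA reading when it is present and physically plausible, otherwise fall back to the satellite value for the same timestamp. The function name `build_virtual_sensor`, the plausibility bounds (0 to 1500 W/m²) and the sample values are all hypothetical and are not taken from the study.

```python
import math


def build_virtual_sensor(onsite, satellite, lower=0.0, upper=1500.0):
    """Combine two aligned irradiance series into one gap-free series.

    onsite    -- on-site POA readings in W/m2; None or NaN marks a gap
    satellite -- satellite-derived POA values for the same timestamps
    lower, upper -- hypothetical plausibility bounds for on-site readings

    Returns (values, sources), where sources records which input
    supplied each value ("onsite" or "satellite").
    """
    if len(onsite) != len(satellite):
        raise ValueError("both series must share the same time index")

    values, sources = [], []
    for g_site, g_sat in zip(onsite, satellite):
        site_ok = (
            g_site is not None
            and not math.isnan(g_site)
            and lower <= g_site <= upper
        )
        if site_ok:
            values.append(g_site)
            sources.append("onsite")
        else:
            # Assumed fallback rule: substitute the satellite estimate.
            values.append(g_sat)
            sources.append("satellite")
    return values, sources


# Illustrative (made-up) data: two gaps and one implausible spike.
onsite = [0.0, 210.5, None, float("nan"), 1820.0, 640.2]
satellite = [0.0, 198.0, 455.0, 702.0, 810.0, 655.0]

virtual, source = build_virtual_sensor(onsite, satellite)
for value, origin in zip(virtual, source):
    print(f"{value:8.1f}  {origin}")
```

Keeping the `sources` list alongside the values makes it possible to report how much of a KPI calculation rests on measured rather than substituted irradiance.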
Furthermore, the advantage of performing data imputation with Machine Learning (ML) techniques is demonstrated by applying three well-performing algorithms (Random Forest, Bagging and Gradient Boosting Regressor) to replace missing data with highly accurate predicted values.
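To illustrate what regression-based imputation with the three named algorithm families might look like, here is a minimal sketch using scikit-learn's `RandomForestRegressor`, `BaggingRegressor` and `GradientBoostingRegressor`. Everything else is an assumption made for illustration: the data is synthetic, the features (POA irradiance and ambient temperature), the target (output energy), the 20 % gap rate and the default hyperparameters are not taken from the study.

```python
import numpy as np
from sklearn.ensemble import (
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for plant data (hypothetical, not from the study).
poa = rng.uniform(0.0, 1000.0, n)    # POA irradiance, W/m2
temp = rng.uniform(5.0, 35.0, n)     # ambient temperature, deg C
energy = 0.8 * poa * (1 - 0.004 * (temp - 25.0)) + rng.normal(0.0, 10.0, n)

# Knock out roughly 20 % of the energy readings to simulate gaps.
missing = rng.random(n) < 0.2
energy_obs = np.where(missing, np.nan, energy)

X = np.column_stack([poa, temp])
known = ~np.isnan(energy_obs)

models = {
    "random_forest": RandomForestRegressor(random_state=0),
    "bagging": BaggingRegressor(random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

imputed = {}
for name, model in models.items():
    # Train on timestamps where the target was observed...
    model.fit(X[known], energy_obs[known])
    # ...and predict only where it is missing.
    filled = energy_obs.copy()
    filled[~known] = model.predict(X[~known])
    imputed[name] = filled

    # The true values are known here only because the data is synthetic.
    mae = np.mean(np.abs(filled[~known] - energy[~known]))
    print(f"{name}: mean absolute error on imputed gaps = {mae:.1f}")
```

Observed values are left untouched and only the gaps are filled, so any downstream KPI still rests on measured data wherever it exists.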