An increasing number of photovoltaic (PV) systems are being installed worldwide, and the residential sector is responsible for a large part of this growth. Small-scale PV systems do not have complex measuring devices, so their breakdowns are not spotted immediately by the system owners. This can lead to prolonged periods without power generation, causing both financial loss and environmental damage. This thesis presents a method of PV yield nowcasting that lays the foundations for remote monitoring. Early detection of faults is the first step towards eliminating the described issues. In this project, four machine learning models for predicting solar yield were developed: ElasticNet, Polynomial Regression, Random Forest, and Extreme Gradient Boosting (XGBoost). The models were created for both daily and hourly data sets, as some inverters can log daily yields only. In both cases, the data set covered the period between July 1st, 2018 and June 30th, 2019 and 1,102 PV systems, which is five times more than the largest data set found in the reviewed literature. The average PV system size in the data set is 4.44 kWp. In addition to weather data and previous yields, the inputs included a shading factor describing the fraction of direct light unable to reach the PV system due to surrounding obstacles. The shading factor was calculated from 360° pictures taken at the site. The XGBoost algorithm turned out to be the most suitable for PV yield nowcasting, obtaining an RMSE of 1.48 kWh and an MAE of 0.877 kWh for hourly data aggregated to daily values and evaluated on future time steps. The commercial software currently used by Solar Monkey has an RMSE of 2.237 kWh and an MAE of 1.5 kWh. The XGBoost model trained on daily data obtained an RMSE of 1.185 kWh and an MAE of 0.698 kWh, outperforming the hourly model, most likely because a Hidden Markov Model was used for data cleaning. In addition to overall performance, per-system metrics were calculated for the hourly XGBoost model.
The mean individual RMSE is 1.656 kWh for previously seen systems and 1.666 kWh for unseen systems. This means the model scales well to previously unseen systems and implies that a parallelized version of it is not necessary. In addition, the model's learning saturates once it has seen one year of data from 278 PV systems. Precalculating the global plane-of-array irradiance (GPOA) worsened performance compared with the model using global horizontal irradiance (GHI). The hourly XGBoost model has an hourly RMSE of 0.281 kWh under clear sky and 0.377 kWh under partly cloudy sky, which indicates that it is less accurate under cloudy conditions. This could be caused by the low quality of the cloud coverage data. The model also has large relative errors for small irradiance values, which occur mostly in January and December, as well as just after sunrise and just before sunset. This issue is caused by using squared error as the loss function during model training. Despite these shortcomings, the results support industrial implementation of the developed model.
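
The error metrics quoted in this abstract, RMSE and MAE for hourly yields aggregated to daily values, could be computed along the lines of the following minimal sketch. The function names and the toy numbers are illustrative assumptions and are not taken from the thesis or its data set.

```python
import math
from collections import defaultdict


def aggregate_to_daily(hourly):
    """Sum (day, kWh) hourly records into one total yield per day."""
    totals = defaultdict(float)
    for day, kwh in hourly:
        totals[day] += kwh
    return dict(totals)


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical hourly yields in kWh for a single PV system (toy values).
measured_hourly = [
    ("2019-06-01", 1.0), ("2019-06-01", 2.0),
    ("2019-06-02", 0.5), ("2019-06-02", 1.5),
]
predicted_hourly = [
    ("2019-06-01", 1.5), ("2019-06-01", 2.5),
    ("2019-06-02", 0.5), ("2019-06-02", 0.5),
]

# Aggregate both series to daily totals, then align them by day.
measured_daily = aggregate_to_daily(measured_hourly)
predicted_daily = aggregate_to_daily(predicted_hourly)
days = sorted(measured_daily)
y_true = [measured_daily[d] for d in days]
y_pred = [predicted_daily[d] for d in days]

daily_rmse = rmse(y_true, y_pred)
daily_mae = mae(y_true, y_pred)
print(f"daily RMSE = {daily_rmse:.3f} kWh, daily MAE = {daily_mae:.3f} kWh")
```

Both metrics are expressed in kWh and measure absolute deviations, so a low overall value can coexist with large relative errors in low-yield hours.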