2.1. Data
In this paper, Yangjiang City (21.5°–22.7°N, 111.3°–112.4°E), Guangdong Province, was selected as the research area. Located on the southwestern coast of Guangdong Province, Yangjiang experiences a typical subtropical monsoon climate, with long durations of sunshine and abundant heat. The terrain of Yangjiang City is dominated by mountainous and hilly areas, with mountains in its eastern, western, and northern parts, while its southern part faces the South China Sea. The four PV power stations in Yangjiang City, selected as targets in this study, are all situated in relatively flat terrain, with an altitude of less than 30 m. The distribution of the four PV power stations is shown in Fig. 1.
The observational data used were GHI data from the four meteorological stations at the center of each PV power station. These observational data were chosen because they best represent the GHI conditions at each PV power station. The observation sequence covered the entire year of 2022 at a frequency of 15 minutes. The GHI observations from each PV power station were subjected to quality control according to the Solar Energy Resource Assessment Method GB/T 37526–2019 [27]: any data exceeding 1400 W m–2, as well as data remaining unchanged for longer than five consecutive hours, were treated as missing values. After quality control, 53.24%, 96.29%, 96.28%, and 96.16% of the total observational data from the four PV power stations were retained. Notably, data from PV station 1 were unavailable in April, May, August, September, and December, but the quality of the data during the remainder of the year was good.
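A minimal sketch of the two quality-control rules described above (the 1400 W m–2 ceiling and the five-consecutive-hour constant-value check); the function name and the pandas-based handling are our own illustration, not the operational QC code of the stations.

```python
import numpy as np
import pandas as pd

def quality_control(ghi: pd.Series, max_ghi: float = 1400.0,
                    max_constant_hours: float = 5.0,
                    step_minutes: int = 15) -> pd.Series:
    """Flag implausible 15-min GHI samples as missing (NaN)."""
    ghi = ghi.astype(float).copy()
    # Rule 1: physically implausible values above 1400 W m^-2
    ghi[ghi > max_ghi] = np.nan
    # Rule 2: values unchanged for longer than five consecutive hours
    max_run = int(max_constant_hours * 60 / step_minutes)   # 20 samples
    run_id = (ghi != ghi.shift()).cumsum()                  # label runs of equal values
    run_len = ghi.groupby(run_id).transform("size")
    ghi[run_len > max_run] = np.nan
    return ghi
```

The retention percentages quoted above would then simply be the fraction of non-NaN samples left after this screening.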
The numerical models adopted in this paper were the China Meteorological Administration Wind Energy and Solar Energy Prediction System (CMA-WSP), the Mesoscale Weather Numerical Prediction System of the China Meteorological Administration (CMA-MESO), the China Meteorological Administration Regional Mesoscale Numerical Prediction System-Guangdong (CMA-GD), and the Weather Research and Forecasting Model-Solar (WRF-SOLAR). The CMA-WSP, CMA-MESO, and CMA-GD models are wind and solar numerical forecasting models independently developed by the CMA and operated in real time, while the WRF-SOLAR model is an important part of the National Center for Atmospheric Research (NCAR) solar power forecasting system; designed specifically to meet solar forecasting demands, it is also operated in real time by the CMA. The initial time of all four models is 12:00 UTC. Since a 24-h-ahead forecast is required for assessment at the PV power stations (i.e., the 24 hours starting at 00:00 Beijing time, BJT, on the forthcoming day), the model forecast period was chosen as 52 hours. The forecast sequence likewise covered the whole year of 2022. The forecast details of each model are provided in Table 1.
Model       Temporal resolution (min)   Spatial resolution (km)   Forecast element   Forecast period (h)
CMA-WSP     15                          9                         GHI                52
CMA-MESO    60                          3                         GHI                52
CMA-GD      15                          3                         GHI                52
WRF-SOLAR   15                          9                         GHI                52
Table 1. Details of the numerical prediction models.
In this research, the nearest-point method (selecting the model grid point closest to each target location) was applied to extract GHI forecast data at the target points (the locations of the four PV power stations) from the four numerical models listed in Table 1 for forecast dataset development.
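The extraction step can be sketched as follows, assuming a regular latitude-longitude model grid; the function name, grid layout, and coordinates are illustrative assumptions.

```python
import numpy as np

def nearest_point(field: np.ndarray, lats: np.ndarray, lons: np.ndarray,
                  target_lat: float, target_lon: float) -> float:
    """Return the forecast value at the grid point closest to the target.

    field : (nlat, nlon) gridded forecast (e.g., GHI at one lead time)
    lats, lons : 1-D coordinate vectors of the regular model grid
    """
    i = int(np.abs(lats - target_lat).argmin())   # nearest latitude index
    j = int(np.abs(lons - target_lon).argmin())   # nearest longitude index
    return float(field[i, j])
```

Repeating this over every lead time and model builds the per-station forecast dataset used in the ensemble below.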
2.2. Methods
To conduct the GHI multimodel ensemble experiments, a dynamic variable weight ensemble method was adopted to model forecasts station by station. The key technique is a weighted, bias-removed ensemble in which the weight of each numerical model is determined by the reciprocal of its prediction error over a training period that slides dynamically with the rolling forecast updates (Liu et al. [28]). First, using the GHI observations at each PV power station, the deviations of the different numerical model forecasts were calculated and dynamically corrected. Then, through statistical analysis of the deviations, the weight of each numerical model was determined, and a dynamic variable weight multimodel ensemble forecast model was established for each PV power station. The training period was set to 10 days; that is, the error statistics and weight coefficients of the previous ten days were used for the rolling correction and daily ensemble forecast of a given day. The specific steps are as follows:
First, the forecast errors of each numerical model were calculated for each PV power station over the 10-day training period:
$$ \mathrm{BE}_{m i}=F_{m i}-\mathrm{O}_i $$ (1) where Fmi is the GHI forecast value at the i-th forecast time of the m-th numerical model for a single power station, Oi denotes the corresponding GHI observation value, and BEmi is the error of the i-th forecast time of the m-th numerical model.
Next, the errors of the m-th numerical model at all forecast times during the training period were sorted in ascending order for a single power station, and the percentile method was applied to calculate the systematic deviation of each numerical model's GHI forecasts:
$$ \mathrm{BES}_m=\frac{\mathrm{BE}_{m,0.25}+2 \mathrm{BE}_{m,0.5}+\mathrm{BE}_{m,0.75}}{4} $$ (2) where BESm is the systematic forecast deviation of the m-th numerical model during the training period, and BEm,0.25, BEm,0.5, and BEm,0.75 are the 25th, 50th, and 75th percentiles of its sorted errors.
The forecasts of each numerical model could be corrected as follows:
$$ \mathrm{FF}_{m i}=F_{m i}-\mathrm{BES}_m $$ (3) where FFmi is the forecast result at the i-th forecast time of the m-th numerical model after deviation correction.
According to the statistical results of the forecast errors of each numerical model during the training period, the ensemble weight of each numerical model could be calculated as follows:
$$ W_m=\frac{\frac{1}{A_m}}{{\sum}_{m=1}^M \frac{1}{A_m}} $$ (4) where M is the number of numerical models involved in the ensemble, Wm is the weight of the m-th model, and Am is the sum of the absolute errors (AE; Eq. 6) of the m-th numerical model during the training period.
The dynamic variable weight multimodel ensemble model could be established as follows:
$$ Y_i=\sum\limits_{m=1}^M W_m \times \mathrm{FF}_{m i} $$ (5) where Yi is the multimodel ensemble GHI forecast result at the i-th forecast time.
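Steps (1)-(5) can be sketched as follows for a single station; the array shapes and function name are our own, and the absolute-error sum Am follows the AE definition given in Eq. (6) below.

```python
import numpy as np

def ensemble_forecast(train_fc: np.ndarray, train_obs: np.ndarray,
                      today_fc: np.ndarray) -> np.ndarray:
    """Dynamic variable-weight multimodel ensemble for one station.

    train_fc  : (M, T) model forecasts over the 10-day training period
    train_obs : (T,)   matching GHI observations
    today_fc  : (M, K) model forecasts for the day to be corrected
    """
    # Eq. (1): forecast errors over the training period
    be = train_fc - train_obs                        # (M, T)
    # Eq. (2): percentile-based systematic deviation per model
    q25, q50, q75 = np.percentile(be, [25, 50, 75], axis=1)
    bes = (q25 + 2.0 * q50 + q75) / 4.0              # (M,)
    # Eq. (3): deviation-corrected forecasts
    ff = today_fc - bes[:, None]                     # (M, K)
    # Eq. (4): weights from the reciprocal of summed absolute errors
    a = np.abs(be).sum(axis=1)                       # (M,)
    w = (1.0 / a) / (1.0 / a).sum()
    # Eq. (5): weighted ensemble forecast
    return w @ ff                                    # (K,)
```

In operation, the (M, T) training window simply slides forward by one day for each new forecast, so the deviations and weights are re-estimated daily.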
Given that the temporal resolution of the CMA-MESO model is 1 hour while that of the other three models is 15 minutes, all four models were combined at the on-the-hour forecast times, whereas only three models (CMA-WSP, CMA-GD, and WRF-SOLAR) were combined at the remaining 15-minute times.
The MAE, RMSE, R, and AEmax were used for evaluation. Because the temporal resolution is 15 minutes, there are 96 forecasts per day, corresponding to 96 AE values; the maximum of these is defined as AEmax. All evaluations were conducted independently. These metrics are obtained as follows:
$$ \mathrm{AE}=\left|F_i-O_i\right| $$ (6) $$ \mathrm{MAE}=\frac{1}{n} \sum\limits_{i=1}^n\left|F_i-O_i\right| $$ (7) $$ \mathrm{RMSE}=\sqrt{\frac{1}{n} \sum\limits_{i=1}^n\left(F_i-O_i\right)^2} $$ (8) $$ R=\frac{{\sum}_{i=1}^n\left(F_i-\overline{F}\right)\left(O_i-\overline{O}\right)}{\sqrt{{\sum}_{i=1}^n\left(F_i-\overline{F}\right)^2 {\sum}_{i=1}^n\left(O_i-\overline{O}\right)^2}} $$ (9) where Oi is the observation value, Fi is the forecast value, n is the total number of samples, $\overline{O}$ is the average of the observation samples, and $\overline{F}$ is the average of the forecast samples.
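The metrics in Eqs. (6)-(9), together with the daily AEmax described above, can be computed as in the following sketch; the function and key names are illustrative.

```python
import numpy as np

def metrics(f: np.ndarray, o: np.ndarray) -> dict:
    """Evaluation metrics for one day of paired forecasts f and observations o."""
    ae = np.abs(f - o)                               # Eq. (6), one AE per forecast time
    mae = ae.mean()                                  # Eq. (7)
    rmse = np.sqrt(((f - o) ** 2).mean())            # Eq. (8)
    r = np.corrcoef(f, o)[0, 1]                      # Eq. (9), Pearson correlation
    return {"MAE": mae, "RMSE": rmse, "R": r, "AEmax": ae.max()}
```

For the 15-minute data here, f and o would each hold the 96 daily samples, so `ae.max()` is exactly the AEmax defined in the text.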
In this study, the GHI forecast error was evaluated for the region (the four PV power stations as a whole) and for individual stations. The evaluation period ranged from 0 to 24 hours a day ahead (i.e., 29–52 hours of each numerical model forecast), and the evaluation duration was the entire year of 2022.
-
Table 2 provides the monthly performance evaluation of the GHI forecasts for the region. Among the four numerical models, the CMA-GD model exhibited the smallest overall error from January to December, with a 12-month average MAE of 134 W m–2 and RMSE of 204 W m–2, while the CMA-WSP model attained the highest R, with a 12-month average of 0.86. Performance differed by month: the CMA-WSP model performed best in January, February, March, and December; the CMA-GD model in April, May, June, July, August, and November; the CMA-MESO model in September; and the WRF-SOLAR model in October. The multimodel ensemble (referred to as ENSEMBLE in the figures and tables) significantly reduced the forecast errors: the 12-month average MAE reached 119 W m–2 and the RMSE 181 W m–2, which were 11.19% and 11.27% lower, respectively, than those of the optimal numerical model, while R increased slightly to 0.88. The ensemble's performance also clearly varied by month. It produced the smallest errors from January to September and in November, with MAE and RMSE reductions of 0.97%–15.96% and 3.31%–18.40%, respectively, relative to the monthly optimal numerical model forecasts. In October, however, the ensemble's error was slightly greater than that of the optimal numerical model, and in December the two attained similar performance levels. The improvement in R achieved by the ensemble over the optimal numerical model was modest, ranging from only 1.12% to 2.47% in 8 of the 12 months.
MAE (W m–2)
Month   ENSEMBLE   CMA-WSP   CMA-MESO   CMA-GD   WRF-SOLAR
1       105        122       131        135      134
2       81         91        106        101      113
3       114        130       132        142      165
4       109        150       121        110      147
5       119        156       151        137      172
6       127        156       158        152      180
7       141        150       142        141      151
8       140        166       161        166      171
9       136        160       131        147      159
10      141        122       130        132      103
11      103        129       126        118      140
12      107        109       113        129      117
Mean    119        137       133        134      146

RMSE (W m–2)
Month   ENSEMBLE   CMA-WSP   CMA-MESO   CMA-GD   WRF-SOLAR
1       180        207       211        216      225
2       139        154       177        163      193
3       176        209       208        217      262
4       164        237       192        170      233
5       184        244       240        217      269
6       193        235       246        237      270
7       202        232       217        209      236
8       206        249       251        249      260
9       199        247       203        226      243
10      191        195       185        180      163
11      161        200       186        174      211
12      172        171       168        191      181
Mean    181        215       207        204      229

R
Month   ENSEMBLE   CMA-WSP   CMA-MESO   CMA-GD   WRF-SOLAR
1       0.79       0.79      0.76       0.72     0.77
2       0.90       0.89      0.83       0.86     0.84
3       0.87       0.85      0.83       0.81     0.82
4       0.92       0.89      0.90       0.92     0.90
5       0.86       0.84      0.78       0.81     0.83
6       0.85       0.83      0.76       0.77     0.82
7       0.91       0.89      0.90       0.90     0.90
8       0.83       0.82      0.77       0.77     0.81
9       0.89       0.87      0.90       0.86     0.88
10      0.94       0.93      0.94       0.94     0.95
11      0.86       0.80      0.82       0.84     0.83
12      0.88       0.87      0.88       0.84     0.87
Mean    0.88       0.86      0.84       0.84     0.85
Table 2. Comparison of the performance of the monthly multimodel ensemble and numerical model forecasts.
The forecast errors were directly related to the observed GHI values, which exhibited obvious diurnal variations. To analyze the performance of the multimodel ensemble at different GHI intensities, three GHI intervals (in W m–2) were considered: (0, 400), [400, 700], and (700, 1500). Fig. 2 shows the results of the intensity-level evaluation of the regional GHI forecasts. The performance of each numerical model varied across intervals, and the multimodel ensemble mainly improved GHI forecasting below 700 W m–2. At GHI levels below 400 W m–2, the CMA-GD model had the smallest forecast error among the four numerical models, but the ensemble error was smaller still, with a 7.56%–28.28% reduction in RMSE. Within the range of 400–700 W m–2, the CMA-GD and CMA-WSP forecasts held the advantage, while the ensemble reduced the RMSE by 4.72%–26.10% in 9 of the 12 months. At GHI levels above 700 W m–2, the forecasts of every numerical model were significantly smaller than the observations; the WRF-SOLAR model had the lowest deviation and therefore the smallest forecast error, but the RMSE increased after the multimodel ensemble was applied. The insufficient samples above 700 W m–2 (Table 3), coupled with the larger forecast errors and fluctuation amplitudes of the numerical models, meant that the systematic deviation estimated from the rolling ten-day correction could be amplified or even reversed in sign, increasing the forecast error after the ensemble correction (Eq. 3) was applied.
Figure 2. Comparison of the monthly multimodel ensemble and numerical model forecasts at different intensities. RMSE was used for evaluation, and the GHI intensity intervals are (a) 0 < GHI < 400 W m–2, (b) 400 W m–2 ≤ GHI ≤ 700 W m–2, and (c) 700 W m–2 < GHI < 1500 W m–2.
GHI (W m–2)    Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
(0, 400)       9653   7311   9121   5104   6563   9244   7765   5950   4591   5868   8893   3293
[400, 700]     1739   950    1969   1389   1268   2279   2498   1253   1484   2809   1795   744
(700, 1500)    489    677    1636   1500   1111   1922   3166   1295   1622   3007   698    577
Table 3. Monthly sample size at different GHI intensities.
-
The GHI value at midday, the key period for PV power station output, is high, so the accuracy of GHI forecasting is particularly important during this period. To examine the performance of the multimodel ensemble at different times of day, the regional GHI forecasts were compared over time. Fig. 3 shows the monthly RMSE evaluation results. The improvement from the multimodel ensemble was greatest at midday, when the RMSE reached its highest value. The ensemble's performance also varied from month to month: in months with larger numerical model forecast errors, its improvement was greater, while in months with smaller errors, the improvement was limited. Specifically, compared with the optimal numerical model forecast, the ensemble provided the best GHI forecast improvement in March, May, June, July, and August, reducing the RMSE by 33–85 W m–2 from 02:00 UTC to 07:00 UTC. In January, February, April, September, and November, the ensemble RMSE was slightly smaller than that of the optimal numerical model, whereas in October and December, the ensemble performed worse. In general, for the key output period at midday, the multimodel ensemble forecast attained the lowest RMSE and the most stable effect, verifying its effectiveness in improving GHI forecast accuracy during the key period of PV output.
-
Figure 4 shows the forecast errors at each PV power station. Among the numerical models, at PV station 1 the CMA-WSP model had the smallest forecast error and the WRF-SOLAR model the highest R; at PV station 2, the CMA-MESO model had the smallest error and the CMA-WSP model the highest R; at PV station 3, the CMA-GD and CMA-WSP models had relatively small errors and the CMA-WSP model the highest R; and at PV station 4, the CMA-GD and CMA-MESO models had relatively small errors and the CMA-WSP model the highest R. Moreover, the numerical model forecast error at PV station 2 was the smallest overall, indicating that numerical model prediction capability differed significantly among the PV power stations.
Figure 4. Comparison of the multimodel ensemble and numerical model forecasts at each PV power station. (a) MAE, (b) RMSE, and (c) R were used for evaluation.
To analyze the applicability and replicability of the multimodel ensemble at the different PV power stations, its effect at each station was evaluated over the whole year. The results illustrate that the ensemble improved the forecasts at all four PV power stations to different extents, yielding the smallest forecast errors and the highest R values at each. Compared with the optimal numerical model forecasts, the MAE and RMSE of the ensemble at the four stations decreased by 13–19 W m–2 and 20–30 W m–2, respectively, and the R values increased by 0.01–0.02 at three of the stations, PV station 1 being the exception.
-
The fluctuation of PV power output is the greatest challenge for grid integration. In addition to overall monthly forecast accuracy, power stations and power grids pay particular attention to transitions in the PV power curve (Yu et al. [29]), which can be assessed through the monthly average AEmax of the GHI forecasts. This assessment reflects how precisely the numerical models capture the starting/ending time and magnitude of weather transitions, especially GHI fluctuations due to cloud cover variations, which remain a complex bottleneck in high-temporal-resolution numerical forecasting. Fig. 5 shows the evaluation results for PV station 4 as an example. The monthly mean AEmax of each numerical model forecast was generally very large: 293–621 W m–2 for the CMA-WSP model, 281–504 W m–2 for the CMA-MESO model, 327–607 W m–2 for the CMA-GD model, and 314–669 W m–2 for the WRF-SOLAR model. The AEmax exhibited obvious seasonal variation, with high values in summer and low values in winter. Although the AEmax of the CMA-MESO model was the smallest, its 1-hour temporal resolution cannot resolve GHI fluctuations within an hour, so its reference value was limited. The multimodel ensemble reduced the AEmax to 313–516 W m–2. Compared with the monthly optimal numerical model forecasts (excluding the CMA-MESO), the AEmax was reduced by 5.72%–15.90% from January to September but increased by 2.04%–7.46% from October to December. Overall, the multimodel ensemble generated a positive effect.
-
The accuracy of GHI forecasts is significantly affected by weather conditions. On sunny days, forecast accuracy is higher; on cloudy days, the uncertainty in cloud cover increases the forecasting difficulty, and accuracy decreases accordingly (Da et al. [30]). To analyze the performance of the multimodel ensemble under variable cloudy weather, the forecasts and observations at PV station 4 from May 8 to 12, from 21:00 UTC to 12:00 UTC the next day, were compared. Cloud cover observations were obtained from the Yangjiang National Meteorological Station. The comparison results are shown in Fig. 6. From May 8 to 12, the daytime was mainly cloudy, with cloud cover fluctuating between 80% and 100%, and the GHI fluctuated significantly, leading to notable differences in magnitude and fluctuation phase between the numerical model forecasts and the observations. The forecasts of the CMA-WSP and WRF-SOLAR models were significantly greater than the observations, with errors exceeding 500 W m–2 at midday. The forecasts of the CMA-GD and CMA-MESO models were lower than the observations, and their variation trends differed greatly from the real situation. Although the CMA-MESO forecasts were close to the observations, the lack of 15-minute output meant they could not reflect fine-scale GHI changes, so this model could only offer a basic reference. Compared with the individual numerical models, the multimodel ensemble forecasts were closer to the observations and better captured the GHI fluctuations. On May 9 and 10, all numerical model forecasts failed to accurately capture the GHI changes, and the multimodel ensemble achieved the best improvement. Although the ensemble was not optimal at every moment, it remained the most stable and reliable from a long-term perspective.