-
The precipitation observations used in this study were taken from CMPAS, a three-source fusion gridded real-time precipitation analysis from the National Meteorological Information Center of the China Meteorological Administration, which combines radar, satellite, and ground station observations. The topographic characteristics of the study area are shown in Fig. 1. To facilitate analysis, the first-level meteorological geographical divisions and specific regional geographical divisions of China were used to distinguish the divisions of the Qinghai-Tibet Plateau from those outside it. The CMPAS data covered the period from January 1, 2017 to May 12, 2023, with a time resolution of one hour and a spatial resolution of 0.05°×0.05°. The dataset is thus referred to as CMPAS05. The systematic bias of the radar and satellite precipitation estimates in CMPAS05 was corrected using the probability density function matching method. Subsequently, the Bayesian model averaging method was employed to integrate ground station measurements with radar and satellite-derived precipitation data (Pan et al.[31]; Li et al.[32]). Wang et al.[33] assessed the error of CMPAS05 against station-measured precipitation in China in 2020, and the results revealed that the average error between CMPAS05 and station observations was –0.003 mm h–1, with an average relative error of –5.41%. Additionally, the root mean square error (RMSE) and correlation coefficient were 0.730 mm h–1 and 0.786, respectively. CMPAS05 has been demonstrated to possess higher accuracy than similar products and any single-source precipitation data, which establishes its reliability as a reference for assessing the performance of precipitation forecast models. The model data used in this study were the ECMWF precipitation forecast product issued at 12:00 UTC each day during the same period.
It should be noted that the spatial resolution of ECMWF precipitation forecast was 0.125°×0.125°, with a time resolution of 3 hours. Given the availability of model data in practical forecast operations, the ECMWF precipitation forecast data were truncated to 12 hours and processed into 24-h accumulated precipitation corresponding to observational data. Moreover, to facilitate comparison, the ECMWF model data was interpolated to the same spatial resolution as CMPAS05 using bilinear interpolation. It is a consensus in precipitation forecast verification that model forecast scores may gradually decrease with the increase of forecast lead time and precipitation category. Therefore, this paper focused on analyzing the temporal and spatial forecast performance and scale change characteristics of 24-h precipitation forecast, while briefly discussing the precipitation forecast performance for the first five days (144 hours).
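The two preprocessing steps described above (24-h accumulation and bilinear interpolation onto the 0.05° grid) can be sketched as follows. This is a minimal illustration and not the authors' processing chain: it assumes the forecast is stored as 3-h amounts rather than totals accumulated since the start of the run, it reads "truncated to 12 hours" as discarding the first four 3-h steps, and the function names `accumulate_24h` and `bilinear_regrid` are hypothetical.

```python
import numpy as np

def accumulate_24h(steps_3h, skip_steps=4):
    """Sum 3-h precipitation amounts into 24-h totals.

    steps_3h: array of shape (n_steps, ny, nx) holding 3-h amounts.
    skip_steps: leading 3-h steps to discard (4 steps = 12 h, an assumption).
    """
    steps_3h = np.asarray(steps_3h, dtype=float)
    usable = steps_3h[skip_steps:]
    n_days = usable.shape[0] // 8          # eight 3-h steps per 24 h
    ny, nx = usable.shape[1:]
    return usable[: n_days * 8].reshape(n_days, 8, ny, nx).sum(axis=1)

def bilinear_regrid(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinearly interpolate a 2-D field from one regular grid to another.

    Source coordinates must be ascending, and the target points are
    assumed to lie inside the source grid.
    """
    field = np.asarray(field, dtype=float)
    src_lat, src_lon = np.asarray(src_lat, float), np.asarray(src_lon, float)
    dst_lat, dst_lon = np.asarray(dst_lat, float), np.asarray(dst_lon, float)
    # Index of the lower-left source node of the cell containing each target point.
    i = np.clip(np.searchsorted(src_lat, dst_lat) - 1, 0, len(src_lat) - 2)
    j = np.clip(np.searchsorted(src_lon, dst_lon) - 1, 0, len(src_lon) - 2)
    # Fractional position of the target point inside that cell.
    wy = ((dst_lat - src_lat[i]) / (src_lat[i + 1] - src_lat[i]))[:, None]
    wx = ((dst_lon - src_lon[j]) / (src_lon[j + 1] - src_lon[j]))[None, :]
    ii, jj = i[:, None], j[None, :]
    return ((1 - wy) * (1 - wx) * field[ii, jj]
            + (1 - wy) * wx * field[ii, jj + 1]
            + wy * (1 - wx) * field[ii + 1, jj]
            + wy * wx * field[ii + 1, jj + 1])
```

Because bilinear interpolation only averages neighboring values, it cannot add detail finer than the original 0.125° grid; the regridded forecast therefore remains smoother than CMPAS05.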
-
The characteristics of the forecast and observational data themselves, as well as the deviations between forecast and observed values, are crucial in assessing the forecast quality of numerical models. Therefore, the standard deviation (STD), mean error (ME), mean absolute error (MAE), and root mean square error (RMSE) of the precipitation forecast and observation were calculated as follows:
$$O_{\text{std}}=\sqrt{\frac{1}{n-1} \sum\limits_{i=1}^n\left(o_i-\bar{o}\right)^2}$$ (1)
$$F_{\text{std}}=\sqrt{\frac{1}{n-1} \sum\limits_{i=1}^n\left(f_i-\bar{f}\right)^2}$$ (2)
$$\mathrm{ME}=\frac{1}{n} \sum\limits_{i=1}^n\left(f_i-o_i\right)$$ (3)
$$\mathrm{MAE}=\frac{1}{n} \sum\limits_{i=1}^n\left|f_i-o_i\right|$$ (4)
$$\mathrm{RMSE}=\sqrt{\frac{1}{n} \sum\limits_{i=1}^n\left(f_i-o_i\right)^2}$$ (5)
In the equations above, $f_i$ and $\bar{f}$ represent the forecast and the forecast mean, while $o_i$ and $\bar{o}$ represent the observation and the observation mean. $n$ is the total number of matched grid points between the forecast and the observation within the region. $O_{\text{std}}$ and $F_{\text{std}}$ represent the STD of the observation and the forecast, respectively.
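As a minimal sketch, Eqs. (1)–(5) can be evaluated with NumPy on matched forecast and observation values. The function name `basic_stats` and the dictionary keys are illustrative choices, not part of the original study.

```python
import numpy as np

def basic_stats(forecast, observed):
    """Evaluate Eqs. (1)-(5) for matched forecast/observation grid points."""
    f = np.asarray(forecast, dtype=float).ravel()
    o = np.asarray(observed, dtype=float).ravel()
    err = f - o
    return {
        "O_std": o.std(ddof=1),            # Eq. (1), sample STD with n-1
        "F_std": f.std(ddof=1),            # Eq. (2)
        "ME": err.mean(),                  # Eq. (3)
        "MAE": np.abs(err).mean(),         # Eq. (4)
        "RMSE": np.sqrt((err ** 2).mean()),  # Eq. (5)
    }

stats = basic_stats([1.0, 2.0, 3.0, 4.0], [0.0, 2.0, 4.0, 4.0])
```

In this toy example the errors are +1, 0, –1, and 0, so the ME is 0 while the MAE is 0.5: errors of opposite sign cancel in the ME but not in the MAE or RMSE.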
The contingency table method was used to calculate classic forecast scores, including the probability of detection (POD), success ratio (SR), bias, threat score (TS), and equitable threat score (ETS). Na is the number of correctly forecasted precipitation events exceeding the threshold (hits), Nb is the number of false alarms, Nc is the number of missed forecasts, and Nd is the number of correct forecasts with precipitation not exceeding the threshold. C1 represents the number of correct forecasts expected by chance. The detailed calculation formulas for the scores are shown in Table 1. Roebber[34] devised a diagram that displays POD, SR, bias, and TS on one chart, providing a multidimensional and comprehensive characterization of model forecast performance. Therefore, the Roebber diagram was used to evaluate the overall performance of the model's precipitation forecast.
| Definition | Formula | Reference |
| --- | --- | --- |
| Bias score | $\text{Bias}=\frac{N_{\mathrm{a}}+N_{\mathrm{b}}}{N_{\mathrm{a}}+N_{\mathrm{c}}}$ | Haiden et al.[35] |
| Threat score | $\mathrm{TS}=\frac{N_{\mathrm{a}}}{N_{\mathrm{a}}+N_{\mathrm{b}}+N_{\mathrm{c}}}$ | Schaefer[19] |
| Probability of detection | $\mathrm{POD}=\frac{N_{\mathrm{a}}}{N_{\mathrm{a}}+N_{\mathrm{c}}}$ | Brooks and Doswell[36]; Simmons and Sutter[37] |
| Success ratio | $\mathrm{SR}=\frac{N_{\mathrm{a}}}{N_{\mathrm{a}}+N_{\mathrm{b}}}$ | Roebber[34] |
| Equitable threat score | $\mathrm{ETS}=\frac{N_{\mathrm{a}}-C_1}{N_{\mathrm{a}}+N_{\mathrm{b}}+N_{\mathrm{c}}-C_1}$ | Baldwin and Kai[20]; Roebber[34] |

Table 1. Verification scores and their calculation formulas.
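The scores in Table 1 can be computed from gridded fields roughly as in the sketch below. Following the formulas in Table 1, Nb counts false alarms (forecast above the threshold, observation below) and Nc counts misses. The expression used for C1, (Na+Nb)(Na+Nc)/(Na+Nb+Nc+Nd), is the usual random-hits definition and is an assumption here, since the text does not state it; the function name `contingency_scores` is also illustrative.

```python
import numpy as np

def _ratio(num, den):
    """Return num/den, or NaN when the denominator is zero."""
    return float(num) / float(den) if den != 0 else float("nan")

def contingency_scores(forecast, observed, threshold):
    """Bias, TS, POD, SR and ETS for events with precipitation >= threshold."""
    f_yes = np.asarray(forecast) >= threshold
    o_yes = np.asarray(observed) >= threshold
    na = int(np.sum(f_yes & o_yes))      # hits
    nb = int(np.sum(f_yes & ~o_yes))     # false alarms
    nc = int(np.sum(~f_yes & o_yes))     # misses
    nd = int(np.sum(~f_yes & ~o_yes))    # correct negatives
    # Hits expected by chance (assumed standard definition of C1).
    c1 = _ratio((na + nb) * (na + nc), na + nb + nc + nd)
    return {
        "bias": _ratio(na + nb, na + nc),
        "TS": _ratio(na, na + nb + nc),
        "POD": _ratio(na, na + nc),
        "SR": _ratio(na, na + nb),
        "ETS": _ratio(na - c1, na + nb + nc - c1),
    }

scores = contingency_scores([0.0, 1.0, 60.0, 0.2, 0.0],
                            [0.0, 0.5, 10.0, 0.0, 3.0], threshold=0.1)
```

In this five-point example there are two hits, one false alarm, one miss, and one correct negative, so the bias is 1 and the TS is 0.5, while the ETS is much lower because 1.8 of the hits would be expected by chance.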
-
The neighborhood method, also known as the fuzzy method or upscaling (UPS) test, has the significant advantages that it can capture the spatial scale of model prediction ability and can inherit some of the scoring indices of traditional verification methods (Weusthoff et al.[38]). This method compares the coverage of observed and predicted threshold events in a neighborhood window to assess the performance of precipitation forecasts at different scales. The main scoring index for forecast skill at different spatial scales is the Fractions Skill Score (FSS) (Roberts and Lean[39]; Pan et al.[40]). The FSS is calculated as follows:
$$\mathrm{FSS}=1-\frac{\frac{1}{N} \sum\limits_N\left(\left\langle P_f\right\rangle_s-\left\langle P_o\right\rangle_s\right)^2}{\frac{1}{N}\left[\sum\limits_N\left\langle P_f\right\rangle_s^2+\sum\limits_N\left\langle P_o\right\rangle_s^2\right]}$$ (6)
where $N$ is the number of neighborhood windows with scale $s$ in the entire region, and $\left\langle P_f\right\rangle_s$ and $\left\langle P_o\right\rangle_s$ represent the probabilities of exceeding a given threshold in the forecast and observation, respectively, within the neighborhood window of scale $s$.
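A minimal sketch of the FSS calculation is given below. It uses the Roberts and Lean form, in which the numerator is the mean squared difference between the forecast and observed neighborhood fractions. The square sliding windows that lie fully inside the domain, and the function names `neighborhood_fractions` and `fss`, are assumptions made for illustration; the paper does not specify how windows at the domain edge were handled.

```python
import numpy as np

def neighborhood_fractions(binary, size):
    """Fraction of event points in every size x size window inside the domain."""
    b = np.asarray(binary, dtype=float)
    # Summed-area table with a leading row and column of zeros.
    s = np.pad(b, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    window_sum = (s[size:, size:] - s[:-size, size:]
                  - s[size:, :-size] + s[:-size, :-size])
    return window_sum / (size * size)

def fss(forecast, observed, threshold, size):
    """Fractions Skill Score for events >= threshold at window width `size`."""
    pf = neighborhood_fractions(np.asarray(forecast) >= threshold, size)
    po = neighborhood_fractions(np.asarray(observed) >= threshold, size)
    mse = np.mean((pf - po) ** 2)
    ref = np.mean(pf ** 2) + np.mean(po ** 2)
    return float(1.0 - mse / ref) if ref > 0 else float("nan")

# One observed rain cell and a forecast displaced by one grid point.
obs = np.zeros((6, 6)); obs[2, 2] = 60.0
fcst = np.zeros((6, 6)); fcst[2, 3] = 60.0
scores_by_scale = {size: fss(fcst, obs, 50.0, size) for size in (1, 3, 6)}
```

With a one-point displacement the score is 0 at the grid scale and rises to 1 when the window covers the whole domain, which is the scale dependence that the neighborhood method is designed to expose.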
2.1. Data
2.2. Methods
2.2.1. Classic verification
2.2.2. Neighborhood method verification
-
Before presenting the verification results, a brief analysis of precipitation characteristics in China from 2017 to 2022 was conducted using CMPAS05. The annual average precipitation in China during this period was 664.8, 682.5, 651.3, 706.5, 691.6, and 631.5 mm, showing significant fluctuations. Previous research has found that the highest precipitation, in 2020, was related to the record-breaking Meiyu rainfall (Liu et al.[41]), and the lowest precipitation, in 2022, was related to the heatwaves and drought in East China (Wang et al.[42]). Compared with the interannual variation of precipitation, the daily average precipitation exhibited greater fluctuations (Fig. 2a). In the year with the highest precipitation, the daily maximum precipitation exceeded 7.0 mm d–1, while in other years it remained below 6.0 mm d–1. In 2021, the total rainfall ranked second, but the daily rainfall did not deviate significantly from that of other years. The STD of daily precipitation (Fig. 2b) exhibited a strong interannual variation. In 2017 and 2019, the daily maximum value of the precipitation STD exceeded 20.0 mm d–1, while after 2020, the overall precipitation STD remained below 16.0 mm d–1. Daily precipitation anomalies (Fig. 2c) were primarily concentrated within ±3.0 mm d–1. However, from 2019 to 2021, positive anomalies dominated, with a particularly remarkable positive anomaly in 2020. This might be closely related to the above-average annual precipitation. In contrast, negative anomalies in daily precipitation were relatively more pronounced in other years. Positive anomalies in the STD of daily precipitation (Fig. 2d) were larger than negative anomalies. Similarly, the daily variation followed the interannual variation trend of the STD itself. Noticeably higher anomalies in daily precipitation STD were observed before 2019.
Overall, the period from 2017 to 2019 exhibited relatively intense oscillations in daily precipitation with more localized variations, which may be related to the quality of radar and satellite products or product assimilation techniques.
Figure 2. Temporal evolution characteristics of CMPAS05 precipitation in China from 2017 to 2022. (a) Daily precipitation, (b) STD of daily precipitation, (c) anomalies of daily precipitation, and (d) anomalies of STD of daily precipitation. The baseline values for calculating anomalies in (c) and (d) are the average daily precipitation and the average STD of daily precipitation from 2017 to 2022.
The spatial distribution shows that the daily average precipitation had very clear geographical differences (Fig. 3a), decreasing from the southeast coast to the northwest inland. South China and East China were the high-value centers of daily average precipitation in the country, with a maximum exceeding 10.0 mm d–1. Moreover, the daily average precipitation in the southwest was also significantly higher, forming another high-value center, and the contribution of topography to the daily average precipitation there can be clearly seen. The spatial distribution of the CMPAS05 Ostd (Fig. 3b) was basically consistent with that of the daily average precipitation, with two main differences. One is that the values were significantly larger as a whole, exceeding 25.0 mm d–1 in places, indicating that the average daily precipitation varied little but daily precipitation oscillated significantly. The other is that in some parts of the northern area, although the daily average precipitation was not large, Ostd was exceptionally large. Due to the influence of special topography, the central part of the southwest region and the eastern part of the Tibetan Plateau were the high-value centers of precipitation frequency ≥0.1 mm in the country (Fig. 3c), with a maximum frequency exceeding 0.8. Although the precipitation in South China and on the southeast coast was relatively large, the frequency of precipitation ≥0.1 mm there was lower than in the southwest region. The high-value centers of precipitation frequency ≥50.0 mm (Fig. 3d) differed significantly from those of ≥0.1 mm and were mainly located in the southern part of the southwest region and East China.
-
POD, SR, bias, and TS are geometrically related, and Roebber[34] combined these scores in a single diagram. To identify the sources of forecast error in more depth, Roebber's diagram is used here to illustrate the overall performance of the model. It should be noted that the diagram was based on standard precipitation category tests, so it uses descriptions such as light rain, moderate rain, and torrential rain. However, when the spatial forecast performance of the model was analyzed in the present study, two one-way thresholds, precipitation ≥0.1 mm and ≥50.0 mm, were used for convenience of expression. The Roebber diagram for 24-h precipitation (Fig. 4a) clearly shows the year-by-year evolution of the precipitation forecast scores. The main features can be summarized as follows. First, the bias of every precipitation category gradually decreased; although there were fluctuations, the overall trend was very obvious. From 2017 to 2020, the bias of all precipitation categories was greater than 1. In 2021 and 2022, the bias for heavy rain and stronger categories was less than 1. Second, the forecasts of the different categories changed from a high hit rate with a low success rate before 2020 to basically consistent hit and success rates after 2021. This also indicates that the main shortcoming of the forecasts of the different categories before 2020 was false alarms. After 2021, the false alarm rate for heavy rainfall and stronger categories improved significantly, whereas the false alarm rate for light and moderate rainfall remained relatively high. In terms of specific values, the biases of light rain and heavy rain were 4.76 and 2.35, respectively, in 2017 and 1.96 and 0.81, respectively, in 2022. The forecast performance of all samples in the six years at different forecast lead times (Fig. 4b) is also worthy of attention. The precipitation forecast bias was higher than 1 for all categories, and the bias of the light rain and moderate rain categories was larger, consistent with the high bias found in the individual years. Furthermore, there was little difference in TS for light rain at different forecast lead times. In Fig. 4b, the circles for light rain TS at 24 h and 48 h basically overlap, and there is not much change in the range of 72–144 h. However, for moderate rain and stronger categories, TS decreased markedly as the forecast lead time increased. There are two main reasons. On the one hand, the triggering of moderate or heavy rainfall often involves the interaction of more meteorological elements, such as convective activity and vertical air motion, and these interactions carry higher uncertainties. On the other hand, the localized spatial distribution and short duration of heavy rainfall reduce the accuracy of forecasts of mesoscale convective systems as the forecast lead time is extended (Zhang et al.[43]).
Figure 4. Roebber comprehensive diagram of the precipitation forecast scores of the ECMWF model from 2017 to 2022. (a) 24-h forecast score of each year, and (b) precipitation score of 24-h to 120-h forecast time in all years. The different shapes of points in the figure represent different categories of precipitation. The yellow curves are lines of equal TS, and the black dotted lines are lines of equal bias.
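The geometric relation behind the diagram follows directly from the definitions in Table 1: bias = POD/SR, and 1/TS = 1/SR + 1/POD – 1. The short sketch below (the function name `roebber_coordinates` is illustrative) recovers the bias and TS of any point from its SR and POD coordinates, which is why lines of equal bias are straight lines through the origin and lines of equal TS are curves.

```python
def roebber_coordinates(pod, sr):
    """Return (bias, TS) implied by a point with coordinates (SR, POD)."""
    bias = pod / sr
    ts = 1.0 / (1.0 / sr + 1.0 / pod - 1.0)
    return bias, ts

# Example: 30 hits, 70 false alarms, 10 misses.
hits, false_alarms, misses = 30, 70, 10
pod = hits / (hits + misses)            # 0.75
sr = hits / (hits + false_alarms)       # 0.30
bias, ts = roebber_coordinates(pod, sr)
```

In this example the point lies well above the diagonal (POD much larger than SR), so the bias is 2.5 and the low TS is caused mainly by false alarms, the same pattern described for the years before 2020.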
Forecasts of light rain (≥0.1 mm) and torrential rain (≥50.0 mm) are an important basis for evaluating model precipitation forecasts, because both strongly affect production and daily life and torrential rain may lead to severe meteorological disasters. The ETS and bias scores for monthly precipitation forecasts of ≥0.1 mm and ≥50.0 mm over six years are shown in Fig. 5. For both light rain and torrential rain, the ETS was higher in summer and lower in winter, indicating a close correlation between ETS and the frequency of precipitation under different climatic backgrounds. In terms of forecast bias, the frequency of ≥0.1 mm precipitation forecasted by ECMWF was relatively consistent with CMPAS05 during summer, but it was significantly higher during winter. The distribution of bias scores for ≥50.0 mm precipitation was not clearly systematic, with occasional sharp increases in individual months. In terms of interannual variations, the bias for ≥0.1 mm precipitation decreased during the winters from 2017 to 2022, reducing false alarms and leading to an increase in ETS. In summer, by contrast, the bias changed little but ETS increased significantly. Regarding torrential rain, there was still no clear pattern of interannual variation in bias, but both TS and ETS showed an increasing trend year by year (as shown in Fig. 4 and Fig. 5). One possible reason for the increase in ETS despite the lack of significant changes in bias could be the updates and developments in the model, leading to an improvement in the spatial positioning of precipitation forecasts. Another potential reason could be related to changes in precipitation characteristics, such as a decrease in the STD of daily precipitation since 2020. However, these are only speculations and require further in-depth validation.
Figure 5. Monthly forecast score of annual ECMWF precipitation from 2017 to 2022. (a) ETS of daily precipitation ≥0.1 mm, (b) ETS of daily precipitation ≥50.0 mm, and (c) and (d) same as (a) and (b), but for bias.
Figure 6 shows the spatial distribution of precipitation forecast errors. In terms of spatial pattern, both the daily mean precipitation (Fig. 6a) and the STD (Fig. 6b) of ECMWF forecasts were smoother than those of CMPAS05, possibly because ECMWF has a coarser original resolution than CMPAS05 and thus cannot accurately depict the spatial distribution of sub-grid scale precipitation. In terms of numerical values, the daily mean precipitation forecasted by ECMWF across the country was 3.02 mm d–1, which was much higher than the 1.62 mm d–1 of CMPAS05. Especially in the western part of the Qinghai-Tibet Plateau and South China, the difference between the forecast and CMPAS05 was over 3.0 mm d–1. Moreover, the difference between the forecast and CMPAS05 was the largest in the southern part of the Qinghai-Tibet Plateau (28°N, 95°E), exceeding 20.0 mm d–1. This is possibly due to the model's inherent limitations and to the sparsity of observation stations and the lack of weather radar, which result in significant errors in CMPAS05 itself. However, previous studies did not provide an analysis of the results for this region (Liu et al.[41]), so further research is still needed to determine whether the discrepancy reflects errors in the model or in the verification data. Comparison of the STD of the forecast and CMPAS05 shows that although the daily mean precipitation in the eastern part of East China and South China was lower than that in the Southwest, the amplitude of precipitation was very large, and the model depicted the oscillation characteristics of precipitation in different regions well. Moreover, although the daily mean precipitation in the eastern part of Northwest China, North China, and Central China was not large, precipitation fluctuation was relatively intense, and the model also predicted this well.
However, except for the area where the forecast and CMPAS05 had a significant difference in the southern part of the Qinghai-Tibet Plateau, the precipitation STD of the model forecast was relatively small compared with that of CMPAS05, indicating that the model underestimated the amplitude of precipitation fluctuations. The high-value area of RMSE (Fig. 6d) and the large daily mean precipitation area had good consistency, indicating that the larger the precipitation in a certain area, the greater the deviation between the model forecast and CMPAS05 may be. However, there were exceptions. In Henan Province in Central China and the northern part of East China where the daily mean precipitation was not large, the ME was relatively small, but the RMSE was significantly larger. A small ME and a large RMSE indicate that the error pattern of the forecast relative to CMPAS05 was not obvious, and there were both positive and negative errors. Conversely, in Southwest China, where the ME was large and the RMSE was small, the error in the forecast relative to CMPAS05 may be relatively consistent in direction.
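The interpretation of a small ME together with a large RMSE rests on a standard identity, sketched here for clarity. Writing the error at each matched point as $e_i=f_i-o_i$, the mean squared error splits into a systematic part and a scatter part:

$$\mathrm{RMSE}^2=\frac{1}{n} \sum\limits_{i=1}^n e_i^2=\mathrm{ME}^2+\frac{1}{n} \sum\limits_{i=1}^n\left(e_i-\mathrm{ME}\right)^2$$

When the ME is close to zero, almost all of the RMSE comes from the second term, that is, from errors of both signs that cancel in the mean. When the RMSE is only slightly larger than the absolute value of the ME, the errors are dominated by a bias of consistent sign.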
Figure 6. Spatial distribution of errors of ECMWF 24-h precipitation forecast. (a) Daily average precipitation, (b) STD of daily precipitation, (c) ME of daily average precipitation, (d) RMSE of daily average precipitation, (e) bias of daily precipitation ≥0.1 mm, and (f) bias of daily precipitation ≥50.0 mm.
In terms of how closely the frequency bias of precipitation ≥0.1 mm approached 1 (Fig. 6e), eastern China performed better than western China. For most of the central and eastern regions, the forecast frequency of precipitation ≥0.1 mm was basically the same as or slightly less than that of CMPAS05, while the bias in the north, northeast, and south was slightly higher than 1. The forecast bias of precipitation in the Qinghai-Tibet Plateau and some other regions was significantly higher, with a maximum of more than 16 times the frequency of CMPAS05. Some studies have shown that, due to the lack of observational data and the insufficient description of multi-scale orographic impacts in climate/weather models on the Qinghai-Tibet Plateau and its surrounding areas, the representation of the physical effects of large terrain in models needs further improvement (Wu et al.[2]; Wang et al.[3]). In historical climate simulations, some models have overestimated precipitation by 30%–40% compared with the long-term average. Increasing observational data can significantly improve the forecasting ability of precipitation. Therefore, the frequent overestimation of precipitation forecasts in plateau regions may be an important aspect that current models need to improve. For precipitation ≥50.0 mm, the forecast frequency in the northern and western regions surpassed that of CMPAS05, potentially leading to more false alarms. Conversely, in the southern and eastern areas, the bias remained below 1, posing the risk of missed events. Furthermore, it is worth noting that in some areas, the frequency of precipitation above 50.0 mm was relatively low (Fig. 3d), and the frequency of the model forecast was even lower (Fig. 6f), with bias much less than 1, indicating missed events in these areas.
Figure 7 illustrates the spatial distribution of TS and ETS. The differences in scores among different regions can be clearly observed. The differences between TS and ETS have two sources. On the one hand, TS reflects the model's ability to forecast precipitation but is influenced by the climate state: regions with a high climatological probability of precipitation generally have higher TS. ETS, on the other hand, removes the influence of the climate background and measures the true forecast ability of the model by excluding the number of correct forecasts that occur by chance.
Figure 7. Spatial distribution of TS and ETS of ECMWF daily precipitation forecast. (a) and (b) are the TS of daily precipitation ≥0.1 mm and ≥50.0 mm, respectively; (c) and (d) are same as (a) and (b), but for ETS.
The TS for precipitation ≥0.1 mm in the eastern part of China was significantly better than that in the western part, and the southwest, central, and eastern regions had high TS, with the maximum value in the western part of the southwest region exceeding 0.8. Most areas of the central and eastern regions also had TS exceeding 0.6, and the model forecast frequency basically coincided with that of CMPAS05 in these areas. These results are consistent with the findings of Liu et al.[41]. Combined with the bias, it can be inferred that the model had relatively few false alarms and misses in these areas, and the forecast precipitation locations were relatively consistent with CMPAS05. In the southern and eastern regions, TS for precipitation ≥0.1 mm exceeded 0.4, indicating good prediction performance, but the bias in these areas was significantly greater than 1, with more false alarms. For the Qinghai-Tibet Plateau and southern Xinjiang, the bias of model forecasts was significantly high, and some areas in these regions therefore had TS lower than 0.1. Fig. 7b shows that the TS for precipitation ≥50.0 mm was relatively higher in the eastern part of the northwest and in the central, northern, and northeastern regions than in the southwest and southern regions.
Although the frequency of precipitation ≥50.0 mm was high in the southwest and southern regions (Fig. 3d), the forecast frequency was comparable to or slightly less than that of CMPAS05 (Fig. 6f). Given that both the forecast and observation resolutions were relatively high, slight phase errors may lead to "double penalties," resulting in a decrease in forecast scores (Zhang et al.[43]). The ETS, which removes the random hits associated with the climatic background, shows that the model's forecast ability for the two precipitation thresholds was generally better in the northeastern regions than in the southwest and southern regions. This indicates that, in terms of actual forecast ability, the ECMWF model forecasts precipitation better in North China than in the south, with particularly notable accuracy in Northeast China in contrast to Southwest China. Liu et al.[41] analyzed the ETS of 24-h precipitation forecasts from ECMWF, and their results were generally consistent with the findings of this study. However, owing to the use of CMPAS05 in this study, the regional distribution of ETS was more clearly defined, and the regional differences stood out more clearly.
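The "double penalty" can be illustrated with a toy example. In the sketch below, a forecast that reproduces a torrential rain band exactly but displaces it by two grid cells is counted as both missing the event and raising false alarms, so it accumulates twice as many error cells as a forecast with no rain at all. The grid, the values, and the function name `event_counts` are hypothetical.

```python
import numpy as np

def event_counts(forecast, observed, threshold):
    """Return (hits, false_alarms, misses) for events >= threshold."""
    f_yes = np.asarray(forecast) >= threshold
    o_yes = np.asarray(observed) >= threshold
    hits = int(np.sum(f_yes & o_yes))
    false_alarms = int(np.sum(f_yes & ~o_yes))
    misses = int(np.sum(~f_yes & o_yes))
    return hits, false_alarms, misses

observed = np.zeros((5, 5))
observed[2, 1:3] = 60.0                    # two cells of torrential rain
shifted = np.roll(observed, 2, axis=1)     # same band, two cells to the east
dry = np.zeros_like(observed)              # forecast with no rain at all

shifted_counts = event_counts(shifted, observed, 50.0)  # (0, 2, 2)
dry_counts = event_counts(dry, observed, 50.0)          # (0, 0, 2)
```

Point-to-point scores such as TS and ETS are zero for both forecasts, even though the displaced forecast carries useful information; this is one motivation for the neighborhood verification used in this study.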
-
On May 11, 2021, ECMWF made significant upgrades to its Integrated Forecasting System (IFS). The upgrade predominantly focused on improving the IFS cycling forecast model, version 47r2. In the previous versions of IFS, a double precision data storage method was utilized, whereas the upgraded model adopted a single precision data storage approach, resulting in reduced memory consumption and improved processing speed. ECMWF acknowledged that this upgrade did not yield significant improvements in terms of mid-range deterministic high-resolution forecasts, but it did offer benefits for mid-range and long-range ensemble forecasting. In order to analyze the impact of the model upgrade on precipitation forecasts in the Chinese region, the data were divided into two phases based on May 11, 2021, to compare forecast performance of precipitation before and after the upgrade. Due to the issue of date span, we did not use a natural annual time division for this purpose. Instead, we divided the data based on May 11 as the reference point. This approach excluded some data from 2017 and included additional data from 2023. The detailed period division is shown in Table 2.
| Period | Year | Start and end dates |
| --- | --- | --- |
| Period 1 | 2017 | May 12, 2017 to May 11, 2018 |
| | 2018 | May 12, 2018 to May 11, 2019 |
| | 2019 | May 12, 2019 to May 11, 2020 |
| | 2020 | May 12, 2020 to May 11, 2021 |
| Period 2 | 2021 | May 12, 2021 to May 11, 2022 |
| | 2022 | May 12, 2022 to May 11, 2023 |

Table 2. Definitions of the two periods before and after the ECMWF update.
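The division in Table 2 amounts to a simple date rule, sketched below with the standard library. The function names `period_of` and `verification_year` are illustrative.

```python
from datetime import date

STUDY_START = date(2017, 5, 12)
STUDY_END = date(2023, 5, 11)
LAST_DAY_PERIOD_1 = date(2021, 5, 11)   # date of the IFS upgrade

def period_of(day):
    """Return 1 or 2 following Table 2, or None outside May 12, 2017 to May 11, 2023."""
    if not (STUDY_START <= day <= STUDY_END):
        return None
    return 1 if day <= LAST_DAY_PERIOD_1 else 2

def verification_year(day):
    """Label of the May 12 to May 11 window containing `day` (the 'Year' column)."""
    return day.year if (day.month, day.day) >= (5, 12) else day.year - 1
```

For example, March 1, 2021 falls in Period 1 under the year label 2020, while dates before May 12, 2017 are excluded from the comparison.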
Although the sample sizes of the two periods were different, and the weather processes and rain locations may also differ between years, on the one hand the sample size was large, and on the other hand the evaluation scores were mainly based on forecasts matched in time with, and verified against, CMPAS05. Therefore, the model's adjustments and changes can still be analyzed by comparison.
The changes in ME and MAE between the two periods are shown in Fig. 8. Numerically, the ME decreased from an average of 0.75 mm in the first period to 0.68 mm in the second period. However, while the overall ME decreased, the single-point ME increased. The maximum absolute values of single-point ME in the first and second periods were 27.3 mm and 33.6 mm, respectively. Studies have shown that relatively smooth fields are beneficial for reducing ME (Mass et al.[44]). Forecasts containing more local information can provide more details, but if they are incorrect, they may introduce noise, which could be the reason for the increase in single-point ME. The change in ME between the two periods varied considerably in space (Fig. 8a), with positive differences mainly in North China, South China, Northeast China, and the western part of the Qinghai-Tibet Plateau, and negative differences in other areas, although the spatial pattern was not clear overall. The trend of MAE was generally consistent with that of ME, with the spatial average of MAE decreasing from 1.95 mm in the first period to 1.78 mm in the second period. However, there were distinct regional features in the spatial differences of MAE between the two periods (Fig. 8b), with positive differences mainly in North China and Northeast China, and negative differences in most other areas. Taking the changes in ME and MAE into account, it seems fair to conclude that the forecasting capability in the second period improved significantly relative to the first period.
Figure 8. Difference maps of errors between the second and first periods, and the difference in STD between the forecast and CMPAS05 for each period. (a) ME (units: mm), (b) MAE (units: mm), (c) the difference between Fstd and Ostd for the first period (units: mm), and (d) the same as (c), but for the second period.
By analyzing the difference between the forecast and CMPAS05's STD, it can be seen that in the first period (Fig. 8c), the STD of precipitation forecast was lower than CMPAS05 in most parts of the country, indicating that the model underestimated the amplitude or fluctuation of actual precipitation, or underestimated the extreme values of actual precipitation. In previous evaluations of global numerical model precipitation forecasts (Liu et al.[41]), this conclusion was relatively common, indicating that the model overestimated the forecast of small-scale precipitation and underestimated the forecast of extreme precipitation. In the second period, except for the Qinghai-Tibet Plateau, the STD of most parts of the country significantly increased, where the maximum difference of single-point STD was more than 10.0 mm higher than that in the first period. The national average changed from negative to positive (Table 3), indicating that the model may better predict precipitation extremes, but the amplitude is relatively larger than CMPAS05 overall.
| Analysis metric | Period 1 | Period 2 | Difference (Period 2–Period 1) |
| --- | --- | --- | --- |
| ME | 1.67 mm | 0.59 mm | –1.08 mm |
| STD | –1.29 mm | 1.03 mm | 2.32 mm |
| Bias (≥0.1 mm) | 4.61 | 1.91 | –2.70 |
| Bias (≥50.0 mm) | 1.56 | 0.93 | –0.63 |
| ETS (≥0.1 mm) | 0.19 | 0.25 | 0.06 |
| ETS (≥50.0 mm) | 0.19 | 0.27 | 0.08 |

Table 3. Summary of differences in forecast performance between two periods.
From the perspective of bias comparison, the second period bias (Fig. 9b) was closer to the ideal value compared with the first period (Fig. 9a). In the second period, the range of biases in this area reduced but still remained unusually large. One possible reason is the insufficient forecast ability of the model itself. On the other hand, the sparse meteorological stations in the western region and the uncertainty of CMPAS05 grid precipitation analysis data may also contribute to this. The biases in most parts adjusted downward in the second period (Fig. 9e).
Figure 9. Spatial distribution of the bias of daily precipitation forecasts for the two periods. Bias of daily precipitation ≥0.1 mm for (a) the first and (b) the second periods; (c) and (d) the same as (a) and (b), but for daily precipitation ≥50.0 mm; (e) difference in the bias of daily precipitation ≥0.1 mm between the two periods; (f) the same as (e), but for daily precipitation ≥50.0 mm.
Upon analysis of the biases associated with precipitation events of ≥50.0 mm, two distinct characteristics emerged. Firstly, the biases in the second period were numerically closer to the ideal value of 1 (Fig. 9f). However, there were regional differences in the performance, with biases in North China showing an increasing trend. Secondly, the high-value area of forecasted torrential rain frequency in the eastern part of Northwest China shifted noticeably southeastward (Figs. 9c and 9d). This may be related to the position of the rain belt in different years. Moreover, from the CMPA sample analysis throughout the study period (Fig. 2d), the frequency of precipitation no less than 50.0 mm moved southeastward. This not only improved the forecast score but also better matched the average state of CMPAS05, indicating that the model may be more accurate in predicting the position of weather systems. However, this conclusion still needs to be further analyzed and verified in combination with the forecast of the high-altitude situational field.
ETS is closely related to bias and POD. However, because of spatial positioning errors in precipitation forecasts, the best ETS cannot be achieved even when the bias reaches the ideal value and is consistent with observations (Pan et al.[26]). For precipitation, especially torrential rainfall above a certain category, a forecast frequency slightly higher than the observed frequency is advantageous for meteorological disaster defense and can also improve ETS. As shown in Fig. 10, the ETS for precipitation ≥0.1 mm and ≥50.0 mm in the second period was significantly higher than that in the first period (Figs. 10a and 10b), and the maximum ETS increase for precipitation ≥50.0 mm at some individual grid points exceeded 0.45. Comparison of the two periods shows that the improvement in TS in the second period was not only the result of adjusting the bias values. In fact, the bias values for precipitation ≥0.1 mm and ≥50.0 mm were both reduced in the second period relative to the first period (Fig. 8), but the POD for the two thresholds increased by 0.007 and 0.008, respectively (Figs. 10c and 10d), which means that the hit rate increased. At the same time, the false alarm rate (not displayed in the figure) changed by –0.057 and –0.001, respectively. This indicates that the improvement in the model forecast score in the second period was due not only to the reduction in bias, which lowered the false alarm rate, but also to the model describing the true state of the atmosphere more accurately and forecasting weather systems and heavy precipitation better, thereby improving the hit rate.
Figure 10. The difference between the second period forecast score and the first period forecast score. (a) and (b) the difference in the ETS of daily precipitation ≥0.1 mm and precipitation ≥50.0 mm respectively; (c) and (d) the difference in the POD of daily precipitation ≥0.1 mm and precipitation ≥50.0 mm respectively.
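The relationship among bias, POD, false alarms, and ETS discussed above can be made concrete with a 2×2 contingency table. The following is a minimal sketch and not the verification code used in this study: the function name, the use of the false alarm ratio FA/(hits + FA) as the false alarm measure, and all counts are illustrative assumptions.

```python
def contingency_scores(hits, misses, false_alarms, correct_negatives):
    """Return bias, POD, FAR and ETS from 2x2 contingency-table counts."""
    total = hits + misses + false_alarms + correct_negatives
    forecast_yes = hits + false_alarms
    observed_yes = hits + misses
    # Hits expected by chance for a random forecast with the same frequencies
    hits_random = forecast_yes * observed_yes / total
    return {
        "bias": forecast_yes / observed_yes,
        "pod": hits / observed_yes,
        # Assumption: false alarm ratio, since the text does not define the measure
        "far": false_alarms / forecast_yes,
        "ets": (hits - hits_random)
        / (hits + misses + false_alarms - hits_random),
    }


# Hypothetical counts: an unbiased forecast with position errors ...
unbiased = contingency_scores(hits=50, misses=50, false_alarms=50,
                              correct_negatives=850)
# ... and a slightly over-forecast one that captures more observed events.
over_forecast = contingency_scores(hits=60, misses=40, false_alarms=60,
                                   correct_negatives=840)

for name, scores in (("unbiased", unbiased), ("over-forecast", over_forecast)):
    print(name, {key: round(value, 3) for key, value in scores.items()})
```

In this made-up case the unbiased forecast (bias = 1) still has an ETS well below 1 because half of its events are misplaced, while the forecast with bias = 1.2 scores a higher ETS because its POD rises and its false alarm ratio stays the same.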
-
In high-resolution models, small phase errors can lead to both false alarms and missed events, resulting in a "double penalty." The neighborhood method uses upscaling to find the scale at which the model obtains the best forecast score, which reflects to some extent the deviation between the spatial position of the model precipitation forecast and that of the observation. Either a square or a circular neighborhood window can be used in practice, and studies have shown that the results are insensitive to the choice of window shape (Nachamkin and Schmidt[45]). In this study, a square neighborhood window was used, and the neighborhood radius was expressed in grid points, with a radius of 1 grid point representing the resolution of the original grid.
It was found that as the scale increases, the FSS approaches 2b/(b² + 1), where b is the ratio of the forecast frequency to the observed frequency of events exceeding a given threshold over the entire study area (Roberts and Lean[33]). As the neighborhood radius increases, the traditional skill score reaches a maximum value (Skok and Hladnik[46]), indicating an optimal neighborhood radius. Beyond this optimal radius, an excessively large neighborhood may bring in too much unrelated information. However, this behavior can also depend on specific circumstances and context, so it may vary from case to case.
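The FSS and its large-scale limit described above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the code used in this study: the window is a square of odd width n grid points (n = 1 is the original grid), points outside the domain are treated as non-events, and the function names and the toy 9×9 fields are hypothetical. It uses only the standard library for portability; an array library would be the usual choice for real grids.

```python
def neighborhood_fractions(binary, n):
    """Fraction of event points in the n x n window centred on each point.

    Assumption: points outside the domain count as non-events (zero padding).
    """
    rows, cols = len(binary), len(binary[0])
    half = n // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0
            for ii in range(max(0, i - half), min(rows, i + half + 1)):
                for jj in range(max(0, j - half), min(cols, j + half + 1)):
                    total += binary[ii][jj]
            out[i][j] = total / (n * n)
    return out


def fss(forecast, observed, threshold, n):
    """Fractions skill score for events >= threshold, window width n (odd)."""
    f_bin = [[1 if v >= threshold else 0 for v in row] for row in forecast]
    o_bin = [[1 if v >= threshold else 0 for v in row] for row in observed]
    pf = neighborhood_fractions(f_bin, n)
    po = neighborhood_fractions(o_bin, n)
    mse = sum((a - b) ** 2 for rf, ro in zip(pf, po) for a, b in zip(rf, ro))
    ref = (sum(a ** 2 for row in pf for a in row)
           + sum(b ** 2 for row in po for b in row))
    return 1.0 - mse / ref if ref > 0 else float("nan")


def fss_asymptote(forecast, observed, threshold):
    """Large-scale limit 2b/(b^2 + 1); b = forecast/observed event frequency."""
    f_count = sum(v >= threshold for row in forecast for v in row)
    o_count = sum(v >= threshold for row in observed for v in row)
    b = f_count / o_count
    return 2 * b / (b ** 2 + 1)


# Toy case: one 60 mm event, forecast displaced by one grid point to the east.
size = 9
observed = [[0.0] * size for _ in range(size)]
forecast = [[0.0] * size for _ in range(size)]
observed[4][4] = 60.0
forecast[4][5] = 60.0

fss_grid = fss(forecast, observed, threshold=50.0, n=1)
fss_3x3 = fss(forecast, observed, threshold=50.0, n=3)
limit = fss_asymptote(forecast, observed, threshold=50.0)
print(fss_grid, round(fss_3x3, 3), limit)
```

At the original grid the displaced forecast is penalized twice (one miss and one false alarm), so the FSS is 0, whereas a 3×3 window already recovers part of the skill. Because the forecast and observed event frequencies are equal here (b = 1), the large-scale limit is 1.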
As with the classical scores, the scale characteristics of the ECMWF precipitation forecast were analyzed. We did not compare the score differences caused by scale changes at different forecast lead times; instead, we analyzed the interannual variations of FSS for the 24-h precipitation and the performance of TS at different spatial scales. When the neighborhood radius was 1, the forecast was verified at the resolution of the original grid. Fig. 11 shows the FSS for the 24-h precipitation forecast at different grid scales from 2017 to 2022. Two significant characteristics can be observed in the FSS for different precipitation categories. Firstly, the FSS for all precipitation categories generally increased from 2017 to 2022, although the FSS for 2022 was lower than that for 2021, which may be related to the lower precipitation in China in 2022. The FSS for light rain (Fig. 11a) and torrential rain (Fig. 11d) increased from 0.486 and 0.06 in 2017 to 0.578 and 0.119 in 2022, respectively, representing increases of 18.9% and 98.3%. The same trend held for moderate rain (Fig. 11b) and heavy rain (Fig. 11c). Secondly, the FSS was closely related to the neighborhood area. For example, in 2021, when the neighborhood radius increased from 1 to 3, the neighborhood window area increased by a factor of 9, and the increases in FSS for light rain, moderate rain, heavy rain, and torrential rain were 7.6%, 20.9%, 28.2%, and 35.8%, respectively. When the neighborhood radius increased from 17 to 19, a 25% increase in the neighborhood window area, the FSS increased by 0.05%, 1.5%, 2.6%, and 4.1%, respectively.
Figure 11. FSS of ECMWF daily precipitation forecasting. (a) Light rain, (b) moderate rain, (c) heavy rain, and (d) torrential rain.
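The percentage changes and window-area ratios quoted for Fig. 11 can be checked with simple arithmetic. The sketch below assumes that a neighborhood radius of n grid points corresponds to an n × n square window, which is consistent with the factor of 9 quoted for a radius change from 1 to 3; the helper names are hypothetical.

```python
def relative_increase(old, new):
    """Percentage increase from old to new."""
    return (new - old) / old * 100.0


def window_area(n):
    """Number of grid points in an n x n square window (assumed geometry)."""
    return n * n


# FSS change from 2017 to 2022, using the values quoted in the text
light_rain_gain = relative_increase(0.486, 0.578)       # about 18.9 %
torrential_rain_gain = relative_increase(0.06, 0.119)   # about 98.3 %

# Growth of the neighborhood window area
area_ratio_1_to_3 = window_area(3) / window_area(1)     # factor of 9
area_growth_17_to_19 = relative_increase(window_area(17), window_area(19))

print(round(light_rain_gain, 1), round(torrential_rain_gain, 1))
print(area_ratio_1_to_3, round(area_growth_17_to_19, 1))
```

Under this assumption, the step from 17 to 19 enlarges the window by roughly a quarter (361 versus 289 grid points), which matches the approximately 25% quoted in the text.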
As the FSS represents the area coverage error of forecasted precipitation relative to observations at different scales, a higher FSS indicates a greater degree of agreement between the forecast and observation. Based on the FSS and ETS, the probability of events occurring can be calculated at different neighborhood radii, which can further improve the precipitation forecast score (Schwartz and Sobash[47]). Therefore, this characteristic of the ECMWF precipitation forecast has important practical significance. Actual forecast operations can obtain greater score gains through smaller upscaling transformations on the basis of the original resolution of the model. This is particularly beneficial for heavy rainfall, including torrential rain, as it can effectively reduce missed events and improve the ability to issue warnings of disastrous weather. However, it should be noted that although the FSS or other scores of an upscaled forecast may be higher, there is a loss of spatial information that may be relevant for some users. Which balance between skill and spatial resolution is better depends on the specific application.
The annual variations of ETS for precipitation forecasts exhibited characteristics similar to those of FSS (Fig. 12). From 2017 to 2022, the ETS rose with fluctuations, but there were significant differences in ETS among precipitation categories at different neighborhood spatial scales. For light rain, the ETS remained relatively stable from 2017 to 2020, ranging from 0.17 to 0.22. However, in 2021 and 2022, the ETS increased noticeably and exceeded 0.25. In contrast to light rain, the ETS for moderate rain and heavy rain increased year by year when the neighborhood radius was 1. When the neighborhood radius increased, the ETS for moderate rain fluctuated from 2020 to 2022 (Fig. 12b). However, for heavy rain, the ETS increased annually for all neighborhood radii (Fig. 12c). The ETS for torrential rain also showed significant fluctuations (Fig. 12d), reaching a peak in 2021.
Figure 12. ETS of ECMWF daily precipitation forecast on different scales from 2017 to 2022. (a) Light rain, (b) moderate rain, (c) heavy rain, and (d) torrential rain.
There was no consistent optimal scale for ETS across precipitation categories. For light rain from 2017 to 2020, the optimal neighborhood scale ranged from 17 to 19 grid points. However, in 2021 and 2022, it shifted to a range of 7 to 11 grid points. The optimal neighborhood scale for moderate rain varied more widely across years, ranging from 11 to 19 grid points. For heavy rain, the optimal neighborhood scale was mainly concentrated within 11 to 17 grid points. The interannual variations of ETS for heavy rain were similar to those of light rain, showing significant annual fluctuations. In 2021, which had the highest precipitation among the six years, both light rain and torrential rain exhibited the best ETS. ETS and some other classical dichotomous forecast scores each had an optimal neighborhood radius, which differed among spatial scales. This can provide a reference for selecting the optimal neighborhood window size when calculating neighborhood probabilities using a single model or ensemble probabilities using multiple models, thus enhancing the interpretive application capability of numerical models.