-
Daily mean temperature observations from 14 target stations from 2005 to 2014 are selected, and the numbers of their neighboring stations are shown in Fig. 1. The 14 target stations are abbreviated as follows: Beihai (Bh), Chengdu (Cd), Guangzhou (Gz), Haikou (Hk), Hohhot (Hh), Jinghong (Jh), Lhasa (Ls), Lanzhou (Lz), Miyun (My), Mohe (Mh), Nanjing (Nj), Taiyuan (Ty), Urumqi (Um) and Changchun (Cc). The neighboring stations selected are within 100 km or 200 km of the target station. Fig. 2 shows that there is a significant difference in altitude among Ls, Lz and Um; Bh, Gz, Hk and Jh are in coastal regions, while Hh, Jh, Ls and Lz are in high-altitude regions. The distribution of surface automatic weather stations is determined by the environment and economy of China. The 14 target stations selected in this article are mostly located in provincial capitals, which are economic and political centers. They are distributed across different provinces, covering the different climates and geographical conditions of China.
The dataset used in this article is compiled by the Chinese National Meteorological Center and has undergone basic quality control. In order to test the performance of the SRF method, artificial errors are randomly inserted into the observations from the target station (Hubbard[7]). Approximately 3% of the observations are selected for the insertion of random errors, and the formula is shown in
$$ k_{\lambda}=s_{\lambda} \cdot p_{\lambda} $$ (1) where k_λ is the value of the inserted error, s_λ is the standard deviation of the observations from the target station, λ is the position for error insertion, and p_λ is a random number drawn from a uniform distribution on [-3.5, 3.5].
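As a minimal sketch of this error-seeding step (a hypothetical NumPy helper, not the paper's code; it assumes the error k_λ is added to the original observation, which formula (1) does not state explicitly):

```python
import numpy as np

def insert_errors(obs, frac=0.03, p_range=3.5, seed=0):
    """Seed artificial errors into ~frac of the observations (formula (1)).

    k = s * p, where s is the standard deviation of the target-station
    series and p is uniform on [-p_range, p_range]. Returns the corrupted
    series and the seeded positions (the lambda indices).
    """
    rng = np.random.default_rng(seed)
    obs = np.asarray(obs, dtype=float)
    s = obs.std()                                          # std. dev. of the series
    n_err = max(1, int(round(frac * obs.size)))            # ~3% of observations
    idx = rng.choice(obs.size, size=n_err, replace=False)  # positions lambda
    p = rng.uniform(-p_range, p_range, size=n_err)         # random factor p
    corrupted = obs.copy()
    corrupted[idx] += s * p                                # add the error k = s * p
    return corrupted, idx
```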
3.1. The SRT method
The spatial regression test (SRT) assigns a weight according to the root-mean-square error between the target station and each of the neighboring stations. For each neighboring station, a linear regression is used to form an estimate:
$$ x_{i}=a_{i}+b_{i} \cdot y_{i} $$ (2) where x_i is the estimate of the target-station value from the ith neighboring station, y_i is the observation of the ith neighboring station (i = 1, 2, …, N), and a_i and b_i are the regression coefficients. The weighted estimate x' is obtained by using the standard error of estimate s:
$$ x^{\prime}=\sqrt{\frac{\sum_{i=1}^{N} x_{i}^{2} \cdot s_{i}^{-2}}{\sum_{i=1}^{N} s_{i}^{-2}}} $$ (3) where N is the number of neighboring stations used. Then, the weighted standard error of estimate s' is calculated as follows:
$$s^{\prime-2}=N^{-1} \sum_{i=1}^{N} s_{i}^{-2} $$ (4) The confidence intervals are formed as follows:
$$ x^{\prime}-f s^{\prime} \leqslant x \leqslant x^{\prime}+f s^{\prime} $$ (5) where f is the quality control parameter. If the relation in (5) holds, the observations pass the test.
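The SRT steps in formulas (2)-(5) can be sketched as follows (a hypothetical NumPy implementation, not the paper's code; note that the quadratic form of formula (3) presumes non-negative series, e.g. temperatures in kelvin):

```python
import numpy as np

def srt_test(y_neighbors, x_target, f=3.0):
    """Spatial regression test sketch (formulas (2)-(5)).

    y_neighbors: (n_obs, N) array, one column per neighboring station.
    x_target:    (n_obs,) target-station series.
    Returns the weighted estimate x', the weighted standard error s',
    and a boolean mask of observations that pass the test.
    """
    n_obs, N = y_neighbors.shape
    est = np.empty((n_obs, N))
    s = np.empty(N)
    for i in range(N):
        b, a = np.polyfit(y_neighbors[:, i], x_target, 1)   # regression (2)
        est[:, i] = a + b * y_neighbors[:, i]
        resid = x_target - est[:, i]
        s[i] = np.sqrt(np.mean(resid ** 2))                 # std. error of estimate s_i
    w = s ** -2.0
    x_prime = np.sqrt((est ** 2 * w).sum(axis=1) / w.sum())  # weighted estimate (3)
    s_prime = np.sqrt(N / w.sum())                           # from (4): s'^{-2} = N^{-1} sum s_i^{-2}
    passed = np.abs(x_target - x_prime) <= f * s_prime       # confidence interval (5)
    return x_prime, s_prime, passed
```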
3.2. The RF method
The random forest (RF) method belongs to the category of ensemble learning. Schapire developed the probably approximately correct (PAC) learning model, which evaluates strong and weak learning concepts (Borchmann et al.[33]). Random forests combine multiple weak decision-tree classifiers into a strong classifier, which is much easier than searching for a strong classifier directly. A random forest is a combination of tree predictors, such that each tree depends on the values of a random vector sampled independently with the same distribution for all trees in the forest (Gomes et al.[34]). For each tree, the random forest selects the training set by bootstrap sampling, the test set consists of the samples that are not drawn, and the error estimation is based on the out-of-bag (OOB) estimate. The random forest method can be used for classification and regression: when the dependent variable Y is a categorical variable, the model performs classification; when the dependent variable Y is a continuous variable, the model performs regression. The independent variable X can be a mixture of multiple continuous variables and multiple categorical variables.
Given an ensemble of classifiers $h_{1}(x), h_{2}(x), \cdots, h_{k}(x)$, with the training set drawn at random from the distribution of the random vector Y, X, define the margin function as in formula (6):
$$ mg(X, Y)=\mathrm{av}_{k} I\left(h_{k}(X)=Y\right)-\max\limits_{j \neq Y} \mathrm{av}_{k} I\left(h_{k}(X)=j\right) $$ (6) where I(∙) is the indicator function and av_k denotes the average over the classifiers. The margin measures the extent to which the average number of votes at X, Y for the right class exceeds the average vote for any other class. The larger the margin, the greater the confidence in the classification. The generalization error is given by:
$$ P E^{*}=P_{X, Y}(m g(X, Y)<0) $$ (7) where the subscripts X, Y indicate that the probability is over the X, Y space. As the number of trees in the RF model increases, the generalization error converges almost surely to a limit, given by:
$$ P_{X, Y}\left(P\left(h_{k}(X)=Y\right)-\max\limits_{j \neq Y} P\left(h_{k}(X)=j\right)<0\right) $$ (8) For a random forest, this limit can be bounded in terms of the accuracy (strength) of the individual classifiers and the dependence between them. An upper bound for the generalization error is given by:
$$ P E^{*} \leqslant \bar{\rho}\left(1-s^{2}\right) / s^{2} $$ (9) where $\bar{\rho}$ is the mean value of the correlation between the classifiers and s is the strength of the set of classifiers. Although the bound is likely to be loose, it fulfills the same suggestive function for random forests as VC-type bounds do for other types of classifiers. It shows that the two ingredients involved in the generalization error for random forests are the strength of the individual classifiers in the forest and the correlation between them in terms of the raw margin functions.
Generally, the advantages of RF are its resistance to overfitting, its strong ability to resist noise and its capacity to estimate the importance of features. These advantages are mainly due to the randomness in the selection of the samples and the features. However, the RF method still needs improvement; for example, there is no generally accepted rule for the choice of mtry (the number of features tried at each split) in random forests.
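As an illustration of the RF ingredients described above (bootstrap sampling, OOB error estimation, the mtry parameter, feature importance), here is a minimal sketch on synthetic data using scikit-learn's RandomForestRegressor; the data and parameter choices are illustrative assumptions, not the paper's setup:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the meteorological inputs: 8 predictor columns,
# only the first two carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = X[:, 0] * 2.0 + np.sin(X[:, 1]) + rng.normal(0, 0.1, 500)

rf = RandomForestRegressor(
    n_estimators=200,      # number of trees grown on bootstrap samples
    max_features="sqrt",   # mtry: features tried at each split (no general rule, see text)
    oob_score=True,        # out-of-bag generalization estimate
    random_state=0,
)
rf.fit(X, y)
print(rf.oob_score_)                # OOB R^2: generalization estimate without a test set
print(rf.feature_importances_[:2])  # importance estimates for the two signal features
```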
3.3. The SRF method
The RF method grows an ensemble of trees, and each node split is selected randomly from among the best splits; hence, it has strong generalization ability and avoids over-fitting. In surface meteorological observations, not all neighboring stations have a strong correlation with the target station, and the neighboring stations with weak correlation are equivalent to weak inputs to the dataset. Data with many weak inputs are difficult for typical classifiers, such as neural nets and trees. Thus, the RF method is well suited to surface meteorological observations. By using random feature selection in addition to bagging, the generalization error is estimated by out-of-bag (OOB) estimation, obtaining concrete results from otherwise theoretical values of strength and correlation. The quality control method was constructed from the SRT and RF methods, and the SRF method can be divided into the following steps. First, dataset L is divided into the training sample Ltrain and the testing sample Ltest. The RMSE s_i of the ith neighboring station is calculated by using formula (2) and formula (12), and the weighting coefficients are used to calculate the new dataset L':
$$ L_{i}^{\prime}=\frac{s_{i}}{\sum_{i=1}^{N} s_{i}} \cdot L_{i} $$ (10) where N is the number of neighboring stations used in the new dataset. Then, the RF method is trained on the new dataset and used to produce regression estimates. Ultimately, the values predicted by the SRF method (y_est) are compared with the observations of the target station (y_obs) into which artificial errors have been inserted. Coefficient f is used to test whether the observed values fall within the confidence intervals:
$$ \left|y_{\text{est}}-y_{\text{obs}}\right| \leqslant f \cdot \sigma $$ (11) If the observations of the target station fall within the confidence intervals, the observations pass the SRF test. Fig. 3 illustrates the specific flow of the SRF algorithm.
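The SRF steps above can be sketched end to end as follows (a hypothetical helper on synthetic inputs, not the authors' code; in particular, taking σ in formula (11) as the standard deviation of the training series is an assumption):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def srf_qc(train_neigh, train_target, test_neigh, test_target, f=4.0):
    """SRF sketch: (1) RMSE s_i per neighbor via regression (2) and (12);
    (2) weight each neighbor's series by s_i / sum(s_i) to form L' (10);
    (3) train an RF regression on L' and predict y_est;
    (4) pass observations with |y_est - y_obs| <= f * sigma (11)."""
    N = train_neigh.shape[1]
    s = np.empty(N)
    for i in range(N):
        b, a = np.polyfit(train_neigh[:, i], train_target, 1)       # regression (2)
        s[i] = np.sqrt(np.mean((train_target - (a + b * train_neigh[:, i])) ** 2))
    w = s / s.sum()                                                 # weights, formula (10)
    rf = RandomForestRegressor(n_estimators=100, random_state=0)
    rf.fit(train_neigh * w, train_target)                           # train on weighted L'
    y_est = rf.predict(test_neigh * w)
    sigma = np.std(train_target)          # assumption: sigma from the training series
    passed = np.abs(y_est - test_target) <= f * sigma               # test (11)
    return y_est, passed
```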
3.4. Performance measures
The root mean square error (RMSE), mean absolute error (MAE) and Nash-Sutcliffe model efficiency coefficient (NSC) are used to evaluate the performance of the different methods in this article. Average differences can be described by RMSE or MAE, which are among the best overall measures of model performance. MAE and RMSE take the following forms:
$$ \operatorname{RMSE}=\sqrt{\frac{\sum_{i=1}^{n}\left(y_{\text{obs}, i}-y_{\text{est}, i}\right)^{2}}{n}} $$ (12) $$ \operatorname{MAE}=\frac{1}{n} \sum_{i=1}^{n}\left|y_{\text{obs}, i}-y_{\text{est}, i}\right| $$ (13) $$ \operatorname{NSC}=1-\frac{\sum_{i=1}^{n}\left(y_{\text{obs}, i}-y_{\text{est}, i}\right)^{2}}{\sum_{i=1}^{n}\left(y_{\text{obs}, i}-\bar{y}\right)^{2}} $$ (14) where y_obs is the observation of the target station, y_est is the estimated value of the target station, and $\bar{y}$ is the arithmetic mean of y_obs over the test sample i = 1, 2, …, n.
In meteorological data quality control research, a type I error is the incorrect rejection of a true null hypothesis, while a type II error is the failure to reject a false null hypothesis. More simply stated, a type I error is detecting an effect that is not present, while a type II error is failing to detect an effect that is present. In order to balance the two types of errors, Xiong[12] utilized a mean-square ratio of detected errors to the total number of seeds (MSR) to evaluate the performance of the method. Also, MSR is employed to evaluate different quality control methods in this article, which is defined as follows:
$$ \mathrm{MSR}=1-\sqrt{\alpha \cdot r_{1}^{2}+r_{2}^{2}} $$ (15) where r1 is the probability of the type I error, r2 is the probability of the type II error, and α is the weight of r1.
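The four performance measures in formulas (12)-(15) translate directly into code; a minimal NumPy sketch:

```python
import numpy as np

def rmse(y_obs, y_est):
    """Root mean square error, formula (12)."""
    return np.sqrt(np.mean((np.asarray(y_obs) - np.asarray(y_est)) ** 2))

def mae(y_obs, y_est):
    """Mean absolute error, formula (13)."""
    return np.mean(np.abs(np.asarray(y_obs) - np.asarray(y_est)))

def nsc(y_obs, y_est):
    """Nash-Sutcliffe model efficiency coefficient, formula (14)."""
    y_obs, y_est = np.asarray(y_obs, float), np.asarray(y_est, float)
    return 1.0 - np.sum((y_obs - y_est) ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)

def msr(r1, r2, alpha=1.0):
    """MSR, formula (15): balances type I (r1) and type II (r2) error rates."""
    return 1.0 - np.sqrt(alpha * r1 ** 2 + r2 ** 2)
```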
-
In this article, daily mean temperature observations from 2005 to 2013 from the 14 target stations and their neighboring stations are selected as the training sample, while the 2014 observations are selected as the testing sample. It is necessary to analyze the spatial correlation of the 14 target stations and their neighboring stations because the spatial correlation of all stations within 200 km of a region may impact the performance of the quality control model. As shown in Table 1, the spatial correlations are calculated with a semi-variogram (Hanke et al.[35]; Gunst[36]; Maddox and Robert[37]; Deng et al.[38]) and Moran's I (Yuan et al.[39]). When R2 and I are close to 1, a smaller RSS and a larger z-value are associated with higher spatial correlation between stations. It is clear that the spatial correlations of Bh, Cd, Gz, Hh, My, Ty and Cc are high, while the spatial correlations of Jh, Ls, Lz, Mh and Um are very low. The information for the 14 target stations is shown in Fig. 4, where the stations are represented by five-pointed stars and the number next to each station's name indicates the number of neighboring stations. The spatial correlation between the target station and its neighboring stations is indicated as "high", "low" or "unknown". As the number of neighboring stations for Mh is too small, analyzing the semi-variogram in Mh is not possible. An assessment of the different methods for different target stations shows that spatial correlation does impact the quality control of temperature observations (Chen et al.[40]).
Table 1. The spatial correlation indexes for the regions of the 14 target stations.

Station  Co     Co+C    Ao    R2     RSS    I       E(I)    mean    sd     z-value
Bh       0.101  0.715   1.17  0.848  0.095  0.649   -0.031  -0.032  0.097   7.031
Cd       0.153  2.398   1.22  0.849  1.2    0.721   -0.015  -0.012  0.071  10.39
Gz       0.188  1.163   2.07  0.922  0.076  0.581   -0.023  -0.023  0.093   6.47
Hk       0.157  0.781   1.53  0.804  0.085  0.478   -0.037  -0.037  0.118   4.348
Hh       0.004  2.85    0.61  0.843  2.72   0.545   -0.029  -0.026  0.112   5.096
Jh       0.001  2.385   0.3   0.326  13.3   0.162   -0.091  -0.083  0.158   1.556
Ls       0.01   6.4     0.55  0.385  79.5   -0.08   -0.1    -0.103  0.179   0.125
Lz       0.83   4.403   0.55  0.489  4.4    0.143   -0.24   -0.023  0.094   1.178
My       0.51   9.029   1.48  0.876  6.55   0.729   -0.013  -0.013  0.065  11.43
Mh       /      /       /     /      /      -0.411  -0.333  -0.349  0.29   -0.211
Nj       0.145  0.606   4.01  0.355  0.071  0.552   -0.12   -0.013  0.066   8.602
Ty       0.001  4.516   1.8   0.929  2.25   0.648   -0.012  -0.013  0.067   9.809
Um       0.01   11.58   1.54  0.371  375    0.446   -0.046  -0.046  0.123   4.014
Cc       0.269  1.069   1.38  0.736  0.287  0.615   -0.027  -0.027  0.104   6.174
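For illustration, Moran's I (with its expected value E(I) = -1/(n-1) under the null hypothesis of no spatial correlation) can be computed with a simple binary distance-based weight matrix; the cutoff and the planar-coordinate handling here are simplifying assumptions, not the paper's exact procedure:

```python
import numpy as np

def morans_i(values, coords, cutoff_km=200.0):
    """Global Moran's I, a minimal sketch.

    Binary spatial weights: w_ij = 1 if stations i and j are within
    cutoff_km of each other (coords given in km on a plane for
    simplicity), else 0. Returns the observed I and its expectation
    E(I) = -1/(n-1) under the null hypothesis.
    """
    x = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = x.size
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = ((d > 0) & (d <= cutoff_km)).astype(float)   # binary weight matrix, no self-links
    z = x - x.mean()                                 # deviations from the mean
    I = (n / w.sum()) * (z @ w @ z) / (z @ z)
    return I, -1.0 / (n - 1)
```

An I well above E(I) (large positive z-value) indicates spatial clustering of similar values, as for Bh or My in Table 1; an I near or below E(I) indicates no clustering, as for Ls or Mh.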
-
Ten neighboring stations were selected as the reference stations for prediction by the SRT method. It was unknown whether prediction would be improved with these 10 reference stations when the SRT method was combined with the RF method. Therefore, it was necessary to identify the appropriate number of selected neighboring stations and to determine whether the observations of the reference stations should be weighted. Fig. 5(a-c) shows the performance of the RF, SRT and SRF methods when the 5, 10, 15 and 20 neighboring stations with the lowest standard errors were selected as reference stations, denoted SRF5, SRF10, SRF15 and SRF20. The performances of the SRF and RF methods were found to be superior to that of the SRT method, and the SRF method required less time to run than the RF method did. To achieve improved quality control, 15 reference stations were selected and weighted according to performance and runtime. Since the number of neighboring stations in Jh, Ls and Mh was less than 15, these three target stations were tested separately, and the results are shown in Fig. 5(d-f), where the values of MAE in Jh, Ls and Mh are 3.448, 2.747 and 1.276 and the values of RMSE are 3.555, 2.927 and 1.736. The results show that the SRF and RF methods perform better than the SRT method in regions with a low density of neighboring stations. Moreover, the MAE and RMSE obtained by the SRF method were much lower than those obtained by the SRT method; this is consistent with Hubbard's observation that the SRT method does not apply to stations with few neighboring stations.
Figure 5. The performance of the RF, SRT and SRF methods for different cases: (a) MAE for different neighboring stations, (b) RMSE for different neighboring stations, (c) Time for different neighboring stations, (d) MAE and RMSE for Jh, (e) MAE and RMSE for Ls, and (f) MAE and RMSE for Mh.
In the process of spatial consistency quality control, the radius within which neighboring stations are selected also affects quality control. Thus, different radii were tested. Fig. 6 depicts the performance differences among the RF, SRF and SRT methods: the MAE and RMSE obtained by the RF and SRF methods were lower than those of the SRT method, and the runtime of the SRF method was less than that of the RF method as the radius increased. The performance of the SRT method fluctuates greatly when the radius is less than 80 km, indicating that the SRT method is sensitive to the radius of neighboring stations, while the SRF method is not. In addition, the performance and runtime of the SRF method are relatively stable regardless of the change in radius.
Figure 6. The mean performance of the RF, SRF and SRT methods for different radii (20 km-200 km).
Compared with the RF method, the SRF method can exploit the advantages of the SRT method to extract the most important information to construct a dataset with higher correlation, reducing the runtime of the quality control method while maintaining accuracy. The comprehensive comparison of the performance of the SRT, RF and SRF methods in Fig. 6 demonstrates that the SRF method is superior to the SRT and RF methods with the same number of selected neighboring stations and selected radius.
-
The performances of the RF, SRF and SRT methods at the different target stations are shown in Fig. 7, which illustrates that the SRF method is superior to the RF and SRT methods. On one hand, the runtime of the SRF method is less than that of the RF method, particularly for regions with a large number of neighboring stations. On the other hand, the MAE and RMSE obtained by the SRF method are smaller than those of the SRT method, especially for regions with few neighboring stations, such as Jh, Ls and Lz, which have low spatial correlation. By comparing the performance of the three methods, it is found that the SRF method has an improved runtime over the RF method and improved accuracy in comparison to the SRT method.
Figure 7. The performance of the RF, SRT, SRF methods for the 14 target stations: (a) Time, (b) MAE, and (c) RMSE.
It is important to note that the SRF method performs much better than the SRT method, indicating that the density of neighboring stations has a considerable impact on the performance of the SRT method; however, the density of neighboring stations has little effect on the performance of the SRF method. In addition, the SRF method has a lower MAE and RMSE than the SRT method in the regions with a large number of neighboring stations. In general, the SRF method is more stable and accurate than the SRT method as the number of neighboring stations changes, and the SRF method is more time efficient than the RF method in regions with a large number of neighboring stations.
The MAE and RMSE obtained by the SRT method were lower than those of the SRF method for Hh and Um. To confirm whether this is a particular case, it was necessary to analyze the performance of the SRT and SRF methods for Hh and Um. The performances of the SRT and SRF methods are shown in Fig. 8, where a diamond indicates a station for which the SRT method performs better than the SRF method, a dot indicates the opposite situation, and the colors indicate altitude. It is clear that the performance of the SRF method is better than that of the SRT method in most cases, but there are 9 stations in these two regions for which the SRT method performs better. The selection of quality control methods for these 9 stations is worth considering in future work.
Figure 8. The performance of the SRT and SRF methods for (a) Hh, (b) Um and their neighboring stations.
In this article, we proposed using MSR to evaluate the QC methods, as it can handle the trade-off between the two types of errors. Fig. 9 provides the MSR results of the RF, SRT and SRF methods for different cases, showing that SRF yields a higher MSR than the other two methods in all cases. The MSR values obtained by RF and SRT are close to each other, with no statistical difference except in Hh and Nj. According to Fig. 9, the performance of the three methods in regions with high spatial correlation is considerably better than that in regions with low spatial correlation, but SRF is more stable; this is because RF and SRT are sensitive to spatial correlation. Fig. 10 illustrates the performance of the three methods, with the mean and standard deviation, for regions of high and low spatial correlation. In all cases, SRF yields a good MSR (more than 0.5), clearly outperforming the RF and SRT methods. Fig. 10(a) shows that the SRT method is slightly better than the RF method in regions with high spatial correlation, whereas RF is better than SRT in regions with low spatial correlation (Fig. 10(b)).
-
For geographical reasons, not all stations have an ideal number of neighboring stations, which affects the traditional models. In order to check the quality control performance of the SRF method in such specific regions, 6 regions and their neighboring stations were selected. The distribution of stations in the 6 regions is shown in Fig. 11, where the transparent areas indicate the ocean or territory beyond the border, where surface weather stations cannot be placed.
Figure 11. The distribution of the stations near the seaside for different regions: (a) Bh, (b) Gz, (c) Hk, (d) Jh, (e) Ls, (f) Mh.
Figure 12 compares the performance of the SRF and SRT methods in the 6 regions, illustrating that the SRF method has better accuracy and stability than the SRT method does. The performance of SRF is similar to that of the SRT method for Gz, Hk and Mh in Fig. 7. However, the performance of SRF is superior to that of the SRT method in the regions of Gz, Hk and Mh in Fig. 12, because the SRT method is more easily affected by the geographical environment than the SRF method. The performance of the SRF method is much better than the SRT method in the regions with few neighboring stations, such as Jh, Ls and Mh. This result illustrates that the SRF method has better stability than the SRT method does.