next up previous contents
Next: Detection of outliers Up: Data Processing Previous: Data Processing

Estimation of scaling factors

As stated before, the Bernese processing software underestimates the true errors of the coordinate solutions because systematic errors or mismodelled parameters are not included in the formal error estimated by the processing software [Hugentobler et al. (2001)]. To obtain a resonable estimate of the daily coordinate errors we need to rescale the formal errors to obtain a realistic error estimation.

This problem has been dealt with in many studies, since all Bernese software users (and probably users of other software as well) necessarily need to face this important problem. However, there is no standard method approved by the GPS community and each study seems to use its own method to obtain a scaling factor to multiply the formal errors. Usually scaling factors are estimated by comparing the scatter of the data to the formal errors. The differences lie in how the scatter is defined and how the full covariance matrix is used to define the errors to be scaled.

The Bernese software offers a method to derive a scaling factor using the combination program ADDNEQ [Hugentobler et al. (2001)], [Braun (2000)]. This method estimates the rms repeatabilities relative to a constant velocity model, meaning that the repeatabilitiy is calculated after subtracting a best straight line fit ( $\hat{y} = at + b$) from the data. From Figure 5 we see that a straight line represents the data poorly. Offsets in the time series due to the the June 2000 earthquakes and radome installation would bias the repeatability estimation considerably. Furthermore, if we remove the coseismic displacement from the east component of VOGS (described in more detail in Section 4.6) and subtract a straight line, obtained by a least squares medhod (see Section 4.2.1), from the data we obtain the residual1 of the time series:

 \begin{displaymath}{\bf r} = {\bf y} - \hat{y} = {\bf y} - a{\bf t} - b,
\end{displaymath} (1)

where ${\bf y}$ is the vector of observed displacements, e.g. for the east component of VOGS and $\hat{y}$ is the best line fit. Figure 6 shows the residual of the east component of VOGS relative to REYK. We see immediately from the figure that a straight line model leaves a residual with periodic variations.

Figure 6: Residual time series of the east component of VOGS obtained by removing offsets due to the June 2000 earthquakes and subtracting a best line fit from the data.
\mbox{\epsfig{figure=figures/vogseleif.eps,width=13cm} }

[Árnadóttir et al. (2000)] use the Bernese network adjustment program COMPAR to calculate the average station coordinates for each week. COMPAR also returns the variation of the daily solutions from the weekly average. They compare the variances to the formal errors and obtain a scaling factor of 3 that they use for all coordinate components and sites. They also process the data using the GIPSY/OASIS software [Webb and Zumberge (1993)] and use there a scaling factor of 2.7 to rescale the coordinate and velocity errors.

[Lowry et al. (2001)] transform the covariance matrix, obtained using the Bernese software, to a local east-north-up coordinate system and define the formal error as the corresponding column sum of the covariance matrix. They estimate the repeatability scaling factors using the 95th% $\chi^2$ repeatability of the coordinates, relative to a time varying velocity model. This results in scaling factors (one for each coordinate direction at each site) ranging from 2.0 to 3.9 (east), 1.6 to 3.6 (north) and 1.5 to 4.0 (up) in their case.

In this study we define the formal errors as the square root of the diagonal elements of the covariance matrix and compare them to the weighted standard deviation, or repeatability, defined as

 \begin{displaymath}WSTD=\left( \frac{N}{N-1} \frac{\displaystyle \sum_{i=1}^{N} ...
...{N} \left(\frac{1}{\sigma_{i}}\right)^2} \right)^{\frac{1}{2}}
\end{displaymath} (2)

where N is number of data points, xi is the daily coordinate value, $\sigma_i$is the daily formal coordinate error and $\bar{x}$ the mean coordinate value. Assuming $\sigma_i$ is a constant equal to $\sigma $, and $x_{i}-\bar{x}$ is normally distributed with standard deviation $\sigma $, then it is easy to use equation 2 to verify that WSTD converges to $\sigma $ for large N.

From Figure 6 we see there are clear annual variations in the data and we face the question whether the residual contains signal or if this is unmodelled noise. If the annual variations are assumed to be measurement noise then the standard deviation for the whole time series is an appropriate measure of the error. If the annual variations are considered a signal, be it from the earth or resulting from the processing, then the annual variations have to be removed from the time series before estimating the standard deviation. Here, the latter approach is chosen, and only the high frequency component is considered as measurement noise. This is in practice achieved by using only short segments (e.g. 50 days) of the time series to calculate WSTD.

We define the scaling factor as the ratio between the repeatability of the residual time series within a specific time interval, and the median formal error within the same time interval:

 \begin{displaymath}s_{tj} = \frac{ WSTD_{tj} }{ \sigma_{tj}^{med} }
\end{displaymath} (3)

where j refers to the east north and up coordinate directions, t refers to the time interval used, med refers to the median of the formal error $\sigma_j$ and WSTD is obtained from equation 2. The median of the formal errors is chosen to represent the average error within a time interval because it is a more robust estimator than the mean. The coordinate errors obtained by using the scaling factor defined in equation 3 do not represent the absolute positioning accuracy of the daily solutions because they also rely on e.g. the coordinates of the reference station (REYK). The rescaled errors, obtained using the scaling factor as in equation 3, represent only the short-term repeatability of the daily solutions.

Offsets in the time series due to the June 2000 SISZ earthquakes and equipment changes are removed (see Sections 4.1.1 and 4.6), as well as outliers (Section 3.2), before estimating the scaling factors. The time series for each station and each coordinate component are then split into kp segments including tdata points each where p refers to the station name as kp will depend on the length of the time series at station p. Each segment in each coordinate component is detrended using a weighted least squares method. From the detrended segments the WSTD and median formal errors are estimated to obtain scaling factors according to equation 3 for the east, north and vertical components for each segment. The mean of the kp scaling factors, for each component at each station, is calculated to obtain a single set of scaling factors in the east, north and vertical components for each station.

Table 4: Left part: Median of the formal errors in east, north and vertical components calculated using all available data from each station. Center part: Scaling factors for all stations, obtained with equation 3, using an interval of 50 data points. The line "Composite" stands for where all the data from all stations were used to compute a single scaling factor for the east, north and vertical components. Right part: The last three columns show WSTD calculated as the mean of the WSTD values obtained for each segment in each coordinate direction for each station (equation 2). In line "Composite" the mean was taken over all WSTD values from all the stations in each component.
  Formal errors [mm] Scale factors WSTD [mm]
Station E N U E N U E N U
AKUR 0.24 0.37 2.05 3.2 5.0 2.7 1.08 1.53 6.77
HLID 0.24 0.37 2.27 3.7 3.4 2.2 0.86 1.16 4.70
HOFN 0.37 0.39 2.35 3.6 3.9 2.6 1.34 1.51 6.11
HVER 0.24 0.37 2.28 4.3 3.5 2.1 1.06 1.33 5.09
HVOL 0.27 0.37 2.27 4.8 4.4 2.4 1.30 1.67 5.80
KIDJ 0.21 0.32 1.96 3.8 3.1 2.1 0.83 0.99 4.16
OLKE 0.23 0.36 2.19 4.3 4.4 2.2 1.03 1.60 5.06
RHOF 0.29 0.42 2.23 4.5 3.9 2.8 1.35 1.65 6.31
SKRO 0.25 0.33 2.01 5.1 4.2 2.6 1.32 1.46 5.61
SOHO 0.26 0.37 2.22 5.0 4.7 2.3 1.40 1.81 5.51
THEY 0.24 0.34 2.05 4.8 4.7 2.5 1.16 1.66 5.44
VMEY 0.23 0.35 2.09 4.4 3.4 2.2 1.00 1.26 4.63
VOGS 0.24 0.38 2.30 3.2 3.2 2.0 0.78 1.24 4.61
Composite - - - 4.2 3.9 2.3 1.11 1.45 5.25

Table 4 summarizes the results, using a time interval length of t = 50. The scaling factor (center columns of Table 4) is generally largest for the east component and smallest for the vertical component. There are considerable variations between stations and the scaling factors range from 3.2 to 5.1 in east, from 3.1 to 5.0 in north and from 2.0 to 2.8 in the vertical component. The median formal errors, calculated using all available data from the stations, shown in the left part of Table 4 also vary between stations, but not nearly as much as the scaling factor. The formal errors are smallest in the east component and by far largest in the vertical component. The values of WSTD, using time interval length of t = 50, are shown in the right part of Table 4. These values represent the short-term scatter in the time series. WSTD is generally smallest in the east component and largest in the vertical component.

Despite the differences in the scaling factors between stations, we derive a single scaling factor for each coordinate component for the whole network. This is in order to simplify programming and discussion. It is not fair to simply take the average of the scale factors over all stations because a different amount of data lies behind each value. Thus we rather weigh the scale factors by the amount of data they are based on by taking the mean of all scaling factors obtained at each segment for all stations in each component. This results in scale factors of value 4.2 (east), 3.9 (north) and 2.3 (vertical), labelled "Composite" in Table 4. As final values we choose to use scale factors of 4.0 for the east and north components and 2.5 for the vertical component. The outlier detection has a significant effect on the scale factor values obtained. This will be discussed in more detail in the end of next chapter.

Equation 3 states that the median error and WSTD are linearly related. A quick look at WSTD as a function of the median error (Figure 7) shows this is not a very good assumption. However, we need to connect these two parameters and a linear relationship seems no worse than any other.

Figure 7: Comparison of the relationship between WSTD and the median of the formal error. The values were obtained using t=50, and come from all segments of all stations: (a) east, (b) north, (c) up.
... \subfigure[]{\epsfig{figure=figures/scale_hypoU.eps,width=5cm} }}

The choice of the interval t is also important. If t is short, say 3 data points, then the estimation of WSTD becomes lower because the time series is being followed too closely. If t is long, say on the order of 500 data points, the estimation of WSTD includes the annual variations and becomes much higher. If t exceeds the length of the time series then WSTD is simply the weighted standard deviation of the residual. A value of 50 datapoints seems to be appropriate. Using a time interval of 7 days results in composite scaling factors of 3.5 (east), 3.3 (north) and 1.9 (up). These values are comparable with the value (s=3) used by [Árnadóttir et al. (2000)] based on data from the ISGPS network until February 2000.

next up previous contents
Next: Detection of outliers Up: Data Processing Previous: Data Processing
Halldor Geirsson