What exactly is a monthly mean?
By Eco Guy
1:31am 22nd April 2010
A lot of climate analysis is done using the monthly mean temperature. Given that hourly measurements are quite often available, I thought I'd have a look at whether calculating a monthly mean directly from all the hourly measurements would produce a different value; initial findings indicate it does.
Background
Based upon Watts' observations in "GISS & METAR – dial “M” for missing minus signs: it’s worse than we thought", and how simply using '-' instead of 'M' can flip measurements and introduce error, I thought I'd have a look at two things:
- Does this sign flip error show up in other places?
- Does the act of calculating the monthly mean introduce error or bias?
To do this I needed access to a 'higher resolution' data set than just the simple daily mean measurements; without that it would be impossible to determine with any degree of significance whether the act of creating the mean value introduces error into the final value.
Data sources
Two data sources were used for this: the SCRAM data set (hourly measurements) and the GHCN data set (monthly means).
Method
Quite simple really:
- Determine 'matching' stations in both the SCRAM and GHCN data sets, i.e. both based on the same airport (for instance Santa Barbara/FAA Airport, SCRAM:CA23190 -> GHCN:42574606001). Matches were also validated by a lat/long check.
- Extract the monthly and hourly data.
- Run a program to calculate for each month:
  - The 'Derived' monthly mean: take the mean of each day's min and max, then take the mean of those daily values over the month, with the final result rounded to 1 decimal place. This should closely track the GHCN measurement, which would indicate that the hourly data set closely correlates with the data GHCN used to get its figure; i.e. they are probably the same.
  - The 'exact' monthly mean: the mean of all the hourly measurements in a given month, to 4 decimal places.
  - A rounded form of that to one decimal place.
  - The difference between the GHCN mean and the 'exact' mean (GHCN minus exact), to one decimal place.
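As a rough sketch of the two calculations (this is not the actual program used here; the function names and the tiny sample data are made up for illustration, and Python's built-in rounding is an assumption about the rounding mode):

```python
from statistics import mean


def derived_monthly_mean(days):
    """Mean of the daily (min + max) / 2 values, rounded to 1 decimal place."""
    daily_midpoints = [(min(day) + max(day)) / 2 for day in days if day]
    return round(mean(daily_midpoints), 1)


def exact_monthly_mean(days):
    """Mean of every hourly reading in the month, to 4 decimal places."""
    all_readings = [t for day in days for t in day]
    return round(mean(all_readings), 4)


# Made-up "month" of two days with four readings each (not station data).
days = [
    [10.0, 11.0, 12.0, 18.0],
    [9.0, 10.0, 12.0, 17.0],
]

derived = derived_monthly_mean(days)   # based on 2 values per day
exact = exact_monthly_mean(days)       # based on every reading
exact_rounded = round(exact, 1)
diff = round(derived - exact_rounded, 1)

print(derived, exact, exact_rounded, diff)
```

Each inner list is one day's readings, so a day with fewer readings simply contributes fewer values to the exact mean.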
The reason for the 1 decimal place rounding is to compare the Derived and exact means fairly against the GHCN figures, at the same precision GHCN supplies.
Also, error values etc. are filtered out, and the means are calculated across the exact set size in each case (i.e. if a day has only 23 measurements, that's what the mean is calculated across, and not 24).
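To illustrate the set-size point, here is a minimal sketch. It assumes an unusable reading is represented by None, which is a hypothetical marker chosen for the example and not the way the source data flags errors:

```python
from statistics import mean

MISSING = None  # hypothetical marker for an error or missing reading


def daily_mean(readings):
    """Mean over the valid readings only; None if the day has no valid data."""
    valid = [t for t in readings if t is not MISSING]
    return mean(valid) if valid else None


# A day with 23 valid readings of 12.0 and one missing value.
day = [12.0] * 23 + [MISSING]

correct = daily_mean(day)                              # divides by 23
wrong = sum(t for t in day if t is not MISSING) / 24   # divides by 24

print(correct, wrong)
```

Dividing by the nominal 24 would drag the mean down as if the missing hour had read zero, which is why the actual count is used.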
The Results
I have only had time to produce one pair of results so far, but they are quite interesting in their own right. These are for Santa Barbara/FAA Airport.
Year=1984
Month  Org.Mean  [Derived]  Mean from hrs  Rounded  Diff
1      12.4      [12.2]     12.0639        12.1     0.3*
2      12.4      [12.4]     12.4082        12.4     0.0
3      14.6      [14.6]     14.7050        14.7     -0.1
4      15.2      [15.3]     15.3426        15.3     -0.1
5      17.8      [17.6]     17.3775        17.4     0.4*
6      17.4      [17.4]     17.3889        17.4     0.0
7      19.7      [19.6]     19.1129        19.1     0.6*
8      21.3      [21.2]     20.9677        21.0     0.3*
9      22.7      [22.6]     22.0139        22.0     0.7*
10     16.5      [16.5]     16.7100        16.7     -0.2
11     12.8      [12.7]     12.8372        12.8     0.0
12     10.4      [10.6]*    10.6325        10.6     -0.2
Net yearly temperature difference (sum of the monthly Diff values) between the original monthly mean and the hourly calculated mean (+ means the original is higher) = 1.7
Year=1985
Month  Org.Mean  [Derived]  Mean from hrs  Rounded  Diff
1      10.2      [10.2]     10.0986        10.1     0.1
2      11.0      [10.9]     10.9755        11.0     0.0
3      11.6      [11.6]     11.7279        11.7     -0.1
4      14.3      [14.3]     14.1443        14.1     0.2*
5      14.2      [14.1]     14.3182        14.3     -0.1
6      17.2      [16.9]*    16.5093        16.5     0.7*
7      20.2      [19.9]*    19.3735        19.4     0.8*
8      19.1      [18.9]*    18.5297        18.5     0.6*
9      18.2      [18.1]     18.0733        18.1     0.1
10     16.3      [16.3]     16.1850        16.2     0.1
11     12.1      [12.2]     12.3125        12.3     -0.2*
12     11.9      [11.9]     11.2463        11.2     0.7*
Net yearly temperature difference (sum of the monthly Diff values) between the original monthly mean and the hourly calculated mean (+ means the original is higher) = 2.9
Year=1986
Month  Org.Mean  [Derived]  Mean from hrs  Rounded  Diff
1      13.1      [12.9]*    12.3581        12.4     0.7*
2      13.1      [13.0]     12.9787        13.0     0.1
3      13.7      [13.6]     13.5633        13.6     0.1
4      14.4      [14.3]     14.4097        14.4     0.0
5      14.9      [14.8]     14.6752        14.7     0.2*
6      16.9      [16.9]     16.6937        16.7     0.2
7      18.2      [18.2]     17.9495        17.9     0.3*
8      18.2      [17.9]*    17.0393        17.0     1.2*
9      16.7      [16.6]     16.6335        16.6     0.1
10     17.1      [17.0]     16.7174        16.7     0.4*
11     14.8      [15.0]     14.6952        14.7     0.1
12     12.1      [12.1]     11.2970        11.3     0.8*
Net yearly temperature difference (sum of the monthly Diff values) between the original monthly mean and the hourly calculated mean (+ means the original is higher) = 4.2
The 'Org.Mean' field is from the GHCN data set. The * indicates a difference greater than 0.2; this threshold allows for the fact that different types of rounding at the limit could take the same original value and produce a difference of up to 0.2 with 1 decimal place of rounding enforced.
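The net yearly figure is just the sum of the monthly Diff values, which can be checked from the 1984 table. The sketch below uses Decimal so the one-decimal values add up exactly; the variable names are my own, and the flagging here works on the rounded values only, so it may not reproduce every * in the tables (some flags appear to be based on the unrounded means).

```python
from decimal import Decimal

# (GHCN 'Org.Mean', rounded hourly mean) pairs for 1984, copied from the table.
rows_1984 = [
    ("12.4", "12.1"), ("12.4", "12.4"), ("14.6", "14.7"), ("15.2", "15.3"),
    ("17.8", "17.4"), ("17.4", "17.4"), ("19.7", "19.1"), ("21.3", "21.0"),
    ("22.7", "22.0"), ("16.5", "16.7"), ("12.8", "12.8"), ("10.4", "10.6"),
]

# Diff = GHCN value minus rounded hourly mean, so + means GHCN is higher.
diffs = [Decimal(org) - Decimal(hourly) for org, hourly in rows_1984]
net_1984 = sum(diffs)

# Months (1-based) whose rounded difference exceeds the 0.2 threshold.
flagged = [m for m, d in enumerate(diffs, start=1) if abs(d) > Decimal("0.2")]

print(net_1984, flagged)
```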
Given this, what is interesting is:
- My derived value tracks the GHCN data set closely, although given it's not perfect I suspect some corrections are being applied that I'm not aware of.
- Only in one flagged case (December 1984) is the derived value actually higher than the GHCN value; three further months are higher by 0.1 to 0.2 without being flagged.
- The exact mean value often 'strays' by more than 0.2 from the GHCN data set value whilst the derived value stays within range (i.e. a * in the far right column, but none in the third). To me this indicates the daily min/max mean approach often misrepresents the real distribution of temperatures in the month. That is not much of a surprise, as the daily min/max approach uses a maximum of 62 contributing data points per month, compared to the exact mean, which uses up to 744 (12x the effective resolution).
- The min/max monthly mean method appears to add a net warming when compared to the exact monthly mean value.
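To see how a min/max midpoint can sit above the hourly mean, here is a purely synthetic day (made-up numbers, not Santa Barbara data) that is cool for most of the 24 hours with a short warm afternoon peak. Real days need not be skewed this way, so treat it as an illustration of the mechanism only.

```python
from statistics import mean

# Synthetic 24-hour profile: 10 cool hours, a 7-hour warm peak, 7 cool hours.
hours = [10.0] * 10 + [12.0, 15.0, 18.0, 20.0, 18.0, 15.0, 12.0] + [10.0] * 7
assert len(hours) == 24

midpoint = (min(hours) + max(hours)) / 2   # what the min/max method sees
hourly_mean = mean(hours)                  # what all 24 readings give

print(midpoint, round(hourly_mean, 4), round(midpoint - hourly_mean, 4))

# Contributing data points in a 31-day month for each method.
minmax_points = 2 * 31    # one min and one max per day
hourly_points = 24 * 31   # every hourly reading
print(minmax_points, hourly_points, hourly_points / minmax_points)
```

The midpoint only knows about the two extremes, so a brief peak pulls it up far more than it pulls up the mean of all the readings.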
Note: All the above needs further analysis across a much larger set of data pairs to determine if there is a trend here, but the size of the differences encountered so far is quite worrying.
Definitely food for thought. I was quite taken aback by the differences coming through and had to double check my code (even putting in some crazy sanity checks). I have also put in code to detect possible 'sign switching' problems, although it hasn't triggered on this one data set pair.
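For anyone wondering what a 'sign switching' check might look like, below is one possible heuristic. It is an assumption made for illustration and not the check used for these results: the function name, the `jump` threshold and the sample readings are all hypothetical. It flags a reading when negating it would fit its two neighbours far better than the value as recorded.

```python
def suspect_sign_flips(readings, jump=5.0):
    """Return indices of readings whose negated value fits the neighbours better.

    `jump` is a hypothetical threshold (in degrees) for how far a reading must
    sit from the average of its neighbours before it is examined at all.
    """
    suspects = []
    for i in range(1, len(readings) - 1):
        neighbour_avg = (readings[i - 1] + readings[i + 1]) / 2
        as_is = abs(readings[i] - neighbour_avg)
        negated = abs(-readings[i] - neighbour_avg)
        if as_is > jump and negated < as_is / 2:
            suspects.append(i)
    return suspects


# Made-up hourly readings: the 8.0 looks like a -8.0 that lost its sign.
cold_night = [-6.0, -7.0, 8.0, -8.0, -9.0]
# A genuine, gradual crossing of zero should not be flagged.
gentle_crossing = [1.0, 0.5, -0.5, -1.0]

print(suspect_sign_flips(cold_night), suspect_sign_flips(gentle_crossing))
```

A mild coastal station rarely goes below zero, so a check along these lines would be expected to stay quiet on this data set pair.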
What's next?
I want to spend a bit of time turning this into an online tool so people can have a play themselves and see what's going on.
I also want to investigate the yearly cumulative difference over a larger data set, to see if a trend across multiple data set pairs is evident.