I have been playing around with the top 100 (cherry-picked) pseudo proxies by Hockey Stick Index (HSI), which are all that McIntyre and McKitrick supplied in the supplementary data for their 2005 paper in GRL. In doing so, I noticed certain defects in the Hockey Stick Index they used. Of these, the most glaring is that any straight line with a slope other than zero (flat) or infinite (vertical) is classed as a hockey stick. Even with white noise added, so long as the signal-to-noise ratio is at least one, the line will probably (>50% chance) be given an HSI greater than 1, the conventional benchmark used by McIntyre to indicate that something is a hockey stick.
Here is an example of a straight line "hockey stick":
In this case, the HSI is less than that for MBH98 or 99, but the mean of the 132 realizations is greater. That is, according to the M&M05 HSI, a straight line with white noise and an S/N ratio of 1.25 or more is more like a hockey stick than are MBH98 and 99.
This fact does not depend in any way on the slope (provided it is neither flat nor vertical). Negative slopes will yield negative HSIs, but M&M05 (correctly) regard negative HSIs as equivalent to positive values, because the MBH98 reconstruction method flips the sign on proxies if that yields a better fit to the temperature data (which is not an error).
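The straight-line behaviour is easy to check numerically. The sketch below assumes the M&M05 HSI is the calibration-period (1902-1980) mean minus the full-series mean, divided by the full-series standard deviation, on an annual series spanning 1400-1980; that is my reading of the paper, not something taken from their code. The function names, the noise model, and the definition of S/N as the ratio of signal to noise standard deviations are all my own assumptions for illustration.

```python
import numpy as np

# Assumption: annual series spanning 1400-1980, calibration period 1902-1980.
YEARS = np.arange(1400, 1981)
CAL = YEARS >= 1902

def hsi(series):
    """Assumed M&M05-style HSI: (calibration mean - full mean) / full std."""
    series = np.asarray(series, dtype=float)
    return (series[CAL].mean() - series.mean()) / series.std()

def straight_line(slope):
    """A pure linear trend with the given slope (units per year)."""
    return slope * (YEARS - YEARS[0])

def noisy_line_hsis(slope, snr, n_real=132, seed=0):
    """HSI of n_real realizations of a straight line plus white noise.

    Assumption: S/N is std(signal) / std(noise); other definitions exist.
    """
    rng = np.random.default_rng(seed)
    line = straight_line(slope)
    noise_sd = line.std() / snr
    return np.array([
        hsi(line + rng.normal(0.0, noise_sd, YEARS.size))
        for _ in range(n_real)
    ])

# The slope cancels out of the ratio, so every nonzero slope gives the
# same magnitude of HSI; only the sign follows the slope.
for slope in (0.001, 1.0, -5.0):
    print(slope, round(hsi(straight_line(slope)), 3))

# Mean HSI over 132 noisy realizations at an (assumed) S/N of 1.25.
print(round(noisy_line_hsis(1.0, 1.25).mean(), 3))
```

Under these assumptions a noiseless line comes out at roughly 1.5 in magnitude whatever its slope, because the slope appears in both the numerator and the denominator.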
From this it follows that the HSI developed by M&M cannot consistently distinguish between a straight line and a hockey stick shape. I suspect there are other shapes that it cannot distinguish either, but for now we need only consider the straight line. That means that, from the M&M05 HSI, we are unable to determine whether or not half of the 10,000 pseudo proxies are distinguishable from a straight line. Nor, using that index, are we able to distinguish MBH98 from a straight line. That means that as a statistical test of the tendency of short centered PCA to generate shapes similar to that of MBH98, the test is totally without power. It tells you absolutely nothing.
The total statistical power of the first part of M&M05, it turns out, comes from the visual comparison between MBH98 (fig 2) and the MBH first Principal Component of the North American Tree Ring Network (fig 3). That's it. And as everybody should know, eyeball Mark 1 has very little statistical power as well.
Not being content with finding a flaw in the M&M05 statistical test, I looked to see if they could have done better. In the end I developed five variant Hockey Stick Indexes (vHSI) that were superior as a statistical test of a hockey stick shape (although not necessarily under all circumstances). These were:
- The ratio of the standard deviation of the calibration period (1902-1980) to the standard deviation of the non-calibration period. This tests for flatness in the "handle" vs noisiness or a high relative slope in the "blade". Like the M&M HSI, it will only work well when the "handle" is flat, but will work better in that circumstance.
- The angle formed by the slope in the calibration period relative to the slope of the non-calibration period if the two are displaced to intersect at the first year of the calibration period. This tests merely for the angle between "blade" and "shaft" and will work well regardless of orientation. It will not tell you how flat the "handle" is, however, and so can be confused by "hockey sticks" with very crooked "handles".
- The closeness of the largest inflection point in the period 1850-1900 to the start of the calibration period. The inflection point is defined as the start year of the largest 50 year trend starting in that period. The index is defined as the difference between the inflection point and 1850 divided by the square root of the difference between the start of the calibration period and 1850. (not shown)
- The angle formed by the slope to 1850 and the fifty year trend from the inflection point. This again works best with a flat "handle".
- The inflection point angle weighted by the inflection point index.
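To make the first three definitions concrete, here is a minimal sketch of how they might be computed. It is an illustration under my own assumptions, not the code used for the results discussed here: trends are ordinary least-squares slopes, angles are taken from slopes in raw units per year (so they depend on how the series is scaled), "largest" trend is read as largest in absolute value so that inverted series are treated alike, and all function names and the synthetic test series are hypothetical.

```python
import numpy as np

CAL_START = 1902  # first year of the calibration period (1902-1980)

def trend(years, values):
    """Least-squares slope in units per year."""
    return np.polyfit(years, values, 1)[0]

def std_ratio(years, series):
    """vHSI 1: std of calibration period over std of non-calibration period."""
    cal = years >= CAL_START
    return series[cal].std() / series[~cal].std()

def slope_angle(years, series):
    """vHSI 2: angle in degrees between "blade" and "handle" trends."""
    cal = years >= CAL_START
    blade = np.arctan(trend(years[cal], series[cal]))
    handle = np.arctan(trend(years[~cal], series[~cal]))
    return np.degrees(blade - handle)

def inflection_year(years, series, lo=1850, hi=1900, window=50):
    """Start year in [lo, hi] of the largest (absolute) 50 year trend."""
    best_year, best_trend = None, -np.inf
    for start in range(lo, hi + 1):
        sel = (years >= start) & (years < start + window)
        if sel.sum() < window:
            continue  # window runs off the end of the series
        t = abs(trend(years[sel], series[sel]))
        if t > best_trend:
            best_year, best_trend = start, t
    return best_year

def inflection_index(years, series):
    """vHSI 3: (inflection year - 1850) / sqrt(calibration start - 1850)."""
    return (inflection_year(years, series) - 1850) / np.sqrt(CAL_START - 1850)

# Synthetic test shapes (illustrative only, not real proxy data).
years = np.arange(1400, 1981)
rng = np.random.default_rng(0)
line = 0.01 * (years - 1400.0)
stick = np.where(years >= CAL_START, 0.01 * (years - 1901.0), 0.0)
stick = stick + rng.normal(0.0, 0.05, years.size)

for name, s in (("line", line), ("stick", stick)):
    print(name, std_ratio(years, s), slope_angle(years, s),
          inflection_year(years, s))
```

On these synthetic shapes the straight line gets a standard deviation ratio well below one and an angle of essentially zero, while the flat-handled stick scores high on both, which is the behaviour the variants are meant to have.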
The twelve point mean is the average of the 12 pseudo proxies used by McIntyre (and Wegman) in various illustrations of M&M's results. MBH98 PC1 is the first principal component of the reconstruction of 1450-1400 temperatures from MBH98.
As can be seen, MBH98 and 99 are statistically distinguishable from even the cherry-picked top 1% of pseudo proxies, with index values never less than 2 standard deviations above the mean, and for one index nearly 10 standard deviations above the mean. MBH98 PC1 does not perform as well, but can still be statistically distinguished from the cherry-picked top 1% in three of the five tests. (The Inflection point vHSI shows MBH98 PC1 to be just over two standard deviations above the mean.)
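For clarity, "standard deviations above the mean" here means the usual standardized distance of one index value from the distribution of that index across the pseudo proxies. A minimal sketch, with a hypothetical function name and placeholder numbers that are not the actual index values:

```python
import numpy as np

def sigmas_above_mean(value, sample):
    """How many sample standard deviations `value` lies above the sample mean."""
    sample = np.asarray(sample, dtype=float)
    return (value - sample.mean()) / sample.std(ddof=1)

# Placeholder numbers only, NOT real vHSI values for any series.
placeholder_pseudo_proxy_index = [1.0, 2.0, 3.0, 4.0, 5.0]
placeholder_reconstruction_index = 7.0

print(sigmas_above_mean(placeholder_reconstruction_index,
                        placeholder_pseudo_proxy_index))
```

With only the top 100 pseudo proxies available, the sample mean and standard deviation describe the cherry-picked subset rather than the full set of 10,000.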
This is still a work in progress. I think I need to improve my vHSIs by making comparisons with the instrumental record rather than the calibration period, and a combination of angle-based and standard-deviation-based vHSIs would probably be superior. Further, I should make comparisons with the first principal component of the North American Tree Ring database.
Nevertheless, even at this stage the results show that you can devise variant Hockey Stick Indexes that are better able to determine a hockey stick shape, and that if you use those vHSIs, MBH98 and 99 stand out as easily statistically distinguishable from PC1s generated from red noise using short centered PCA. Further, those vHSIs are demonstrably superior to that of M&M05 in that at least none of them will mistake a straight line for a hockey stick (except the pure inversion method, which is why it was not shown ;)). So not only did M&M05 use a test with no statistical power, without validating the test, but alternative tests exist which would have refuted their thesis.
The take home is that the first part of M&M05 is simply scientific garbage. It has no scientific merit whatsoever.
When I get around to it, I am going to see if I can develop even better vHSIs, but probably will wait at least till I have a copy of the NOAMER PC1, and ideally until I (or a collaborator) can generate a full set of pseudo proxies without the cherry picks for statistical comparisons. (Help with either would be appreciated.)