Wednesday, June 5, 2013

Tol on "Quantifying the consensus on anthropogenic global warming in the scientific literature"

Richard Tol has been turning a series of intemperate and poorly supported criticisms of Cook et al (2013) into an intemperate and poorly supported comment, currently in its second draft. Taken to task about the negativity of the criticisms, Tol responded that he did not have the option of constructive criticism because he does not have the resources. Willard points out how absurd this excuse is. In fact, I think Willard is overly generous. A constructive criticism need not formulate a better approach. It need only show the likely impact of the relevant factors on the results of the paper being criticized.

In fact, it takes minimal resources and time to be constructive in this way. Tol, however, avoids every opportunity to rise above pure negativity. The consistent bias in his approach shows that his claim not to have the resources for constructive criticism is sheer bunk.



Taking one example, he corrected his first draft claim that:

 "In fact, the paper by Cook et al. may strengthen the belief that all is not well in climate research.  For starters, their headline conclusion is wrong. According to their data and their definition, 98%, rather than 97%, of papers endorse anthropogenic climate change. While the difference between 97% and 98% may be dismissed as insubstantial, it is indicative of the quality of manuscript preparation and review."
(My emphasis)

by adding the footnote that:

"1 Cook et al. arrive at 97% by splitting the neutral rate 4 into 4a and 4b, but only for 1,000 of the 7,970 papers rated 4; data are not provided. It is unclear whether they found 40 in the sample of 1,000, or 5 and scaled it up to 40 for the 7,970 neutral abstract. If the former is true, then 319 should have been reclassified. The headline endorsement rate would be 91% in that case. No survey protocol was published, so it is unclear whether the 4 ad hoc addition."

So, on the evidence available to him, all he knows is that Cook et al's headline result may be the result of the correct projection of a subsidiary survey, and is therefore in no way indicative of poor manuscript preparation or review. These details, however, are consigned to a footnote, while the original attempt at condemnation remains in the body of the text.

His proper course of action, given the additional information, should have been to remove the original paragraph from the manuscript. Discussion of the "issue", if included, should have been consigned to an additional item in the body of the text. Even then, the unverified suggestion that Cook et al failed to perform a simple and obvious projection from the subsidiary survey is unwarranted. (Indeed, a co-author has verified that the simple and obvious projection was made, a verification Tol merely dismissed as irrelevant.)
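The arithmetic behind the three percentages in dispute (98%, 97% and 91%) can be sketched as follows. The 7,970 neutral abstracts, the subsample of 1,000 and the candidate counts of 5 and 40 come from Tol's footnote. The endorse and reject counts (3,896 and 78) are not given in this post; they are assumed here from Table 3 of Cook et al and should be checked against the paper. The function names are illustrative only.

```python
# Sketch of the subsidiary-survey projection discussed above.
# NEUTRAL and SAMPLE are from Tol's footnote; ENDORSE and REJECT are
# ASSUMED from Cook et al (2013), Table 3, and are not stated in this post.
NEUTRAL = 7970   # abstracts rated (4)
SAMPLE = 1000    # size of the subsidiary survey of rating (4) abstracts
ENDORSE = 3896   # assumption: abstracts endorsing AGW
REJECT = 78      # assumption: abstracts rejecting AGW


def project(found_in_sample, sample_size, population):
    """Scale a count found in a subsample up to the whole population."""
    return round(found_in_sample / sample_size * population)


def endorsement_rate(endorse, reject, uncertain):
    """Share of endorsements among abstracts that take a position."""
    return endorse / (endorse + reject + uncertain)


# Reading 1: 5 "uncertain" abstracts were found in the 1,000 and scaled up.
uncertain_if_5 = project(5, SAMPLE, NEUTRAL)
# Reading 2: 40 were found in the 1,000, so the projection is much larger.
uncertain_if_40 = project(40, SAMPLE, NEUTRAL)

for label, uncertain in [
    ("no 4b category", 0),
    ("5 per 1,000 projected", uncertain_if_5),
    ("40 per 1,000 projected", uncertain_if_40),
]:
    rate = endorsement_rate(ENDORSE, REJECT, uncertain)
    print(f"{label}: {uncertain} uncertain, {rate:.1%} endorsement")
```

Under these assumed counts, the first reading projects to 40 uncertain abstracts and the second to 319, which is where the 97% and 91% figures come from.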

Cook et al should have made their method clearer by including the data from the subsidiary survey in the SI. That, however, is a quibble having no impact on the headline result. Pointing it out would have taken no more time or effort than Tol's chosen course of retaining the implicit slur while adding a footnote that completely undercuts the point he tries to make.

Another example is Tol's comment that:

"The Web of Science provides aggregate statistics for any query results. Figure 2 compares the disciplinary composition of the larger sample to that of the smaller sample. There are large differences. Particularly, the narrower query undersamples papers in meteorology (by 0.7%), geosciences (2.9%), physical geography (1.9%) and oceanography (0.4%), disciplines that are particularly relevant to the causes of climate change." 
(My emphasis)

This restrained comment contrasts with other cases, in which he states clearly that he thinks the detected skew in the sample is likely to bias the results in favour of endorsement, e.g.:

"The data behind Figures 3 and 4 suggest that the smaller sample favoured influential authors and papers, who overwhelmingly support the hypothesis of anthropogenic climate change."

The reason for the restraint in the former case is revealed in an email to me in which Tol states:

"Cook et al. undersampled meteorology, oceanography, and geophysics journals, which suggests that they underestimated endorsement."

It is evident that when Tol discovers a skew in the sample that he thinks will bias the result in favour of endorsement, he says so up front. In contrast, when he thinks the skew will bias the result against endorsement, he merely mentions the skew and not (what he considers to be) the probable consequences. Again, it takes no more effort to mention a negative bias than it does to mention a positive bias. The negativity, then, is by construction. It represents a deliberate policy based on political intentions, not time constraints.

A third example comes from his analysis of skewness of the sample relative to disciplines in WoS. Using data Tol has provided me, I have estimated the number of papers in the Cook et al survey from disciplines which are over represented relative to Tol's preferred search terms (5883) and those which are under represented (5985). (The sum is 76 less than the 11,944 papers rated but not excluded as per Cook et al. This is due to rounding errors and the fact that some disciplines are not represented in both samples, making scaling of the results difficult. The difference should not be significant.) It is also possible to estimate the number of excess abstracts in disciplines which are over represented (1711) and the shortfall in those which are under represented (1714).

These data should have been included by Tol in his analysis. The near equality of the figures means it is almost impossible that the skew in subjects has resulted in a bias in the headline result. In fact, given that the subjects which are over represented cannot have more than 100% endorsements (excluding abstracts rated (4)), it is impossible for papers from subjects that are under represented to have less than 96% endorsements in aggregate. That means that the maximum variation in endorsement percentages resulting from the skewness Tol draws attention to is between 97.4% and 98.6%.
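One way to reproduce these bounds is sketched below. It is a reconstruction, not necessarily the calculation used for this post, and it rests on two assumptions not stated above: that the baseline endorsement rate among abstracts taking a position is 98% (the figure Tol derives from the data), and that abstracts taking a position are spread across disciplines in proportion to the paper counts. The paper counts themselves are the estimates quoted in the previous paragraphs.

```python
# Sketch of the bounding argument, under the assumptions named above.
OVER_PAPERS = 5883      # papers in over represented disciplines (from the post)
UNDER_PAPERS = 5985     # papers in under represented disciplines (from the post)
OVER_EXCESS = 1711      # excess abstracts in over represented disciplines
UNDER_SHORTFALL = 1714  # missing abstracts in under represented disciplines
BASELINE = 0.98         # ASSUMPTION: overall endorsement, excluding rating (4)


def implied_rate(capped_n, other_n, overall=BASELINE):
    """Rate the other group must have if the capped group endorses at 100%
    and the weighted mean of the two groups equals `overall`."""
    return (overall * (capped_n + other_n) - 1.0 * capped_n) / other_n


def reweighted(over_rate, under_rate):
    """Overall rate after removing the excess and restoring the shortfall."""
    over_n = OVER_PAPERS - OVER_EXCESS
    under_n = UNDER_PAPERS + UNDER_SHORTFALL
    return (over_rate * over_n + under_rate * under_n) / (over_n + under_n)


# Worst case for endorsement: over represented subjects endorse at 100%.
under_floor = implied_rate(OVER_PAPERS, UNDER_PAPERS)
low = reweighted(1.0, under_floor)

# Best case for endorsement: under represented subjects endorse at 100%.
over_floor = implied_rate(UNDER_PAPERS, OVER_PAPERS)
high = reweighted(over_floor, 1.0)

print(f"minimum rate in under represented subjects: {under_floor:.1%}")
print(f"corrected headline range: {low:.1%} to {high:.1%}")
```

With these inputs the under represented subjects cannot fall below roughly 96%, and the corrected headline figure stays between about 97.4% and 98.6%, in line with the numbers quoted above.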

This is something highly relevant to Tol's critique of Cook et al.  It only takes about half an hour to calculate these facts, so Tol's failure to do so is not due to time constraints.  Again the simplest explanation is a straightforward bias towards including only negative criticisms; and towards excluding context that allows assessment of the impact of those criticisms.

A fourth and final example comes from Tol's new and unsurprising discovery that self rating respondents do not match the rated papers in proportion. It is unsurprising because people with strong positions (endorsement or rejection) are more likely to want their opinions registered, and hence more likely to respond. Given this, the result is as likely to show bias in the rate of response as to show that the abstract ratings are in error. The direct comparison between abstract ratings and self ratings is not straightforwardly projectable, but does clearly show the abstract ratings to be conservative, ie, biased towards a rating of (4).

Though the self ratings are not straightforwardly projectable, we can project them on the assumption that they are representative. Doing so shows that if there was no skewness between abstract and self rating numbers, the abstract ratings would have reported 96.6% endorsing the consensus, with 3.4% rejecting or uncertain on the consensus. In other words, the skewness identified by Tol would have had an impact of only 0.5% on the headline result. Again, calculating this result is straightforward and requires minimal time. Reporting it would be very useful in placing the skewness reported in table 5 of the paper in context, but it destroys that data as a useful negative talking point. Therefore Tol could not find the time for this simple analysis.

These four examples do not address the major flaws in Tol's critique. In fact, were I to do so, it would be simple to show that Tol's critique is based on superficial data analysis and a fundamental misunderstanding of basic terms in the paper. These examples show, however, that the negativity of Tol's critique is based on a predetermined desire to undermine the paper, whose results he finds politically inconvenient. His choice to be destructive in his criticism is not because of time constraints, but because he needs to generate and disseminate "talking points" that allow those so inclined to avoid thinking about the implications of Cook et al.

That clear motive, evident in both his tweets and his comment, strongly suggests that corrections of his errors will not be incorporated into his comment. Certainly his comment will not include estimates of the likely impact of the skewness he identifies on the headline result, except where (as with his footnote mentioned in the first example) absurd suppositions allow him to quote a large impact.

2 comments:

  1. 1) In his third draft, Tol has moved his discussion of the subsidiary survey of rating (4) papers from the footnote and dropped his claim that "While the difference between 97% and 98% may be dismissed as insubstantial, it is indicative of the quality of manuscript preparation and review."

    He still insists, however, that there is doubt as to whether the subsidiary survey found five of one thousand or forty of one thousand "uncertain" papers among those rated (4). This despite a public statement by a co-author that the number was five; a statement of which Tol was aware well before his third draft. His lack of clarity is, therefore, purely tactical rather than based on evidence. That is, he is unclear because he ignores evidence of which he is aware in order to retain an unjustified negative criticism in his comment.

    2) Tol has now admitted in his third draft that the skewed sample of disciplines relative to a Scopus search "introduces a bias against endorsement". He does not make the same admission regarding the WoS search, even though it is based on the same data and logic, and even though he has made that admission in private correspondence.

    This admission means that his claim of evidence of bias comes entirely from his unjustified claim that "impacts" and "mitigation" papers should not be rated.

  2. The draft submitted to ERL was not draft four, but draft five. I am not sure how they differ, if at all.

    One thing that is noteworthy about the differences between draft 5 and draft 3 is that a number of edits have been made to make the language more negative and critical without, in fact, any addition to the argument.

    In draft three, for example, Tol begins the paragraph discussing the subsidiary survey of 1000 endorsement level 4 papers by writing:
    "Cook et al. claim that 97% of abstracts endorse the hypothesis of anthropogenic climate. The available data, however, has 98%."
    In draft five, however, those two sentences stand apart as a separate paragraph. By separating them from the discussion, Tol gives the appearance of an error where in fact none exists.

    When moving into the discussion of the subsidiary survey, Tol has not corrected any facts. He now ends the discussion by writing:
    "Data for the 4th rating are not available.The headline conclusion is not reproducible."

    Of course, data for the fourth rating (ie, the subsidiary survey) is available. We know that 1000 abstracts were rated, and further, we know from the paper that 0.5% of those were rated 4b (Uncertain on AGW). That percentage has been confirmed publicly by Dana, a co-author of the paper. It has also been confirmed in private correspondence by John Cook. Given that, the headline result is easily reproducible. Presumably Tol means only that the raw data of the subsidiary survey is not available. Access to the raw data, however, is not necessary in order to reproduce a result.

    Perhaps Tol is claiming that access to that data is necessary to reproduce the headline result based on his fiction that there is doubt as to whether 0.5% of 4% of abstracts rated in the subsidiary survey were rated 4b. As noted, however, any doubt on that basis that existed in the paper (and I maintain no such reasonable doubt existed) was put to rest by the public statement of an author. What is more, we know that Tol was aware of the statement. His failure to mention that statement in his discussion, therefore, constitutes scientific misconduct. He has concealed data of which he is aware, and which rebuts the position he argues in his paper.

    Further on, Tol's claim that "A number of authors have come out to publicly state that their papers were rated wrong, but their number is too small for any firm conclusion" has been turned into the end of a paragraph and had the qualification dropped, thus giving it emphasis. At the same time he has identified himself as one of the authors who disagreed with the rating of his paper. He does not, however, note that he responded to the survey of authors so that his disagreement is already included in the overall statistics. Again, this is relevant information to assessing the relevance of the seven authors who disagreed with the ratings, but is excluded because it runs contrary to Tol's narrative.

    In fact, given the nature of Tol's comment, the most germane questions regarding these seven authors are: are they a representative sample? And what measures has Tol taken to ensure that they are a representative sample? The answers, clearly, are no, and none. Given Tol's critique of Cook et al, his inclusion of these cherry picked examples is the rankest hypocrisy.
