Monday, November 23, 2009

Climate Fraud Smoking Guns: Highlights of the incredible CRU HARRY_READ_ME.TXT file

The hacked climate research files include a lengthy file of programmers' notes called HARRY_READ_ME. It appears to record the challenges that programmers encountered in trying to produce the results that the warming crowd wanted. If it is one-tenth correct, it is the single smoking gun (rather, a stick of dynamite) that will end the global warming bunko scam once and for all.

Consider the highlights:

I was going to do further backtracing, but it's been revealed that the same issues were in 2.1 - meaning that I didn't add the duff data. The suggested way forward is to not use any observations after 1989, but to allow synthetics to take over. I'm not keen on this approach as it's likely (imo) to introduce visible jumps at 1990, since we're effectively introducing a change of data source just after calculating the normals. My compromise is to try it - but to also try a straight derivation from half-degree synthetics.

So, first, we need synthetic-only from 1990 onwards, that can be married with the existing glos from pre-1990...
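The approach in those first two excerpts splices observation-based values up to 1989 onto synthetic-only values from 1990, and Harry worries that this will produce a visible jump at the join. Here is a minimal Python sketch of that idea. The function names, the annual keying, and the five-year window in the step check are hypothetical choices made for illustration. None of this is the CRU procedure.

```python
from typing import Dict, Optional, Tuple

Merged = Dict[int, Tuple[float, str]]  # year -> (value, source tag)


def splice(observed: Dict[int, float], synthetic: Dict[int, float],
           cutover: int = 1990) -> Merged:
    """Observation-based values before cutover, synthetic-only values from cutover on."""
    merged: Merged = {}
    for year in sorted(set(observed) | set(synthetic)):
        source, tag = (observed, "obs") if year < cutover else (synthetic, "syn")
        if year in source:
            merged[year] = (source[year], tag)
    return merged


def step_at_join(merged: Merged, cutover: int = 1990, window: int = 5) -> Optional[float]:
    """Crude jump check: mean of `window` years after the join minus the mean before it."""
    before = [v for y, (v, _) in merged.items() if cutover - window <= y < cutover]
    after = [v for y, (v, _) in merged.items() if cutover <= y < cutover + window]
    if not before or not after:
        return None
    return sum(after) / len(after) - sum(before) / len(before)
```

A simple difference of means can't tell a real climate shift from a change-of-source artefact. It only shows where to look.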

Here, the expected 1990-2003 period is MISSING - so the correlations aren't so hot! Yet the WMO codes and station names /locations are identical (or close). What the hell is supposed to happen here? Oh yeah - there is no 'supposed', I can make it up. So I have :-)

..You can't imagine what this has cost me - to actually allow the operator to assign false WMO codes!! But what else is there in such situations? Especially when dealing with a 'Master' database of dubious provenance (which, er, they all are and always will be)...

25. Wahey! It's halfway through April and I'm still working on it. This surely is the worst project I've ever attempted. Eeeek. I think the main problem is the rather nebulous concept of the automatic updater. If I hadn't had to write it to add the 1991-2006 temperature file to the 'main' one, it would probably have been a lot simpler. But that one operation has proved so costly in terms of time, etc that the program has had to bend over backwards to accommodate it. So yes, in retrospect it was not a brilliant idea to try and kill two birds with one stone - I should have realised that one of the birds was actually a pterodactyl with a temper problem.

With huge reluctance, I have dived into 'anomdtb' - and already I have that familiar Twilight Zone sensation.

I have found that the WMO Code gets set to -999 if *both* lon and lat are missing. However, the following points are relevant:

* LoadCTS multiplies non-missing lons by 0.1, so they range from -18 to +18 with missing value codes passing through AS LONG AS THEY ARE -9999. If they are -999 they will be processed and become -99.9. It is not clear why lats are not treated in the same way!

* The subroutine 'Anomalise' in anomdtb checks lon and lat against a simple 'MissVal', which is defined as -999. This will catch lats of -999 but not lons of -9999.

* This does still not explain how we get so many -999 codes.. unless we don't and it's just one or two?

And the real baffler:

* If the code is -999 because lat and lon are both missing - how the bloody hell does it know there's a duplication within 8km?!!!
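Here is a rough idea of what that missing-value mix-up looks like in practice. This is a minimal Python sketch. LoadCTS and Anomalise are the names in the notes. The functions below are hypothetical stand-ins built only from the behaviour Harry describes, not CRU's actual Fortran. The "fix" at the end is our own assumption.

```python
# Hypothetical stand-ins for the behaviour described in the notes;
# NOT the CRU code, just an illustration of the sentinel mismatch.

LON_PASSTHROUGH = -9999  # per the notes, the only lon code LoadCTS leaves unscaled
MISS_VAL = -999          # per the notes, the single code Anomalise tests against


def load_lon(raw):
    """Scale a raw lon by 0.1 unless it is exactly -9999, as the notes describe."""
    if raw == LON_PASSTHROUGH:
        return float(raw)
    return raw * 0.1  # a raw -999 is treated as real data and becomes -99.9


def anomalise_missing(value):
    """Single-sentinel check, as described for Anomalise."""
    return value == MISS_VAL


def wmo_code(wmo, lat_missing, lon_missing):
    """Per the notes, the WMO code becomes -999 only if BOTH lat and lon are missing."""
    return -999 if (lat_missing and lon_missing) else wmo


def is_missing_raw(raw):
    """One possible fix (our assumption): test every sentinel on the raw value, before scaling."""
    return raw in (-999, -9999)


# Neither missing lon is flagged by the single-sentinel check:
# -999 has been scaled to -99.9, and -9999 never equals MISS_VAL.
for raw in (-999, -9999):
    scaled = load_lon(raw)
    print(raw, "->", scaled, "flagged as missing:", anomalise_missing(scaled))
```

The lesson is that a sentinel should be tested once, on the raw value, against every code in use. That test has to happen before any arithmetic touches the value.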

An interesting aside.. David was looking at the v3.00 precip to help National Geographic with an enquiry. I produced a second 'station' file with the 'honest' counts (see above) and he used that to mask out cells with a 0 count (ie that only had indirect data from 'nearby' stations). There were some odd results.. with certain months having data, and others being missing. After considerable debate and investigation, it was understood that anomdtb calculates normals on a monthly basis. So, where there are 7 or 8 missing values in each month (1961-1990), a station may end up contributing only in certain months of the year, throughout its entire run!
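Computing normals month by month has a side effect, shown in the minimal sketch below. Each calendar month gets its own 1961-1990 normal, and that normal exists only if enough years report. The 23-of-30 threshold is hypothetical and chosen purely for illustration, because the notes don't say what anomdtb actually requires. The function names are also our own.

```python
from typing import Dict, List, Optional, Tuple

Series = Dict[Tuple[int, int], Optional[float]]  # (year, month) -> value, None = missing


def monthly_normals(series: Series, start: int = 1961, end: int = 1990,
                    min_count: int = 23) -> List[Optional[float]]:
    """Per-calendar-month mean over start..end, or None if too few years report.

    min_count is a hypothetical threshold, not anomdtb's actual rule.
    """
    normals: List[Optional[float]] = []
    for month in range(1, 13):
        values = [series.get((y, month)) for y in range(start, end + 1)]
        present = [v for v in values if v is not None]
        normals.append(sum(present) / len(present) if len(present) >= min_count else None)
    return normals


def anomalies(series: Series, normals: List[Optional[float]]) -> Series:
    """Anomaly = value minus that month's normal; None where either is missing."""
    out: Series = {}
    for (year, month), value in series.items():
        normal = normals[month - 1]
        out[(year, month)] = None if value is None or normal is None else value - normal
    return out
```

Under that assumed threshold, consider a station missing 7 Januaries and 8 Februaries in 1961-1990. It gets a January normal but no February normal, so every February anomaly it has, across its whole record, comes out as None.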

The problem is that the synthetics are incorporated at 2.5-degrees, NO IDEA why, so saying they affect particular 0.5-degree cells is harder than it should be. So we'll just gloss over that entirely ;0)
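Some simple index arithmetic shows the resolution mismatch. Each 2.5-degree cell covers a 5 x 5 block of 25 half-degree cells, which is presumably why tying the synthetics to particular half-degree cells is awkward. This is a minimal sketch. It assumes a regular lat/lon grid anchored at (-90, -180), and the function names are our own, not anything from the CRU code.

```python
def cell_index(lat: float, lon: float, res: float) -> tuple:
    """(row, col) of the cell containing (lat, lon) on a res-degree grid
    whose origin is assumed to be the south-west corner (-90, -180)."""
    return int((lat + 90.0) // res), int((lon + 180.0) // res)


def half_degree_cells_in(row25: int, col25: int) -> list:
    """All 0.5-degree cells inside one 2.5-degree cell: a 5 x 5 block of 25."""
    k = 5  # 2.5 / 0.5
    return [(row25 * k + i, col25 * k + j) for i in range(k) for j in range(k)]
```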

ARGH. Just went back to check on synthetic production. Apparently - I have no memory of this at all - we're not doing observed rain days! It's all synthetic from 1990 onwards. So I'm going to need conditionals in the update program to handle that. And separate gridding before 1989. And what TF happens to station counts?

OH **** THIS. It's Sunday evening, I've worked all weekend, and just when I thought it was done I'm hitting yet another problem that's based on the hopeless state of our databases. There is no uniform data integrity, it's just a catalogue of issues that continues to grow as they're found.

Confidence in the fidelity of the Australian station in the database drastically reduced. Likelihood of invalid merging of Australian stations high...

I am very sorry to report that the rest of the databases seem to be in nearly as poor a state as Australia was. There are hundreds if not thousands of pairs of dummy stations, one with no WMO and one with, usually overlapping and with the same station name and very similar coordinates. I know it could be old and new stations, but why such large overlaps if that's the case? Aarrggghhh! There truly is no end in sight...
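A first pass at finding the "dummy" pairs Harry describes might look like the sketch below. It looks for stations with the same name and very similar coordinates, where one has a WMO code and one doesn't, and whose records overlap in time. Several details are assumptions: the Station record, the name normalisation, and the 8 km default, which is borrowed from the duplication distance mentioned earlier in the notes. As Harry says, a genuine old/new station pair would match too, so every hit needs checking by hand.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Station:  # hypothetical record layout
    wmo: Optional[int]
    name: str
    lat: float
    lon: float
    first_year: int
    last_year: int


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def suspect_pairs(stations: List[Station],
                  max_km: float = 8.0) -> List[Tuple[Station, Station, int]]:
    """Same-name pairs within max_km, one with a WMO code and one without,
    whose records overlap; returns (a, b, overlap_in_years)."""
    pairs = []
    for i, a in enumerate(stations):
        for b in stations[i + 1:]:
            if (a.wmo is None) == (b.wmo is None):
                continue  # need exactly one of the pair to lack a WMO code
            if a.name.strip().upper() != b.name.strip().upper():
                continue
            if haversine_km(a.lat, a.lon, b.lat, b.lon) > max_km:
                continue
            overlap = min(a.last_year, b.last_year) - max(a.first_year, b.first_year) + 1
            if overlap > 0:
                pairs.append((a, b, overlap))
    return pairs
```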

...I have the CLIMAT bulletin for 10/2006, which gives data for Rain Days (12 in this case). It doesn't seem likely that nothing was reported after 2002...

So.. should I really go to town (again) and allow the Master database to be 'fixed' by this program? Quite honestly I don't have time - but it just shows the state our data holdings have drifted into. Who added those two series together? When? Why? Untraceable, except anecdotally.

It's the same story for many other Russian stations, unfortunately - meaning that (probably) there was a full Russian update that did no data integrity checking at all. I just hope it's restricted to Russia!!

And now the June 2000 histograms are much more interesting! And of course (for this is THIS project), much more worrying. The June 2000 plot for the new data (3.00) shows a fall at VAP ->0. This is in contrast to the other three, which show a more exponential decline from a high near 0 (though admittedly the 2.10 version does have a second peak at around 120). In fact, the June 2000 3.00 series has peaks at ~90 and ~300! Oh, help.

The big question must be, why does it have so little representation in the low numbers? Especially given that I'm rounding erroneous negatives up to 1!!
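A quick check shows why Harry is puzzled. Setting erroneous negatives to 1 before binning should add counts to the lowest bin, not remove them. Here is a minimal sketch. The floor of 1 comes from the notes. The 10-unit bin width, the function names, and the sample values are illustrative assumptions, not the CRU vapour-pressure data.

```python
from collections import Counter
from typing import Dict, List


def clamp_negatives(values: List[float], floor: float = 1.0) -> List[float]:
    """Replace negative (erroneous) values with floor, as the notes describe doing."""
    return [floor if v < 0 else v for v in values]


def histogram(values: List[float], bin_width: float = 10.0) -> Dict[float, int]:
    """Counts per bin, keyed by each bin's lower edge."""
    counts = Counter(int(v // bin_width) for v in values)
    return {b * bin_width: counts[b] for b in sorted(counts)}
```

With made-up values such as [-5, -1, 3, 12, 95, 300], clamping moves the two negatives into the 0-10 bin. A dataset processed this way should therefore show more low-end representation, which makes the dip near zero in the 3.00 plot all the stranger.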

Oh, sod it. It'll do. I don't think I can justify spending any longer on a dataset, the previous version of which was completely wrong (misnamed) and nobody noticed for five years.

Looked at Bangladesh first. Here, the 1990s show a sudden drop that really can only be some stations having data a factor of 10 too low. This ties in with the WWR station data that DL added for 1991-2000, which apparently was prone to scaling issues. Wrote stnx10.for to scale a file of WWR Bangladesh records, then manually C&P'd the decade over the erroneous ones in the database. Also fixed country name from 'BNGLADESH'!
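In outline, the Bangladesh fix means finding a decade whose values are about a tenth of their neighbours' and multiplying it back up. The notes don't reproduce stnx10.for, so the Python below is only a hypothetical analogue with several assumptions. It uses annual values keyed by year, a crude decade-mean ratio as the detector, and a flat factor of 10 applied to 1991-2000.

```python
from typing import Dict, Optional

Series = Dict[int, Optional[float]]  # year -> value (None = missing)


def decade_mean(series: Series, start: int) -> Optional[float]:
    """Mean of the non-missing values for start..start+9, or None if all missing."""
    vals = [series.get(y) for y in range(start, start + 10)]
    present = [v for v in vals if v is not None]
    return sum(present) / len(present) if present else None


def scale_ratio(series: Series, suspect_start: int, ref_start: int) -> Optional[float]:
    """Suspect-decade mean over reference-decade mean; ~0.1 hints at a x10 error.
    Returns None if either mean is missing or the reference mean is zero."""
    s, r = decade_mean(series, suspect_start), decade_mean(series, ref_start)
    return s / r if s is not None and r else None


def scale_decade(series: Series, start: int = 1991, end: int = 2000,
                 factor: float = 10.0) -> Series:
    """Return a copy with start..end multiplied by factor; other years untouched."""
    return {y: (v * factor if v is not None and start <= y <= end else v)
            for y, v in series.items()}
```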

Case closed.
