Coins data now published: Please help us analyse it
This morning, the Treasury has released a tranche of data from its Coins (Combined Online Information System) database, which contains details of public spending across Whitehall.
This represents a reversal of the policy adopted by the Labour government, which in the past few months turned down freedom-of-information requests from me and others for access to this large dataset.
The Coins database became a symbolic target for open-government campaigners, and as shadow chancellor before the election George Osborne promised to release it as part of the Conservatives' transparency agenda.
Labour Treasury ministers had maintained that it would not be in the public interest to disclose this information. They argued [1.64Mb PDF] that potential misinterpretation of 23 million lines of raw and unvalidated data and a high volume of follow-up enquiries would be too disruptive to the Treasury's work.
Another factor Labour quoted was the "impenetrability of the information to a lay user". The new coalition government has tried to combat this by giving special advance access to the data to the Open Knowledge Foundation, which runs the Where Does My Money Go? website, so that it can help present the data in more easily intelligible formats.
There's a lot of information there (and some background here), and it's not easy to work out its significance. So please explore it yourself; my colleagues and I would be very grateful to know if you spot something you think is interesting or come up with good ways to analyse it. Please use the form at the bottom of this page. Thanks.
Update 1045: I wrote above that the Treasury has released a lot of data and that it's initially hard to assess its significance; it turns out that both are serious understatements.
The Open Knowledge Foundation has compiled a very useful browsable and searchable version, with its own "user's guide" to Coins.
Update 1130: This morning, I have also received a freedom-of-information response [570Kb PDF] from the Treasury which makes clear there are limits to the coalition government's open-data policy.
The Treasury is refusing to provide access to the current contents of the Coins database. The information it has provided today is historical data for 2008/9 and 2009/10, although the figures for the latter year are not yet complete. It is planning to issue previous years' data back to 2005/6 within the next fortnight.
However, the Treasury has told me today that it will not release data for current and future years, because this relates to the formulation of government policy, and some of it - for example, that relating to government trading funds - is also commercially sensitive. It argues that the material is exempt under the Freedom of Information Act because the balance of the public interest is against publication. It says that Coins data for 2010/11 will not be issued until June next year.
Last year George Osborne attacked Gordon Brown for not giving him access to Coins data. He was clearly referring to current data rather than historical information. In an interview with the BBC's political editor Nick Robinson, he said: "We will publish all this information, we will make it available to future oppositions".
Comments
Just because it's massive and difficult is no reason for it not to be published, so it's nice to see that the promise has been met :0
What I find interesting is the fact that they are being clever enough to use torrent files to share this data.
Complain about this comment
Why does the Treasury use the very uncommon terms MiB and GiB for the file sizes? Just about no-one uses these terms. Everyone uses the common, de facto terms MB and GB.
For info: 1GiB ≈ 1.074GB
Complain about this comment
"[Labour ministers] argued [1.64Mb PDF] that potential misinterpretation of 23 million lines of raw and unvalidated data and a high volume of follow-up enquiries would be too disruptive to the Treasury's work."
If this argument about the "impenetrability of the information to a lay user" and consequent "misinterpretations" of data is taken to its logical conclusion, findings of scientific research should not generally be available to MPs, because most MPs are not scientists and are therefore "lay users" who might misinterpret the data. I hope that even those Labour ministers would be able to spot the flaws in their argument.
But of course, that's not the point. It's information about how public money is spent, itself collected using public funds, and therefore the public is entitled to see it. End of argument. Governments could easily avoid misinterpretations of the data by ensuring that their own analyses of the data stand up to independent scrutiny. If they don't, then any disruption is fully deserved. And surely politicians, more than anyone, should realise that withholding of data always looks suspicious. (They should have been especially aware of this after the recent UEA e-mail theft, where just the perception that climate scientists had attempted to withhold raw data caused serious damage to their public reputation, even though the vast majority of the data was already freely available online, and the unreleased data was mostly not theirs to release because it was licensed from third-parties such as national meteorological organisations which wanted to make some profit out of that data.)
Oh well, it's a bit academic now, since Labour is no longer in government. Let's hope that this release of data is not a one-off, since the Conservatives themselves didn't seem too keen on FoI when they were last in government!
Complain about this comment
@ #2 SadButMadLad
MiB, GiB and so on are designed to refer to powers of 1024 (that is, 2 raised to multiples of 10), which is what a kilobyte and a gigabyte have historically referred to. However, the older prefixes are ambiguous, because in SI units mega and giga refer to powers of 1000, so a megabyte can mean either 1,000,000 bytes or 1,048,576 bytes.
So the likes of MiB and GiB have been introduced to remove the ambiguity: 1 MiB unambiguously means 1,048,576 bytes and 1 GiB unambiguously means 1,073,741,824 bytes, which gives you your 1.074. However, your assertion that 1GiB = 1.074GB is not necessarily true, for the aforementioned reason: in some contexts 1GB may in fact be 1,073,741,824 bytes, in others it may be 1,000,000,000.
As a computer scientist for some time, I don't really like the new terms either, if I'm honest. I've never met anyone who uses the SI prefixes to mean powers of 1000 rather than powers of 1024, at least not without explicitly saying so beforehand. The new terms seem to cause more confusion than the ambiguity of the old terms ever has!
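The arithmetic in this comment can be checked with a short sketch. The constant and function names here are illustrative only and do not come from the Treasury's files; the figures are simply the binary (powers of 1024) and SI (powers of 1000) definitions quoted above.

```python
# Byte counts under the two conventions discussed in the comment above.
MIB = 1024 ** 2    # 1 MiB, binary prefix: 1,048,576 bytes
GIB = 1024 ** 3    # 1 GiB, binary prefix: 1,073,741,824 bytes
MB_SI = 1000 ** 2  # 1 MB in the strict SI sense: 1,000,000 bytes
GB_SI = 1000 ** 3  # 1 GB in the strict SI sense: 1,000,000,000 bytes


def gib_to_si_gb(gib: float) -> float:
    """Convert a size in GiB to gigabytes, reading GB as 10**9 bytes."""
    return gib * GIB / GB_SI


print(f"1 MiB = {MIB:,} bytes")
print(f"1 GiB = {GIB:,} bytes")
print(f"1 GiB = {gib_to_si_gb(1):.3f} GB (SI)")
```

The 1.074 figure only holds if GB is read in the SI sense; if GB is read as 1024 ** 3 bytes, the ratio is exactly 1, which is the ambiguity the comment describes.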
Complain about this comment
Why don't you campaign for greater contract information as no one ever seems to know the details for 80%+ of the contracts they have (take a look at the various public contract registers and you'll find it fairly sparsely populated despite their obligations to make such information public!).
Complain about this comment
SadButMadLad wrote:
>Why does the Treasury use the very uncommon terms of MiB and GiB
>for the file sizes? Just about no-one uses these terms. Everyone
>uses the common and defacto terms of MB and GB.
>
>For info: 1GiB ≈ 1.074GB
MiB and MB are the same thing. MiB is preferred because "mega" means different things for data and for other quantities, and thus conflicts with SI units.
E.g.:
1 MB = 1024 kilobytes
1 Mm = 1000 kilometres
Complain about this comment
I have to say it is somewhat dismaying that as soon as I click on the treasury's links, they fail to open or inform me that "Windows cannot open" the type of file I am dealing with. I heard that these file types deviated from the norm, but I am very shocked to discover that they are wholly inaccessible...
Am I missing something or does anybody else have this problem?
Complain about this comment
@U-Man
I've been able to get the zipped files with no problem. I've not tried the BitTorrent links though.
Complain about this comment
Whilst it may be impenetrable, this isn't, strictly speaking, raw data. The departments and other bodies all have their own accounting systems, which are mapped to a centrally defined chart of accounts and uploaded into COINS.
It's all part of the dream of producing Whole of Government accounts for UK plc.
Has anyone found lines for Lloyds, RBS, HBOS etc. in there yet?
Complain about this comment
The reason that they will limit the data is threefold.
One, some spending is allocated and not yet spent. Despite several warnings, certain departments do not "give up" money that has yet to be spent; they carry it over to the following year and only tell the Treasury after the fact.
Two, agreed spending is not always shown in the year it was actually spent.
Three, certain monies are "hidden" by the current Government to pay for things that the public is not told about, but which are phrased as being in the public interest. The Treasury has a "slush fund", as do many departments, and this money is usually labelled as misc expenses.
Complain about this comment
Oh my goodness! Talk about complexity!
This morning, the Treasury has released a trench of data, and since this morning, that's where I've been sitting: in my trench looking up at this mountain of data.
I can see why the Labour government turned down freedom-of-information requests from everyone & anyone because each request would generate so much confusion and subsequent inquiries.
The only good thing is that George Osborne promised to release it, and George Osborne has released it. As for transparency, is there somewhere that I could take a course, a ladder to climb out of my trench? There is no public transparency here. I know this; the government knows this. So who are we kidding? Without training, COINS is not transparent to the lay person.
I agree with Labour's points:
1. 23 million lines of raw and unvalidated data and a high volume of follow-up enquiries would be too disruptive to the Treasury's work. Yep, I can see that.
2. Another factor Labour quoted was the "impenetrability of the information to a lay user". Yep, I can also see that.
As far as current data being released, it would be subsequent year's data before I found what I was looking for anyway.
In addition to the "Open Knowledge Foundation", I also find this information site useful:
http://www.info4local.gov.uk/
Complain about this comment
I have successfully downloaded the PDF "crib sheet" from the main COINS page, as well as the four zipped data TXT files from the COINS data page, using the "regular" zipped links rather than the BitTorrent ones.
Two early points noted from the process which are not clearly covered in the PDF are:
● The two larger "fact" tables are too large to be handled by FilZip [an excellent freeware zipper & unzipper] but can be extracted with the free evaluation version of WinZip.
● The text files themselves are in Unicode, which makes them twice the size they probably need to be, since I find it hard to believe that HM Treasury have needed non-Latin character sets since the Asian colonies were given their independence.
All the files can be opened - eventually - in Notepad or Word but I have not been patient enough to see them open in MSIE8. Knowing they're in Unicode should make life much easier if wanting to import them into a database.
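For anyone wanting to shrink the files before importing them, here is a minimal sketch of a re-encoding step. It assumes that "Unicode" here means UTF-16 with a byte-order mark (which is what Notepad calls "Unicode"); that is an assumption, not something the crib sheet is known to state. The function name and the file names in the usage comment are hypothetical.

```python
def utf16_to_utf8(src_path: str, dst_path: str,
                  chunk_chars: int = 1024 * 1024) -> None:
    """Re-encode a UTF-16 text file as UTF-8 without loading it all into memory.

    Assumption: the source file starts with a byte-order mark, so the
    "utf-16" codec can detect its byte order automatically.
    """
    # newline="" disables newline translation, so line endings are preserved.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)  # read up to chunk_chars characters
            if not chunk:
                break
            dst.write(chunk)


# Hypothetical usage; substitute the real name of the extracted file:
# utf16_to_utf8("fact_table_extract.txt", "fact_table_extract_utf8.txt")
```

For text that is almost entirely Latin characters, the UTF-8 copy should come out at roughly half the size of the UTF-16 original, which matches the "twice the size" observation above.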
Complain about this comment
@SadButMadLad: It is most likely a legacy system. The UK government have tried twice now to modernise the NHS computer system and failed both times, wasting millions of pounds.
The problem is that there are probably hundreds of computers spread out throughout the country, probably with specialised software accessing this database. It is not that easy to just change stuff :P
Complain about this comment
I can't download most of that stuff on the website because it is Bit Torrent... as one Government official said before the DEA "torrent is clearly only used to obtain illegal material".
Yet now they see the advantage of Bit Torrent. Hmmm.
Yes, sorry this was off-topic.
Complain about this comment
Talking about data overload.
It seems the former Labour administration were guilty of gross obfuscation as well as a desire to prevent freedom of information requests.
As John Redwood has pointed out in his Blog, the Red Book used to be 47 pages deep detailing the profit and loss of government spending within Whitehall Departments. Under Labour the Red Book has grown to become in excess of 2000 pages with much party political point scoring embedded within it.
Hopefully George Osborne and the Coalition will also decrease this assault upon the senses and achieve a declaration of reality, rather than excuses and cover-up for overspend or wasteful spending by the government of the day.
Complain about this comment
Rosslyn Analytics, a London-based technology company that specializes in enabling organisations to quickly and easily obtain spend visibility, has launched a dedicated portal that gives the general public the ability to view the UK government’s recently published public sector data from COINS. This portal can be found at https://rapidgateway.rapidintel.com.
Complain about this comment
This data really isn't difficult to interpret AT ALL. Despite the full files running to several gigabytes, the actual information contained within them runs to only a few tens of megabytes, which is well within reach of anyone with a handle on Excel, and it is very high level and easy to understand. While the Guardian has done a reasonable job, I decided to set up a website to allow people to download and explore this data in a better format, at www.publicspendingdata.co.uk. Anyone should be able to analyse the easy-to-download CSV/HTML files. No sign-up needed. No ads. Just data.
Complain about this comment