
Re: [uk-government-data-developers] Data Dumps at source.data.gov.uk

From: Chris Beer <chris@e-beer.net.au>
Date: Tue, 24 Aug 2010 23:42:32 +1000
Message-Id: <BF8542AB-14E4-4E7E-9C07-8A8DE4764AA3@e-beer.net.au>
Cc: public-egov-ig <public-egov-ig@w3.org>
To: Ed Summers <ehs@pobox.com>
I wish we could have nice things over here too...

(Seriously - thanks Ed and thanks Leigh - great links, good info :) )

Chris Beer (iPhone)

On 24/08/2010, at 23:21, Ed Summers <ehs@pobox.com> wrote:

> This post by Leigh Dodds to the uk-government-data-developers list
> about source.data.gov.uk should be of potential interest. It's great
> to see a methodical approach to making data dumps for data.gov.uk
> available.
> 
> //Ed
> 
> ---------- Forwarded message ----------
> From: Leigh Dodds <leigh.dodds@talis.com>
> Date: Tue, Aug 24, 2010 at 8:52 AM
> Subject: [uk-government-data-developers] Data Dumps at source.data.gov.uk
> To: uk-government-data-developers
> <uk-government-data-developers@googlegroups.com>
> 
> Hi,
> 
> I've just put together an initial set of data dumps for the majority
> of the Linked Data currently being published by data.gov.uk. More
> information on what's not included and why in a moment.
> 
> (Disclaimer: what follows is my understanding of the current state of
> play, so blame me for any errors or omissions :)
> 
> 
> THE REPOSITORY
> 
> There is a server at http://source.data.gov.uk which has been set up
> to provide access to both data dumps and (eventually) the code used to
> generate/convert the data. The data dumps can be found at:
> 
> http://source.data.gov.uk/data/
> 
> The intention is to create a repository of versioned datasets that
> will allow anyone to mirror the data for their own use/purposes, e.g.
> to perform local analysis or to host in your own triple store. Over
> time this repository should become a complete archival copy of all of
> the Linked Data that is published through data.gov.uk, complete with
> information on the provenance of individual datasets.
> 
> The team behind data.gov.uk are still working through a number of the
> best practices, so right now I've simply put up copies of all the
> currently live datasets.
> 
> 
> HOW THE DATA IS ORGANISED
> 
> The web archive is organised into a series of sub-directories:
> 
> * Sector - top-level sector, e.g. as used in *.data.gov.uk
> * Dataset - dataset directory, a short identifier for the dataset.
> I've made some of these up for now
> * Date-stamped directory - in the format yyyy-mm-dd
> * Data files - there may be a number of data files in different
> formats. E.g. the data may span a number of small files; some files may
> be N-Triples for loading into the default graph and some files may be
> N-Quads.
> 
> For example, the RDF version of Edubase currently available from
> http://education.data.gov.uk can be found here:
> 
> http://source.data.gov.uk/data/education/edubase/2009-08-14/
> 
> with the general pattern being:
> 
> http://source.data.gov.uk/data/[sector]/[dataset]/[timestamp]/
> 
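A minimal Python sketch of the URL pattern above, in case it helps anyone
scripting a mirror. The function name dump_url is made up for illustration;
only the base URL and the [sector]/[dataset]/[timestamp] layout come from
Leigh's message, and the date stamp is assumed to be yyyy-mm-dd as described.

```python
from datetime import date

# Base URL of the data dumps, as given in the message.
BASE = "http://source.data.gov.uk/data/"


def dump_url(sector: str, dataset: str, stamp: date) -> str:
    """Build the directory URL for one dated dump of a dataset.

    Hypothetical helper: it only applies the
    [sector]/[dataset]/[timestamp] pattern and does not check
    that the directory actually exists on the server.
    """
    return f"{BASE}{sector}/{dataset}/{stamp.isoformat()}/"


# The Edubase example from the message.
print(dump_url("education", "edubase", date(2009, 8, 14)))
# prints http://source.data.gov.uk/data/education/edubase/2009-08-14/
```
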
> Currently only the latest version of each dataset is being loaded
> into the live SPARQL endpoints, but over time there will be a move
> towards using named graphs for versioning (as described at [1]).
> 
> 
> LINKED DATA, DATA DUMPS & SERVICES
> 
> The sector identifier ties together the Linked Data, the data dumps,
> and the SPARQL endpoints and other services. For example if you're
> looking at some Linked Data, e.g.:
> 
> http://education.data.gov.uk/id/school/100866
> 
> Then this data will be included in the SPARQL endpoint at:
> 
> http://services.data.gov.uk/education/sparql
> 
> The search interface at:
> 
> http://services.data.gov.uk/education/search
> 
> And the raw data can be found in one (or more) of the datasets accessible from:
> 
> http://source.data.gov.uk/data/education/
> 
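The mapping described above can be sketched in Python as follows. This
assumes the sector is always the leading label of a *.data.gov.uk hostname
and that every sector follows the same layout as the education example;
neither is confirmed beyond that one example, and the function names are
invented for illustration.

```python
from urllib.parse import urlparse

SUFFIX = ".data.gov.uk"


def sector_of(uri: str) -> str:
    """Return the sector part of a Linked Data URI's hostname.

    Assumption: the sector is whatever precedes ".data.gov.uk".
    Infrastructure hosts such as services.data.gov.uk would also
    pass this check, so callers should filter those themselves.
    """
    host = urlparse(uri).hostname or ""
    if not host.endswith(SUFFIX):
        raise ValueError(f"not a *.data.gov.uk URI: {uri}")
    return host[: -len(SUFFIX)]


def related_urls(uri: str) -> dict:
    """Guess the SPARQL, search and dump URLs for a Linked Data URI."""
    sector = sector_of(uri)
    return {
        "sparql": f"http://services.data.gov.uk/{sector}/sparql",
        "search": f"http://services.data.gov.uk/{sector}/search",
        "dumps": f"http://source.data.gov.uk/data/{sector}/",
    }


for name, url in related_urls("http://education.data.gov.uk/id/school/100866").items():
    print(name, url)
```
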
> 
> WHAT IS NOT INCLUDED?
> 
> As I explained at the start of this email, not all of the Linked Data
> being published from data.gov.uk, or by the UK government, is currently
> represented in these data dumps.
> 
> The RDF available from legislation.gov.uk is currently only
> available as Linked Data because it's surfaced directly from the
> website. Ditto the RDFa published from the London Gazette website.
> It would be possible to regularly crawl and dump those sources,
> but I'm not sure if there are plans to do that yet. Other departments
> and projects may also surface their own data and data dumps.
> 
> The other dataset not represented in the dumps is the set of
> date-time URIs available from reference.data.gov.uk, e.g. [2], as
> these are all algorithmically generated. I don't recommend anyone
> crawls those :)
> 
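To illustrate why crawling the date-time URIs is unnecessary, here is a
rough Python sketch that computes day URIs locally. The /id/day/yyyy-mm-dd
pattern is inferred from the single example at [2]; other granularities
(years, months, and so on) are not covered, and the function names are
hypothetical.

```python
from datetime import date, timedelta

# Prefix inferred from the example URI at [2]; an assumption, not a spec.
DAY_PREFIX = "http://reference.data.gov.uk/id/day/"


def day_uri(d: date) -> str:
    """Return the (assumed) reference URI for a single day."""
    return DAY_PREFIX + d.isoformat()


def day_uris(start: date, count: int):
    """Yield URIs for `count` consecutive days starting at `start`."""
    for offset in range(count):
        yield day_uri(start + timedelta(days=offset))


for uri in day_uris(date(2010, 9, 24), 3):
    print(uri)
```
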
> If you have any questions, please ask.
> 
> Cheers,
> 
> L.
> 
> [1]. http://www.jenitennison.com/blog/node/141
> [2]. http://reference.data.gov.uk/id/day/2010-09-24
> 
> --
> Leigh Dodds
> Programme Manager, Talis Platform
> Talis
> leigh.dodds@talis.com
> http://www.talis.com
> 
Received on Tuesday, 24 August 2010 13:42:32 GMT
