
Re: [uk-government-data-developers] Data Dumps at source.data.gov.uk

From: Leigh Dodds <leigh.dodds@talis.com>
Date: Tue, 24 Aug 2010 19:04:15 +0100
Message-ID: <AANLkTi=C-8omPgxyz5YzfroAh0uYsYVHQOOR4B_qH9k6@mail.gmail.com>
To: Mike Norton <xsideofparadise@yahoo.com>
Cc: Chris Beer <chris@e-beer.net.au>, Ed Summers <ehs@pobox.com>, public-egov-ig <public-egov-ig@w3.org>

Strictly speaking is the equivalent of:


Not all of data.gov.uk is available as RDF. There's no single
directory of the raw files but there are typically pointers from the
dataset directory to download locations.

What we announced today was raw data dumps for the Linked Data
published by the project.



On 24 August 2010 18:34, Mike Norton <xsideofparadise@yahoo.com> wrote:
> Yes, thanks guys:  would http://www.data.gov/catalog/raw be the
> equivalent to these Dumps here in the States?
> Michael A. Norton
> ________________________________
> From: Chris Beer <chris@e-beer.net.au>
> To: Ed Summers <ehs@pobox.com>
> Cc: public-egov-ig <public-egov-ig@w3.org>
> Sent: Tue, August 24, 2010 6:42:32 AM
> Subject: Re: [uk-government-data-developers] Data Dumps at
> source.data.gov.uk
> I wish we could have nice things over here too...
> (Seriously - thanks Ed and thanks Leigh - great links, good info :) )
> Chris Beer (iPhone)
> On 24/08/2010, at 23:21, Ed Summers <ehs@pobox.com> wrote:
>> This post by Leigh Dodds to the uk-government-data-developers list
>> about source.data.gov.uk should be of potential interest. It's great
>> to see a methodical approach to making data dumps for data.gov.uk
>> available.
>> //Ed
>> ---------- Forwarded message ----------
>> From: Leigh Dodds <leigh.dodds@talis.com>
>> Date: Tue, Aug 24, 2010 at 8:52 AM
>> Subject: [uk-government-data-developers] Data Dumps at source.data.gov.uk
>> To: uk-government-data-developers
>> <uk-government-data-developers@googlegroups.com>
>> Hi,
>> I've just put together an initial set of data dumps for the majority
>> of the Linked Data currently being published by data.gov.uk. More
>> information on what's not included and why in a moment.
>> (Disclaimer: what follows is my understanding of the current state of
>> play, so blame me for any errors/omissions :) )
>> There is a server at http://source.data.gov.uk which has been set up
>> to provide access to both data dumps and (eventually) the code used to
>> generate/convert the data. The data dumps can be found at:
>> http://source.data.gov.uk/data/
>> The intention is to create a repository of versioned datasets that
>> will allow anyone to mirror the data for their own use/purposes, e.g.
>> to perform local analysis or to host in your own triple store. Over
>> time this repository should become a complete archival copy of all of
>> the Linked Data that is published through data.gov.uk, complete with
>> information on the provenance of individual datasets.
>> The team behind data.gov.uk are still working through a number of the
>> best practices, so right now I've simply put up copies of all the
>> currently live datasets.
>> The web archive is organised into a series of sub-directories:
>> * Sector: top-level sector, e.g. as used in *.data.gov.uk
>> * Dataset: dataset directory, a short identifier for the dataset.
>> I've made some of these up at present
>> * Date-stamped directory: in the format yyyy-mm-dd.
>> * Data files: there may be a number of data files in different
>> formats. E.g. the data may span a number of small files, some files may
>> be N-Triples for loading into the default graph, and some files may be
>> N-Quads.
>> For example, the RDF version of Edubase currently available from
>> http://education.data.gov.uk can be found here:
>> http://source.data.gov.uk/data/education/edubase/2009-08-14/
>> with the general pattern being:
>> http://source.data.gov.uk/data/[sector]/[dataset]/[timestamp]/
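The directory pattern above can be sketched in Python as follows. This is only an illustration: the function name `dump_url` and its input checks are assumptions made for this example, not part of any data.gov.uk tooling. Only the base URL and the [sector]/[dataset]/[date] layout come from the message.

```python
from datetime import date

# Base URL of the dump repository, as given in the message.
BASE = "http://source.data.gov.uk/data/"


def dump_url(sector: str, dataset: str, stamp: date) -> str:
    """Build the directory URL for one date-stamped dump (hypothetical helper)."""
    for part in (sector, dataset):
        # Reject empty segments or segments that would change the path depth.
        if not part or "/" in part:
            raise ValueError(f"invalid path segment: {part!r}")
    # date.isoformat() yields yyyy-mm-dd, matching the date-stamped directories.
    return f"{BASE}{sector}/{dataset}/{stamp.isoformat()}/"


if __name__ == "__main__":
    print(dump_url("education", "edubase", date(2009, 8, 14)))
```

Called with the Edubase values quoted above, this reproduces the example directory URL. Whether a given directory actually exists still has to be checked against the server.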
>> Currently only the latest versions of each dataset are being loaded
>> into the live SPARQL endpoints, but over time there will be a move
>> towards using named graphs for versioning (as described at [1]).
>> The sector identifier ties together the Linked Data, the data dumps,
>> and the SPARQL endpoints and other services. For example if you're
>> looking at some Linked Data, e.g.:
>> http://education.data.gov.uk/id/school/100866
>> Then this data will be included in the SPARQL endpoint at:
>> http://services.data.gov.uk/education/sparql
>> The search interface at:
>> http://services.data.gov.uk/education/search
>> And the raw data can be found in one (or more) of the datasets accessible
>> from:
>> http://source.data.gov.uk/data/education/
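As a rough sketch of how the sector identifier links these locations, the following Python derives the three URLs from a Linked Data URI. The function name `related_services` and the dictionary keys are assumptions for illustration. The URL patterns are taken from the education example above, and it is an assumption that other sectors follow the same layout. Not every *.data.gov.uk host necessarily has matching services.

```python
from urllib.parse import urlparse

SUFFIX = ".data.gov.uk"


def related_services(linked_data_uri: str) -> dict:
    """Map a sector-hosted Linked Data URI to its related URLs (hypothetical helper)."""
    host = urlparse(linked_data_uri).hostname or ""
    if not host.endswith(SUFFIX):
        raise ValueError(f"not a *.data.gov.uk URI: {linked_data_uri!r}")
    # The sector is the single host label in front of ".data.gov.uk".
    sector = host[: -len(SUFFIX)]
    if not sector or "." in sector:
        raise ValueError(f"cannot determine sector from host {host!r}")
    return {
        "sparql": f"http://services.data.gov.uk/{sector}/sparql",
        "search": f"http://services.data.gov.uk/{sector}/search",
        "dumps": f"http://source.data.gov.uk/data/{sector}/",
    }


if __name__ == "__main__":
    for name, url in related_services(
        "http://education.data.gov.uk/id/school/100866"
    ).items():
        print(name, url)
```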
>> As I explained at the start of this email, not all of the Linked Data
>> being published from data.gov.uk, or by the UK government, is currently
>> represented in these data dumps.
>> The RDF available from legislation.gov.uk is currently only
>> available as Linked Data because it's surfaced directly from the
>> website. Ditto the data published from the London Gazette website as
>> RDFa. It would be possible to regularly crawl and dump those sources,
>> but I'm not sure if there are plans to do that yet. Other departments
>> and projects may also surface their own data and data dumps.
>> The other dataset that is not represented in the dumps is the set of
>> date-time URIs available from reference.data.gov.uk, e.g. [2], as
>> these are all algorithmically generated. I don't recommend anyone
>> crawls those :)
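Because these URIs are generated algorithmically, a client can compute them rather than crawl for them. The Python below is a minimal sketch that assumes only the day pattern shown in [2]. The function names `day_uri` and `day_uris` are made up for this example, and no other URI patterns on reference.data.gov.uk are assumed.

```python
from datetime import date, timedelta
from typing import Iterator

# Prefix taken from the example URI in [2].
DAY_PREFIX = "http://reference.data.gov.uk/id/day/"


def day_uri(d: date) -> str:
    """Return the day URI for a calendar date, following the pattern in [2]."""
    return DAY_PREFIX + d.isoformat()


def day_uris(start: date, count: int) -> Iterator[str]:
    """Yield day URIs for `count` consecutive days beginning at `start`."""
    for offset in range(count):
        yield day_uri(start + timedelta(days=offset))


if __name__ == "__main__":
    for uri in day_uris(date(2010, 9, 24), 3):
        print(uri)
```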
>> If you have any questions, please ask.
>> Cheers,
>> L.
>> [1]. http://www.jenitennison.com/blog/node/141
>> [2]. http://reference.data.gov.uk/id/day/2010-09-24
>> --
>> Leigh Dodds
>> Programme Manager, Talis Platform
>> Talis
>> leigh.dodds@talis.com
>> http://www.talis.com

Leigh Dodds
Programme Manager, Talis Platform
Received on Tuesday, 24 August 2010 18:04:49 UTC
