- From: Mike Norton <xsideofparadise@yahoo.com>
- Date: Tue, 24 Aug 2010 10:34:02 -0700 (PDT)
- To: Chris Beer <chris@e-beer.net.au>, Ed Summers <ehs@pobox.com>, public-egov-ig <public-egov-ig@w3.org>
- Message-ID: <969869.50138.qm@web82405.mail.mud.yahoo.com>
Yes, thanks guys: would http://www.data.gov/catalog/raw be the equivalent to these Dumps here in the States?

Michael A. Norton

________________________________
From: Chris Beer <chris@e-beer.net.au>
To: Ed Summers <ehs@pobox.com>
Cc: public-egov-ig <public-egov-ig@w3.org>
Sent: Tue, August 24, 2010 6:42:32 AM
Subject: Re: [uk-government-data-developers] Data Dumps at source.data.gov.uk

I wish we could have nice things over here too...

(Seriously - thanks Ed and thanks Leigh - great links, good info :) )

Chris Beer (iPhone)

On 24/08/2010, at 23:21, Ed Summers <ehs@pobox.com> wrote:

> This post by Leigh Dodds to the uk-government-data-developers list
> about source.data.gov.uk should be of potential interest. It's great
> to see a methodical approach to making data dumps for data.gov.uk
> available.
>
> //Ed
>
> ---------- Forwarded message ----------
> From: Leigh Dodds <leigh.dodds@talis.com>
> Date: Tue, Aug 24, 2010 at 8:52 AM
> Subject: [uk-government-data-developers] Data Dumps at source.data.gov.uk
> To: uk-government-data-developers
> <uk-government-data-developers@googlegroups.com>
>
> Hi,
>
> I've just put together an initial set of data dumps for the majority
> of the Linked Data currently being published by data.gov.uk. More
> information on what's not included and why in a moment.
>
> (Disclaimer: what follows is my understanding of the current state of
> play, so any errors/omissions then blame me :)
>
>
> THE REPOSITORY
>
> There is a server at http://source.data.gov.uk which has been set up
> to provide access to both data dumps and (eventually) the code used to
> generate/convert the data. The data dumps can be found at:
>
> http://source.data.gov.uk/data/
>
> The intention is to create a repository of versioned datasets that
> will allow anyone to mirror the data for their own use/purposes, e.g.
> to perform local analysis or to host in your own triple store.
> Over time this repository should become a complete archival copy of
> all of the Linked Data that is published through data.gov.uk, complete
> with information on the provenance of individual datasets.
>
> The team behind data.gov.uk are still working through a number of the
> best practices, so right now I've simply put up copies of all the
> currently live datasets.
>
>
> HOW THE DATA IS ORGANISED
>
> The web archive is organised into a series of sub-directories:
>
> * Sector - top-level sector, e.g. as used in *.data.gov.uk
> * Dataset - dataset directory, a short identifier for the dataset.
>   I've made some of these up at present
> * Date-stamped directory - in the format yyyy-mm-dd.
> * Data files - This may be a number of data files in different
>   formats. E.g. the data may span a number of small files, some files
>   may be ntriples for loading into the default graph and some files
>   may be nquads.
>
> For example, the RDF version of Edubase currently available from
> http://education.data.gov.uk can be found here:
>
> http://source.data.gov.uk/data/education/edubase/2009-08-14/
>
> with the general pattern being:
>
> http://source.data.gov.uk/data/[sector]/[dataset]/[timestamp]/
>
> Currently only the latest versions of each dataset are being loaded
> into the live SPARQL endpoints, but over time there will be a move
> towards using named graphs for versioning (as described at [1]).
>
>
> LINKED DATA, DATA DUMPS & SERVICES
>
> The sector identifier ties together the Linked Data, the data dumps,
> and the SPARQL endpoints and other services.
> For example, if you're looking at some Linked Data, e.g.:
>
> http://education.data.gov.uk/id/school/100866
>
> Then this data will be included in the SPARQL endpoint at:
>
> http://services.data.gov.uk/education/sparql
>
> The search interface at:
>
> http://services.data.gov.uk/education/search
>
> And the raw data can be found in one (or more) of the datasets
> accessible from:
>
> http://source.data.gov.uk/data/education/
>
>
> WHAT IS NOT INCLUDED?
>
> As I explained at the start of this email, not all of the Linked Data
> being published from data.gov.uk, or the UK government, is currently
> represented in these data dumps.
>
> The RDF available from legislation.gov.uk is currently only
> available as Linked Data because it's surfaced directly from the
> website. Ditto that published from the London Gazette website as
> RDFa. It would be possible to regularly crawl and dump those sources,
> but I'm not sure if there are plans to do that yet. Other departments
> and projects may also surface their own data and data dumps.
>
> The other dataset that is not represented in the dumps is the set of
> date-time URIs available from reference.data.gov.uk, e.g. [2], as
> these are all algorithmically generated. I don't recommend anyone
> crawls those :)
>
> Any questions then please ask.
>
> Cheers,
>
> L.
>
> [1]. http://www.jenitennison.com/blog/node/141
> [2]. http://reference.data.gov.uk/id/day/2010-09-24
>
> --
> Leigh Dodds
> Programme Manager, Talis Platform
> Talis
> leigh.dodds@talis.com
> http://www.talis.com
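
For anyone wanting to script against the layout Leigh describes, here is a rough Python sketch of the URL conventions in the forwarded message. It is only an illustration: the function names are made up, and it assumes that the sector is always the host prefix before .data.gov.uk and that every sector exposes the same /sparql and /search paths as the education example. Hosts such as reference.data.gov.uk would pass the check but, per the message, have no dumps.

```python
from datetime import date
from urllib.parse import urlparse

# Base URLs taken from the forwarded message.
SOURCE_BASE = "http://source.data.gov.uk/data"
SERVICES_BASE = "http://services.data.gov.uk"
DOMAIN_SUFFIX = ".data.gov.uk"


def dump_url(sector, dataset, stamp):
    """Build a dump directory URL: /data/[sector]/[dataset]/[yyyy-mm-dd]/."""
    return f"{SOURCE_BASE}/{sector}/{dataset}/{stamp.isoformat()}/"


def sector_of(linked_data_uri):
    """Guess the sector from a Linked Data URI (assumes host is [sector].data.gov.uk)."""
    host = urlparse(linked_data_uri).hostname or ""
    if not host.endswith(DOMAIN_SUFFIX):
        raise ValueError(f"not a *.data.gov.uk URI: {linked_data_uri}")
    return host[: -len(DOMAIN_SUFFIX)]


def sector_urls(sector):
    """Related endpoints for a sector (assumes the education layout applies to all)."""
    return {
        "sparql": f"{SERVICES_BASE}/{sector}/sparql",
        "search": f"{SERVICES_BASE}/{sector}/search",
        "dumps": f"{SOURCE_BASE}/{sector}/",
    }


if __name__ == "__main__":
    sector = sector_of("http://education.data.gov.uk/id/school/100866")
    print(sector_urls(sector))
    print(dump_url(sector, "edubase", date(2009, 8, 14)))
```

Nothing here touches the network; it only builds strings, so whether a given sector or dataset directory actually exists still has to be checked against http://source.data.gov.uk/data/.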
Received on Tuesday, 24 August 2010 17:34:39 UTC