Re: UK Govt RDF Data Sets from Kingsley Idehen on 2010-04-25 (public-lod@w3.org from April 2010)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Sun, 25 Apr 2010 14:02:43 -0400
To: Jeni Tennison <jeni@jenitennison.com>
CC: public-lod community <public-lod@w3.org>
Message-ID: <4BD483C3.2040905@openlinksw.com>

Jeni Tennison wrote:
> Kingsley,
>
> On 15 Apr 2010, at 23:19, Kingsley Idehen wrote:
>> Do you have any idea as to the whereabouts of RDF data sets for the 
>> SPARQL endpoints associated with data.gov.uk? As you can imagine, I 
>> haven't opted to crawl your endpoints for the data, bearing in mind the 
>> LOD community ethos, i.e., publish dataset dump locations for SPARQL 
>> endpoints that host Linked Open Data. This best practice was devised 
>> with SPARQL endpoint crawling in mind.
>
>
> We do absolutely recognise this requirement. The basic model that we 
> have in mind is that the ask of individual departments or other public 
> bodies is to publish their data as RDF and either provide dumps of 
> that data or enable people to crawl it. We would also expect them to 
> provide feeds to notify the users of that data of updates to their 
> datasets, possibly through a PubSubHubbub hub, though we haven't 
> worked out the details of what that looks like yet and we'd very much 
> value input on that as we begin to firm that up.
>
> We would then expect many curated triplestores to load in that data; 
> some of these will provide SPARQL endpoints from data.gov.uk, some 
> from other .gov.uk domains, and some from third parties.
>
> As I think Ian said, we are now in a purdah period during which it's 
> difficult for us to release anything new, but you can expect progress 
> during the second week of May and on into the future.
>
> Cheers,
>
> Jeni
Jeni,

There is one thing regarding the above that I have repeatedly been unable 
to reconcile in my head.

If data provenance is the key concern behind the RDF dump releases, 
doesn't the same issue apply to CONSTRUCT- or DESCRIBE-style crawls 
against the published endpoints? That is exactly the pattern exhibited 
by some user agents that hit the DBpedia endpoint (as described in the 
"DBpedia Endpoint Burden" post).

What makes a SPARQL endpoint safer than an RDF dump in this regard?

-- 

Regards,

Kingsley Idehen	      
President & CEO 
OpenLink Software     
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Received on Sunday, 25 April 2010 18:03:14 UTC