
Re: Proposal to assess the quality of Linked Data sources

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Sun, 27 Feb 2011 14:53:25 -0500
Message-ID: <4D6AABB5.5030604@openlinksw.com>
To: Hugh Glaser <hg@ecs.soton.ac.uk>
CC: Bernard Vatant <bernard.vatant@mondeca.com>, Annika Flemming <annika.flemming@gmx.de>, "<public-lod@w3.org>" <public-lod@w3.org>, Bob Ferris <zazi@elbklang.net>
On 2/26/11 8:56 PM, Hugh Glaser wrote:
> On 26 Feb 2011, at 02:22, Kingsley Idehen wrote:
>
>>> On 25 Feb 2011, at 23:00, Kingsley Idehen wrote:
>>>
>>>> ...
> <snip>
>>>> Why not the actual link coefficient from an LOD Cloud cache instance? That at least shows what's being used :-)
>>> There is no LOD Cloud cache instance as far as I can tell.
>> Okay, you might not see it as a LOD Cloud cache. How about a massive 13B strong live instance [1] with as much Linked Data as we can get our hands on?
> Fine with me.
>> There's good sampling there since you can use Entity Ranking to analyze usage.
>>> So any attempt to infer data from something that claimed to be would be misleading.
>> Not in my eyes, but we can agree to disagree as we've done in the past re. this matter :-)
> I know.
> And previously you have agreed to stop calling it a "LOD Cloud cache", since it isn't.
> And then you slip back into it, and then we have one of these discussions :-)

No, I've always referred to lod.openlinksw.com as "the LOD Cloud cache 
we maintain". And I am not going to stop. I know what it's about, and I 
know why I've chosen that moniker.

> Of course, it is a great resource, both technically and in terms of maintenance, and very useful.
> A powerful service to the community.
> And might be just what Annika wants.
>
> But there are, for example, wrapper (and aleph-0 size) sets.
> So people who come to your site and expect to find the URIs from them will be seriously misled.

Well, if they SPARQL any de-referenceable URI they will get a result, as 
long as Sponging is enabled :-) In the worst case, when someone is 
interested, we just show them how.

If a URI is de-referenceable, it will get into the LOD Cloud cache 
instance via one of the following routes:

1. Explicit SPARQL + Sponger crawl query
2. Crawler Job
3. Bulk Uploads
4. Syncs with URIBurner and PTSW.

This is about smart cache invalidation schemes combined with SPARQL.
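
As a rough illustration of route 1, here is a minimal Python sketch that 
builds such a query and the matching request URL. It assumes Virtuoso's 
`define get:soft` Sponger pragma and a `/sparql` path on 
lod.openlinksw.com; both function names are hypothetical, and whether a 
given endpoint allows Sponging depends on how it is configured.

```python
from urllib.parse import urlencode

# Assumption: the thread names lod.openlinksw.com; the /sparql path is a guess.
ENDPOINT = "http://lod.openlinksw.com/sparql"


def build_sponger_query(uri: str, limit: int = 10) -> str:
    """Build a SPARQL query that asks the server to fetch `uri` first.

    The `get:soft "replace"` pragma is a Virtuoso extension, not standard
    SPARQL; it only has an effect when Sponging is enabled on the server.
    """
    # Reject characters that would break out of the <...> IRI reference.
    if any(ch in uri for ch in '<>" \n\t'):
        raise ValueError(f"not a safe IRI: {uri!r}")
    return (
        'DEFINE get:soft "replace"\n'
        f"SELECT ?s ?p ?o FROM <{uri}>\n"
        f"WHERE {{ ?s ?p ?o }} LIMIT {int(limit)}"
    )


def build_request_url(uri: str, endpoint: str = ENDPOINT) -> str:
    """Encode the query as a SPARQL Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": build_sponger_query(uri)})


if __name__ == "__main__":
    # Hypothetical target URI, used only to show the shape of the query.
    print(build_sponger_query("http://example.org/thing"))
```

The sketch only constructs the request; sending it is left out, so 
nothing here depends on the endpoint being reachable.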

We don't obscure source URIs. Even if we make Proxy URIs, the graph will 
always expose the original de-referenceable Entity URIs (where such 
exist) and expose URLs re. the most basic provenance metadata (amongst 
other things expressed in HTTP metadata graphs). We also use the 
Provenance Ontology for additional clarity about all of this.

> In this sense, it is like caching the web, with all the problems of the hidden web.
>
> As I say, great work, but please don't mislead people by suggesting that it has all the URIs in the Cloud.

Not misleading anyone. I hope you understand what we are doing with 
additional clarity based on the above :-)

Kingsley


-- 

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Sunday, 27 February 2011 19:53:57 UTC
