Re: Size matters -- How big is the danged thing

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Sat, 22 Nov 2008 12:09:30 -0500
Message-ID: <49283CCA.2050905@openlinksw.com>
CC: public-lod@w3.org

David Wood wrote:
> On Nov 22, 2008, at 11:11 AM, Richard Cyganiak wrote:
>> On 21 Nov 2008, at 22:30, Yves Raimond wrote:
>>> On Fri, Nov 21, 2008 at 8:08 PM, Giovanni Tummarello
>>> <giovanni.tummarello@deri.org <mailto:giovanni.tummarello@deri.org>> 
>>> wrote:
>>>> IMO, considering MySpace's 12 billion triples as part of LOD is quite
>>>> a stretch (same with other wrappers), unless they are provided by the
>>>> entity itself. (E.g. I WOULD count in the LiveJournal FOAF files; on
>>>> the other hand, OK, they're not linked, but they're no less useful than
>>>> the MySpace wrapper, are they? In fact they are linked quite well if
>>>> you use the Google Social API.)
>>> Actually, I don't think I can agree with that. Whether we want it or
>>> not, most of the data we publish (all of it, apart from specific cases
>>> e.g. review) is provided by wrappers of some sort, e.g. Virtuoso, D2R,
>>> P2R, web service wrappers, etc. Hence, it makes no sense to try to
>>> distinguish datasets on the basis of whether they're published through
>>> a "wrapper" or not.
>>> Within LOD, we only segregate datasets for inclusion in the diagram on
>>> the basis they are published according to linked data principles. The
>>> stats I sent reflect just that: some stats about the datasets
>>> currently in the diagram.
>>> The origin of the data shouldn't matter. The fact that it is published
>>> according to linked data principles and linked to at least one dataset
>>> in the cloud should matter.
>> I think this view is too simplistic.
>> I think what Giovanni and others mean when they try to distinguish 
>> “wrappers” from other kinds of LOD sites is not about the 
>> implementation technology. It's not about whether the data comes from 
>> a triple store or RDBMS or flat files or REST APIs or whatever.
>> It's about licenses and rights.
>> If I wrap an information service provided by a third party into a 
>> linked data interface, then I had better make sure that the terms 
>> of service permit this, and that no copyright laws are violated.
>> There are some sites in the LOD cloud that, as far as I can tell, 
>> violate the TOS of the originating service. The MySpace wrapper and 
>> the RDF Book Mashup are maybe the clearest examples. Others are in 
>> the grey area.
>> This is always an issue when party A wraps a service provided by 
>> party B. I think it's reasonable to treat all these datasets with 
>> extra caution, unless A has provided a clear argument and 
>> documentation to the effect that B's license permits this kind of 
>> service.
> Richard has an excellent point here.  This type of data separation is 
> one I could support.
> Jim's question can then be recast as something like, "How big is the 
> LOD cloud excluding wrappers of questionable copyright status?"
> This view also suggests a community-building step:  Someone with moral 
> authority (or something that passes for it) may wish to approach 
> MySpace, etc, and get their permission to either expose their data or 
> (preferably) show them ways to do it themselves.
> Regards,
> Dave

"How big is the LOD cloud excluding wrappers of questionable copyright 
status?" vs. LOD warehouses, re. this questionable statistical endeavor?

LOD warehouses have a clear set of characteristics:

1. Static (due to periodic Extract and Load aspect of RDF production)
2. Presumed to be less questionable by some re. license terms

Dynamically generated Linked Data via wrappers also has a clear set of 
characteristics:

1. Dynamic (RDF generated "on the fly")
2. Presumed to be questionable by some re. license terms

Is the initial dichotomy I espoused still false in reality?

I wonder :-)



Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com
Received on Saturday, 22 November 2008 17:10:08 UTC