
Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices

From: Martin Hepp <martin.hepp@ebusiness-unibw.org>
Date: Wed, 20 Oct 2010 19:12:36 +0200
Cc: Anja Jentzsch <anja@anjeve.de>, semanticweb@yahoogroups.com
Message-Id: <8425B0B9-DAEB-4824-A444-C6023B3BDE0B@ebusiness-unibw.org>
To: Chris Bizer <chris@bizer.de>, Semantic Web <semantic-web@w3.org>, public-lod@w3.org
Hi Chris:

First, I think it is pretty funny that you list Denny's April Fools'
dataset, which creates triples for numbers, as an acceptable part of the
cloud,

	http://ckan.net/package/linked-open-numbers


  (right next to WordNet)

The fundamental mistake in what you say is that linked open e-commerce
data is not "a dataset" but a wealth of smaller datasets. Asking me to
create CKAN entries for each store or business in the world that
provides GoodRelations data is as if Google were asking every site owner
in the world to register his or her site manually via CKAN.

That is 1990s style and does not have anything to do with a "Web" of  
data.

> 1. Data items are accessible via dereferenceable URIs (providing only access
> via SPARQL is not enough, as linked data browsers and search engines cannot
> work with SPARQL endpoints)

Is HTML + RDFa with hash fragments, available via HTTP GET,
"dereferenceable" for you? E.g.

    http://stores.bestbuy.com/10/

If yes, fine. If not, why? IMO, HTML with an RDFa payload does not break
any fundamental principles of the Web architecture.
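
To make the hash-fragment question concrete, here is a minimal sketch
(Python, standard library only) of what a client does when it dereferences
a hash URI: the fragment is stripped before the HTTP GET, and the entity is
then looked up by its fragment inside the retrieved HTML + RDFa document.
The fragment "#store_10" and the function name are hypothetical
illustrations, not taken from the Best Buy page.

```python
from urllib.parse import urldefrag


def document_to_fetch(entity_uri: str) -> tuple[str, str]:
    """Split an entity URI into the document URL that is actually requested
    via HTTP GET and the fragment that identifies the entity inside it."""
    url, fragment = urldefrag(entity_uri)
    return url, fragment


# Hypothetical entity URI; only the document part is sent to the server.
doc_url, fragment = document_to_fetch("http://stores.bestbuy.com/10/#store_10")
print(doc_url)   # the URL a client would GET
print(fragment)  # resolved client-side against the RDFa payload
```

Under this reading, a hash URI in an RDFa page is dereferenceable with a
single plain GET, without redirects or content negotiation.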


> 2. The dataset sets at least 50 RDF links pointing at other datasets, or at
> least one other dataset is setting 50 RDF links pointing at your dataset.


This is often hard to meet and seems like a very artificial  
requirement to me.

First, many small datasets may be just 50 triples in total. Why should
a hairdresser in Kentucky, exposing its description in GoodRelations +
RDFa, have 50 outbound links? What should this beauty store in CA,
exposing 800 triples, do to qualify as linked data?

http://www.plushbeautybar.com/services.html

Second, what kind of links to core LOD entities do you expect from  
shop operators? For example, take

	http://semantic.eurobau.com/

That dataset contains some 30 million triples of construction-materials
information. Which links to DBpedia would you reasonably
expect? Is this Linked Data in your opinion or not? If not, why?
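
As an illustration of why the threshold is hard for a small site to meet,
here is a rough sketch of how requirement 2 might be checked. It rests on
assumptions of mine, not on the LOD cloud maintainers' actual scripts: a
"dataset" is approximated by a host name, triples are plain string tuples,
rdf:type statements are not counted as links, and the sample triples and
the "#business" fragment are hypothetical.

```python
from urllib.parse import urlparse

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
MIN_LINKS = 50  # threshold quoted in requirement 2


def outbound_links(triples, dataset_host):
    """Count triples whose subject lives on dataset_host and whose object
    is a URI on a different host (assumption: host == dataset boundary)."""
    count = 0
    for s, p, o in triples:
        if p == RDF_TYPE or urlparse(s).hostname != dataset_host:
            continue  # skip class assertions and foreign subjects
        o_host = urlparse(o).hostname  # None for plain literals
        if o_host is not None and o_host != dataset_host:
            count += 1
    return count


# Hypothetical triples for a small shop page.
shop = "http://www.plushbeautybar.com/services.html#business"
triples = [
    (shop, RDF_TYPE, "http://purl.org/goodrelations/v1#BusinessEntity"),
    (shop, "http://purl.org/goodrelations/v1#legalName", "Plush Beauty Bar"),
    (shop, "http://www.w3.org/2000/01/rdf-schema#seeAlso",
     "http://dbpedia.org/resource/California"),
]

n = outbound_links(triples, "www.plushbeautybar.com")
print(n, n >= MIN_LINKS)
```

A shop description of this kind yields a handful of outbound links at
best, so a fixed count of 50 excludes it regardless of data quality.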

To be frank, I think the bubbles diagram fundamentally misses the  
point in the sense that the power of linked data is in integrating a  
huge amount of small, specific data sources, and not in linking a  
manually maintained blend of ca. 100 monolithic datasets.

Integrating 100 datasets does not have anything to do with Web-scale  
information integration. Note that Google estimated back in 2008 that  
there were ca. 1 trillion URIs in their index alone. So what are 100  
manually converted datasets in comparison to that?

Best

Martin

On 20.10.2010, at 08:49, Chris Bizer wrote:

> Hi Martin,
>
> we are not ignoring anything.
>
> I personally think that http://linkedopencommerce.com/ is a quite exciting
> effort and would love to see more e-commerce data in the LOD cloud.
>
> We have asked the community repeatedly to provide information on CKAN about
> datasets that they would like to see included in the LOD cloud.
>
> You did not do this. And at that time, we also had not heard about
> http://linkedopencommerce.com/ yet.
>
> It would be great if you would add information about your dataset(s) to
> CKAN, so that we can include it in the next version of the cloud diagram.
>
> Of course, this assumes that they fulfill the minimal requirements for
> inclusion, which are:
>
> 1. Data items are accessible via dereferenceable URIs (providing only access
> via SPARQL is not enough, as linked data browsers and search engines cannot
> work with SPARQL endpoints)
> 2. The dataset sets at least 50 RDF links pointing at other datasets, or at
> least one other dataset is setting 50 RDF links pointing at your dataset.
>
> Cheers,
>
> Chris
>
> -----Original Message-----
> From: Martin Hepp [mailto:martin.hepp@ebusiness-unibw.org]
> Sent: Tuesday, 19 October 2010 22:09
> To: Anja Jentzsch; Chris Bizer
> Cc: Semantic Web; semanticweb@yahoogroups.com
> Subject: Re: ANN: LOD Cloud - Statistics and compliance with best practices
>
> Hi Anja, Chris:
>
> It's kind of a joke that you ignore the 1 billion triples of
> GoodRelations data on the Web, e.g. available at
>
>   http://linkedopencommerce.com/
>
> or
>
>   http://www.ebusiness-unibw.org/wiki/GoodRelations#Examples_in_the_Wild
>
> Martin
>
>
> On 19.10.2010, at 17:56, Anja Jentzsch wrote:
>
>> Hi all,
>>
>> In recent weeks, we have analyzed which data sources in the new
>> version of the LOD cloud comply with various best practices that are
>> recommended by W3C or have emerged within the LOD community.
>>
>> We have checked the implementation of the following nine best
>> practices:
>>
>> 1. Provide dereferenceable URIs
>> 2. Set RDF links pointing at other data sources
>> 3. Use terms from widely deployed vocabularies
>> 4. Make proprietary vocabulary terms dereferenceable
>> 5. Map proprietary vocabulary terms to other vocabularies
>> 6. Provide provenance metadata
>> 7. Provide licensing metadata
>> 8. Provide data-set-level metadata
>> 9. Refer to additional access methods
>>
>> The compliance with the best practices was either checked manually
>> or by using scripts that downloaded and analyzed some data from the
>> data sources.
>> We have added the results of the evaluation in the form of tags to
>> the LOD data set catalog on CKAN [1].
>>
>> We are now happy to release the first statistics about the structure
>> of the LOD cloud as well as the compliance of the datasets with the
>> best practices.
>> The statistics can be found here:
>>
>> http://www4.wiwiss.fu-berlin.de/lodcloud/state/
>>
>> The document contains an initial, preliminary release of the
>> statistics. If you spot any errors in the data describing the LOD
>> data sets on CKAN, it would be great if you would correct them
>> directly on CKAN.
>>
>> For information on how to describe datasets on CKAN, please refer to
>> the Guidelines for Collecting Metadata on Linked Datasets in CKAN [2].
>>
>> After your feedback and corrections, we will then move the corrected
>> version of the statistics to http://www.lod-cloud.net/ (around
>> October 24th).
>>
>> Have fun with the statistics and the encouraging as well as
>> disappointing insights that they provide.
>>
>> Cheers,
>>
>> Chris Bizer, Anja Jentzsch and Richard Cyganiak
>>
>> [1] http://www.ckan.net/group/lodcloud
>> [2] http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation
>>
>>
>
>




Received on Wednesday, 20 October 2010 17:13:18 UTC
