Re: There's No Money in Linked Data from Aidan Hogan on 2013-05-21 (public-lod@w3.org from May 2013)

From: Aidan Hogan <aidan.hogan@deri.org>
Date: Tue, 21 May 2013 20:14:16 +0100
To: public-lod@w3.org
Message-ID: <519BC788.1030107@deri.org>
<snip>
On 18/05/2013 09:58, Leigh Dodds wrote:
> You don't say in your paper how you did the analysis. Did you use the
> metadata from the LOD group in datahub? At the time I had to do
> mine manually, but it wouldn't be hard to automate some of this now,
> perhaps to create an regularly updated set of indicators.
>
> One criteria that agents might apply when conducting "Follow Your
> Nose" consumption of Linked Data is the licensing of the target data,
> e.g. ignore links to datasets that are not licensed for your
> particular usage.

On a similar note, we also did a survey of some licensing issues in and 
around Linked Data as part of a larger contribution looking at how 
closely publishers of RDF follow various tips from the (now superseded 
but still relevant) "How to Publish Linked Data on the Web" guide [1].

Our analysis is available at [2,3]. For the paper, we looked at ~4 
million RDF/XML documents crawled in May 2011 and divided the data by 
pay-level domain. We then measured how well each domain followed the key 
guidelines in [1], both to see how well specific guidelines are followed 
and to rank the conformance of publishers comparatively using objective 
measures. We ended up looking at 188 domains that offered more than 
1,000 quads.

Long story shortish, for one of the guidelines we looked specifically at 
whether documents embed licensing information about themselves 
[p29,2]. This was tricky: we found a bunch of licensing properties in 
use [Table 19,2]. Considering as many of these properties as we could 
identify, we found that only 15% of the domains provided licensing 
information embedded in *at least one* local document. Averaging equally 
across the domains (which had different numbers of documents), we found 
that about 3% of documents contained observable licensing information 
about themselves.
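
To show why averaging equally across domains differs from pooling all 
documents together, here is a minimal Python sketch. The domain names 
and counts are made up for illustration and are not figures from [2]; 
the function names are hypothetical.

```python
def domain_averaged_share(flags_by_domain):
    """Mean of per-domain shares: every domain counts equally,
    no matter how many documents it publishes."""
    shares = [sum(flags) / len(flags)
              for flags in flags_by_domain.values() if flags]
    return sum(shares) / len(shares)


def pooled_share(flags_by_domain):
    """Share over all documents: large domains dominate the result."""
    all_flags = [flag for flags in flags_by_domain.values() for flag in flags]
    return sum(all_flags) / len(all_flags)


# Toy input (an assumption, not survey data): True means the document
# carries embedded licensing information about itself.
toy = {
    "a.example": [True, False],   # 1 of 2 documents licensed
    "b.example": [False] * 8,     # 0 of 8 documents licensed
}

print(domain_averaged_share(toy))
print(pooled_share(toy))
```

The two numbers diverge whenever domains differ in size, which is why 
it matters which one is being reported.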

On the plus side, there was some use of the creative-commons vocabulary:

	http://creativecommons.org/ns

... though I think dct:rights/dct:license are more actively promoted.



Compared with registering the licensing information on the DataHub or 
similar (which AFAIK no longer supports a public SPARQL endpoint), it 
would be much better for (SemWeb) consumers if publishers directly 
embedded licensing metadata in the individual RDF documents themselves. 
There are already established vocabularies and (at least CC) license 
URIs in place for this.
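
As a rough illustration of what a consumer could check, the Python 
sketch below scans an RDF/XML document for a few well-known licensing 
properties. It is not the method used in [2]: the property list is a 
small assumed subset, the function name is hypothetical, and it only 
spots property elements (not property attributes) without verifying 
that the subject is the document itself.

```python
import xml.etree.ElementTree as ET

# Assumed, non-exhaustive set of licensing property URIs.
LICENSE_PROPERTIES = {
    "http://purl.org/dc/terms/license",
    "http://purl.org/dc/terms/rights",
    "http://purl.org/dc/elements/1.1/rights",
    "http://creativecommons.org/ns#license",
}


def embedded_license_properties(rdfxml_text):
    """Return the licensing property URIs used as elements in the document."""
    root = ET.fromstring(rdfxml_text)
    found = set()
    for element in root.iter():
        tag = element.tag
        # ElementTree spells qualified names as "{namespace}local".
        if isinstance(tag, str) and tag.startswith("{"):
            namespace, local = tag[1:].split("}", 1)
            uri = namespace + local
            if uri in LICENSE_PROPERTIES:
                found.add(uri)
    return found


# A document that licenses itself: rdf:about="" resolves to its own URI.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dct="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="">
    <dct:license rdf:resource="http://creativecommons.org/licenses/by/3.0/"/>
  </rdf:Description>
</rdf:RDF>"""

print(embedded_license_properties(SAMPLE))
```

An agent doing "Follow Your Nose" consumption could run a check like 
this on each fetched document and skip those with no recognised 
license, though a real implementation would want a proper RDF parser.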


Cheers/fwiw,
Aidan




[1] http://wifo5-03.informatik.uni-mannheim.de/bizer/pub/LinkedDataTutorial/
[2] http://sw.deri.org/~aidanh/docs/ldstudy12.pdf
[3] Aidan Hogan, Jürgen Umbrich, Andreas Harth, Richard Cyganiak, Axel 
Polleres and Stefan Decker. "An empirical survey of Linked Data 
conformance". In the Journal of Web Semantics 14: pp. 14–44, 2012.
Received on Tuesday, 21 May 2013 19:14:45 UTC