Re: Why is it bad practice to consume Linked Data and publish opaque HTML pages? from Hugh Glaser on 2013-04-01 (public-lod@w3.org from April 2013)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Mon, 1 Apr 2013 11:51:42 +0000
To: "public-lod@w3.org" <public-lod@w3.org>
CC: Kingsley Idehen <kidehen@openlinksw.com>
Message-ID: <387E72E216DF1247A2F8ED4819C93BA74EB13276@UOS-MSG00041-SI.soton.ac.uk>

These aims are laudable, and they are a good objective where possible.
And I note, Kingsley, that your post talks about "republish the extracted content", and I roughly agree with you. 

But the wider discussion seems to me to have a very simplistic, if not naive, view of how LOD is used in practice (well, at least compared to the way I use it :-) ).
A typical page of something like http://apps.seme4.com/see-uk/ (sorry, hardware fault at the moment) or http://www.dotac.info/explorer/ uses many hundreds, or even thousands, of RDF documents from hundreds of domains, retrieved via their URIs.
The contribution of some documents may be as little as lending weight to an inference that was calculated several years ago; the document itself may long since have been discarded and never re-cached.
Or, of course, it may be an easily identifiable "fact" in the presentation.
The best I can do is point overall at the domains where we got data (http://www.rkbexplorer.com/data/), in the spirit of attribution.
For a system like that, which goes out and gets stuff from the LOD Cloud, a *requirement* to attribute each URI simply means that I have to ignore that entire data source, because I can't realistically satisfy it.
Actually, maybe I could - an enormous list of every URI we have ever resolved - but somehow I don't think a page with hundreds of millions of URIs on it is very helpful.
Of course, I could do quite a lot of implementation work to try to track it, but that would have serious computing, storage and communication costs: such provenance data for an rkbexplorer network panel might well have more than an order of magnitude more URIs than the panel itself, plus the descriptive overheads (and the receiver would not be very happy to get perhaps 50K of provenance for 1K of substantive data).
Actually, in many cases, at the moment, really doing it properly would not be possible, as the RDF data does not in fact have a licence, even if the web "site" does.
Again, this is because people seem to have a simplistic view of how LOD data is consumed.
Remember, it is agents that are doing the retrieval, and eyeballs never get to see the "site", if there is such a thing.
Even Jeff's "special cases" clause makes me nervous - the best I can manage in reality is to have a link to the main site.
(By the way Jeff, in answer to your question of what you might do, you could add licence information to the RDF you return.)
In practice I try to ensure I block sites that require attribution - if I can't comply with the spirit, never mind the letter, of the publisher's requirements, then I prefer to leave it out.
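As a rough illustration of the suggestion to Jeff above, here is a minimal sketch, assuming the RDF is served as N-Triples and using the DCMI Terms license property (http://purl.org/dc/terms/license). The helper names and all example URIs are hypothetical placeholders, not any publisher's actual data.

```python
# Minimal sketch: attach a machine-readable licence statement to RDF output.
# Assumes N-Triples serialisation; helper names and URIs are hypothetical.

DCTERMS_LICENSE = "http://purl.org/dc/terms/license"


def licence_triple(document_uri: str, licence_uri: str) -> str:
    """Return one N-Triples statement saying document_uri is
    published under licence_uri (DCMI Terms 'license' property)."""
    return f"<{document_uri}> <{DCTERMS_LICENSE}> <{licence_uri}> ."


def with_licence(ntriples: str, document_uri: str, licence_uri: str) -> str:
    """Append the licence statement to an existing N-Triples payload,
    dropping blank lines and ending with a single newline."""
    lines = [line for line in ntriples.splitlines() if line.strip()]
    lines.append(licence_triple(document_uri, licence_uri))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical document and licence URIs, for illustration only.
    doc = "http://example.org/doc/1"
    body = f'<{doc}> <http://purl.org/dc/terms/title> "Example" .\n'
    print(with_licence(body, doc, "http://example.org/licence"), end="")
```

Because the licence travels inside the RDF itself, an agent consuming the data can see it without ever visiting the "site".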

So, if a site *requires* attribution, some really interesting sites that really use the power of Linked Data won't use the data - is that what the publisher wanted when they published it?

I do like the approach of Chris Gutteridge's data.southampton.ac.uk: please attribute if you can, but if you really, really can't, then still feel free to use my beautiful data.

Good discussion.
Hugh

On 30 Mar 2013, at 14:35, Kingsley Idehen <kidehen@openlinksw.com> wrote:

> All,
> 
> " Citing sources is useful for many reasons: (a) it shows that it isn't a half-baked idea I just pulled out of thin air, (b) it provides a reference for anybody who wants to dig into the subject, and (c) it shows where the ideas originated and how they're likely to evolve." -- John F. Sowa [1].
> 
> An HTTP URI is an extremely powerful citation and attribution mechanism. Incorporate Linked Data principles and the power increases exponentially.
> 
> It is okay to consume Linked Data from wherever and publish HTML documents based on that source data, provided the original sources' Linked Data URIs remain discoverable.
> 
> It isn't okay to consume publicly available Linked Data from sources such as the LOD cloud and then republish the extracted content using HTML documents, where the original source Linked Data URIs aren't discoverable by humans or machines.
> 
> The academic community has always had a very strong regard for citations and source references. Thus, there's no reason why the utility of Linked Data URIs shouldn't be used to reinforce this best practice at Web scale.
> 
> Links:
> 
> 1. http://ontolog.cim3.net/forum/ontolog-forum/2013-03/msg00084.html -- ontolog list post.
> 
> -- 
> 
> Regards,
> 
> Kingsley Idehen	
> Founder & CEO
> OpenLink Software
> Company Web: http://www.openlinksw.com
> Personal Weblog: http://www.openlinksw.com/blog/~kidehen
> Twitter/Identi.ca handle: @kidehen
> Google+ Profile: https://plus.google.com/112399767740508618350/about
> LinkedIn Profile: http://www.linkedin.com/in/kidehen
Received on Monday, 1 April 2013 11:52:25 UTC