- From: Dominic Oldman <doint@oldman.me.uk>
- Date: Mon, 1 Apr 2013 14:16:16 +0100 (BST)
- To: Hugh Glaser <hg@ecs.soton.ac.uk>
- Cc: "public-lod@w3.org community" <public-lod@w3.org>
- Message-ID: <1364822176.26032.YahooMailNeo@web87803.mail.ir2.yahoo.com>
Hugh,

Yes, you are correct... and there are also issues when mashing together data from different sites. This is yet another reason why formal and mandatory 'URI attribution' is not workable. The original question was a sort of provocation. Therefore the encouraging words would be about using object URIs when appropriate - i.e. if you are talking about individual objects - which is likely to be a significant use.

I don't think there is a good answer to this, but just from a practical perspective it would be very nice if people included our object URIs and gave people the opportunity and choice to see the source. This shouldn't be a problem for many sites if they understand how to do it.

Sorry about my record on Public LOD. I have had some technical problems with it and still do (my message bodies don't appear on the web site - I thought I was being censored :-)). I will try to do better in the future.

Dominic

________________________________
From: Hugh Glaser <hg@ecs.soton.ac.uk>
To: Dominic Oldman <doint@oldman.me.uk>
Cc: "public-lod@w3.org community" <public-lod@w3.org>
Sent: Monday, 1 April 2013, 13:47
Subject: Re: Why is it bad practice to consume Linked Data and publish opaque HTML pages?

Hi Dominic,
Nice when it is the holiday weekend, so we hear from you :-)

On 1 Apr 2013, at 13:19, Dominic Oldman <doint@oldman.me.uk> wrote:

> For the specific case of the BM's endpoint, would the ideal situation be that there is no formal attribution requirement (friction free) but rather some encouraging (but not mandatory) words about embedding at least the URI of the object record in a web publication?

Sounds perfect to me.

Looks like I was wrong about Chris Gutteridge's http://data.southampton.ac.uk/ license - I'm sure it used to have something like that, but now it is either OGL or nothing. I guess he got the University to formally agree OGL, which is great.
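[Editor's note: the "embedding the object URI in a web publication" suggestion above can be sketched in a few lines. This is a minimal, hypothetical illustration - the URI and label are placeholders, not real British Museum data - showing how a site could render a small attribution link back to the source object record.]

```python
# Sketch: embedding a source object URI in a published HTML page, so readers
# have the choice to follow it back. The URI and label are illustrative
# placeholders, not real BM identifiers.
import html

def attribution_link(object_uri, label="Source: museum object record"):
    """Render a small HTML link pointing back to the original object URI."""
    return '<a href="{}">{}</a>'.format(
        html.escape(object_uri, quote=True), html.escape(label))

print(attribution_link("http://example.org/object/1"))
```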
> There is no need for every URI to be included, but the inclusion of the object URI (a simple matter if you are querying the Endpoint) would provide everything that anyone would need, particularly since every object record is a graph and therefore only the main URI is needed to collect all the triples for an individual object.

Let's try an example. Perhaps a little contrived, but...
I might decide to produce a statistics site about objects in museums, and for the BM used your lovely data to find out about year of acquisition, size, weight, age etc., of a significant range (or even all) of your collection. Let's say I show mean and SD, for example.
This doesn't really conform to the idea of having an "object URI", but clearly draws on the graph for every one of them.

Best
Hugh

> It would be good to have some best practice guidelines that general web site developers can reference (and we can reproduce or link to on our sites) when querying triplestores.
>
> Dominic
>
> From: Hugh Glaser <hg@ecs.soton.ac.uk>
> To: "public-lod@w3.org" <public-lod@w3.org>
> Cc: Kingsley Idehen <kidehen@openlinksw.com>
> Sent: Monday, 1 April 2013, 12:51
> Subject: Re: Why is it bad practice to consume Linked Data and publish opaque HTML pages?
>
> These aims are laudable, and are a good objective when possible.
> And I note, Kingsley, that your post talks about "republish the extracted content", and I roughly agree with you.
>
> But the wider discussion seems to me to have a very simplistic, if not naive, view of how LOD is used in practice (well, at least compared to the way I use it :-) ).
> A typical page of something like http://apps.seme4.com/see-uk/ (sorry, hardware fault at the moment) or http://www.dotac.info/explorer/ uses many hundreds, or even thousands, of RDF documents from hundreds of domains retrieved via URIs.
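[Editor's note: the point that "only the main URI is needed to collect all the triples for an individual object" is essentially what a SPARQL DESCRIBE query does against a live endpoint. A minimal sketch of the idea, using a hypothetical in-memory list of (subject, predicate, object) triples rather than real BM data:]

```python
# Sketch: given only an object's URI, gather every triple of its record.
# Against a real endpoint this would be a SPARQL DESCRIBE query; here a
# plain list of hypothetical triples stands in for the triplestore.

def describe(triples, subject):
    """Return all triples whose subject is the given object URI."""
    return [t for t in triples if t[0] == subject]

# Hypothetical object records as (subject, predicate, object) triples.
TRIPLES = [
    ("http://example.org/object/1", "dcterms:title", "Amulet"),
    ("http://example.org/object/1", "ex:acquisitionYear", "1902"),
    ("http://example.org/object/2", "dcterms:title", "Coin"),
]

for s, p, o in describe(TRIPLES, "http://example.org/object/1"):
    print(p, o)
```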
> The contribution of some documents may be as little as lending weight to an inference that was calculated several years ago, and the document may have long been discarded, and not re-cached.
> Or, of course, it may be an easily identifiable "fact" in the presentation.
> The best I can do is point overall at the domains where we got data (http://www.rkbexplorer.com/data/), in the spirit of attribution.
> A *requirement* to attribute each URI in a system that goes out and gets stuff from the LOD Cloud like that simply means that I have to ignore that entire data source, because I can't realistically satisfy it.
> Actually, maybe I could - an enormous list of every URI we have ever resolved - but somehow I don't think a page with hundreds of millions of URIs on it is very helpful.
> Of course, I could do quite a lot of implementation work to try to track it, but that would have serious computing, storage and communication costs - such provenance data for an rkbexplorer network panel might well have more than an order of magnitude more URIs than the panel itself, plus the descriptive overheads (and the receiver would not be very happy with perhaps 50K for 1K of substantive data).
> Actually, in many cases, at the moment, really doing it properly would not be possible, as the RDF data does not in fact have a licence, even if the web "site" does.
> Again, this is because people seem to have a simplistic view of how LOD data is consumed.
> Remember, it is agents that are doing the retrieval, and that eyeballs never get to see the "site", if there is such a thing.
> Even Jeff's "special cases" clause makes me nervous - the best I can manage in reality is to have a link to the main site.
> (By the way Jeff, in answer to your question of what you might do, you could add licence information to the RDF you return.)
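[Editor's note: Hugh's parenthetical suggestion - adding licence information to the RDF a publisher returns - is typically done with a dcterms:license triple describing the document itself. A minimal sketch under illustrative assumptions (the document URI and example triple are placeholders; the predicate and licence URIs are the real DCMI term and the UK Open Government Licence):]

```python
# Sketch: attaching a licence statement to RDF before returning it,
# serialised as N-Triples. The document URI and data triple are
# hypothetical; dcterms:license is the standard DCMI predicate.

OGL = "http://www.nationalarchives.gov.uk/doc/open-government-licence/"
DCT_LICENSE = "http://purl.org/dc/terms/license"

def with_license(triples, doc_uri, license_uri=OGL):
    """Append a dcterms:license triple describing the returned document."""
    return triples + [(doc_uri, DCT_LICENSE, license_uri)]

def to_ntriples(triples):
    """Serialise (s, p, o) URI triples as N-Triples lines."""
    return "\n".join("<{}> <{}> <{}> .".format(s, p, o) for s, p, o in triples)

data = [("http://example.org/object/1",
         "http://www.w3.org/2002/07/owl#sameAs",
         "http://example.org/other/1")]
print(to_ntriples(with_license(data, "http://example.org/data/object-1.rdf")))
```

(For simplicity this only serialises triples whose objects are URIs; literal objects would need N-Triples quoting and escaping.)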
> In practice I try to ensure I block sites that require attribution - if I can't comply with the spirit, never mind the letter, of the publisher's requirements, then I prefer to leave it out.
>
> So, if a site *requires* attribution, some really interesting sites that really use the power of Linked Data won't use the data - is that what the publisher wanted when they published it?
>
> I do like Chris Gutteridge's data.southampton.ac.uk - please attribute if you can, but if you really, really can't, then still feel free to use my beautiful data.
>
> Good discussion.
> Hugh
>
> On 30 Mar 2013, at 14:35, Kingsley Idehen <kidehen@openlinksw.com> wrote:
>
> > All,
> >
> > "Citing sources is useful for many reasons: (a) it shows that it isn't a half-baked idea I just pulled out of thin air, (b) it provides a reference for anybody who wants to dig into the subject, and (c) it shows where the ideas originated and how they're likely to evolve." -- John F. Sowa [1].
> >
> > An HTTP URI is an extremely powerful citation and attribution mechanism. Incorporate Linked Data principles and the power increases exponentially.
> >
> > It is okay to consume Linked Data from wherever and publish HTML documents based on source data, modulo discoverable original source Linked Data URIs.
> >
> > It isn't okay to consume publicly available Linked Data from sources such as the LOD cloud and then republish the extracted content using HTML documents where the original source Linked Data URIs aren't discoverable by humans or machines.
> >
> > The academic community has always had a very strong regard for citations and source references. Thus, there's no reason why the utility of Linked Data URIs shouldn't be used to reinforce this best practice, at Web-scale.
> >
> > Links:
> >
> > 1. http://ontolog.cim3.net/forum/ontolog-forum/2013-03/msg00084.html -- ontolog list post.
> >
> > --
> >
> > Regards,
> >
> > Kingsley Idehen
> > Founder & CEO
> > OpenLink Software
> > Company Web: http://www.openlinksw.com
> > Personal Weblog: http://www.openlinksw.com/blog/~kidehen
> > Twitter/Identi.ca handle: @kidehen
> > Google+ Profile: https://plus.google.com/112399767740508618350/about
> > LinkedIn Profile: http://www.linkedin.com/in/kidehen
Received on Monday, 1 April 2013 13:16:46 UTC