- From: Booth, David (HP Software - Boston) <dbooth@hp.com>
- Date: Fri, 3 Aug 2007 01:03:12 -0400
- To: "Chris Bizer" <chris@bizer.de>, "Alan Ruttenberg" <alanruttenberg@gmail.com>
- Cc: "SW-forum Web" <semantic-web@w3.org>, "Linking Open Data" <linking-open-data@simile.mit.edu>, "Jonathan A Rees" <jar@mumble.net>, <www-tag@w3.org>
Chris,

Your main point below seems to be that if different parties use different URIs for resources that are owl:sameAs each other, the different URIs make it easier to track provenance. To do this, software would have to differentiate between URIs that are declared owl:sameAs -- in essence making them not quite owl:sameAs. :) This seems like a slight extension to the RDF Semantics ( http://www.w3.org/TR/rdf-mt/#gddenot ), because the RDF Semantics says that a URI only denotes a resource, but maybe it's a reasonable approach. I don't know how it compares with other mechanisms for tracking provenance, such as named graphs.

My thoughts:

1. The main benefit in sharing a common set of URIs is *not* that it avoids the use of owl:sameAs, but that it ensures that users are referring to the same *resource*. I.e., it ensures that different parties did not choose subtly different resource definitions that are difficult to relate to each other, and thus cause difficulty in using different data sets together.

2. For this reason, if an existing URI is good enough for a particular application, then at least that resource should be re-used, either by using the same URI or by minting a new URI that is declared to be owl:sameAs the existing URI, as you suggest.

3. Nobody is suggesting that the world should (by committee?) standardize on a common set of URIs. However, it is almost always beneficial to agree on a common set of concepts (with URIs) when feasible. This *is* generally feasible in small communities and *should* be done when possible. However, as a community grows, it becomes impossible, even for organizations that are theoretically hierarchical, such as very large companies. Thus it is *necessary* to allow URIs for similar concepts to be minted independently.

4. When it is necessary to mint a new URI (because existing URIs are inadequate), it is beneficial to gain agreement on a precise resource definition with the largest community in which one can expect success, to maximize the number of parties using the same concepts. Of course, one can give more weight to more important parties too.

5. Committees sometimes reduce precision in order to achieve superficial agreement. This is *not* okay: it defeats the purpose. In such cases it is better to subdivide into smaller communities that yield more precise definitions. However, it *is* fine to precisely define broader concepts.

Additional specific comments below.

> -----Original Message-----
> From: www-tag-request@w3.org [mailto:www-tag-request@w3.org]
> On Behalf Of Chris Bizer
> Sent: Monday, July 23, 2007 3:23 AM
>
> Hi Alan,
>
> very fruitful discussion. Thanks for challenging me on this
> point :-)
>
> > So you have two novel claims:
> >
> > 1) It is better to mint your own URI than to use one that you
> > know to identify the same resource.
> > 2) It is better to attach "different views and opinions"
> > about a known resource to a newly minted URI that you state
> > is owl:sameAs some other rather than using an alternative
> > mechanism for doing so, one of which might be the one I
> > suggested.

The most important consideration is whether the owner of the new URI declares the new URI to be owl:sameAs the existing URI. If so, the new URI should not be declared (in the sense of http://dbooth.org/2007/uri-decl/ ) with any assertions that were not a part of the old URI declaration. Otherwise, there will be a huge loss if the new URI declaration differs from the existing URI declaration, because if the two URIs name different resources then different data sets involving them will be much harder to combine.

> I basically see four arguments in favour of my point:
>
> 1. Practicability: There is no commonly accepted infrastructure
> in place that allows applications to find out the single URI
> that should be used by everybody to identify a resource.

Correct, and there never will be. That is an important design principle of the Web. But that does *not* imply that a new URI should be minted when an existing URI is good enough for the purpose at hand.

> There
> are lots of real-world objects and abstract concepts that do not
> have URIs yet, so you have to mint URIs for them yourself
> anyway.

In those cases, new URIs are justified anyway.

> Also as Christopher Brewster pointed out yesterday, all
> approaches that assumed using single identifiers have failed
> throughout history so far.

I do not know of anyone who is advocating that. The TAG certainly is not.

> 2. Provenance Tracking: If you mint your own URIs you can back
> them up with RDF descriptions, which makes it easy to track who
> said what on the Semantic Web, as there is only one
> authoritative information provider for each URI.

This does not seem to distinguish adequately between assertions that are part of a URI declaration and regular assertions about resources. See: http://dbooth.org/2007/uri-decl/

> 3. Discovery: When you know that two URIs refer to the same
> non-information resource, it is extremely easy and does not
> require any new technical infrastructure to retrieve
> information about this resource from the Web: Just dereference
> both URIs.

I don't see the benefit here. It is similarly easy to retrieve two sets of information if you have two URIs that point to those information sets, even if those information sets are RDF documents that make assertions involving the same URIs.

> 4. Information Quality: Information providers will not set
> owl:sameAs links
> to minor quality information provided by somebody else about
> the same non-information resource. Therefore setting an
> owl:sameAs link implies a quality judgement and a client can
> use these judgements to assess information quality using an
> algorithm like PageRank.
> [ . . . ]

I do not understand this point. If A owl:sameAs B, then there is only one resource being identified. Quality judgement about what? If S1 is a set of statements involving a URI A, and S2 is a set of statements involving URI B, then I can see that one might choose to assert S1 and not S2. But if A owl:sameAs B, then that would seem no different than if S2 were expressed using A instead of B. Are you saying that by using different URIs there is value because it permits S1 and S2 to be asserted *without* asserting A owl:sameAs B?

David Booth, Ph.D.
HP Software
+1 617 629 8881 office | dbooth@hp.com
http://www.hp.com/go/software

Opinions expressed herein are those of the author and do not represent the official views of HP unless explicitly stated otherwise.
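
The S1/S2 question above can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not anything taken from the thread: it uses plain tuples instead of an RDF library, and the URIs `http://a.example/x` and `http://b.example/x`, the predicates, the literal values, and the source names are all hypothetical. It shows that once A owl:sameAs B is applied, the statements in S2 come out the same as if they had been written with A, and that "who said what" can still be recovered by keeping each source's statements in its own named set (roughly the named-graph idea) rather than by relying on distinct URIs.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical URIs and predicates, for illustration only.
A = "http://a.example/x"
B = "http://b.example/x"
P = "http://vocab.example/p"
Q = "http://vocab.example/q"


def canonical_map(triples):
    """Map every URI that appears in an owl:sameAs statement to one
    representative of its equivalence class (simple union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            rs, ro = find(s), find(o)
            if rs != ro:
                # Arbitrary but deterministic choice of representative:
                # the lexicographically smaller URI.
                parent[max(rs, ro)] = min(rs, ro)
    return {u: find(u) for u in list(parent)}


def smush(triples, mapping):
    """Rewrite subjects and objects to their representative and drop
    the owl:sameAs statements themselves."""
    return {
        (mapping.get(s, s), p, mapping.get(o, o))
        for s, p, o in triples
        if p != OWL_SAME_AS
    }


# S1 uses A, S2 uses B; each is kept under the name of its source.
graphs = {
    "source1": {(A, P, "v1")},
    "source2": {(B, Q, "v2"), (B, OWL_SAME_AS, A)},
}

merged = set().union(*graphs.values())
mapping = canonical_map(merged)

# After applying owl:sameAs, S2 reads as if it had been written with A.
merged_view = smush(merged, mapping)

# Provenance survives because it is keyed by source, not by URI.
by_source = {name: smush(g, mapping) for name, g in graphs.items()}
```

In this sketch the distinction between A and B carries no information once the owl:sameAs statement is honoured; the per-source dictionary is what preserves provenance.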
Received on Friday, 3 August 2007 05:04:01 UTC