RE: Terminology Question concerning Web Architecture and Linked Data from Booth, David (HP Software - Boston) on 2007-08-03 (www-tag@w3.org from August 2007)

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Fri, 3 Aug 2007 01:03:12 -0400
To: "Chris Bizer" <chris@bizer.de>, "Alan Ruttenberg" <alanruttenberg@gmail.com>
Cc: "SW-forum Web" <semantic-web@w3.org>, "Linking Open Data" <linking-open-data@simile.mit.edu>, "Jonathan A Rees" <jar@mumble.net>, <www-tag@w3.org>
Message-ID: <EBBD956B8A9002479B0C9CE9FE14A6C202FE8554@tayexc19.americas.cpqcorp.net>

Chris,

Your main point below seems to be that if different parties use
different URIs for resources that are owl:sameAs each other, the
different URIs make it easier to track provenance.  To do this, software
would have to differentiate between URIs that are declared owl:sameAs --
in essence making them not quite owl:sameAs. :)   This seems like a
slight extension to the RDF Semantics (
http://www.w3.org/TR/rdf-mt/#gddenot ), because the RDF Semantics says
that a URI only denotes a resource, but maybe it's a reasonable
approach.   I don't know how it compares with other mechanisms for
tracking provenance, such as named graphs.
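
For comparison, here is a minimal sketch of the named-graph alternative.
It assumes statements are stored as (subject, predicate, object, graph)
tuples, where the graph name records who published the statement.  All
URIs, predicates, and function names below are hypothetical.  Merging
owl:sameAs URIs into one representative keeps the standard semantics,
and provenance survives in the graph name rather than in the URI.

```python
# Hypothetical example data: every URI and graph name below is made up.
SAME_AS = "owl:sameAs"

quads = [
    ("http://a.example/berlin", "ex:population", "3400000", "http://a.example/graph"),
    ("http://b.example/Berlin", "ex:population", "3500000", "http://b.example/graph"),
    ("http://b.example/Berlin", SAME_AS, "http://a.example/berlin", "http://b.example/graph"),
]


def canonical_map(quads):
    """Map each URI linked by owl:sameAs to one representative (smallest string)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o, _ in quads:
        if p == SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                small, large = sorted((root_s, root_o))
                parent[large] = small
    return {uri: find(uri) for uri in list(parent)}


def merge(quads):
    """Rewrite subjects and objects to their representative; drop the sameAs links."""
    canon = canonical_map(quads)
    return [
        (canon.get(s, s), p, canon.get(o, o), g)
        for s, p, o, g in quads
        if p != SAME_AS
    ]


merged = merge(quads)
for quad in merged:
    print(quad)
```

After the merge both population statements share one subject URI, yet
each still carries the graph name of the party that asserted it.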

My thoughts:

1. The main benefit in sharing a common set of URIs is *not* that it
avoids the use of owl:sameAs, but that it ensures that users are
referring to the same *resource*.  I.e., it ensures that different
parties did not choose subtly different resource definitions that are
difficult to relate to each other, and thus cause difficulty in using
different data sets together.

2. For this reason, if an existing URI is good enough for a particular
application, then at least that resource should be re-used, either by
using the same URI or by minting a new URI that is declared to be
owl:sameAs the existing URI, as you suggest.

3. Nobody is suggesting that the world should (by committee?)
standardize on a common set of URIs.  However, it is almost always
beneficial to agree on a common set of concepts (with URIs) when
feasible.  This *is* generally feasible in small communities and
*should* be done when possible.  However, as a community grows, it
becomes impossible, even for organizations that are theoretically
hierarchical, such as very large companies.  Thus it is *necessary* to
allow URIs for similar concepts to be minted independently. 

4. When it is necessary to mint a new URI (because existing URIs are
inadequate), it is beneficial to gain agreement on a precise resource
definition with the largest community in which one can expect success,
to maximize the number of parties using the same concepts.  Of course,
one can give more weight to more important parties too. 

5. Committees sometimes reduce precision in order to achieve superficial
agreement.  This is *not* okay: it defeats the purpose.  In such cases
it is better to subdivide into smaller communities that yield more
precise definitions.  However, it *is* fine to precisely define broader
concepts.

Additional specific comments below.

> -----Original Message-----
> From: www-tag-request@w3.org [mailto:www-tag-request@w3.org]
> On Behalf Of Chris Bizer
> Sent: Monday, July 23, 2007 3:23 AM
>
> Hi Alan,
>
> very fruitful discussion. Thanks for challenging me on this
> point :-)
>
> > So you have two novel claims:
> >
> > 1) It is better to mint your own URI than to use one that you
> > know to identify the same resource.
> > 2) It is better to attach "different views and opinions"
> > about a known resource to a newly minted URI that you state
> > is owl:sameAs some other rather than using an alternative
> > mechanism for doing so, one of which might be the one I
> > suggested.

The most important consideration is whether the owner of the new URI
declares the new URI to be owl:sameAs the existing URI.  If so, the new
URI should not be declared (in the sense of
http://dbooth.org/2007/uri-decl/ ) with any assertions that were not a
part of the old URI declaration.  Otherwise, if the new URI declaration
differs from the existing one, there will be a huge loss: the two URIs
may then name different resources, and data sets involving them will be
much harder to combine.
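
As a rough sketch of that rule, one could model a URI declaration as a
set of triples and check that the new declaration adds nothing beyond
the old one.  This set-of-triples model, the function name, and all
URIs below are assumptions made for illustration; they are not taken
from the uri-decl document.

```python
SAME_AS = "owl:sameAs"


def extra_assertions(new_decl, old_decl, new_uri, old_uri):
    """Return assertions in new_decl that have no counterpart in old_decl.

    The new URI is renamed to the old one before comparing, and the
    owl:sameAs link itself is ignored.
    """
    renamed = {
        tuple(old_uri if term == new_uri else term for term in triple)
        for triple in new_decl
        if triple[1] != SAME_AS
    }
    return renamed - old_decl


# Hypothetical URIs and predicates.
OLD = "http://a.example/berlin"
NEW = "http://b.example/Berlin"

old_decl = {
    (OLD, "rdf:type", "ex:City"),
    (OLD, "ex:locatedIn", "ex:Germany"),
}
new_decl = {
    (NEW, SAME_AS, OLD),
    (NEW, "rdf:type", "ex:City"),
    (NEW, "ex:foundedIn", "1237"),
}

extras = extra_assertions(new_decl, old_decl, NEW, OLD)
print(extras)
```

An empty result would mean the new declaration stays within the old
one; here the ex:foundedIn assertion is flagged as an addition.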

>
> I basically see four arguments in favour of my point:
>
> 1. Practicability: There is no commonly accepted infrastructure
> in place that allows applications to find out the single URI
> that should be used by everybody to identify a resource. 

Correct, and there never will be.  That is an important design principle
of the Web.  But that does *not* imply that a new URI should be minted
when an existing URI is good enough for the purpose at hand.

> There
> are lots of real-world object and abstract concepts that do not
> have URIs yet, so you have to mint URIs for them yourself
> anyway. 

In those cases, new URIs are justified anyway.

> Also as Christopher Brewster pointed out yesterday, all
> approaches that assumed using single identifiers have failed
> throughout history so far.

I do not know of anyone who is advocating that.  The TAG certainly is
not.

> 2. Provenance Tracking: If you mint your own URIs you can back
> them up with RDF descriptions, which makes it easy to track who
> said what on the Semantic Web, as there is only one
> authoritative information provider for each URI.

This does not seem to distinguish adequately between assertions that
are part of a URI declaration and regular assertions about resources.
See: http://dbooth.org/2007/uri-decl/

> 3. Discovery: When you know that two URIs refer to the same
> non-information resource, it is extremely easy and does not
> require any new technical infrastructure to retrieve
> information about this resource from the Web: Just dereference
> both URIs.

I don't see the benefit here.  It is similarly easy to retrieve two sets
of information if you have two URIs that point to those information
sets, even if those information sets are RDF documents that make
assertions involving the same URIs.

> 4. Information Quality: Information providers will not set
> owl:sameAs links
> to minor quality information provided by somebody else about
> the same non-information resource. Therefore setting a
> owl:sameAs link implies a quality judgement and a client can
> use these judgements to assess information quality using an
> algorithm like PageRank.
> [ . . . ]

I do not understand this point.  If A owl:sameAs B, then there is only
one resource being identified.  Quality judgement about what?  If S1 is
a set of statements involving a URI A, and S2 is a set of statements
involving URI B, then I can see that one might choose to assert S1 and
not S2.  But if A owl:sameAs B, then that would seem no different than
if S2 were expressed using A instead of B.  Are you saying that by using
different URIs there is value because it permits S1 and S2 to be
asserted *without* asserting A owl:sameAs B?
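
To make the substitution concrete, here is a small sketch.  It assumes
statements are plain (subject, predicate, object) tuples; the URIs and
predicates are hypothetical.

```python
def rewrite(statements, old, new):
    """Replace every occurrence of the URI `old` with `new`."""
    return {
        tuple(new if term == old else term for term in triple)
        for triple in statements
    }


# Hypothetical URIs for the same resource, minted by two parties.
A = "http://a.example/thing"
B = "http://b.example/thing"

S1 = {(A, "ex:color", "red")}
S2 = {(B, "ex:color", "blue")}

# If A owl:sameAs B holds, S2 says the same thing as this set:
S2_using_A = rewrite(S2, B, A)
print(S2_using_A)
```

Once A owl:sameAs B is asserted, choosing S1 over S2 is a choice
between statements about one resource, whichever URI they are written
with.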



David Booth, Ph.D.
HP Software
+1 617 629 8881 office  |  dbooth@hp.com
http://www.hp.com/go/software

Opinions expressed herein are those of the author and do not represent
the official views of HP unless explicitly stated otherwise.
Received on Friday, 3 August 2007 05:04:01 UTC