Re: issue-57 background reading for F2F (short required reading) from David Booth on 2012-10-08 (www-tag@w3.org from October 2012)

From: David Booth <david@dbooth.org>
Date: Sun, 07 Oct 2012 23:11:07 -0400
To: Graham Klyne <GK@ninebynine.org>
Cc: Jonathan A Rees <rees@mumble.net>, www-tag@w3.org
Message-ID: <1349665867.18784.158.camel@dbooth-laptop>

Hi Graham,

I appreciate the intuitive appeal that you've expressed, and I agree
that the RDF semantics is not the whole story.  But I think it would be
misleading to suggest that humans can somehow bypass these fundamental
laws of ambiguity.  I'll explain . . .

On Sat, 2012-10-06 at 08:14 +0100, Graham Klyne wrote: 
> David,
> 
> While I agree with most of what you say, I think it's maybe unhelpful to treat 
> the consequences of RDF's logical formalism as the whole story.  There is the 
> matter of *intended* meaning of a URI which, as you indicate, can never be 
> completely nailed down formally, does exist pragmatically and unambiguously in 
> many cases, and is (AIUI) part of the "by design ..." of the web.

By the "*intended* meaning of a URI", I assume you mean " . . .
according to the URI's owner", since otherwise different authors would
likely assume different intended meanings, thus making the URI
ambiguous.

Now to address the suggestion that the intended meaning of a URI exists
"pragmatically and unambiguously in many cases".  First of all, if by
"in many cases" you mean "in many applications", then I would completely
agree with you, because as explained in point #2 below, unambiguity is
really *relative* to the application that is consuming the RDF
containing the URI in question.  It is *not* a property of the URI
itself or its semantics.

On the other hand, if by "in many cases" you mean "for many URIs", then
I would vehemently disagree.  Perhaps the [intended] meaning of a URI
is unambiguous in a *few* -- vanishingly few -- cases, such as URIs for
purely mathematical concepts.  But for the vast majority of URIs, the meaning
is ambiguous even to the URI owner -- regardless of whether it has been
documented anywhere outside of the URI owner's head!  To see why, one
only needs to recognize that no matter how clearly the URI owner thinks
he/she knows exactly what resource is intended, someone (or some
application) can always come along and make a finer distinction that the
URI owner never anticipated, doesn't know about . . . and may not even
understand!  As always, such a distinction may be unimportant to most
applications, but may be critically important to some new application
that was unforeseen by the URI owner.

So it seems to me that we are inevitably caught between two
possibilities: either one is restricting one's attention to a particular
*application* (or class of applications), or the URI is ambiguous.
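
To make "relative to the application" concrete, here is a minimal
sketch in Python using the influenza example from point #2 below.  It
is only an illustration: the candidate referents, the function names
and the two toy "applications" are all hypothetical, and an
application is modeled as nothing more than a function that sorts
referents into the classes it cares about.

```python
# Hypothetical candidate referents that one "influenza" URI fails to
# tell apart (illustrative values only).
CANDIDATES = {"influenza A", "influenza B", "influenza C"}


def is_ambiguous(candidates, distinguish):
    """Return True if the application's own classification puts the
    URI's candidate referents into more than one class."""
    return len({distinguish(c) for c in candidates}) > 1


def viral_or_bacterial(referent):
    # Toy triage application: it only asks whether an infection is
    # viral or bacterial, and every influenza type is viral.
    return "viral"


def by_strain(referent):
    # Toy surveillance application: it counts each type separately,
    # so every candidate lands in its own class.
    return referent


print(is_ambiguous(CANDIDATES, viral_or_bacterial))  # False
print(is_ambiguous(CANDIDATES, by_strain))           # True
```

The URI and its candidate referents are identical in both calls; only
the application changes, and with it the answer.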

> 
> When humans are "in the loop", then we can reasonably appeal to a human notion 
> of unambiguous (e.g. http://dbpedia.org/resource/The_Lord_of_the_Rings refers to 
> the work, not the web page, or some particular copy).  

In some cases humans can, but *only* because they are restricting their
attention to a particular application (or class of applications).  

This is actually an excellent example of point #2, below:
ambiguity/unambiguity is relative to the *application*.  To drive this
point home, let me instead choose a slightly different URI (because I
happen to have a ready-made punch line on hand for it).  :-)  

Consider a URI for the Lincoln Bedroom, in the White House:
http://dbpedia.org/page/Lincoln_Bedroom
Surely humans would consider this URI to unambiguously denote a famous
room in a particular building, rather than a web page.  But does that
URI *really* unambiguously denote that particular room?  What about for
applications that need to make finer distinctions than
web-page-versus-part-of-a-building?  For example, what if they need to
make statements about rooms, such as what items are in the room, etc.?
Is that URI *really* unambiguous, even to us humans?  Pat Hayes posted a
wonderful vignette in a 2002 discussion of RDF semantics:
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jun/0069.html

  Once, as an initial exercise in formalizing some 'common
  sense' I sat down with two people and we decided to make a
  list of all the things in the room. After a while, one of them
  mentioned a picture which was hanging on one wall; the other
  objected that the picture wasn't *in* the room, but was *part*
  of the room. The ensuing debate went on for an hour. What do
  you think? Is the carpet 'in' the room? (What if it is glued
  down?) Is the paint on the wall 'in' the room? (If you bring
  a can of paint into the room and use it on the walls, at what
  point does it become part of the room?) Is the door of the room
  'in' the room? (If it opens inwards, and you open it, is it
  in the room then?) Can a sound be said to be in a room? How
  about a light, or a scent? And so on....  They fought like cat
  and dog : each of them found it hard to accept that the other
  could believe such crazy stuff.  And the amusing thing is that
  both of these people had reached adulthood using the words
  'in' and  'room' without ever discovering that other people
  had such different intuitions about what they meant.

Clearly, the notion of something so simple as "the room" *is* ambiguous,
even to us intelligent humans.

> When humans are not in 
> the loop, then it doesn't really matter, as long as the logical inferences 
> provided are consistent with what people expect, and the logical formalism does 
> provide for that much.
> 
> So, while I think we mostly agree on the details, I personally think it's OK to 
> talk about *the* [intended] referent of a URI as long as we don't expect the 
> formal logic to constrain itself to that single denotation.

As a simplistic guideline, I think it is fine (and even helpful!) to say
that "by design, a URI identifies one resource" and to encourage URI
owners to avoid ambiguity (a/k/a "URI collisions"). But for anyone
attempting to seriously examine Web architecture and draft the
architectural principles needed to enable the Semantic Web, such an
assumption is hopelessly naive.  It would be analogous to assuming that
the Earth is flat when trying to draft the laws of physics.  If the TAG
is going to make progress on such deeply rooted issues as issue-57 and
httpRange-14, we *must* recognize the inherent falsity of such an
assumption.
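
For the formal side of this (point #4 below), here is a toy sketch in
Python of how a graph constrains, but does not pin down, what a URI
refers to.  It is a deliberate simplification and not the RDF model
theory itself: the five-thing universe, the relation names and the
ex: names are all hypothetical, and the property extensions are fixed
in advance, whereas a real RDF interpretation also interprets the
property URIs.

```python
from itertools import product

# Hypothetical toy world: five things and two fixed relations.
UNIVERSE = ["room", "wall", "portrait", "landscape", "carpet"]
WORLD = {
    "locatedIn": {("portrait", "room"), ("landscape", "room"),
                  ("carpet", "room")},
    "hangsOn": {("portrait", "wall"), ("landscape", "wall")},
}


def satisfying_interpretations(graph):
    """Return every mapping from the graph's subject/object names to
    things in UNIVERSE under which all of the graph's triples hold."""
    names = sorted({name for s, _, o in graph for name in (s, o)})
    found = []
    for things in product(UNIVERSE, repeat=len(names)):
        mapping = dict(zip(names, things))
        if all((mapping[s], mapping[o]) in WORLD[p] for s, p, o in graph):
            found.append(mapping)
    return found


def referents(graph, name):
    """Collect everything `name` denotes across the satisfying mappings."""
    return {m[name] for m in satisfying_interpretations(graph)}


SMALL_GRAPH = [("ex:x", "locatedIn", "ex:r")]
LARGE_GRAPH = SMALL_GRAPH + [("ex:x", "hangsOn", "ex:w")]

print(len(satisfying_interpretations(SMALL_GRAPH)))  # 3
print(len(satisfying_interpretations(LARGE_GRAPH)))  # 2
print(sorted(referents(LARGE_GRAPH, "ex:x")))  # ['landscape', 'portrait']
```

Adding a triple shrinks the set of satisfying interpretations, but
ex:x still has two possible referents.  Which one it "really" denotes
is settled only when a consumer of the graph picks an interpretation.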

Best wishes,
David

> 
> #g
> --
> 
> On 03/10/2012 19:54, David Booth wrote:
> > More background reading for TAG issue-57 discussion:
> >
> >   - "Framing the URI Resource Identity Problem: The Fundamental
> > Use Case of the Semantic Web":
> > http://dbooth.org/2012/fyn/Booth-fyn.pdf
> >
> >   - "Resource Identity and Semantic Extensions: Making Sense
> > of Ambiguity":
> > http://dbooth.org/2010/ambiguity/paper.html
> >
> > And some basic points that should be kept in mind in thinking
> > about TAG issue-57 (and httpRange-14):
> >
> > 1. Ambiguity is a fact of life.  In spite of the AWWW's
> > statement that "By design, a URI identifies one resource",
> > http://www.w3.org/TR/webarch/#id-resources ambiguity of
> > reference is inescapable.  This is well established in
> > philosophy, and basically boils down to the fact that when
> > descriptions are used to define things, it is always possible
> > to make finer distinctions than a description anticipated.
> >
> > 2. Ambiguity is *relative* to the application.  In spite of
> > the fact that a URI's referent is inherently ambiguous, such
> > ambiguity may or may not matter to a particular application.
> > A URI that denotes influenza but fails to distinguish between
> > different kinds of influenza may be perfectly UNambiguous
> > to an application that merely needs to distinguish between
> > viral infections and bacterial infections, whereas it will be
> > hopelessly ambiguous to an application that attempts to measure
> > the incidence of different influenza strains.  Similarly,
> > a URI that ambiguously denotes both a web page and a toucan
> > may be perfectly UNambiguous to an application that cares only
> > about different kinds of birds, or to a different application
> > that cares only about web pages, even if it is ambiguous to
> > an application that needs to distinguish between birds and
> > web pages.
> >
> > 3. The context of this issue is RDF.  This issue only matters
> > in the RDF / Semantic Web world.  Nobody else cares about the
> > "meaning" of a URI.  The Semantic Web is the use case that
> > motivates this issue.  Although in concept the Semantic Web does
> > not require RDF per se, as a practical matter RDF is the lingua
> > franca for the Semantic Web.  Furthermore, since this same
> > issue would arise in any formal/machine-processable language
> > in which URIs are used as names for things, for simplicity,
> > and without loss of generality, we can assume that the context
> > of this issue is RDF.
> >
> > 4. Because we are attempting to address the meaning of a
> > URI in the context of RDF, it is essential to understand a
> > small amount about how the RDF semantics works -- not the gory
> > details or all the mathematical formalism, but one key point.
> > This key point is that RDF semantics does not assign a unique
> > interpretation to an RDF graph or URI.  As explained in the
> > RDF Semantics specification:
> >
> >    "It is usually impossible to assert enough in any language
> >    to completely constrain the interpretations to a single
> >    possible world, so there is no such thing as 'the' unique
> >    interpretation of an RDF graph. In general, the larger an
> >    RDF graph is - the more it says about the world - then the
> >    smaller the set of interpretations that an assertion of
> >    the graph allows to be true - the fewer the ways the world
> >    could be, while making the asserted graph true of it."
> >    http://www.w3.org/TR/rdf-mt/#interp
> >
> > Thus, there is no such thing as *the* referent of a URI in an
> > RDF graph.  A URI can have *many* referents -- infinitely many.
> > The referent of a URI only becomes unique when a particular
> > interpretation of that graph is selected, and that is up to
> > the *consumer* of that RDF graph -- not the RDF semantics.
> > This is not merely a technicality that can be waved away,
> > it is the formal manifestation of point #1 above.
> >
> > 5. Interpretations correspond to applications.  RDF graphs
> > are designed to be consumed by *applications* -- not people.
> > Thus, in essence, it is an RDF application that selects an
> > interpretation of a given RDF graph: different interpretations
> > correspond to different applications.  Thus, in an RDF graph
> > a URI that identifies one resource in one application may
> > identify a *different* resource in another application if those
> > applications have different purposes.  Compare point #2 above.
> >
> >                           ----
> >
> > A consequence of the above points is that if one sets out to
> > solve TAG issue-57 (or httpRange-14) under the premise that
> > "a URI identifies one resource", then one will be heading in
> > the wrong direction, and solving it will be an exceedingly long
> > and difficult journey.  A solution might eventually be found,
> > but unless that faulty premise is corrected, it is apt to end
> > up being a solution to the wrong problem.
> >
> > Since this message is only intended to provide general
> > background material for issue-57, I will comment on Proposal27
> > in a separate message.
> >
> > Thanks!
> >
> 

-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.
Received on Monday, 8 October 2012 03:11:39 UTC