Ambiguity (was: issue-57 background reading for F2F (short required reading) from Graham Klyne on 2012-10-11 (www-tag@w3.org from October 2012)

From: Graham Klyne <GK@ninebynine.org>
Date: Thu, 11 Oct 2012 09:32:54 +0100
To: Pat Hayes <phayes@ihmc.us>, David Booth <david@dbooth.org>
CC: "www-tag@w3.org List" <www-tag@w3.org>
Message-ID: <50768436.80402@ninebynine.org>
It seems I stepped on a land mine here.  Silly me.

I think David is closer to what I was aiming for when he mentions [un]ambiguity 
relative to an application.  (I think this is reminiscent of Quine's description 
of language meaning being grounded in sentences that most native speakers agree 
to be true [cf. "Ontological Relativity", I forget which essay].)

Yet there is something about the debate that seems to leave the fundamental 
successes of web communication unexplained.  URIs may not denote or refer 
unambiguously, but neither are they completely free-ranging.   I sense this may 
be what Pat alludes to with "I'm sure there is something important that this 
communication about cool URIs MEANS to say, but right now, it doesn't say that, 
whatever it is. Why not say it right, and then maybe people will take it more 
seriously?"  (Though, as I understand it, the "cool URIs" exhortations are about 
stability, not unambiguity, but I take the general point.)

I think the examples of law, etc., may be closer to what happens in the Web than 
normal day-to-day language.  Pat points out: "special words are used which have 
exactly defined meanings in special contexts".  Is the Web itself not a special 
context, in which URIs can have defined meanings?  (I'm not referring here to 
the world of human discourse that appears in web pages, but the particular 
exchanges that take place using URIs: HTTP transactions, RDF expressions, link 
relations, etc.)

So what does it take to "say it right"?  Do we need a theory of (special) 
contexts to make sense of it all?  Or something else?  It seems to me that the 
theories of logical semantics that work well enough for explaining the validity 
of inferences over web data are not helping here.

The example I picked, http://dbpedia.org/resource/The_Lord_of_the_Rings, was 
chosen not because the notion of this "work" has a clear unambiguous definition 
(it doesn't), but because people seem to have a broad common understanding of 
what is meant, and it is introduced in a way that distinguishes it from 
completely different referents that might be considered in the context of the 
Web (such as the dbpedia web page, which has a different URI).  Thus, we can say 
many things about what the URI does *not* denote (the web page, a particular 
copy of the book, etc.), and we can make statements about it that most who know 
the work would agree to be true (its author, its subject matter, etc.).  Model 
theory provides a basis for formalizing some of this.
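
As a rough illustration of that last point (a minimal sketch only, not a 
rendering of the RDF semantics): treat each candidate referent as a possible 
interpretation of the URI, and treat each statement we agree on as a 
constraint that rules some candidates out.  The candidate names, property 
names and values below are all hypothetical, invented for this example.

```python
from typing import Callable, Dict, List

Props = Dict[str, object]
Assertion = Callable[[Props], bool]

# Hypothetical candidate referents for
# http://dbpedia.org/resource/The_Lord_of_the_Rings
# (names and properties are assumptions made up for illustration).
CANDIDATES: Dict[str, Props] = {
    "dbpedia web page": {
        "is_document": True, "is_physical": False, "author": "DBpedia"},
    "a particular paperback copy": {
        "is_document": False, "is_physical": True, "author": "J. R. R. Tolkien"},
    "work as class of all copies": {
        "is_document": False, "is_physical": False, "author": "J. R. R. Tolkien"},
    "work as abstract entity": {
        "is_document": False, "is_physical": False, "author": "J. R. R. Tolkien"},
}

# Statements that most people who know the work would accept as true.
ASSERTIONS: List[Assertion] = [
    lambda c: not c["is_document"],             # not the web page
    lambda c: not c["is_physical"],             # not a particular copy
    lambda c: c["author"] == "J. R. R. Tolkien",
]


def surviving(candidates: Dict[str, Props],
              assertions: List[Assertion]) -> List[str]:
    """Return the names of candidates consistent with every assertion."""
    return [name for name, props in candidates.items()
            if all(check(props) for check in assertions)]


if __name__ == "__main__":
    print(surviving(CANDIDATES, []))          # nothing asserted: all remain
    print(surviving(CANDIDATES, ASSERTIONS))  # fewer remain, but more than one
```

Under these assumptions the assertions exclude the web page and the 
particular copy, yet two notions of "the work" still survive: the statements 
narrow the reference without making it unique.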

I'm no philosopher, and lack clarity on how to progress this discussion.  It 
seems that by focusing on and then dismissing unambiguity of reference, we are 
failing to capture a less distinct commonality of reference that does play a 
part in web communications (including here people who use the web as well as the 
machinery).

#g
--


On 08/10/2012 06:21, Pat Hayes wrote:
> Just to weigh in, David is exactly right here. We all feel intuitively that when we use words in English and manage to convey our thoughts adequately, that our words are unambiguous; because if they had been ambiguous, then something would have gone wrong in the communication, and we would have been obliged to repair the resulting damage. (Which does sometimes happen, of course.) But the fact that communication was successful, that you "understood what I meant", does **not** imply that the words I was using were conveying unambiguous meanings, or even that what I had in mind matches exactly with what you have in mind after understanding me. It means only that enough commonality of meaning or reference was achieved for the pragmatic purposes of that particular conversation. If I say, pass me that book on the chair, the fact that you pass it to me does not imply that we have exactly the same concepts of book or chair. Or even of what it means to pass something, for that matter.
>
> This is not the forum to argue cases in philosophy of language, but for a practical everyday argument, just look at what happens to English prose when there is a real need to pin down meanings and referents with enough precision to survive many subsequent readings by a variety of readers: legal contracts, diplomatic communications, statements of regulations and laws and patent applications. Hardly any word is simply used: almost all of them are given explicit definitions, often with special 'guard' language, for example explicitly denying what might have been normal or intuitive understandings of the words. Often, special words are used which have exactly defined meanings in special contexts, like all the Latin stuff in legal prose. This is not normal language: to even use it requires specialized training, and it is dangerous for non-experts to attempt, exactly because it has hard unambiguous meanings which are independent of context.
>
> This has nothing to do with logical inference, by the way. And it doesn't only apply when you are "doing logic" and can be ignored or even denied when you are in the "real world" or just "being practical". It is a basic point about ALL COMMUNICATION, BETWEEN PEOPLE OR MACHINES, USING ANY SYMBOLIC LANGUAGE. Got that?
>
> Graham says:
> " I think it's maybe unhelpful to treat
>>> the consequences of RDF's logical formalism as the whole story.  There is the
>>> matter of *intended* meaning of a URI which, as you indicate, can never be
>>> completely nailed down formally, does exist pragmatically and unambiguously in
>>> many cases"
>
> Wrong. It does NOT exist, in almost all cases. And it can't be nailed down, period: formality has nothing to do with the issue here.
>
> If I talk about the date that Hillary climbed Everest, you know what the word "Everest" means, right? I'm talking about the famous mountain, yes. But what exactly is a mountain? I mean, it's made of rock (and ice and stuff), so we are talking about a solid lump of stuff, right? So how much does it weigh? What is its volume? Do questions like that even have a meaning? Some people think so, others don't. Let's stick to geography: where are the geographical boundaries of Everest? How does one draw a line around one mountain in a range like the Himalayas? Wars have been fought over questions like this. So what, exactly and uniquely, are we referring to when we say "Everest"? If one of us is Chinese and the other is Nepalese, we might well have very different referents for the word, and yet we can still agree that, whatever it is, it was first climbed by Hillary and Tenzing. Or take the example Graham uses: http://dbpedia.org/resource/The_Lord_of_the_Rings refers to the "work", not to a web page or a particular copy. Sounds good, but what exactly is a "work"? One view is that it is the class of all of the copies of it. Another view is that it is some Platonic entity, a kind of book-in-the-sky. Another is that it is a mental entity, the thought in the mind of the author. And (I am sure) so on. Now, you may say, it doesn't matter "what it really is", only that people agree that they are talking about 'the work' as opposed to 'the website' or 'a copy', and only this three-way distinction matters. Yes, exactly my point: communication only requires that we agree on intended referents well enough to achieve the purpose of the communication. It does NOT require that there is a SINGLE referent that is UNIQUELY identified by all users of the word. And in fact, that is good news, because if it did depend upon this, we would almost never be able to communicate. A simple request to pass a book would involve saying an entire thesis about the ontology of book-hood.
>
> Successful communication (for a given purpose) does NOT require us to agree on unique referents. It FEELS like that, but that feeling is an illusion. And all of this applies to URIs just as much as it does to all other "identifiers".
>
> The TAG has a history of making what sound like very basic and important pronouncements about URI meanings which are, on their face, simply false. Enormous amounts of time are then wasted trying to figure out what they are talking about, and other standards and even architectures are designed on the assumption that these falsehoods are not only true, but almost necessary, so that to deny them becomes a kind of sin, when in fact it is simply observing the way that the real world is. The claim, for example, that "cool URIs" identify something uniquely, is such a claim. As stated, this is nonsense. They don't do that because that is IMPOSSIBLE. Nothing does that, and the uses of language that come closest to it (legal prose, etc.) are clearly not a good paradigm for URI usage in the wild.
>
> I'm sure there is something important that this communication about cool URIs MEANS to say, but right now, it doesn't say that, whatever it is. Why not say it right, and then maybe people will take it more seriously?
>
> Pat
>
> On Oct 7, 2012, at 10:11 PM, David Booth wrote:
>
>> Hi Graham,
>>
>> I appreciate the intuitive appeal that you've expressed, and I agree
>> that the RDF semantics is not the whole story.  But I think it would be
>> misleading to suggest that humans can somehow bypass these fundamental
>> laws of ambiguity.  I'll explain . . .
>>
>> On Sat, 2012-10-06 at 08:14 +0100, Graham Klyne wrote:
>>> David,
>>>
>>> While I agree with most of what you say, I think it's maybe unhelpful to treat
>>> the consequences of RDF's logical formalism as the whole story.  There is the
>>> matter of *intended* meaning of a URI which, as you indicate, can never be
>>> completely nailed down formally, does exist pragmatically and unambiguously in
>>> many cases, and is (AIUI) part of the "by design ..." of the web.
>>
>> By the "*intended* meaning of a URI", I assume you mean " . . .
>> according to the URI's owner", since otherwise different authors would
>> likely assume different intended meanings, thus making the URI
>> ambiguous.
>>
>> Now to address the suggestion that the intended meaning of a URI exists
>> "pragmatically and unambiguously in many cases".  First of all, if by
>> "in many cases" you mean "in many applications", then I would completely
>> agree with you, because as explained in point #2 below, unambiguity is
>> really *relative* to the application that is consuming the RDF
>> containing the URI in question.  It is *not* a property of the URI
>> itself or its semantics.
>>
>> On the other hand, if by "in many cases" you mean "for many URIs", then
>> I would vehemently disagree.  Perhaps the [intended] meaning of the URI
>> does exist in a *few* -- vanishingly few -- cases, such as purely
>> mathematical concepts.  But for the vast majority of URIs, the meaning
>> is ambiguous even to the URI owner -- regardless of whether it has been
>> documented anywhere outside of the URI owner's head!  To see why, one
>> only needs to recognize that no matter how clearly the URI owner thinks
>> he/she knows exactly what resource is intended, someone (or some
>> application) can always come along and make a finer distinction that the
>> URI owner never anticipated, doesn't know . . . and may not even
>> understand!  As always, such a distinction may be unimportant to most
>> applications, but may be critically important to some new application
>> that was unforeseen by the URI owner.
>>
>> So it seems to me that we are inevitably caught between two
>> possibilities: either one is restricting one's attention to a particular
>> *application* (or class of applications), or the URI is ambiguous.
>>
>>>
>>> When humans are "in the loop", then we can reasonably appeal to a human notion
>>> of unambiguous (e.g. http://dbpedia.org/resource/The_Lord_of_the_Rings refers to
>>> the work, not the web page, or some particular copy).
>>
>> In some cases they can, but *only* because they are restricting their
>> attention to a particular application (or class of applications).
>>
>> This is actually an excellent example of point #2, below:
>> ambiguity/unambiguity is relative to the *application*.  To drive this
>> point home, let me instead choose a slightly different URI (because I
>> happen to have a ready-made punch line on hand for it).  :-)
>>
>> Consider a URI for the Lincoln Bedroom, in the White House:
>> http://dbpedia.org/page/Lincoln_Bedroom
>> Surely humans would consider this URI to unambiguously denote a famous
>> room in a particular building, rather than a web page.  But does that
>> URI *really* unambiguously denote that particular room?  What about for
>> applications that need to make finer distinctions than
>> web-page-versus-part-of-a-building?  For example, what if they need to
>> make statements about rooms, such as what items are in the room, etc.?
>> Is that URI *really* unambiguous, even to us humans?  Pat Hayes posted a
>> wonderful vignette in a 2002 discussion of RDF semantics:
>> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jun/0069.html
>>
>>   Once, as an initial exercise in formalizing some 'common
>>   sense' I sat down with two people and we decided to make a
>>   list of all the things in the room. After a while, one of them
>>   mentioned a picture which was hanging on one wall; the other
>>   objected that the picture wasn't *in* the room, but was *part*
>>   of the room. The ensuing debate went on for an hour. What do
>>   you think? Is the carpet 'in' the room? (What if it is glued
>>   down?) Is the paint on the wall 'in' the room? (If you bring
>>   a can of paint into the room and use it on the walls, at what
>>   point does it become part of the room?) Is the door of the room
>>   'in' the room? (If it opens inwards, and you open it, is it
>>   in the room then?) Can a sound be said to be in a room? How
>>   about a light, or a scent? And so on....  They fought like cat
>>   and dog : each of them found it hard to accept that the other
>>   could believe such crazy stuff.  And the amusing thing is that
>>   both of these people had reached adulthood using the words
>>   'in' and  'room' without ever discovering that other people
>>   had such different intuitions about what they meant.
>>
>> Clearly, the notion of something so simple as "the room" *is* ambiguous,
>> even to us intelligent humans.
>>
>>> When humans are not in
>>> the loop, then it doesn't really matter, as long as the logical inferences
>>> provided are consistent with what people expect, and the logical formalism does
>>> provide for that much.
>>>
>>> So, while I think we mostly agree on the details, I personally think it's OK to
>>> talk about *the* [intended] referent of a URI as long as we don't expect the
>>> formal logic to constrain itself to that single denotation.
>>
>> As a simplistic guideline, I think it is fine (and even helpful!) to say
>> that "by design, a URI identifies one resource" and to encourage URI
>> owners to avoid ambiguity (a/k/a "URI collisions"). But for anyone
>> attempting to seriously examine Web architecture and draft the
>> architectural principles needed to enable the Semantic Web, such an
>> assumption is hopelessly naive.  It would be analogous to assuming that
>> the Earth is flat when trying to draft the laws of physics.  If the TAG
>> is going to make progress on such deeply rooted issues as issue-57 and
>> httpRange-14, we *must* recognize the inherent falsity of such an
>> assumption.
>>
>> Best wishes,
>> David
>>
>>>
>>> #g
>>> --
>>>
>>> On 03/10/2012 19:54, David Booth wrote:
>>>> More background reading for TAG issue-57 discussion:
>>>>
>>>>   - "Framing the URI Resource Identity Problem: The Fundamental
>>>> Use Case of the Semantic Web":
>>>> http://dbooth.org/2012/fyn/Booth-fyn.pdf
>>>>
>>>>   - "Resource Identity and Semantic Extensions: Making Sense
>>>> of Ambiguity":
>>>> http://dbooth.org/2010/ambiguity/paper.html
>>>>
>>>> And some basic points that should be kept in mind in thinking
>>>> about TAG issue-57 (and httpRange-14):
>>>>
>>>> 1. Ambiguity is a fact of life.  In spite of the AWWW's
>>>> statement that "By design, a URI identifies one resource",
>>>> http://www.w3.org/TR/webarch/#id-resources ambiguity of
>>>> reference is inescapable.  This is well established in
>>>> philosophy, and basically boils down to the fact that when
>>>> descriptions are used to define things, it is always possible
>>>> to make finer distinctions than a description anticipated.
>>>>
>>>> 2. Ambiguity is *relative* to the application.  In spite of
>>>> the fact that a URI's referent is inherently ambiguous, such
>>>> ambiguity may or may not matter to a particular application.
>>>> A URI that denotes influenza but fails to distinguish between
>>>> different kinds of influenza may be perfectly UNambiguous
>>>> to an application that merely needs to distinguish between
>>>> viral infections and bacterial infections, whereas it will be
>>>> hopelessly ambiguous to an application that attempts to measure
>>>> the incidence of different influenza strains.  Similarly,
>>>> a URI that ambiguously denotes both a web page and a toucan
>>>> may be perfectly UNambiguous to an application that cares only
>>>> about different kinds of birds, or to a different application
>>>> that cares only about web pages, even if it is ambiguous to
>>>> an application that needs to distinguish between birds and
>>>> web pages.
>>>>
>>>> 3. The context of this issue is RDF.  This issue only matters
>>>> in the RDF / Semantic Web world.  Nobody else cares about the
>>>> "meaning" of a URI.  The Semantic Web is the use case that
>>>> motivates this issue.  Although in concept the Semantic Web does
>>>> not require RDF per se, as a practical matter RDF is the lingua
>>>> franca for the Semantic Web.  Furthermore, since this same
>>>> issue would arise in any formal/machine-processable language
>>>> in which URIs are used as names for things, for simplicity,
>>>> and without loss of generality, we can assume that the context
>>>> of this issue is RDF.
>>>>
>>>> 4. Because we are attempting to address the meaning of a
>>>> URI in the context of RDF, it is essential to understand a
>>>> small amount about how the RDF semantics works -- not the gory
>>>> details or all the mathematical formalism, but one key point.
>>>> This key point is that RDF semantics does not assign a unique
>>>> interpretation to an RDF graph or URI.  As explained in the
>>>> RDF Semantics specification:
>>>>
>>>>    "It is usually impossible to assert enough in any language
>>>>    to completely constrain the interpretations to a single
>>>>    possible world, so there is no such thing as 'the' unique
>>>>    interpretation of an RDF graph. In general, the larger an
>>>>    RDF graph is - the more it says about the world - then the
>>>>    smaller the set of interpretations that an assertion of
>>>>    the graph allows to be true - the fewer the ways the world
>>>>    could be, while making the asserted graph true of it."
>>>>    http://www.w3.org/TR/rdf-mt/#interp
>>>>
>>>> Thus, there is no such thing as *the* referent of a URI in an
>>>> RDF graph.  A URI can have *many* referents -- infinitely many.
>>>> The referent of a URI only becomes unique when a particular
>>>> interpretation of that graph is selected, and that is up to
>>>> the *consumer* of that RDF graph -- not the RDF semantics.
>>>> This is not merely a technicality that can be waved away,
>>>> it is the formal manifestation of point #1 above.
>>>>
>>>> 5. Interpretations correspond to applications.  RDF graphs
>>>> are designed to be consumed by *applications* -- not people.
>>>> Thus, in essence, it is an RDF application that selects an
>>>> interpretation of a given RDF graph: different interpretations
>>>> correspond to different applications.  Thus, in an RDF graph
>>>> a URI that identifies one resource in one application may
>>>> identify a *different* resource in another application if those
>>>> applications have different purposes.  Compare point #2 above.
>>>>
>>>>                           ----
>>>>
>>>> A consequence of the above points is that if one sets out to
>>>> solve TAG issue-57 (or httpRange-14) under the premise that
>>>> "a URI identifies one resource", then one will be heading in
>>>> the wrong direction, and solving it will be an exceedingly long
>>>> and difficult journey.  A solution might eventually be found,
>>>> but unless that faulty premise is corrected, it is apt to end
>>>> up being a solution to the wrong problem.
>>>>
>>>> Since this message is only intended to provide general
>>>> background material for issue-57, I will comment on Proposal27
>>>> in a separate message.
>>>>
>>>> Thanks!
>>>>
>>>
>>>
>>>
>>>
>>
>> --
>> David Booth, Ph.D.
>> http://dbooth.org/
>>
>> Opinions expressed herein are those of the author and do not necessarily
>> reflect those of his employer.
>>
>>
>>
>>
>
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>
Received on Thursday, 11 October 2012 08:33:55 UTC