Re: Comments on "Interoperability of referential uses of hashless URIs" from Jonathan Rees on 2011-10-24 (www-tag@w3.org from October 2011)

From: Jonathan Rees <jar@creativecommons.org>
Date: Mon, 24 Oct 2011 11:06:10 -0400
To: David Booth <david@dbooth.org>
Cc: www-tag <www-tag@w3.org>
Message-ID: <CACHXnarJ6expmtuJiGhCJ5kG9jt5CexT5jaijDepLAGdKJ4q3g@mail.gmail.com>

This 'ambiguity' thing is a red herring. I know Harry and Pat
introduced it to make a particular point about fallacies in the theory
of web architecture, and they were right. But that doesn't mean you
can use it as a sledgehammer to attack any communication mechanism you
don't like.

The point is, there's a job to be done, and communication is necessary
in order to accomplish it. You need to communicate enough information
to do what's needed, and you need for the receiver to understand the
sender.

If not enough information is sent for some job, send more. If
information isn't needed, you don't have to send it. If the receiver
doesn't understand what the sender is saying, you need coordination,
e.g. prior discussion, or a standard, or something.

Roughly speaking, ambiguity is the reciprocal of information.
Obviously there is no such thing as "completely unambiguous" - that
would be the same as "total information" which is ridiculous. Any
message has the potential to be incomplete, or not understood, in some
context. That is, a communication protocol might work perfectly fine
for a while, and then two communicating parties might say: Hey, this
won't do, we need more clarity / information about this. This is just an
inadequacy that should be fixed somehow. We don't say: Oh no, totally
unambiguous communication is impossible, so we have to just give up
all hope of communicating, live with what we have, drop the idea of
correctness and accountability, and so on.

All format and protocol specs evolve in the direction of increasing
the richness and success of communication. (Well, ideally at least...)
That's all we're talking about here. We can decide that any given
communication problem, such as metadata, should be solved by someone
else; that doesn't make the problem go away.

A large class of metadata expression problems (such as the licensing
one) have a fairly simple solution that Tim has been advocating for
almost twenty years. We have a solution to a problem. It has some
warts and some opposition. If we decide the TAG shouldn't be involved
in fixing it, that's fine. It will just get pushed out to a different
forum and solved by them in some way of their choosing.

Of course there are special cases where D2 and S2 are not mutually
exclusive. If John Smith Jr. and John Smith Sr. live at 5 Ambiguity
Lane, then there is no problem if I say that John Smith lives at 5
Ambiguity Lane, since it's true under any interpretation of "John
Smith". But for some other situations it won't work for me to just say
"John Smith" - like if I say that John Smith is 44 years old. Neither
particular situation is more "general" than the other, but a
communication system that lets you coordinate more meanings and make
finer distinctions is generally more "general" or useful than one that
does not. The "John Smith" situation holds for any "identifier" system
- that was the point of the Hayes/Halpin paper. All it says is that
you need to engineer your communication system so that what has to be
said, is said, and is understood.
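
A minimal sketch of the "John Smith" point, in Python. The data layout,
the function name `evaluate`, and both ages are invented for illustration
(the thread only mentions the shared address and the claim "44 years
old"). It checks a statement against every candidate referent of a name
and reports whether the ambiguity matters for that statement.

```python
# Hypothetical data: two people who share the name "John Smith".
# The ages are assumptions made up for this sketch.
CANDIDATES = {
    "John Smith": [
        {"full_name": "John Smith Jr.", "address": "5 Ambiguity Lane", "age": 44},
        {"full_name": "John Smith Sr.", "address": "5 Ambiguity Lane", "age": 70},
    ]
}


def evaluate(name, prop, value):
    """Classify the statement '<name> has <prop> == <value>' across all
    candidate referents of <name>."""
    results = {person[prop] == value for person in CANDIDATES[name]}
    if results == {True}:
        return "true under every interpretation"
    if results == {False}:
        return "false under every interpretation"
    return "depends on the interpretation"


# The address statement is safe; the age statement needs a finer identifier.
print(evaluate("John Smith", "address", "5 Ambiguity Lane"))
print(evaluate("John Smith", "age", 44))
```

Under these assumptions the address statement holds for both referents,
while the age statement holds for only one, so the sender has to say
more (for instance "John Smith Jr.") before the receiver can rely on it.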

So I stand by what I say, and I continue to think it's obvious. D2 and
S2 are incompatible because, as general methods, in most cases they
give mutually inconsistent answers. The fact that occasionally they
don't, or equivalently that the incompatibilities happen to not matter
to some particular sender or receiver, doesn't affect the truth of the
statement that they're incompatible as general methods.

Jonathan

On Mon, Oct 24, 2011 at 10:02 AM, David Booth <david@dbooth.org> wrote:
> Further comments . . .
>
> On Wed, 2011-10-19 at 15:47 -0400, David Booth wrote:
>> Regarding
>> http://www.w3.org/2001/tag/2011/09/referential-use.html
>
>
> Although this document provides a very good example of the harm that can
> occur when a web publisher uses a different convention for the semantics
> of a URI than the consumer of that page assumes -- see the Creative
> Commons licensing example in the section called "The Conflict" -- I
> think there is some danger in the way this document describes the
> problem and potential solutions, in that it subtly perpetuates the idea
> that ambiguity is universally harmful and can be avoided by consistent
> use of better conventions.
>
> For example, the document says: "It should be clear that these answers
> [D2 and S2] are mutually exclusive."  But in the general case, these
> answers are *not* mutually exclusive.  These answers (i.e., the
> conventions described as D2 and S2) only create a conflict for
> applications that *need* to distinguish between the two resources that
> the use of both conventions simultaneously would conflate.  But for
> applications that do not need to make such distinctions, there is no
> problem.
>
> Second, it is impossible to remove all ambiguity anyway.  Thus, it is
> misleading to call out this particular kind of ambiguity as somehow
> special or deserving of more architectural attention than other kinds of
> ambiguity.  The best a URI owner can do is to make a URI unambiguous
> *within* the particular class of applications that the URI owner is
> attempting to support -- hopefully a fairly wide class of applications,
> though never universal.  In the Creative Commons licensing example,
> where the landing page becomes conflated with the artistic content to be
> licensed, the publisher clearly failed to do that very well, because
> there are currently two conventions in use, and a machine cannot
> reliably know which convention a particular site has used.
>
> David
>
>>
>> This document provides a very good explanation and example of the harm
>> that can be caused by the use of conflicting conventions around URI use.
>> The Creative Commons licensing case is a great example, as the publisher
>> of http://www.jamendo.com/en/album/78807 (for example) has created an
>> ambiguity problem for consumers who do not know which convention the
>> site has used.
>>
>> However, the document seems to assume that the solution to this problem
>> (i.e., the conventions that the W3C should recommend) *must* prevent the
>> ambiguity problems that are described in section "The Conflict".  But I
>> think what is required from an architectural perspective is not that the
>> conventions *necessarily* prevent such ambiguity (because we will always
>> have sites of varying quality), but that the conventions support the
>> *ability* of publishers to avoid such ambiguity problems if they choose
>> to do so, and the conventions furthermore encourage publishers to do
>> so.
>>
>> In other words, another potential way forward is to permit both
>> conventions D2 ("A hashless URI refers to the document at that URI, when
>> there is one") and S2 ("A hashless URI permitting retrieval refers to
>> something described by what's retrieved") to be used, but recommend that
>> S2 be used *only* in cases where the ambiguity that it creates is likely
>> to be harmful, such as in the Creative Commons licensing case.
>>
>> Such guidance also might acknowledge that: (a) it is impossible for the
>> publisher to foresee all of the downstream uses that could lead to
>> conflict or ambiguity; and (b) downstream conflict or ambiguity are
>> impossible to prevent anyway (in the general case), regardless of what
>> conventions are adopted.
>>
>>
>
> --
> David Booth, Ph.D.
> http://dbooth.org/
>
> Opinions expressed herein are those of the author and do not necessarily
> reflect those of his employer.
>
>
Received on Monday, 24 October 2011 15:06:42 UTC