Re: Comments on "Interoperability of referential uses of hashless URIs" from David Booth on 2011-10-24 (www-tag@w3.org from October 2011)

From: David Booth <david@dbooth.org>
Date: Mon, 24 Oct 2011 18:15:58 -0400
To: Jonathan Rees <jar@creativecommons.org>
Cc: www-tag <www-tag@w3.org>
Message-ID: <1319494558.2178.91623.camel@dbooth-laptop>

On Mon, 2011-10-24 at 11:06 -0400, Jonathan Rees wrote:
> This 'ambiguity' thing is a red herring. 

I disagree.  As I explain below, I think it is central to a proper
understanding of the problem and the appropriate solutions.

> I know Harry and Pat
> introduced it to make a particular point about fallacies in the theory
> of web architecture, and they were right. But that doesn't mean you
> can use it as a sledgehammer to attack any communication mechanism you
> don't like.

I'm not doing that at all.  I am an ardent supporter of the web, web
architecture, and semantic web technology.  I'm pointing out that the
problem is being subtly misframed, and this may lead to wrong
conclusions about how it should be solved.

> 
> The point is, there's a job to be done, and communication is necessary
> in order to accomplish it. You need to communicate enough information
> to do what's needed, and you need for the receiver to understand the
> sender.

Right, *within* the class of applications that the sender is attempting
to support.  The sender *cannot* support all possible applications.  In
other words, there isn't *a* job to be done -- there are *many*
different jobs to be done.  The architecture needs to support the
ability for all those jobs to be done -- not the ability for *one*
sender's data to be understood by *all* receiver applications.

> 
> If not enough information is sent for some job, send more. If
> information isn't needed, you don't have to send it. If the receiver
> doesn't understand what the sender is saying, you need coordination,
> e.g. prior discussion, or a standard, or something.
> 
> Roughly speaking, ambiguity is the reciprocal of information.
> Obviously there is no such thing as "completely unambiguous" - that
> would be the same as "total information" which is ridiculous. 

I disagree.  That would be true if we were talking about some universal
quality of ambiguity/unambiguity.  But if we ask about a particular
application class, then it certainly *is* possible to be "completely
unambiguous".  And by necessity we *must* restrict our scope to
particular application classes, because it is not possible for a URI
owner to consistently and unambiguously address all applications.
Ambiguity/unambiguity are *relative* to an application class.

> Any
> message has the potential to be incomplete, or not understood, in some
> context. That is, a communication protocol might work perfectly fine
> for a while, and then two communicating parties might say: Hey, this
> won't do, we need more clarity / information about this. This is just an
> inadequacy that should be fixed somehow. We don't say: Oh no, totally
> unambiguous communication is impossible, so we have to just give up
> all hope of communicating, live with what we have, drop the idea of
> correctness and accountability, and so on.

I'm not saying we should give up on communicating -- quite the opposite
-- but we *do* need to drop the idea of *universal* correctness, since
correctness is *relative* to the application class.  An example that I
have often used is map data that models the world as flat.  Such data is
fine for applications such as street navigation -- in fact it is
*better* than 3D data for those apps, because it is cheaper to implement
and process -- but it would be totally inadequate for other applications
that care about altitudes and the curvature of the earth.  The notion of
universal data correctness is a red herring.  It is better to think of
it in terms of data *usefulness* to applications.
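
A minimal sketch of the flat-map point, assuming a spherical Earth of
mean radius 6371 km and hypothetical coordinates (the function names are
illustrative only, not taken from any application mentioned here).  It
compares a flat-plane distance with a great-circle distance at street
scale and at continental scale:

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def flat_distance_km(lat1, lon1, lat2, lon2):
    """Flat-world model: equirectangular projection onto a plane."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    x = math.radians(lon2 - lon1) * math.cos(mean_lat)
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_KM * math.hypot(x, y)


def sphere_distance_km(lat1, lon1, lat2, lon2):
    """Spherical model: great-circle distance via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def relative_error(p, q):
    """Relative error of the flat model against the spherical model."""
    flat = flat_distance_km(*p, *q)
    sphere = sphere_distance_km(*p, *q)
    return abs(flat - sphere) / sphere


# Hypothetical test points: two street corners under a kilometre apart,
# and two points a quarter of the way around the 60th parallel.
street_scale = ((42.3600, -71.0600), (42.3650, -71.0550))
continental = ((60.0, 0.0), (60.0, 90.0))

print("street-scale relative error:", relative_error(*street_scale))
print("continental relative error: ", relative_error(*continental))
```

Under these assumptions the flat model is indistinguishable from the
spherical one for the street-scale pair and visibly wrong for the
continental pair, so whether the flat data is "correct" depends on which
class of application is asking.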

My point is that if we do not frame the problem properly, we are apt to
draw the wrong conclusions about how it should be solved.

> 
> All format and protocol specs evolve in the direction of increasing
> the richness and success of communication. (Well ideally at least...)
> That's all we're talking about here. We can decide that any given
> communication problem, such as metadata, should be solved by someone
> else; that doesn't make the problem go away.
> 
> A large class of metadata expression problems (such as the licensing
> one) have a fairly simple solution that Tim has been advocating for
> almost twenty years. We have a solution to a problem. It has some
> warts and some opposition. If we decide the TAG shouldn't be involved
> in fixing it, that's fine. It will just get pushed out to a different
> forum and solved by them in some way of their choosing.
> 
> Of course there are special cases where D2 and S2 are not mutually
> exclusive. 

Actually I think it is the other way around: there are special cases
where D2 and S2 cause harmful ambiguity -- such as the CC license case
-- but, as Ian Davis and other Linked Open Data (LOD) proponents of S2
point out, in the vast majority of cases the ambiguity is harmless.

> If John Smith Jr. and John Smith Sr. live at 5 Ambiguity
> Lane, then there is no problem if I say that John Smith lives at 5
> Ambiguity Lane, since it's true under any interpretation of "John
> Smith". But for some other situations it won't work for me to just say
> "John Smith" - like if I say that John Smith is 44 years old. Neither
> particular situation is more "general" than the other, but a
> communication system that lets you coordinate more meanings and make
> finer distinctions is generally more "general" or useful than one that
> does not. 

The architecture needs to support the ability to convey distinctions
along *any* axis, and there is a virtually infinite number of potential
axes.  But this doesn't mean that a system with built-in support for one
particular axis (e.g. the web page vs. its primary subject) is more
general than a system that also *allows* distinctions on that axis, but
does not give special recognition to that axis over any other axis.

> The "John Smith" situation holds for any "identifier" system
> - that was the point of the Hayes/Halpin paper. All it says is that
> you need to engineer your communication system so that what has to be
> said, is said, and is understood.

Yes . . . *within* the intended class of applications (which may be
quite broad, but is *not* universal).
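
The "John Smith" example, read relative to an application class, can be
sketched roughly as follows.  This is only an illustration: the thread
gives the shared address and the age 44, while the assignment of that
age to John Smith Sr., the other age, and the two application classes
are hypothetical.

```python
# Candidate referents of the name "John Smith".
# Hypothetical values: which John Smith is 44, and the other age,
# are assumptions made for illustration.
CANDIDATES = {
    "John Smith Sr.": {"address": "5 Ambiguity Lane", "age": 44},
    "John Smith Jr.": {"address": "5 Ambiguity Lane", "age": 17},
}


def distinguishing_properties(candidates):
    """Return the properties on which the candidate referents disagree."""
    props = set()
    for description in candidates.values():
        props.update(description)
    return {
        p for p in props
        if len({d.get(p) for d in candidates.values()}) > 1
    }


def is_ambiguous_for(app_properties, candidates):
    """The name is ambiguous for an application class only if that class
    relies on at least one property that distinguishes the candidates."""
    return bool(set(app_properties) & distinguishing_properties(candidates))


# Hypothetical application classes, described by the properties they use.
mail_delivery = {"address"}
age_verification = {"address", "age"}

print(is_ambiguous_for(mail_delivery, CANDIDATES))
print(is_ambiguous_for(age_verification, CANDIDATES))
```

In this sketch the same name is unambiguous for the application that
only needs the address and ambiguous for the one that also needs the
age, without either application being more "general" than the other.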

> 
> So I stand by what I say, and I continue to think it's obvious. D2 and
> S2 are incompatible because, as general methods, in most cases they
> give mutually inconsistent answers. 

If "in most cases they give mutually inconsistent answers" means "in
most cases there exists *some* application for which they give mutually
inconsistent answers", then I would agree.  But if it means "in most
existing applications they give mutually inconsistent answers" then I do
not think that is true at all.  As the LOD community demonstrates, most
of their applications work fine in spite of the potential ambiguity that
S2 creates when used with D2.

> The fact that occasionally they
> don't, or equivalently that the incompatibilities happen to not matter
> to some particular sender or receiver, doesn't affect the truth of the
> statement that they're incompatible as general methods.

Perhaps we need to get some quantitative data on how often this
ambiguity matters to an application, because my perception is that the
dominant case is the other way around: usually this ambiguity does *not*
matter, but occasionally it does -- as nicely illustrated by the CC
licensing use case.

David

> 
> Jonathan
> 
> On Mon, Oct 24, 2011 at 10:02 AM, David Booth <david@dbooth.org> wrote:
> > Further comments . . .
> >
> > On Wed, 2011-10-19 at 15:47 -0400, David Booth wrote:
> >> Regarding
> >> http://www.w3.org/2001/tag/2011/09/referential-use.html
> >
> >
> > Although this document provides a very good example of the harm that can
> > occur when a web publisher uses a different convention for the semantics
> > of a URI than the consumer of that page assumes -- see the Creative
> > Commons licensing example in the section called "The Conflict" -- I
> > think there is some danger in the way this document describes the
> > problem and potential solutions, in that it subtly perpetuates the idea
> > that ambiguity is universally harmful and can be avoided by consistent
> > use of better conventions.
> >
> > For example, the document says: "It should be clear that these answers
> > [D2 and S2] are mutually exclusive."  But in the general case, these
> > answers are *not* mutually exclusive.  These answers (i.e., the
> > conventions described as D2 and S2) only create a conflict for
> > applications that *need* to distinguish between the two resources that
> > the use of both conventions simultaneously would conflate.  But for
> > applications that do not need to make such distinctions, there is no
> > problem.
> >
> > Second, it is impossible to remove all ambiguity anyway.  Thus, it is
> > misleading to call out this particular kind of ambiguity as somehow
> > special or deserving of more architectural attention than other kinds of
> > ambiguity.  The best a URI owner can do is to make a URI unambiguous
> > *within* the particular class of applications that the URI owner is
> > attempting to support -- hopefully a fairly wide class of applications,
> > though never universal.  In the Creative Commons licensing example,
> > where the landing page becomes conflated with the artistic content to be
> > licensed, the publisher clearly failed to do that very well, because
> > there are currently two conventions in use, and a machine cannot
> > reliably know which convention a particular site has used.
> >
> > David
> >
> >>
> >> This document provides a very good explanation and example of the harm
> >> that can be caused by the use of conflicting conventions around URI use.
> >> The Creative Commons licensing case is a great example, as the publisher
> >> of http://www.jamendo.com/en/album/78807 (for example) has created an
> >> ambiguity problem for consumers who do not know which convention the
> >> site has used.
> >>
> >> However, the document seems to assume that the solution to this problem
> >> (i.e., the conventions that the W3C should recommend) *must* prevent the
> >> ambiguity problems that are described in section "The Conflict".  But I
> >> think what is required from an architectural perspective is not that the
> >> conventions *necessarily* prevent such ambiguity (because we will always
> >> have sites of varying quality), but that the conventions support the
> >> *ability* of publishers to avoid such ambiguity problems if they choose
> >> to do so, and the conventions furthermore encourage publishers to do
> >> so.
> >>
> >> In other words, another potential way forward is to permit both
> >> conventions D2 ("A hashless URI refers to the document at that URI, when
> >> there is one") and S2 ("A hashless URI permitting retrieval refers to
> >> something described by what's retrieved") to be used, but recommend that
> >> S2 be used *only* in cases where the ambiguity that it creates is likely
> >> to be harmful, such as in the Creative Commons licensing case.
> >>
> >> Such guidance also might acknowledge that: (a) it is impossible for the
> >> publisher to foresee all of the downstream uses that could lead to
> >> conflict or ambiguity; and (b) downstream conflict or ambiguity is
> >> impossible to prevent anyway (in the general case), regardless of what
> >> conventions are adopted.
> >>
> >>
> >
> > --
> > David Booth, Ph.D.
> > http://dbooth.org/
> >
> > Opinions expressed herein are those of the author and do not necessarily
> > reflect those of his employer.
> >
> >
> 
> 
> 

-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.

Received on Monday, 24 October 2011 22:16:24 UTC