Re: Comments on "Interoperability of referential uses of hashless URIs" from David Booth on 2011-10-26 (www-tag@w3.org from October 2011)

From: David Booth <david@dbooth.org>
Date: Wed, 26 Oct 2011 12:26:07 -0400
To: Jonathan Rees <jar@creativecommons.org>
Cc: www-tag <www-tag@w3.org>
Message-ID: <1319646367.2178.104943.camel@dbooth-laptop>

On Tue, 2011-10-25 at 17:53 -0400, Jonathan Rees wrote:
> On Mon, Oct 24, 2011 at 6:15 PM, David Booth <david@dbooth.org> wrote:
> > On Mon, 2011-10-24 at 11:06 -0400, Jonathan Rees wrote:
> >> This 'ambiguity' thing is a red herring.
> >
> > I disagree.  As I explain below, I think it is central to a proper
> > understanding of the problem and the appropriate solutions.
> >
> >> I know Harry and Pat
> >> introduced it to make a particular point about fallacies in the theory
> >> of web architecture, and they were right. But that doesn't mean you
> >> can use it as a sledgehammer to attack any communication mechanism you
> >> don't like.
> >
> > I'm not doing that at all.  I am an ardent supporter of the web, web
> > architecture, and semantic web technology.  I'm pointing out that the
> > problem is being subtly misframed, and this may lead to wrong
> > conclusions about how it should be solved.
> >
> >>
> >> The point is, there's a job to be done, and communication is necessary
> >> in order to accomplish it. You need to communicate enough information
> >> to do what's needed, and you need for the receiver to understand the
> >> sender.
> >
> > Right, *within* the class of applications that the sender is attempting
> > to support.
> 
> This has nothing to do with applications, except inasmuch as senders
> and receivers are running them, which is an implementation detail that
> is best ignored. The meaning of a message comes from prior agreement
> between sender and receiver. In a standards-based world that agreement
> is articulated in standards, although perfectly useful communication
> happens when agreement is reached in other ways.
> 
> If you insist on talking about applications, I might say that the
> application is the Web, but I don't think that would be useful.

This has *everything* to do with applications.  The entire point of the
semantic web is to enable *applications* to do useful work for us, based
on machine-processable data.  And it isn't *the* application, it's
*thousands* of applications.  If you don't see that, then I don't know
what I can say.

> 
> > The sender *cannot* support all possible applications.
> 
> Straw man.
> 
> > In
> > other words, there isn't *a* job to be done -- there are *many*
> > different jobs to be done.  The architecture needs to support the
> > ability for all those jobs to be done -- not the ability for *one*
> > sender's data to be understood by *all* receiver applications.
> 
> Straw man.  If the meaning of a document is determined by the SVG
> specification, then senders and receivers have all the agreement they
> need to do a wide variety of things (many jobs) with SVG documents.
> The meaning comes from the spec.
> 
> In the 'referential use' memo agents agreeing to S2 can do many things
> having made that agreement, and similarly D2. That's the whole point
> of writing these things down.

No, it is not a straw man.  The document in its current draft makes the
implicit assumption that a publisher's data must be unambiguous to *all*
applications or else there is a harmful conflict.  This assumption is
incorrect, and it over-constrains the solution space.

The publisher needs to know how to publish data in a manner that will be
unambiguous to the kinds of applications that it has chosen to support.
And the consuming application needs to be able to properly interpret
that data *without* special knowledge of the publisher's conventions.
However, the fact that the data is ambiguous to *other* applications is
okay (and unavoidable).

This is somewhat circular, because in deciding what URIs to use, how to
define them, and how to publish the data, the publisher has made a
choice -- conscious or not -- about what kinds of applications that data
will support.  The data will support applications for which it is
sufficiently unambiguous, i.e., it supports whatever applications it
supports.  However, what breaks this circularity is the fact that the
publisher *could* choose to mint the URIs differently or publish the
data differently.

In other words, the key requirement is that the publisher must have the
*ability* to be unambiguous along *any* desired axis -- including the
"document versus primary subject of the document" axis -- regardless of
whether the publisher chooses to exercise this ability.

For example, if the publisher wishes to be unambiguous about whether a
CC license statement applies to content A versus content B, then the
publisher must have a standards-based way of saying that, so that
applications consuming the publisher's data can properly make that
determination *without* special knowledge of that publisher.  (E.g.,
without having to know that "Flickr always uses rule S2, therefore I
will interpret Flickr's data according to rule S2".)

> 
> >> If not enough information is sent for some job, send more. If
> >> information isn't needed, you don't have to send it. If the receiver
> >> doesn't understand what the sender is saying, you need coordination,
> >> e.g. prior discussion, or a standard, or something.
> >>
> >> Roughly speaking, ambiguity is the reciprocal of information.
> >> Obviously there is no such thing as "completely unambiguous" - that
> >> would be the same as "total information" which is ridiculous.
> >
> > I disagree.  That would be true if we were talking about some universal
> > quality of ambiguity/unambiguity.
> 
> Are you saying there are many ambiguity qualities? 

Yes!  The number of potential axes of ambiguity is virtually infinite.

> I agree that a
> message might be considered unambiguous one day, and ambiguous the
> next, as a result of the receiving agent's need for information
> increasing from one day to the next. Given agreement on meaning,
> ambiguity is relative to the receiver's needs.

Exactly!  Ambiguity is *relative* to the receiver's application.

> 
> > But if we ask about a particular
> > application class, then it certainly *is* possible to be "completely
> > unambiguous".  And by necessity we *must* restrict our scope to
> > particular application classes, because it is not possible for a URI
> > owner to consistently and unambiguously address all applications.
> > Ambiguity/unambiguity are *relative* to an application class.
> 
> Ambiguity is a function of the message, its meaning (i.e. the spec for
> the language it's written in), and your need for information. Two
> agents trying to extract the same information from a message that
> doesn't provide it are both going to experience the message as
> ambiguous, assuming they've understood it in the same way (authorized
> by spec presumably). So I agree, it's relative, 

Yes!  Exactly!

> but it's not relative
> to "application class".

It sounds like you're misunderstanding what I mean by "application
class".  By "application class" I'm referring to the application's "need
for information"; certain classes of application will need certain kinds
of information; other classes will need other kinds.

> 
> >> Any
> >> message has the potential to be incomplete, or not understood, in some
> >> context. That is, a communication protocol might work perfectly fine
> >> for a while, and then two communicating parties might say: Hey, this
> >> won't do, we need more clarity / information about this. This is just an
> >> inadequacy that should be fixed somehow. We don't say: Oh no, totally
> >> unambiguous communication is impossible, so we have to just give up
> >> all hope of communicating, live with what we have, drop the idea of
> >> correctness and accountability, and so on.
> >
> > I'm not saying we should give up on communicating -- quite the opposite
> > -- but we *do* need to drop the idea of *universal* correctness, since
> > correctness is *relative* to the application class.
> 
> Straw man. I'm not talking about universal correctness, I'm talking
> about the kind of correctness an engineer means when saying that an
> agent is using the SVG spec correctly. Correctness is relative to a
> specification, or some other kind of standard or norm.

That's fine as far as it goes, but there's more to it than that.

> 
> >  An example that I
> > have often used is map data that models the world as flat.  Such data is
> > fine for applications such as street navigation -- in fact it is
> > *better* than 3D data for those apps, because it is cheaper to implement
> > and process -- but it would be totally inadequate for other applications
> > that care about altitudes and the curvature of the earth.  The notion of
> > universal data correctness is a red herring.  It is better to think of
> > it in terms of data *usefulness* to applications.
> 
> Meaning comes from specs (or other kinds of precoordination), not
> applications. An application can do whatever it likes, so long as it
> lives up to the agreements into which it (or rather its ag-ee or
> creator) has entered, such as spec conformance.
> 
> There's no way to test whether a spec has modeled the world in some
> way except by its actions. 'Wrong' models (and they all are, that's
> what makes them models) used internally are nobody's business - they
> are fine so long as they don't lead to wrong answers as observed
> externally.

I'm not talking about *internal* models, I'm talking about *published*
data.  Think of published map data that models the world as flat.
Clearly such data is "wrong" in some universal, scientific sense, and
yet it is *useful* to some applications even while it is useless to
others.  The point is that "correctness" in a universal, scientific
sense is irrelevant from an architectural perspective, as the
architecture needs to allow "anybody to say anything about anything".
What matters is usefulness to applications.
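
As a rough illustration of the flat-map point, the Python sketch below
treats a dataset as the set of properties it publishes and an
application class as the set of properties it needs.  The names
FLAT_MAP_DATA, APP_NEEDS, missing_information and is_useful_to, and the
property names themselves, are hypothetical and chosen only for this
example; they do not come from any spec or from the memo.

```python
# Hypothetical illustration: usefulness of published data is relative to
# what each class of application needs, not to a universal notion of
# correctness.

# Properties that a flat-world map dataset publishes for each location.
FLAT_MAP_DATA = {"latitude", "longitude", "street_name"}

# Information needs of two hypothetical application classes.
APP_NEEDS = {
    "street_navigation": {"latitude", "longitude", "street_name"},
    "flight_planning": {"latitude", "longitude", "altitude"},
}


def missing_information(published, needed):
    """Return the properties an application needs but the data lacks."""
    return needed - published


def is_useful_to(published, app_class):
    """Data is useful to an application class if nothing it needs is missing."""
    return not missing_information(published, APP_NEEDS[app_class])


for app_class in APP_NEEDS:
    print(app_class, is_useful_to(FLAT_MAP_DATA, app_class))
```

The same flat-world data comes out as useful for street navigation and
not useful for flight planning, without either verdict saying anything
about whether the data is "correct" in a universal sense.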

> 
> > My point is that if we do not frame the problem properly, we are apt to
> > draw the wrong conclusions about how it should be solved.
> 
> Right, that's why I have taken great care to frame the problem properly.

And I think you've done very well on most aspects.  But I've been trying
to explain that some important adjustments still need to be made on some
parts.

> 
> >>
> >> All format and protocol specs evolve in the direction of increasing
> >> the richness and success of communication. (Well ideally at least...)
> >> That's all we're talking about here. We can decide that any given
> >> communication problem, such as metadata, should be solved by someone
> >> else; that doesn't make the problem go away.
> >>
> >> A large class of metadata expression problems (such as the licensing
> >> one) have a fairly simple solution that Tim has been advocating for
> >> almost twenty years. We have a solution to a problem. It has some
> >> warts and some opposition. If we decide the TAG shouldn't be involved
> >> in fixing it, that's fine. It will just get pushed out to a different
> >> forum and solved by them in some way of their choosing.
> >>
> >> Of course there are special cases where D2 and S2 are not mutually
> >> exclusive.
> >
> > Actually I think it is the other way around: there are special cases
> > where D2 and S2 cause harmful ambiguity -- such as the CC license case
> > -- but, as Ian Davis and other LOD protagonists of S2 point out, in the
> > vast majority of cases the ambiguity is harmless.
> 
> The CC license case has little to do with ambiguity. It's about
> whether the license is applied to the correct resource. In the Flickr
> case, if the standard or prior agreement is S2 then the receiver gets
> the answer intended by the sender. If the agreement is D2 then Flickr
> has made a mistake by sending the wrong message. If there is
> disagreement on what is 'right', then there is no way to tell who is
> 'right' and who is 'wrong' since there is no standard (i.e. agreement)
> to judge against.

But that's exactly the ambiguity I'm talking about!  If the recipient
does not know whether S2 or D2 was used by the sender, then the license
statement is ambiguous to that recipient.  
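
A minimal sketch of that situation, in Python.  It assumes, as a
simplification of the "document versus primary subject of the document"
axis mentioned above, that one rule reads the URI as the retrieved
document and the other reads it as the thing the document describes.
The RULES table, the function names and the example URI are all
hypothetical.

```python
# Hypothetical illustration of why a license statement is ambiguous to a
# recipient who does not know which rule the sender followed.
# ASSUMPTION: the "D2" and "S2" readings below are simplified stand-ins,
# not the memo's own wording.

RULES = {
    "D2": "the document retrieved from <{uri}>",
    "S2": "the thing described by the document at <{uri}>",
}


def license_targets(uri, known_rule=None):
    """Return every resource the license might apply to.

    If the recipient knows which rule the sender used, only that reading
    is considered; otherwise every reading remains possible.
    """
    rules = [known_rule] if known_rule else sorted(RULES)
    return [RULES[rule].format(uri=uri) for rule in rules]


def is_ambiguous(uri, known_rule=None):
    """A statement is ambiguous if more than one target is still possible."""
    return len(license_targets(uri, known_rule)) > 1


uri = "http://example.org/photo/123"  # hypothetical URI
print(is_ambiguous(uri))        # recipient does not know the rule
print(is_ambiguous(uri, "S2"))  # rule known in advance
```

With no agreed rule the recipient is left with two candidate targets for
the license; once the rule is known by some standards-based means, only
one remains.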

> 
> If the sender and receiver agree that the message is ambiguous (i.e.
> does not provide the information needed in this situation), then in
> this case, since the assignment matters, the receiver would have to
> ignore the message and use a different channel (or language) for
> obtaining the information they need - that is their only correct
> response. But that is not the situation I was talking about.

Agreed.

> Disagreement, or mistakes concerning what is understood, is not at all
> the same as ambiguity, which is simply an understanding that there is
> information that is not conveyed.

It sounds like you're trying to reserve the word "ambiguous" for some
special sense.  Under normal use of the word, I would say that if the
recipient is unable to determine the sender's intent (i.e., to determine
which content is subject to the specified license) then the message is
*ambiguous* to the recipient.

> 
> >> If John Smith Jr. and John Smith Sr. live at 5 Ambiguity
> >> Lane, then there is no problem if I say that John Smith lives at 5
> >> Ambiguity Lane, since it's true under any interpretation of "John
> >> Smith". But for some other situations it won't work for me to just say
> >> "John Smith" - like if I say that John Smith is 44 years old. Neither
> >> particular situation is more "general" than the other, but a
> >> communication system that lets you coordinate more meanings and make
> >> finer distinctions is generally more "general" or useful than one that
> >> does not.
> >
> > The architecture needs to support the ability to convey distinctions
> > along *any* axis, and there is a virtually infinite number of potential
> > axes.  But this doesn't mean that a system with built-in support for one
> > particular axis (e.g. the web page vs. its primary subject) is more
> > general than a system that also *allows* distinctions on that axis, but
> > does not give special recognition to that axis over any other axis.
> 
> Straw man. Specifications can help make all sorts of distinctions, in
> any direction they like. If a spec says that a message means X (e.g. X
> makes a certain distinction), and the communicating parties have
> agreed to that spec, then the message means X, in their communication.
> Similarly, a spec can intentionally make a message not mean X (which
> is not the same as meaning not X). In that case neither party can
> justifiably conclude X from the message. If they do then they are
> mistaken.

Well, it sounds like you've missed my point, but I don't think further
explanation will help.

> 
> >> The "John Smith" situation holds for any "identifier" system
> >> - that was the point of the Hayes/Halpin paper. All it says is that
> >> you need to engineer your communication system so that what has to be
> >> said, is said, and is understood.
> >
> > Yes . . . *within* the intended class of applications (which may be
> > quite broad, but are *not* universal).
> 
> The spec defines the class of agents that are conformant to the spec,
> not the other way around.

Yes, that's fine.

> 
> >>
> >> So I stand by what I say, and I continue to think it's obvious. D2 and
> >> S2 are incompatible because, as general methods, in most cases they
> >> give mutually inconsistent answers.
> >
> > If "in most cases they give mutually inconsistent answers" means "in
> > most cases there exists *some* application for which they give mutually
> > inconsistent answers", then I would agree.  But if it means "in most
> > existing applications they give mutually inconsistent answers" then I do
> > not think that is true at all.  As the LOD community demonstrates, most
> > of their applications work fine in spite of the potential ambiguity that
> > S2 creates when used with D2.
> 
> Specs for the meaning of a set of messages are incompatible if there
> is any message for which the meanings given by the two are
> incompatible. That is what I meant by incompatible specs. You can't
> conform to both at the same time without the potential for mistake.

Okay, but it's misleading to label them as "incompatible" according to
that criterion, because that's *not* the relevant criterion.  It doesn't
matter if there exists a message that can be misinterpreted, *provided*
that there exists a way to send the desired information -- without
requiring the recipient to have any special knowledge of the sender --
in a way that will *not* be misinterpreted by the target recipients.

> 
> You can use incompatible language specs at the same time if you are
> careful to stay away from conflicting messages. As you say, you might
> even get quite a bit of useful work done this way. That does not mean
> the specs are compatible, just that they're partially compatible.
> 
> >> The fact that occasionally they
> >> don't, or equivalently that the incompatibilities happen to not matter
> >> to some particular sender or receiver, doesn't affect the truth of the
> >> statement that they're incompatible as general methods.
> >
> > Perhaps we need to get some quantitative data on how often this
> > ambiguity matters to an application, because my perception is that the
> > dominant case is the other way around: usually this ambiguity does *not*
> > matter, but occasionally it does -- as nicely illustrated by the CC
> > licensing use case.
> 
> Quantitative deployment studies would have no bearing on an assessment
> of the correctness of what I have said, since what I said is not
> sensitive to properties of the installed base.
> 
> The document is neutral regarding any possible TAG decision to invest
> in building consensus on any particular proposal. It just says D2
> senders are incompatible with S2 receivers for some messages, and vice
> versa (and then has a bit of discussion). 

*That* is a better way of putting it, rather than merely saying that D2
and S2 are incompatible.  But again, it can be misleading to imply that
incompatibility for some messages is harmful.  It *would* be harmful if
the rules did not offer ways to avoid the incompatibility when
necessary.  The fact that it is possible to shoot one's self in the foot
does not matter as long as it is possible to avoid shooting one's self
in the foot when needed.  So I think it is important to be careful in
how "incompatibility" is discussed.

> I think it's very important
> that we all get on the same page regarding the nature of the problem.
> Any process for determining next steps by comparing or synthesizing
> the two approaches is future work and is pointless without some
> neutral baseline understanding. 

Agreed.  I hope this discussion has been helpful in that regard.

> In the future deployed base might be
> interesting to talk about, but that's not what the document is about.
> 
> It would be weird, but conceivable, to agree that some message "ZJW"
> means "it is raining and it is not raining". In that case it is
> unlikely that conformant agents would ever use this message.

Well, think of "it is raining *or* it is not raining" instead.
Certainly conforming agents would avoid it when they need to avoid that
particular ambiguity.  But if their target applications didn't care
about that particular ambiguity, then they may well use it.  For
example, if the application is just categorizing the subjects being
discussed then the mere fact that "rain" is discussed may be sufficient.


-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.
Received on Wednesday, 26 October 2011 16:26:34 UTC