RE: Naming things with hashes (not #, but e.g. md5)

From: David Booth <david@dbooth.org>
Date: Wed, 11 Apr 2012 11:03:46 -0400
To: Larry Masinter <masinter@adobe.com>
Cc: Jonathan A Rees <rees@mumble.net>, Kingsley Idehen <kidehen@openlinksw.com>, "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <1334156626.22011.64711.camel@dbooth-laptop>

Hi Larry,

On Wed, 2012-04-11 at 03:50 -0700, Larry Masinter wrote:
> You're not understanding me, and I'm getting tired of trying to explain.

FWIW, I understand your frustration, but I do hope you'll have patience,
as these issues are notoriously thorny and it is notoriously difficult
to communicate clearly about them.  We really need to understand
everyone's perspective in order to get these things worked out.

> For example: 
> > Just to be clear, the hr14 resolution has two parts, a 2xx case (a)
> > and a 303 case (b).
> But the hr14 resolution is incoherent because it is a "resolution" to
> a senseless question, and thus "cases" are meaningless. There is no
> "2xx case" and a "303 case", any more than there is a "404 not found"
> case and a "DNS error, server not found" case.  
> If sender A sends 
>     <a href="http://example.org/path">something</a> 
> to recipient B inside a message labeled "text/html", that message
> expresses the intention that if B displays the result of the message,
> it will display 'something' with a hyperlink which, when clicked or
> selected, will attempt to connect to example.org with path /path using
> the HTTP protocol; other messages in other formats with hyperlinks
> (PDF, flash, XML, SVG) or with other URI schemes (ftp, data, mailto,
> etc.) have similar or related meaning.
> I can try to explain by analogy, once more.  If I tell you, "The sky
> is falling", then what I say is independent of what the OED or the
> American Heritage dictionary says about the words "sky" and "falling".
> There are references you can consult to discover what I might have
> meant, and our conversation is more reliable if I refer to reliable
> dictionaries in which you can discover definitions of terms I use if
> those terms are not in (our) common use.  But the meaning isn't
> inherited from the dictionary, doesn't depend on whether you have one,
> and doesn't change if your dictionary burns up.
> While "follow your nose" may be a "good practice", meaning doesn't
> depend on good practice being followed. The meaning of an instance
> of XML doesn't depend on whether someone has bothered to populate
> the namespace URI's web server with interesting information about
> the URI, and doesn't change when the web site is put up or taken down.
> The reliance on 200 vs. 303 vs. 404 vs. "server not found" DNS errors may be
> an artifact of some particular processing systems that actually rely on
> attempting retrieval from URIs used in RDF, but that dependency is an
> artifact of those (poorly designed, IMHO) processing systems, and not
> of the "meaning".
> And no, the HTTP working group's scope does not cover defining the
> meaning of HTTP URIs beyond their role as a signifier of invoking
> the HTTP protocol with a particular host address found within the
> HTTP URI, so saying that your interpretation is "enshrined in HTTPbis"
> is nonsense.
> > If I ask what an occurrence (in some context) of a name "means", I'm
> > asking what the party who wrote that occurrence intended when they
> > wrote it.

FWIW, I think that is *exactly* the right criterion, with the exception
that since we cannot objectively define "meaning", we should instead
talk about *definitions*.  So I would rephrase this criterion as: 

  If I ask what an occurrence of a name in a statement "means", 
  I'm asking what definition of that name the statement author 
  intended when the statement was written.

Please note that this implies: 

 1. Two different statement authors using that same name *could* use
that name with *different* intended meanings, i.e., according to
*different* definitions.  (Whether or not they *should*, or the
circumstances under which we would consider it acceptable for them to do
so, is a separate matter.) 

 2. The same statement author *could* use that name with *different*
intended definitions in different statements, such as using one
definition in an HTML statement and a different one in an RDF statement.
(Again, whether or not we would consider this acceptable is a separate
matter.)

> I think that's an impoverished model of communication 

FWIW, I do not understand why you think so, but I'll read on . . . .

> which you insist on sticking with, and which leads to the same
> senseless conclusions.  I don't think we're going to make progress on
> clarification here if you insist on framing the question in the way
> you are framing it, and don't think it is worth TAG time to discuss
> this any more. 
> The sender (A) is communicating with the receiver (B) with a message M
> that includes a URI U.  U participates in the communication, and the
> communication of M is effective if A and B share a mutual
> understanding of M and U's role in it.

Okay, but so far that sounds very consistent with the way Jonathan
framed it above.  

> If M is HTML and U appears in a@href, the "meaning" of U in that
> context is pretty widely understood as establishing some expected
> behavior in B's software when B clicks on the link, at least for some
> URI schemes commonly used in URIs within HTML and similar languages
> (there's ongoing work to reach common agreement on expected behavior
> when U is an "about:" URI, for example, when U appears within content
> generated by scripts from a different origin, when the result of a
> retrieval contains content that looks like it matches a different MIME
> type than the one it was served with, etc.)
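
To make the a@href case concrete: for an http or https link, the
"expected behavior" boils down to a request whose scheme, host, port and
path are read off the URI.  Here is a minimal Python sketch using only
the standard library's urllib.parse; the function name
expected_http_request is hypothetical, and it ignores base-URI
resolution, proxies and redirects:

```python
from urllib.parse import urlsplit

def expected_http_request(href: str) -> tuple[str, str, int, str]:
    """Return (scheme, host, port, path) a browser would use for an http(s) link.

    Simplified sketch: no base-URI resolution, proxies, or redirects.
    """
    parts = urlsplit(href)
    scheme = parts.scheme.lower()
    if scheme not in ("http", "https"):
        # Other schemes (ftp, data, mailto, ...) imply different behavior.
        raise ValueError(f"not an HTTP(S) link: {href!r}")
    default_port = 80 if scheme == "http" else 443
    port = parts.port if parts.port is not None else default_port
    path = parts.path or "/"  # an empty path is requested as "/"
    return scheme, parts.hostname, port, path

print(expected_http_request("http://example.org/path"))
# ('http', 'example.org', 80, '/path')
```

Note that nothing here says anything about what the link "means" beyond
where B's software will try to connect, which is the point being made
about the a@href context.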

> But if M contains a collection of RDF triples and U is used within a
> triple, there isn't a common widely understood expectation of
> behavior, 

I do not understand why behavior is relevant.  AFAICT, what the receiver
*does* with the information that is received is entirely the receiver's
business.  We should only be concerned with the question of whether the
receiver is able to determine the definition of U that the sender
intended when that triple was written.

> at least in terms of processing systems that want to use logic processing
> on the triples retrieved.  This isn't the fault of "U" being badly
> defined, or of the definition of the protocol used if U is accessed as if
> it were in an a@href; it's a problem of not having a common
> understanding about M.

I don't understand that.  If the message's media type is RDF/XML, then
the RDF specification gives the parties a common understanding of M in
general, though the receiver still needs to know what definition of U
the sender intended, and that's what we're trying to address.  We're
trying to work out recommended conventions that will enable the consumer
of an RDF statement to determine, as often as possible, the likely URI
definition that the statement author intended when the statement was
written.  Those conventions will never be 100% infallible, but they can
still be useful to many applications much of the time.
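
One existing convention of this kind is the httpRange-14 resolution
referred to above, which ties an inference about what a URI identifies
to the status code of a GET on it.  A rough Python sketch of just the
2xx and 303 cases; the function name and return strings are made up for
illustration, and a real consumer would also follow the Location header
after a 303 and read what the returned documents actually say:

```python
def hr14_inference(status: int) -> str:
    """Classify a GET response status using the two httpRange-14 cases.

    Illustrative sketch only: it looks at nothing but the status code.
    """
    if 200 <= status < 300:
        # Case (a): the URI identifies an information resource.
        return "information resource"
    if status == 303:
        # Case (b): the URI could identify any resource; a description
        # may be found at the URI given in the Location header.
        return "any resource; see Location"
    # Any other status: this sketch draws no conclusion.
    return "unknown"

for status in (200, 303, 404):
    print(status, hr14_inference(status))
```

Like any such convention it is fallible (the server could be
misconfigured, or the statement author may never have looked), but it
gives a consumer a default to work from.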

> If you continue to insist that "U" has a common "meaning" that is
> independent of the context use, 

I don't know if Jonathan insisted on that elsewhere, but that is *not*
implied by the framing above.  Achieving 'common "meaning" that is
independent of the context use' would be an unrealistic goal.  But we
*can* achieve conventions that allow common definitions to be used much
of the time.

> and at the same time insist that the "meaning" of U within a@href in
> HTML has nothing to do with the "meaning" of U within an <U> <R> <B>
> triple in RDF, then you are taking an incoherent position.
> Persistence and equivalence are two URI relationships that depend on
> context. Within HTML, a URI is "persistent" insofar as it continues to
> serve as a good target to illustrate the text that is being marked up.
> So the persistence of <a href="http://www.w3.org/2001/tag">The W3C
> TAG</a> depends on the web site containing information about the
> organization, whether or not the organization is disbanded, split into
> unter-TAG and uber-TAG, or whether MIT forgets to pay the .org renewal
> for w3.org.
> Note that, for the most part, within that context
> http://www.w3.org/2001/tag and HtTp://wWw.W3.orG:80/2001/tag are
> "equivalent", insofar as they behave the same.
> But within <svg xmlns="http://www.w3.org/2000/svg">...</svg>, the
> persistence of utility of http://www.w3.org/2000/svg as a namespace name
> doesn't depend on whether the result of a retrieval on the namespace
> URI is 200, 303, 404, or timeout. And the URI is *not* equivalent to
> HtTp://wWw.W3.orG:80/2000/svg.
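
The context dependence of equivalence can be sketched in a few lines.
In the a@href context the two spellings reach the same server, which
roughly corresponds to RFC 3986 case normalization of scheme and host
plus removal of the default port; namespace names, by contrast, are
compared character for character under Namespaces in XML.  A minimal
Python sketch (the helper names are hypothetical; it assumes a plain DNS
host name with no userinfo, and skips percent-encoding and dot-segment
normalization):

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def normalize_http_uri(uri: str) -> str:
    """Partial RFC 3986 normalization: lowercase scheme and host, drop the
    scheme's default port, and use "/" for an empty path."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # .hostname is already lowercased
    port = parts.port
    if port in (None, DEFAULT_PORTS.get(scheme)):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))

def same_link_target(a: str, b: str) -> bool:
    # a@href context: two URIs behave alike if they normalize alike.
    return normalize_http_uri(a) == normalize_http_uri(b)

def same_namespace(a: str, b: str) -> bool:
    # XML namespace-name context: plain string identity, no normalization.
    return a == b

a = "http://www.w3.org/2000/svg"
b = "HtTp://wWw.W3.orG:80/2000/svg"
print(same_link_target(a, b))  # True
print(same_namespace(a, b))    # False
```

The same pair of strings is "equivalent" under one comparison and not
under the other, which is exactly the dependence on context described
above.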

> I'm very frustrated with this conversation because I think you are
> spinning in circles, asking nonsensical questions. 

I partially agree, in that I think that: (a) framing the question in
terms of "meaning" is a mistake, because we cannot objectively define
"meaning", nor do we need to, since we can instead talk about
definitions; and (b) the objective should not be framed in terms of a
URI having the same "meaning" (or definition) in all contexts, because
that is impossible to achieve.  However, if we frame the objective more
modestly in terms of achieving URI definitions that are commonly used
much of the time, then that is an achievable objective.

> I've tried time and time again to provide a basis for discussion that
> I think is rational, but the conversation keeps on slipping back into
> what I think is nonsense.
> The reason why this matters and I keep on trying is that I don't think
> we can have a sensible architectural discussion about privacy,
> security, CORS, cross-site-scripting, local storage, publishing and
> linking, and many of the other topics we should be resolving, if you
> start with a model of communication using URIs that is
> decontextualized from the message, the actors involved, the time
> sequence of events and the roles of the players involved in
> communication.
> I don't think we can begin to discuss those issues meaningfully using
> the current AWWW model -- it was a nice try, and maybe a reasonable
> approximation for some purposes, but it's not good enough to help with
> most of what we're faced with.
> We need to talk about the impact of broken certificate infrastructure,
> attacks on DNS, and the effect of take-down notices and legislatively
> mandated redirection from acts like SOPA and PIPA; a model which
> insists that a URI has an "owner" who is responsible for saying what
> its "meaning" is doesn't let us talk about how the web really works.
> The fact that AWWW doesn't seem to work for resolving some of the more
> obscure edge cases around linked data and metadata ... it's just more
> evidence to me that we need to move on.

The TAG definitely does have other important issues to address, and the
httpRange-14 issue requires a *lot* of deep thought to work out.  So
maybe the AWWSW task force should continue trying to make more progress
before bothering the TAG further with it.  But the AWWSW task force has
had a lot of difficulty because Jonathan and I -- the most active
participants -- seem to hold fundamentally different views about what we
should be trying to achieve.   Jonathan seems to hold the view that a
URI has or should have a universally agreed "meaning", and its semantics
should be determinable through the various W3C and IETF standards that
the web uses.  This has gotten him all tangled up in trying to
semantically align specifications (such as the HTTP spec) that were
never intended as semantics specifications.  In contrast, I have been
advocating the view that the "meaning" of a URI is irrelevant to the
architecture: what matters from an engineering perspective is URI
definitions and their usefulness to applications.  

(As an aside, I think the term "semantic web" was a huge mistake,
because it misleads people into thinking that it is all about
"semantics".  "Linked data" is a much better term.)

David Booth, Ph.D.

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.
