RE: Proposed AWWW erratum on "information resources" [was Re: Fwd: Splitting vs. Interpreting] from David Booth on 2009-08-05 (www-archive@w3.org from August 2009)

From: David Booth <david@dbooth.org>
Date: Wed, 05 Aug 2009 13:10:30 -0400
To: Larry Masinter <masinter@adobe.com>
Cc: www-archive <www-archive@w3.org>
Message-Id: <1249492230.25446.4286.camel@dbooth-laptop>

Hi Larry,

On Sat, 2009-08-01 at 00:23 -0700, Larry Masinter wrote: 
> > LSIDs and XRIs are the main two that come to mind.  Here's
> > an example of a claim that "LSIDs are independent of any particular
> > transport protocol":
> > http://lists.w3.org/Archives/Public/public-semweb-lifesci/2006Jul/0173.html
> 
> I think the claim of being "independent of any particular
> transport protocol" is a little confused, but the concerns
> about using http: make sense to me.  Really it's a matter
> of "being independent of other people's resolution mechanisms,
> and instead dependent on ours".
> 
> > My objection to these proposals is that there is no *need* to define new
> > schemes for LSIDs or XRIs: http URIs can do the same thing, and
> > therefore should be used instead.
> 
> I disagree that they "can do" the "same thing". I think that
> they are showing that they value some properties that you
> think are insignificant.

It sounds like we're still misunderstanding each other here, but I'm
hoping some of the explanation below will help.

> 
> > I think the main reason for proposals like an LSID or XRI scheme has
> > been a misconception about how http URIs can be used.  The essential
> > misconception is that http URIs *must* be resolved using the HTTP
> > protocol. The owner of a URI such as http://filbert.example/foo could
> > define an alternate protocol -- the filbert protocol -- for resolving
> > URIs that begin with "http://filbert.example/",
> 
> A world in which that is true is very "Humpty Dumpty", where words
> mean whatever the utterer of the word intends it to mean. I don't
> think that's a reasonable architecture, and -- wearing a Web
> Architect hat, would vote against such a design, as leading to
> poor interoperability. How can a receiver of 
> "http://filbert.example/yes" *know*, reliably, that filbert
> has defined an alternate protocol?   

The *exact* same way a receiver would know that an alternate protocol
has been associated with "filbert:example/yes": The web architecture has
established a deterministic but extensible convention for interpreting a
URI that establishes a chain of authority based on delegation.  That
chain of authority begins with RFC3986, and then delegates to other
specifications.  

Consider a hypothetical URI such as:
filbert:example/nadia/woosel/yes
In this case, the chain of authority looks like:
  1. AWWW delegates to RFC3986 (because the whole thing is a URI)
  2. RFC3986 delegates authority for filbert:* URIs to the owner of the
filbert scheme
  3. Owner of filbert scheme delegates authority for
filbert:example/nadia/* URIs to the Filbert specification.
  4. The Filbert spec delegates authority for filbert:example/nadia/*
URIs to Nadia.
  5. Nadia delegates authority for filbert:example/nadia/woosel/* URIs
to the Woosel spec.
  6. The Woosel spec says that the URI filbert:example/nadia/woosel/yes
means True.

Compare this with a corresponding http URI such as
http://filbert.example/nadia/woosel/yes
In this case, the chain of authority looks like:
  1. AWWW delegates to RFC3986 (because the whole thing is a URI)
  2. RFC3986 delegates authority for http:* URIs to the owner of the
http scheme
  3. Owner of http scheme delegates authority for
http://filbert.example/* URIs to the owner of filbert.example.
  4. The owner of filbert.example delegates authority for
http://filbert.example/* URIs to the Filbert specification.
  5. The Filbert spec delegates authority for
http://filbert.example/nadia/* URIs to Nadia.
  6. Nadia delegates authority for http://filbert.example/nadia/woosel/*
URIs to the Woosel spec.
  7. The Woosel spec says that the URI
http://filbert.example/nadia/woosel/yes means True.

There is one extra level of indirection in the http case, but the net
effect is nearly identical.
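
To make the comparison concrete, here is a minimal sketch of the two chains in Python.  The table layout and the function name chain_of_authority are my own assumptions for illustration; nothing in RFC3986 or the AWWW defines such a data structure.  Each entry simply records "authority for URIs with this prefix has been delegated to this party", in the order of the numbered steps above.

```python
from typing import List, Tuple

# (URI prefix, party that receives authority for URIs with that prefix).
# Hypothetical layout: it only restates the numbered steps above as data.
Delegation = Tuple[str, str]

SCHEME_CHAIN: List[Delegation] = [
    ("", "RFC3986"),
    ("filbert:", "owner of the filbert scheme"),
    ("filbert:example/nadia/", "Filbert specification"),
    ("filbert:example/nadia/", "Nadia"),
    ("filbert:example/nadia/woosel/", "Woosel spec"),
]

HTTP_CHAIN: List[Delegation] = [
    ("", "RFC3986"),
    ("http:", "owner of the http scheme"),
    ("http://filbert.example/", "owner of filbert.example"),
    ("http://filbert.example/", "Filbert specification"),
    ("http://filbert.example/nadia/", "Nadia"),
    ("http://filbert.example/nadia/woosel/", "Woosel spec"),
]


def chain_of_authority(uri: str, delegations: List[Delegation]) -> List[str]:
    """Return, in delegation order, every party with authority over `uri`."""
    return [party for prefix, party in delegations if uri.startswith(prefix)]


if __name__ == "__main__":
    for uri, table in [
        ("filbert:example/nadia/woosel/yes", SCHEME_CHAIN),
        ("http://filbert.example/nadia/woosel/yes", HTTP_CHAIN),
    ]:
        print(uri)
        for party in chain_of_authority(uri, table):
            print("  ->", party)
```

In this sketch both URIs end at the Woosel spec, and the http chain is exactly one entry longer (the owner of filbert.example).  A receiver that does not recognize the later prefixes simply stops earlier in the chain and falls back on the more general authority.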

For simplicity, but without loss of generality, I have made the syntax
of these two URIs look very similar.  Of course, the syntax could have
differed, which may have required some syntactic encoding in http URIs,
in order to conform to the syntactic requirements of http URIs.  But
that does not affect the point: additional interpretations can be
layered on top of http URIs in exactly the same manner that they are
layered on top of URIs in general by defining new schemes.
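
As a rough illustration of the "syntactic encoding" mentioned above, the following Python sketch percent-encodes an identifier from some other syntax into a single path segment under an http prefix, and decodes it again.  The prefix and the function names are hypothetical, and this is only one possible encoding, not the one any particular specification uses.

```python
from urllib.parse import quote, unquote

# Hypothetical prefix, borrowed from the example in this message.
HTTP_PREFIX = "http://filbert.example/"


def to_http_uri(native_id: str) -> str:
    """Embed an arbitrary identifier in an http URI under HTTP_PREFIX.

    safe="" makes quote() percent-encode ":" and "/" as well, so the
    identifier occupies exactly one path segment.
    """
    return HTTP_PREFIX + quote(native_id, safe="")


def from_http_uri(uri: str) -> str:
    """Recover the original identifier from a URI built by to_http_uri()."""
    if not uri.startswith(HTTP_PREFIX):
        raise ValueError("not under the expected prefix: " + uri)
    return unquote(uri[len(HTTP_PREFIX):])


if __name__ == "__main__":
    original = "filbert:example/nadia/woosel/yes"
    encoded = to_http_uri(original)
    print(encoded)
    print(from_http_uri(encoded) == original)
```

The encoding is reversible, so nothing expressible in the other syntax is lost by carrying it inside an http URI.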

To turn this around, if the web did *not* have this kind of
extensibility through delegation then the use of URIs simply would not
scale, because it isn't practical for everyone who needs to define new
URIs with new semantics to create a new scheme for them.

The AWWW discusses these ideas, though not exactly in these words, and
probably not as explicitly as it should:
http://www.w3.org/TR/webarch/#uri-assignment
http://www.w3.org/TR/webarch/#URI-scheme

> 
> Protocols and protocol elements like URIs need a stable definition
> so that receivers of communications can know, without telepathy
> or infinite wisdom, what the communication is intended to mean.

Yes, of course.  That's what delegation is all about: it provides a
clear chain of authority for figuring out the intent.

> URIs are "uniform" resource identifiers in that they carry along
> the indication of intended resolution.
> 
> 
> > just as the owner of a
> > "filbert:" scheme could define the filbert protocol for resolving URIs
> > that begin with "filbert:".  
> 
> The design of the URI system, as I understood it through the
> more than 15 years of shepherding the documents through the
> IETF standards process -- has been that the registered scheme
> is a reliable indicator of the protocol. For
> "http://filbert.example/yes" to mean something other than
> "use the HTTP protocol", it would require an incompatible
> change to the "http" URI scheme definition which is not
> justified at all.

It isn't an incompatible change.  It is *layering*.  The URI still
retains its http semantics.  But it gains *additional* semantics, above
and beyond what the http scheme told you.  It isn't *wrong* to try to
dereference that URI using the HTTP protocol.  But it may not give you
as much benefit as if you used the Filbert protocol.
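
A small sketch of what I mean by layering, in Python.  The function candidate_protocols and the protocol name "filbert" are assumptions made up for this example; no actual network access is performed.  A client that knows the Filbert convention gets one more option, and a client that does not know it still sees plain HTTP.

```python
from typing import List

# Prefix taken from the example in this thread; the protocol is hypothetical.
FILBERT_PREFIX = "http://filbert.example/"


def candidate_protocols(uri: str, knows_filbert: bool) -> List[str]:
    """List the protocols a client may use for `uri`, most specific first."""
    protocols: List[str] = []
    if knows_filbert and uri.startswith(FILBERT_PREFIX):
        # Additional semantics layered on top of the http meaning.
        protocols.append("filbert")
    if uri.startswith("http://"):
        # The http meaning is never removed by the extra layer.
        protocols.append("http")
    return protocols


if __name__ == "__main__":
    uri = "http://filbert.example/foo"
    print(candidate_protocols(uri, knows_filbert=True))
    print(candidate_protocols(uri, knows_filbert=False))
```

Whatever the unaware client can do, the aware client can also do, which is why the change is compatible.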

> 
> > The point is that the ability to do this is
> > due to the fact that http://filbert.example/foo, in *addition* to being
> > an http URI (and thus associated with the HTTP protocol), is also a
> > *name* that can be associated with *any* protocol. 
> 
> I'm sure that you can also use http://filbert.example/foo 
> as a design on a T-shirt or in a pagan ritual, but
> it doesn't mean that it is therefore also a fashion statement
> or one of the many names of a deity unsuitable for uttering
> aloud. 
> 
> We're discussing the standardized use of URIs within network
> protocols. The fact that I can design a system which uses URIs
> for some other purpose doesn't detract from there being value
> in having a standardized meaning.

Right.  A URI like http://filbert.example/foo still has its standardized
http meaning.  But it has *additional* meaning as a Filbert URI.  This
is *exactly* the same idea as an http URI retaining its standard meaning
as a generic RFC3986 URI while *also* having more meaning specifically
as an http URI.

> 
> >  The association of the name to a protocol is by external
> >  convention. 
> 
> The purpose of writing standards is to write down the conventions
> so that those who agree to use a protocol can infer all of
> the conventions without some out-of-band transmission of what
> the protocol elements mean. A design which relies on out-of-band
> communication (external convention) is a bad network design.
> 
> >  It can be
> > accomplished by publishing a document proclaiming that URIs beginning
> > with "filbert:" should follow certain syntactic conventions and can be
> > resolved using the filbert protocol 
> 
> There is a convention for publication established in the URI
> specification and URI registry. Proclaiming such a thing in
> a document in the third drawer of your file cabinet in your
> basement isn't sufficient.
> 
> > or it can be accomplished by
> > publishing a document proclaiming that URIs beginning with
> > "http://filbert.example/" should follow certain syntactic conventions
> > and can be resolved using the filbert protocol.
> 
> But there are already existing documents, long published,
> read, absorbed, and implemented, which already define what
> URIs beginning with "http:" mean and how to process them.
> Proposals for a new system which has severe interoperability
> difficulties as a replacement for one which is functioning
> should be rejected.

I think what I may not have made clear before is that this is a
*non-destructive* layering of new semantics on top of existing
semantics.  When the layering is non-destructive (monotonic), it is
perfectly permissible -- though certainly not recommended -- for the
additional, private semantics to be defined in someone's third desk
drawer.  Those who know about it gain the additional value and those who
don't can still rely on the more widely known semantics.

> 
> > Perhaps one difference in viewpoint is that I believe that if http URIs
> > can be used, they *should* be used, rather than creating a new scheme.
> 
> This seems much too dogmatic to me. I believe systems should
> be designed in a way that improves reliability and interoperability,
> and that design choices are complex, those making them should
> be informed of the costs and benefits of those design choices,
> and that interoperability, extensibility and reliability
> are important elements of design which often get short shrift.
> 
> > I.e., the proponent of a new scheme should bear the burden of proving
> > that a new scheme is needed
> 
> They have some burden to show that there is value, yes.
> 
> >  -- that the http *scheme* (not protocol) is
> > inadequate -- rather than the other way around. 
> 
> Well, the scheme currently implies the protocol and I think
> proposals that it shouldn't -- well, the ones I've seen
> have been poorly thought out.
> 
> >  Again, the reason why I
> > think the default should be this way is to prevent fragmentation in the
> > URI space.
> 
> I'm not sure what "URI space" is to be fragmented. I don't
> think calling everyone "Bob" is a good idea, Bob Larry Masinter,
> Bob David Booth, etc. 
> 
> I think systems should be designed so they work effectively
> and reliably, and that reuse and simplicity are important
> but secondary goals.
> 
> >   Sure, one can say that the marketplace will choose which
> > schemes will win out -- and indeed it will -- but that won't prevent
> > unnecessary churn along the way, which could be avoided by discouraging
> > unnecessary new schemes.
> 
> I certainly would discourage unnecessary new schemes, but
> I'm more willing to accept rationales for why some new schemes
> may seem "necessary" to their proponents.
> 
> > Rather than jumping directly to a new URI scheme, I think it would be
> > better to initially use an http URI prefix (such as
> > http://filbert.example/ ),
> 
> well, if that works for them, then it's a good indication their
> scheme isn't "necessary".
> 
> >  and only consider creating a new scheme
> > *after* widespread implementation support for that prefix has been
> > observed, i.e., *after* software implementations widely use the filbert
> > protocol to resolve URIs that start with http://filbert.example/ .
>  
> Well, if prefixes are sufficient (as they seem to be with doi, for
> example) then you've supplied evidence that a new scheme isn't
> necessary.

Exactly.  And the point of the document I wrote a while back on
"Converting New URI Schemes or URN Sub-Schemes to HTTP" was to show, by
way of an informal proof-by-construction, that new schemes are not
needed in general:
http://dbooth.org/2006/urn2http/

Has any of this explanation helped?  

David Booth

> 
> > Anyway, I don't know if this explanation has helped.  I do wish we could
> > have an opportunity to chat in person sometime.  :)
> 
> Writing this up is useful, too.  I'm not sure I'm getting through
> about the design space, and I'd like to hone my written arguments.
> 
> David Booth
> 
> 
> 
> On Wed, 2009-07-29 at 07:55 -0700, Larry Masinter wrote:
> > > HTTP protocol *may* be useful in conjunction with it.  This is
> > > additional value that other URIs that are designed to be "protocol
> > > independent" do not have.  
> > 
> > I don't understand this at all. What URIs are "protocol
> > independent"? Every URI scheme I can think of has a "protocol"
> > because how else can you define it? 
> > 
> > > There is nothing intrinsic to a URI or a URI scheme that makes a URI
> > > function only as a name or as a locator (or both). That function
> > > depends on how it is *used* -- not on the URI itself. 
> > 
> > I agree with that
> > 
> > >  A URI that was
> > > intended primarily as a locator can still be used as a name,
> > 
> > Sure, "can be used". Separate question about how well it works.
> > 
> > >  and a URI
> > > that was intended primarily as a name can be used as a locator if a
> > > protocol becomes associated with it.  All that's needed is a way to
> > > resolve it.  
> > 
> > I don't know how to "associate a protocol" with a URI without
> > redefining the URI scheme, which seems generally like a bad idea.
> > Yes, you need a way to resolve it.
> > 
> > > The fact that an http URI can be readily used as a locator does not
> > > *reduce* its value as a name. 
> > 
> > XXX has a value. Knowing something about XXX doesn't increase 
> > or reduce its value.
> > 
> > >  Its potential use as a locator is in *addition* to its use as a name.  
> > 
> > Now I've lost your point here.
> > 
> > Larry
> > --
> > http://larry.masinter.net
> > 
> > 
> > 
> > 
> > 
> > 
-- 
David Booth, Ph.D.
Cleveland Clinic (contractor)

Opinions expressed herein are those of the author and do not necessarily
reflect those of Cleveland Clinic.
Received on Wednesday, 5 August 2009 17:11:05 UTC