Re: Alternative dereference behavior from noah_mendelsohn@us.ibm.com on 2009-09-28 (www-tag@w3.org from September 2009)

From: <noah_mendelsohn@us.ibm.com>
Date: Mon, 28 Sep 2009 13:42:10 -0400
To: Jonathan Rees <jar@creativecommons.org>
Cc: www-tag@w3.org
Message-ID: <OF05553978.263A10C2-ON8525763F.005F344E-8525763F.00614078@lotus.com>
I need to work through your analysis in more detail, but on first skim it 
looks to be tackling the same set of issues that we picked up and put 
(tentatively) aside under the banner of ISSUE-49 (schemeProtocols) [1]. 

When I first joined the TAG, it seemed to me that clarifying the 
connection between schemes and dereference mechanisms (roughly, protocols) 
would be a good priority for the TAG.  I set out to capture good practice 
in a finding, the latest draft of which is at [2].  If you follow back the 
chain of pointers to the initial draft [3], you will find that it told a 
somewhat different story than [2]. 

In fact, each of the drafts that I wrote got quite significant, if not 
always consistent, criticism from other members of the TAG and the 
community.  This was to some extent a reflection of my lack of expertise, 
but I gradually came to understand that it also reflected differences of 
opinion among very knowledgeable and influential experts on the TAG.  It 
seemed best to put the work aside until something closer to easy consensus 
emerged, and/or until my level of insight improved.  I had considered 
getting back to it last year, but my appointment as chair prevented that.

Note in particular that this statement in the early version [3] proved 
controversial:

"For schemes such as http and ftp, the association of a URI to a resource 
is defined in terms of the corresponding protocol. Thus, the resource 
identified by http://example.org/resource1 is by definition the one for 
which representations are returned (GET) or updated (PUT) when that URI is 
supplied as the HTTP Request-URI (see [RFC 2616]). Unless otherwise 
stated, this finding deals only with such protocol-associated URI 
schemes."

Accordingly, the later draft [2] changed the presentation (see paragraph 3 
of the Preface [4]).   Unfortunately, this formulation was also poorly 
received, though not necessarily by all the same people who disliked the 
first.

Anyway, if you want to tackle this, I suggest you read not only the draft 
findings, flawed as they may be, but also the records of discussions and 
emails exchanged at the time (F2F reviews would have been shortly after 
the publication dates).  Also, I'd urge you to take a look at RFC 2718 and 
RFC 2718/2718bis, of which Larry is a co-author. (Larry, did the bis 
version ever get adopted?)  Anyway, this ground has been trod many times, 
though I don't think there was consensus on where the path might lead. 
FWIW: if you, Jonathan, were inclined to pick this work up, I would be 
happy to see it move forward.  I continue to think that it's very 
important, and a point of significant and recurring confusion.

Thanks.

Noah

[1] http://www.w3.org/2001/tag/group/track/issues/49
[2] http://www.w3.org/2001/tag/doc/schemeProtocols-2005-11-21.html
[3] http://www.w3.org/2001/tag/doc/schemeProtocols-2005-06-16.html
[4] http://www.w3.org/2001/tag/doc/SchemeProtocols.html#preface


--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Jonathan Rees <jar@creativecommons.org>
Sent by: www-tag-request@w3.org
09/28/2009 01:00 PM
 
        To:     www-tag@w3.org
        cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        Alternative dereference behavior


This message is pursuant to ACTION-312 which I took on at the F2F.

Roughly speaking, the question is: Does the canon say that the Web is
the authority for http: URI "dereference" (GET), or does it leave open the
possibility of conforming agents using mechanisms that give answers
at variance with what the Web would give?

Definitions (for present purposes):
  "the canon" = current IETF RFCs and W3C recs, including AWWW
  "the Web" = the usual URI/HTTP/DNS/IANA/ICANN combination

I took a look at several documents to see what "the
canon" has to say about this question.

- URI - RFC 3986
  http://www.ietf.org/rfc/rfc3986.txt

    Each URI begins with a scheme name that refers to a specification for
    assigning identifiers within that scheme.  As such, the URI syntax is
    a federated and extensible naming system wherein each scheme's
    specification may further restrict the syntax and semantics of
    identifiers using that scheme.

  Cites BCP35 (informatively).  Doesn't say how to find the registry.

- BCP35 refers to "the IANA URI scheme registry".
  http://tools.ietf.org/html/bcp35

- A search turns up the "IANA URI scheme registry" at
  http://www.iana.org/assignments/uri-schemes.html

  For the 'http' scheme, this cites RFC 2616.

- HTTP - RFC 2616
  http://www.ietf.org/rfc/rfc2616.txt

    3.2.2 The "http" [URL] scheme
    The "http" scheme is used to locate network resources via the
    HTTP protocol. The semantics are that the identified resource is
    located at the server listening for TCP connections on that port
    of that host ...

  (Note: this text is unchanged in HTTPbis revision 07, 2.1.1,
  http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-07#page-11 .)

  This text is written descriptively, not prescriptively, so it
  doesn't rule out *other* ways to use the scheme (e.g. with other
  protocols).  It doesn't give any special status to the HTTP
  protocol regarding http: URIs.  But even if taken as prescriptive,
  it doesn't say anything about a need to use DNS or to use any
  particular DNS root.  Probably these questions were considered to be
  out of scope (as they probably are).
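
  As a rough illustration of how little 3.2.2 pins down, here is a minimal
  sketch in Python. The helper name http_locator is hypothetical, not taken
  from any spec or library. It derives only the host, port, and request
  target that the RFC text speaks of; how the host name is then resolved
  (DNS or otherwise, and against which root) is a separate step that the
  quoted text does not address.

```python
from urllib.parse import urlsplit

# RFC 2616, 3.2.2: port 80 is assumed when no port is given.
DEFAULT_HTTP_PORT = 80


def http_locator(uri):
    """Return (host, port, request_target) for an http: URI.

    Hypothetical helper for illustration only. It stops at the
    (host, port) pair named in RFC 2616 3.2.2 and deliberately does
    not resolve the host name to an address.
    """
    parts = urlsplit(uri)
    if parts.scheme != "http":
        raise ValueError("not an http: URI: %r" % (uri,))
    if not parts.hostname:
        raise ValueError("http: URI has no host: %r" % (uri,))
    port = parts.port if parts.port is not None else DEFAULT_HTTP_PORT
    # An absent path is sent as "/" in the request.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.hostname, port, target


if __name__ == "__main__":
    print(http_locator("http://example.org/resource1"))
```

  Whatever turns the returned host name into a network address is where
  the alternative-dereference cases would differ from the usual stack.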

- AWWW
  http://www.w3.org/TR/2004/REC-webarch-20041215/

  2. Identification: "Global naming leads to global network effects."

  I interpret this as saying that unusual dereference is OK as long as
  everyone does it consistently (ha!).

  3.1. Using a URI to Access a Resource.

  This section talks descriptively about how to
  "dereference" a URI, specifically mentioning HTTP.  But the discussion
  is in the form of a descriptive example, not a spec or even a GPN.

  2.2.2.1. URI ownership:

    "The approach taken for the "http" URI scheme, for example, follows the
    pattern whereby the Internet community delegates authority, via the
    IANA URI scheme registry and the DNS, over a set of URIs with a common
    prefix to one particular owner. One consequence of this approach is
    the Web's heavy reliance on the central DNS registry."

  This sentence is descriptive, not prescriptive, although one could
  take the lack of criticism or discussion as some kind of endorsement.
  Taken as prescriptive, it rules out alternative dereference, but I
  don't think it was meant in this way.

  The rest of the section does not seem to bear on the question.

- IAB Technical Comment on the Unique DNS Root
  http://www.icann.org/correspondence/iab-tech-comment-27sept99.htm

    "Allowing multiple public DNS roots would raise a very strong
    possibility that users of different ISPs who click on the same link on
    a web page could end up at different destinations, against the will of
    the web page designers."

    "... if they wish to make use of names uniquely defined for the global
    Internet, they have to fetch that information from the global DNS
    naming hierarchy, and in particular from the coordinated root servers
    of the global DNS naming hierarchy."

  The operative word here is "if".

- RFC 2860 - defines relationship between IETF and IANA (ICANN),
  suggesting (but not saying) that the IANA DNS root has some special
  status.

- The Self-describing Web
  http://www.w3.org/2001/tag/doc/selfDescribingDocuments.html

  The "standard algorithm" herein specifies use of DNS and HTTP, although
  not the IANA DNS root. (On the other hand this document is not
  written as a spec, and does not act as one. The tone is tutorial and
  descriptive.)

- There has been much community discussion of this topic (see
  e.g. http://esw.w3.org/topic/UriSpaceSquatting ).

Summary:

  . The "normative" documents only describe pieces of the puzzle, not
    how it all fits together, so the question of consistent
    dereference or "meaning" is out of scope in all of them.

  . General advice (AWWW, IAB TC) is that if you "split the web" by making
    URIs non-global you are doing something really tragic.  A change
    in the rules for dereference would theoretically be OK, as long as
    everyone made the change in step (ha!).

Editorializing:

  . The forces that lead to inconsistency would probably not be
    swayed by anyone arguing from a standards perspective, any more
    than you'd expect in any other similar situation of defiance.

  . On the other hand the claim that there's an unbreakable connection
    between http: and the usual dereference is sometimes put forth
    as a reason to stay away from use of http: URIs.

  . It may be useful to try to understand the pressures to diverge
    from the usual stack and to ask whether *any* instances of alternative
    dereference are for the greater good, or at least not harmful to it;
    and to ask if any better alternatives are available to those who
    are departing from the stack, or if they could be created.

  . The various use cases (ISP domain squatting, government policies,
    SmartCache, alternative DNS roots and classes, semweb liberties,
    aggressive user agents or proxies that search caches and archives,
    the persistence "insurance" situation we discussed at the F2F, etc.)
    are not necessarily comparable as the benefits are to
    different actors in each case, and the locus and level of interference
    varies.

Note. I'm well aware of Roy Fielding's "http: isn't HTTP" argument
and don't disagree with it. My question is
about correctness of the answers, not the necessity to use
any particular mechanism.

-Jonathan
Received on Monday, 28 September 2009 17:42:56 UTC