Re: Alternative dereference behavior from noah_mendelsohn@us.ibm.com on 2009-09-30 (www-tag@w3.org from September 2009)

From: <noah_mendelsohn@us.ibm.com>
Date: Wed, 30 Sep 2009 19:02:32 -0400
To: Jonathan Rees <jar@creativecommons.org>
Cc: www-tag@w3.org
Message-ID: <OF568A28A6.D731A132-ON85257641.007C673D-85257641.007E2A65@lotus.com>

Jonathan Rees asks:

> But it focuses on consistency between protocols, without defining what
> "correct" is, which is the deep question here - is "correct" defined
> by one protocol suite + DNS root, and mirrored by other protocols; or
> is it up to the "URI owner" independent of protocol (as suggested by
> AWWW 2.2.2.1); or is "correct" application dependent, or subject to
> user choice (e.g. through user's choice of ISP or DNS root).

With the caveat (again) that my thoughts on this seem not to be 
universally embraced...

The direction I was pursuing started with the resource authority.  Let's 
ignore for the moment proxies, and also cases in which servers are 
malicious or sloppy (not coded with careful attention to the protocol and 
its correct semantics).  In all other cases, any information served 
through any protocol is thus sourced from the resource authority, or from 
someone to whom that authority has delegated.  So, I turn your question 
around:  the information served is by definition correct, because the 
agent doing the serving is authoritative and we're positing that it acts 
in good faith.  Thus, I as a client can assume that the information I 
receive is correct, regardless of the protocol used;  if it weren't, the 
server couldn't have offered it. 

That's what led to the controversial proposal that any URI could in 
principle be served by any protocol.

More specifically, consider an agent responsible for serving resource R. 
The agent is considering using protocol P1.  If the agent finds that the 
operations of the protocol can be correctly implemented for that resource 
(the necessary information is available and the protocol can be used to 
serve it), then the agent can use the protocol.  A client can assume that 
the information received is "correct", at least insofar as the protocol is 
robust (e.g. HTTP is not robust in the face of certain forms of DNS cache 
staleness or pollution.)   I used the term "faithfully serving" in my 
drafts to refer to the case where an agent in good faith correctly 
implemented the operations of a protocol to serve a resource.  If a second 
protocol P2 could be used to serve or update the resource, then that 
protocol can be used instead or in addition.

In this formulation, the ability to use more than one protocol (retrieval 
strategy) for a given resource is an emergent property of the resource and 
the protocols, not directly of the URI scheme (except insofar as the 
scheme restricts the nature of the resource identified).   Since the 
pertinent RFCs pretty much claim that it's always appropriate to serve an 
http-scheme resource with the HTTP protocol, it follows that no correct 
serving of such a resource using another protocol should ever contradict 
the information one would retrieve using HTTP.  It's the agent serving the 
resource that is responsible for serving it faithfully, and thus ensuring 
that this is so in practice.
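
As a rough illustration of the consistency requirement above, here is a
minimal Python sketch. Everything in it is hypothetical: the `Retriever`
type, the `is_consistent` helper, and the three stand-in retrieval
functions are invented for illustration and come from no spec or library.
A real retrieval strategy would perform network operations, and byte
equality is almost certainly too strict a test (content negotiation and
resources that change over time are ignored here).

```python
from typing import Callable, Dict, Optional

# Hypothetical type: a retrieval strategy maps a URI to a representation
# (bytes), or returns None if it cannot serve that URI at all.
Retriever = Callable[[str], Optional[bytes]]


def is_consistent(uri: str, reference: Retriever, alternative: Retriever) -> bool:
    """Return True if `alternative` does not contradict `reference` for `uri`.

    An alternative that declines to answer (None) contradicts nothing;
    one that does answer must agree with the reference answer.
    """
    alt = alternative(uri)
    if alt is None:
        return True
    return alt == reference(uri)


# Illustrative data standing in for what the resource authority serves.
_origin: Dict[str, bytes] = {"http://example.org/resource1": b"hello"}


def via_http(uri: str) -> Optional[bytes]:
    # Stand-in for an HTTP GET against the authoritative server.
    return _origin.get(uri)


def via_private_store(uri: str) -> Optional[bytes]:
    # A faithful second mechanism: serves exactly what the origin would.
    return _origin.get(uri)


def via_stale_store(uri: str) -> Optional[bytes]:
    # An unfaithful mechanism: answers for known URIs, but with other content.
    return b"goodbye" if uri in _origin else None
```

Under these assumptions, `via_private_store` is consistent with
`via_http` for every URI, while `via_stale_store` is not consistent for
http://example.org/resource1.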

I >think< the above line of reasoning does allow limited use of private 
retrieval mechanisms, but only insofar as these can be shown to yield 
results that would be consistent with all legal operations that might be 
attempted on the public Web.  In the case of resources that might be 
served with HTTP, I think this boils down to saying that your 'private' 
retrieval scheme must in fact follow the rules for being an HTTP proxy 
cache for that resource.  The burden is on you as implementor of the 
private retrieval scheme to determine whether the resource you're serving 
may ever be observable by others using HTTP, and if so, to follow the 
rules for HTTP-cache implementation.
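
To make the cache analogy slightly more concrete, here is a deliberately
simplified Python sketch of the freshness test a private retrieval scheme
might apply before answering from its own store. The `StoredResponse`
class and `may_serve_without_revalidation` function are hypothetical
names, and the real HTTP caching rules involve considerably more (the Age
and Expires headers, validators, and so on); this only assumes a
freshness lifetime such as one taken from Cache-Control: max-age.

```python
from dataclasses import dataclass


@dataclass
class StoredResponse:
    body: bytes
    stored_at: float  # seconds since the epoch when obtained from the origin
    max_age: float    # assumed freshness lifetime, in seconds


def may_serve_without_revalidation(entry: StoredResponse, now: float) -> bool:
    """True while the stored copy is still fresh (its age is below max_age).

    Once this returns False, the private scheme would need to revalidate
    with the origin before answering, as an HTTP cache would.
    """
    current_age = now - entry.stored_at
    return current_age < entry.max_age
```

For example, a copy stored at time 1000.0 with a 60-second lifetime could
be served at time 1030.0 but not at time 1060.0.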

That's the line of reasoning I was pursuing, and when it didn't meet with 
easy agreement I let it go.  Anyway, that's how I think about the 
"correctness" question.

Noah

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------


Jonathan Rees <jar@creativecommons.org>
09/28/2009 05:25 PM
 
        To:     noah_mendelsohn@us.ibm.com
        cc:     www-tag@w3.org
        Subject:        Re: Alternative dereference behavior


OK, this is apropos, since it says that you should get the "correct"
outcome for operations such as GET no matter what protocol you use.
But it focuses on consistency between protocols, without defining what
"correct" is, which is the deep question here - is "correct" defined
by one protocol suite + DNS root, and mirrored by other protocols; or
is it up to the "URI owner" independent of protocol (as suggested by
AWWW 2.2.2.1); or is "correct" application dependent, or subject to
user choice (e.g. through user's choice of ISP or DNS root).

Note that I restricted my question to GET and http: URIs, so the
many-schemes-one-protocol issue doesn't come up, and I didn't ask what
the "correct" resource is, only what a "correct"
representation/resource correspondence is.

The specs don't say, and the best we have for "correct" is "do the
obvious thing and be a good citizen." That makes any use of the word
"correct" suspect. This is probably as it should be, but it doesn't
give guidance in the annoying situations I'm concerned about. Not that
I really expected it to...

Jonathan

On Mon, Sep 28, 2009 at 1:42 PM,  <noah_mendelsohn@us.ibm.com> wrote:
> I need to work through your analysis in more detail, but on first skim
> it looks to be tackling the same set of issues that we picked up and put
> (tentatively) aside under the banner of ISSUE-49 (schemeProtocols) [1].
>
> When I first joined the TAG, it seemed to me that clarifying the
> connection between schemes and dereference mechanisms (roughly,
> protocols) would be a good priority for the TAG.  I set out to capture
> good practice in a finding, the latest draft of which is at [2].  If you
> follow back the chain of pointers to the initial draft [3], you will
> find that it told a somewhat different story than [2].
>
> In fact, each of the drafts that I wrote got quite significant, if not
> always consistent criticism from other members of the TAG and the
> community.  This was to some extent a reflection of my lack of
> expertise, but I gradually came to understand that it also reflected
> differences of opinion among very knowledgeable and influential experts
> on the TAG.  It seemed best to put the work aside until something closer
> to easy consensus emerged, and/or until my level of insight improved.  I
> had considered getting back to it last year, but my appointment as chair
> prevented that.
>
> Note in particular that this statement in the early version [3] proved
> controversial:
>
> "For schemes such as http and ftp, the association of a URI to a
> resource is defined in terms of the corresponding protocol. Thus, the
> resource identified by http://example.org/resource1 is by definition the
> one for which representations are returned (GET) or updated (PUT) when
> that URI is supplied as the HTTP Request-URI (see [RFC 2616]). Unless
> otherwise stated, this finding deals only with such protocol-associated
> URI schemes."
>
> Accordingly, the later draft [2] changed the presentation (see paragraph
> 3 of the Preface [4]).   Unfortunately, this formulation was also poorly
> received, though not necessarily by all the same people who disliked the
> first.
>
> Anyway, if you want to tackle this, I suggest you read not only the draft
> findings, flawed as they may be, but also the records of discussions and
> emails exchanged at the time (F2F reviews would have been shortly after
> the publication dates).  Also, I'd urge you to take a look at RFC 2718
> and RFC 2718/2718bis, of which Larry is a co-author. (Larry, did the bis
> version ever get adopted?)  Anyway, this ground has been trod many
> times, though I don't think there was consensus on where the path might
> lead.  FWIW: if you, Jonathan, were inclined to pick this work up, I
> would be happy to see it move forward.  I continue to think that it's
> very important, and a point of significant and recurring confusion.
>
> Thanks.
>
> Noah
>
> [1] http://www.w3.org/2001/tag/group/track/issues/49
> [2] http://www.w3.org/2001/tag/doc/schemeProtocols-2005-11-21.html
> [3] http://www.w3.org/2001/tag/doc/schemeProtocols-2005-06-16.html
> [4] http://www.w3.org/2001/tag/doc/SchemeProtocols.html#preface
>
>
> --------------------------------------
> Noah Mendelsohn
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
>
>
> Jonathan Rees <jar@creativecommons.org>
> Sent by: www-tag-request@w3.org
> 09/28/2009 01:00 PM
>
>        To:     www-tag@w3.org
>        cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
>        Subject:        Alternative dereference behavior
>
>
> This message is pursuant to ACTION-312 which I took on at the F2F.
>
> Roughly speaking, the question is: Does the canon say that the Web is
> the authority for http: URI "dereference" (GET), or does it leave open
> the possibility of conforming agents using mechanisms that give answers
> at variance with what the Web would give?
>
> Definitions (for present purposes):
>  "the canon" = current IETF RFCs and W3C recs, including AWWW
>  "the Web" = the usual URI/HTTP/DNS/IANA/ICANN combination
>
> I took a look at several documents to see what "the
> canon" has to say about this question.
>
> - URI - RFC 3986
>  http://www.ietf.org/rfc/rfc3986.txt
>
>    Each URI begins with a scheme name that refers to a specification for
>    assigning identifiers within that scheme.  As such, the URI syntax is
>    a federated and extensible naming system wherein each scheme's
>    specification may further restrict the syntax and semantics of
>    identifiers using that scheme.
>
>  Cites BCP35 (informatively).  Doesn't say how to find the registry.
>
> - BCP35 refers to "the IANA URI scheme registry".
>  http://tools.ietf.org/html/bcp35
>
> - A search turns up the "IANA URI scheme registry" at
>  http://www.iana.org/assignments/uri-schemes.html
>
>  For the 'http' scheme, this cites RFC 2616.
>
> - HTTP - RFC 2616
>  http://www.ietf.org/rfc/rfc2616.txt
>
>    3.2.2 The "http" [URL] scheme
>    The "http" scheme is used to locate network resources via the
>    HTTP protocol. The semantics are that the identified resource is
>    located at the server listening for TCP connections on that port
>    of that host ...
>
>  (Note: this text is unchanged in HTTPbis revision 07, 2.1.1,
>  http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-07#page-11 .)
>
>  This text is written descriptively, not prescriptively, so it
>  doesn't rule out *other* ways to use the scheme (e.g. with other
>  protocols).  It doesn't give any special status to the HTTP
>  protocol regarding http: URIs.  But even if taken as prescriptive,
>  it doesn't say anything about a need to use DNS or to use any
>  particular DNS root.  Probably these questions were considered to be
>  out of scope (as they probably are).
>
> - AWWW
>  http://www.w3.org/TR/2004/REC-webarch-20041215/
>
>  2. Identification: "Global naming leads to global network effects."
>
>  I interpret this as saying that unusual dereference is OK as long as
>  everyone does it consistently (ha!).
>
>  3.1. Using a URI to Access a Resource.
>
>  This section talks descriptively about how to
>  "dereference" a URI, specifically mentioning HTTP.  But the discussion
>  is in the form of a descriptive example, not a spec or even a GPN.
>
>  2.2.2.1. URI ownership:
>
>    "The approach taken for the "http" URI scheme, for example, follows
>    the pattern whereby the Internet community delegates authority, via
>    the IANA URI scheme registry and the DNS, over a set of URIs with a
>    common prefix to one particular owner. One consequence of this
>    approach is the Web's heavy reliance on the central DNS registry."
>
>  This sentence is descriptive, not prescriptive, although one could
>  take the lack of criticism or discussion as some kind of endorsement.
>  Taken as prescriptive, it rules out alternative dereference, but I
>  don't think it was meant in this way.
>
>  The rest of the section does not seem to bear on the question.
>
> - IAB Technical Comment on the Unique DNS Root
>  http://www.icann.org/correspondence/iab-tech-comment-27sept99.htm
>
>    "Allowing multiple public DNS roots would raise a very strong
>    possibility that users of different ISPs who click on the same link
>    on a web page could end up at different destinations, against the
>    will of the web page designers."
>
>    "... if they wish to make use of names uniquely defined for the
>    global Internet, they have to fetch that information from the global
>    DNS naming hierarchy, and in particular from the coordinated root
>    servers of the global DNS naming hierarchy."
>
>  The operative word here is "if".
>
> - RFC 2860 - defines relationship between IETF and IANA (ICANN),
>  suggesting (but not saying) that the IANA DNS root has some special
>  status.
>
> - The Self-describing Web
>  http://www.w3.org/2001/tag/doc/selfDescribingDocuments.html
>
>  The "standard algorithm" herein specifies use of DNS and HTTP, although
>   not the IANA DNS root. (On the other hand this document is not
>   written as a spec, and does not act as one. The tone is tutorial and
>   descriptive.)
>
> - There has been much community discussion of this topic (see
>  e.g. http://esw.w3.org/topic/UriSpaceSquatting ).
>
> Summary:
>
>  . The "normative" documents only describe pieces of the puzzle, not
>    how it all fits together, so the question of consistent
>    dereference or "meaning" is out of scope in all of them.
>
>  . General advice (AWWW, IAB TC) is that if you "split the web" by
>    making URIs non-global you are doing something really tragic.  A
>    change in the rules for dereference would theoretically be OK, as
>    long as everyone made the change in step (ha!).
>
> Editorializing:
>
>  . The forces that lead to inconsistency would probably not be
>    swayed by anyone arguing from a standards perspective, any more
>    than you'd expect in any other similar situation of defiance.
>
>  . On the other hand the claim that there's an unbreakable connection
>    between http: and unusual dereference is sometimes put forth
>    as a reason to stay away from use of http: URIs.
>
>  . It may be useful to try to understand the pressures to diverge
>    from the usual stack and to ask whether *any* instances of
>    alternative dereference are for the greater good, or at least not
>    harmful to it; and to ask if any better alternatives are available
>    to those who are departing from the stack, or if they could be
>    created.
>
>  . The various use cases (ISP domain squatting, government policies,
>    SmartCache, alternative DNS roots and classes, semweb liberties,
>    aggressive user agents or proxies that search caches and archives,
>    the persistence "insurance" situation we discussed at the F2F, etc.)
>    are not necessarily comparable as the benefits are to
>    different actors in each case, and the locus and level of
>    interference varies.
>
> Note. I'm well aware of Roy Fielding's "http: isn't HTTP" argument
> and don't disagree with it. My question is
> about correctness of the answers, not the necessity to use
> any particular mechanism.
>
> -Jonathan
Received on Wednesday, 30 September 2009 23:00:36 UTC