Re: Review of new HTTPbis text for 303 See Other from Henrik Nordstrom on 2009-07-21 (ietf-http-wg@w3.org from July to September 2009)

From: Henrik Nordstrom <henrik@henriknordstrom.net>
Date: Tue, 21 Jul 2009 03:37:25 +0200
To: Pat Hayes <phayes@ihmc.us>
Cc: "www-tag@w3.org WG" <www-tag@w3.org>, HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <1248140245.12498.339.camel@localhost.localdomain>
On Mon, 2009-07-20 at 13:16 -0500, Pat Hayes wrote:

> Apparently you have not understood my point, above. There are cases  
> where NO implementation of ANY KIND can POSSIBLY map a URI to the  
> resource it identifies. So one cannot simply toss this issue over the  
> wall to some other, unspecified, "implementer". It's nothing to do with  
> implementation.

For the kinds of URIs that HTTP deals with it can, as far as HTTP is
concerned. HTTP uses its own definition of "resource", which for
technical specification writing reasons is slightly narrower than the
general URI definition of resource.

> I understand, but I am not talking about 'effects', but about semantics.

And HTTP is completely ignorant of any semantics that the URIs accessed
via HTTP may have.

What HTTP cares about is whether actions requested via HTTP may have
effects on the resource state (e.g. DELETE is assumed to have a certain
effect when executed on the http resource).

> My point is that you cannot completely ignore the rest of the world.

When writing a technical specification you can, as the relevant part of
the world is then the parts that the specification intends to cover and
only those parts. 

> But you yourself said that I was thinking about the wrong kind of  
> meaning, not the kind of meaning intended by the spec. Really, you  
> cannot have it both ways. Please make up your mind which is your  
> position, and stick to it.

HTTP places absolutely no meaning at all on the general term "resource"
as used in English, or even on "resource" as defined by the URI
specifications.

The only kind of resource HTTP places any meaning on at all is the very
much narrowed down "resource" as defined by the HTTP specifications, and
even then it's just as an abstract concept to simplify the world
description somewhat. To HTTP it does not matter at all what those
resources are, only if they can be accessed and/or transmitted via HTTP
or not as defined by whoever "owns" the resource and who also defines
their intended URI semantics (again completely outside of HTTP
specifications).

> I know it does not wish to, but http-range-14 has left it no choice  
> but to care about it, at least a little.

Has it? Care to explain that again then, using the meanings of the terms
as defined by HTTP?

> The semantics of URIs has nothing at all to do with layering. It is  
> part of the specification **of URIs themselves**. When anyone talks  
> about the relationship between a URI and the resource it identifies,  
> or denotes, or refers to, or is used to request, or indeed pretty much  
> any relationship between a URI and a resource, they are talking about  
> semantics.

Ok. My point here is that HTTP does not care about those semantics. All
it possibly cares about is that the server is ultimately responsible
for executing that semantic mapping of URI to resource (in URI terms),
and that this mapping results in HTTP network accessible resources
(which you seem to sometimes call a representation where HTTP calls it a
resource) and their possible representations as defined by HTTP.

> Because the HTTP specs also talk about this. And it is generally a  
> good idea, when two specs talk about the same thing using the same  
> language, that some effort is expended to make sure they are intending  
> to use this language in the same way.

Unfortunately, if a new term were to be defined for every slight
variation of the term "resource" here, I am afraid it would be even
more confusing.

There are very good reasons why "resource" in the URI specifications is
broader than "resource" in the HTTP specifications, and why both are
narrower than the general English "resource".

> I understand, but it refers to resources. If for example the spec says  
> (as I believe it does, currently) that if the server has available a  
> transmittable representation of the requested resource, then it must  
> return that with a 200 code, this statement makes no reference to the  
> URI that was used to identify the resource.

The URI reference is implicit, as the whole text is in the context of
building a response to a request for a specific URI. Trying to read the
text outside that context is nonsense.

>  The only way I can read it  
> is as saying that this applies **independently of the URI**, and hence  
> to any URI which identifies the same requested resource. So the  
> 'context' you mention is irrelevant.

You cannot take random sentences from any specification and apply them
or draw conclusions without taking into account the context within the
specifications where that sentence was found.

> In English, it would be the topic of a request for something that  
> could be used for some purpose, as in "Please pass me the salt."

Yes, but that something may be pretty much anything, substantial,
abstract, quantifiable or non-quantifiable.

> No, it is quite on the point. If the server can respond differently to  
> different URIs which both identify the same resource, that changes the  
> game.

If the defined semantics of the URIs say the server should respond
differently, then in the world as defined by HTTP they refer to different
resources, though possibly very closely related ones.

It all boils down to the definition of what a resource is, and the HTTP
resource is as I already explained NOT as general as the URI resource.

> > In the terminology defined by HTTP the difference between an  
> > (HTTP-)URI
> > and resource is more of a special case, and not related to any of what
> > you talk about.
> 
> It is related. In fact it is critical.

To me when talking about HTTP it's not.

> Ah. That certainly makes sense, and indeed is what I understood when I  
> first became involved in these URI-meaning debates. But this position  
> is not consistent with what is said about resources in other  
> standards.  And moreover, if this is true, then the http-range-14  
> decision is simply untenable. For in that case, the 'requested  
> resource' is something that cannot possibly be inside a server. Julius  
> Caesar, let us say, might be the requested resource.

And that is what we have been saying all along. Trying to use Julius
Caesar as an example when talking about HTTP resources just does not make
any sense, as the two by definition cannot be the same thing.

> > Yes it's a simplification, but defining or assume anything about
> > resources anywhere beyond that is outside of HTTP scope and nothing  
> > HTTP
> > cares about and is left to the application of HTTP and/or URIs.
> 
> No, sorry, that position is simply untenable. See my earlier replies  
> to Richard on this point. HTTP cannot hide inside a 'layer' and  
> pretend it is only dealing with computational identifiers which 'map'  
> to computational artifacts. Both the uses and the specifications of  
> http URIs have extended its scope beyond that narrow purview.

And I disagree. The semantics of the application of HTTP is and should
be much broader than the semantics as used by the HTTP wire protocol.

> The operation of HTTP, according to http-range-14, is ALREADY  
> concerned with how URIs denote real-world entities beyond the  
> operation of http.

And my viewpoint is that that's completely outside of what the HTTP
specifications or operations are concerned with. In fact HTTP
intentionally does not care about any such concerns and leaves that to
the application of HTTP to any such entities. Anyone is free to define
HTTP applications for such entities, by defining HTTP resources mapping
to such entities as they please. HTTP only defines how one may interface
with those once defined in terms of HTTP resources. What relations those
HTTP resources have to any real-world entities is defined by that
application, not by HTTP.

>  (Not, by the way, with how *resources* map to real- 
> world resources. In the cases in question, the relationship between  
> the URI and the real-world entity is direct, not mediated through some  
> other resource inside a server.)

And in my world that's an impossible condition, as those real-world
resources do not exist in HTTP terms and need to be mediated via some
server-defined HTTP resource to be accessible via HTTP. Otherwise requests
for that HTTP-URI would simply result in a 404 until such an HTTP resource
is implemented for mapping to the real-world resource.


> But the phrase "that can be used to interact with a resource" ALREADY  
> limits what a resource can be. You cannot interact with the number 27  
> or with Julius Caesar.

Please note that this part is just explanatory text trying to explain
the relationship between HTTP and URI specifications, not a normative
definition.

The definition of "resource" in the HTTP specifications is found in the
terminology section.


> >        resource
> >
> >                A network data object or service
> 
> That is not the definition of resource used in RFC3986, however.

What I said, and why I highlighted it here. The definitions are
different, and you need to use the right definition for each
specification or you'll get confused when discussing borderline issues
like this.

For most practical considerations in the use of HTTP the difference is
negligible however.

> HTTP  
> URIs can identify resources in the broader RFC3986 sense; and for  
> those URIs, there may well not be any resource in this narrow sense  
> identified by the URI at all. And yet, still, a GET on them might  
> resolve to an http endpoint. What does the http spec say about such a  
> case? What is the endpoint to do?

Yes, it's correct that HTTP URIs can identify resources in the broader
sense, but that is not something the HTTP specifications as such concern
themselves with. HTTP specifications end at the http endpoint and its
http mapped resource.

> And my point was only  
> that in this case, it is at best confusing and maybe actually wrong to  
> say that IF the server has a transmittable representation available  
> then it must send it with a 200 code.

And we don't. We say "suitable to be transmitted", which is quite
different from "transmittable", as there are representations that MAY be
transmittable in theory but which are still deemed unsuitable (by the
http server endpoint or its policy).

>  For what are we to say about the  
> second case? It all depends on what is meant by the "requested  
> resource".

The difference between a "resource" (as identified by a specific URI)
and an HTTP "requested resource" is not what you think. The two differ
when there are multiple independent representations available at the
exact same URI, such as content in different languages selected by the
language preferences of the client etc.
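
To illustrate the language case just mentioned, here is a minimal Python
sketch of one URI with several representations, where the one returned
depends on the client's Accept-Language header. The names
parse_accept_language, select_representation and GREETING are hypothetical,
and the matching is deliberately simplified; it is not the negotiation
algorithm of any HTTP specification.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language value, best first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((tag.strip().lower(), q))
    # Stable sort: tags with equal q keep the order the client sent them in.
    prefs.sort(key=lambda p: p[1], reverse=True)
    return [tag for tag, q in prefs if q > 0]


def select_representation(representations, accept_language, default="en"):
    """Pick one representation of a single URI based on client preferences."""
    for tag in parse_accept_language(accept_language):
        if tag in representations:
            return tag, representations[tag]
        primary = tag.split("-")[0]  # simplified fallback: "sv-se" -> "sv"
        if primary in representations:
            return primary, representations[primary]
    return default, representations[default]


# Hypothetical data: one URI, two independent representations.
GREETING = {"en": "Hello", "sv": "Hej"}

print(select_representation(GREETING, "sv-SE, en;q=0.5"))
```

In this sketch the URI stays the same for every client; only the selected
representation differs.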

> (It seems to me that HTTP rather shoots itself in the foot by this  
> insistence that its specs must not refer to or even acknowledge the  
> existence of resources that are other than network data or services,  
> since it has defined out of existence the very case that it should be  
> able to refer to, if only to explicitly say that it's not going to  
> specify what happens in it. This is rather an ostrich way of writing  
> specs, to pretend that all of the world that you don't like doesn't  
> exist, so that you aren't obliged to say anything about it.)

I don't agree here. The HTTP specifications place a technical limit on
what the word "resource" means within the HTTP specifications, which is
purely a technical definition.


> > My response is that
> > it's the servers role to select a suitable representation of the
> > resource based on the meaning of the URI.
> 
> Does that mean, of the resource that the URI identifies? And does  
> "identify" mean, denote?

Sorry if I am unclear sometimes. English is not at all my native
language, and the word "denote" is not really part of my limited English
vocabulary.

From my understanding of "denote" it's:

Of the HTTP resource the HTTP-URI identifies.

Where "identifies" is meant in the sense of how a Uniform Resource
Identifier identifies a network-accessible resource, ignoring completely
what that resource denotes in the broader sense.

> ??!!? Of course two different URIs can refer to the same resource. If  
> HTTP is built on a different supposition, then HTTP is simply wrong.

Sure they can. The points here are:
 * that HTTP does not care if they do
 * and that HTTP has the view that if the semantics of those URIs are
different then they do in fact NOT refer to the same resource. They may
refer to different facets of some larger/broader resource but not the
same.

If those URIs happen to really refer to the same resource, both URIs
will respond identically, and further this is indistinguishable from two
identical copies of the same resource.

> ?? I am trying to make sense of this, and not sure I have it right.  
> Take the case in my email to Richard, where there is a URI denoting  
> him, Richard C., the actual person. (Note, this is not a topic that  
> HTTP gets to rule out or refuse to acknowledge, because this can in  
> fact happen. My question is about what HTTP should do in such a case.)  

HTTP handles the case by restricting its notion of resource to the
network-accessible resource used for interfacing with Richard C. That
resource MAY or MAY NOT have an actual interface with Richard C; HTTP
does not care and need not care for its operations.

> In this case, according to Richard, he is the 'requested resource'.  
> The GET request is directed to a server which has some other resource  
> inside it, call this resource R. R is a resource in your narrower  
> sense (a network data object or service), but this is *not* the  
> requested resource in this case, even though the URI resolves to (the  
> server containing) R.

In terms of HTTP R is the requested resource.

>  (Do you agree?) In this case, http-range-14  
> requires that the server emit a 303 coded response, because even  
> though there may well be a transmittable (awww-) representation of R,  
> there is none of Richard C., and he is the requested resource.

That's up to R (or whoever/whatever defines R) to decide.

> From what you say here, I think you may have a different picture in  
> mind, where the "requested resource" in this case must be R itself.

Yes.

> That indeed is the picture I had originally, when I entered this  
> thread. But in that case, Roy's suggested wording is inconsistent with  
> http-range-14, because in this case it would prevent the server  
> issuing a 303, as required by that decision.

Roy's answer is based on the HTTP definition of resource, not on the
broader sense.

If the server does have an available representation of the resource
meant to be returned by access to this URI then it SHOULD be returned,
not a 303 response.

But even in the broader sense, if the server does have an available
representation of the resource but it's not meant to be returned and
should not be returned by access to this URI then it's not a valid
representation of the resource identified by this URI as the two (by URI
definitions, assuming here that there is another HTTP accessible URI for
accessing that representation) are different resources.
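
As a rough illustration of the decision described in this message (not text
from any HTTP specification), the following Python sketch models a server
that only knows about HTTP resources. The names HttpResource, respond and
RESOURCES, and the /id/richard and /doc/richard paths, are hypothetical. A
path with no HTTP resource defined yields 404, a resource with a
representation meant to be returned at that URI yields 200, and a resource
defined only to point elsewhere yields 303.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class HttpResource:
    """A server-defined HTTP resource (hypothetical model for this sketch)."""
    representation: Optional[bytes] = None  # suitable to be returned at this URI
    see_other: Optional[str] = None         # URI of a related, separate resource


Response = Tuple[int, Dict[str, str], bytes]


def respond(resources: Dict[str, HttpResource], path: str) -> Response:
    """Build a response for a GET on `path`, using HTTP's narrow view."""
    resource = resources.get(path)
    if resource is None:
        # No HTTP resource has been defined for this URI yet.
        return 404, {}, b""
    if resource.representation is not None:
        # A representation meant for this URI exists: return it, not a 303.
        return 200, {}, resource.representation
    if resource.see_other is not None:
        # The resource's owner chose to point at a different resource.
        return 303, {"Location": resource.see_other}, b""
    return 404, {}, b""


# Hypothetical application data: whoever defines the resources decides this.
RESOURCES = {
    "/doc/richard": HttpResource(representation=b"About Richard C."),
    "/id/richard": HttpResource(see_other="/doc/richard"),
}

for p in ("/doc/richard", "/id/richard", "/id/caesar"):
    print(p, respond(RESOURCES, p)[0])
```

What /id/richard stands for in the real world is decided by whoever fills in
the table; the respond function never needs to know.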

> Now, just to clarify, it seems to me that this case could arise even  
> when R itself does not exist at all: there is simply a server which is  
> able to recognize, somehow, that the URI in question identifies  
> something other than a network data object or service, and so it  
> cannot return a transmittable awww:representation of it.

In which case there still is an R in a technical sense as the server can
identify the resource (in the broader sense) and somehow act based on it
or information about it, thereby building an internal R http resource
with the details of what the requested URI refers to.

>  In HTTP  
> terms, no http:resource exists at that endpoint to construct a  
> transmittable representation from. Do you agree that this is a  
> possibility, and is consistent with a 303 response?

Yes.

> Or would you say  
> that in this case, the only acceptable response would be a 400-code?

No.

> Frankly, I'm not very interested in any definition of 'resource' other  
> than that in the URI specification, and I don't think that any  
> published specification which refers to URIs should use any other  
> definition. HTTP URIs are, in fact, being used to refer to resources  
> in this broader sense.

The difference exists because the broader sense has many implications
which have no impact on HTTP, and using the broader sense would
complicate the HTTP specifications as we would then need to apply these
limitations everywhere, resulting in even more confusing text.

> If the HTTP spec refuses to acknowledge this,  
> then the world will simply go on and ignore the http spec in critical  
> cases. Which would be a pity, but would not be a complete disaster.

The rest of the world is obviously free to (and in many cases should)
ignore the HTTP definition of resource as it's of no relevance to them,
just as the possible existence of real-world resources has no relevance
to the HTTP specifications.

Regards
Henrik
Received on Tuesday, 21 July 2009 01:38:08 UTC