Re: Fwd: Review of new HTTPbis text for 303 See Other from noah_mendelsohn@us.ibm.com on 2009-07-31 (ietf-http-wg@w3.org from July to September 2009)

From: <noah_mendelsohn@us.ibm.com>
Date: Fri, 31 Jul 2009 16:14:46 -0400
To: Pat Hayes <phayes@ihmc.us>
Cc: HTTP Working Group <ietf-http-wg@w3.org>, "www-tag@w3.org WG" <www-tag@w3.org>
Message-ID: <OF72C24FC5.D0DCA8F4-ON85257604.006EFAD0-85257604.006F3734@lotus.com>
I'm not sure whether the TAG is interested in spending time on this 
question in the near future, as it has taken quite a bit of time in the past, but I 
will put an item on an upcoming agenda to at least get the sense of the 
group. Given that some members with important perspectives on this are 
away for much of August, I'm not sure whether we'll wind up doing more this 
month than deciding to await their return. In any case, I'll schedule an 
initial, brief discussion.

Noah

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Pat Hayes <phayes@ihmc.us>
Sent by: www-tag-request@w3.org
07/31/2009 03:25 PM
 
        To:     "www-tag@w3.org WG" <www-tag@w3.org>, HTTP Working Group 
<ietf-http-wg@w3.org>
        cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        Fwd: Review of new HTTPbis text for 303 See Other


Folks

I do not expect a reply, but I put it to y'all, is this stance (below) in 
fact consistent with what the HTTP and TAG groups have published 
concerning URIs and what they are intended to identify? In particular, is 
it consistent with http-range-14?  It seems to me it is clearly not, and 
that this fact is important to what both groups publish in their 
specifications and recommendations.

As a concrete point to focus discussion, I gather that Henrik's view is 
that in the case where an HTTP URI identifies a non-information resource, 
but resolves to an HTTP endpoint, it must follow that the "requested 
resource" (in the sense of HTTPbis) of the URI in the GET request is an 
information resource interfaced to the HTTP endpoint, and so cannot be the 
same as the non-information resource which the URI "identifies" in the 
sense of RFC 3986. I also gather, from off-line emails, that Richard 
Cyganiak would disagree with this interpretation. (I hope I do not 
misrepresent anyone here.) Apparently, therefore, two people both quite 
expert in reading the HTTP spec do not interpret the phrase "requested 
resource" in the same way, leaving me and I suspect others in a state of 
complete confusion. 

Pat Hayes

--------


Begin forwarded message:

From: Henrik Nordstrom <henrik@henriknordstrom.net>
Date: July 31, 2009 1:38:18 PM CDT
To: Pat Hayes <phayes@ihmc.us>
Subject: Re: Review of new HTTPbis text for 303 See Other

I am not even going to answer you this time. Go back, read the HTTP
specifications, and come back when you have something concrete which
actually relates to the specifications as such to talk about. If there is
something you want to change, then make concrete suggestions on how (and
make sure to base them on current drafts).

As already said, HTTP does not care, and has no intention of ever caring,
what kind of "resource" a URI maps to, what its semantics are, or what it
denotes. All HTTP specifies is an interface language for talking to the
server publishing this over HTTP; anything else is irrelevant to HTTP.

HTTP has its definition of the term "resource", like it or not. Within
the HTTP specification the word "resource" has the meaning as defined by
HTTP. Any meaning defined elsewhere is irrelevant as far as HTTP is
concerned.

But to address your concerns the term resource will quite likely barely
be used at all in the revised HTTP specifications, or at least much less
than it is today.

I have said what I have to say to you on the subject. Further responses
talking about semantics, connections to real-world or even abstract
things, or taking statements in the specifications outside the context
of the specification where they are written will be silently ignored.

Regards
Henrik

On Fri, 2009-07-31 at 13:00 -0500, Pat Hayes wrote:
On Jul 20, 2009, at 8:37 PM, Henrik Nordstrom wrote:

On Mon, 2009-07-20 at 13:16 -0500, Pat Hayes wrote:

Apparently you have not understood my point, above. There are cases
where NO implementation of ANY KIND can POSSIBLY map a URI to the
resource it identifies. So one cannot simply toss this issue over the
wall to some other, unspecified, "implementer". It's nothing to do with
implementation.

For the kinds of URIs that HTTP deals with, it can, as far as HTTP is
concerned, given the definition of "resource" used by HTTP, which for
technical specification-writing reasons is slightly narrower than the
general URI definition of resource.

It is not 'slightly' narrower. The general definition of 'resource' 
has it meaning absolutely anything, real or imaginary, concrete or 
abstract, that can be referred to and distinguished from other things 
(the last five words inserted to make sense of the 'identify' 
language). This is not a 'slightly' wider sense than the one that you 
apparently have in mind. It is a spectacularly, transfinitely, almost 
cosmically wider sense. It is as wide as human language knows how wide 
to make a distinction.

I understand, but I am not talking about 'effects', but about 
semantics.

And HTTP is completely ignorant of any semantics that the URIs accessed
via HTTP may have.

What HTTP cares about is whether there may be effects on the resource state
from actions requested via HTTP (e.g., DELETE is assumed to have a certain
effect when executed on the HTTP resource).

My point is that you cannot completely ignore the rest of the world.

When writing a technical specification you can, as the relevant parts of
the world are then those that the specification intends to cover, and
only those.

But when your specification is about a language or a notation, and in 
part about what that notation means, and when in fact that notation is 
being used to mean things in a certain (wide) category, then such 
usage does fall within the scope of your specification, and you should 
deal with it, if only by stating explicitly that you are not going to 
consider it. But to ignore it and pretend that it isn't there, by 
redefining an existing terminology so as to avoid interfacing with other 
specifications, is both intellectually dishonest and socially 
irresponsible. Sorry, strong language, but I really do feel strongly 
about this, having had to face up to this issue myself when writing 
specifications.


But you yourself said that I was thinking about the wrong kind of
meaning, not the kind of meaning intended by the spec. Really, you
cannot have it both ways. Please make up your mind which is your
position, and stick to it.

HTTP places absolutely no meaning at all on the general term "resource"
as used in English

Never mind the English meaning, which is now lost to history in these 
debates.

or even the "resource" as defined by URI
specifications.

But does it not strike you as inappropriate to simply ignore the 
normative definitions used in defining technical terms which you 
yourself use? What is the point of writing specifications if other 
specification writers are free to redefine the terminology which my 
specification defines normatively?


The only kind of resource HTTP places any meaning on at all is the very
much narrowed down "resource" as defined by the HTTP specifications, and
even then it's just as an abstract concept to simplify the world
description somewhat. To HTTP it does not matter at all what those
resources are, only if they can be accessed and/or transmitted via HTTP

I understand all this. But there are cases where the resource 
identified by the HTTP URI is, in fact, not one of these. That is - 
regardless of its true metaphysical nature, which I agree we will not 
delve into - whatever it really is - it is not something that can be 
accessed and/or transmitted. Such cases are REAL, they are out there 
in the actual world. If your spec refuses to acknowledge this, then it 
is simply an incomplete specification; and as such, it is less useful 
than it can and should be.

or not as defined by whoever "owns" the resource and who also defines
their intended URI semantics (again completely outside of HTTP
specifications).

I know it does not wish to, but http-range-14 has left it no choice
but to care about it, at least a little.

Has it? Care to explain that again then, using the term meanings as
defined by HTTP.

http-range-14 specifies an HTTP-defined action (the use of a 303 
redirect) be used under circumstances which arise when the URI in 
question identifies a thing which is not a resource according to the 
narrow sense of 'resource' which you are arguing HTTP should restrict 
itself to.
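The http-range-14 resolution referred to here is usually summarized as: a 2xx response to a GET tells the client that the URI identifies an information resource; a 303 response says the resource could be anything; a 4xx response says nothing either way. A minimal Python sketch of that server-side decision, under loud assumptions: `Entry`, `respond`, `TABLE`, and the example.org URIs are all hypothetical, and the server is simply assumed to already know whether each URI identifies an information resource, which is the very point disputed in this thread.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Entry:
    """Hypothetical server-side record for one URI."""
    is_information_resource: bool       # does the URI identify an information resource?
    body: Optional[str] = None          # an awww:representation, if one is available
    described_by: Optional[str] = None  # URI of a document describing the thing


def respond(table: Dict[str, Entry], uri: str) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) for a GET on `uri`."""
    entry = table.get(uri)
    if entry is None:
        return 404, {}, ""  # the server knows nothing about this URI
    if entry.is_information_resource and entry.body is not None:
        # A 2xx response signals that the URI identifies an information resource.
        return 200, {"Content-Type": "text/plain"}, entry.body
    if not entry.is_information_resource and entry.described_by:
        # A non-information resource (e.g. a person) gets a 303 to a describing document.
        return 303, {"Location": entry.described_by}, ""
    return 404, {}, ""


# Hypothetical example data: one URI for the person, one for a page about him.
TABLE = {
    "http://example.org/id/caesar": Entry(False, described_by="http://example.org/doc/caesar"),
    "http://example.org/doc/caesar": Entry(True, body="A page about Julius Caesar."),
}

print(respond(TABLE, "http://example.org/id/caesar"))
# (303, {'Location': 'http://example.org/doc/caesar'}, '')
```

Note that the `is_information_resource` flag is supplied by whoever configures the server; nothing in the sketch derives it from the HTTP exchange itself.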

The semantics of URIs has nothing at all to do with layering. It is
part of the specification **of URIs themselves**. When anyone talks
about the relationship between a URI and the resource it identifies,
or denotes, or refers to, or is used to request, or indeed pretty much
any relationship between a URI and a resource, they are talking about
semantics.

Ok. My point here is that HTTP does not care about those semantics.

And my point is that it must, at least to the minimal extent required 
to state a normatively required action under circumstances which can 
only be described by referring to those semantics. (And also - though 
this is more controversial - I would argue that in fact, HTTP is 
already concerned with the semantics of URIs, even though it refuses 
to acknowledge this elementary fact.)

All it possibly cares about is that the server is ultimately responsible
for executing that semantic mapping

This is a conceptual mistake. Semantic mappings are not executable.

of URI to resource (in URI terms),
and that this mapping results in HTTP network accessible resources
(which you seem to sometimes call a representation where HTTP calls it a
resource

I hope not. I try to keep the resource/representation distinction 
clear. There are however two or more notions of what counts as a 
'representation': when in doubt, I use the now-standard circumlocution 
awww:representation to refer to the narrow sense used in REST and (I 
assume) HTTP.

) and their possible representations as defined by HTTP.

Because the HTTP specs also talk about this. And it is generally a
good idea, when two specs talk about the same thing using the same
language, that some effort is expended to make sure they are intending
to use this language in the same way.

Unfortunately, if a new term were to be defined for every slight variation
of the term "resource" there is in this, I am afraid it would be even
more confusing.

As I have tried to emphasize, this is not a 'slight variation', and in 
any case I doubt if there are going to be any more changes once we 
have established that a resource can be absolutely anything.


There are very good reasons why "resource" in the URI specifications is
broader than "resource" in the HTTP specifications, with both being narrower
than the general English "resource".

No, the English meaning is actually narrower than the URI 
specifications sense, which is highly idiosyncratic and of fairly 
recent coinage (see the Wikipedia entry of 'resource' for a good 
history.)


I understand, but it refers to resources. If for example the spec says
(as I believe it does, currently) that if the server has available a
transmittable representation of the requested resource, then it must
return that with a 200 code, this statement makes no reference to the
URI that was used to identify the resource.

The URI reference is implicit, as the whole text is in the context of
building a response to a request for a specific URI. Trying to read the
text outside that context is nonsense.

Please read what I wrote more carefully. To say that the server has 
available a transmittable representation of the requested resource, 
without referring to the URI that was used to request the resource, 
is not nonsensical in any way at all. It implies, as I read it, that 
this condition holds independently of the URI, so that if the same 
resource is requested by different URIs then this condition either 
holds for both of them or for neither of them. So it rules out the 
possible case where the condition holds for one URI request but not 
for the other URI request, with a different URI but the same resource.



....


No, it is quite on the point. If the server can respond differently to
different URIs which both identify the same resource, that changes the
game.

If the defined semantics of the URIs says the server should respond
differently, then in the world as defined by HTTP they refer to different
resources, though possibly very closely related ones.

It all boils down to the definition of what a resource is, and the HTTP
resource is, as I already explained, NOT as general as the URI resource.

No, the situation is far worse than this. According to your previous 
paragraph, we can have a situation where two URIs identify the same 
resource according to the URI spec, but must be understood by HTTP as 
corresponding to different resources. Just narrowing the sense of 
'resource' will not get you this horrible situation. This, if indeed 
you are right (nobody else has suggested this idea, so I hope you are 
wrong) makes the HTTP and URI specifications sharply **incompatible** 
with one another.


In the terminology defined by HTTP the difference between an (HTTP-)URI
and resource is more of a special case, and not related to any of what
you talk about.

It is related. In fact it is critical.

To me when talking about HTTP it's not.

Ah. That certainly makes sense, and indeed is what I understood when I
first became involved in these URI-meaning debates. But this position
is not consistent with what is said about resources in other
standards. And moreover, if this is true, then the http-range-14
decision is simply untenable. For in that case, the 'requested
resource' is something that cannot possibly be inside a server. Julius
Caesar, let us say, might be the requested resource.

And that is what we have been saying all along. Trying to use Julius Caesar
as an example when talking about HTTP resources just does not make any
sense, as the two by definition cannot be the same thing.

And yet, there are HTTP URIs which identify Julius Caesar, in the 
sense of "identify" used in the URI specs. And, moreover, http-range-14 
actually places some conditions on what HTTP must do with such a URI, 
**because** it identifies a resource of that 'off-Web' kind. So the 
behavior of HTTP depends in part on, and can only be accurately 
specified by mentioning, the situation where a URI identifies a 
"non-HTTP" resource. And this DOES make sense. In fact, it is actually 
TRUE.

Yes, it's a simplification, but defining or assuming anything about
resources anywhere beyond that is outside of HTTP scope, is nothing HTTP
cares about, and is left to the application of HTTP and/or URIs.

No, sorry, that position is simply untenable. See my earlier replies
to Richard on this point. HTTP cannot hide inside a 'layer' and
pretend it is only dealing with computational identifiers which 'map'
to computational artifacts. Both the uses and the specifications of
http URIs have extended its scope beyond that narrow purview.

And I disagree. The semantics of the application of HTTP is and should
be much broader than the semantics as used by the HTTP wire protocol.

The operation of HTTP, according to http-range-14, is ALREADY
concerned with how URIs denote real-world entities beyond the
operation of http.

And my viewpoint is that that's completely outside of what the HTTP
specifications or operations are concerned about. In fact it
intentionally does not care about any such concerns and leaves that to
the application of HTTP to any such entities.

And, to repeat, that view is untenable, precisely because semantics is 
not about computation. Your notions of layering simply do not apply 
when you are purporting to make decisions based upon meanings: which 
you are, whether you like it or not. HTTP-range-14 has made this 
choice for you. Don't argue with me, if you want to keep your nice 
tidy 'layering': go back and argue with whoever made the http-range-14 
ruling.


Anyone is free to define
HTTP applications for such entities, by defining HTTP resources mapping
to such entities as they please. HTTP only defines how one may interface
with those once defined in terms of HTTP resources. What relations those
HTTP resources have to any real-world entities is defined by that
application, not by HTTP.

(Not, by the way, with how *resources* map to real-world
resources. In the cases in question, the relationship between
the URI and the real-world entity is direct, not mediated through some
other resource inside a server.)

And in my world that's an impossible condition, as those real-world
resources do not exist in HTTP terms

They do exist, you are just refusing to look at them.

and need to be mediated via some
server-defined HTTP resource to be accessible via HTTP, or requests for
that HTTP-URI would simply result in a 404 until such an HTTP resource is
implemented for mapping to the real-world resource.


But the phrase "that can be used to interact with a resource" ALREADY
limits what a resource can be. You cannot interact with the number 27
or with Julius Caesar.

Please note that this part is just explanatory text trying to explain
the relationship between HTTP and URI specifications, not a normative
definition.

The definition of "resource" in the HTTP specifications is found in the
terminology section.


     resource

             A network data object or service

That is not the definition of resource used in RFC3986, however.

That is what I said, and why I highlighted it here. The definitions are
different, and you need to use the right definition for each
specification or you'll get confused when discussing borderline issues
like this.

For most practical considerations in the use of HTTP the difference is
negligible however.

Not any more. That's why I'm making such a fuss about it. And BTW, 
these are not 'borderline' issues.


HTTP
URIs can identify resources in the broader RFC3986 sense; and for
those URIs, there may well not be any resource in this narrow sense
identified by the URI at all. And yet, still, a GET on them might
resolve to an http endpoint. What does the http spec say about such a
case? What is the endpoint to do?

Yes, it's correct that HTTP URIs can identify resources in the broader
sense, but that is not something the HTTP specifications as such concern
themselves with. HTTP specifications end at the HTTP endpoint and its
HTTP-mapped resource.

Hmm, so in these cases, the HTTP URI identifies **two** different 
resources? The URI one and the HTTP one? Is that what you are saying? 
I doubt if many people on the TAG would like this.


And my point was only
that in this case, it is at best confusing and maybe actually wrong to
say that IF the server has a transmittable representation available
then it must send it with a 200 code.

And we don't. We say "suitable to be transmitted", which is quite
different from "transmittable", as there are representations that MAY be
transmittable in theory but which are still deemed unsuitable (by the
HTTP server endpoint or its policy).

OK, I wasn't meaning to confuse this issue, just using 'transmittable' 
as a shorthand. Sorry.


For what are we to say about the
second case? It all depends on what is meant by the "requested
resource".

The difference between a "resource" (as identified by a specific URI)
and an HTTP "requested resource" is not what you think. The two differ when
there are multiple independent representations available at the exact
same URI, such as content in different languages based on the language
preferences of the client, etc.
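Henrik's example of one URI with several language-dependent representations corresponds to server-driven content negotiation on the Accept-Language request header. A minimal Python sketch, assuming variants keyed by lowercase language tag and a deliberately simplified matching rule (exact tag first, then primary subtag); `parse_accept_language`, `select_representation`, and `VARIANTS` are hypothetical names, and this is not the full lookup algorithm of RFC 4647:

```python
from typing import Dict, List, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Split an Accept-Language value into (tag, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # default quality value when none is given
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((tag.strip().lower(), q))
    # sort() is stable, so equal q-values keep their order in the header.
    prefs.sort(key=lambda p: p[1], reverse=True)
    return prefs


def select_representation(variants: Dict[str, str], accept_language: str,
                          default: str) -> Tuple[str, str]:
    """Pick one variant for the same URI based on the client's language preferences."""
    for tag, q in parse_accept_language(accept_language):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if tag in variants:
            return tag, variants[tag]
        primary = tag.split("-")[0]  # e.g. "sv-se" -> "sv"
        if primary in variants:
            return primary, variants[primary]
    return default, variants[default]  # nothing matched: fall back


VARIANTS = {"en": "Hello", "sv": "Hej"}
print(select_representation(VARIANTS, "sv-SE, en;q=0.8", "en"))  # ('sv', 'Hej')
```

Both variants are responses to a request for the same URI; on Henrik's reading, which one is returned depends on the request headers, not on a different URI.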

But they also differ, presumably, when the identified resource is 
Julius Caesar. Or do they? I really have no way to know.


(It seems to me that HTTP rather shoots itself in the foot by this
insistence that its specs must not refer to or even acknowledge the
existence of resources that are other than network data or services,
since it has defined out of existence the very case that it should be
able to refer to, if only to explicitly say that it's not going to
specify what happens in it. This is rather an ostrich way of writing
specs, to pretend that all of the world that you don't like doesn't
exist, so that you aren't obliged to say anything about it.)

I don't agree here. HTTP specifications place a technical limit on what
the word "resource" means within the HTTP specifications, which is
purely a technical definition.

And says nothing about the cases when HTTP URIs are used to refer to 
other kinds of resource. Which is an ostrich way of writing 
specifications.



My response is that
it's the server's role to select a suitable representation of the
resource based on the meaning of the URI.

Does that mean, of the resource that the URI identifies? And does
"identify" mean, denote?

Sorry if I am unclear sometimes. English is not at all my native
language, and the word "denote" is not really part of my limited English
vocabulary.

Sorry. 'denotes' AKA 'refers to', 'identifies', 'is a name for', 'is 
used as a name for'. I will try to remember to say 'refers to' or 
'identifies'.


From my understanding of "denote" it's:

Of the HTTP resource the HTTP-URI identifies.

Where 'identifies' is in the sense of how a Uniform Resource
Identifier identifies a network-accessible resource, ignoring completely
what that resource denotes in the broader sense.

But you cannot ignore this completely when the URI does *in fact* 
identify something other than a network-accessible resource.


??!!? Of course two different URIs can refer to the same resource. If
HTTP is built on a different supposition, then HTTP is simply wrong.

Sure they can. The points here are:
* that HTTP does not care if they do

OK, but...

* and that HTTP has the view that if the semantics of those URIs is
different then they do in fact NOT refer to the same resource

That simply does not make sense. What you say here (seem to say here) 
is logical nonsense. Look, if two names refer to the same thing (call 
it a resource if you like) then there is only one thing that they both 
refer to. So to say that 'as far as X is concerned' they refer to 
different things, is simply meaningless. There aren't two things there 
to be referred to, in this case. So, sorry: they DO IN FACT refer to 
the same resource. If HTTP thinks otherwise, then HTTP is simply 
WRONG. There is no finer-grained identity than identity itself.

If you think I am technically mistaken on this topic, please refer me 
to some published work which makes semantic sense of the view of 
identity that you are basing this claim upon. (And as I have had this 
discussion many times before, if you are going to cite LISP at me: 
identity in LISP is EQ, not EQUAL.)

They may
refer to different facets of some larger/broader resource but not the
same.

I have no idea what you mean by a facet of a resource. What 'facets' 
does Richard or J.C. have?


If those URIs happen to really refer to the same resource, both URIs
will respond identically, and are further indistinguishable from two
identical copies of the same resource.

?? I am trying to make sense of this, and not sure I have it right.
Take the case in my email to Richard, where there is a URI denoting
him, Richard C., the actual person. (Note, this is not a topic that
HTTP gets to rule out or refuse to acknowledge, because this can in
fact happen. My question is about what HTTP should do in such a case.)

HTTP handles the case by restricting its notion of resource to the
network-accessible resource used for interfacing with Richard C.

First, there is no such resource: Richard C. isn't the kind of thing 
that you can 'interface' with over a network. (Well, maybe by email, 
but then we would be talking about his emailbox.) Second, it's not 
important what HTTP 'restricts' itself to: the fact remains that (in 
the case described) the URI does **in fact** identify Richard, not 
some network-accessible thingie that stands in some relationship to 
him. (That thingie might have its own URI, of course, which does 
identify it.) So if what you say here is correct, I presume that HTTP 
simply treats the URI as not having a corresponding http:resource at 
all. Right? Because it is a basic assumption of the whole Web 
architecture that the resource identified by a URI is unique. So if 
the URI identifies Richard, it can't also identify the thingie.

That resource MAY or MAY NOT have an actual interface with Richard C; HTTP
does not care and need not care for its operations.

In this case, according to Richard, he is the 'requested resource'.
The GET request is directed to a server which has some other resource
inside it, call this resource R. R is a resource in your narrower
sense (a network data object or service), but this is *not* the
requested resource in this case, even though the URI resolves to (the
server containing) R.

In terms of HTTP R is the requested resource.

I thought you might say that. So what then is the relationship between 
a requested resource and the resource identified by a URI? Apparently 
they can be different, so we have at least two resources somehow 
connected with a URI. Are there any more?


(Do you agree?) In this case, http-range-14
requires that the server emit a 303 coded response, because even
though there may well be a transmittable (awww-) representation of R,
there is none of Richard C., and he is the requested resource.

That's up to R (or whoever/whatever defines R) to decide.

No, it is not. It is simply a fact that there is no transmittable 
awww:representation of Richard. He isn't the kind of thing that has 
such representations.

But in any case, it appears that, on your account, the whole action of 
HTTP need have **absolutely nothing** to do with the resource that the 
URI identifies (in this case, Richard.) So tell me: here I am with a 
URI, and in order to find out more about what it identifies, I use it 
in an HTTP GET, and something happens. What, if anything, can I 
conclude about the resource that my URI identifies? AFAIK, the only 
possible answer is, on your account: nothing at all. It's all going to 
be mediated by the resource that the URI requests, and that need have 
nothing to do with what it identifies. Nor need the response codes 
have any connection with the resource identified by the URI: indeed, 
if the requested (not identified) resource has a 200-level-suitable 
awww:representation, then that is what the server must send me back, 
even though neither it nor its source (that is, in the above example, 
neither the awww:representation of R nor R itself) need have anything 
whatever to do with the identified resource (Richard). Right?

I agree this picture has a certain elegance and simplicity, but it 
makes complete nonsense of almost everything that has been said and 
written about URIs and resources for the past decade. It means that 
the picture of Web architecture promoted by the TAG is sharply and 
fatally different from that supported by HTTP.

Anyone else like to comment on this?

Pat

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

Received on Friday, 31 July 2009 20:15:40 UTC