Re: HTTP method+status code combos and their (http) meaning. from Jonathan Rees on 2011-01-24 (public-awwsw@w3.org from January 2011)

From: Jonathan Rees <jar@creativecommons.org>
Date: Mon, 24 Jan 2011 14:35:23 -0500
To: nathan@webr3.org
Cc: AWWSW TF <public-awwsw@w3.org>
Message-ID: <AANLkTi=Xj5AXUfkuGJLCk94rA+yVmTmfkrgO=m3WoquU@mail.gmail.com>
On Mon, Jan 24, 2011 at 11:14 AM, Nathan <nathan@webr3.org> wrote:
> Nathan wrote:
>>
>> The same mail in plain text http://webr3.org/http-combinations.txt -
>
> Some mini findings, things I'm sure about after analysing HTTP closely.
>
> 1: It is impossible to tell, via HTTP, whether at a given instant, a
> resource has one, or many, representations - thus it is impossible to say
> "the representation", only "a representation". That is to say, by looking at
> an HTTP request/response pair, or even a set of them, it's impossible to
> prove that a resource has a single representation at one time. (as in, an
> HTTP URI never identifies a representation, even at a single instant, HTTP
> does not provide a way to "say" that).

Right. The key words there are "via HTTP".
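
A minimal sketch of that limitation, in Python. The two "servers" below are hypothetical stand-ins with no real network behind them, and nothing here comes from 2616 itself. The only point is that a client sending one kind of request sees the same trace whether one representation exists or two.

```python
# Hypothetical stand-ins for an origin server; no real network involved.
# Each takes an Accept value and returns (status, content_type, body).

def single_rep_server(accept):
    # Only one representation exists; Accept is ignored.
    return (200, "text/html", "<p>hello</p>")

def multi_rep_server(accept):
    # Two representations exist; the server negotiates on Accept.
    if accept == "text/plain":
        return (200, "text/plain", "hello")
    return (200, "text/html", "<p>hello</p>")

def observe(server, accepts):
    # The resulting trace is everything a client can learn "via HTTP".
    return [server(a) for a in accepts]

# A client that only ever asks for text/html cannot tell the two apart.
probe = ["text/html", "text/html"]
same_trace = observe(single_rep_server, probe) == observe(multi_rep_server, probe)

# A wider probe can reveal a second representation, but no finite set of
# probes can establish that a further representation does not exist.
wider = ["text/html", "text/plain"]
differs = observe(single_rep_server, wider) != observe(multi_rep_server, wider)
```

So a probe can sometimes show "more than one", but never "exactly one".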

> 2: It is impossible to tell, via HTTP, if a resource has never existed
> (yet), the closest you can ever get is the unknown state (404 for
> instance) or yes there was a resource, but it's gone now (410!)
>
> 3: The lifetime of a resource, the number of states a resource has had, and
> the times at which each state was changed, is impossible to determine by
> looking at HTTP messages, you cannot tell that resource X was created at
> time t1, changed to state n+1 at t2, n+3 at t3 and then reached its final
> state (gone) at time t4.

The notion of 'state' is not part of 2616.  It's in Fielding's REST
writings and (very weakly) in AWWW, but it never found its way into
the HTTP spec. So we have no guarantee that apparent 'change' is due
to a change in 'state', as opposed to being random, or a change in the
server itself but not the resource.

That is, it is possible to use 2616 without following Roy's advice to
use REST as a software architecture.

> 4: It is impossible to tell if two HTTP resources are equal, or provably
> assert this, for two resources to be the same, they would have to have
> identical properties, which include interfaces, messages sent forwards and
> backwards,

In order for the "Moby Dick is an IR" theory to work, the messages are
not necessarily properties of the resource in any sense that affects
identity.  That is, you could have two URIs, both "identifying" Moby
Dick, with quite different HTTP behavior (selection of file types,
book design, pagination, etc.). This seems counterintuitive but it's
completely consistent with 2616.

> state changes via HTTP and other server defined methods
> (temporal, randomness, the temperature in a room) and so forth. The best you
> can assert (but cannot determine via HTTP message data) is that two
> resources share the same purpose.

How would you do that?

> example of 3 & 4:
> One of the websites I manage, uses a publishing system to create static
> files, and serve them via two UI servers, both servers have the same specs,
> the same OS, the same file system layouts on the same make and model of hard
> disks, the same versions of all software, they are effectively clones - the
> files are also "the same", and accessed via the same URIs (round robin dns),
> however, the ETags and Last-Modified differ on each server, and even if I
> synchronised these headers, and all headers, the fact would remain that
> yesterday I restarted one server and not the other, that for a second when
> the resources were created, one was there and the other was not; that if you
> got all the requests and response from each server, they would be a
> different set, and the resources /would/ be different - because, they are
> different. Same purpose yes, same things, no.

Yes, this is a good use case.  It is rare that you want to talk about
a particular physical deployment of an info-resource. Even when you
do, the URI may not give adequate resolution - consider the case of
multiple A records. The thing of interest lies somewhere in between
what the deployer *means* to do, and what is *actually* deployed.
(What they *say* they are doing is probably of little value, except
for particularly competent and trustworthy agents!)

> 5: (obvious) HTTP is a messaging protocol, it deals with addresses, not
> names, thus when a GET results in a 301 Moved Permanently it means "moved
> to a new address" (just as humans do with buildings and mail redirects), not
> "moved to a new name" (which makes no sense), or "also known as" (which
> would be temporally invalid).

This seems related to the distinction I'm trying to draw between "the
info resource served from a URI" and "the resource to which we mean to
refer when we use a URI referentially".

The physical address metaphor doesn't work very well, as TimBL has
pointed out many times. Although even addresses aren't locators - did
you know that aircraft carriers have zip codes?  In what sense does
the zip code serve to "locate" the aircraft carrier? Best to not go
down this road.

> At this moment, that's the most I can conclude and be 100% sure about,
> there's a lot more, but I don't want to say it until I can prove it to be
> true.
>
> Roughly, I'm quite sure that even the term "representation of" doesn't hold,
> and can only hold in very very specific circumstances, namely when the term
> "resource" is given a fixed meaning of "a file on the file system"

(Don't forget the media type and content language.)

> and the
> set of HTTP methods and messages is constrained to a subset that allows
> this, precluding negotiation of all kinds, the POST method, and any kind of
> "dynamic" anything. I'm also quickly coming to the conclusion that thinking
> of a response message as anything other than just that, is dangerous and
> leads to incorrect statements being made, for instance it's impossible to
> prove that this holds:
>
>  <http://example.org/foo.html> media:type "text/html" .
>
> because there /may/ be another representation sitting there that you didn't
> GET (various other reasons including auth by IP, cookie, accept headers,
> temporal/random negotiation, all kinds of things), and thus, one should only
> be making statements like this about the message returned, by the server, in
> response to a certain request.

Yes, beware what conclusions you draw about a resource from one 200
response. It will tell you what the resource isn't, but will never
help you figure out what it is.  E.g. if you get a 'representation' of
Jabberwocky, you will know that the URI shouldn't be used to refer to
Moby Dick, but you'll have no idea whether it's suitable as a
reference to Jabberwocky or any other particular resource that 'has'
that representation. That's not the same as saying that the URI is no
good as a reference. You need some kind of consistency in order to
successfully assert something like dc:title or dc:creator, and any
time you say something like this you're making a prediction about the
future, and the future is always uncertain (this is where I always
mention Popper).  Your choices are:
1. Take a gamble.  Maybe it *looks* like the page is going to be stable
and consistent long enough for your purposes, and the cost of your
metadata assertion being misinterpreted is small enough to justify the
risk.
2. Obtain inside information.  If you can audit the workings of the
server(s), and get an SLA or other assurance of stability, go for it.
3. Seize control.  As long as you are the URI owner, you can arrange
for the representations to be as convincing as you want them to be.
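
The Jabberwocky point above (a 200 rules candidates out but never singles one in) can be sketched in Python. The candidate table and the `consistent_with` helper are hypothetical, and the short strings are placeholders for real entity bodies.

```python
# Hypothetical model: each candidate resource "has" a set of
# representations, here short strings standing in for entity bodies.
CANDIDATES = {
    "Moby Dick": {"Call me Ishmael."},
    "Jabberwocky": {"'Twas brillig, and the slithy toves"},
    "Through the Looking-Glass": {
        "'Twas brillig, and the slithy toves",
        "One thing was certain, that the white kitten",
    },
}

def consistent_with(observed_body, candidates):
    # Keep every candidate that could have authorized the observed body.
    return sorted(
        name for name, reps in candidates.items() if observed_body in reps
    )

# One 200 response carrying the first line of the poem:
remaining = consistent_with("'Twas brillig, and the slithy toves", CANDIDATES)
# "Moby Dick" is eliminated, but more than one candidate is still standing.
```

The response narrows the field without picking a winner; choosing among the survivors is where the gamble, the audit, or the control comes in.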

Here's a funny: Prove, under the assumptions of 2616, that there is
more than one resource. I don't think you can. As far as I can tell,
there might be just a single resource, for which *all* representations
are authorized, and servers are just picking and choosing which ones
to send based on the URI.
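
A rough sketch of why I don't think you can, in Python. Both "worlds" are hypothetical models, and the example.org URIs and bodies are made up for illustration. They produce the same responses for every URI, so no HTTP trace distinguishes them.

```python
# Hypothetical table of what the server sends for each URI.
RESPONSES = {
    "http://example.org/a": "body A",
    "http://example.org/b": "body B",
}

def many_resources_world(uri):
    # One resource per URI, each with its own single representation.
    resources = {u: {"reps": {body}} for u, body in RESPONSES.items()}
    return next(iter(resources[uri]["reps"]))

def one_resource_world(uri):
    # A single resource for which every representation is authorized;
    # the server merely uses the URI to pick which one to send.
    the_resource = {"reps": set(RESPONSES.values())}
    chosen = RESPONSES[uri]
    assert chosen in the_resource["reps"]
    return chosen

uris = sorted(RESPONSES)
indistinguishable = (
    [many_resources_world(u) for u in uris]
    == [one_resource_world(u) for u in uris]
)
```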

Jonathan

> Will get there, and follow on later in some form..
>
> Best,
>
> Nathan
>
Received on Monday, 24 January 2011 19:35:54 UTC