Re: Review of new HTTPbis text for 303 See Other from Henrik Nordstrom on 2009-07-21 (www-tag@w3.org from July 2009)

From: Henrik Nordstrom <henrik@henriknordstrom.net>
Date: Tue, 21 Jul 2009 15:57:20 +0200
To: Dan Brickley <danbri@danbri.org>
Cc: Pat Hayes <phayes@ihmc.us>, "www-tag@w3.org WG" <www-tag@w3.org>, HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <1248184640.23873.315.camel@localhost.localdomain>
On Tue, 2009-07-21 at 09:52 +0200, Dan Brickley wrote:

> if http://www.ihmc.us/users/phayes/PatHayes is directly Pat Hayes the 
> person (not a document about him)

And I would argue it's not. It's one possible interface to Pat Hayes,
but it's not the person as such. There may be many other URIs which
reference different aspects of Pat Hayes.

And based on the URI I would say it's possibly the person Pat Hayes in
the context of being a user at ihmc, and is likely to cease to exist
some time after he has left his position at ihmc. Not Pat Hayes the
person in the general sense, or in some other unrelated context. But I
don't know, as it's up to whoever defined and implemented that URI.

> should HTTP DELETE remove Pat from 
> the world, or just remove that URI from the universe of information?

As far as the HTTP specifications are concerned the DELETE is on the
HTTP resource as defined by the server, not on a resource in the general
sense of the term. For resources having multiple URIs with the same
meaning, deleting one URI does not necessarily mean the other ones get
deleted as well, as HTTP does not say anything about how those relate to
each other. It's up to the server to define.
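
As an illustration only, here is a minimal sketch of one server that
happens to bind two URIs to the same underlying record and treats DELETE
as removing just the binding for the requested URI. The class name, the
second URI /staff/phayes and the record text are made up for the
example; another server could just as legitimately remove both bindings.

```python
class AliasingServer:
    """Toy server: several URIs are bound to one underlying record."""

    def __init__(self):
        # Hypothetical backend record and URI bindings, for illustration only.
        self.records = {"phayes": "Pat Hayes, user at ihmc"}
        self.bindings = {
            "/users/phayes/PatHayes": "phayes",
            "/staff/phayes": "phayes",  # made-up second URI for the same record
        }

    def get(self, uri):
        """Return (status, body) as an HTTP client would observe it."""
        key = self.bindings.get(uri)
        if key is None:
            return 404, None
        return 200, self.records[key]

    def delete(self, uri):
        """Drop only the binding for this URI; the record itself is kept."""
        if self.bindings.pop(uri, None) is None:
            return 404
        return 204


server = AliasingServer()
status = server.delete("/users/phayes/PatHayes")
```

After the DELETE, a GET on the deleted URI yields 404 while the other
URI still answers 200. That outcome is this server's choice, not
something HTTP requires.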

> What's a compliant implementation to do?

Well, at least I would not expect it to remove Pat Hayes from the face
of the world only because someone sufficiently authorized asked for the
HTTP resource as published by the ihmc HTTP server to be removed.

> So would be perfectly reasonable to have two Web sites / services / 
> installations, call them www-A and www-B, run according to similar 
> readings of HTTP?:
> 
>   * Both of them agree that 
> http://www-a.ihmc.us/users/phayes/PatHayesAbout and 
> http://www-b.ihmc.us/users/phayes/PatHayesAbout are names/identifiers 
> for Pat Hayes (ie. the person whose mailbox is phayes@ihmc.us), rather 
> than for a page/document

Ok, if you say so.

>   * Both of them implement HTTP verbs that proxy Pat into the Web, by 
> allowing (through GETs, 303 redirections etc) representations of him to 
> be exposed and accessed via the HTTP protocol

Yes. Or other actions via other methods, such as hypothetical WALK,
RUN, SLEEP extensions etc...


>   * One of them reads an HTTP DELETE on the Pat URIs as a request to 
> adjust the world such that www-a no longer shares any information about 
> Pat via http://www-a.ihmc.us/users/phayes/PatHayesAbout (ie. "forget 
> this resource mapping").

Yes.

>   * the other reads a DELETE on 
> http://www-b.ihmc.us/users/phayes/PatHayesAbout as a request to adjust 
> the world such that Pat is no longer in it. an altogether more serious, 
> expensive and irreversible action.

Yes. That's a possible (even if highly unlikely) real-world side effect
of that DELETE. HTTP does not define anything about its effect on the
actual resource beyond the operations of HTTP on HTTP resources. As far
as HTTP is concerned the result (in HTTP) is the same whether the
operation is sent to the resource defined above, or to a resource which
just emulates that HTTP resource at the level of HTTP without having any
relation whatsoever to any physical Pat Hayes.

> Is there a fact-of-the-matter about whether website / webmaster www-A or 
> www-B has the correct reading of HTTP? Or the two are perfectly free to 
> diverge in their readings?

The two are perfectly free to diverge as far as HTTP is concerned, as
long as the result at the HTTP level is consistent.

> If I am considering sending an HTTP DELETE 
> message to www-a and/or www-b, what information should I take into 
> account, while trying to determine how my messages will be understood? 
> Is there any way to find out?

There is no way within HTTP to find out what side effects an HTTP
operation may have. Some operations like GET are defined in such a way
that they SHOULD NOT have side effects, but it's not a guarantee (only a
SHOULD, not a MUST, which means even a GET MAY have side effects on some
servers).

> If all HTTP verbs (including extensions) are always with regard to the 
> information-wrapping things, even if the URI is taken to name a "thing 
> in the non-digital world", this is important to know and to agree.

Imho this is clearly spelled out in the HTTP specification by its
definition of the term "resource".

> If 
> it's up to the Web server, that's important to know. If nobody knows and 
> it's all a bit vague still, that's also important to know. I don't think 
> we collectively have a clear account of these issues yet.

Everything which happens on the web server beyond what happens on the
HTTP message level is defined by the web server. HTTP does NOT touch how
web servers define resources or how the web server is to carry out
actions on those beyond the HTTP message syntax enabling these to be
represented in HTTP messages.

> > If the defined semantics of the URIs says the server should respond
> > differently then they in the world as defined by HTTP refer to different
> > resources, but possibly very closely related such.
> 
> So HTTP always interposes a wrapper / proxy entity-thing-object (sheesh, 
> we're running out of neutral words :) ...


> 
> ie. even if we all agree that
> http://www-a.ihmc.us:8080/users/phayes/PatHayesAbout
> http://www-a.ihmc.us:80/users/phayes/PatHayesAbout
> ...are two names for the self-same thing (namely Pat)

And I would argue they are not. These are names for some HTTP interface
to Pat, but those URIs, or what they identify, are in themselves NOT Pat.
The thing they identify may well be 1-1 related to Pat, but that's an
implementation detail of no concern to HTTP.

>  they are in 
> HTTP-speak inevitably going to be different (http:)"resources"? That's 
> my reading of your last post, at least.

The HTTP and URI specifications do not enforce one or the other. It's
implementation defined whether those are in fact just two URIs for the
same resource on the server, or whether they are in fact two completely
different resources.

HTTP does not define, restrict or say anything else about side effects
on any URI other than the currently accessed URI, with the sole
exception that HTTP does include support to indicate that ONE other URI
on the same server may have changed as a result of certain operations.

> (irrespective even of whether the same Apache webserver exposes 
> either/both of these, or different servers, or whatever - that's all 
> internal and not directly relevant)

Correct.

> (following through my example)
> 
> So, still in the world where we all "agree" that both 
> http://www-a.ihmc.us:8080/users/phayes/PatHayesAbout AND
> http://www-a.ihmc.us:80/users/phayes/PatHayesAbout "are Pat"...
> 
> 1. we have two different URIs
> 2. we have one worldly thing ("a URI resource?") that they name/identify; 
> a human person in this case.
> 3. we have two other kinds of thing (HTTP resources) that proxy/wrap 
> that person into the Web; each such "wrapping" is implemented by some 
> "HTTP server" thing that speaks HTTP to the digital world, and typically 
> has some private link to the single underlying thing named.
> 4. HTTP Verbs such as DELETE are understood in the context of one of 
> these (possibly many) bindings, rather than "in the abstract": the 
> server isn't getting a message saying "DELETE Pat Hayes" it is getting a 
> message saying "DELETE the HTTP resource /phayes/PatHayesAbout"
> 5. It isn't clear how much DELETing the server is expected to do; in OO 
> or SQL-backed sites, a DELETE might also cause the bound thing to be 
> removed, ie. information removed from some external backend system.

As far as HTTP is concerned that's an implementation detail.

> 6. Whether the HTTP client who sent can be considered to have 
> *requested* for anything more than the resource-to-thing mapping to be 
> DELETEd isn't clear.

Indeed.

But whatever defined (in human terms) the meaning of that URI is
responsible for publishing this information to the relevant parties in a
way that gives users a reasonable chance of understanding what their
actions will result in. If not, publishing that URI would be very
irresponsible if it has significant and irrevocable side effects on
anything.

> At this point I picture someone stood up in court of law, flapping a 
> printout of the HTTP/1.1 spec, saying "but but ... you DELETEd my 
> *car*... Sure, we agreed the HTTP resource identified my car, but all I 
> wanted to do was remove that *mapping* when I sent an HTTP DELETE to 
> /car/32".
> 
> Do HTTP/1.1 experts have a role in adjudicating in such disputes?

See above.

> How much deleting is http-justifiable?

Not the business of HTTP to define. That's defined by the meaning of the
accessed URI and its implementation.

> HTTP DELETE is a destructive act, we can agree that I hope. To request 
> or to honour an HTTP DELETE request is to do something potentially 
> damaging (or potentially life-saving, even). If we don't know quite 
> clearly what an "HTTP DELETE" message is asking, how can anyone ever 
> risk sending one? Especially to complex Web services, that connect to 
> backend databases, sensors and to a rich world of ecommerce, users etc.

Sending DELETE usually requires a sufficient level of authorization,
which implies a bit of education on what DELETE really deletes in this
web application.

> Yet these messages are regularly sent and handled. I can only assume 
> they are typically interpreted conservatively, or by a tighter 
> client/server implicit understanding than is mandated by HTTP alone.

They are sent regularly as it's a very simple operation, with its
effects easily defined on a per-resource basis, combined with the fact
that the users (or user agents) who are authorized to send these DELETE
operations also know the service they are deleting from quite well.

> > In fact it intentionally does not care about any such concerns and leaves that to
> > the application of HTTP to any such entities. Anyone is free to define
> > HTTP applications for such entities, by defining HTTP resources mapping
> > to such entities as they please. HTTP only defines how one may interface
> > with those once defined in terms of HTTP resources. What relations those
> > HTTP resources have to any real-world entities is defined by that
> > application, not by HTTP.
> 
> Can the nature of those mapping be hinted at or otherwise revealed 
> *through* HTTP?

No.

Doing so would require HTTP to dive into all possible meanings of URIs,
which it does not intend to even touch. As already highlighted by Pat,
any such definitions by HTTP would just restrict and confuse what kinds
of resources HTTP may be applied to and in what ways HTTP may be applied
to those.

> So - just as any server that changes the *underlying* resource in 
> response to an HTTP GET is exceeding its http-requested mandate, any 
> server that removes the *underlying*  real world (rather than 
> http-wrapping) resource on an HTTP DELETE is also exceeding what was asked?

HTTP does not care how far a DELETE operation actually deletes the URI
identified resource. All it cares about is that it's deleted from that
specific URI. What happens beyond that is outside the scope of HTTP and
implementation defined.

The case of GET having side effects is slightly different. In this case
HTTP for very good reasons recommends that GET operations should not
have any side effects beyond the retrieval of the requested resource,
but it's not something HTTP can or will enforce.

> * do we agree that HTTP servers who change the underlying database 
> record after a GET are doing something that wasn't asked of them?

Yes, if that change has any significant meaning. I.e. updating a "view
count" is entirely fine, but erasing or significantly modifying the data
is not recommended.
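
To make the distinction concrete, here is a small sketch of a GET
handler whose only side effect is bookkeeping. The class and field names
are invented for the example; the point is just that repeated GETs
return the same representation even though a counter changes behind the
scenes.

```python
class CarPage:
    """Toy resource: GET returns the description and bumps a view counter."""

    def __init__(self, description):
        self.description = description
        self.view_count = 0  # bookkeeping only, not part of the representation

    def get(self):
        # Side effect that carries no significant meaning for the client.
        self.view_count += 1
        return 200, self.description


page = CarPage("Red car, four doors")
first = page.get()
second = page.get()
```

Both responses are identical from the client's point of view. A handler
that erased or rewrote the description on GET would be the kind of
change the recommendation warns against.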

> * do we agree that HTTP servers who change the underlying database 
> record after a DELETE are doing something that wasn't asked of them?

No. The server is entirely free to execute the DELETE in any way it sees
fit as far as HTTP is concerned. Possible actions:

- Flag the row in the SQL table as no longer published via HTTP.
- Move the row to another table of "deleted" car entries, possibly
  published via another URI.
- Remove the relevant row(s) from the SQL database.
- Remove the server entity that knows /car/001 is the first row in the
  cars table while keeping the SQL database intact, and most likely
  keeping the entity that knows /car/002, as removing that one as well
  would be odd even if permitted. The implementation may well map the
  DELETE operation to the common /car/ server resource if the number is
  implemented as an argument, but this is not really the intended
  meaning.

The HTTP specifications limit themselves to saying that after a
successful DELETE /car/001, /car/001 can no longer be used to refer to
that car, at least not until some other operation (via HTTP or other
means) adds it back.
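
A rough sketch of two of these options, using an in-memory SQLite
database. The table layout, the "published" column, the car models and
all function names are assumptions made for the example, not anything
HTTP defines. It shows that flagging the row and removing the row look
the same at the HTTP level while leaving the backend in different
states.

```python
import sqlite3


def make_db():
    """Create a throwaway cars table with two hypothetical entries."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE cars ("
        "id INTEGER PRIMARY KEY, model TEXT, published INTEGER DEFAULT 1)"
    )
    db.executemany(
        "INSERT INTO cars (id, model) VALUES (?, ?)",
        [(1, "Volvo"), (2, "Saab")],
    )
    return db


def http_get_status(db, car_id):
    """Status code a client would see for GET /car/<id>."""
    row = db.execute(
        "SELECT 1 FROM cars WHERE id = ? AND published = 1", (car_id,)
    ).fetchone()
    return 200 if row else 404


def delete_by_flag(db, car_id):
    # Keep the row, but stop publishing it via HTTP.
    db.execute("UPDATE cars SET published = 0 WHERE id = ?", (car_id,))


def delete_row(db, car_id):
    # Remove the row from the backend altogether.
    db.execute("DELETE FROM cars WHERE id = ?", (car_id,))


def rows_left(db):
    return db.execute("SELECT COUNT(*) FROM cars").fetchone()[0]


flag_db, row_db = make_db(), make_db()
delete_by_flag(flag_db, 1)
delete_row(row_db, 1)
```

In both databases a later GET for car 1 gives 404 and car 2 is
untouched, so an HTTP client cannot tell the two implementations apart.
Only someone looking at the backend sees that one still holds two rows
and the other one.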

> To be explicit ... The Car example here is supposed to emphasise that we 
> run into these same inclarities, even when the concern is with purely 
> digital stuff. An HTTP server wrapped around a database server, for 
> example. Do we consider the records in the SQL server to be intimate 
> pieces of the "http resources" (and hence they live and die by the same 
> HTTP verbs), or are they somehow merely mapped/linked.

That's up to the implementation to define. But generally one intends
that a DELETE removes/deactivates the requested information and only
that.

> I can imagine sysadmins and Web developers who quite reasonably answer 
> such questions differently, and structure their data integrity policies 
> accordingly. This matter of understanding how deep a DELETE request 
> should go isn't one which only arises when we are talking about cars and 
> people... but also when dealing with backend car SQL records or people 
> directory entries.

And HTTP leaves that to be defined by whatever defines the semantics and
operations of the URI in question, intentionally not touching this.
Everything that happens beyond the HTTP endpoint in the server is
implementation defined and outside of HTTP specifications.

The URI specification also has quite a lot to say on this topic, but as
noted it is somewhat more general than the HTTP specifications. The HTTP
specifications, apart from the wire protocol, also define the http://
URL scheme, which is a limited subset of URI.

Regards
Henrik
Received on Tuesday, 21 July 2009 13:58:17 UTC