Defining the meaning of headers associated with a request body from Duncan Cragg on 2012-01-18 (ietf-http-wg@w3.org from January to March 2012)

From: Duncan Cragg <duncan@cilux.org>
Date: Wed, 18 Jan 2012 15:51:17 +0000
To: ietf-http-wg@w3.org
Message-ID: <4F16EA75.1020705@cilux.org>
I'm not certain of the protocol here, so if it's OK I'll just wade in and hope for the
best...

I'd like to know if http-bis will tighten up the meanings of various headers on a
request that contains a body: POST and PUT in particular. I'm wondering how they will
compare to their meaning on a GET /response/.


Example
-------

For example, consider this GET request and its response:

  GET /x HTTP/1.1

  HTTP/1.1 200 OK
  Content-Type: text/plain
  Content-Length: 100
  Cache-Control: max-age=3600
  Etag: "1"
  Content-Location: http://foo.com/y

  ..

Now suppose I had this very similar request with a body on a POST or a PUT:

  POST|PUT /z HTTP/1.1
  Content-Type: text/plain
  Content-Length: 100
  Cache-Control: max-age=3600
  Etag: "1"
  Content-Location: http://foo.com/y

  ..

Clearly, Content-Type and Content-Length are fine, and mean exactly the same as on the
GET response. But what of Cache-Control, Etag and Content-Location?

Note that I'm not considering the meaning of the target (/z, here). I'm assuming only
that someone has a valid reason to want to tie the metadata to its entity, for either
PUT or POST. Maybe I want to notify, via POST, an indexer at /z that there is an
interesting resource at http://foo.com/y, which had an Etag of "1" and a TTL of 3600
when I fetched it?
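
To make the indexer case concrete, here is a minimal sketch that serializes such a
POST. The host name indexer.example and the helper build_notification are my own
assumptions for illustration; whether a server gives the three metadata headers any
meaning is exactly the open question.

```python
def build_notification(target, host, body, metadata):
    """Serialize a POST whose extra headers describe the enclosed entity."""
    head_lines = [
        f"POST {target} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: text/plain",
        f"Content-Length: {len(body)}",
    ]
    # Metadata headers are copied through verbatim; nothing here says how
    # (or whether) the receiving server interprets them.
    head_lines += [f"{name}: {value}" for name, value in metadata.items()]
    return ("\r\n".join(head_lines) + "\r\n\r\n").encode("ascii") + body


body = b"." * 100  # placeholder 100-byte entity
message = build_notification(
    "/z",
    "indexer.example",  # hypothetical indexer host
    body,
    {
        "Cache-Control": "max-age=3600",
        "Etag": '"1"',
        "Content-Location": "http://foo.com/y",
    },
)
print(message.decode("ascii"))
```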


RFC2616
-------

The current spec says, for PUT, there is 'Must-Understand' on all Content-* headers, and
that 'entity-headers in the PUT request SHOULD be applied to the resource created or
modified by the PUT'.

Thus it gives hope of interpreting a sensible meaning for the Expires (similar in use to
the max-age), Last-Modified (similar in use to Etag) and Content-Location headers, at
least in the context of a PUT request: implying that they are associated with the
request entity. POST has nothing similar to say.

Now, Cache-Control on a request normally attempts to modify the Cache-Control of the
subsequent response, rather than, as here, describing the TTL or lifetime of the request
entity. Thus it may clash with such a use. Alternatively, since there is no real need
for the existing meaning on PUT and POST, it could be specified as applying to the
request entity for those methods in particular, especially since we may already
interpret the old Expires header this way.
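
As a rough illustration of why max-age and Expires are 'similar in use', the sketch
below converts a max-age directive into the equivalent Expires value, given the time
the entity was fetched. The function name and the fetch time are assumptions for the
example, and the parsing is deliberately lenient rather than a full Cache-Control
parser.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_from_max_age(cache_control, fetched_at):
    """Return an HTTP-date string for fetched_at + max-age, or None."""
    # Match "max-age=N" at the start of the value or after a comma.
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    if match is None:
        return None
    expiry = fetched_at + timedelta(seconds=int(match.group(1)))
    # usegmt=True needs an aware datetime in UTC, hence the conversion.
    return format_datetime(expiry.astimezone(timezone.utc), usegmt=True)


# Assumed fetch time, chosen only for the example.
fetched = datetime(2012, 1, 18, 15, 51, 17, tzinfo=timezone.utc)
print(expires_from_max_age("max-age=3600", fetched))
# prints: Wed, 18 Jan 2012 16:51:17 GMT
```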

Similarly, we are allowed to suggest a Last-Modified header value, but not an Etag,
which has a different, non-entity or 'response-header' status, even though used in a
very similar way.

Finally, we have: "The meaning of the Content-Location header in PUT or POST requests is
undefined; servers are free to ignore it in those cases." Which is slightly
contradictory to the hope offered by the preceding text!


Bis
---

I'm not going to pretend I've deeply read and tracked the work here - it's a full-time
job, I'm sure. I tried to find relevant material, though.

Here, for PUT, we have 'Unrecognized header fields SHOULD be ignored (i.e., not saved as
part of the resource state).' A superficial skim indicates that much of the above wording
has been removed. Tickets 79 & 102 seem to be related. So it seems to be less clear than
before that we can associate such headers this way?

Or are we now more free to decide to 'recognize header fields and save them with the
resource state'? Which ones?
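
One way to picture 'recognized' versus 'unrecognized' fields on a PUT is an allow-list
on the server, sketched below. The contents of RECOGNIZED are purely an assumption of
mine; which fields belong there is the very question I'm asking.

```python
# Hypothetical allow-list: the draft does not say which fields a server
# should recognize, so this set is illustrative only.
RECOGNIZED = {"content-type", "content-language", "content-encoding"}


def metadata_to_save(request_headers):
    """Keep only recognized header fields as part of the resource state."""
    return {
        name: value
        for name, value in request_headers.items()
        if name.lower() in RECOGNIZED  # header names are case-insensitive
    }


put_headers = {
    "Content-Type": "text/plain",
    "Cache-Control": "max-age=3600",
    "Etag": '"1"',
    "X-Unknown": "?",
}
print(metadata_to_save(put_headers))
# prints: {'Content-Type': 'text/plain'}
```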

Here's more on Content-Location on requests (presumably either PUT or POST, although the
comment is specifically about trying to manipulate PUT):

    .. Content-Location .. MAY be
    interpreted by the origin server as an indication of where the user
    agent originally obtained the content of the enclosed representation
    .. the Content-Location
    cannot be used as a form of reverse content selection that identifies
    only one of the negotiated representations to be updated...

Which seems fine. But sadly:

    A Content-Location field received in a request message is transitory
    information that SHOULD NOT be saved with other representation
    metadata for use in later responses.  The Content-Location's value
    might be saved for use in other contexts, such as within source links
    or other metadata.

I don't understand 'source links or other metadata', however. Perhaps some extra wording
would help future readers? Maybe we can still submit something using POST to our indexer?
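
For what it's worth, here is one possible reading of that text, sketched as code: the
server keeps the Content-Location value out of the representation metadata it replays in
later responses, but records it separately as a note of where the content came from.
This is my guess at the intent, not something the draft spells out, and the names
split_request_metadata and "source" are made up for the example.

```python
def split_request_metadata(headers):
    """Separate replayable representation metadata from provenance notes."""
    representation = {}  # candidates for use in later responses
    provenance = {}      # kept for other contexts only, e.g. a source link
    for name, value in headers.items():
        lowered = name.lower()
        if lowered == "content-location":
            # Transitory on a request: recorded, but not with the representation.
            provenance["source"] = value
        elif lowered.startswith("content-"):
            representation[name] = value
    return representation, provenance


representation, provenance = split_request_metadata(
    {"Content-Type": "text/plain", "Content-Location": "http://foo.com/y"}
)
print(representation)
print(provenance)
```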


Summary
-------

So there is some information in the RFC and in Bis, but it isn't a complete or, to me,
very clear set, particularly around decoration of POST bodies/entities with metadata.

What do the core of this list think about all this?

Cheers!

Duncan Cragg
Received on Wednesday, 18 January 2012 15:51:48 UTC