ACTION-685: message/http, message/external-body and/or use of WARNING headers for 3.1.4 from Francois Daoust on 2008-03-21 (public-bpwg-ct@w3.org from March 2008)

From: Francois Daoust <fd@w3.org>
Date: Fri, 21 Mar 2008 12:36:50 +0100
To: public-bpwg-ct <public-bpwg-ct@w3.org>
Message-ID: <47E39DD2.8000701@w3.org>
The problem
-----------
We want the CT-proxy to be able to send an HTTP request that defines 
modified headers, but also includes the original ones.

Embedding the original headers must not break existing content. In other 
words, it must be transparent for content providers that are not aware 
of our guidelines.


About the "embedded" possibilities
----------------------------------
I can't think of any way to embed things in an HTTP request using 
alternate content-types while still respecting the transparency 
requirement stated above.

To make things hopefully clear, let me use a pseudo-syntax to describe a 
typical POST HTTP request (the syntax may be slightly incorrect, but 
that should not be important):
   "POST [URI] HTTP/1.1
   User-Agent: [modified User-Agent header]
   Accept: [modified Accept header]
   Content-Type: [depends on POST data]
   [Other modified headers]
   [Other original headers]

   [POST data]"

1/ message/http is about enclosing an HTTP message as the body of 
another one. The problem is this body would replace the body of our 
request, leading to:

   "POST [URI] HTTP/1.1
   User-Agent: [modified User-Agent header]
   Accept: [modified Accept header]
   Content-Type: message/http
   [Other modified headers]
   [Other original headers]

   POST [URI] HTTP/1.1
   User-Agent: [original User-Agent header]
   Accept: [original Accept header]
   Content-Type: [depends on POST data]
   [Other original headers]

   [POST data]"

Web servers receiving this request will pass it to the underlying 
application (ASP, PHP, JSP, servlet, ISAPI extension, whatever), with 
the enclosed HTTP message as the POST data. If the application was not 
coded with that in mind (why would it be?), [POST data] is lost in the 
newly received body.
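
To make the breakage concrete, here is a minimal Python sketch of an 
unaware application. It assumes the application simply parses the 
request body as URL-encoded form data; the handler name, the URI and 
the field names are made up for the illustration and stand in for 
whatever the real application does.

```python
from urllib.parse import parse_qs


def naive_form_handler(body: str) -> dict:
    """Stand-in for an unaware application (hypothetical): it assumes
    the request body is URL-encoded form data and parses it as such."""
    return parse_qs(body)


# What the user actually submitted.
original_body = "name=alice&city=paris"

# What the application receives once the CT-proxy has wrapped the
# original request in a message/http body, as in 1/ above.
wrapped_body = (
    "POST /form HTTP/1.1\r\n"
    "User-Agent: OriginalBrowser/1.0\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "\r\n"
    + original_body
)

plain = naive_form_handler(original_body)
wrapped = naive_form_handler(wrapped_body)

# The embedded request line and headers are swallowed into the name of
# the first field, so the application no longer sees a "name" field.
print("name" in plain)    # field present in the untouched request
print("name" in wrapped)  # field missing once the body is wrapped
```

The application does not fail loudly: it just sees different form 
fields, which is exactly the kind of silent breakage we want to avoid.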


2/ message/external-body is about referencing an external *body* in an 
HTTP message. We don't have any "body" to reference, we have 
"headers"... Anyway, same problem as above, emphasized by the fact that 
the request is not supposed to contain a body:

   "POST [URI] HTTP/1.1
   User-Agent: [modified User-Agent header]
   Accept: [modified Accept header]
   Content-Type: message/external-body;access-type=local-file;
     name="[path]"
   Content-ID: <[content ID]>
   [Other modified headers]
   [Other original headers]"

3/ multipart/mixed messages could be used to carry both the original 
headers and the request body in one HTTP request, but again, this 
can't be transparent to unaware applications:

   "POST [URI] HTTP/1.1
   User-Agent: [modified User-Agent header]
   Accept: [modified Accept header]
   Content-Type: multipart/mixed;boundary="ct"
   [Other modified headers]
   [Other original headers]

   --ct
   Content-Type: [depends on POST data]

   [POST data]

   --ct
   Content-Type: message/http

   POST [URI] HTTP/1.1
   User-Agent: [original User-Agent header]
   Accept: [original Accept header]
   Content-Type: [depends on POST data]
   [Other original headers]
   --ct--"

A Web server can't infer that [POST data] needs to be passed to the 
underlying application rather than the message/http part.

4/ What about GET requests? We may think we could use 1/ in that case 
(there is no POST data to lose). The HTTP RFC doesn't state that 
request bodies can't be used in GET requests. In practice, this means 
that the behavior of agents regarding GET requests with bodies is 
unpredictable, and that we can't rely on anything.


Using the Warning HTTP header
-----------------------------
Although (as usual?) that's a bit unclear when one reads the HTTP RFC, 
Warnings typically apply more to HTTP responses than to HTTP requests. 
But that seems harmless anyway.

The "214" (Transformation Applied) code is about modifications of the 
message *body* (coding, content-type or other), so we would need another 
code to say "Headers modified" (228 where 28 stands for CT on a 
keypad?). Here again, the procedure to follow to register such a code 
is unclear, but it should not be that big a deal (as compared with 
Cache-Control extensions for instance).

The value of the header is a quoted string, supposedly intended for 
humans but open to whatever we may want to stuff in it, so we could 
go for a:

Warning: 228 [hostname] "{User-Agent: [original one]}, {Accept: [original 
one]}"
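
As a rough sketch only, here is how such a value could be built and 
read back in Python. Everything in it is an assumption for the sake of 
illustration: the 228 code is not registered, the "{Name: value}" 
layout is just the one suggested above, and the function names are 
hypothetical. It also assumes the original header values contain no 
curly braces.

```python
import re

WARN_CODE = 228  # hypothetical "Headers modified" code, not registered

_WARNING_RE = re.compile(r'^(\d{3}) (\S+) "((?:[^"\\]|\\.)*)"$')
_PAIR_RE = re.compile(r'\{([^:{}]+): ([^{}]*)\}')


def build_warning(agent: str, original: dict) -> str:
    """Build the Warning value: code, agent (host), quoted warn-text."""
    pairs = ", ".join(
        f"{{{name}: {value}}}" for name, value in original.items()
    )
    # The warn-text is a quoted-string: escape backslashes and quotes.
    text = pairs.replace("\\", "\\\\").replace('"', '\\"')
    return f'{WARN_CODE} {agent} "{text}"'


def parse_warning(value: str) -> dict:
    """Recover the original headers, or {} if this is not a 228 warning."""
    match = _WARNING_RE.match(value)
    if match is None or int(match.group(1)) != WARN_CODE:
        return {}
    # Undo the quoted-string escaping before reading the pairs.
    text = re.sub(r"\\(.)", r"\1", match.group(3))
    return dict(_PAIR_RE.findall(text))


original_headers = {
    "User-Agent": "Mozilla/5.0 (Phone; U)",
    "Accept": "text/html, */*",
}
warning_value = build_warning("ct-proxy.example", original_headers)
print("Warning: " + warning_value)
print(parse_warning(warning_value))
```

A content provider that knows nothing about this convention simply 
ignores the header, which is the transparency we are after.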

The possibly good thing about using a Warning header is that it looks 
less "official" than using an additional X- HTTP header. It looks more 
like an "informational note" than like a needed one.



My thoughts
-----------
Even though, for the moment, I can't find any answer to the "why not?" 
question other than "because it's dirty", I still don't see why we would 
need to pass on the original headers:

a. our guidelines are along the lines of "do not transform unless...". 
So if the CT-proxy decided to change the HTTP headers, it should have 
good reasons to do so.

b. the recommended content tasting approach, which uses the original 
headers first, ensures that the content provider will be given a chance 
to answer the original request (even though there may be cases where 
the approach is not respected).

c. the use of the "Vary" header in the response may be used to handle 
the case where the CT-proxy actually sent the modified request first. 
Upon receipt of such a header, the CT-proxy should re-try the content 
tasting approach.
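
A minimal Python sketch of point c., under assumptions of my own: 
"such a header" is read here as a Vary header that names one of the 
headers the CT-proxy changed (or "*"), and re-trying the content 
tasting approach is read as re-sending the request with the original 
headers. The fetch callable and the function names are hypothetical 
placeholders for the proxy's real HTTP client.

```python
from typing import Callable, Dict, Set, Tuple

Headers = Dict[str, str]
Response = Tuple[int, Headers, bytes]  # status, headers, body
Fetch = Callable[[Headers], Response]  # hypothetical HTTP client hook


def varies_on(response_headers: Headers, changed: Set[str]) -> bool:
    """True if the Vary header names a changed request header, or '*'."""
    vary = ""
    for name, value in response_headers.items():
        if name.lower() == "vary":
            vary = value
    fields = {f.strip().lower() for f in vary.split(",") if f.strip()}
    return "*" in fields or bool(fields & {c.lower() for c in changed})


def request_with_fallback(
    fetch: Fetch, original: Headers, modified: Headers
) -> Response:
    """Send the modified request first; fall back to the original
    headers if the response varies on something the proxy changed."""
    changed = {k for k in modified if original.get(k) != modified[k]}
    response = fetch(modified)
    if varies_on(response[1], changed):
        response = fetch(original)
    return response
```

With this reading, a server that answers with "Vary: User-Agent" to a 
request whose User-Agent was rewritten gets a second request carrying 
the original User-Agent, so there is no need to embed it in the first.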

d. the communication between the actors should be clear: embedding two 
sets of headers leads to confusion. I tend to prefer sticking to a "keep 
it simple" rule.

If we do stick to passing them on, I guess I would prefer the use of 
the Warning HTTP header to the use of an additional X- HTTP header.


François.
Received on Friday, 21 March 2008 11:37:21 UTC