- From: Francois Daoust <fd@w3.org>
- Date: Fri, 21 Mar 2008 12:36:50 +0100
- To: public-bpwg-ct <public-bpwg-ct@w3.org>
The problem
-----------
We want the CT-proxy to be able to send an HTTP request that defines
modified headers, but also includes the original ones.
Embedding the original headers must not break existing content. In other
words, it must be transparent for content providers that are not aware
of our guidelines.
About the "embedded" possibilities
----------------------------------
I can't think of any way to embed things in an HTTP request using
alternate content-types while still respecting the transparency
requirement mentioned above.
To make things hopefully clear, let me use a pseudo-syntax to describe a
typical POST HTTP request (the syntax may be slightly incorrect, but
that should not be important):
"POST [URI] HTTP/1.1
User-Agent: [modified User-Agent header]
Accept: [modified Accept header]
Content-Type: [depends on POST data]
[Other modified headers]
[Other original headers]
[POST data]"
1/ message/http is about enclosing an HTTP message as the body of
another one. The problem is this body would replace the body of our
request, leading to:
"POST [URI] HTTP/1.1
User-Agent: [modified User-Agent header]
Accept: [modified Accept header]
Content-Type: message/http
[Other modified headers]
[Other original headers]
POST [URI] HTTP/1.1
User-Agent: [original User-Agent header]
Accept: [original Accept header]
Content-Type: [depends on POST data]
[Other original headers]
[POST data]"
Web servers receiving this request will pass it to the underlying
application (ASP, PHP, JSP, servlet, ISAPI extension, whatever), with
the enclosed HTTP message as the POST data. If the application was not
coded with that in mind (why would it be?), [POST data] is lost in the
newly received body.
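As a rough illustration of that failure, here is a minimal Python
sketch. The handler naive_app, the /form URI and the form fields are
all hypothetical; the handler stands in for any application that only
knows about URL-encoded forms and has never heard of our guidelines.

```python
from urllib.parse import parse_qs

# Hypothetical form data that the end user actually submitted.
original_body = "name=alice&lang=fr"

# The same data once the CT-proxy has wrapped the original request
# in a message/http body, as in case 1/ above.
wrapped_body = (
    "POST /form HTTP/1.1\r\n"
    "User-Agent: OriginalBrowser/1.0\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "\r\n"
    + original_body
)


def naive_app(content_type, body):
    """A CT-unaware application: it only understands URL-encoded forms."""
    if content_type != "application/x-www-form-urlencoded":
        return {}
    return parse_qs(body)


# Untouched request: the form fields are found.
print(naive_app("application/x-www-form-urlencoded", original_body))

# Wrapped request: the application sees message/http and finds nothing.
print(naive_app("message/http", wrapped_body))
```

An application that ignores the Content-Type and parses the body anyway
does no better: the request line and headers of the enclosed message end
up glued to the first field name.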
2/ message/external-body is about referencing an external *body* in an
HTTP message. We don't have any "body" to reference, we have
"headers"... Anyway, same problem as above, emphasized by the fact that
the request is not supposed to contain a body:
"POST [URI] HTTP/1.1
User-Agent: [modified User-Agent header]
Accept: [modified Accept header]
Content-Type: message/external-body;access-type=local-file;
name="[path]"
Content-ID: <[content ID]>
[Other modified headers]
[Other original headers]"
3/ multipart/mixed messages could carry both the original headers and
the request body in one HTTP request, but again, this can't be
transparent to unaware applications:
"POST [URI] HTTP/1.1
User-Agent: [modified User-Agent header]
Accept: [modified Accept header]
Content-Type: multipart/mixed;boundary="ct"
[Other modified headers]
[Other original headers]
--ct
Content-Type: [depends on POST data]
[POST data]
--ct
Content-Type: message/http
POST [URI] HTTP/1.1
User-Agent: [original User-Agent header]
Accept: [original Accept header]
Content-Type: [depends on POST data]
[Other original headers]
--ct--"
A Web server can't infer that [POST data] needs to be passed to the
underlying application rather than the message/http part.
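To make the ambiguity concrete, here is a minimal Python sketch that
splits such a body into its parts. It is not a full MIME parser: the
helper split_parts is hypothetical and assumes each part carries a
single Content-Type header, and the sample body and its values are
made up for the example.

```python
def split_parts(body, boundary):
    """Return (content_type, payload) for each part of a multipart body."""
    parts = []
    for chunk in body.split("--" + boundary):
        chunk = chunk.strip("\r\n")
        if chunk in ("", "--"):  # preamble and closing delimiter
            continue
        headers, _, payload = chunk.partition("\r\n\r\n")
        # Assumption: the only part header is Content-Type.
        content_type = headers.split(":", 1)[1].strip()
        parts.append((content_type, payload))
    return parts


# Hypothetical body, following the multipart/mixed layout shown above.
body = (
    "--ct\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "\r\n"
    "name=alice\r\n"
    "--ct\r\n"
    "Content-Type: message/http\r\n"
    "\r\n"
    "POST /form HTTP/1.1\r\n"
    "User-Agent: OriginalBrowser/1.0\r\n"
    "\r\n"
    "--ct--\r\n"
)

for content_type, payload in split_parts(body, "ct"):
    print(content_type, repr(payload))
```

Both parts come out on an equal footing. Nothing in the message itself
says which one is the payload meant for the application, so only a
server that already knows our convention could pick the right one.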
4/ What about GET requests? We may think we could use 1/ there, since
there is no POST data. The HTTP RFC doesn't state that request
bodies can't be used in GET requests. In practice, this means that the
behavior of agents regarding GET requests with bodies is unpredictable,
and that we can't rely on anything.
Using the Warning HTTP header
-----------------------------
Although (as usual?) that's a bit unclear when one reads the HTTP RFC,
Warnings typically apply more to HTTP responses than to HTTP requests.
But that seems harmless anyway.
The "214" (Transformation Applied) code is about modifications of the
message *body* (coding, content-type or other), so we would need another
code to say "Headers modified" (228 where 28 stands for CT on a
keypad?). Here again, the procedure to follow to register such a code
is unclear, but it should not be that big a deal (as compared with
Cache-Control extensions, for instance).
The value of the header is a quoted string, supposedly intended for
humans but open to whatever we may want to stuff in it, so we could
go for:
Warning: 228 [hostname] "{User-Agent: [original one]}, {Accept: [original
one]}"
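For what it's worth, here is a minimal Python sketch of how a CT-proxy
could build such a value and how an aware server could read it back.
Everything in it is an assumption: the 228 code is not registered, the
helpers build_warning and parse_warning are hypothetical, and the
layout assumes each original header sits in its own pair of braces and
that header values contain neither braces nor double quotes.

```python
import re


def build_warning(agent, original_headers):
    """Serialize original headers into a hypothetical 228 Warning value."""
    parts = ", ".join(
        "{%s: %s}" % (name, value) for name, value in original_headers.items()
    )
    return '228 %s "%s"' % (agent, parts)


def parse_warning(value):
    """Recover the original headers from a 228 Warning value, if any."""
    match = re.match(r'(\d{3}) (\S+) "(.*)"$', value)
    if not match or match.group(1) != "228":
        return {}
    # Each "{Name: value}" group becomes one dictionary entry.
    return dict(re.findall(r"\{([^:{}]+): ([^{}]*)\}", match.group(3)))


# Hypothetical original headers and proxy host name.
original = {"User-Agent": "OriginalBrowser/1.0", "Accept": "text/html"}
warning = build_warning("proxy.example.com", original)

print("Warning: " + warning)
print(parse_warning(warning))
```

A server that does not know about the code simply sees one more Warning
header and can ignore it, which is the transparency we are after.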
The possibly good thing about using a Warning header is that it looks
less "official" than an additional X- HTTP header. It looks more like
an "informational note" than a required one.
My thoughts
-----------
Even though I can't find any reply for the moment to the "why not?"
question other than "because it's dirty", I still don't see why we would
need to pass on the original headers:
a. our guidelines are along the lines of "do not transform unless...".
So if the CT-proxy decided to change the HTTP headers, it should have
good reasons to do so.
b. the recommended content tasting approach, which uses the original
headers at first, ensures that the content provider will be given a
chance to answer the original request, even though there may be cases
where the approach is not respected.
c. the use of the "Vary" header in the response may be used to handle
the case where the CT-proxy actually sent the modified request first.
Upon receipt of such a header, the CT-proxy should re-try the content
tasting approach.
d. the communication between the actors should be clear: embedding two
sets of headers leads to confusion. I tend to prefer sticking to a "keep
it simple" rule.
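Point c. can be sketched roughly as follows in Python. This is only an
illustration under assumptions: fetch is a hypothetical callable that
sends a request with the given headers and returns a dictionary with
"headers" and "body" keys, fake_origin is a made-up origin server, and
the header values are invented for the example.

```python
def varies_on(response_headers, altered_names):
    """True if the Vary header lists one of the headers the proxy altered."""
    vary = response_headers.get("Vary", "")
    listed = {name.strip().lower() for name in vary.split(",") if name.strip()}
    return "*" in listed or any(name.lower() in listed for name in altered_names)


def fetch_with_retry(fetch, original, modified):
    """Send the modified request; fall back to the original headers on Vary."""
    altered = [name for name in modified if modified[name] != original.get(name)]
    response = fetch(modified)
    if varies_on(response["headers"], altered):
        # The server adapts to a header we changed: ask again as the
        # real device, so the content provider can answer the original
        # request.
        response = fetch(original)
    return response


def fake_origin(request_headers):
    """Hypothetical origin server that adapts its answer to the User-Agent."""
    mobile = request_headers.get("User-Agent") == "OriginalBrowser/1.0"
    return {
        "headers": {"Vary": "User-Agent"},
        "body": "mobile page" if mobile else "desktop page",
    }


original = {"User-Agent": "OriginalBrowser/1.0", "Accept": "text/html"}
modified = {"User-Agent": "DesktopBrowser/1.0", "Accept": "text/html"}
print(fetch_with_retry(fake_origin, original, modified)["body"])
```

With this flow the server only ever sees one set of headers per
request, and no embedding is needed.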
If we do stick to passing the original headers on, I guess I would
prefer the Warning HTTP header over an additional X- HTTP header.
François.
Received on Friday, 21 March 2008 11:37:21 UTC