- From: Francois Daoust <fd@w3.org>
- Date: Fri, 21 Mar 2008 12:36:50 +0100
- To: public-bpwg-ct <public-bpwg-ct@w3.org>
The problem
-----------
We want the CT-proxy to be able to send an HTTP request that defines modified headers, but also includes the original ones. Embedding the original headers must not break existing content. In other words, it must be transparent for content providers that are not aware of our guidelines.

About the "embedded" possibilities
----------------------------------
I can't think of any way to embed things in an HTTP request using alternate content-types and still respect the required transparency above-mentioned.

To make things hopefully clear, let me use a pseudo-syntax to describe a typical POST HTTP request (the syntax may be slightly incorrect, but that should not be important):

"POST [URI] HTTP/1.1
User-Agent: [modified User-Agent header]
Accept: [modified Accept header]
Content-Type: [depends on POST data]
[Other modified headers]
[Other original headers]

[POST data]"

1/ message/http is about enclosing an HTTP message as the body of another one. The problem is this body would replace the body of our request, leading to:

"POST [URI] HTTP/1.1
User-Agent: [modified User-Agent header]
Accept: [modified Accept header]
Content-Type: message/http
[Other modified headers]
[Other original headers]

POST [URI] HTTP/1.1
User-Agent: [original User-Agent header]
Accept: [original Accept header]
Content-Type: [depends on POST data]
[Other original headers]

[POST data]"

Web servers receiving this request will pass it to the underlying application (ASP, PHP, JSP, servlet, ISAPI extension, whatever), with the enclosed HTTP message as the POST data. If the application was not coded with that in mind (why would it be?), [POST data] is lost in the newly received body.

2/ message/external-body is about referencing an external *body* in an HTTP message. We don't have any "body" to reference, we have "headers"...
Anyway, same problem as above, emphasized by the fact that the request is not supposed to contain a body:

"POST [URI] HTTP/1.1
User-Agent: [modified User-Agent header]
Accept: [modified Accept header]
Content-Type: message/external-body;access-type=local-file; name="[path]"
Content-ID: <[content ID]>
[Other modified headers]
[Other original headers]"

3/ multipart/mixed messages could be used to have both the original headers and the request body as part of one HTTP request, but again, it can't be transparent to unaware applications:

"POST [URI] HTTP/1.1
User-Agent: [modified User-Agent header]
Accept: [modified Accept header]
Content-Type: multipart/mixed;boundary="ct"
[Other modified headers]
[Other original headers]

--ct
Content-Type: [depends on POST data]

[POST data]
--ct
Content-Type: message/http

POST [URI] HTTP/1.1
User-Agent: [original User-Agent header]
Accept: [original Accept header]
Content-Type: [depends on POST data]
[Other original headers]
--ct--"

A Web server can't infer that [POST data] needs to be passed to the underlying application rather than the message/http part.

4/ What about GET requests? We may think we could use 1/ here, since there is no POST data. The HTTP RFC doesn't state that request bodies can't be used in GET requests. In practice, this means that the behavior of agents regarding GET requests with bodies is unpredictable, and that we can't rely on anything.

Using the Warning HTTP header
-----------------------------
Although (as usual?) that's a bit unclear when one reads the HTTP RFC, Warnings typically apply more to HTTP responses than to HTTP requests. But that seems harmless anyway.

The "214" (Transformation Applied) code is about modifications of the message *body* (coding, content-type or other), so we would need another code to say "Headers modified" (228, where 28 stands for CT on a keypad?).
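
For illustration, the Warning value syntax from the HTTP RFC (a 3-digit warn-code, a warn-agent, then a quoted warn-text) can be sketched in Python. This is only a rough sketch: the function name format_warning and the host proxy.example.org are made up for the example, and 228 is the hypothetical code floated above, not a registered one.

```python
def format_warning(code: int, agent: str, text: str) -> str:
    """Build a Warning header value: warn-code SP warn-agent SP quoted warn-text.

    Hypothetical helper for illustration; not part of any library.
    """
    if not 100 <= code <= 999:
        raise ValueError("warn-code must be a 3-digit number")
    # Escape backslashes and double quotes so the text stays a valid quoted-string.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{code} {agent} "{escaped}"'


# 214 is the existing "Transformation applied" code (body modifications).
print("Warning: " + format_warning(214, "proxy.example.org", "Transformation applied"))

# 228 is NOT registered; it is only the code suggested in this message.
print("Warning: " + format_warning(228, "proxy.example.org", "Headers modified"))
```

Since the warn-text is a free-form quoted string, the only care needed is escaping any double quote or backslash that ends up in it.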

Here again, the procedure to follow to register such a code is unclear, but it should not be that big a deal (compared with Cache-Control extensions, for instance). The value of the header is a quoted string, supposedly intended for humans but open to whatever we may want to stuff in it, so we could go for:

Warning: 228 [hostname] "{User-Agent: [original one]}, {Accept: [original one]}"

The maybe good thing about using a Warning header is that it looks less "official" than using an additional X- HTTP header. It looks more like an "informational note" than a needed one.

My thoughts
-----------
Even though I can't find any answer for the moment to the "why not?" question other than "because it's dirty", I still don't see why we would need to pass on the original headers:

a. our guidelines are along the lines of "do not transform unless...". So if the CT-proxy decided to change the HTTP headers, it should have good reasons to do so.

b. the recommended content tasting approach, which uses the original headers at first, ensures that the content provider will be given a chance to answer the original request, even though there may be cases where it's not respected.

c. the "Vary" header in the response may be used to handle the case where the CT-proxy actually sent the modified request first. Upon receipt of such a header, the CT-proxy should re-try the content tasting approach.

d. the communication between the actors should be clear: embedding two sets of headers leads to confusion.

I tend to prefer sticking to a "keep it simple" rule. If we stick to it, I guess I would suggest using the Warning HTTP header rather than an additional X- HTTP header.

François.
Received on Friday, 21 March 2008 11:37:21 UTC