- From: Jeffrey Mogul <mogul@pa.dec.com>
- Date: Fri, 21 Feb 97 18:41:46 PST
- To: "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>
- Cc: http-wg@cuckoo.hpl.hp.com
Roy's messages have been helpful. I certainly understood that browsers are not the only clients involved, but I hadn't really made another important distinction. Roy hints at it, but I don't think he's made it clear (perhaps because he won't agree with me that it exists). If you bear with me through another long message, I think we can actually specify "Accept-Encoding" so that Roy and I are both happy.

I think we can more or less agree on several things:

(1) It's not good if any client tries to interpret the content of a response without realizing that it has been encoded (e.g., a browser rendering an HTML page, or an automated network manager that sets off alarm bells when something seems wrong).

(2) It's also not good if a client wants to interpret (e.g., render) a response, but finds that it has been encoded in a way the client doesn't understand, when the client *would* have been able to understand the identity encoding.

(3) It's also not good if a server fails to send a response to a client because it's not sure the client will be able to use it, when in fact all the client wants to do is make a copy of the server's bits.

Roy seems to grudgingly grant #1 and #2, when he writes:

> If a UA receives a response that includes a Content-Encoding value
> which it is incapable of decoding, then the UA should be smart enough
> to know that it cannot simply render the response. There is no excuse
> for failing to anticipate extensions of an extensible field value.

There may be no excuse, but Henrik says that this happens, and we need to face up to that.

I hadn't really realized the issue for #3, which Roy expresses as:

> For example, I have a save-URL-to-file program called lwpget. It never
> sends Accept-Encoding, because there has never been any requirement
> that it should, and no HTTP/1.0 server ever needed it. Should lwpget
> be prevented from working because of the *possibility* that it might
> be a rendering engine that doesn't understand the encoding?

Roy and I also apparently agree that there is a distinction (which has already been made in the past) between browser clients and non-browser clients (such as lwpget or a mirroring system). But I think that the missing distinction is this one:

    Some clients interpret the bits of the response

    But some clients just copy the bits without interpreting them

An unknown (or unexpected) content-coding is a problem for bit-interpreting clients (such as a browser), but it's not a problem for bit-copying clients (such as a mirror or lwpget).

There's another distinction that we need to make:

    Some resources are "inherently content-coded"; they exist only in a
    form that requires decoding before most useful interpretations

    Some responses are "content-coded in transit"; a server or proxy has
    applied the encoding to a value that is also available as "plaintext"

Example of the first type:

    http://www.ics.uci.edu/pub/ietf/http/rfc1945.ps.gz

Example of the second type:

    http://www.ics.uci.edu/pub/ietf/http/rfc1945.html
    after the server (or some proxy) has passed it through gzip

With these distinctions in mind, I can now state what I believe are useful goals:

(1) A bit-copying client wants to have the server's default representation of a resource, whether this is encoded or not. E.g., if server X is mirroring the contents of server Y, then the result (response body) of retrieving http://X/foo should be the same as the result of retrieving http://Y/foo.

(2) A bit-interpreting client needs to have, ultimately, the unencoded representation of the resource. For example, if my browser retrieves an HTML file, then at some point it has to have a non-compressed version of this file before it can render it.
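To make the distinction concrete, here is a rough Python sketch (the decoder table, the helper names, and the use of urllib are my own illustrative assumptions, not anything the spec requires): a bit-copying client can store the body verbatim and never look at Content-Encoding, while a bit-interpreting client must first reduce the body to the identity encoding, and must refuse to render if it cannot.

    # Illustration only: a bit-copying client vs. a bit-interpreting client.
    # The decoder table and helper names are assumptions for this sketch.
    import gzip
    import urllib.request

    def copy_bits(url, path):
        # A bit-copying client (a mirror, or something like lwpget): it
        # stores the response body verbatim and never needs to care what
        # Content-Encoding says.
        with urllib.request.urlopen(url) as resp, open(path, "wb") as out:
            out.write(resp.read())

    DECODERS = {"identity": lambda body: body, "gzip": gzip.decompress}

    def interpret_bits(url):
        # A bit-interpreting client (a renderer): it must reduce the body
        # to the identity encoding before interpreting it, and must refuse
        # to render if the content-coding is one it does not understand.
        with urllib.request.urlopen(url) as resp:
            coding = resp.headers.get("Content-Encoding", "identity").lower()
            body = resp.read()
        decoder = DECODERS.get(coding)
        if decoder is None:
            raise ValueError("cannot render: unknown content-coding %r" % coding)
        return decoder(body)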
Now, these two goals are not inconsistent with applying encodings (such as compression) at various stages. For example, when a bit-copying client that understands gzip retrieves an HTML resource from a server that understands gzip, we would probably prefer that the bits transmitted over the wire between these two are sent using gzip compression, even if the mirrored result is decompressed before anyone else sees it.

So here's what I think is the right solution:

(1) If there is only one representation available at the server, or if the server's "normal" representation is encoded, then the server should send that representation.

(2) If there are multiple representations, and the client does not specify which one it prefers (i.e., the request does not include "Accept-Encoding"), then the server should send the least-encoded representation available.

(3) If there are multiple representations, and the client specifies (using "Accept-Encoding") that it is willing to accept at least one of these, then the server should send the "best" of these acceptable representations.

(4) If there are multiple representations, and the client specifies (using "Accept-Encoding") a set of encodings that it is willing to accept, but there is no intersection between these sets, then the server should return "None Acceptable".

I think these rules satisfy both Roy's stated requirements and mine. That is, all of the existing clients will continue to get the responses they get today, because they don't send "Accept-Encoding". In particular, mirroring clients work exactly the way Roy wants (by rule #1), and servers that optionally compress responses before sending them won't do this to unsuspecting HTTP/1.0 browsers (by rule #2).
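In rough pseudo-code (a Python sketch; the representation table, the "smallest acceptable representation wins" tie-break for "best", and the use of 406 as a stand-in for "None Acceptable" are illustrative choices, not part of the rules themselves), the four rules amount to:

    # A sketch of rules (1)-(4); data structures and the tie-break for
    # "best" are assumptions made only for this illustration.

    def parse_accept_encoding(header):
        # Crude parse: just the coding names; q-values and "*" are ignored.
        if header is None:
            return None      # the request carried no Accept-Encoding at all
        return {token.split(";")[0].strip().lower()
                for token in header.split(",") if token.strip()}

    def select_representation(available, default_coding, accept_encoding):
        # available:       dict of content-coding -> representation (bytes)
        # default_coding:  coding of the server's "normal" representation
        # accept_encoding: the Accept-Encoding request header, or None
        # Returns (status, coding).
        if len(available) == 1 or default_coding != "identity":
            # Rule 1: only one representation, or the normal representation
            # is itself encoded (an "inherently content-coded" resource).
            return 200, default_coding
        acceptable = parse_accept_encoding(accept_encoding)
        if acceptable is None:
            # Rule 2: no Accept-Encoding; send the least-encoded form.
            return 200, "identity"
        usable = [c for c in available if c in acceptable]
        if usable:
            # Rule 3: a genuine negotiation; send the "best" acceptable
            # representation (smallest, in this sketch).
            return 200, min(usable, key=lambda c: len(available[c]))
        # Rule 4: no intersection between the two sets: "None Acceptable"
        # (406, the registered "Not Acceptable" code, used as a stand-in).
        return 406, None

(A client that guessed wrong and got "None Acceptable" back can simply retry without the Accept-Encoding header; that is the extra round trip I discuss below.)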
However, rule #3 allows HTTP/1.1 clients and servers to agree to use any encoding that they choose, no matter what is listed in the HTTP/1.1 spec. (Presumably, the encoding name should be listed in the IANA registry.) I think this is a codification of what Roy meant when he wrote:

> It is the responsibility of the origin server to prevent [a browser
> rendering garbage] from happening by accident. It is not possible to
> prevent [it] from happening on purpose, because attempting to do so
> breaks my (2).

I'm interpreting the bracketed [it] to mean "sending the server's normal representation of a resource".

Roy might object to my rule #4, based on this:

> HTTP/1.1 browsers will have "Save As..." functionality, and thus it
> isn't possible for an HTTP/1.1 application to exhaustively list all
> accepted content-codings in an Accept-Encoding field for every type of
> GET request it will perform.

If one wants to be as aggressive as possible about using compression (or other encodings) in such cases, there is the potential for needing one extra round trip. That is, the client can either send no Accept-Encoding at all, which (probably) will result in a non-compressed transfer ... or the client can send an Accept-Encoding field that lists a finite set of encodings it can handle, taking a chance that none of these will be available at the server, and so requiring one more round trip for the client to retry the request with no Accept-Encoding header.

But this somewhat begs the question, because what does "Save As" really mean when the server has a choice of encodings? Does the client want to save the decoded contents, or one of the encoded representations? Does this depend on whether the server's default representation is compressed, or on whether the compression was applied in flight? These seem like UA questions, not protocol questions. For example, Netscape 3.0 knows enough to gunzip http://www.ics.uci.edu/pub/ietf/http/rfc1945.ps.gz before invoking a Postscript previewer on it, but "Save As" stores it as a compressed file.

Regarding my scenario with the HTTP/1.0 proxy cache and HTTP/1.0 client, I still think this requires the use of a special status code to prevent accidents (unwitting rendering of garbage). Roy can hope that people will replace their HTTP/1.0 proxies and HTTP/1.0 browsers because there "comes a point when we must recognize the limitations of older technology and move on", but wishing won't make it so. (And I could have argued that Roy's lwpget program, and existing mirror clients, should be upgraded, but I don't think we should be making anything that works today obsolete.)

At any rate, on this topic, Roy writes:

> this particular scenario only occurs if the URL in question has
> negotiated responses based on Accept-Encoding. It is quite reasonable
> for the origin server to modify its negotiation algorithm based on the
> capabilities of the user agent, or even the fact that it was passed
> through a particular cache; I even described that in section 12.1.

I think it would be far simpler (and safer, because it's probably impossible to enumerate the universe of User-Agent values) if the server simply used my proposed 207 status code for "negotiated" encodings. I.e., if the server follows my rule #3, then it sets 207. If the server is following rules #1 or #2, then there hasn't really been a negotiation, and I suppose it makes sense to cache the response. Yes, in a world without any HTTP/1.0 proxy caches one could rely on "Vary: Accept-Encoding", but it's pointless to expect all such caches to disappear any time soon.

By the way, Roy, when you write, re: my proposed 207 (Encoded Content) status code:

> it breaks the distinction between the response status and the payload
> content, which would be extremely depressing for the future evolution
> of HTTP.

I really have no idea what you mean by this. Perhaps you could elaborate?

-Jeff
Received on Friday, 21 February 1997 18:47:24 UTC