- From: Yaron Goland <yarong@microsoft.com>
- Date: Fri, 21 Feb 1997 19:47:29 -0800
- To: 'Jeffrey Mogul' <mogul@pa.dec.com>, "'Roy T. Fielding'" <fielding@kiwi.ICS.UCI.EDU>
- Cc: "'http-wg@cuckoo.hpl.hp.com'" <http-wg@cuckoo.hpl.hp.com>
Is it my imagination or is this issue really boiling down to: Is
identity an implicit or explicit content-encoding?

Yaron

>-----Original Message-----
>From: Jeffrey Mogul [SMTP:mogul@pa.dec.com]
>Sent: Friday, February 21, 1997 6:42 PM
>To: Roy T. Fielding
>Cc: http-wg@cuckoo.hpl.hp.com
>Subject: Re: Content encoding problem...
>
>Roy's messages have been helpful. I certainly understood that
>browsers are not the only clients involved, but I hadn't really
>made another important distinction. Roy hints at it, but I don't
>think he's made it clear (perhaps because he won't agree with me
>that it exists).
>
>If you bear with me through another long message, I think we
>can actually specify "Accept-encoding" so that Roy and I are
>both happy.
>
>I think we can more or less agree on several things:
>
>        (1) It's not good if any client tries to interpret
>        the content of a response without realizing that it
>        has been encoded (e.g., a browser rendering an HTML
>        page, or an automated network manager that sets off
>        alarm bells when something seems wrong).
>
>        (2) It's also not good if a client wants to
>        interpret (e.g., render) a response, but realizes
>        that it has been encoded in a way that the client
>        doesn't understand, when the client *would* have
>        been able to understand the identity encoding.
>
>        (3) It's also not good if a server fails to send
>        a response to a client because it's not sure if
>        the client will be able to use it, and in fact
>        all that the client wants to do is to make a copy
>        of the server's bits.
>
>Roy seems to grudgingly grant #1 and #2, when he writes:
>    If a UA receives a response that includes a Content-Encoding value
>    which it is incapable of decoding, then the UA should be smart
>    enough to know that it cannot simply render the response. There is
>    no excuse for failing to anticipate extensions of an extensible
>    field value.
>There may be no excuse, but Henrik says that this happens, and we
>need to face up to that.
>
>I hadn't really realized the issue for #3, which Roy expresses as:
>
>    For example, I have a save-URL-to-file program called lwpget. It
>    never sends Accept-Encoding, because there has never been any
>    requirement that it should, and no HTTP/1.0 server ever needed it.
>    Should lwpget be prevented from working because of the
>    *possibility* that it might be a rendering engine that doesn't
>    understand the encoding?
>
>Roy and I also apparently agree that there is a distinction (which
>has already been made in the past) between browser clients and
>non-browser clients (such as lwpget or a mirroring system). But
>I think that the missing distinction is this one:
>
>        Some clients interpret the bits of the response
>
>        But some clients just copy the bits without interpreting them
>
>An unknown (or unexpected) content-coding is a problem for
>bit-interpreting clients (such as a browser), but it's not a problem
>for bit-copying clients (such as a mirror or lwpget).
>
>There's another distinction that we need to make:
>
>        Some resources are "inherently content-coded"; they exist
>        only in a form that requires decoding before
>        most useful interpretations
>
>        Some responses are "content-coded in transit"; a server
>        or proxy has applied the encoding to a value
>        that is also available as "plaintext"
>
>Example of the first type:
>        http://www.ics.uci.edu/pub/ietf/http/rfc1945.ps.gz
>
>Example of the second type:
>        http://www.ics.uci.edu/pub/ietf/http/rfc1945.html after the
>        server (or some proxy) has passed it through gzip
>
>With these distinctions in mind, I can now state what I believe
>are useful goals:
>
>        (1) a bit-copying client wants to have the server's default
>        representation of a resource, whether this is encoded or
>        not.
>        E.g., if server X is mirroring the contents of server Y,
>        then the result (response body) of retrieving
>                http://X/foo
>        should be the same as the result of retrieving
>                http://Y/foo
>
>        (2) a bit-interpreting client needs to have, ultimately,
>        the unencoded representation of the resource. For example,
>        if my browser retrieves an HTML file, then at some point
>        it has to have a non-compressed version of this file before
>        it can render it.
>
>Now, these two goals are not inconsistent with applying encodings
>(such as compression) at various stages. For example, when a
>bit-copying client that understands gzip retrieves an HTML resource
>from a server that understands gzip, we would probably prefer
>that the bits transmitted over the wire between these two are sent
>using gzip compression, even if the mirrored result is decompressed
>before anyone else sees it.
>
>So here's what I think is the right solution:
>
>        (1) If there is only one representation available
>        at the server, or if the server's "normal" representation
>        is encoded, then the server should send that representation.
>
>        (2) If there are multiple representations, and the client
>        does not specify which one it prefers (i.e., the request
>        does not include "Accept-Encoding"), then the server should
>        send the least-encoded representation available.
>
>        (3) If there are multiple representations, and the client
>        specifies (using "Accept-Encoding") that it is willing
>        to accept at least one of these, then the server should
>        send the "best" of these acceptable representations.
>
>        (4) If there are multiple representations, and the client
>        specifies (using "Accept-Encoding") a set of encodings
>        that it is willing to accept, but there is no intersection
>        between that set and the encodings the server has
>        available, then the server should return "None Acceptable".
>
>I think these rules satisfy both Roy's stated requirements and
>mine.
>That is, all of the existing clients will continue to
>get the responses they get today, because they don't send
>"Accept-encoding". In particular, mirroring clients work exactly
>the way Roy wants (by rule #1), and servers that optionally
>compress responses before sending them won't do this to unsuspecting
>HTTP/1.0 browsers (by rule #2). However, rule #3 allows HTTP/1.1
>clients and servers to agree to use any encoding that they choose,
>no matter what is listed in the HTTP/1.1 spec. (Presumably, the
>encoding name should be listed in the IANA registry.)
>
>I think this is a codification of what Roy meant when he wrote:
>    It is the responsibility of the origin server to prevent [a browser
>    rendering garbage] from happening by accident. It is not possible
>    to prevent [it] from happening on purpose, because attempting to do
>    so breaks my (2).
>I'm interpreting the bracketed [it] to mean "sending the server's
>normal representation of a resource".
>
>Roy might object to my rule #4, based on this:
>    HTTP/1.1 browsers will have "Save As..." functionality, and thus
>    it isn't possible for an HTTP/1.1 application to exhaustively list
>    all accepted content-codings in an Accept-Encoding field for every
>    type of GET request it will perform.
>
>If one wants to be as aggressive as possible about using compression
>(or other encodings) in such cases, there is the potential for needing
>one extra round trip. That is, the client can either send no
>Accept-encoding at all, which (probably) will result in a
>non-compressed transfer ... or the client can send an Accept-Encoding
>field that lists a finite set of encodings it can handle, taking a
>chance that none of these will be available at the server, and so
>requiring one more round trip for the client to retry the request with
>no Accept-Encoding header.
>
>But this somewhat begs the question, because what does "Save As"
>really mean when the server has a choice of encodings?
>Does the client want to save the decoded contents, or one of the
>encoded representations? Does this depend on whether the server's
>default representation is compressed, or if the compression was
>applied in flight? These seem like UA questions, not protocol
>questions. For example, Netscape 3.0 knows enough to gunzip
>        http://www.ics.uci.edu/pub/ietf/http/rfc1945.ps.gz
>before invoking a PostScript previewer on it, but "Save As" stores
>it as a compressed file.
>
>Regarding my scenario with the HTTP/1.0 proxy cache and HTTP/1.0
>client, I still think this requires the use of a special status
>code to prevent accidents (unwitting rendering of garbage). Roy
>can hope that people will replace their HTTP/1.0 proxies and
>HTTP/1.0 browsers because "comes a point when we must recognize
>the limitations of older technology and move on", but wishing won't
>make it so. (And I could have argued that Roy's lwpget program,
>and existing mirror clients, should be upgraded, but I don't think
>we should be making anything that works today obsolete.)
>
>At any rate, on this topic, Roy writes:
>    this particular scenario only occurs if the URL in question has
>    negotiated responses based on Accept-Encoding. It is quite
>    reasonable for the origin server to modify its negotiation
>    algorithm based on the capabilities of the user agent, or even the
>    fact that it was passed through a particular cache; I even
>    described that in section 12.1.
>
>I think it would be far simpler (and safer, because it's probably
>impossible to enumerate the universe of User-Agent values) if the
>server simply used my proposed 207 status code for "negotiated"
>encodings. I.e., if the server follows my rule #3, then it sets
>207. If the server is following rules #1 or #2, then there hasn't
>really been a negotiation, and I suppose it makes sense to cache
>the response.
>Yes, in a world without any HTTP/1.0 proxy caches
>one could rely on "Vary: Accept-encoding", but it's pointless to
>expect all such caches to disappear any time soon.
>
>By the way, Roy, when you write, re: my proposed 207 (Encoded Content)
>status code,
>    it breaks the distinction between the response status and
>    the payload content, which would be extremely depressing
>    for the future evolution of HTTP.
>I really have no idea what you mean by this. Perhaps you could
>elaborate?
>
>-Jeff
>
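
For readers who want to see Jeff's four selection rules side by side, here is a rough sketch of how a server might apply them. It is only one reading of the quoted message, not code from the thread or from any HTTP implementation. The names `Representation`, `select` and `identity_implicit` are invented for illustration, "best" is assumed to mean the smallest body, q-values are ignored, and the 406 number for "None Acceptable" is an assumption (the message gives only the phrase). The 207 value is the status code Jeff proposes in the message, used here only in that proposed sense.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set, Tuple


@dataclass(frozen=True)
class Representation:
    codings: Tuple[str, ...]  # content-codings applied; () means identity
    size: int                 # body length in bytes (hypothetical ranking key)


def select(
    normal: Representation,
    others: Sequence[Representation],
    accepted: Optional[Set[str]],
    identity_implicit: bool = False,
) -> Tuple[int, Optional[Representation]]:
    """Return (status, representation) following rules #1-#4 as read here.

    `normal` is the server's default representation; `accepted` is the
    parsed Accept-Encoding set, or None when the header is absent.
    """
    reps = [normal, *others]

    # Rule 1: a single representation, or an inherently encoded default,
    # is sent as-is (the mirror / lwpget case).
    if len(reps) == 1 or normal.codings:
        return 200, normal

    # Rule 2: no Accept-Encoding, so send the least-encoded representation.
    if accepted is None:
        return 200, min(reps, key=lambda r: len(r.codings))

    def acceptable(rep: Representation) -> bool:
        if not rep.codings:
            # Assumption: whether identity counts without being listed
            # is left as a switch, since the thread does not settle it.
            return identity_implicit or "identity" in accepted
        return all(c in accepted for c in rep.codings)

    candidates = [r for r in reps if acceptable(r)]

    # Rule 4: nothing the client listed is available.
    if not candidates:
        return 406, None  # "None Acceptable"; status number is assumed

    # Rule 3: send the "best" acceptable one, flagged with the proposed 207.
    return 207, min(candidates, key=lambda r: r.size)
```

The `identity_implicit` switch is where Yaron's question bites: with it off, a request listing only codings the server lacks falls through to rule #4 and costs the extra round trip Jeff describes; with it on, the same request gets the unencoded body.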
Received on Friday, 21 February 1997 19:50:04 UTC