RE: Content encoding problem... from Yaron Goland on 1997-02-22 (ietf-http-wg@w3.org from January to March 1997)

From: Yaron Goland <yarong@microsoft.com>
Date: Fri, 21 Feb 1997 19:47:29 -0800
To: 'Jeffrey Mogul' <mogul@pa.dec.com>, "'Roy T. Fielding'" <fielding@kiwi.ICS.UCI.EDU>
Cc: "'http-wg@cuckoo.hpl.hp.com'" <http-wg@cuckoo.hpl.hp.com>
Message-Id: <c=US%a=_%p=msft%l=RED-44-MSG-970222034729Z-4088@INET-05-IMC.microsoft.com>
Is it my imagination or is this issue really boiling down to: Is
identity an implicit or explicit content-encoding?
		Yaron

>-----Original Message-----
>From:	Jeffrey Mogul [SMTP:mogul@pa.dec.com]
>Sent:	Friday, February 21, 1997 6:42 PM
>To:	Roy T. Fielding
>Cc:	http-wg@cuckoo.hpl.hp.com
>Subject:	Re: Content encoding problem... 
>
>Roy's messages have been helpful.  I certainly understood that
>browsers are not the only clients involved, but I hadn't really
>made another important distinction.  Roy hints at it, but I don't
>think he's made it clear (perhaps because he won't agree with me
>that it exists).
>
>If you bear with me through another long message, I think we
>can actually specify "Accept-encoding" so that Roy and I are
>both happy.
>
>I think we can more or less agree on several things:
>
>	(1) It's not good if any client tries to interpret
>	the content of a response without realizing that it
>	has been encoded (e.g., a browser rendering an HTML
>	page, or an automated network manager that sets off
>	alarm bells when something seems wrong).
>
>	(2) It's also not good if a client wants to
>	interpret (e.g., render) a response, but realizes
>	that it has been encoded in a way that the client
>	doesn't understand, when the client *would* have
>	been able to understand the identity encoding.
>	
>	(3) It's also not good if a server fails to send
>	a response to a client because it's not sure if
>	the client will be able to use it, and in fact
>	all that the client wants to do is to make a copy
>	of the server's bits.
>
>Roy seems to grudgingly grant #1 and #2, when he writes:
>    If a UA receives a response that includes a Content-Encoding value
>    which it is incapable of decoding, then the UA should be smart
>    enough to know that it cannot simply render the response.  There is
>    no excuse for failing to anticipate extensions of an extensible
>    field value.
>There may be no excuse, but Henrik says that this happens, and we
>need to face up to that.
>
>I hadn't really realized the issue for #3, which Roy expresses as:
>
>    For example, I have a save-URL-to-file program called lwpget.  It
>    never sends Accept-Encoding, because there has never been any
>    requirement that it should, and no HTTP/1.0 server ever needed it.
>    Should lwpget be prevented from working because of the
>    *possibility* that it might be a rendering engine that doesn't
>    understand the encoding?
>
>Roy and I also apparently agree that there is a distinction (which
>has already been made in the past) between browser clients and
>non-browser clients (such as lwpget or a mirroring system).  But
>I think that the missing distinction is this one:
>
>	Some clients interpret the bits of the response
>	
>	But some clients just copy the bits without interpreting them
>
>An unknown (or unexpected) content-coding is a problem for
>bit-interpreting clients (such as a browser), but it's not a problem
>for bit-copying clients (such as a mirror or lwpget).
>
>There's another distinction that we need to make:
>
>	Some resources are "inherently content-coded"; they exist
>		only in a form that requires decoding before
>		most useful interpretations
>	
>	Some responses are "content-coded in transit"; a server
>		or proxy has applied the encoding to a value
>		that is also available as "plaintext"
>
>Example of the first type:
>    http://www.ics.uci.edu/pub/ietf/http/rfc1945.ps.gz
>
>Example of the second type:
>    http://www.ics.uci.edu/pub/ietf/http/rfc1945.html after the
>	server (or some proxy) has passed it through gzip
>
>With these distinctions in mind, I can now state what I believe
>are useful goals:
>
>	(1) a bit-copying client wants to have the server's default
>	representation of a resource, whether this is encoded or
>	not.  E.g., if server X is mirroring the contents of server Y,
>	then the result (response body) of retrieving
>		http://X/foo
>	should be the same as the result of retrieving
>		http://Y/foo
>
>	(2) a bit-interpreting client needs to have, ultimately,
>	the unencoded representation of the resource.  For example,
>	if my browser retrieves an HTML file, then at some point
>	it has to have a non-compressed version of this file before
>	it can render it.
>
>Now, these two goals are not inconsistent with applying encodings
>(such as compression) at various stages.  For example, when a
>bit-copying client that understands gzip retrieves an HTML resource
>from a server that understands gzip, we would probably prefer
>that the bits transmitted over the wire between these two are sent
>using gzip compression, even if the mirrored result is decompressed
>before anyone else sees it.
>
>So here's what I think is the right solution:
>
>	(1) If there is only one representation available
>	at the server, or if the server's "normal" representation
>	is encoded, then the server should send that representation.
>
>	(2) If there are multiple representations, and the client
>	does not specify which one it prefers (i.e., the request
>	does not include "Accept-Encoding"), then the server should
>	send the least-encoded representation available.
>
>	(3) If there are multiple representations, and the client
>	specifies (using "Accept-Encoding") that it is willing
>	to accept at least one of these, then the server should
>	send the "best" of these acceptable representations.
>	
>	(4) If there are multiple representations, and the client
>	specifies (using "Accept-Encoding") a set of encodings
>	that it is willing to accept, but there is no intersection
>	between that set and the available representations, then
>	the server should return "None Acceptable".
>	
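[Editorial sketch, not part of the original message.] The four rules
above can be read as a small selection function. The version below is
only an illustration under stated assumptions: the function name, the
NoneAcceptable exception, and the convention that the available codings
are listed from least-encoded to most-encoded are all hypothetical, and
"best" in rule 3 is taken to mean the most-encoded acceptable coding,
which the message does not define.

```python
from typing import Optional, Sequence


class NoneAcceptable(Exception):
    """Rule 4: nothing the client accepts is available."""


def select_representation(
    available: Sequence[str],
    accepted: Optional[Sequence[str]] = None,
) -> str:
    """Pick the content-coding to send under the four proposed rules.

    `available` is assumed to be ordered from least-encoded to most-encoded.
    `accepted` is None when the request carries no Accept-Encoding field.
    """
    if not available:
        raise ValueError("the resource has no representation at all")

    # Rule 1: a single representation (for example an inherently encoded
    # .ps.gz file) is sent as-is, whatever the client asked for.
    if len(available) == 1:
        return available[0]

    # Rule 2: no Accept-Encoding means the least-encoded representation.
    if accepted is None:
        return available[0]

    # Rule 3: send the "best" acceptable representation. Treating the
    # most-encoded acceptable one as best is an assumption of this sketch.
    acceptable = [coding for coding in available if coding in accepted]
    if acceptable:
        return acceptable[-1]

    # Rule 4: the accepted set and the available set do not intersect.
    raise NoneAcceptable(f"none of {list(accepted)} is available")
```

An existing client that sends no Accept-Encoding always lands in rule 1
or rule 2, which is why it keeps getting the response it gets today.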
>I think these rules satisfy both Roy's stated requirements and
>mine.  That is, all of the existing clients will continue to
>get the responses they get today, because they don't send
>"Accept-encoding".  In particular, mirroring clients work exactly
>the way Roy wants (by rule #1), and servers that optionally
>compress responses before sending them won't do this to unsuspecting
>HTTP/1.0 browsers (by rule #2).  However, rule #3 allows HTTP/1.1
>clients and servers to agree to use any encoding that they choose,
>no matter what is listed in the HTTP/1.1 spec.  (Presumably, the
>encoding name should be listed in the IANA registry.)
>
>I think this is a codification of what Roy meant when he wrote:
>    It is the responsibility of the origin server to prevent [a browser
>    rendering garbage] from happening by accident.  It is not possible
>    to prevent [it] from happening on purpose, because attempting to do
>    so breaks my (2).
>I'm interpreting the bracketed [it] to mean "sending the server's
>normal representation of a resource".
>    
>Roy might object to my rule #4, based on this:
>    HTTP/1.1 browsers will have "Save As..." functionality, and thus
>    it isn't possible for an HTTP/1.1 application to exhaustively list
>    all accepted content-codings in an Accept-Encoding field for every
>    type of GET request it will perform.
>
>If one wants to be as aggressive as possible about using compression
>(or other encodings) in such cases, there is the potential for needing
>one extra round trip.  That is, the client can either send no
>Accept-encoding at all, which (probably) will result in a
>non-compressed transfer ... or the client can send an Accept-Encoding
>field that lists a finite set of encodings it can handle, taking a
>chance that none of these will be available at the server, and so
>requiring one more round trip for the client to retry the request with
>no Accept-Encoding header.
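
[Editorial sketch, not part of the original message.] The extra round
trip described above might look like this on the client side. The
`send` callable is a hypothetical stand-in for whatever transport the
client uses, and the use of status 406 to signal "None Acceptable" is
an assumption of this sketch rather than something the message states.

```python
from typing import Callable, Optional, Sequence, Tuple

NONE_ACCEPTABLE = 406  # assumed status code for "None Acceptable"

# Hypothetical transport: takes a URL and an optional Accept-Encoding list
# (None means "send no Accept-Encoding field") and returns (status, body).
Send = Callable[[str, Optional[Sequence[str]]], Tuple[int, bytes]]


def fetch_with_fallback(
    send: Send, url: str, codings: Sequence[str]
) -> Tuple[int, bytes, int]:
    """Return (status, body, round_trips) for an optimistic encoded fetch."""
    # First attempt: list the finite set of codings the client can handle.
    status, body = send(url, codings)
    if status != NONE_ACCEPTABLE:
        return status, body, 1

    # None of them was available, so retry with no Accept-Encoding at all.
    status, body = send(url, None)
    return status, body, 2
```

The second request costs one round trip, but only in the case where the
client's optimistic guess about the server's codings was wrong.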
>
>But this somewhat begs the question, because what does "Save As"
>really mean when the server has a choice of encodings?  Does the
>client want to save the decoded contents, or one of the encoded
>representations?  Does this depend on whether the server's default
>representation is compressed, or if the compression was applied
>in flight?  These seem like UA questions, not protocol questions.
>For example, Netscape 3.0 knows enough to gunzip
>    http://www.ics.uci.edu/pub/ietf/http/rfc1945.ps.gz
>before invoking a Postscript previewer on it, but "Save As" stores
>it as a compressed file.
>
>Regarding my scenario with the HTTP/1.0 proxy cache and HTTP/1.0
>client, I still think this requires the use of a special status
>code to prevent accidents (unwitting rendering of garbage).  Roy
>can hope that people will replace their HTTP/1.0 proxies and
>HTTP/1.0 browsers because "comes a point when we must recognize
>the limitations of older technology and move on", but wishing won't
>make it so.  (And I could have argued that Roy's lwpget program,
>and existing mirror clients, should be upgraded, but I don't think
>we should be making anything that works today obsolete.)
>
>At any rate, on this topic, Roy writes:
>    this particular scenario only occurs if the URL in question has
>    negotiated responses based on Accept-Encoding.  It is quite
>    reasonable for the origin server to modify its negotiation
>    algorithm based on the capabilities of the user agent, or even the
>    fact that it was passed through a particular cache; I even
>    described that in section 12.1.
>
>I think it would be far simpler (and safer, because it's probably
>impossible to enumerate the universe of User-Agent values) if the
>server simply used my proposed 207 status code for "negotiated"
>encodings.  I.e., if the server follows my rule #3, then it sets
>207.  If the server is following rules #1 or #2, then there hasn't
>really been a negotiation, and I suppose it makes sense to cache
>the response.  Yes, in a world without any HTTP/1.0 proxy caches
>one could rely on "Vary: Accept-encoding", but it's pointless to
>expect all such caches to disappear any time soon.
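
[Editorial sketch, not part of the original message.] The status-code
choice proposed above depends only on which of the four rules produced
the response. The function name and constants below are hypothetical,
and 207 is used purely as the value proposed in this thread.

```python
OK = 200
ENCODED_CONTENT = 207  # status code proposed in this thread for negotiated encodings


def status_for_rule(rule: int) -> int:
    """Map the rule that selected the representation to a status code."""
    # Rules 1 and 2: no real negotiation took place, so answer normally.
    if rule in (1, 2):
        return OK
    # Rule 3: the encoding was negotiated via Accept-Encoding.
    if rule == 3:
        return ENCODED_CONTENT
    # Rule 4 is an error response and is outside this sketch.
    raise ValueError(f"no success status for rule {rule}")
```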
>
>By the way, Roy, when you write, re: my proposed 207 (Encoded Content)
>status code,
>	it breaks the distinction between the response status and
>	the payload content, which would be extremely depressing
>	for the future evolution of HTTP.
>I really have no idea what you mean by this.  Perhaps you could
>elaborate?
>
>-Jeff
>
Received on Friday, 21 February 1997 19:50:04 UTC