- From: Roy T. Fielding <fielding@gbiv.com>
- Date: Thu, 14 Feb 2008 19:01:57 -0800
- To: "Frank Ellermann" <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>
- Cc: ietf-http-wg@w3.org
It looks like 1/2 of your response is about small changes to the text that is being deleted, and another 1/4 about the bits left after the last change, and only the last 1/4 about my proposed rewrite. That is really confusing. I'm just going to skip ahead to the comments on my proposed text...

> On Feb 14, 2008, at 9:18 AM, Frank Ellermann wrote:
>> Roy T. Fielding wrote:
>>
> ACK. Potential issues in your version:
>
>> : When a media type is registered with a default charset value
>> : of "US-ASCII", it MAY be used to label data transmitted via
>> : HTTP in the "iso-8859-1" charset (a superset of US-ASCII)
>> : without including an explicit charset parameter on the media
>> : type.
>
> For 2616bis that should be no valid option (MAY), it should be
> a *violation* of a new SHOULD for the stated historical reason.
> Going from MAY to SHOULD NOT is possible, nothing breaks.

That would change the protocol such that all currently compliant HTTP senders that transmit text messages in "iso-8859-1" without a charset parameter would be violating a SHOULD requirement. My proposal states the fact that such messages do occur in practice and alters the MIME requirement for HTTP to accommodate them.

>> : In addition, when a media type registered with a default
>> : charset value of "US-ASCII" is received via HTTP without a
>> : charset parameter or with a charset value of "iso-8859-1",
>> : the recipient MAY inspect the data for indications of a
>> : different character encoding
> [...]
>
> That is convoluted. Certainly it "MAY" try to determine the
> charset by sniffing if there is no charset, arguably it "must"
> (lower case) do this for the (non-HTTP) purpose of displaying
> a document. And it "MAY" do this whenever it wishes, the case
> of an erroneous iso-8859-1 IMO does not justify a HTTP "MAY".

If browsers are willing to implement that, fine.

>> : if the encoding can be determined within the first 16 octets
>> : of data and interpreted consistently thereafter.
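
The proposed recipient rule, read together with the first variance, amounts to a small decision procedure. The Python sketch below is one hypothetical reading of it, not text from the draft. It assumes an illustrative subset of US-ASCII-default types (text/plain per RFC 2046, text/xml per RFC 3023), and it assumes that "indications of a different character encoding" means a byte-order mark, which always fits inside the 16-octet window. The "interpreted consistently thereafter" condition is approximated by decoding the whole body with the detected codec. The names `effective_charset` and `US_ASCII_DEFAULT_TYPES` are made up for this example.

```python
import codecs
from email.message import Message

# Hypothetical, illustrative subset of media types registered with a
# default charset of US-ASCII (text/plain: RFC 2046, text/xml: RFC 3023).
US_ASCII_DEFAULT_TYPES = {"text/plain", "text/xml"}

# Assumption: the only "indication" of another encoding is a byte-order mark.
# UTF-32 marks come first because BOM_UTF32_LE begins with BOM_UTF16_LE.
BOM_TABLE = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

SNIFF_WINDOW = 16  # octets, the figure in the proposed text


def parse_content_type(value):
    """Return (lower-cased media type, lower-cased charset or None)."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()


def sniff_prefix(body):
    """Look for a byte-order mark within the first SNIFF_WINDOW octets."""
    head = body[:SNIFF_WINDOW]
    for bom, codec in BOM_TABLE:
        if head.startswith(bom):
            return codec
    return None


def effective_charset(content_type, body):
    media_type, charset = parse_content_type(content_type)
    if media_type not in US_ASCII_DEFAULT_TYPES:
        return charset  # the proposed rule does not cover other types
    if charset not in (None, "iso-8859-1"):  # exact match, aliases not normalised
        return charset  # an explicit, specific label is honoured as-is
    detected = sniff_prefix(body)
    if detected is not None:
        try:
            body.decode(detected)  # must decode consistently thereafter
            return detected
        except UnicodeDecodeError:
            pass  # inconsistent data: do not switch encodings
    # First variance: unlabeled data may be iso-8859-1, a superset of US-ASCII.
    return "iso-8859-1"
```

Restricting the evidence to byte-order marks is a deliberate choice of this sketch, since a mark can be confirmed or refuted by decoding; it is not a claim about which heuristics the proposal would allow.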
>
> Please no arbitrary magic numbers like "16" in a standard, let
> alone in a standard where the complete "sniffing" business is
> off topic.

It is more important that it works (or that we find out it doesn't).

>> : Note: The first variance is due to a significant portion of
>> : early HTTP user agents not parsing media type parameters and
>> : instead relying on a then-common default encoding of iso-8859-1.
>> : As a result, early server implementations avoided the use of
>> : charset parameters and user agents evolved to "sniff" for new
>> : character encodings as the Web expanded beyond iso-8859-1
>> : content.
>
> Yes, and (as you noted in another article) servers have no time
> for any sniffing on their side for dynamical content. But that
> does not justify a "variance" going as far as an option (MAY),
> violating a SHOULD NOT is good enough for this historical case.

Sorry, that decision was made in 1994 and is now way out of scope.

> I don't see why 2616bis should try to overrule text/xml defaults
> with a MAY, as HTTP certainly does not try to tell clients what
> a say image/x-icon might be, and how to display it.

Then you don't know (or don't care) what the MIME specs say. I do. It was an intentional decision based on the needs of different protocols. The other alternative would be to define a separate media type registration system, which was considered more harmful than simply stating the differences and noting the requirements for translating an HTTP-compliant message to a MIME-compliant message.

>> : The second variance is due to a certain popular user agent that
>> : employed an unsafe encoding detection and switching algorithm
>> : within documents that might contain user-provided data (see
>> : Section security.sniffing), the most common workaround for
>> : which is to supply a specific charset parameter even when the
>> : actual character encoding is unknown.
>
> No. Plausible reasons why servers might intentionally lie with
> "iso-8859-1" do not belong in an Internet standard. If an UA is
> broken it needs to be fixed. Servers could also try their luck
> with the registered "unknown-8bit" instead of lying, this is out
> of scope for HTTP.

Then get back to us when you have fixed that user agent. Sending any charset that is invalid/unknown to that user agent will fail to trigger the one safe path that allows us to work around its stupid bugs.

All I am trying to explain in that paragraph is why the theory of "servers should just leave the charset empty" is never going to happen in the foreseeable future. Otherwise, I am happy to mark the issue as WONTFIX and let the browsers deal with their own bugs.

....Roy
Received on Friday, 15 February 2008 03:02:06 UTC