- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Sun, 20 Jan 2008 12:28:41 +0100
- To: 'HTTP Working Group' <ietf-http-wg@w3.org>
Larry Masinter wrote:
> "If we couldn't fix it then, why do you imagine you can fix it now?"

Depends on the definition of "fixing" :-) Given almost 10 additional years of experience, and observing what software actually does today, we really have sufficient reason to improve what the spec says.

> I'm arguing for documenting current practice, making some recommendations
> for safe behavior, and moving on.

+1

> We certainly *wanted* to change the default charset for HTTP when working on
> 2616 but couldn't find a way around the impasse between backward
> compatibility, client sniffing, server misconfiguration et al.

So let's assume we remove <http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p3-payload-01.html#rfc.section.2.3.1.p.4>:

   "The "charset" parameter is used with some media types to define the character set (Section 2.1) of the data. When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP. Data in character sets other than "ISO-8859-1" or its subsets MUST be labeled with an appropriate charset value. See Section 2.1.1 for compatibility problems."

Would that break anything in practice today?

> I think the main thing to do is to document the actual situation
> sufficiently such that new HTTP implementations don't break things:
>
> 1) servers (senders): don't make up a charset if you don't know what it is
> (this is a good rule for any kind of descriptive information, isn't it?).

Right.

> 2) clients (receivers): servers (senders) are unfortunately often
> misconfigured and will label things with the wrong charset. (This is often
> because lots of software uses 'mime type' when what's wanted is usually
> 'content type' and the parameters get lost). But guessing blindly and
> ignoring what the server sent seems like a bad idea, and even has security
> implications. So "beware".

Right; maybe also point to <http://www.w3.org/2001/tag/doc/mime-respect>.

> 3) everybody: even if you agree about charset, accept other end-of-line
> terminations (not just CRLF which MIME required.)

<http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p3-payload-01.html#rfc.section.2.3.1.p.2> currently says:

   "When in canonical form, media subtypes of the "text" type use CRLF as the text line break. HTTP relaxes this requirement and allows the transport of text media with plain CR or LF alone representing a line break when it is done consistently for an entire entity-body. HTTP applications MUST accept CRLF, bare CR, and bare LF as being representative of a line break in text media received via HTTP. In addition, if the text is represented in a character set that does not use octets 13 and 10 for CR and LF respectively, as is the case for some multi-byte character sets, HTTP allows the use of whatever octet sequences are defined by that character set to represent the equivalent of CR and LF for line breaks. This flexibility regarding line breaks applies only to text media in the entity-body; a bare CR or LF MUST NOT be substituted for CRLF within any of the HTTP control structures (such as header fields and multipart boundaries)."

Does this need fixing?

> (It's really receiver & sender, not client & server, since the rules should
> apply for file upload as well as anything else.)

Correct.

So my proposal would be:

- drop paragraph 4 (ISO-8859-1),
- add a note covering Larry's points 1) and 2), and
- mention this is a normative change in <http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p3-payload-01.html#changes.from.rfc.2616>.

BR, Julian
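
For illustration only, here is a rough sketch of what a receiver following the points above might look like in Python. It is not taken from the draft: the function names `declared_charset` and `split_text_lines` are made up for this example. It assumes the receiver reports a missing charset as "unknown" instead of silently substituting ISO-8859-1, and that line breaks are split after the body has been decoded to text (which also sidesteps character sets that do not use octets 13 and 10 for CR and LF).

```python
import re
from email.message import Message


def declared_charset(content_type):
    """Return the charset the sender labelled, or None if none was given.

    Hypothetical helper: it deliberately does NOT fall back to ISO-8859-1,
    leaving the decision about an unlabelled body to the caller.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_param("charset")  # None when the parameter is absent


# CRLF must be tried first so it is treated as one break, not two.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def split_text_lines(text):
    """Split decoded text on CRLF, bare CR, or bare LF (hypothetical helper)."""
    return _LINE_BREAK.split(text)
```

A caller would check `declared_charset(...)` first and only then decide how to treat an unlabelled body; the sketch takes no position on what that decision should be.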
Received on Sunday, 20 January 2008 11:29:04 UTC