- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Tue, 12 Feb 2008 18:23:57 +0100
- To: ietf-http-wg@w3.org
Julian Reschke wrote:

>| When in canonical form, media subtypes of the "text" type use
>| CRLF as the text line break. HTTP relaxes this requirement and
>| allows the transport of text media with plain CR or LF alone
>| representing a line break when it is done consistently for an
>| entire entity-body.

I'm not sure about this, it was found to be strange enough for a
dishonourable note in the future net-utf8 RFC. I think what is
really going on is something like this:

| HTTP does not depend on this canonical lineend in "text" types,
| and therefore does not require it in the content.

>| HTTP applications MUST accept CRLF, bare CR, and bare LF as being
>| representative of a line break in text media received via HTTP.
>| In addition, if the text is represented in a character set that
>| does not use octets 13 and 10 for CR and LF respectively, as is
>| the case for some multi-byte character sets, HTTP allows the use
>| of whatever octet sequences are defined by that character set to
>| represent the equivalent of CR and LF for line breaks.

I think that's beside the point. AFAIK XML permits U+0085 NEL, and
text/xml exists (but maybe I confused XML 1.1 with XML 1.0 here).
Whether the charset uses octets 0D and 0A for U+000D and U+000A
does not necessarily affect octet 85 used as U+0085 in some legacy
charsets.

HTTP does not really "allow" whatever represents a line break, it
simply does not "care" (within bodies or chunks). How applications
interpret content is their business. As far as HTTP is concerned,
applications cannot trust that text/* comes with a canonical CRLF.

What really matters for HTTP is the header (and anything else not
belonging to the content). And *there* CRLF is of course REQUIRED.

> HTTP/1.1 recipients MUST respect the charset label provided by
> the sender

Please justify this MUST strictly following RFC 2119, or replace
it by a SHOULD. Many HTTP servers (even including IANA and W3C)
get some content types, and their charsets where applicable, wrong.
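
To make the two points above concrete, here is a minimal sketch of
what a recipient application might do with a text/* body. It is an
illustration only, not anything the draft specifies: the function
names, the choice to treat U+0085 NEL as a line break, and the
windows-1252 fallback are all assumptions. The decoder tries the
sender's charset label first (SHOULD-like behaviour) and falls back
only if the label is unknown or the octets do not fit it.

```python
import re
from typing import List, Optional, Tuple

# Line breaks this hypothetical application recognises in decoded text:
# CRLF, bare CR, bare LF, and U+0085 NEL. This is an application-level
# choice; HTTP itself does not interpret the body.
_LINE_BREAK = re.compile("\r\n|\r|\n|\u0085")


def split_lines(text: str) -> List[str]:
    """Split decoded text on any of the recognised line breaks."""
    return _LINE_BREAK.split(text)


def decode_body(
    body: bytes,
    declared_charset: Optional[str],
    fallback: str = "windows-1252",  # assumed fallback, not mandated anywhere
) -> Tuple[str, str]:
    """Return (text, charset actually used).

    The declared charset is tried first; the fallback is used only if
    the label is missing, unknown, or the octets are invalid for it.
    """
    if declared_charset:
        try:
            return body.decode(declared_charset), declared_charset
        except (LookupError, UnicodeDecodeError):
            pass  # the label was wrong or unusable
    return body.decode(fallback, errors="replace"), fallback
```

For example, decode_body(b"caf\xe9", "utf-8") cannot honour the
label, because the octets are not valid UTF-8, so it returns the
text decoded with the fallback together with the fallback's name.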

> those user agents that have a provision to "guess" a charset
> MUST use the charset from the content-type field

Please also justify this MUST using RFC 2119 terms. Let's face it
as it is: many HTTP servers are liars in practice, and do not
deserve too much respect.

 Frank
Received on Tuesday, 12 February 2008 17:22:51 UTC