- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Tue, 25 Mar 2008 19:41:24 +0100
- To: ietf-http-wg@w3.org
Simon Perreault wrote:

> I investigated and came to the conclusion that MIME doesn't
> specify that text/* has a default character set of ASCII.
> I may very well be wrong, and this email is also a way to
> ask you for clarifications.

RFC 2045 chapter 5:

| For example, the "charset" parameter is applicable to any
| subtype of "text"

RFC 2045 chapter 5.2:

| Content-type: text/plain; charset=us-ascii
|
| This default is assumed if no Content-Type header field is
| specified.

RFC 2046 chapter 4.1.2:

| Note that the character set used, if anything other than
| US-ASCII, must always be explicitly specified in the
| Content-Type field.

There are additional remarks about treating unknown text/*
subtypes like text/plain, and for text/plain the default
US-ASCII is very clear (various places including RFC 2049,
minimal MIME conformance).

You can argue that text/html is not "unknown" for a typical
HTTP application.

*Historic* info: RFC 2070, chapter 2.1:

| HTML, as an application of SGML, does not directly address
| the question of the external character encoding. This is
| deferred to mechanisms external to HTML, such as MIME as
| used by the HTTP protocol or by electronic mail.
[...]
| Similarly, if HTML documents are transferred by electronic
| mail, the external character encoding is defined by the
| "charset" parameter of the "Content-Type" MIME header field
| [RFC2045], and defaults to US-ASCII in its absence.

It's fun to see how RFC 2070 avoids mentioning the Latin-1
default in RFC 2068 :-)

> It is also mentioned that other text/* types may not even
> have the charset parameter. So a default value would be
> meaningless.

Yes, I'm too lazy to dig in the IANA registry, are you aware
of a text/* subtype with this property?

> What I understand from this section is that there is no
> default character set for text/*.

Not the only possible interpretation, but without doubt the
dubious text/* MIME default is *NOT* Latin-1, for text/html
in RFC 2070 it is also *NOT* Latin-1.

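A minimal Python sketch of the two defaults discussed in this
thread: US-ASCII for text/* under MIME (RFC 2045) and ISO-8859-1
for text/* under HTTP (RFC 2068). The function name
`effective_charset` and the "mime"/"http" transport labels are
hypothetical, chosen only for illustration; the sketch ignores
the question of text/* subtypes that define no charset parameter.

```python
from email.message import Message

MIME_DEFAULT = "us-ascii"    # RFC 2045 chapter 5.2
HTTP_DEFAULT = "iso-8859-1"  # RFC 2068 chapter 3.7.1


def effective_charset(content_type, transport):
    """Return the charset to assume for a body with this Content-Type.

    `transport` is a hypothetical label, either "mime" or "http".
    Returns None for non-text types without an explicit charset.
    """
    msg = Message()
    msg["Content-Type"] = content_type

    # An explicit charset parameter always wins (returned lowercased).
    explicit = msg.get_content_charset()
    if explicit is not None:
        return explicit

    # The defaults under discussion apply to text/* only.
    if msg.get_content_maintype() != "text":
        return None

    return HTTP_DEFAULT if transport == "http" else MIME_DEFAULT


print(effective_charset("text/html", "mime"))
print(effective_charset("text/html", "http"))
```

The same text/html header without a charset parameter resolves
differently depending on the transport, which is the reason an
explicit charset is the safer choice.
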
RFC 2854 is clearer:

RFC 2854 chapter 6 (about text/html):

| The use of an explicit charset parameter is strongly
| recommended. While [MIME] specifies "The default character
| set, which must be assumed in the absence of a charset
| parameter, is US-ASCII." [HTTP] Section 3.7.1, defines
| that "media subtypes of the 'text' type are defined to have
| a default charset value of 'ISO-8859-1'". Section 19.3 of
| [HTTP] gives additional guidelines. Using an explicit
| charset parameter will help avoid confusion.

 Frank

Received on Tuesday, 25 March 2008 18:39:57 UTC