On Jan 22, 2008, at 6:26 PM, Yutaka Oiwa wrote:

> There are number of ways to solve this, and my current preference is
> to add the following restrictions regarding charset auto-detection:
>
> * If charset is declared in the header, it MUST be honored. (current
>   requirement in 2.1.1 may be copied).
>
> * If charset is not declared in the header, clients MAY guess the
>   charset of the payload by any means (e.g. by examining the payload
>   octets, using special attributions defined for content-types, or
>   using the client-defined defaults). However, if the payload is
>   composed solely by octets representing ASCII printable characters and
>   HTML-defined control characters (CR, LF, HT, VT and SP), it MUST be
>   treated as if it is in ASCII or equivalent character sets. If the
>   payload contains other octets, the behavior of clients is
>   implementation-dependent.
>
> By the above specification, the client is disallowed to guess charset
> which is not ASCII upper-compatible (such as UTF-7).

I think it would be easier to simply say that (i.e., "The charset
guessing algorithm MUST exclude 7-bit character encodings other than
US-ASCII. In particular, UTF-7 MUST NOT be guessed.")

....Roy

Received on Wednesday, 23 January 2008 02:54:24 GMT
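
For illustration only, the two proposals above (honor a declared charset, treat pure-ASCII payloads as US-ASCII, and never guess a 7-bit encoding other than US-ASCII) could be sketched roughly as follows. This is not text from any specification. The function name guess_charset, its parameters, the candidate list, the fallback default, and the contents of the exclusion set are all assumptions made up for this sketch, and the exclusion set is not exhaustive.

```python
import codecs

# Octets allowed by the "pure ASCII" rule: printable ASCII (0x20-0x7E,
# which includes SP) plus CR, LF, HT and VT.
ASCII_SAFE = frozenset(range(0x20, 0x7F)) | {0x0D, 0x0A, 0x09, 0x0B}

# Hypothetical, non-exhaustive set of 7-bit encodings that are not
# US-ASCII, keyed by Python's normalized codec names.
SEVEN_BIT_NON_ASCII = {"utf-7", "iso2022_jp", "iso2022_kr", "hz"}


def _normalize(name):
    """Map a charset label to Python's canonical codec name if known."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return name.strip().lower()


def guess_charset(declared, payload, candidates=("utf-8",),
                  default="windows-1252"):
    """Pick a charset for `payload` (bytes).

    `default` is assumed to be an ASCII-compatible client default.
    """
    # 1. A charset declared in the header is always honored.
    if declared:
        return declared
    # 2. A payload made only of ASCII-safe octets is treated as US-ASCII,
    #    so "+ADw-"-style UTF-7 sequences are never interpreted.
    if all(octet in ASCII_SAFE for octet in payload):
        return "us-ascii"
    # 3. Otherwise guessing is implementation-dependent, except that
    #    7-bit encodings other than US-ASCII are skipped.
    for name in candidates:
        if _normalize(name) in SEVEN_BIT_NON_ASCII:
            continue
        try:
            payload.decode(name)
        except (UnicodeDecodeError, LookupError):
            continue
        return name
    return default
```

Under these assumptions, a payload such as b"+ADw-script+AD4-" is reported as us-ascii rather than UTF-7, while an explicitly declared UTF-7 charset is still honored by step 1.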