- From: Roy T. Fielding <fielding@gbiv.com>
- Date: Tue, 22 Jan 2008 18:54:20 -0800
- To: Yutaka Oiwa <y.oiwa@aist.go.jp>
- Cc: Mark Nottingham <mnot@mnot.net>, Julian Reschke <julian.reschke@gmx.de>, "'HTTP Working Group'" <ietf-http-wg@w3.org>
On Jan 22, 2008, at 6:26 PM, Yutaka Oiwa wrote:

> There are number of ways to solve this, and my current preference is
> to add the following restrictions regarding charset auto-detection:
>
> * If charset is declared in the header, it MUST be honored. (current
>   requirement in 2.1.1 may be copied).
>
> * If charset is not declared in the header, clients MAY guess the
>   charset of the payload by any means (e.g. by examining the payload
>   octets, using special attributions defined for content-types, or
>   using the client-defined defaults). However, if the payload is
>   composed solely by octets representing ASCII printable characters and
>   HTML-defined control characters (CR, LF, HT, VT and SP), it MUST be
>   treated as if it is in ASCII or equivalent character sets. If the
>   payload contains other octets, the behavior of clients is
>   implementation-dependent.
>
> By the above specification, the client is disallowed to guess charset
> which is not ASCII upper-compatible (such as UTF-7).

I think it would be easier to simply say that (i.e., "The charset
guessing algorithm MUST exclude 7-bit character encodings other
than US-ASCII. In particular, UTF-7 MUST NOT be guessed.")

....Roy
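
The two rules discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not text from any specification: the names `pick_charset`, `is_plain_ascii`, `EXCLUDED_7BIT` and `ASCII_SAFE` are hypothetical, the `candidates` argument stands in for whatever detector a client already has, and every entry in the exclusion set other than UTF-7 is an assumed example of a "7-bit character encoding other than US-ASCII".

```python
from typing import Iterable, Optional

# Assumption: only UTF-7 is named in the thread; the other labels are
# illustrative examples of 7-bit encodings that are not US-ASCII.
EXCLUDED_7BIT = frozenset({"utf-7", "iso-2022-jp", "iso-2022-kr", "hz-gb-2312"})

# ASCII printable characters (including SP) plus HT, LF, VT and CR.
ASCII_SAFE = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0B, 0x0D}


def is_plain_ascii(payload: bytes) -> bool:
    """True if every octet is printable ASCII or one of the listed controls."""
    return all(octet in ASCII_SAFE for octet in payload)


def pick_charset(declared: Optional[str],
                 payload: bytes,
                 candidates: Iterable[str]) -> Optional[str]:
    """Hypothetical charset selection following the rules in this thread.

    `candidates` is the ordered output of some guessing step (assumed).
    Returns a lowercase charset label, or None if no guess is acceptable.
    """
    # Rule 1: a charset declared in the header is honored as-is.
    if declared:
        return declared.strip().lower()
    # Rule 2: a payload of plain ASCII octets is treated as US-ASCII.
    if is_plain_ascii(payload):
        return "us-ascii"
    # Rule 3: never guess a 7-bit encoding other than US-ASCII.
    for name in candidates:
        label = name.strip().lower()
        if label not in EXCLUDED_7BIT:
            return label
    return None
```

With this sketch, a payload such as `b"+ADw-script+AD4-"` with no declared charset is classified as US-ASCII even if the detector offers UTF-7 first, which is the outcome both formulations aim for.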
Received on Wednesday, 23 January 2008 02:54:24 UTC