Re: security impact of dropping charset default [Re: text/* types and charset defaults [i20]]

On Jan 22, 2008, at 6:26 PM, Yutaka Oiwa wrote:
> There are a number of ways to solve this, and my current preference is
> to add the following restrictions regarding charset auto-detection:
>
>  * If a charset is declared in the header, it MUST be honored (the
>    current requirement in 2.1.1 may be copied).
>
>  * If no charset is declared in the header, clients MAY guess the
>    charset of the payload by any means (e.g. by examining the payload
>    octets, using special attributions defined for content-types, or
>    using client-defined defaults).  However, if the payload is
>    composed solely of octets representing ASCII printable characters
>    and HTML-defined control characters (CR, LF, HT, VT and SP), it MUST
>    be treated as if it were in ASCII or an equivalent character set. If
>    the payload contains other octets, the behavior of clients is
>    implementation-dependent.
>
> Under the above specification, clients are not allowed to guess a
> charset that is not ASCII upper-compatible (such as UTF-7).
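A minimal Python sketch of the proposed client-side rule, assuming the payload is available as raw bytes and that the client's own fallback guesser is passed in as a function. The names `choose_charset`, `is_ascii_only` and the `guess` parameter are hypothetical, not taken from any specification:

```python
from typing import Callable, Optional

# Octets the proposal allows in an "ASCII-only" payload: printable ASCII
# (0x20-0x7E, which includes SP) plus CR, LF, HT and VT.
_ALLOWED = frozenset(range(0x20, 0x7F)) | {0x0D, 0x0A, 0x09, 0x0B}


def is_ascii_only(payload: bytes) -> bool:
    """True if every octet is in the allowed set above."""
    return all(b in _ALLOWED for b in payload)


def choose_charset(declared: Optional[str], payload: bytes,
                   guess: Callable[[bytes], str]) -> str:
    # Rule 1: a charset declared in the header MUST be honored
    # (an empty or missing value is treated as "not declared").
    if declared:
        return declared
    # Rule 2: a payload made only of the allowed octets MUST be treated as ASCII,
    # no matter what the guesser would have said.
    if is_ascii_only(payload):
        return "us-ascii"
    # Otherwise behavior is implementation-dependent: defer to the client's
    # own (hypothetical) guessing function.
    return guess(payload)
```

For example, the UTF-7 string `+ADw-script+AD4-` (which decodes to `<script>`) consists only of printable ASCII octets, so under this rule it is interpreted as ASCII even if the fallback guesser would have picked UTF-7.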

I think it would be easier to simply say that (i.e., "The charset
guessing algorithm MUST exclude 7-bit character encodings other
than US-ASCII.  In particular, UTF-7 MUST NOT be guessed.")
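The simpler rule can be sketched as a filter applied to a guesser's candidate list before it picks a charset. The set of 7-bit encoding names below is an illustrative assumption (a literal reading of "7-bit character encodings other than US-ASCII"), not a list taken from the thread, and `allowed_guesses` is a hypothetical name:

```python
import codecs
from typing import List

# Assumed, non-exhaustive set of 7-bit encodings other than US-ASCII.
# Names are normalized through codecs.lookup so aliases like "UTF-7" match.
_SEVEN_BIT = {codecs.lookup(name).name
              for name in ("utf-7", "iso-2022-jp", "iso-2022-kr", "hz")}


def allowed_guesses(candidates: List[str]) -> List[str]:
    """Drop every excluded 7-bit encoding; in particular UTF-7 is never kept."""
    return [c for c in candidates if codecs.lookup(c).name not in _SEVEN_BIT]
```

Unlike the octet-inspection rule, this restricts only what the guesser may choose and leaves the rest of the guessing algorithm untouched.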

....Roy

Received on Wednesday, 23 January 2008 02:54:24 UTC