Re: security impact of dropping charset default [Re: text/* types and charset defaults [i20]] from Roy T. Fielding on 2008-01-23 (ietf-http-wg@w3.org from January to March 2008)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Tue, 22 Jan 2008 18:54:20 -0800
To: Yutaka Oiwa <y.oiwa@aist.go.jp>
Cc: Mark Nottingham <mnot@mnot.net>, Julian Reschke <julian.reschke@gmx.de>, "'HTTP Working Group'" <ietf-http-wg@w3.org>
Message-Id: <3AC0BA61-6E52-4D80-9298-BAC7D36CFD9D@gbiv.com>

On Jan 22, 2008, at 6:26 PM, Yutaka Oiwa wrote:
> There are a number of ways to solve this, and my current preference is
> to add the following restrictions regarding charset auto-detection:
>
>  * If charset is declared in the header, it MUST be honored. (The current
>    requirement in 2.1.1 may be copied.)
>
>  * If charset is not declared in the header, clients MAY guess the
>    charset of the payload by any means (e.g. by examining the payload
>    octets, using special attributions defined for content-types, or
>    using the client-defined defaults).  However, if the payload is
>    composed solely of octets representing ASCII printable characters
>    and HTML-defined control characters (CR, LF, HT, VT and SP), it
>    MUST be treated as if it were in ASCII or an equivalent character
>    set.  If the payload contains other octets, the behavior of clients
>    is implementation-dependent.
>
> Under the above specification, the client is not allowed to guess a
> charset that is not ASCII upper-compatible (such as UTF-7).
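To make the quoted proposal concrete, here is a minimal Python sketch of its three rules. The names `effective_charset` and `is_ascii_only`, and the caller-supplied `guess` callback standing in for whatever auto-detection a client already performs, are hypothetical and do not come from the thread. The byte set follows the proposal's list (printable ASCII plus CR, LF, HT, VT and SP); treating "printable" as 0x20 through 0x7E is an assumption.

```python
from typing import Callable, Optional

# Octets that, per the proposal, force an ASCII interpretation.
# Assumption: "ASCII printable" = 0x20 (SP) through 0x7E; plus CR, LF, HT, VT.
_ASCII_SAFE = frozenset(range(0x20, 0x7F)) | {0x0D, 0x0A, 0x09, 0x0B}


def is_ascii_only(payload: bytes) -> bool:
    """True if every octet is printable ASCII or one of the listed controls."""
    return all(octet in _ASCII_SAFE for octet in payload)


def effective_charset(
    declared: Optional[str],
    payload: bytes,
    guess: Callable[[bytes], str],  # hypothetical client-side detector
) -> str:
    # Rule 1: a charset declared in the header MUST be honored.
    if declared:
        return declared
    # Rule 2: an ASCII-only payload MUST be treated as ASCII,
    # whatever the detector would have said.
    if is_ascii_only(payload):
        return "us-ascii"
    # Rule 3: any other payload is left to implementation-defined guessing.
    return guess(payload)
```

The second rule is what closes the UTF-7 hole: a payload such as `+ADw-script+AD4-` consists only of printable ASCII octets, yet decodes as UTF-7 to `<script>`. With this sketch, `effective_charset(None, b"+ADw-script+AD4-", lambda p: "utf-7")` yields `"us-ascii"` no matter what the detector returns.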

I think it would be easier to simply say that (i.e., "The charset
guessing algorithm MUST exclude 7-bit character encodings other
than US-ASCII.  In particular, UTF-7 MUST NOT be guessed.")
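The simpler wording can likewise be sketched as a filter applied to a guesser's ranked candidates before one is chosen. The function names and the set of 7-bit encodings below are illustrative assumptions (the set is not exhaustive); Python's standard `codecs.lookup` is used only to normalize labels such as `UTF-7` and `utf7` to a single codec name.

```python
import codecs
from typing import Iterable, Optional

# Illustrative, non-exhaustive set of 7-bit encodings (an assumption,
# not a list from the thread), normalized to Python codec names.
_SEVEN_BIT_LABELS = ("utf-7", "iso-2022-jp", "iso-2022-kr")
SEVEN_BIT_CODECS = frozenset(codecs.lookup(label).name for label in _SEVEN_BIT_LABELS)
US_ASCII = codecs.lookup("us-ascii").name


def may_be_guessed(charset: str) -> bool:
    """Apply the rule: no 7-bit encoding other than US-ASCII may be guessed."""
    try:
        name = codecs.lookup(charset).name
    except LookupError:
        return False  # unknown label: never guess it
    if name == US_ASCII:
        return True
    return name not in SEVEN_BIT_CODECS


def pick_guess(ranked: Iterable[str]) -> Optional[str]:
    """Return the highest-ranked candidate the rule allows, or None."""
    for charset in ranked:
        if may_be_guessed(charset):
            return charset
    return None
```

For example, `pick_guess(["UTF-7", "windows-1252"])` skips UTF-7 and returns `"windows-1252"`. Because the set also contains ISO-2022-JP, this sketch will never guess that encoding either; only a declared charset can select it.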

....Roy

Received on Wednesday, 23 January 2008 02:54:24 UTC