Re: Two new encoding related articles for review from Zack Weinberg on 2014-03-15 (www-international@w3.org from January to March 2014)

From: Zack Weinberg <zackw@panix.com>
Date: Fri, 14 Mar 2014 23:15:36 -0400
To: Glenn Adams <glenn@skynav.com>
Cc: Richard Ishida <ishida@w3.org>, www International <www-international@w3.org>, W3C Style <www-style@w3.org>, "HTML WG (public-html@w3.org)" <public-html@w3.org>
Message-ID: <CAKCAbMjR6TJZFghxFCEcWHfwi7ZGDjc48SD5bRHi7cW21tNd3Q@mail.gmail.com>

On Fri, Mar 14, 2014 at 10:42 PM, Glenn Adams <glenn@skynav.com> wrote:
> On Fri, Mar 14, 2014 at 10:13 AM, Zack Weinberg <zackw@panix.com> wrote:
...
>> Furthermore, UTF-32, UTF-16, JIS_C6226-1983, JIS_X0212-1990,
>> HZ-GB-2312, JOHAB (Windows code page 1361), CESU-8, UTF-7, BOCU-1,
>> SCSU, ISO-2022 (all varieties), and EBCDIC (all varieties) MUST NOT be
>> used.  These encodings are *ASCII-incompatible* -- that is, in these
>> encodings, octets with values 00 through 7F (hexadecimal) are not
>> always interpreted as Unicode code points U+0000 through U+007F.  This
>> has historically been a source of security vulnerabilities.
>
> It seems strange for a guideline to say "MUST NOT". I would suggest SHOULD
> NOT is more appropriate. In any case, we shouldn't be in the business of
> telling content authors what they can or can't do. If they want to use an
> encoding that isn't well supported, then the risk is theirs.

You can tell I'm used to writing normative specs, huh?  How's this instead?

"UTF-32, UTF-16, (etcetera) are especially unlikely to work: HTML5 and
the Encoding Standard forbid Web clients from accepting most of them.
(These encodings are *ASCII-incompatible* -- octets with values 00
through 7F (hexadecimal) do not always encode U+0000 through U+007F --
which has historically been a source of security vulnerabilities.)"
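
To make the ASCII-incompatibility point concrete, here is a rough
Python sketch. It is only an illustration: the helper name
decodes_like_ascii and the sample payload are mine, and it relies on
Python's built-in codecs, which do not necessarily behave the same way
as the decoders defined by the Encoding Standard. It checks whether a
run of octets, all in the range 00 through 7F, means the same thing
under a given encoding as it does under ASCII.

```python
def decodes_like_ascii(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes to the same text under `encoding`
    as it does under ASCII.

    Hypothetical helper for illustration only. Returns False if either
    decode fails (for example, if `data` contains octets above 0x7F).
    """
    try:
        return data.decode(encoding) == data.decode("ascii")
    except UnicodeDecodeError:
        return False


# Every octet here is below 0x80, so a filter that only looks for the
# literal octets of "<script>" would see nothing suspicious.
PAYLOAD = b"+ADw-script+AD4-"

if __name__ == "__main__":
    for name in ("utf-8", "windows-1252", "utf-7"):
        print(name, decodes_like_ascii(PAYLOAD, name), repr(PAYLOAD.decode(name)))

    # UTF-16 and EBCDIC also reinterpret octets in the 00-7F range.
    print("utf-16-le", decodes_like_ascii(b"<\x00", "utf-16-le"))
    print("cp037", decodes_like_ascii(b"hello", "cp037"))
```

Under UTF-8 and windows-1252 the payload stays harmless text, but a
client that agrees to decode it as UTF-7 gets "<script>". That kind of
mismatch between what a filter saw and what the client parsed is the
sort of vulnerability the paragraph above alludes to.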

Received on Saturday, 15 March 2014 03:16:02 UTC