- From: Joshua Cranmer <Pidgeot18@verizon.net>
- Date: Sun, 16 Mar 2014 17:48:11 -0500
- To: www-style@w3.org
On 3/14/2014 10:15 PM, Zack Weinberg wrote:
> On Fri, Mar 14, 2014 at 10:42 PM, Glenn Adams <glenn@skynav.com> wrote:
>> On Fri, Mar 14, 2014 at 10:13 AM, Zack Weinberg <zackw@panix.com> wrote:
> ...
>>> Furthermore, UTF-32, UTF-16, JIS_C6226-1983, JIS_X0212-1990,
>>> HZ-GB-2312, JOHAB (Windows code page 1361), CESU-8, UTF-7, BOCU-1,
>>> SCSU, ISO-2022 (all varieties), and EBCDIC (all varieties) MUST NOT be
>>> used. These encodings are *ASCII-incompatible* -- that is, in these
>>> encodings, octets with values 00 through 7F (hexadecimal) are not
>>> always interpreted as Unicode code points U+0000 through U+007F. This
>>> has historically been a source of security vulnerabilities.
>> It seems strange for a guideline to say "MUST NOT". I would suggest SHOULD
>> NOT is more appropriate. In any case, we shouldn't be in the business of
>> telling content authors what they can or can't do. If they want to use an
>> encoding that isn't well supported, then the risk is theirs.
> You can tell I'm used to writing normative specs, huh? How's this instead?
>
> "UTF-32, UTF-16, (etcetera) are especially unlikely to work: HTML5 and
> the Encoding Standard forbid Web clients from accepting most of them.
> (These encodings are *ASCII-incompatible* -- octets with values 00
> through 7F (hexadecimal) do not always encode U+0000 through U+007F --
> which has historically been a source of security vulnerabilities.)"
>

Strictly speaking, that's not completely true. UTF-16, HZ-GB-2312, and ISO-2022-JP are all permitted by the encoding standard. CESU-8, UTF-7, BOCU-1, and SCSU are explicitly prohibited by HTML5 (although email clients need to support UTF-7, unfortunately). EBCDIC and UTF-32 are "especially discouraged" (to the point that HTML5 doesn't attempt to support them, unlike UTF-16, which it supports via BOM detection). The JIS_* and JOHAB standards are mentioned by neither the encoding standard nor HTML5.
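
For anyone who wants to see what "ASCII-incompatible" means in practice, here is a minimal Python sketch. It assumes Python's built-in codecs (`utf-7`, `utf-16-le`, `cp037` as one EBCDIC variety) are reasonable stand-ins for the encodings named above. The helper `is_ascii_compatible` and the `SAMPLE` bytes are my own illustration, not anything defined by HTML5 or the encoding standard. The per-octet check is only a rough heuristic: it looks at each octet in isolation, so it says nothing reliable about stateful, escape-driven encodings such as the ISO-2022 family.

```python
# Hypothetical helper for illustration; not defined by HTML5 or the encoding standard.
def is_ascii_compatible(codec: str) -> bool:
    """Rough check: does each single octet 00-7F decode to the same code point?"""
    for octet in range(0x80):
        try:
            decoded = bytes([octet]).decode(codec)
        except UnicodeDecodeError:
            # The octet is not a complete character on its own in this codec.
            return False
        if decoded != chr(octet):
            return False
    return True


# Every octet below is in the range 00-7F, so an ASCII-only filter
# sees nothing that looks like markup.
SAMPLE = b"+ADw-script+AD4-"

if __name__ == "__main__":
    print(SAMPLE.decode("ascii"))   # the octets read as plain ASCII text
    print(SAMPLE.decode("utf-7"))   # the same octets read as UTF-7: "<script>"
    for name in ("utf-8", "latin-1", "utf-7", "utf-16-le", "cp037"):
        print(name, is_ascii_compatible(name))
```

The `SAMPLE` case is the kind of thing behind the "security vulnerabilities" remark: a filter that inspects the octets as ASCII and a consumer that decodes them as UTF-7 disagree about whether a `<` is present.
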
ISO-2022-CN and ISO-2022-KR are mapped to the replacement encoding and so are effectively banned by the encoding standard (AIUI).

--
Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald E. Knuth
Received on Sunday, 16 March 2014 22:49:03 UTC