- From: Andrew Cunningham <acunningham@slv.vic.gov.au>
- Date: Sat, 15 Mar 2014 16:17:54 +1100
- To: Glenn Adams <glenn@skynav.com>
- Cc: "HTML WG (public-html@w3.org)" <public-html@w3.org>, W3C Style <www-style@w3.org>, Richard Ishida <ishida@w3.org>, www International <www-international@w3.org>, Zack Weinberg <zackw@panix.com>
- Message-ID: <CAOUP6Kk1AMXcig5BYwS1y-sNOWSHxXF7=DG8tEZQwxjL2RKgUQ@mail.gmail.com>
There are times when ASCII-incompatible legacy encodings are the only workable choice.

Andrew

On 15/03/2014 1:42 PM, "Glenn Adams" <glenn@skynav.com> wrote:
>
>
>
> On Fri, Mar 14, 2014 at 10:13 AM, Zack Weinberg <zackw@panix.com> wrote:
>
>> I'd like to suggest that the "Avoid these encodings" section at the
>> bottom of the "Choosing and applying a character set" document should
>> be merged into the "Choosing an encoding" section at the top of that
>> document. You are saying the same thing in two places but slightly
>> differently (leading to confusion), and the "Avoid these encodings"
>> section is (IMHO) one of the most important bits of the document - it
>> should be up front.
>>
>> I'd write it like this:
>>
>> ## Choosing an encoding
>>
>> Encode new content in UTF-8. All of the present generation of Web
>> standards, servers, clients, and libraries are designed to work best
>> with UTF-8, and it allows you to use the same encoding for all of your
>> content regardless of language. If you have a corpus of "legacy"
>> content in some other encoding, you are strongly encouraged to convert
>> it within your server and send clients UTF-8 anyway.
>>
>> If it is *impossible* for you to send UTF-8 over the network, you need
>> to be aware that many other historical encodings are poorly, or not at
>> all, supported by Web clients. [The Encoding Standard] contains an
>> *exhaustive* list of "legacy" character encodings that are supported:
>> anything not in the list simply will not work.
>>
>> Furthermore, UTF-32, UTF-16, JIS_C6226-1983, JIS_X0212-1990,
>> HZ-GB-2312, JOHAB (Windows code page 1361), CESU-8, UTF-7, BOCU-1,
>> SCSU, ISO-2022 (all varieties), and EBCDIC (all varieties) MUST NOT be
>> used. These encodings are *ASCII-incompatible* -- that is, in these
>> encodings, octets with values 00 through 7F (hexadecimal) are not
>> always interpreted as Unicode code points U+0000 through U+007F. This
>> has historically been a source of security vulnerabilities.
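
The definition of "ASCII-incompatible" quoted above can be checked at the byte level. What follows is a minimal Python sketch, not part of the proposed article text. The helper name `decodes_like_ascii` is hypothetical, and the sketch assumes only the codecs shipped with CPython (`utf-7`, `utf-16-le`, and `cp037` as one EBCDIC variant).

```python
def decodes_like_ascii(octets: bytes, encoding: str) -> bool:
    """Return True if octets in the range 00-7F decode, under `encoding`,
    to the same text they would produce under plain ASCII."""
    expected = octets.decode("ascii")  # raises if any octet is above 7F
    return octets.decode(encoding, errors="replace") == expected


# UTF-8 is ASCII-compatible: octets 00-7F map directly to U+0000-U+007F.
print(decodes_like_ascii(b"<script>", "utf-8"))           # True

# UTF-7: every octet here is in the ASCII range, yet the text decodes to
# "<script>", which a filter scanning for the literal bytes would miss.
print(b"+ADw-script+AD4-".decode("utf-7"))                # <script>
print(decodes_like_ascii(b"+ADw-script+AD4-", "utf-7"))   # False

# UTF-16 (little-endian): two ASCII-range octets form one code unit, U+6261.
print(decodes_like_ascii(b"ab", "utf-16-le"))             # False

# EBCDIC (code page 037): octet 0x41 is not LATIN CAPITAL LETTER A.
print(decodes_like_ascii(b"A", "cp037"))                  # False
```

The UTF-7 case is the kind of mismatch the "security vulnerabilities" remark appears to refer to: a component that inspects the octets as ASCII and a component that decodes them as UTF-7 see different text.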
>>
>
> It seems strange for a guideline to say "MUST NOT". I would suggest SHOULD
> NOT is more appropriate. In any case, we shouldn't be in the business of
> telling content authors what they can or can't do. If they want to use an
> encoding that isn't well supported, then the risk is theirs.
>
>
>>
>> The Big5 and EUC-JP encodings suffer from interoperability problems
>> due to the large number of incompatible variants "in the wild", and
>> should be avoided. ISO-8859-8 ("visually ordered" Hebrew) should also
>> be avoided; if UTF-8 cannot be used for Hebrew, use ISO-8859-8-i,
>> which like Unicode is "logically ordered".
>>
>> The "replacement" encoding, listed in the Encoding Standard, is not
>> actually an encoding; it is a fallback that maps every octet to U+FFFD
>> REPLACEMENT CHARACTER. Obviously, it is not useful to transmit data
>> in this encoding. The "x-user-defined" encoding is a single-byte
>> encoding whose lower half is ASCII and whose upper half is mapped into
>> the Unicode Private Use Area. Like the PUA in general, using this
>> encoding on the public Internet is best avoided.
>>
>> ---
>>
>> The other document ("Declaring character encodings in CSS") looks good
>> to me, except for one technical point that needs clarified: If there
>> is a byte order mark, that means the '@' in '@charset' is not the
>> first byte of the stylesheet, and therefore the @charset directive is
>> ineffective. (Unless the bit about IE 10 and 11 means that they skip
>> the BOM when looking for @charset?)
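
The byte-order-mark observation above can be illustrated at the byte level. This is a minimal Python sketch under stated assumptions: the helper `starts_with_charset_rule` and the sample stylesheet are hypothetical, and the code only inspects the leading octets. It does not model how any browser, IE 10 and 11 included, actually determines a stylesheet's encoding.

```python
import codecs

# The octets an @charset rule begins with when it is the first thing in the file.
CHARSET_PREFIX = b'@charset "'


def starts_with_charset_rule(stylesheet: bytes) -> bool:
    """Return True if the very first octets of the stylesheet are '@charset "'."""
    return stylesheet.startswith(CHARSET_PREFIX)


css_text = '@charset "utf-8";\nbody { margin: 0 }\n'
plain = css_text.encode("utf-8")
with_bom = codecs.BOM_UTF8 + plain  # EF BB BF followed by the same octets

print(starts_with_charset_rule(plain))     # True
print(starts_with_charset_rule(with_bom))  # False: the '@' is now the fourth octet
print(with_bom[:3].hex())                  # efbbbf

# Decoding with the BOM-aware codec strips the mark and yields the same text.
print(with_bom.decode("utf-8-sig") == css_text)  # True
```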
>>
>> zw
>>
>>
>> On Fri, Mar 7, 2014 at 7:49 AM, Richard Ishida <ishida@w3.org> wrote:
>> > Following on from the revision of the i18n article about encoding
>> > declarations in HTML (that review period ends today), I have revised and
>> > updated two further articles:
>> >
>> > Choosing & applying a character encoding
>> > http://www.w3.org/International/questions/qa-choosing-encodings-new
>> >
>> > Declaring character encodings in CSS
>> > http://www.w3.org/International/questions/qa-css-charset-new
>> >
>> > Please take a look and send any comments to www-international@w3.org before
>> > 14th March.
>> >
>> > Thanks,
>> > RI
>> >
>> >
>> >
>> > On 28/02/2014 14:20, Richard Ishida wrote:
>> >> An updated version of Declaring character encodings in HTML[1] is out
>> >> for review at
>> >>
>> >> http://www.w3.org/International/questions/qa-html-encoding-declarations-new
>> >>
>> >> We are looking for comments before 7 March. Please send comments to
>> >> www-international@w3.org.
>> >>
>> >> After the review period is over, this content will be copied to the same
>> >> location as the current version of the document, ie.
>> >>
>> >> http://www.w3.org/International/questions/qa-html-encoding-declarations
>> >>
>> >> and the URL of the updated version will cease to exist.
>> >>
>> >> The update brings the article in line with recent developments in HTML5,
>> >> and de-emphasizes information about legacy formats.
>> >>
>> >> An attempt was also made to organize the material so that readers can
>> >> find information more quickly, and also de-clutter the essential
>> >> information by moving edge topics, such as UTF-16 and charset links,
>> >> down the page. This led to the article being almost completely rewritten.
>> >>
>> >>
>> >>
>> >> RI
>> >
>>
>
Received on Saturday, 15 March 2014 05:18:26 UTC