- From: Henri Sivonen <hsivonen@hsivonen.fi>
- Date: Mon, 27 Jan 2014 11:05:37 +0200
- To: "www-international@w3.org" <www-international@w3.org>
On Fri, Jan 24, 2014 at 3:55 PM, Richard Ishida <ishida@w3.org> wrote:
> I'm thinking that we should be pointing them to the Encoding spec, rather
> than the IANA list.

Good idea.

> We could point at http://encoding.spec.whatwg.org/#concept-encoding-get,
> although there are two issues with that:
>
> 1. that table isn't really intended to provide a list of labels people
> should use, it maps labels to encodings
>
> 2. the most commonly used label for an encoding, where there are more than
> one per encoding, is generally not at the top of the list (although it is
> used for the name of the encoding).

3. The encodings don't have equal status:

 * Apart from UTF-8, GB18030 is the only other encoding that can be
   used for form submissions without data loss.

 * x-user-defined must not be used except in overrideMimeType() in XHR
   in browser versions that don't support obtaining the response bytes
   as an ArrayBuffer. (Publishers who use intentionally mis-encoded
   fonts with @font-face, which of course no one should do, are better
   off declaring windows-1252 even if that means they are polluting
   search data for everyone else.)

 * The labels that map to the replacement encoding must not be used,
   and it makes no sense to use them.

 * UTF-16BE, UTF-16LE (including the UTF-16 label), HZ-GB-2312 and
   ISO-2022-JP are dangerous, and authors should expect browser vendors
   to take varying levels of countermeasures against these, which makes
   it a bad idea to use them. (In particular, if telemetry data
   permits, I intend to map HZ-GB-2312, which is *really* scary, to the
   replacement encoding in Gecko.) Even if browsers don't take
   countermeasures, it's still a bad idea to use these encodings,
   because they are dangerous (especially for encoding user-supplied
   content). Consider these as "must not use".

 * The implementation status of Big5 is sad. Would-be users of Big5
   should migrate to UTF-8 even more hastily than users of the other
   legacy encodings.

 * There are interoperability issues with the parts of EUC-JP that an
   Encoding Standard-compliant *encoder* never outputs. Would-be users
   of EUC-JP should migrate to UTF-8 even more hastily than users of
   the other legacy encodings.

 * One shouldn't expect the current state of the Encoding Standard to
   be the last word on ibm866, x-mac-cyrillic and koi8-u. Don't use
   them.

 * Don't use iso-8859-8 (Visual Hebrew). Support may be going away in
   the future. Always use the logical order for Hebrew.

So, really, people should only use one encoding, UTF-8, and the list
of labels they should use should have one item only: "UTF-8".

-- 
Henri Sivonen
hsivonen@hsivonen.fi
https://hsivonen.fi/
Received on Monday, 27 January 2014 09:06:08 UTC