- From: <bugzilla@jessica.w3.org>
- Date: Sun, 28 Nov 2010 19:52:15 +0000
- To: public-html-bugzilla@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=11423

--- Comment #2 from brian m. carlson <sandals@crustytoothpaste.net> 2010-11-28 19:52:14 UTC ---
(In reply to comment #1)
> (In reply to comment #0)
> > HTML5 should not be encouraging
> > people to use a character set that the creator has not even bothered to
> > register with IANA.
>
> It doesn't.

  When a user agent would otherwise use an encoding given in the first column
  of the following table to either convert content to Unicode characters or
  convert Unicode characters to bytes, it *must* instead use the encoding given
  in the cell in the second column of the same row. When a byte or sequence of
  bytes is treated differently due to this encoding aliasing, it is said to
  have been misinterpreted for compatibility.

(Emphasis mine.) EUC-KR and KS_C_5601-1987 are mapped onto windows-949. I think a "must" directive is definitely an encouragement, even if you don't.

> > It's not like registering a character set with IANA is a particularly
> > difficult or drawn-out process
>
> And yet Microsoft's attempt to do so (back in 2005) seems to have failed:
>
> http://mail.apps.ietf.org/ietf/charsets/msg01510.html

Probably because, as the responses indicate, the specifications for those character sets were insufficient and contradictory. It doesn't matter what exactly the reason is; it's not registered. HP, IBM, and Adobe have managed to do it, so I'm sure that it's not impossible or unreasonably difficult.

> It's trivial to comply with this, since "preferred MIME name" is defined by the
> spec as "the name or alias labeled as 'preferred MIME name' in the IANA
> Character Sets registry, if there is one, or the encoding's name, if none of
> the aliases are so labeled". The name of windows-949 is "windows-949".

I believe "if there is one" means "if there is a name or alias labeled as 'preferred MIME name'", not "if there is an entry in the IANA Character Sets registry".
Even if we were to use your suggested interpretation, there are other names for this character set, such as "CP949". How are we to know what the preferred name is if it's not IANA-registered?

> "User agents must at a minimum support the UTF-8 and Windows-1252 encodings,
> but may support more."

Right, but if they support EUC-KR or KS_C_5601-1987, they are effectively required to. (Actually, the spec seems to prohibit the useful implementation of EUC-KR, since it's mandated that user agents use something else instead.) If it's acceptable to support EUC-KR and not windows-949, then the spec should so state.

> > I must therefore object to suggesting or encouraging the use of windows-949
> > until it has been registered appropriately with IANA.
>
> Maybe try registering it? Perhaps you'll have better luck than Microsoft.

I'm really not interested in registering what amount to platform-specific character sets. Plus, since I don't use that platform, I have no knowledge about what the mapping should look like or whether it is correct. Finally, there are numerous character sets in existence that handle Korean just fine, including UTF-8, and I don't see the need to add more.

-- 
Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
Received on Sunday, 28 November 2010 19:52:18 UTC