- From: Michael Day <mikeday@yeslogic.com>
- Date: Tue, 15 May 2007 19:35:50 +1000
Hi,

Suggestion: drop UTF-32 from the character encoding detection section of HTML5, and even better, discourage or forbid user agents from implementing support for UTF-32.

Why:

- It's not widely used. In fact, has UTF-32 ever been used at all, outside of test suites?

- It's not widely implemented. For example, the expat XML parser does not support it, and nobody cares.

- When it is supported, people get it wrong, and the bugs are never fixed because no one uses UTF-32 anyway and no one cares. For an example of this, see html5lib 0.9, which implements the BOM detection algorithm but gets it wrong by checking for UTF-16 before checking for UTF-32. Because the little-endian UTF-16 BOM (FF FE) is a prefix of the little-endian UTF-32 BOM (FF FE 00 00), the UTF-16 test will always succeed and such UTF-32 input will always be misidentified as UTF-16. But no one cares, as no one uses UTF-32 anyway.

- UTF-32 is horrendously inefficient for just about all real-world text and its use should not be encouraged on the web. Really, UTF-32 only exists as a tutorial example of how Unicode can be encoded, not as a practical character encoding that people should actually use.

Please, drop UTF-32 and save implementors from worrying about it when no one uses it and no one should use it.

Thanks,

Michael

-- 
Print XML with Prince!
http://www.princexml.com
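
To make the ordering problem described above concrete, here is a minimal Python sketch. It is not html5lib's actual code: the function names `detect_bom` and `detect_bom_utf16_first` and the table names are hypothetical, chosen only for illustration. It assumes nothing beyond the BOM constants in Python's standard `codecs` module.

```python
import codecs

# Check the 4-byte UTF-32 BOMs before the 2-byte UTF-16 BOMs, because the
# UTF-32 LE BOM (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).
BOMS_LONGEST_FIRST = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# The ordering described in the message: UTF-16 is tested before UTF-32.
BOMS_UTF16_FIRST = [
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
]


def _detect(data, table):
    """Return (encoding name, BOM length) for the first matching BOM."""
    for bom, name in table:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def detect_bom(data):
    """Hypothetical detector that tests UTF-32 before UTF-16."""
    return _detect(data, BOMS_LONGEST_FIRST)


def detect_bom_utf16_first(data):
    """Hypothetical detector with the faulty UTF-16-first ordering."""
    return _detect(data, BOMS_UTF16_FIRST)


# A little-endian UTF-32 byte stream with an explicit BOM.
sample = codecs.BOM_UTF32_LE + "hi".encode("utf-32-le")
print(detect_bom(sample))
print(detect_bom_utf16_first(sample))
```

With the UTF-16-first table, the little-endian UTF-32 sample matches the two-byte FF FE entry and is reported as UTF-16, which is the misidentification described in the message.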
Received on Tuesday, 15 May 2007 02:35:50 UTC