- From: John Cowan <cowan@ccil.org>
- Date: Thu, 24 Sep 2009 17:37:03 -0400
- To: "Cokus, Michael S." <msc@mitre.org>
- Cc: Paul Pierce <prp@teleport.com>, EXI Comments <public-exi-comments@w3.org>, "public-xml-core-wg@w3.org" <public-xml-core-wg@w3.org>
Cokus, Michael S. scripsit:

(Personal response, not approved by the XML Core WG)

> > 7) We believe that the current representation of strings has no
> > material advantage over UTF-8, since although it uses at most 3 bytes
> > per character, 4-byte UTF characters are very rare except in documents
> > written in obsolete scripts.
>
> In our initial response we noted that a number of languages in common
> use are represented in UTF using 4 bytes.

Actually you said no such thing. What you wrote was:

        E.g., there is a range of code points where EXI uses 2 bytes,
        versus 3 for UTF-8. Any content in such scripts would therefore
        be 50% larger in UTF-8 vs. current EXI. This would include the
        Devanagari scripts (used in several Indic languages, including
        Hindi), Thai, Hangul Jamo (but not Hangul syllables; Korea),
        Hiragana and Katakana (but not Kanji/CJK unified, Japan).

This argument is correct, and I didn't challenge it. You then added:

        The EXI WG can't endorse the rarity claim, as these scripts
        appear to be in daily use by easily over one billion people
        with little observable tendencies to obsolete any of them.

The "rarity claim" was for scripts using characters from U+10000 up,
which require four bytes in UTF-8 but only three in EXI. Your examples
were for the range U+0800 to U+3FFF, which require three bytes in UTF-8
but only two in EXI.

In any case, I don't propose you do anything; this is just to correct
the record.

-- 
John Cowan  http://ccil.org/~cowan  cowan@ccil.org
[P]olice in many lands are now complaining that local arrestees are
insisting on having their Miranda rights read to them, just like perps
in American TV cop shows. When it's explained to them that they are in
a different country, where those rights do not exist, they become
outraged. --Neal Stephenson
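
The byte counts discussed in this message can be checked with a short sketch. It assumes that EXI writes each character's code point as a variable-length unsigned integer carrying 7 payload bits per byte, which is consistent with the ranges quoted above; the function names `utf8_len` and `exi_len` are illustrative only and do not come from either specification.

```python
def utf8_len(cp: int) -> int:
    """Bytes needed to encode code point cp in UTF-8."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


def exi_len(cp: int) -> int:
    """Bytes needed for cp as a 7-bits-per-byte unsigned integer.

    Assumption: this models how EXI stores a character's code point.
    """
    n = 1
    while cp >= 0x80:
        cp >>= 7
        n += 1
    return n


# Sample code points: Devanagari KA, Thai KO KAI, Hiragana A,
# a CJK unified ideograph, and the first supplementary code point.
for cp in (0x0915, 0x0E01, 0x3042, 0x4E2D, 0x10000):
    print(f"U+{cp:04X}: UTF-8 {utf8_len(cp)} bytes, EXI {exi_len(cp)} bytes")
```

Under that assumption, the two encodings differ only in U+0800 to U+3FFF (three bytes versus two) and from U+10000 up (four bytes versus three), which are the two separate ranges the message distinguishes.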
Received on Thursday, 24 September 2009 21:37:42 UTC