- From: Asmus Freytag <asmusf@ix.netcom.com>
- Date: Thu, 11 May 2000 16:34:51 -0700
- To: Saba Sundaramurthy <ssundaramurthy@verisign.com>, "'Robert A. Rosenberg'" <rarpsl@flashcom.net>
- Cc: mozilla-i18n@mozilla.org, www-international@w3.org, i18n-prog@acoin.com
At 10:09 AM 5/11/00 -0700, Saba Sundaramurthy wrote:
> UTF-8 characters may expand to any number of bytes (up to 6 for
> UCS-4), I don't think byte order is important since the sequence will be
> written out one byte at a time in the correct order.

Consensus is forming to restrict both UCS-4 and UTF-32 to the same code points that can be reached with UTF-16. That would result in UTF-8 being limited to 4 bytes maximum.

> As confirmed by Michka, the BOM is placed in UTF-8 files only as a
> 'magic cookie'.

That is correct. While you need to know the byte order when converting from UTF-16 or UTF-32 (aka UCS-4), once the data is in UTF-8 there is no ambiguity about the arrangement of bytes, and the BOM is a 'signature', as we like to call it. (It's also not the bytes 'FE' 'FF' but the UTF-8 transformation of U+FEFF.)

A./

*(This will result in reducing the private use characters in UCS-4 to 137,472 characters)
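The points above can be checked with a short sketch in Python, which is not part of the original message. It relies only on the standard string encoders. The variable names are illustrative, and the private-use tally assumes that all of planes 15 and 16 are counted in full alongside the BMP private use area.

```python
# U+FEFF is the code point used as the byte order mark / signature.
BOM = "\ufeff"

# In UTF-16 the two byte orders give different byte sequences,
# so a reader has to know (or detect) which one was used.
utf16_be = BOM.encode("utf-16-be")   # bytes FE FF
utf16_le = BOM.encode("utf-16-le")   # bytes FF FE

# In UTF-8 there is only one possible sequence: the UTF-8
# transformation of U+FEFF, not the raw bytes FE FF.
utf8_signature = BOM.encode("utf-8")  # bytes EF BB BF

# The highest code point reachable with UTF-16 is U+10FFFF.
# Its UTF-8 form is 4 bytes long, which is the maximum once
# UCS-4 and UTF-32 are restricted to the same range.
longest = chr(0x10FFFF).encode("utf-8")

# Private use code points in that range, under the assumption stated
# above: the BMP private use area U+E000..U+F8FF plus two whole planes.
bmp_private = 0xF8FF - 0xE000 + 1      # 6,400
plane_private = 2 * 0x10000            # 131,072
private_total = bmp_private + plane_private

print(utf16_be.hex(), utf16_le.hex(), utf8_signature.hex())
print(len(longest), private_total)
```

Note that the last two code points of each plane are noncharacters, so a tally that excludes them would be slightly smaller than the figure quoted in the message.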
Received on Thursday, 11 May 2000 19:46:01 UTC