At 10:09 AM 5/11/00 -0700, Saba Sundaramurthy wrote:

> UTF-8 characters may expand to any number of bytes (up to 6 for
> UCS-4), I don't think byte order is important since the sequence will be
> written out one byte at a time in the correct order.

Consensus is forming to restrict both UCS-4 and UTF-32 to the same code points that can be reached with UTF-16. That would result in UTF-8 being limited to 4 bytes maximum.

> As confirmed by Michka, the BOM is placed in UTF-8 files only as a
> 'magic cookie'.

That is correct. While you need to know the byte order when converting from UTF-16 or UTF-32 (aka UCS-4), once the data is in UTF-8 there is no ambiguity about the arrangement of bytes, and the BOM is a 'signature', as we like to call it. (It's also not the bytes 'FE' 'FF' but the UTF-8 transformation.)

A./

*(This will result in reducing the private use characters in UCS-4 to 137,472 characters.)

Received on Thursday, 11 May 2000 19:46:01 GMT
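
For readers who want to check the two points in the message above (the UTF-8 signature is not the bytes FE FF, and code points reachable with UTF-16 need at most 4 bytes in UTF-8), here is a minimal sketch using Python's built-in codecs. The variable names are illustrative only and are not taken from the original post.

```python
import codecs

BOM = "\ufeff"  # U+FEFF, the character used as the byte order mark

# In UTF-16 the two byte orders produce two different byte sequences,
# which is why the mark can reveal the byte order there.
utf16_be = BOM.encode("utf-16-be")  # bytes FE FF
utf16_le = BOM.encode("utf-16-le")  # bytes FF FE

# In UTF-8 there is only one possible sequence, so the mark can only
# act as a signature: the bytes EF BB BF, not FE FF.
utf8_signature = BOM.encode("utf-8")
assert utf8_signature == codecs.BOM_UTF8

# U+10FFFF is the highest code point reachable with UTF-16;
# its UTF-8 form is 4 bytes long.
longest = chr(0x10FFFF).encode("utf-8")

print(utf16_be.hex(), utf16_le.hex(), utf8_signature.hex(), len(longest))
```

Run as written, this prints `feff fffe efbbbf 4`.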
This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 2 June 2009 19:16:55 GMT