Etan Wexler scripsit:

> There is no encoding scheme UTF-16. There is a "charset" value in the
> IANA registry called "UTF-16", but UTF-16 is an encoding form. Any
> serialized UTF-16 document is either big-endian or little-endian.
> Nevertheless, the "UTF-16" label is allowed and in use, so we resort to
> the BOM to disambiguate the label's meaning. The situation with UTF-32
> is analogous.

The meaning of the encoding scheme UTF-16 is that if the first two bytes
are 0xFE 0xFF, then the rest is interpreted as big-endian UTF-16; if the
first two bytes are 0xFF 0xFE, then the rest is interpreted as little-endian
UTF-16; otherwise, the whole is interpreted as big-endian UTF-16.

In the encoding schemes UTF-16BE and UTF-16LE, the interpretation is always
big-endian or little-endian respectively; if the first character is U+FEFF,
then it is a ZWNBSP and part of the data stream.

--
John Cowan  www.reutershealth.com  www.ccil.org/~cowan  jcowan@reutershealth.com
Arise, you prisoners of Windows / Arise, you slaves of Redmond, Wash,
The day and hour soon are coming / When all the IT folks say "Gosh!"
It isn't from a clever lawsuit / That Windowsland will finally fall,
But thousands writing open source code / Like mice who nibble through a wall.

Received on Saturday, 6 December 2003 23:49:46 UTC
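The decoding rule John Cowan describes above can be sketched in a few lines of Python. This is a minimal illustration, not taken from the original thread: the function names `decode_utf16`, `decode_utf16be` and `decode_utf16le` are hypothetical. It applies the rule explicitly with the `utf-16-be` and `utf-16-le` codecs rather than relying on the defaults of Python's generic `utf-16` codec. Those two codecs never strip a leading U+FEFF, which matches the UTF-16BE/UTF-16LE behaviour described in the reply.

```python
def decode_utf16(data: bytes) -> str:
    """Decode bytes labelled with the UTF-16 encoding scheme (hypothetical helper)."""
    if data[:2] == b"\xfe\xff":
        # BOM FE FF: the rest is big-endian; the BOM itself is not content.
        return data[2:].decode("utf-16-be")
    if data[:2] == b"\xff\xfe":
        # BOM FF FE: the rest is little-endian; the BOM itself is not content.
        return data[2:].decode("utf-16-le")
    # No BOM: the whole stream is interpreted as big-endian.
    return data.decode("utf-16-be")


def decode_utf16be(data: bytes) -> str:
    # UTF-16BE: always big-endian; a leading U+FEFF stays in the text as ZWNBSP.
    return data.decode("utf-16-be")


def decode_utf16le(data: bytes) -> str:
    # UTF-16LE: always little-endian; a leading U+FEFF stays in the text as ZWNBSP.
    return data.decode("utf-16-le")


if __name__ == "__main__":
    print(repr(decode_utf16(b"\xff\xfeA\x00")))    # BOM consumed
    print(repr(decode_utf16be(b"\xfe\xff\x00A")))  # BOM kept as ZWNBSP
```

Note the asymmetry: the same bytes `FE FF 00 41` decode to `"A"` under the UTF-16 label but to `"\ufeffA"` under UTF-16BE, because only the former treats the initial U+FEFF as a byte order mark.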