- From: <jcowan@reutershealth.com>
- Date: Sat, 6 Dec 2003 23:48:51 -0500
- To: Etan Wexler <ewexler@stickdog.com>
- Cc: Chris Lilley <chris@w3.org>, www-international@w3.org, w3c-css-wg@w3.org, www-style@w3.org
Etan Wexler scripsit:

> There is no encoding scheme UTF-16. There is a "charset" value in the
> IANA registry called "UTF-16", but UTF-16 is an encoding form. Any
> serialized UTF-16 document is either big-endian or little-endian.
> Nevertheless, the "UTF-16" label is allowed and in use, so we resort to
> the BOM to disambiguate the label's meaning. The situation with UTF-32
> is analogous.

The meaning of the encoding scheme UTF-16 is that if the first two bytes
are 0xFE 0xFF, then the rest is interpreted as big-endian UTF-16; if the
first two bytes are 0xFF 0xFE, then the rest is interpreted as little-endian
UTF-16; otherwise, the whole is interpreted as big-endian UTF-16.

In the encoding schemes UTF-16BE and UTF-16LE, the interpretation is always
big-endian or little-endian respectively; if the first character is U+FEFF,
then it is a ZWNBSP and part of the data stream.

--
John Cowan www.reutershealth.com www.ccil.org/~cowan jcowan@reutershealth.com
Arise, you prisoners of Windows / Arise, you slaves of Redmond, Wash,
The day and hour soon are coming / When all the IT folks say "Gosh!"
It isn't from a clever lawsuit / That Windowsland will finally fall,
But thousands writing open source code / Like mice who nibble through a wall.
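
The decoding rule described in the message above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name decode_utf16_scheme and its label parameter are hypothetical, and the BOM check is done by hand because Python's built-in "utf-16" codec falls back to the platform's native byte order when no BOM is present, which is not the big-endian default the message describes.

```python
def decode_utf16_scheme(data: bytes, label: str = "UTF-16") -> str:
    """Decode bytes according to the UTF-16, UTF-16BE, or UTF-16LE scheme.

    Hypothetical helper for illustration; not part of any library.
    """
    scheme = label.upper()
    if scheme == "UTF-16":
        # A leading BOM selects the byte order and is not part of the text.
        if data[:2] == b"\xfe\xff":
            return data[2:].decode("utf-16-be")
        if data[:2] == b"\xff\xfe":
            return data[2:].decode("utf-16-le")
        # No BOM: the whole stream is interpreted as big-endian.
        return data.decode("utf-16-be")
    if scheme == "UTF-16BE":
        # Byte order is fixed; a leading U+FEFF stays in the data as ZWNBSP.
        return data.decode("utf-16-be")
    if scheme == "UTF-16LE":
        return data.decode("utf-16-le")
    raise ValueError(f"unsupported label: {label!r}")
```

With the plain "UTF-16" label, a leading 0xFE 0xFF or 0xFF 0xFE is consumed as a byte order mark; with "UTF-16BE" or "UTF-16LE", the same two bytes decode to U+FEFF and remain in the returned string.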
Received on Saturday, 6 December 2003 23:49:46 UTC