- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Wed, 10 Sep 2003 18:25:52 +0200
- To: "Sigurd Lerstad" <sigler@bredband.no>
- Cc: <www-svg@w3.org>
* Sigurd Lerstad wrote:
>> >In an XML file that says utf-8 in the xml declaration. There could be 4
>> >byte characters later in the file. How should those be treated to convert
>> >them to utf-16?
>>
>> Just like any other sequence. U+10000 is F0 90 80 80 in UTF-8 and
>> D8 00 DC 00 or 00 D8 00 DC (depending on byte order) in UTF-16.
>
>Okay, I feel stupid, I've purchased the utf-8 spec from iso, and they
>explain how to convert from utf-8 to ucs4, I'm afraid we're talking past one
>another. My question is simply: How can 4 bytes be represented in 2 bytes,
>it can't be done. what am I missing?

That UTF-16 does not mean two bytes per character. As I've said, characters
above U+FFFF are represented using *four* bytes in UTF-16.
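To make the arithmetic concrete, here is a minimal Python sketch. It assumes the input is a single well-formed 4-byte UTF-8 sequence, and the function names are illustrative only. It decodes F0 90 80 80 to U+10000, then splits that code point into the UTF-16 surrogate pair D800 DC00 and writes it out in either byte order.

```python
def utf8_decode_4byte(b):
    """Decode one 4-byte UTF-8 sequence: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx."""
    if len(b) != 4 or (b[0] & 0xF8) != 0xF0 or any((x & 0xC0) != 0x80 for x in b[1:]):
        raise ValueError("not a 4-byte UTF-8 sequence")
    # 3 payload bits from the lead byte, 6 from each continuation byte.
    cp = ((b[0] & 0x07) << 18) | ((b[1] & 0x3F) << 12) | ((b[2] & 0x3F) << 6) | (b[3] & 0x3F)
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("overlong or out-of-range code point")
    return cp


def utf16_surrogates(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into two 16-bit units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    v = cp - 0x10000             # leaves a 20-bit value
    high = 0xD800 | (v >> 10)    # top 10 bits -> high surrogate
    low = 0xDC00 | (v & 0x3FF)   # bottom 10 bits -> low surrogate
    return high, low


def to_utf16_bytes(cp, big_endian=True):
    """Serialize the surrogate pair as four bytes in the chosen byte order."""
    high, low = utf16_surrogates(cp)
    order = "big" if big_endian else "little"
    return high.to_bytes(2, order) + low.to_bytes(2, order)


cp = utf8_decode_4byte(bytes([0xF0, 0x90, 0x80, 0x80]))
print(hex(cp))                                      # 0x10000
print(to_utf16_bytes(cp).hex(" "))                  # d8 00 dc 00
print(to_utf16_bytes(cp, big_endian=False).hex(" "))  # 00 d8 00 dc
```

The key step is subtracting 0x10000, which leaves a 20-bit value. Its top 10 bits go into a high surrogate (D800 to DBFF) and its bottom 10 bits into a low surrogate (DC00 to DFFF). The result is two 16-bit code units, i.e. four bytes, which is why no 2-byte UTF-16 form exists for these characters.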
Received on Wednesday, 10 September 2003 12:26:03 UTC