> * Sigurd Lerstad wrote:
> >In an XML file that says utf-8 in the xml declaration. There could be 4 byte
> >characters later in the file. How should those be treated to convert them to
> >utf-16?
>
> Just like any other sequence. U+10000 is F0 90 80 80 in UTF-8 and
> D8 00 DC 00 or 00 D8 00 DC (depending on byte order) in UTF-16.
>
> >Is there some spec which says what to do?
>
> Unicode.
>

Okay, I feel stupid. I've purchased the utf-8 spec from iso, and they explain
how to convert from utf-8 to ucs4.

I'm afraid we're talking past one another. My question is simply: How can 4
bytes be represented in 2 bytes? It can't be done. What am I missing?

Thanks,

--
Sigurd Lerstad

Received on Wednesday, 10 September 2003 11:39:04 GMT
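For reference, the byte values quoted above can be reproduced with a small sketch. A 4-byte UTF-8 sequence does not shrink to 2 bytes in UTF-16: it becomes a surrogate pair, that is, two 16-bit code units (4 bytes in total). The function names below are illustrative only and do not come from any spec or library; the arithmetic follows the UTF-16 definition as I understand it.

```python
def decode_utf8_4byte(b: bytes) -> int:
    """Decode one 4-byte UTF-8 sequence into a code point (illustrative helper)."""
    if len(b) != 4 or (b[0] & 0xF8) != 0xF0 or any((x & 0xC0) != 0x80 for x in b[1:]):
        raise ValueError("not a 4-byte UTF-8 sequence")
    # 3 payload bits from the lead byte, 6 from each continuation byte
    cp = ((b[0] & 0x07) << 18) | ((b[1] & 0x3F) << 12) | ((b[2] & 0x3F) << 6) | (b[3] & 0x3F)
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point outside the range encoded by 4 bytes")
    return cp


def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into two 16-bit UTF-16 code units."""
    v = cp - 0x10000              # leaves a 20-bit value
    high = 0xD800 | (v >> 10)     # top 10 bits go into the high surrogate
    low = 0xDC00 | (v & 0x3FF)    # bottom 10 bits go into the low surrogate
    return high, low


def utf16_bytes(cp: int, byteorder: str = "big") -> bytes:
    """Serialize the surrogate pair as 4 bytes in the given byte order."""
    high, low = to_surrogate_pair(cp)
    return high.to_bytes(2, byteorder) + low.to_bytes(2, byteorder)


cp = decode_utf8_4byte(bytes([0xF0, 0x90, 0x80, 0x80]))
print(hex(cp))                          # code point for the example in the thread
print(utf16_bytes(cp, "big").hex(" "))
print(utf16_bytes(cp, "little").hex(" "))
```

So the 21 bits carried by the 4 UTF-8 bytes are redistributed over two 16-bit units rather than squeezed into one; only code points up to U+FFFF fit in a single 16-bit unit.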
This archive was generated by hypermail 2.2.0+W3C-0.50 : Monday, 4 September 2006 18:11:23 GMT