
Re: utf-8

From: Sigurd Lerstad <sigler@bredband.no>
Date: Wed, 10 Sep 2003 17:42:38 +0200
Message-ID: <05c901c377b2$29c164d0$6e1273d5@mmstudio>
To: "Bjoern Hoehrmann" <derhoermi@gmx.net>
Cc: <www-svg@w3.org>

>
> * Sigurd Lerstad wrote:
> >In an XML file that says utf-8 in the xml declaration. There could be
> >4 byte characters later in the file. How should those be treated to
> >convert them to utf-16?
>
> Just like any other sequence. U+10000 is F0 90 80 80 in UTF-8 and
> D8 00 DC 00 or 00 D8 00 DC (depending on byte order) in UTF-16.
>
> >Is there some spec which says what to do?
>
> Unicode.
>
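
The byte values quoted above can be checked with a short Python sketch. It
assumes a single well-formed 4-byte UTF-8 sequence as input, and the function
names are hypothetical, chosen only for illustration. The arithmetic follows
the surrogate-pair scheme that Unicode defines for UTF-16: subtract 0x10000,
then split the remaining 20 bits into two 10-bit halves.

```python
def utf8_4byte_to_code_point(data: bytes) -> int:
    """Decode one 4-byte UTF-8 sequence: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx."""
    if len(data) != 4 or (data[0] & 0xF8) != 0xF0:
        raise ValueError("not a 4-byte UTF-8 lead byte")
    if any((b & 0xC0) != 0x80 for b in data[1:]):
        raise ValueError("bad continuation byte")
    # 3 payload bits from the lead byte, 6 from each continuation byte.
    cp = ((data[0] & 0x07) << 18) | ((data[1] & 0x3F) << 12) \
        | ((data[2] & 0x3F) << 6) | (data[3] & 0x3F)
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("overlong or out-of-range sequence")
    return cp


def code_point_to_utf16_units(cp: int) -> list:
    """Return the 16-bit UTF-16 code units for a code point."""
    if cp < 0x10000:
        return [cp]                     # fits in one 16-bit unit
    v = cp - 0x10000                    # 20 bits remain
    high = 0xD800 | (v >> 10)           # top 10 bits -> high surrogate
    low = 0xDC00 | (v & 0x3FF)          # bottom 10 bits -> low surrogate
    return [high, low]


cp = utf8_4byte_to_code_point(bytes([0xF0, 0x90, 0x80, 0x80]))
units = code_point_to_utf16_units(cp)
print(hex(cp), [hex(u) for u in units])

# Cross-check against Python's built-in encoders for both byte orders.
print("\U00010000".encode("utf-16-be").hex(" "))
print("\U00010000".encode("utf-16-le").hex(" "))
```

In this sketch a code point above U+FFFF comes out as two 16-bit code units,
so it occupies 4 bytes in UTF-16 rather than 2.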

Okay, I feel stupid. I've purchased the UTF-8 spec from ISO, and it
explains how to convert from UTF-8 to UCS-4. I'm afraid we're talking past
one another. My question is simply: how can 4 bytes be represented in
2 bytes? It can't be done. What am I missing?

thanks,

--
Sigurd Lerstad
Received on Wednesday, 10 September 2003 11:39:04 GMT
