Re: utf-8

* Sigurd Lerstad wrote:
>DOM is always 2 bytes; what happens in a UTF-8 file when you encounter a
>character that uses 4 bytes (UCS-4)? Do you just ignore the last two bytes?

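Nothing is ignored. Characters above U+FFFF are encoded in UTF-16, which
the DOM uses for its strings, as a surrogate pair: two 16-bit code units
that together represent the one character.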
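
To make the arithmetic concrete, here is a minimal sketch in Python of how a
code point above U+FFFF maps to a surrogate pair and back. The function names
are made up for illustration and are not part of any DOM API; U+1D11E
(MUSICAL SYMBOL G CLEF) is just a convenient example that takes four bytes in
UTF-8.

```python
def to_surrogate_pair(code_point):
    """Split a code point in U+10000..U+10FFFF into two UTF-16 code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000          # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> low surrogate
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a high/low surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    clef = 0x1D11E
    high, low = to_surrogate_pair(clef)
    print(f"U+{clef:X} -> {high:04X} {low:04X}")
    # Four bytes in UTF-8, and also four bytes (two code units) in UTF-16.
    print(chr(clef).encode("utf-8").hex(), chr(clef).encode("utf-16-be").hex())
```

So a 16-bit string type holds such a character as two units and its length
counts it as 2; no bytes are thrown away.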

Received on Friday, 25 July 2003 10:55:46 UTC