- From: Karl Waclawek <karl@waclawek.net>
- Date: Fri, 7 Nov 2003 19:25:58 -0500
- To: "Richard Tobin" <richard@cogsci.ed.ac.uk>
- Cc: <public-xml-testsuite@w3.org>
> > According to table 3.1B of Unicode 3.2, the sequence e0 9f ac is not > > a valid UTF-8 sequence. > > Right. But if you don't check that it's legal, and follow the natural > algorithm for decoding it, you will get 7EC. Some implementations > just apply the algorithm blindly without checking. > > There were two mistakes: the code point used was 2028 decimal (= 7EC hex) > instead of 2028 hex. And 2028 decimal was encoded as a 3-byte sequence > instead of a 2-byte sequence. Thanks, that clears it up for me. Karl
Received on Friday, 7 November 2003 19:23:59 UTC