Re: IBM's XML 1.1 tests

> According to table 3.1B of Unicode 3.2, the sequence e0 9f ac is not
> a valid UTF-8 sequence.

Right.  But if you don't check that it's legal, and follow the natural
algorithm for decoding it, you will get 7EC.  Some implementations
just apply the algorithm blindly without checking.
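
For illustration, here is a rough Python sketch of what "applying the algorithm blindly" to a 3-byte sequence looks like. The function name naive_decode3 is made up for this example and is not taken from any particular parser. It just strips the marker bits and concatenates the payload bits, with no check for overlong forms.

```python
def naive_decode3(b0, b1, b2):
    """Decode a 3-byte UTF-8 sequence by bit manipulation only.

    Hypothetical helper: it performs no validity checks at all.
    """
    # 1110xxxx 10yyyyyy 10zzzzzz -> xxxxyyyyyyzzzzzz
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


code_point = naive_decode3(0xE0, 0x9F, 0xAC)
print(hex(code_point), code_point)  # 0x7ec 2028

# A checking decoder refuses the same bytes, because E0 must be
# followed by a byte in the range A0..BF.
try:
    bytes([0xE0, 0x9F, 0xAC]).decode("utf-8")
    strict_accepts = True
except UnicodeDecodeError:
    strict_accepts = False
print(strict_accepts)  # False
```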

There were two mistakes.  First, the code point used was 2028 decimal
(= 7EC hex) instead of 2028 hex.  Second, 2028 decimal was encoded as a
3-byte sequence instead of the 2-byte sequence that a code point below
800 hex requires.
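
To make the two mistakes concrete, this small Python sketch prints the encodings involved, using Python's built-in UTF-8 encoder. The variable names are only for this example.

```python
# The code point that was presumably intended: 2028 hex (LINE SEPARATOR).
intended = chr(0x2028).encode("utf-8")

# The code point actually used: 2028 decimal, i.e. 7EC hex.
# Its shortest (and only legal) UTF-8 form is two bytes long.
used = chr(2028).encode("utf-8")

print(intended.hex())  # e280a8
print(used.hex())      # dfac
```

So the bytes e0 9f ac are an overlong 3-byte form of what should have been df ac, and neither matches e2 80 a8.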

-- Richard

Received on Friday, 7 November 2003 18:26:10 UTC