Re: IBM's XML 1.1 tests

> > According to table 3.1B of Unicode 3.2, the sequence e0 9f ac is not
> > a valid UTF-8 sequence.
> 
> Right.  But if you don't check that it's legal, and follow the natural
> algorithm for decoding it, you will get 7EC hex.  Some implementations
> just apply the algorithm blindly without checking.
> 
> There were two mistakes.  First, the code point used was 2028 decimal
> (= 7EC hex) instead of 2028 hex.  Second, 2028 decimal was encoded as a
> 3-byte sequence instead of the 2-byte sequence it requires.

Thanks, that clears it up for me.
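
For the archive, here is a minimal Python sketch of the arithmetic
described above.  The helper name naive_decode_3byte is my own and is not
taken from any parser or from the test suite; it only illustrates what
"applying the algorithm blindly" might look like, next to what Python's
strict UTF-8 codec does with the same bytes.

```python
def naive_decode_3byte(b0, b1, b2):
    # Hypothetical helper: strip the UTF-8 marker bits and concatenate the
    # payload bits, with no check for overlong or otherwise illegal forms.
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


bad = bytes([0xE0, 0x9F, 0xAC])

# Blind decoding yields 0x7EC, which is 2028 decimal.
blind = naive_decode_3byte(*bad)

# A strict decoder rejects the sequence: after E0 the second byte
# must be in the range A0..BF, and 9F is not.
try:
    bad.decode("utf-8")
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True

# The shortest (and only legal) encoding of U+07EC is two bytes.
two_byte = chr(0x7EC).encode("utf-8")

# The code point the test meant to use, U+2028, does need three bytes.
intended = chr(0x2028).encode("utf-8")

print(hex(blind), blind, strict_rejects, two_byte.hex(), intended.hex())
```

Run as is, this should print "0x7ec 2028 True dfac e280a8", which matches
both mistakes: the wrong code point, and a 3-byte form where 2 bytes
suffice.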

Karl

Received on Friday, 7 November 2003 19:23:59 UTC