RE: Possible XML and C14N errata

John Boyer wrote:
> ...Xerces code base from Oct. 2002...
> 
> For example, if I place byte 0x82 from the ANSI code page 
> into content, it gets translated to Unicode 0x201A, which our 
> C14N implementation then encodes with the proper 3 byte UTF-8 
> sequence.  
> 
> But when Xerces reads the UTF-8, it reconstitutes the Unicode 
> 0x201A, then throws an error saying that 0x1A is illegal... 
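The conversion chain described above can be checked in isolation. Here is a minimal Python sketch. It assumes that the "ANSI code page" in question is Windows-1252 (cp1252); that detail is not stated in the original message.

```python
# Assumption: the "ANSI code page" is Windows-1252 (Python codec name "cp1252")
raw = b"\x82"

ch = raw.decode("cp1252")          # -> U+201A SINGLE LOW-9 QUOTATION MARK
utf8 = ch.encode("utf-8")          # the 3-byte UTF-8 sequence a C14N serializer emits
roundtrip = utf8.decode("utf-8")   # what a conforming parser reconstitutes

print(f"U+{ord(ch):04X}", utf8.hex())  # prints: U+201A e2809a
```

No byte 0x1A appears anywhere in the UTF-8 stream. The only place 0x1A shows up is as the low-order byte of the code point value itself.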

Two comments:

1) I don't think this has anything to do with the CharData production in XML
1.0 being confusing or not.  It looks like an ugly bug in Xerces which
misinterprets U+201A as illegal because its memory representation (in UTF-16
or UTF-32, presumably) contains the byte 0x1A.  I hope this bug does not
extend beyond the part of the code that checks the CharData production...

2) [Process] This thread no longer concerns the xml-editor list, which is not
a discussion list, so I have removed it from the recipients.
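To make point 1 concrete, the sketch below compares a check of the XML 1.0 Char production applied to the whole code point with a hypothetical faulty check that looks only at the low-order byte, as if the code unit had been truncated to 8 bits. The faulty function is an illustration of the suspected bug only. It is not taken from the Xerces source.

```python
def is_xml10_char(cp: int) -> bool:
    """XML 1.0 Char production, tested against the full code point:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]"""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def buggy_is_char(cp: int) -> bool:
    """HYPOTHETICAL illustration of the suspected bug: only the low-order
    byte of the in-memory code unit is tested, so U+201A is seen as 0x1A."""
    return is_xml10_char(cp & 0xFF)


cp = 0x201A
print(is_xml10_char(cp), buggy_is_char(cp))  # prints: True False
```

The correct check accepts U+201A because it falls in [#x20-#xD7FF]. The byte-oriented check rejects it because 0x1A is a C0 control character outside the Char production.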

Regards,

-- 
François

Received on Thursday, 27 February 2003 10:21:12 UTC