- From: Francois Yergeau <FYergeau@alis.com>
- Date: Thu, 27 Feb 2003 10:19:44 -0500
- To: John Boyer <JBoyer@PureEdge.com>, Martin Duerst <duerst@w3.org>, Joseph Reagle <reagle@w3.org>, w3c-ietf-xmldsig@w3.org
John Boyer wrote: > ...Xerces code base from Oct. 2002... > > For example, if I place byte 0x82 from the ANSI code page > into content, it gets translated to Unicode 0x201A, which our > C14N implementation then encodes with the proper 3 byte UTF-8 > sequence. > > But when Xerces reads the UTF-8, it reconstitutes the Unicode > 0x201A, then throws an error saying that 0x1A is illegal... Two comments: 1) I don't think this has anything to do with the CharData production in XML 1.0 being confusing or not. It looks like an ugly bug in Xerces which misinterpets U+201A as illegal because its memory representation (in UTF-16 or UTF-32, presumably) contains the byte 0x1A. I hope this bug does not extend beyond the part of the code that checks the CharData production... 2) [Process] This doesn't concern the xml-editor list, which is not a discussion list, anymore, so I removed it from the recipients. Regards, -- François
Received on Thursday, 27 February 2003 10:21:12 UTC