RE: Possible XML and C14N errata from Francois Yergeau on 2003-02-27 (w3c-ietf-xmldsig@w3.org from January to March 2003)

From: Francois Yergeau <FYergeau@alis.com>
Date: Thu, 27 Feb 2003 10:19:44 -0500
To: John Boyer <JBoyer@PureEdge.com>, Martin Duerst <duerst@w3.org>, Joseph Reagle <reagle@w3.org>, w3c-ietf-xmldsig@w3.org
Message-ID: <F7D4BDA0E5A1D14B99D32C022AEB7366B3E6B6@alis-2k.alis.domain>

John Boyer wrote:
> ...Xerces code base from Oct. 2002...
> 
> For example, if I place byte 0x82 from the ANSI code page 
> into content, it gets translated to Unicode 0x201A, which our 
> C14N implementation then encodes with the proper 3 byte UTF-8 
> sequence.  
> 
> But when Xerces reads the UTF-8, it reconstitutes the Unicode 
> 0x201A, then throws an error saying that 0x1A is illegal... 

Two comments:

1) I don't think this has anything to do with whether the CharData
production in XML 1.0 is confusing.  It looks like an ugly bug in Xerces,
which misinterprets U+201A as illegal because its in-memory representation
(in UTF-16 or UTF-32, presumably) contains the byte 0x1A.  I hope this bug
does not extend beyond the part of the code that checks the CharData
production...

2) [Process] This no longer concerns the xml-editor list, which is not a
discussion list, so I removed it from the recipients.
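
To make comment 1 concrete, here is a minimal Python sketch of the kind of
mistake I suspect.  It is not Xerces code, which I have not inspected:
is_xml_char and buggy_is_xml_char are hypothetical names, and the byte-wise
check is only my guess at the failure mode.  The first function applies the
XML 1.0 Char production to a whole code point; the second applies it to each
byte of the UTF-16 form separately, which wrongly rejects U+201A.

```python
def is_xml_char(cp: int) -> bool:
    """XML 1.0 Char production, applied to a whole code point."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def buggy_is_xml_char(ch: str) -> bool:
    """Hypothetical faulty check: tests each UTF-16 byte on its own."""
    return all(is_xml_char(b) for b in ch.encode("utf-16-be"))


# Byte 0x82 in Windows code page 1252 (the "ANSI" code page) is U+201A.
ch = bytes([0x82]).decode("cp1252")

utf8 = ch.encode("utf-8")       # the 3-byte sequence E2 80 9A
utf16 = ch.encode("utf-16-be")  # the bytes 20 1A; the low byte is 0x1A

print(f"U+{ord(ch):04X}", utf8.hex(" "), utf16.hex(" "))
print("code-point check:", is_xml_char(ord(ch)))
print("byte-wise check: ", buggy_is_xml_char(ch))
```

The code-point check accepts U+201A, while the byte-wise check rejects it
because 0x1A on its own is a control character excluded by Char.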

Regards,

-- 
François

Received on Thursday, 27 February 2003 10:21:12 UTC