RE: Possible XML and C14N errata from Martin Duerst on 2003-02-27 (w3c-ietf-xmldsig@w3.org from January to March 2003)

From: Martin Duerst <duerst@w3.org>
Date: Thu, 27 Feb 2003 10:48:47 -0500
To: Francois Yergeau <FYergeau@alis.com>, John Boyer <JBoyer@PureEdge.com>, Joseph Reagle <reagle@w3.org>, w3c-ietf-xmldsig@w3.org
Message-Id: <4.2.0.58.J.20030227104733.03e8b408@localhost>

At 10:19 03/02/27 -0500, Francois Yergeau wrote:
>John Boyer wrote:
> > ...Xerces code base from Oct. 2002...
> >
> > For example, if I place byte 0x82 from the ANSI code page
> > into content, it gets translated to Unicode 0x201A, which our
> > C14N implementation then encodes with the proper 3 byte UTF-8
> > sequence.
> >
> > But when Xerces reads the UTF-8, it reconstitutes the Unicode
> > 0x201A, then throws an error saying that 0x1A is illegal...
>
>Two comments:
>
>1) I don't think this has anything to do with the CharData production in XML
>1.0 being confusing or not.  It looks like an ugly bug in Xerces which
>misinterprets U+201A as illegal because its memory representation (in UTF-16
>or UTF-32, presumably) contains the byte 0x1A.  I hope this bug does not
>extend beyond the part of the code that checks the CharData production...

I agree with Francois that this is a very ugly bug that has nothing
to do with the CharData production. Somebody who was messing around
with Xerces had absolutely no clue, it seems. The earlier you tell them,
the better, I guess.
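
For reference, the round trip John describes can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
"cp1252" is assumed to be the ANSI code page in question, and
buggy_check is a hypothetical reconstruction of the kind of mistake
Francois suspects, not actual Xerces code.

```python
# Assumption: the "ANSI code page" is Windows-1252 (Python codec "cp1252").
raw = b"\x82"
ch = raw.decode("cp1252")            # the single character U+201A
utf8 = ch.encode("utf-8")            # three bytes: E2 80 9A
utf16be = ch.encode("utf-16-be")     # two bytes: 20 1A


def is_xml_char(cp: int) -> bool:
    """Char production of XML 1.0, applied to a code point."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def correct_check(text: str) -> bool:
    # Validate whole code points, as the production intends.
    return all(is_xml_char(ord(c)) for c in text)


def buggy_check(text: str) -> bool:
    # Hypothetical bug: validate each byte of the UTF-16 form separately,
    # so the low byte 0x1A of U+201A is mistaken for a control character.
    return all(is_xml_char(b) for b in text.encode("utf-16-be"))


print(utf8.hex(), utf16be.hex(), correct_check(ch), buggy_check(ch))
```

The character is legal when checked as a code point; only the
byte-wise check rejects it, which matches the reported error about
0x1A.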

Regards,    Martin.




>2) [Process] This doesn't concern the xml-editor list, which is not a
>discussion list, anymore, so I removed it from the recipients.
>
>Regards,
>
>--
>François

Received on Thursday, 27 February 2003 10:52:05 UTC