
RE: Possible XML and C14N errata

From: Martin Duerst <duerst@w3.org>
Date: Thu, 27 Feb 2003 10:48:47 -0500
To: Francois Yergeau <FYergeau@alis.com>, John Boyer <JBoyer@PureEdge.com>, Joseph Reagle <reagle@w3.org>, w3c-ietf-xmldsig@w3.org

At 10:19 03/02/27 -0500, Francois Yergeau wrote:
>John Boyer wrote:
> > ...Xerces code base from Oct. 2002...
> >
> > For example, if I place byte 0x82 from the ANSI code page
> > into content, it gets translated to Unicode 0x201A, which our
> > C14N implementation then encodes with the proper 3 byte UTF-8
> > sequence.
> >
> > But when Xerces reads the UTF-8, it reconstitutes the Unicode
> > 0x201A, then throws an error saying that 0x1A is illegal...
>Two comments:
>1) I don't think this has anything to do with the CharData production in XML
>1.0 being confusing or not.  It looks like an ugly bug in Xerces which
>misinterprets U+201A as illegal because its memory representation (in UTF-16
>or UTF-32, presumably) contains the byte 0x1A.  I hope this bug does not
>extend beyond the part of the code that checks the CharData production...

I agree with Francois that this is a very ugly bug that has nothing
to do with the CharData production. Somebody who was messing around with
Xerces had absolutely no clue, it seems. The earlier you tell them,
the better, I guess.
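
To make the suspected failure concrete, here is a rough Python sketch. The
Char ranges are those of XML 1.0; buggy_is_xml_char is purely hypothetical,
my guess at the kind of mistake involved (testing only the low byte of the
code unit), and is not taken from the Xerces source.

```python
def is_xml_char(cp: int) -> bool:
    """XML 1.0 Char production, tested on the whole code point."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def buggy_is_xml_char(cp: int) -> bool:
    """Hypothetical bug: only the low byte of the code unit is tested."""
    return is_xml_char(cp & 0xFF)


# Byte 0x82 in the Windows "ANSI" code page (cp1252) is U+201A.
ch = bytes([0x82]).decode("cp1252")
cp = ord(ch)

# C14N output is UTF-8, where U+201A takes three bytes.
utf8 = ch.encode("utf-8")

print(f"U+{cp:04X} -> UTF-8 {utf8.hex(' ')}")
print("correct check:", is_xml_char(cp))
print("byte-wise check:", buggy_is_xml_char(cp))
```

The correct check accepts U+201A because it tests the full code point; the
byte-wise variant rejects it because 0x1A on its own is a forbidden control
character.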

Regards,    Martin.

>2) [Process] This doesn't concern the xml-editor list, which is not a
>discussion list, anymore, so I removed it from the recipients.
Received on Thursday, 27 February 2003 10:52:05 UTC
