Re: Canonical XML error from Frederick.Hirsch@nokia.com on 2011-09-06 (public-xmlsec@w3.org from September 2011)

From: <Frederick.Hirsch@nokia.com>
Date: Tue, 6 Sep 2011 15:43:31 +0000
To: <steve.derose@openamplify.com>
CC: <Frederick.Hirsch@nokia.com>, <jboyer@PureEdge.com>, <public-xmlsec@we.org>, <w3c-ietf-xmldsig@w3.org>, <public-xmlsec@w3.org>
Message-ID: <69C5491D-8DFD-4C34-8827-5145D7C8C82C@nokia.com>

Steve

The Canonical XML Recommendation [1] states in section 1.1, and details in section 2.1, that "CDATA sections are replaced with their character content". This means the characters that mark the end of a CDATA section ("]]>") are removed as part of replacing that section with its character content.
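This rule can be checked with Python's standard-library `xml.etree.ElementTree.canonicalize` (Python 3.8 and later). One assumption to keep in mind: that function implements Canonical XML 2.0 rather than the 1.0 Recommendation cited here, though it handles CDATA sections the same way. The `<code>` element below is a hypothetical input:

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical input: program text kept in a CDATA section so that
# "<", "&" and ">" do not have to be escaped by hand.
source = "<code><![CDATA[if (a < b && c > d) x();]]></code>"

# Canonicalization drops the "<![CDATA[" and "]]>" delimiters and emits
# the character content, escaping &, < and > as ordinary text.
result = canonicalize(source)
print(result)  # <code>if (a &lt; b &amp;&amp; c &gt; d) x();</code>
```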

If you are asking how to present what looks like a CDATA section so that it is retained as text rather than replaced, then this is not a canonicalization question: the characters will be treated as ordinary text and not recognized as a CDATA section. For example, if the "<" that opens the CDATA section were escaped as &lt;, no CDATA section would be present, and canonical character encoding would occur in a uniform manner.
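A minimal sketch of the escaped case, again assuming Python 3.8+ and its C14N 2.0-based `canonicalize`. The two `<p>` documents are hypothetical inputs that spell the same CDATA-like text in different ways:

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical inputs: "<" and ">" are escaped, once with entity
# references and once with character references, so the parser sees
# ordinary character data rather than a CDATA section.
inputs = [
    "<p>&lt;![CDATA[x]]&gt;</p>",
    "<p>&#60;![CDATA[x]]&#62;</p>",
]

# The text is kept as-is and re-escaped uniformly, so both documents
# produce the same canonical form.
outputs = [canonicalize(doc) for doc in inputs]
for out in outputs:
    print(out)  # <p>&lt;![CDATA[x]]&gt;</p>
```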

As a consequence, no encoding needs to be specified and no erratum is needed.

Does this make sense?

regards, Frederick

Frederick Hirsch, Nokia
Chair XML Security WG

[1] http://www.w3.org/TR/2001/REC-xml-c14n-20010315

For the tracker, this should complete ACTION-833.

On Aug 30, 2011, at 9:20 AM, ext Steve DeRose wrote:

I recently discovered that the Canonical XML spec does not appear to specify which of several possible options to use to encode the literal string "]]>" in content. I have also checked the errata and cannot find this mentioned there.

This string marks the end of an XML CDATA marked section, so it must be escaped somehow when needed literally. It seems to me that the best choice, given other decisions in Canonical XML, is to express it as "]]&gt;". That is the method used in the source for the current edition of the XML Recommendation. But of course there are multiple alternatives, including at least:


    &#x5D;]>
    ]&#x5D;>
    ]]&#x3E;
    &#x5D;&#x5D;>
    &#x5D;]&#x3E;
    &#x5D;&#x5D;&#x3E;
    &#x5D;]&gt;
    &#x5D;&#x5D;&gt;


Clearly, if different users or applications encode the same intended content in different ways, that's a problem in the context of Canonical XML. Whether the string is common is irrelevant. Yet, there are contexts where this string naturally occurs: the most obvious are documents describing XML, and documents containing program code examples such as "a[b[0]]>1".
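As a rough check on whether the listed spellings actually diverge after canonicalization, the sketch below feeds each of them, plus the split-CDATA idiom, through Python's `canonicalize` (assuming Python 3.8+ and its C14N 2.0-based implementation; the `<p>` wrapper and the helper name `canonical_form` are hypothetical). Because references are resolved during parsing and every ">" in character data is then re-escaped as "&gt;", all of them should serialize identically:

```python
from xml.etree.ElementTree import canonicalize

# The spellings listed above, plus the plain "]]&gt;" form and the
# split-CDATA idiom, each embedded in the expression a[b[0]]>1.
SPELLINGS = [
    "]]&gt;",
    "&#x5D;]>",
    "]&#x5D;>",
    "]]&#x3E;",
    "&#x5D;&#x5D;>",
    "&#x5D;]&#x3E;",
    "&#x5D;&#x5D;&#x3E;",
    "&#x5D;]&gt;",
    "&#x5D;&#x5D;&gt;",
    "<![CDATA[]]]]><![CDATA[>]]>",  # split across two CDATA sections
]


def canonical_form(spelling: str) -> str:
    """Embed the spelling in a hypothetical <p> element and canonicalize it."""
    return canonicalize(f"<p>a[b[0{spelling}1</p>")


forms = {canonical_form(s) for s in SPELLINGS}
print(len(forms))         # 1: every spelling canonicalizes identically
print(next(iter(forms)))  # <p>a[b[0]]&gt;1</p>
```

The canonical output uses the "]]&gt;" form, matching the spelling proposed in this message.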

Please specify a single encoding for this string in Canonical XML documents.

Steve DeRose
sderose@acm.org

Received on Tuesday, 6 September 2011 15:46:19 UTC