Changes to C14N 2.0 from Pratik Datta on 2011-01-18 (public-xmlsec@w3.org from January 2011)

From: Pratik Datta <pratik.datta@oracle.com>
Date: Tue, 18 Jan 2011 08:42:17 -0800 (PST)
To: public-xmlsec@w3.org
Message-ID: <e2e856da-82ee-4f0e-8353-9d2a36080342@default>

Made the following changes:

. ACTION-759 : Added section "1.4.4 Portability"

It should be possible to canonicalize a subdocument in such a way that the signature doesn't break when the subdocument is moved into a completely different XML document. This is also the goal of Exclusive Canonicalization [[XML-EXC-C14N]], which mostly satisfies this requirement except in the case of namespace prefixes embedded in content. This specification builds on exclusive canonicalization and solves the problem of namespace prefixes in content.

. Removed the phrase " - it also introduces a minimal canonicalization mode." from section "1.4.5 Simplicity", as it is no longer applicable

. Removed CURIE

. ACTION-763: Review ISSUE-198 and where algorithm should be placed

Note: The algorithm for prefix scanning doesn't cover all kinds of prefix embedding. For example, if a text node's value is a space-separated list of QNames, this algorithm will not detect the prefixes of these QNames. It will only detect two kinds of embedding: a) when the entire text node or attribute value is a QName, and b) when a text node is an XPath expression containing prefixes.
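To illustrate case a) and the limitation described in the note, here is a minimal Python sketch. The function name prefix_if_whole_value_is_qname is hypothetical, the NCName pattern is a simplification of the real production, and trimming surrounding whitespace is an assumption rather than something the draft states.

```python
import re
from typing import Optional

# Simplified NCName: starts with a letter or underscore, then letters,
# digits, underscore, dot or hyphen. The real production is more precise.
_NCNAME = r'[^\W\d][\w.-]*'
_QNAME = re.compile(rf'({_NCNAME}):({_NCNAME})')


def prefix_if_whole_value_is_qname(value: str) -> Optional[str]:
    """Return the prefix only when the ENTIRE value is one prefixed QName."""
    # Assumption: leading and trailing whitespace is ignored.
    match = _QNAME.fullmatch(value.strip())
    return match.group(1) if match else None


print(prefix_if_whole_value_is_qname("xsd:string"))          # xsd
print(prefix_if_whole_value_is_qname("xsd:string xsd:int"))  # None
```

The second call returns None because the value is a space-separated list rather than a single QName, which is exactly the kind of embedding the note says goes undetected.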

. Also put back the regular expressions in the prefix scanning algorithm. Here is the new text:

If there is an XPathElement subchild whose Name and NS attributes match E's local name and namespace respectively, then E is expected to have a single text node child containing an XPath 1.0 expression. Extract the prefixes from this XPath expression using the following algorithm. All of the extracted prefixes should be considered visibly utilized.

* Search for single colons : in the XPath expression, but do not consider single colons inside quoted strings. Double colons are used for axes, e.g. in self::node(), "self" is an axis name, not a prefix.

* The prefix will be present just before the single colon. Go backwards from the colon, skip whitespace, and extract the prefix by collecting characters until the first character that is not allowed in an NCName. E.g. in /soap : Body, extract "soap". The NCName production is defined in [XML-NAMES].

This can be evaluated using Perl-style regular expressions as follows. Note that the regular expressions here are provided as an example only; they are not normative.

1. First remove all single-quoted and double-quoted strings from the XPath expression, because prefixes cannot be present there, i.e. apply the substitutions s/"[^"]*"//g and s/'[^']*'//g. Removing the quoted strings eliminates false positives in the next step.

2. In the resulting string, search for single colons and get the word just before each colon, i.e. search for matches of m/([\w-_.]+)?\s*:(?!:)/. Note that prefixes follow the NCName production, i.e. they consist of alphanumeric characters, hyphens, underscores and dots, but cannot start with a digit, hyphen or dot. In an NCName, the allowed alphanumeric characters are not just ASCII, but any Unicode alphanumeric characters. However, the regular expression provided here is a very simplified form of the NCName production.
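The two steps above can be sketched in Python roughly as follows. This is an illustration only, not the draft's text: the function name extract_xpath_prefixes is hypothetical, and the pattern deliberately differs from the Perl expression quoted above in two ways, both assumptions of this sketch. The prefix group is mandatory, so the second colon of an axis separator such as self::node() cannot match with an empty prefix, and the first character is restricted to a letter or underscore.

```python
import re
from typing import List

# Step 1: quoted string literals (XPath 1.0 literals have no escape syntax).
_QUOTED = re.compile(r'"[^"]*"|\'[^\']*\'')

# Step 2: simplified NCName, optional whitespace, then a colon that is
# not followed by another colon (which would make it an axis separator).
_PREFIX = re.compile(r'([^\W\d][\w.-]*)\s*:(?!:)')


def extract_xpath_prefixes(xpath: str) -> List[str]:
    """Return the distinct prefixes found in an XPath 1.0 expression."""
    stripped = _QUOTED.sub("", xpath)
    prefixes: List[str] = []
    for match in _PREFIX.finditer(stripped):
        prefix = match.group(1)
        if prefix not in prefixes:  # keep first-seen order, no duplicates
            prefixes.append(prefix)
    return prefixes


print(extract_xpath_prefixes('/soap : Body'))   # ['soap']
print(extract_xpath_prefixes('self::node()'))   # []
print(extract_xpath_prefixes('/soap:Envelope/ns1:Item[@id="x:y"]'))
# ['soap', 'ns1']
```

In the last call the colon inside "x:y" is ignored because the quoted string is removed before the prefix search runs.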

http://www.w3.org/2008/xmlsec/Drafts/c14n-20/ Jan 18th version

Received on Tuesday, 18 January 2011 16:43:35 UTC