RE: New draft of Canonical XML 2.0 from Pratik Datta on 2010-02-25 (public-xmlsec@w3.org from February 2010)

From: Pratik Datta <PRATIK.DATTA@oracle.com>
Date: Thu, 25 Feb 2010 14:11:08 -0800 (PST)
To: XMLSec WG Public List <public-xmlsec@w3.org>
Cc: pratik.datta@oracle.com
Message-ID: <5037569f-030b-43eb-a170-aff07eee540e@default>
High level changes between C14N 1.1 and C14N 2.0   (ACTION-520)

 

.         C14N 2.0 includes both inclusive and exclusive canonicalization. Section 2.4 contains the material I copied over from the Exc C14N spec.

.         C14N 2.0 is parameterized: rather than having a separate URI for each variation, there is only one algorithm URI, and variations are represented as parameters.

.         Unlike C14N 1.x, the input to C14N 2.0 is not an XPath nodeset; rather, it is a list of included subtrees and a list of excluded subtrees or attributes. This is exactly the model used in Signature 2.0.

.         There are some differences in how xml: attributes are handled in C14N 1.0, C14N 1.1, and Exc C14N. C14N 2.0 has parameters to emulate any of these modes. The "exclusiveMode" parameter controls only whether namespaces are treated exclusively; it does not affect the handling of xml: attributes.

.         C14N 2.0's processing model is based on a tree walk, rather than the extremely inefficient nodeset model in C14N 1.x. Most practical implementations of C14N 1.x already do the tree walk, because that is really the most efficient way to do canonicalization. This processing model is along the lines of the non-normative processing model mentioned in Section 3.1 of exclusive canonicalization. The only major difference is that the Exc C14N algorithm walks the whole tree, whereas the C14N 2.0 algorithm walks only the nodes that are to be canonicalized.

.         C14N 2.0 has some extra features:

o   Removal of extra whitespace: As we all know, C14N 1.1 considers all whitespace inside element content as significant. This is a source of a lot of confusion. Unfortunately, it is very difficult to determine whether whitespace is really significant without knowing the schema, so we settled on the simple option of "trimming", i.e. removing leading and trailing whitespace from all text nodes. Whether to do this or not is controlled by a parameter.

o   QNames in content: Exclusive canonicalization needs to determine whether a namespace declaration is actually used. This is difficult if there is a QName in content. C14N 2.0 is enhanced to look for QNames in content. A) It looks inside the xsi:type attribute, which covers about 90% of the use cases for QNames in content. B) It looks at the IncludedXPath and ExcludedXPath, to prevent the wrapping attack that Meiko mentioned. C) Optionally, it can look at the other elements/attributes too. However, there is no reliable way to search content for a QName. I am suggesting that we just do a simple regular expression search for alphabetic characters followed by a colon.

o   Namespace prefix rewriting: Namespace prefix rewriting was considered before C14N 1.0 (see http://www.w3.org/TR/2000/WD-xml-c14n-20000119.html#sec-namespaces), but it was removed because of QNames in content. I have put it back in C14N 2.0. This scheme uses sequential prefixes: n0, n1, n2, ... I also put in an alternative scheme, from Ed's presentation in the 2007 workshop, where the prefix is calculated from the digest of the URI.
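
To make the three extra features above concrete, here is a rough Python sketch. It is not taken from the draft: the function names, the exact regular expression, and the choice of SHA-256 truncated to 8 hex digits for the digest-based prefix are all assumptions made for illustration only.

```python
import hashlib
import re

# Assumed pattern for the "simple regular expression search":
# a run of ASCII letters immediately followed by a colon.
QNAME_PREFIX_RE = re.compile(r"([A-Za-z]+):")


def trim_text(text):
    """Trimming: drop leading and trailing XML whitespace (space, tab, CR, LF)."""
    return text.strip(" \t\r\n")


def prefixes_in_content(text):
    """Return every candidate namespace prefix found in a piece of content."""
    return set(QNAME_PREFIX_RE.findall(text))


def sequential_prefixes(uris):
    """Map each namespace URI to n0, n1, n2, ... in order of first appearance."""
    mapping = {}
    for uri in uris:
        if uri not in mapping:
            mapping[uri] = "n" + str(len(mapping))
    return mapping


def digest_prefix(uri, length=8):
    """Derive a prefix from a digest of the URI (hash and length are assumptions)."""
    digest = hashlib.sha256(uri.encode("utf-8")).hexdigest()
    return "n" + digest[:length]
```

Note that prefixes_in_content("see http://example.org") reports "http" as a prefix, which is exactly the kind of false positive meant by "no reliable way to search content for a QName".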

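The tree-walk model with included and excluded subtrees, described in the list above, can be sketched roughly as follows. This is only an illustration using Python's xml.etree.ElementTree, not the draft's pseudocode; the function name canonical_walk is hypothetical, and the sketch only lists element names instead of producing canonical output.

```python
import xml.etree.ElementTree as ET


def canonical_walk(included, excluded):
    """Yield element tags in document order for each included subtree,
    skipping any excluded subtree entirely."""
    excluded_ids = {id(e) for e in excluded}

    def walk(elem):
        if id(elem) in excluded_ids:
            return  # do not descend into an excluded subtree
        yield elem.tag
        for child in elem:
            yield from walk(child)

    for root in included:
        yield from walk(root)


doc = ET.fromstring("<a><b><c/></b><d><e/></d></a>")
b, d = doc.find("b"), doc.find("d")

# Whole document included, subtree rooted at <d> excluded.
visited_all = list(canonical_walk([doc], [d]))

# Only the subtree rooted at <b> included: the rest of the tree is never visited.
visited_b = list(canonical_walk([b], []))
```

The second call shows the difference from the Exc C14N model: the walk starts at the included subtree and never touches the other nodes.
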
 

 

Pratik

 

From: Pratik Datta 
Sent: Sunday, February 14, 2010 9:36 AM
To: XMLSec WG Public List
Subject: New draft of Canonical XML 2.0

 

Here is the new version of Canonical XML 2.0

http://www.w3.org/2008/xmlsec/Drafts/c14n-20/

 

Changes

.         Converted to respec format

.         All the pseudocode has been made non-normative and moved to section 4.

.         Sections 2.3, 2.5 and 2.6 describe the processing model in prose rather than pseudocode.

.         Section 2.4 has been copied over from exclusive canonicalization; it describes why we need exclusive canonicalization. There is a slight change in this section with respect to xml: attribute inheritance.

.         Section 3 is a new section, which describes the schema for the canonicalization parameters.

 

Pratik
Received on Thursday, 25 February 2010 22:12:22 UTC