Konrad's notes on Canonicalization

Canonicalization notes entered by Konrad into chat during the 14 January
F2F, related to ACTION-175

http://preview.tinyurl.com/C14n-Intro

Some general stuff about C14n:

2.5 Canonicalization
"Canonicalizing XML is hard!" (Tim Bray)
To be able to digest XML, we need a binary representation or
serialization, because only a series of bytes (a.k.a. octets) can be
signed. Certain aspects of XML's serial representation are left open,
so a canonical and reproducible representation is required.

The goal of canonicalization is to remove any information that is
certainly considered insignificant, and to define an unambiguous
representation for aspects that can be represented in various
ways. Such insignificant variations range from character encoding,
line breaks, order of attributes, and whitespace in tags and between
attributes, to unused namespace declarations and value normalizations
based on a DTD or Schema. Higher forms of canonicalization include the
more primitive ones.
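
A small illustration of why the octets, not the "XML", get signed. This
is a sketch that assumes Python 3.8+ and uses the standard library's
`xml.etree.ElementTree.canonicalize`, which implements Canonical XML
2.0, a later W3C algorithm than the ones listed in these notes. It is
used here only because it normalizes the same kinds of variation
(attribute order, quote style, whitespace inside tags, empty-element
syntax). The two sample documents are made up.

```python
import hashlib
import xml.etree.ElementTree as ET

# Two serializations of the same content: attribute order, quote style,
# whitespace inside the start tag and empty-element syntax all differ.
doc_a = '<doc b="2" a="1"/>'
doc_b = "<doc  a='1'   b='2'></doc>"


def sha256_hex(text: str) -> str:
    """Digest the UTF-8 octets of a serialization."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Digesting the raw octets: the digests differ although the XML is "the same".
raw_equal = sha256_hex(doc_a) == sha256_hex(doc_b)

# Canonicalize first (C14N 2.0 in the stdlib), then digest.
c14n_a = ET.canonicalize(doc_a)
c14n_b = ET.canonicalize(doc_b)
canonical_equal = sha256_hex(c14n_a) == sha256_hex(c14n_b)

print(raw_equal)        # False
print(c14n_a)           # <doc a="1" b="2"></doc>
print(canonical_equal)  # True
```

Any signer and verifier that agree on the canonicalization step agree on
the digest, regardless of how an intermediary reserialized the document.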

The following forms of XML canonicalization can currently be found in
standards, drafts and other sources. They are listed here in order of
sophistication, from simple to complex:

* Minimal Canonicalization (MC14n)
* Canonical XML Version 1.0 (C14n)
* Canonical XML Version 1.1 (C14n11), fixing issues analyzed by us and the XML Core Working Group (WG)
* Exclusive XML Canonicalization Version 1.0 (Exc-C14n)
* Schema Centric XML Canonicalization Version 1.0 (ScC14n)

http://tinyurl.com/Why-C14n-is-inefficient :

Namespace Nodes - A namespace node N is ignored if the nearest
ancestor element of the node's parent element [O] that is in the
node-set has a namespace node in the node-set with the same local
name and value as N. Otherwise, process the namespace [. . . ]

replacing this text with:

Namespace Nodes - To process a namespace node [N], find the first
output ancestor element [A] of the node's owning element [O], in
reverse document order, that has an output namespace node [Na] with
the same local name as [N] (i.e. declaring the same prefix), where
both [A] and [Na] are in the node-set. If [N] and [Na] have the same
value, [N] is ignored; otherwise, process the namespace [. . . ]

Simple spec changes to C14N would help with namespace handling
(namespace handling is the big problem).

Consider adding some constraints on how nodes are connected in the
input to C14N; that could help simplify things too. There are always
some types of node-sets that require you to keep all the namespace
prefixes, so you can't just use a simple stack model because of these
edge cases. This spec change targets the problems with canonicalizing
namespace nodes.
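
Under the connectedness constraint suggested above (every output
element's parent is also output, with all of its namespace nodes), a
plain stack model is enough: compare each element's bindings with its
parent's and render only the differences. A hypothetical Python
sketch: the tuple-based tree and `render_namespaces` are made up for
illustration, and the edge cases mentioned (disconnected node-sets,
filtered namespace nodes) are exactly what it does not handle.

```python
# Hypothetical tree model: (name, in-scope namespace nodes {prefix: uri}, children).
# Assumes a *connected* node-set: every output element's parent is output too,
# together with all of its namespace nodes.


def render_namespaces(node, parent_scope=None, out=None):
    """Return [(element name, declarations to render)] in document order."""
    parent_scope = parent_scope or {}
    out = [] if out is None else out
    name, in_scope, children = node
    # Render only bindings that differ from the output parent's bindings.
    needed = {p: u for p, u in sorted(in_scope.items()) if parent_scope.get(p) != u}
    out.append((name, needed))
    for child in children:  # the call stack doubles as the namespace stack
        render_namespaces(child, in_scope, out)
    return out


tree = ("root", {"a": "urn:a"}, [
    ("child", {"a": "urn:a", "b": "urn:b"}, [
        ("grandchild", {"a": "urn:x", "b": "urn:b"}, []),
    ]),
])

for name, decls in render_namespaces(tree):
    print(name, decls)
# root {'a': 'urn:a'}
# child {'b': 'urn:b'}
# grandchild {'a': 'urn:x'}
```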

https://online.tu-graz.ac.at/tug_online/voe_main2.getVollText?pDocumentNr=90836#page=60

Suggests that there could be a C14N v1.2 that is smarter at handling
namespace nodes.

Exc-C14n suffers from not inheriting xml:base, xml:space and other
inheritable attributes. Exc-C14n is, however, good at processing
namespace nodes.
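
For an orphaned apex element (in the node-set while its parent is not),
inclusive C14N 1.0 renders the xml:* attributes inherited from its
ancestors, whereas Exc-C14n does not. The hypothetical helper below,
built on Python's `xml.etree.ElementTree`, only computes which
attributes would be inherited. It deliberately ignores the special
treatment of xml:id and xml:base introduced in C14N 1.1.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def inherited_xml_attrs(root: ET.Element, apex: ET.Element) -> dict:
    """xml:* attributes an orphaned apex inherits from its ancestors (nearest wins)."""
    parent = {child: p for p in root.iter() for child in p}  # ElementTree has no parent links
    inherited = {}
    node = parent.get(apex)
    while node is not None:
        for key, value in node.attrib.items():
            if key.startswith(XML_NS) and key not in apex.attrib:
                inherited.setdefault(key, value)  # keep the nearest ancestor's value
        node = parent.get(node)
    return inherited


doc = ET.fromstring(
    '<root xml:lang="en" xml:space="preserve">'
    '<body xml:lang="de"><p>Hallo</p></body></root>'
)
apex = doc.find("body/p")
print(inherited_xml_attrs(doc, apex))
# {'{http://www.w3.org/XML/1998/namespace}lang': 'de',
#  '{http://www.w3.org/XML/1998/namespace}space': 'preserve'}
```

Signing only the p subtree with Exc-C14n loses both values, so a
verifier cannot tell from the canonical form that the text is German or
that whitespace was meant to be preserved.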

C14n is bad at namespace processing.

klanz2: whitespace handling should be dropped in the general case.
Try to establish some principles on how information should be dropped
when doing C14N.
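
Whether whitespace is dropped is a policy decision of the
canonicalization algorithm. As one data point, Python's standard
library (3.8+, implementing C14N 2.0 rather than the algorithms
discussed here) keeps all whitespace text by default and strips
leading and trailing whitespace only on request via its `strip_text`
option. The sample document is made up.

```python
import xml.etree.ElementTree as ET

doc = "<a>\n  <b> x </b>\n</a>"

# Default: all whitespace text is treated as significant and kept.
kept = ET.canonicalize(doc)
# Opt-in policy: strip leading/trailing whitespace of text content.
stripped = ET.canonicalize(doc, strip_text=True)

print(repr(kept))      # '<a>\n  <b> x </b>\n</a>'
print(repr(stripped))  # '<a><b>x</b></a>'
```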

https://online.tu-graz.ac.at/tug_online/voe_main2.getVollText?pDocumentNr=90836#page=101

Be liberal in what you require but conservative in what you do

Translated to XMLDSIG, this means: refer only to what is necessary, and
canonicalize as much as possible by default!

Saying something is application dependent or expensive is a mere
excuse for engineers not trying hard enough to figure out how to make
it robust and efficient. Designers of user agents such as browsers or
XMLDSIG applications have to act as proxies for their end users.
OASIS-DSS allows them to do this centrally in office environments, but
the same should apply to developers of decentralized applications as
well:

* Signers should be conservative in what they consider to be the
information they want to have secured.

* Intermediaries are invited to process signatures with whatever
tools they find appropriate. Be conservative in what you have to touch
for processing; in particular, do not touch signed documents, and use
opaque containers (subsection 3.2.3 on page 57) or, if available,
<xml> ... </xml> (subsection 4.1.1 on page 79).

* Intermediaries and verifiers: do not touch what was meant to be
signed, and hence has been signed, or the signature breaks.

* Verifiers: only what is signed (i.e. the DigestInput) should be
shown as signed or processed as signed.
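
The verifier rule can be sketched as: present exactly the octets that
were digested, and only when they match the DigestValue. A minimal
Python sketch under stated assumptions: SHA-256 is the DigestMethod,
the caller already has the DigestInput (the output of the Reference's
transforms), and the signature over SignedInfo is verified elsewhere.
`signed_view` is a hypothetical name.

```python
import base64
import hashlib
import hmac
from typing import Optional


def signed_view(digest_input: bytes, digest_value_b64: str) -> Optional[bytes]:
    """Return the octets to present as signed, or None if they don't match DigestValue.

    Assumes SHA-256 as DigestMethod; SignatureValue over SignedInfo is not checked here.
    """
    expected = base64.b64decode(digest_value_b64)
    actual = hashlib.sha256(digest_input).digest()
    if hmac.compare_digest(actual, expected):
        return digest_input  # show exactly what was digested, not the source document
    return None


digest_input = b'<doc a="1" b="2"></doc>'  # hypothetical output of the transforms
digest_value = base64.b64encode(hashlib.sha256(digest_input).digest()).decode()

print(signed_view(digest_input, digest_value) == digest_input)  # True
print(signed_view(b"<doc/>", digest_value))                     # None
```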

Balancing the trade-off between robustness, efficiency and simplicity
cannot simply mean resigning and hiding behind a "do not touch signed
documents at all" principle. This would hinder the spread, processing
and passing on of signed content, that is, of signed information
entities that can be trusted, across the Internet.

@best practices: It is good practice to use Exc-C14n only for
connected node-sets and to declare all used prefixes in the
InclusiveNamespacePrefixList.
In general, it is good practice to use Exc-C14n whenever possible,
especially if applications use namespace prefixes only to qualify
elements, and attributes whose owning element is also in the document
subset. Although document subsets (node-sets) containing attributes
but not their owning elements have questionable semantics and hence
should be avoided, they are nonetheless allowed in XPath and accepted
by Exc-C14n. Such node-sets are, however, not suitable for Exc-C14n
with respect to the definition of visibly utilized namespace
declarations. Adding #default to the InclusiveNamespacePrefixList
ensures the correct interpretation of QNames without a prefix.
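
The notion of visibly utilized namespace declarations can be sketched
in Python. This is a hypothetical simplification: `visibly_utilized`
and `candidate_ns` are illustrative names, the in-scope map uses the
`#default` token for the default namespace, and Exc-C14n's later check
against output ancestors is omitted. It shows why a prefix used only
inside content (here a QName in an attribute value) has to be listed
in the InclusiveNamespacePrefixList.

```python
from typing import Iterable

DEFAULT = "#default"  # token Exc-C14n uses for the default namespace


def visibly_utilized(tag: str, attrs: Iterable[str]) -> set:
    """Prefixes visibly utilized by an element: its own prefix and its attributes' prefixes."""
    prefix, sep, _ = tag.partition(":")
    used = {prefix if sep else DEFAULT}  # unprefixed element names use the default namespace
    for name in attrs:
        p, sep, _ = name.partition(":")
        if sep and p != "xml":  # unprefixed attributes are in no namespace
            used.add(p)
    return used


def candidate_ns(tag, attrs, in_scope, inclusive_prefixes=()):
    """In-scope namespace nodes Exc-C14n considers for output (before the ancestor check)."""
    wanted = visibly_utilized(tag, attrs) | set(inclusive_prefixes)
    return {p: u for p, u in in_scope.items() if p in wanted}


in_scope = {
    DEFAULT: "urn:example:doc",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
    "xsd": "http://www.w3.org/2001/XMLSchema",
    "unused": "urn:example:unused",
}
# <value xsi:type="xsd:string">: "xsd" appears only inside an attribute *value*.
print(sorted(candidate_ns("value", ["xsi:type"], in_scope)))
# ['#default', 'xsi']
print(sorted(candidate_ns("value", ["xsi:type"], in_scope, ["xsd"])))
# ['#default', 'xsd', 'xsi']
```

Without `xsd` in the list, the value `xsd:string` would no longer
resolve in the canonical output, while `unused` is dropped either way.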

from https://online.tu-graz.ac.at/tug_online/voe_main2.getVollText?pDocumentNr=90836#page=60




regards, Frederick

Frederick Hirsch
Nokia

Received on Monday, 2 February 2009 22:46:55 UTC