streaming canonicalization proposal

For my ACTION-562, I checked all the parameters of C14N20 with respect
to their streamability. Here are my findings and suggestions:

#1
Regarding "ignoreDTD" and "expandEntities", a streaming parser runs into
trouble if an entity or default attribute is used prior to its
declaration in document order. So if "ignoreDTD" is set to false, or
if "expandEntities" is set to true, there are cases that make streaming
canonicalization very complex.
Hence, for the streaming canonicalization scheme I'd recommend setting
the defaults to:
**ignoreDTD="true";expandEntities="false"**
and I'd suggest adding a sentence such as "Entities and internal DTD
declarations SHOULD NOT be used if the streaming canonicalization
parameter set is applied."

#2
The "trimTextNodes" parameter is a little tricky. If set to "true", it
requires some content-generated events ("characters()" for SAX) to
vanish completely, and others to be modified in order to remove leading
and trailing whitespace. There is a risk of getting this wrong: for
instance, one may be misled into removing all "characters()" events that
contain only whitespace characters (which is wrong if such an event is
enclosed by two other "characters()" events containing non-whitespace
characters).
Another issue with implementing "trimTextNodes" is that SAX parsers
sometimes create "characters()" events and sometimes
"ignorableWhitespace()" events, depending on whether an XML Schema is
given or not.
Hence, for the streaming canonicalization scheme I'd recommend setting
the defaults to:
**trimTextNodes="false"**
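
To illustrate the pitfall described above, here is a minimal sketch of
how trimming could be done on top of SAX without dropping interior
whitespace. It uses Python's xml.sax; the class name TrimmingHandler and
the helper trimmed_events() are made up for this example, and it
deliberately ignores xml:space="preserve", comments, and processing
instructions. The idea is to buffer adjacent "characters()" and
"ignorableWhitespace()" events and to trim only once the next
non-text event shows where the text node ends:

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TrimmingHandler(ContentHandler):
    """Hypothetical handler: joins adjacent text events, then trims them."""

    def __init__(self):
        super().__init__()
        self.events = []    # simplified output event list
        self._pending = []  # text chunks of the current text node

    def _flush_text(self):
        # Trim only at the boundaries of the whole text node, so that
        # whitespace between two non-whitespace chunks is preserved.
        text = "".join(self._pending).strip(" \t\r\n")
        self._pending.clear()
        if text:
            self.events.append(("text", text))

    def characters(self, content):
        self._pending.append(content)

    def ignorableWhitespace(self, whitespace):
        # Treated like ordinary text, so the result does not depend on
        # which of the two callbacks the parser chooses.
        self._pending.append(whitespace)

    def startElement(self, name, attrs):
        self._flush_text()
        self.events.append(("start", name))

    def endElement(self, name):
        self._flush_text()
        self.events.append(("end", name))


def trimmed_events(xml_bytes):
    handler = TrimmingHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events
```

Note that this buffers a complete text node before emitting it, which is
exactly the kind of caching a strictly streaming implementation would
want to avoid for very large text nodes.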

#3
Regarding "xml*Ancestors", I'd say that we should try to avoid any
requirement for caching as far as possible.
Hence, for the streaming canonicalization scheme I'd recommend setting
the defaults to:
**xml*Ancestors="none"**
and I'd suggest adding a sentence such as "An application SHOULD NOT use
any of the xml*Ancestors attributes when applying the streaming
canonicalization parameter set."

#4
The "sortAttributes" parameter actually made me ponder. On the one hand,
sorting attributes requires parsing, caching, and applying a sort
algorithm to all attributes of an element. Hence, this is contrary to
the idea of streaming (i.e. processing every token immediately once it
is parsed completely). On the other hand, the SAX parsers I know about
automatically do this, whether you want it or not. Admittedly, this
opens a minor vulnerability to Denial-of-Service attacks (a huge or
infinite number of attributes may DoS the parser), but since it has
become the de-facto standard, and since I would not assume that all SAX
parsers return the set of attributes in document order, I'd suggest
keeping the default of:
**sortAttributes="true"**
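
As a rough illustration of why sorting costs little extra once the
parser has already collected the attributes of a start tag, here is a
small sketch using Python's xml.sax in namespace mode. The names
SortedAttrHandler and sorted_attributes() are invented for this example.
The sort key (namespace URI first, local name second, attributes without
a namespace first) follows the general C14N ordering idea; it is not
meant as a complete implementation of the canonical attribute order:

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class SortedAttrHandler(ContentHandler):
    """Hypothetical handler: records each element's attributes, sorted."""

    def __init__(self):
        super().__init__()
        self.seen = []  # one sorted attribute list per start tag

    def startElementNS(self, name, qname, attrs):
        # attrs keys are (namespace URI, local name); the URI is None
        # for attributes without a namespace, so map it to "" for sorting.
        keys = sorted(attrs.keys(), key=lambda k: (k[0] or "", k[1]))
        self.seen.append([(uri, local, attrs[(uri, local)])
                          for uri, local in keys])


def sorted_attributes(xml_bytes):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = SortedAttrHandler()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.seen
```

The attributes of one element are still held in memory all at once,
which is where the Denial-of-Service concern mentioned above comes from.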

#5
Regarding all other parameters I didn't see a major argument for or
against any of the given options. Personally, I'd suggest to set
"exclusiveMode" to true, as this is an important requirement for the Web
Services use case (which I see as a major domain of application of the
streaming canonicalization parameter set). However, I do not insist on
this, since inclusiveCanonicalization can be implemented in a streaming
way as well.

regards

Meiko

Received on Friday, 30 April 2010 10:09:29 UTC