Action-539: review C14N2.0 from Meiko Jensen on 2010-04-19 (public-xmlsec@w3.org from April 2010)

From: Meiko Jensen <Meiko.Jensen@ruhr-uni-bochum.de>
Date: 19 Apr 2010 18:51:06 +0200
To: "XMLSec WG Public List" <public-xmlsec@w3.org>
Message-ID: <4BCC89FA.5030403@ruhr-uni-bochum.de>
Hi,

Having completed my Action-539, here are my findings (I apologize for not
yet having compared them with the remarks made by Scott and Ed).

Sec 2.2, "serialization": is it useful to restrict that value to two
predefined strings ("XML", "EXI")? I'd think there are other
serializations out there (binary XML, compressed XML or the like), so
maybe use a URI identifier instead? Just to be able to extend this in
future versions...
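To illustrate the suggestion, here is a minimal sketch of a URI-keyed registry used in place of a closed enumeration. All URIs and function names are hypothetical placeholders, not identifiers defined by the draft.

```python
# Hypothetical registry: serializations identified by URI instead of a
# closed enumeration. None of these URIs is defined by any specification.
SERIALIZATIONS = {
    "urn:example:c14n2:serialization:xml": "XML",
    "urn:example:c14n2:serialization:exi": "EXI",
}


def register_serialization(uri: str, name: str) -> None:
    """Let a later version (or a third party) add a new serialization."""
    if uri in SERIALIZATIONS:
        raise ValueError(f"serialization URI already registered: {uri}")
    SERIALIZATIONS[uri] = name


def resolve_serialization(uri: str) -> str:
    """Reject unknown URIs explicitly instead of silently defaulting."""
    try:
        return SERIALIZATIONS[uri]
    except KeyError:
        raise ValueError(f"unsupported serialization: {uri}") from None
```

Adding, say, a binary XML format then needs no change to the enumeration: `register_serialization("urn:example:c14n2:serialization:binxml", "binaryXML")`.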

Sec 2.2, "expandEntities": s/predefined entites/predefined entities/

Sec 2.2, "expandEntities": s/2 GB is length/2 GB in length/

Sec 2.2 (end): Are these three named parameter sets going to be the only
ones? How are they referred to? Do they just list the deviations
from the defaults, or must they be completed? In any case, I'd add one
profile for "streaming-based canonicalization", setting all options to
the values that best support streaming.
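One possible reading of "listing the deviations" is sketched below: a named set is merged over the defaults to yield a complete parameter set. The parameter names are taken from the draft under review; the default values and `EXAMPLE_PROFILE` are placeholders assumed for this sketch, not the draft's actual values.

```python
# Assumed defaults: parameter names from the draft, values are placeholders.
DEFAULTS = {
    "serialization": "XML",
    "expandEntities": True,
    "trimTextNodes": False,
    "xsiTypeAware": False,
    "prefixRewrite": "none",
}

# Hypothetical named set that lists only its deviations from DEFAULTS.
EXAMPLE_PROFILE = {"trimTextNodes": True, "prefixRewrite": "sequential"}


def complete_parameter_set(deviations: dict) -> dict:
    """Complete a named set by filling every missing parameter from DEFAULTS."""
    unknown = set(deviations) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # In a dict merge the later mapping wins, so the named set overrides defaults.
    return {**DEFAULTS, **deviations}
```

Under this reading a named set never has to repeat the defaults, but the spec would have to say so explicitly.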

2.3 "Canonicalize each subtree" says "...to visit all the child
nodes...". Do you mean descendant nodes here?
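The two wordings only coincide if the visit recurses. A small sketch with Python's `xml.etree.ElementTree` (elements only, text nodes ignored) shows the difference:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<A><B><C/></B><D/></A>")

# Direct children only.
children = [e.tag for e in doc]

# iter() walks the subtree in document order and includes doc itself; drop it.
descendants = [e.tag for e in doc.iter()][1:]


def visit(elem, out):
    # Visiting each child *recursively* reaches every descendant.
    for child in elem:
        out.append(child.tag)
        visit(child, out)
    return out
```

Here `children` is `["B", "D"]`, while `descendants` and `visit(doc, [])` are both `["B", "C", "D"]`.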

2.3 "Element nodes": s/bracket >/bracket (>)/

2.3 "Attribute nodes": s/xsiTypeAware="true./xsiTypeAware="true"./

2.3 "Namespace Nodes": I found this confusing, as the explanation is only
given at the end of 2.3. It might be better placed here...

2.3 "Text Nodes": does the "trim" also apply to ignorable whitespace
(e.g. in between "<A>   <B/>")? If yes, the canonicalized XML will be a
whole document on a single line (apart from newlines in between
non-whitespace characters). I have no objections to this, but I see a
source of misunderstanding here. Maybe we should make this explicit.

Besides, what is the use of trim in the other case? Is it really needed
for "<A>  this is text  </A>"? Do we have to differentiate two distinct
modes here, one trimming *all* text nodes, one only removing
pure-whitespace text nodes?
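The two candidate modes can be contrasted in a short sketch. It uses Python's `xml.etree.ElementTree`, where text nodes live in `.text` and `.tail`. The function names are hypothetical, and `ET.tostring` is not a canonical serializer, so the output only illustrates the trimming effect.

```python
import xml.etree.ElementTree as ET

SOURCE = "<A>   <B/>  this is text  </A>"


def trim_all(text):
    # Mode 1 (hypothetical): trim every text node; whitespace-only nodes become empty.
    return text.strip() if text is not None else None


def drop_whitespace_only(text):
    # Mode 2 (hypothetical): remove whitespace-only text nodes, keep other text as-is.
    if text is None or text.strip() == "":
        return None
    return text


def canonical_text(source, mode):
    root = ET.fromstring(source)
    for elem in root.iter():
        elem.text = mode(elem.text)   # text directly inside elem
        elem.tail = mode(elem.tail)   # text following elem's end tag
    return ET.tostring(root, encoding="unicode")
```

With `trim_all` the result is `<A><B />this is text</A>`. With `drop_whitespace_only` it is `<A><B />  this is text  </A>`: the ignorable whitespace disappears in both modes, but only the first mode touches "this is text".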

2.3 "Text Nodes": s/xml:space=preserve/xml:space="preserve"/
s/declaration is in context/declaration in context/
s/into one/into one./

2.4.1 s/canoicalization/canonicalization/
s/which would break/would break/

2.4.2 2nd example: what is the xml:lang="fr" good for? The example is
complex enough; I'd vote for removing as much as possible here...

Is there a rationale behind the use of empty lines in the examples? They
tend to move...

2.5 s/whose locaName is/whose localName is/
s/begin declared/being declared/

2.5 "default namespace": does this imply that the canonicalized XML
contains something like the attribute xmlns:="http://default"? That
colon would irritate me!
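In Namespaces in XML the default namespace is declared with a plain `xmlns` attribute, with no colon. A minimal sketch of serializing one declaration (the helper names are hypothetical):

```python
def _escape_attr(value: str) -> str:
    # Escape the characters that may not appear literally in a "-quoted attribute.
    return value.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")


def namespace_declaration(prefix: str, uri: str) -> str:
    """Serialize one namespace declaration as an attribute."""
    if prefix == "":
        # Default namespace: the attribute name is just "xmlns".
        return f'xmlns="{_escape_attr(uri)}"'
    return f'xmlns:{prefix}="{_escape_attr(uri)}"'
```

`namespace_declaration("", "http://default")` yields `xmlns="http://default"`. Undeclaring the default namespace is likewise `xmlns=""`.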

2.5 "visibly utilized": s/its means it is/it means it is/ or something like
that. Regarding the TBD item here: if used correctly, this would fend off the
namespace injection issue. However, it requires very careful
configuration at signature creation, and whoever configures it
must be aware of the issue. This is not something we could address with
reasonable default values. For all types of XPath use we know about
(i.e. IncludedXPath and ExcludedXPath from XMLDSIG2.0), I'd strongly
recommend requiring this explicitly.

2.5 algorithm for namespace output: "an element between E_i and E": do
you mean E_j here? And shouldn't E itself be checked as well? I'm also
confused about the algorithm list. Maybe end the description with something
like "4. return the list of namespace declarations left on the list",
just to make clear that the following passages (on prefixRewrite) do not
belong to the algorithm. I found that part confusing.
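A sketch of what the list could look like with E itself checked and an explicit final step. It is modeled on the "visibly utilized" rule of Exclusive XML Canonicalization, not transcribed from the draft. The `Element` class and both functions are hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Element:
    # Hypothetical, simplified element model for this sketch only.
    prefix: str                                           # "" for an unprefixed name
    attr_prefixes: List[str] = field(default_factory=list)
    decls: Dict[str, str] = field(default_factory=dict)   # declared on this element
    parent: Optional["Element"] = None


def in_scope_uri(elem: Element, prefix: str) -> Optional[str]:
    # Check E itself first, then walk up through its ancestors.
    node: Optional[Element] = elem
    while node is not None:
        if prefix in node.decls:
            return node.decls[prefix]
        node = node.parent
    return None


def namespaces_to_output(elem: Element, rendered: Dict[str, str]) -> List[Tuple[str, str]]:
    """rendered maps prefix -> URI already emitted by E's output ancestors."""
    # 1. Prefixes visibly utilized by E: its own name and its attributes
    #    (unprefixed attributes are in no namespace, so they are skipped).
    utilized = {elem.prefix} | {p for p in elem.attr_prefixes if p}
    result = []
    for prefix in sorted(utilized):
        # 2. Binding in scope at E, with E itself included in the search.
        uri = in_scope_uri(elem, prefix)
        if uri is None:
            if prefix:
                continue  # undeclared prefix: not namespace-well-formed, skipped
            uri = ""      # no default namespace in scope
        # 3. Keep it unless output ancestors already rendered the same binding.
        already = rendered.get(prefix, "" if prefix == "" else None)
        if already != uri:
            result.append((prefix, uri))
    # 4. Return the namespace declarations left on the list.
    return result
```

A caller would pass `{**rendered, **dict(result)}` down to the children of E.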

2.5 prefix rewriting: we've discussed this already; I'd just like
to add that, besides PFC14N, I'd prefer the digest method, even though it
imposes a strong requirement to use exactly SHA1, rather than doing all
this nasty Base64+X stuff. I don't want to see this made configurable,
though. I also once came across something like
"{http://some.namespace.uri}localName", but this again results in
non-well-formed XML.
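A minimal sketch of a digest-based prefix, assuming the prefix is derived from the SHA-1 hex digest of the namespace URI. The leading "n" is an assumption of this sketch: a hex digest may start with a digit, which is not a legal NCName start character. Curly braces are not legal in XML names either, which is why the "{uri}localName" form is not well-formed.

```python
import hashlib


def digest_prefix(namespace_uri: str) -> str:
    """Hypothetical helper: derive a prefix from SHA-1 of the namespace URI."""
    digest = hashlib.sha1(namespace_uri.encode("utf-8")).hexdigest()
    # Prepend a letter so the prefix is a valid NCName even if the digest
    # starts with a digit.
    return "n" + digest
```

Unlike sequential numbering, the resulting prefix depends only on the URI and not on document order. The cost is a fixed hash algorithm and 41-character prefixes.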

2.6 s/all of E ancestors/all of E's ancestors/
s/amd so on/and so on/

2.7 s/combines it/combines them/ (I didn't go into it, since it
extends the external RFC document. It looks rather complicated, though,
and is a likely source of confusion for implementers.)

Section 3: schema: s/name="trimTextNodes "/name="trimTextNodes"/
s#<xs:enumeration value=""/>##

Section 4: s/psuedo/pseudo/

4.2: s/addXmlattribs(ancestorElem,
xmlattributeContext)/addXMLAttributes(ancestorElem, xmlattributeContext)/

I didn't investigate most of Section 4, though.

Section 6: QNames in content. Searching all text nodes for potential uses
of prefixes is a horribly bad idea. Besides the performance overhead,
you'll get weird matches, resulting in different namespace declarations
being covered within structurally identical XML documents. This is a major
source of confusion and of unexplainable signature invalidations.
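The effect is easy to reproduce with a naive scanner. The sketch below uses a hypothetical helper and a rough ASCII approximation of the QName production. Two documents with identical structure and in-scope namespaces, differing only in a score written as "a:b" versus "3:4", end up with different sets of "used" prefixes.

```python
import re

# Rough QName-like pattern: NCName ":" NCName (ASCII-start approximation).
QNAME_LIKE = re.compile(r"\b([A-Za-z_][\w.-]*):([A-Za-z_][\w.-]*)")


def prefixes_used_in_text(texts, in_scope):
    """Prefixes a naive content scan would treat as 'used' (hypothetical)."""
    found = set()
    for text in texts:
        for prefix, _local in QNAME_LIKE.findall(text):
            if prefix in in_scope:
                found.add(prefix)
    return found


IN_SCOPE = {"a": "http://example.org/a"}
```

`prefixes_used_in_text(["Ratio a:b"], IN_SCOPE)` returns `{"a"}`, while `prefixes_used_in_text(["Ratio 3:4"], IN_SCOPE)` returns an empty set, so the first document would carry an extra namespace declaration.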

6: significant whitespace: I'll discuss this in a separate mail; it seems
to me like something more complex.

A. Remove Dot Segments: this is about the algorithm of Section 2.7, not
2.4, right? Do we really need it here at such length and complexity?
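For comparison, the remove_dot_segments algorithm of RFC 3986, Section 5.2.4, is compact. Below is a Python transcription written for this review, not taken from the draft. The output buffer is kept as a list of segments, each carrying its leading "/" if any, so "remove the last segment and its preceding /" is a single pop().

```python
def remove_dot_segments(path: str) -> str:
    """RFC 3986, Section 5.2.4, rules A to E."""
    inp = path
    out = []  # output segments, each with its leading "/" (if any)
    while inp:
        if inp.startswith("../"):          # A
            inp = inp[3:]
        elif inp.startswith("./"):         # A
            inp = inp[2:]
        elif inp.startswith("/./"):        # B: "/./x" -> "/x"
            inp = inp[2:]
        elif inp == "/.":                  # B
            inp = "/"
        elif inp.startswith("/../"):       # C: "/../x" -> "/x", drop last segment
            inp = inp[3:]
            if out:
                out.pop()
        elif inp == "/..":                 # C
            inp = "/"
            if out:
                out.pop()
        elif inp in (".", ".."):           # D
            inp = ""
        else:                              # E: move the first segment to output
            start = 1 if inp.startswith("/") else 0
            end = inp.find("/", start)
            if end == -1:
                end = len(inp)
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)
```

The two examples given in the RFC, "/a/b/c/./../../g" and "mid/content=5/../6", reduce to "/a/g" and "mid/6" respectively.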

best regards

Meiko

-- 
Dipl.-Inf. Meiko Jensen
Chair for Network and Data Security 
Horst Görtz Institute for IT-Security 
Ruhr University Bochum, Germany
_____________________________
Universitätsstr. 150, Geb. IC 4/150
D-44780 Bochum, Germany
Phone: +49 (0) 234 / 32-26796
Telefax: +49 (0) 234 / 32-14347
http://www.nds.rub.de
Received on Monday, 19 April 2010 16:51:38 UTC