ACTION-519: review c14n 2.0 draft from Scott Cantor on 2010-04-09 (public-xmlsec@w3.org from April 2010)

From: Scott Cantor <cantor.2@osu.edu>
Date: Fri, 9 Apr 2010 14:04:49 -0400
To: <public-xmlsec@w3.org>
Message-ID: <005b01cad80f$25215150$6f63f3f0$@2@osu.edu>

Comments on the March 4th draft:
http://www.w3.org/TR/2010/WD-xml-c14n2-20100304/

Mostly grammar and wordsmithing. I would be willing to just do a pass for a
lot of them.

Abstract:

Suggest rephrasing "incorporates an update to Exclusive..." to explain that
it's now a single algorithm for both. Perhaps reword first sentence to say
that it's a major rewrite of both Canonical XML 1.1 and Exclusive Canonical
XML 1.0.

Sec 1.3:

Is the last sentence still operational? It suggests XML-INFOSET is "under
development" but the reference seems to be to a W3C Rec.

Sec 1.4:

Suggest s/most/many, just to avoid over-promising.

Sec 1.4.1:

s/nodeset based/nodeset-based

s/cannot be solved/cannot be addressed
s/edge use cases/edge cases
s/input of the c14n alg/input to the c14n alg

s/Nodeset/nodeset
s/spec/specification
s/it only visits/only visits

Sec 1.4.2:

Reword:
A streaming implementation is required to be able to process very large
documents without holding them all in memory; it should be able to process
documents one chunk at a time.

Sec 1.4.3:

s/breakages/breakage

s/Remove leading/Optionally remove leading
s/content especially/content, particularly
s/Rewrite/Optionally rewrite

Sec 1.4.4:

s/depend a/depend on a
s/makes it very hard/increases the work required for
s/also it/it also

Sec 2.1:

The text says what the DOM Model XML subset is, but not the streaming case.
I'm actually somewhat unclear on what they formally are in any case. Is En
intended to connote an actual DOM node in a DOM tree? If so, I would clarify
that. If this isn't DOM, what is the equivalent for SAX? Is it essentially
an XPath that you have to dynamically evaluate as you go?
 
s/In a DOM model/In the DOM model
s/expressed as/expressed as:

s/If out of this list/(If out of this list
s/desclaration/declarations

Regarding the sentence on computing the XML subset, does this wording risk
implying that implementations need to actually pre-compute the subset?
Isn't that at odds with the goal of performing a simple tree walk? Should it
be reworded in more explanatory terms as "the XML subset consists of..."?

s/allow a high/allow for a high
s/allowing the essential/supporting the most essential
s/Specifically/Specifically:

s/purposely does not/does not
s/re-inclusion, i.e./re-inclusion; i.e.,
s/Think of it as a/It is effectively a
s/Reinclusion/Re-inclusion
s/attributes inheritance/attribute inheritance

s/
Exclusion is very limited, only complete subtrees and attribute nodes can be
excluded, other kinds of nodes like text nodes, comment nodes, PI nodes
cannot be excluded.
/
Exclusion is limited to complete subtrees and attribute nodes. Other kinds
of nodes (text, comment, PI) cannot be excluded.

s/
Even attribute exclusion is limited, namespace declaration and attributes in
XML namespace cannot be excluded.
/
Attribute exclusion is also limited, such that namespace declarations and
attributes from the xml namespace cannot be excluded.

s/1.x mode but not in this new model/1.x, but not in this version

Section 2.2:

s/
Instead of separate algorithms for each variant of canonicalization, this
specification goes with the approach of a single algorithm, which does
slightly different things depending on the parameters.
/
Instead of separate algorithms for each variant of canonicalization, this
specification takes the approach of a single algorithm subject to a variety
of parameters that change its behavior to address specific use cases.

Insert a sentence before the table:

The following is a list of the logical parameters supported by this
algorithm. The actual serialization that expresses the parameters in use may
be defined as appropriate to specific applications of this specification
(e.g., the <ds:CanonicalizationMethod> element in [XMLDSIG-CORE2]).

In trimTextNodes description, s/nodes descendants/node descendants

In serialization description, should "signed" be "canonicalized"?

Would it be clean enough (and simpler) to collapse the xmlXAncestors
parameters into a single parameter and just apply "combine" to only
xml:base? Is there a need to use different rules for different attributes?
Seems like the various "modes" sort of go together given how the earlier
algorithms work.

Regarding xsiTypeAware, I would still like to see this expanded to something
at least a little more generic and just allow a list of qualified node names
to treat as QName-valued. Or perhaps leave xsiTypeAware and just add a
separate parameter for this, if it's important for conformance to make this
one MTI but not the other. Speaking for myself, I don't know that I would
want to implement prefix rewriting, but I really could use the ability to
handle QNames in other places.

s/The defaults are set to result in canonical 1.1 with no comments/The
defaults are chosen for equivalence to Canonical XML 1.1 with comments
ignored.

I assume the "named parameter sets" part is TBD, and we need to decide what
the sets are and what the MTI options are. Do we have somebody willing to
make a proposal on that? I guess I would be willing to define something I
could see using in profiles I'm involved with.

Section 2.3:

s/conisting/consisting
s/exlusion/Exclusion

Forgive my ignorance (I haven't ever implemented c14n), maybe I'm
overlooking the obvious...but is it necessary or even desirable to sort the
inclusion list or detect children of other nodes up front? Can't that be
derived on the fly to avoid more than one tree walk? e.g. do a traversal and
switch "on" when an element is a hit in the hash list, pull out descendants
in the list as you find them, etc.
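
To make the on-the-fly idea concrete, here is a minimal sketch of a single
traversal that switches output "on" at included elements and prunes excluded
subtrees, with no up-front sorting. It assumes an ElementTree model rather
than a real DOM, and the names (included_elements, include, exclude) are
hypothetical, not taken from the draft.

```python
import xml.etree.ElementTree as ET


def included_elements(root, include, exclude):
    """Yield the elements of the subset in document order, in one walk.

    include and exclude are sets of Element objects, standing in for the
    inclusion and exclusion lists (an assumption for illustration).
    """
    def walk(elem, on):
        if elem in exclude:
            return  # an excluded element drops its whole subtree
        # Output switches on at an included element and stays on below it,
        # so an included descendant of an included element needs no
        # special handling or prior detection.
        on = on or elem in include
        if on:
            yield elem
        for child in elem:
            yield from walk(child, on)

    yield from walk(root, False)
```

The membership tests are hash lookups, so the cost is one visit per element
regardless of the order in which the inclusion list was supplied.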

I know we want to be abstract about implementation, but at the same time we
may be getting back into the problem of naïve implementations.

s/
While traversing if the current node is an element, and that element is in
the exclusion list
/
While traversing, if the current node is an element and that element is in
the exclusion list

s/Element nodes/Element Nodes
Under Element Nodes, s/should have written/will be written

re: Namespace Nodes, is it really true that no additional processing is
involved? I think we need more text here referencing the processing that
determines whether anything gets output. I think you just mean that *if*
it's output, it's done in the same way as attribute nodes.

Under Text Nodes, s/declaration is in context/declaration in context, 
s/
In that case be careful when trimming the leading and trailing space - the
net result should be same as if it the adjacent text nodes were concatenated
into one
/
Be aware when trimming whitespace in such cases; the net result should be
equivalent to doing so as if the adjacent text nodes were concatenated.
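
A small sketch of the difference, assuming the adjacent text nodes arrive as
a list of strings (trim_run is a hypothetical helper, not something in the
draft):

```python
XML_WHITESPACE = " \t\r\n"  # the four XML whitespace characters


def trim_run(chunks):
    """Trim a run of adjacent text nodes as if they were a single node."""
    return "".join(chunks).strip(XML_WHITESPACE)


def trim_each(chunks):
    """The naive alternative: trim every chunk separately, then join."""
    return "".join(chunk.strip(XML_WHITESPACE) for chunk in chunks)
```

For the chunks "  foo " and " bar  ", trimming the run keeps the interior
whitespace between the words, while trimming each chunk separately loses it.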

At the end of the section, s/xml models like DOM/XML models such as DOM

Sec 2.4:

I would consider moving this section up into section 1. It seems like
motivating material for the overall package of features, and could even be
supplemented by additional sections that motivate some of the other options
if we're so inclined.

Sec 2.5:

Per my earlier comment, I think we need a reference to this section in the
main processing rules to provide context.

s/special node/special node type or indication
s/Attribute/attribute

As a general comment, I'm not sure it's helpful to distinguish
Explicit/implicit here, but if we did, I think the key point is not that
some DOM serializers will add them for you but that the DOM itself will not
include them when the node is created. I think you're trying to say that
implementations need to account for this, but if that's the case, we
probably would need to reference the distinction somewhere in the processing
rules, and I don't see that now. Maybe you just need to add language
referring to "both explicit and implicit" in some of the later text.

Under Visibly utilized, clarify that the bullets are OR conditions, maybe
just say "if any of the following hold:"

In step 2, s/any of the namespace declaration/any of the namespace
declarations
s/E j/Ej
Also, I think Ei in that last bit should be Ej?

s/If the prefixRewrite is specified/If the prefixRewrite option is set to
other than "none"

In the sequential text, is sorting the URIs well-defined? Do we need a
formal reference on that? Cue rathole...3,2,1
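
As an illustration of why the ordering may need a formal definition, the
sketch below sorts the same strings by Unicode code point and by UTF-16 code
unit. The two orders disagree once a character outside the Basic
Multilingual Plane appears. The function names and the sample strings are
made up for the example, and whether such characters occur in real namespace
URIs is a separate question.

```python
def by_code_point(uris):
    """Sort by Unicode code point (Python's native string ordering)."""
    return sorted(uris)


def by_utf16_code_unit(uris):
    """Sort by UTF-16 code units, as a UTF-16-based string type would."""
    # Big-endian encoding makes byte order match code-unit order.
    return sorted(uris, key=lambda u: u.encode("utf-16-be"))
```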

Silly question...do we really need the complexity of digest-based rewriting?
If we do, is there a simpler way? Maybe just hex-encode the digest octets?
Yes, it's a bit longer, but it's also faster and easier...
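
A rough sketch of what the hex-encoding variant could look like. The choice
of SHA-1 and the leading "n" (so the result is a legal NCName, which cannot
start with a digit) are assumptions for illustration, not what the draft
specifies.

```python
import hashlib


def hex_prefix(namespace_uri):
    """Derive a prefix from a namespace URI by hex-encoding its digest."""
    digest = hashlib.sha1(namespace_uri.encode("utf-8")).hexdigest()
    # Prepend a letter: a digest may begin with a digit, a prefix may not.
    return "n" + digest
```

The result is 41 characters long, which is the "bit longer" cost mentioned
above, but it needs nothing beyond the digest itself.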

Didn't exactly follow the second note about exclusive c14n and the
rewriting. That doesn't seem likely given that exclusive doesn't change the
fact that you only output it once for a given subtree...what's the case
being worried about here?

Section 2.6:

s/consist of/consists of the following steps:

s/If E is an apex node examine/If E is an apex node, then examine
s/not already there/not already present
s/temporily/temporarily
s/parametes/parameters
s/inherit/"inherit"

Should we reword "all element nodes along E's ancestors" to something like
"all ancestor element nodes of E"?

s/combining then two/then combine the two

Add forward reference to the join function in sec 2.7.

s/the join/then join

s/
Sort all the attribute lexicographically (increasing)
/
Sort all the attributes in increasing lexicographic order,

Replace informal "if prefixes are rewritten" with a reference to the option
being other than "none".

Section 3:

Reword:
Exclusive Canonicalization may be used as a canonicalization algorithm in
XML Digital Signature [XMLDSIG-CORE2], via the
<ds:CanonicalizationMethod> element.

Identifier:
...

Canonical XML 2.0 supports a set of parameters, as enumerated in Section
2.2. All parameters are optional and have default values. When used in
conjunction with the <ds:CanonicalizationMethod> element, each parameter is
expressed with a dedicated child element. They can be present in any order.
A schema definition for each parameter follows:

In the schema, I believe NMTOKENS is the wrong type for the prefix list.
That was the error made in the old spec and had to be fixed in errata,
because #default isn't a legal NMTOKEN. The type should be a list of
strings.
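
A quick check of the #default problem, using an ASCII-only approximation of
the XML Nmtoken production (the real production also admits many non-ASCII
name characters; is_ascii_nmtoken is a hypothetical helper):

```python
import re

# ASCII subset of XML NameChar: letters, digits, '.', '_', ':' and '-'.
ASCII_NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")


def is_ascii_nmtoken(token):
    """Return True if token matches the ASCII subset of Nmtoken."""
    return ASCII_NMTOKEN.fullmatch(token) is not None
```

Ordinary prefixes such as "ds" pass, while "#default" fails because "#" is
not a name character.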

Section 4:

I didn't review this heavily yet.

In section 4.1, the sort process is somewhat unclear to me. It seems like it
would take a full tree walk, and since I can't think how the inputs in the
DOM case could be other than logical pointers to actual DOM nodes, I don't
see why we would need to sort them ahead of time. SAX is different, but the
sorting is clearly implicit there, right?

Section B.1:

Is C14N 1.x a normative reference? Probably informative, no? Same for the
XPath Filter?

Section B.2?

Are URI and XMLBASE normative?
Received on Friday, 9 April 2010 18:05:14 UTC