Interop meeting report from Thomas Roessler on 2007-10-04 (www-xml-canonicalization-comments@w3.org from October 2007)

From: Thomas Roessler <tlr@w3.org>
Date: Thu, 4 Oct 2007 05:50:24 +0200
To: www-xml-canonicalization-comments@w3.org
Message-ID: <20071004035024.GZ27442@raktajino.does-not-exist.org>

The XML Security Specifications Maintenance Working Group held an
interoperability testing meeting for the XML Digital Signatures and
Canonical XML 1.1 specifications in Mountain View, California, on 27
September 2007. The meeting was hosted by VeriSign.

The participating implementers were IBM, Oracle, UPC, Sun, and IAIK.

A full interoperability report is not available at this time.


The following three issues with the Canonical XML 1.1 specification
were identified.



1. The reversion to the language of C14N 1.0 that is suggested in
[1] should be applied, as it matches implementation behavior.



2. The fix-up for the xml:base attribute that is specified in
section 2.4 [2] was not implemented interoperably.
  
A single implementation was found to have implemented the
specification's normative text correctly.  Four implementations were
found to be consistent with the example in section 3.8 [3]. The
example in section 3.8 was found to be inconsistent with the
normative text.

After discussion, there was consensus that the normative text is
correct (but in need of clarification), and that the example
provided in the specification is indeed incorrect.  

The issue at hand can best be seen by considering a slight variant
of the example in section 3.8.  Instead of using the following input
document:

| <!DOCTYPE doc [
| <!ATTLIST e2 xml:space (default|preserve) 'preserve'>
| <!ATTLIST e3 id ID #IMPLIED>
| ]>
| <doc xmlns="http://www.ietf.org" xmlns:w3c="http://www.w3.org"
| xml:base="http://www.example.com/something/else">
|    <e1>
|       <e2 xmlns="" xml:id="abc" xml:base="../bar/">
|          <e3 id="E3" xml:base="foo"/>
|       </e2>
|    </e1>
| </doc>

... consider this:

| <!DOCTYPE doc [
| <!ATTLIST e2 xml:space (default|preserve) 'preserve'>
| <!ATTLIST e3 id ID #IMPLIED>
| ]>
| <doc xmlns="http://www.ietf.org" xmlns:w3c="http://www.w3.org"
| xml:base="something/else">
|    <e1>
|       <e2 xmlns="" xml:id="abc" xml:base="bar/">
|          <e3 id="E3" xml:base="foo"/>
|       </e2>
|    </e1>
| </doc>

It is the participants' reading of the normative language that,
since e1 is preserved in the document subset, the fix-up for e3 will
only take e2 into account, but not e1 or doc. Canonicalization
consistent with this reading of the specification text will lead to
the following output (line breaks for convenience):

| <e1 xmlns="http://www.ietf.org" xmlns:w3c="http://www.w3.org"
| xml:base="something/else"><e3 xmlns=""
| id="E3" xml:base="bar/foo"
| xml:space="preserve"></e3></e1>

Canonicalization consistent with the current material in example
3.8, however, will lead to this output:

| <e1 xmlns="http://www.ietf.org" xmlns:w3c="http://www.w3.org"
| xml:base="something/else"><e3 xmlns=""
| id="E3" xml:base="something/bar/foo"
| xml:space="preserve"></e3></e1>

When base URI resolution is performed on this output, the string
"something" would be duplicated in e3's base URI.  That is not
consistent with e3's base URI in the input document.
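
The duplication can be checked mechanically.  The following is a
minimal sketch, not part of the test material used at the meeting.
It assumes a hypothetical document URI (DOC_URI) and uses Python's
urllib.parse.urljoin as a stand-in for base URI resolution; the
helper name effective_base is likewise made up for illustration.

```python
from urllib.parse import urljoin

# Hypothetical absolute URI of the input document (an assumption;
# any hierarchical URI would show the same effect).
DOC_URI = "http://example.org/docs/input.xml"


def effective_base(doc_uri, xml_base_values):
    """Resolve a chain of xml:base values, outermost first."""
    base = doc_uri
    for value in xml_base_values:
        base = urljoin(base, value)
    return base


# Input document: doc, e2 and e3 carry xml:base (e1 has none).
input_e3 = effective_base(DOC_URI, ["something/else", "bar/", "foo"])

# Output per the normative text: e1 and e3 carry xml:base.
normative_e3 = effective_base(DOC_URI, ["something/else", "bar/foo"])

# Output per the example in section 3.8.
example_e3 = effective_base(
    DOC_URI, ["something/else", "something/bar/foo"]
)

print(input_e3)
print(normative_e3)
print(example_e3)
```

Under these assumptions, only the output that follows the normative
text yields the same base URI for e3 as the input document does.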

In the normative specification text, the key phrase is this one:

	Let E be an element in the node set whose ancestor axis
	contains successive elements E_n ... E_1 (in reverse
	document order) that are omitted and E=E_n+1 is included.
	...

The crucial word for a correct reading of this language is
"successive"; in the example given, it causes the sequence E_n ...
E_1 of omitted elements for E = e3 to consist of the single element
e2.

The experience gathered suggests that this aspect needs to be called
out much more prominently and clearly.
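
To illustrate the reading described above, here is a rough sketch.
All names in it (Elem, omitted_run, naive_merge, join_all) are
hypothetical.  In particular, naive_merge is NOT the function from
Appendix A: it only applies the "merge" step of RFC 3986, section
5.2.3, and ignores dot segments, queries and absolute references.
The inclusion flags are an assumption read off the output shown
earlier (doc and e2 omitted, e1 and e3 included).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Elem:
    name: str
    included: bool                  # element node is in the subset
    xml_base: Optional[str] = None  # string value of xml:base


def omitted_run(ancestors: List[Elem]) -> List[Elem]:
    """Successive omitted ancestors of E, nearest first (E_n ... E_1).

    The walk stops at the first included ancestor (E_n+1).
    """
    run = []
    for anc in ancestors:
        if anc.included:
            break
        run.append(anc)
    return run


def naive_merge(base: str, ref: str) -> str:
    """Simplified stand-in for the join function (an assumption)."""
    return base[: base.rfind("/") + 1] + ref


def join_all(values: List[str]) -> str:
    """Fold naive_merge over xml:base values, outermost first."""
    result = values[0]
    for value in values[1:]:
        result = naive_merge(result, value)
    return result


e3 = Elem("e3", True, "foo")
ancestors = [                       # nearest ancestor first
    Elem("e2", False, "bar/"),
    Elem("e1", True),
    Elem("doc", False, "something/else"),
]

# Reading based on "successive": only e2 contributes.
run = omitted_run(ancestors)
correct = join_all(
    [a.xml_base for a in reversed(run) if a.xml_base] + [e3.xml_base]
)

# Misreading: every omitted ancestor contributes, including doc.
all_omitted = [a for a in ancestors if not a.included]
misread = join_all(
    [a.xml_base for a in reversed(all_omitted) if a.xml_base]
    + [e3.xml_base]
)

print(correct)
print(misread)
```

With these inputs, the first value corresponds to the xml:base that
the normative text produces for e3, and the second to the value
found in the example in section 3.8.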


Additionally, the introductory paragraph ("The xml:base
attribute...") was found to be confusing, since it can be misread as
a (redundant) description of where the "join URI" function is to be
applied. We recommend shortening this paragraph to a simple
statement to introduce the "join URI" function.  We also recommend
renaming the "join URI" function to a "join URI references"
function, as that is what it does.

Further, that paragraph, the bullet list, and the subsequent
paragraph cause confusion by talking about "base URIs": The objects
of the canonicalization process are the string values of the various
xml:base attributes (which might be relative URI references).  They
are *not* the base URI properties of the element nodes in question
(which are always absolute URIs, and can depend upon the document's
context). We recommend clarifying this point and the terminology
used.

In the paragraph that starts with the words "Given this 'join URI'
function...", the following phrase causes further confusion:

	The element nodes along E's ancestor axis are now examined
	for all occurences of xml:base, that have been omitted.

This might be read to suggest that the fix-up might also be
applicable if the document subset includes an ancestor element F,
but lacks an xml:base attribute that was present on F's attribute
axis in the input document.  We recommend clarifying that this
phrase only deals with the removal of element nodes.

We further recommend including a general remark to note that the
various fix-up steps must be performed IF AND ONLY IF relevant
*element* nodes are removed, and that fix-up MUST NOT occur if an
element node is preserved in the document subset, but loses a
relevant attribute node.



3. Appendix A was found to be complex to the point of being
unimplementable.

While all participants were able to implement some algorithm with
the desired effect, that implementation was typically based on
analysis of test cases and reading of the overall specification, as
opposed to being a faithful implementation of the text in Appendix
A.  A characteristic remark by one implementer (which resonated with
the rest of the group) was that it was "easier to produce the
desired code than to attempt understanding Appendix A."

We recommend rewriting Appendix A in a clear and simple fashion.
Where the (commendable!) aim of staying close to RFC 3986's language
gets in the way of clarity or simplicity, the latter should be
given priority.



[1] http://lists.w3.org/Archives/Public/public-xml-core-wg/2007Aug/0018.html
[2] http://www.w3.org/TR/xml-c14n11/#DocSubsets
[3] http://www.w3.org/TR/xml-c14n11/#Example-DocSubsetsXMLAttrs

Received on Thursday, 4 October 2007 03:50:35 UTC