Re: C14N and namespace undeclaring

Hi Richard,
Dear all,

Sorry for the length; please see below.
For those short of time, see the summary at the end.

<?xml version="1.1"?>
<!--
The following example demonstrates the issues that arise when
using C14n (XPath) with XML 1.1
=======================================================================
-->
<a xmlns="http://www.example.org/default">
  <pre1:b xmlns:pre1="http://www.example.org/ns1" 
xmlns:pre2="http://www.example.org/ns1" xmlns="">
    <c xmlns:pre2="" xmlns="http://www.example.org/default">
      <d xmlns="">
        <e URI="#xpointer(//pre2:f)" 
URI2="#xpointer(xmlns(d=http://www.example.org/default) //e | //d:a)">
          <pre1:f xmlns:pre1="http://www.example.org/ns1"/>
        </e>
      </d>
    </c>
    <c xmlns:pre2="">
      <d>
        <e1 URI="#xpointer(//pre2:f)" 
URI2="#xpointer(xmlns(d=http://www.example.org/default) //e1 | //d:a)">
          <pre1:f xmlns:pre1="http://www.example.org/ns1"/>
        </e1>
      </d>
    </c>
  </pre1:b>
</a>
<!--
In this example the attributes URI and URI2 have been added to
demonstrate the impact of different namespaces in the canonicalized
output.

In the example above, the URI attributes do not actually resolve to
any nodes: they contain invalid XPointers, in which pre2 is not
defined. The URI2 attribute in e, however, refers to e and a, and the
one in e1 refers to e1 and a.
(The XPointers are not percent-escaped, for the sake of readability;
cf. http://www.w3.org/TR/xptr-framework/#escaping)

* Now let's canonicalize the node set identified by the following
XPath, which omits the second undeclaration of pre2. This assumes that
xmlns:prefix="" namespace undeclarations behave just like normal
redeclarations, producing XPath nodes that simply "overwrite" a prefix
with a "non-namespace" ("Assumption 1"):
=======================================================================

pre1=http://www.example.org/ns1
d=http://www.example.org/default

//d:a |
//d:a/child::text() |
//pre1:b |
//pre1:b/child::text() |
//pre1:b/namespace::*[name() = 'pre1' or name() = 'pre2'] |
//e/descendant-or-self::* |
//e/descendant-or-self::text() |
//e/descendant-or-self::*/@* |
//e/descendant-or-self::*/namespace::*[name()='pre1' or name()='pre2']|
//e1/descendant-or-self::* |
//e1/descendant-or-self::text() |
//e1/descendant-or-self::*/@* |
//e1/descendant-or-self::*/namespace::*[name() = 'pre1']

So the XPath 1.0 node set (text nodes not shown) would look as
follows under "Assumption 1":

a - xmlns="http://www.example.org/default"
|
+ - pre1:b - xmlns="" - ( missing )
      |    - xmlns:pre1="http://www.example.org/ns1"
      |    - xmlns:pre2="http://www.example.org/ns1"
      |
      + - e - URI="#xpointer(//pre2:f)"
      |   | - xmlns="" - ( missing )
      |   | - xmlns:pre1="http://www.example.org/ns1"
      |   | - xmlns:pre2="" - ( or missing )
      |   |
      |   + - pre1:f - xmlns:pre1="http://www.example.org/ns1"
      |              - xmlns:pre2="" - ( or missing )
      |
      + - e1 - URI="#xpointer(//pre2:f)"
          | - xmlns:pre1="http://www.example.org/ns1"
          | - (xmlns:pre2 is not in the node set)
          |
          + - pre1:f - xmlns:pre1="http://www.example.org/ns1"
                     - (xmlns:pre2 is not in the node set)

And C14n as currently specified would return:

<a xmlns="http://www.example.org/default" 
xmlns:xml="http://www.w3.org/XML/1998/namespace">
  <pre1:b xmlns="" xmlns:pre1="http://www.example.org/ns1" 
xmlns:pre2="http://www.example.org/ns1">
    <e xmlns:pre2="" URI="#xpointer(//pre2:f)" 
URI2="#xpointer(xmlns(d=http://www.example.org/default) //e | //d:a)">
          <pre1:f></pre1:f>
        </e>
    <e1 URI="#xpointer(//pre2:f)" 
URI2="#xpointer(xmlns(d=http://www.example.org/default) //e1 | //d:a)">
          <pre1:f></pre1:f>
        </e1>
  </pre1:b>
</a>

Note that e has xmlns:pre2="" because this undeclaration is in the
input node set. That e1 lost xmlns:pre2="" could be interpreted as
being intentional (it is not in the input nodeset).

If, however, undeclarations do not behave like normal redeclarations,
then the nodes marked with ( or missing ) do not exist at all in the
XPath 1.0 data model and so cannot be removed ("Assumption 2").

The information that e carries under "Assumption 1", and that
distinguishes it from e1, would not be there under "Assumption 2".
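
To make the difference between the two assumptions concrete, here is a
minimal Python sketch. It is not a C14n implementation: it only models
the xmlns declarations on the path from the root to e and to e1, and it
simplifies the output rule to "emit a selected namespace node unless the
nearest output ancestor (pre1:b) already rendered the same binding".
The names in_scope, emitted, E_CHAIN, E1_CHAIN and B_RENDERED are mine
and purely illustrative.

```python
NS1 = "http://www.example.org/ns1"
DEFAULT = "http://www.example.org/default"


def in_scope(chain, undeclaration_is_node):
    """Namespace nodes of an element, given the xmlns declarations on
    each element from the root down to it (one dict per element).

    undeclaration_is_node=True  models Assumption 1: xmlns:p="" yields
                                a node with an empty value.
    undeclaration_is_node=False models Assumption 2: xmlns:p="" only
                                removes the binding; no node exists.
    xmlns="" is always modelled as a missing node, as in XPath 1.0.
    """
    scope = {}
    for decls in chain:
        for prefix, uri in decls.items():
            if uri == "" and (prefix == "" or not undeclaration_is_node):
                scope.pop(prefix, None)
            else:
                scope[prefix] = uri
    return scope


def emitted(chain, keep, parent_rendered, undeclaration_is_node):
    """Declarations written on the element (simplified rule, see above).
    keep is the set of prefixes whose namespace nodes are in the node set."""
    scope = in_scope(chain, undeclaration_is_node)
    selected = {p: u for p, u in scope.items() if p in keep}
    return {p: u for p, u in sorted(selected.items())
            if parent_rendered.get(p) != u}


A = {"": DEFAULT}
B = {"pre1": NS1, "pre2": NS1, "": ""}
E_CHAIN = [A, B, {"pre2": "", "": DEFAULT}, {"": ""}, {}]   # a, b, c, d, e
E1_CHAIN = [A, B, {"pre2": ""}, {}, {}]                     # a, b, c, d, e1
B_RENDERED = {"pre1": NS1, "pre2": NS1}

for label, flag in (("Assumption 1", True), ("Assumption 2", False)):
    e = emitted(E_CHAIN, {"pre1", "pre2"}, B_RENDERED, flag)
    e1 = emitted(E1_CHAIN, {"pre1"}, B_RENDERED, flag)
    print(label, "e:", e, "e1:", e1)
```

Under Assumption 1 the sketch emits xmlns:pre2="" on e and nothing on
e1; under Assumption 2 it emits nothing on either, so e and e1 can no
longer be told apart.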

C14n would then require additional text specifying how to deal with
such an input nodeset (just like the processing of xmlns="").

For the undeclaration of the default namespace, xmlns="", the XPath 1.0
data model represents the undeclaration by the absence of a node
(cf. the 3rd example in http://www.w3.org/TR/xml-names/#defaulting),
which is similar to "Assumption 2".
(see also http://www.w3.org/TR/xml-c14n#PropagateDefaultNSDecl)
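
As I read Canonical XML 1.0, the extra text for xmlns="" amounts to the
following rule, sketched here in Python. The function name and its two
parameters are my own shorthand, not taken from the specification, and
the values in the usage lines are read off the node set tree above.

```python
DEFAULT = "http://www.example.org/default"


def needs_empty_default(own_default_in_set, ancestor_default):
    """True if xmlns="" has to be written on an element in the node set.

    own_default_in_set: the element has a default namespace node that
                        is in the node set.
    ancestor_default:   value of the default namespace node (in the
                        node set) of the nearest ancestor element in
                        the node set, or None if there is none.
    """
    return (not own_default_in_set) and bool(ancestor_default)


# pre1:b has no default namespace node; its output parent a has one.
print("pre1:b:", needs_empty_default(False, DEFAULT))
# e has none either, but neither does its output parent pre1:b.
print("e:", needs_empty_default(False, None))
```

This is consistent with the canonical form shown earlier, where pre1:b
carries xmlns="" and e does not. A comparable rule for xmlns:prefix=""
is what "Assumption 2" would need.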

To summarize
============

Assumption 1 allows the XPath 1.0 data model to remove namespaces from
a node's scope and also to remove an undeclaration.
It would require no change to C14n or XPath 1.0, but it contradicts
how xmlns="" is treated.

In favor of Assumption 1: the current text of XPath 1.0 suggests that
many implementations may actually treat undeclarations like this.
http://www.w3.org/TR/xpath#namespace-nodes
  * for every attribute on an ancestor element whose name starts xmlns:
  unless the element itself or a nearer ancestor redeclares the prefix;

If, however, Assumption 1 is the choice, then maybe xmlns="" should
also be represented by a node in an XPath 1.1 data model.

Assumption 2 allows the XPath 1.0 data model to remove namespaces from
the scope of a node, but it does not allow an undeclaration to be
removed. It requires new text to be added to C14n and XPath 1.0, and it
would be aligned with the way xmlns="" is treated.
-->

Received on Monday, 5 March 2007 15:58:38 UTC