Trouble Ahead: Normative references to 2001 XPointer CR.

While preparing the Last Call Working Draft for the decrypt
transform (per ACTION-53), I noticed a somewhat surprising normative
reference to a 2001 Candidate Recommendation for XPointer, present
in both the decryption transform and xmldsig-core recommendations:

  XML Pointer Language (XPointer) Version 1.0
  W3C Candidate Recommendation 11 September 2001
  http://www.w3.org/TR/2001/CR-xptr-20010911/

Unfortunately, what looks like a harmless editorial error turns out
to be a substantive issue for both the decryption transform and
dsig-core, and one that I, at least, hadn't been aware of before:
that particular Candidate Recommendation never made it to Rec.
Instead, there are now four different Technical Reports on
XPointer.

Only three of these are Recommendations:

- XPointer Framework
  W3C Recommendation 25 March 2003
  http://www.w3.org/TR/2003/REC-xptr-framework-20030325/

- XPointer Element Scheme
  W3C Recommendation 25 March 2003
  http://www.w3.org/TR/2003/REC-xptr-element-20030325/

- XPointer xmlns Scheme
  W3C Recommendation 25 March 2003
  http://www.w3.org/TR/2003/REC-xptr-xmlns-20030325/

The fourth is a Working Draft that has not been updated since
December 2002:

- XPointer xpointer() Scheme
  W3C Working Draft 19 December 2002
  http://www.w3.org/TR/2002/WD-xptr-xpointer-20021219/

The XML Linking Working Group that produced these Technical Reports
has been disbanded, and I know of no current Working Group that is
chartered to take the XPointer xpointer() Scheme to Rec.



Impact on Decryption Transform


The normative impact on the decryption transform is localized to a
relatively narrow part of the processing rules, in the description
of the decryptNodeSet function. 

|      *  When dereferencing an exception URI in the context of the
|      original input node set, the implementation MUST behave as if
|      the document node of the input node-set is used to initialize
|      the XPointer evaluation context [XPointer], even if the node
|      is not part of the node-set. Unlike XML Signature
|      [XML-Signature], the exception URI may be evaluated against a
|      different document than the "root node of the XML document
|      containing the URI attribute." If the input is a different
|      document then, as per XPointer [XPointer], use of the here()
|      function is an error.
|      
|      * When dereferencing an exception URI in the context of a
|      replacement node-set, bare name [XPointer] exception URIs are
|      used to locate xenc:EncryptedData elements with matching Id
|      attributes. Implementors MAY attempt to resolve full XPointers
|      into replacement node-sets using appropriate techniques to
|      take into account the location of the replacement node-set in
|      the input document, see References Using Non-barename
|      XPointers (section 3.4.5).
|
|      * If an exception URI fails to dereference any nodes, then the
|      resulting error MUST be ignored; it may be the result of part
|      of the input document being encrypted.

 --  http://www.w3.org/2007/xmlsec/Drafts/xmlenc-decrypt/Overview#sec-xml-processing

Since we are preparing a version of this specification that will
have a new algorithm ID, and since we anticipate trouble getting
this through CR anyway, I'd suggest that the changes needed to
accommodate the actual XPointer framework (a) are covered by our
charter's allowance to make conformance-affecting changes that
"address incompatibilities with the evolving XML environment", and
(b) can be made without much negative impact on the world.

Roughly, these changes could consist of dropping most of the
material in the first bullet point that deals with the evaluation
context (which only applies to XPointers using the xpointer()
scheme), and cutting the entire discussion down to shortname
XPointers (aka barenames); these are defined in the XPointer
Framework Recommendation, which we can normatively reference without
further trouble.
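
To make the distinction concrete, here is a minimal sketch of how an
implementation might tell a shortname XPointer from a scheme-based
one by looking at the fragment alone.  The function name and the
ASCII-only approximation of NCName are my own assumptions for
illustration, not taken from any of the specifications.

```python
import re

# ASSUMPTION: ASCII-only approximation of NCName; the real production
# in Namespaces in XML allows many more characters.
_NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"
_SHORTNAME = re.compile(rf"^{_NCNAME}$")
# A scheme-based pointer starts with a (possibly prefixed) scheme
# name followed by an opening parenthesis.
_SCHEME_START = re.compile(rf"^(?:{_NCNAME}:)?{_NCNAME}\(")


def classify_fragment(fragment: str) -> str:
    """Roughly classify the part of a URI reference after '#'.

    Hypothetical helper: returns "empty", "shortname",
    "scheme-based" or "invalid".  It does not validate the scheme
    data, it only looks at the overall shape of the pointer.
    """
    if fragment == "":
        return "empty"
    if _SHORTNAME.match(fragment):
        return "shortname"
    if _SCHEME_START.match(fragment) and fragment.endswith(")"):
        return "scheme-based"
    return "invalid"
```

Under the proposal above, only the "shortname" case would remain
normatively specified for the decryption transform.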

Section 3.4.5 gives examples for failure conditions of references
using non-barename XPointers.  This material is informative, and we
could either keep it (but reference the ancient xpointer() working
draft), or we could simply drop it, without much negative impact.

  http://www.w3.org/2007/xmlsec/Drafts/xmlenc-decrypt/Overview#sec-References-Non-Barename




Impact on XML-Signature Syntax and Processing


The main impact on xmldsig-core seems to be in four sections:

- section 4.3.3.2, Reference Processing Model
  http://www.w3.org/2007/xmlsec/Drafts/xmldsig-core/#sec-ReferenceProcessingModel

- section 4.3.3.3, same-document URI References
  http://www.w3.org/2007/xmlsec/Drafts/xmldsig-core/#sec-Same-Document
  
- section 6.6.2, Base64
  http://www.w3.org/2007/xmlsec/Drafts/xmldsig-core/#sec-Base-64

- section 6.6.3, XPath Filtering
  http://www.w3.org/2007/xmlsec/Drafts/xmldsig-core/#sec-XPath

In section 4.3.3.2, the following language is relevant:

| When a fragment is not preceded by a URI in the URI-Reference, XML
| signature applications MUST support the null URI and barename
| XPointer. We RECOMMEND support for the same-document XPointers
| '#xpointer(/)' and '#xpointer(id('ID'))' if the application also
| intends to support any canonicalization that preserves comments.
| (Otherwise URI="#foo" will automatically remove comments before the
| canonicalization can even be invoked.) All other support for
| XPointers is OPTIONAL, especially all support for barename and other
| XPointers in external resources since the application may not have
| control over how the fragment is generated (leading to
| interoperability problems and validation failures).

 -- http://www.w3.org/2007/xmlsec/Drafts/xmldsig-core/#sec-ReferenceProcessingModel

It would seem that the MUST clause can be fixed by changing
"barename" to "shortname", and by referencing the XPointer Framework
Recommendation.

I'm not sure what the most appropriate course of action is for the
RECOMMENDED support of #xpointer(/) and #xpointer(id('ID')).  These
forms are currently RECOMMENDED, but rely on a scheme that is not
defined in any Recommendation.


As an aside, there's some language in 4.3.3.2 that references the
URI spec for the definition of a "same-document URI-Reference."

Unfortunately, the meaning of that has changed between the URI spec
that is referenced in xmldsig-core and the currently valid one (RFC
2396 vs. RFC 3986): Same-document references are now (in RFC 3986)
defined in terms of the base URI; the purely syntactic definition
("URI references with an empty URI", i.e., just a fragment
identifier) from RFC 2396 is no longer there.

However, XML Signature relies on that syntactic definition and
actually replays it in the specification text to a large extent.
I'd therefore propose to import the language from RFC 2396, but
normatively reference RFC 3986.
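
To illustrate the difference between the two definitions, here is a
minimal sketch of both tests.  The function names are hypothetical,
and the second test uses Python's urllib resolution as a stand-in
for RFC 3986 reference resolution.

```python
from urllib.parse import urldefrag, urljoin


def is_same_document_rfc2396(uri_reference: str) -> bool:
    """Syntactic test: the URI part is empty, so the reference is
    either empty or consists of just a fragment identifier."""
    return uri_reference == "" or uri_reference.startswith("#")


def is_same_document_rfc3986(uri_reference: str, base_uri: str) -> bool:
    """Base-URI test: the resolved reference is identical to the
    base URI once any fragment is set aside."""
    resolved = urljoin(base_uri, uri_reference)
    return urldefrag(resolved).url == urldefrag(base_uri).url
```

With a base URI of http://example.org/doc.xml, the reference
"doc.xml#foo" is a same-document reference under the second test but
not under the first, which is exactly the case that XML Signature's
processing rules do not expect.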


In section 4.3.3.3, we have the following rules:

| Dereferencing a same-document reference MUST result in an XPath
| node-set suitable for use by Canonical XML [XML-C14N]. Specifically,
| dereferencing a null URI (URI="") MUST result in an XPath node-set
| that includes every non-comment node of the XML document containing
| the URI attribute. In a fragment URI, the characters after the
| number sign ('#') character conform to the XPointer syntax [Xptr].
| When processing an XPointer, the application MUST behave as if the
| root node of the XML document containing the URI attribute were used
| to initialize the XPointer evaluation context. The application MUST
| behave as if the result of XPointer processing were a node-set
| derived from the resultant location-set as follows:
|
|   1. discard point nodes
|   2. replace each range node with all XPath nodes having full or
|      partial content within the range
|   3. replace the root node with its children (if it is in the node-set)
|   4. replace any element node E with E plus all
|      descendants of E (text, comment, PI, element) and all
|      namespace and attribute nodes of E and its descendant elements.
|   5. if the URI is not a full XPointer, then delete all
|      comment nodes
|
|   The second to last replacement is necessary because XPointer
|   typically indicates a subtree of an XML document's parse tree
|   using just the element node at the root of the subtree, whereas
|   Canonical XML treats a node-set as a set of nodes in which
|   absence of descendant nodes results in absence of their
|   representative text from the canonical form. 

 -- http://www.w3.org/2007/xmlsec/Drafts/xmldsig-core/#sec-Same-Document

Two points at least are noteworthy:

1. Even though a barename XPointer was, at the time, defined to
return the same location-set as an xpointer(id('barename')) style
full XPointer, these two XPointers yield different results in
the reference processing model of XML Signature.  If a barename is
used, comments are stripped; if a full XPointer [i.e.,
xpointer(id('barename')) style] is used, comments are not stripped.

2. The processing rules are defined in terms of location-sets.  A
location-set is a notion defined generically in the 2001 XPointer
CR, but then moved into the xpointer() scheme for XPointer (which
never made it to Rec).  It is a notion that is *NOT* defined in any
current Recommendation.  Also, shortname XPointers (as they are now
known) are no longer defined in terms of the xpointer() scheme, and
do not result in a location-set in any useful way.

It would seem that it is possible (and, indeed, reasonably easy) to
restate most of the conformance requirements in terms of the
XPointer Framework, and the desired behavior for shortname XPointers
in terms of elements and node-sets, avoiding any reference to point
nodes, ranges, and location-sets.  References to full XPointers
could then be dropped, except for noting that *if* a full
("scheme-based" in the diction of the XPointer framework) XPointer
is used, comments are not removed.
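
As a rough illustration of what such a restatement amounts to, here
is a minimal sketch using Python's ElementTree (Python 3.8 or later
for insert_comments).  It rests on several assumptions of mine: any
attribute named "Id" is treated as being of type ID (there is no DTD
or schema here), the function names are hypothetical, and the
node-set is approximated as a list of element and comment nodes,
since ElementTree keeps text and attributes on the elements
themselves.

```python
import xml.etree.ElementTree as ET


def parse_with_comments(xml_text: str) -> ET.Element:
    """Parse XML while keeping comment nodes inside the tree."""
    builder = ET.TreeBuilder(insert_comments=True)
    return ET.fromstring(xml_text, parser=ET.XMLParser(target=builder))


def dereference_same_document(root, shortname, scheme_based=False):
    """Return the identified element plus all of its descendants.

    ASSUMPTION: the element is located via an attribute called "Id".
    Comments are dropped unless the reference was scheme-based,
    mirroring step 5 of the quoted processing rules.
    """
    target = next(
        (e for e in root.iter() if e.get("Id") == shortname), None
    )
    if target is None:
        return []
    nodes = list(target.iter())  # the element itself, then descendants
    if not scheme_based:
        nodes = [n for n in nodes if n.tag is not ET.Comment]
    return nodes
```

Nothing in this sketch needs point nodes, ranges, or location-sets;
the only trace of the full-XPointer case is the flag that decides
whether comments survive.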

A further change would probably be to RECOMMEND certain element()
XPointers (specifically, element(ID) and element(/1), where the
resource against which the XPointer is evaluated is the document
that contains the URI attribute) in place of the currently
recommended xpointer() XPointers, with equivalent effect.
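
For reference, here is a minimal sketch of how the scheme data of an
element() pointer could be evaluated against a parsed document.  As
before, the function name, the use of an "Id" attribute as the ID,
and the ASCII-only NCName pattern are assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Scheme data: an optional NCName followed by an optional child
# sequence such as /1/2.  ASSUMPTION: ASCII-only NCName.
_ELEMENT_DATA = re.compile(
    r"^(?P<name>[A-Za-z_][A-Za-z0-9._-]*)?(?P<steps>(?:/[1-9][0-9]*)*)$"
)


def evaluate_element_scheme(root, scheme_data):
    """Return the element selected by element(scheme_data), or None.

    `root` is the document element.  A leading child sequence starts
    at the document itself, so its first step must be 1.
    """
    match = _ELEMENT_DATA.match(scheme_data)
    if not scheme_data or match is None:
        raise ValueError(f"not valid element() data: {scheme_data!r}")
    steps = [int(s) for s in match.group("steps").split("/") if s]
    name = match.group("name")
    if name:
        # ASSUMPTION: an attribute called "Id" plays the role of the ID.
        current = next(
            (e for e in root.iter() if e.get("Id") == name), None
        )
    else:
        current = root if steps.pop(0) == 1 else None
    for step in steps:
        if current is None:
            break
        # Count child elements only; skip comments and PIs.
        children = [c for c in current if isinstance(c.tag, str)]
        current = children[step - 1] if step <= len(children) else None
    return current
```

In this sketch element(/1) yields the document element and
element(ID) yields the element carrying that ID, which are the two
forms that would take over from #xpointer(/) and #xpointer(id('ID')).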

I am, however, a bit wary about these changes; they seem to go
somewhat far for a PER.  I'd welcome feedback from the group, and
will also solicit feedback in the Team.


In sections 6.6.2 and 6.6.3, there is some discussion of elements
identified by barename XPointers; it seems to me that we can simply
change "barename" to "shortname" here and be done with this.

  http://www.w3.org/2007/xmlsec/Drafts/xmldsig-core/#sec-Base-64
  http://www.w3.org/2007/xmlsec/Drafts/xmldsig-core/#sec-XPath



An alternative approach to all this could be to argue that, since
the XML Signature Rec has survived for a long time without fixing
this issue, we'd rather not fix it in this round of changes.  I
can't at this point predict the outcomes of a transition call with
that approach, and will continue to put out feelers about it.

Meanwhile, I'd appreciate feedback on the proposed approaches.  I'd
also be interested to hear to what extent existing implementations
actually support some notion of a "full XPointer", and if so, what
specification they were implemented against.

PS: Yes, this is quite a mess.

Cheers,
-- 
Thomas Roessler, W3C  <tlr@w3.org>

Received on Thursday, 12 July 2007 18:58:21 UTC