XSLT 2.0: document identity, URI equivalence

XSLT 1.0, in the description of document(), says "Two documents are treated as 
the same document if they are identified by the same URI. [...] One root node 
is treated as the same node as another root node if the two nodes are from the 
same document."

While there's nothing difficult to understand about that, URI schemes 
sometimes mandate additional criteria for the equivalance of URIs (and, 
implicitly, of documents). For example, RFC 1738 says that the 'file' scheme 
considers file://localhost/ and file:/// to be equivalent, regardless of what 
localhost might otherwise normally mean when mapped to an OS-specific path. I 
took the file://localhost/ exception into account when writing the conformance 
test at http://skew.org/xml/stylesheets/doc-id/, but I didn't worry about 
performing similar checks for the 'http' scheme, which, according to RFC 2616, 
says *should* be implemented such that quite a few exceptions are made to the 
usual character/bytewise comparison of URI equivalency.

The reason URI equivalence is important is for the processing of URI 
references found within a document. These are processed by first resolving 
them to absolute form by merging them with some base (which is typically, but 
not always, the URI of the document containing the reference), and then seeing 
if the reference can be satisfied within the context of the current 
representation of the document -- i.e., if the absolutized reference (minus 
its fragment) is equivalent to the current document's URI, then you are *not* 
to go out looking for a new representation of the document. You only 
dereference (fetch, resolve, whatever you want to call it) the resulting URI 
if it's definitely not the current document. And then you deal with the 
fragment part of the reference, if any.

The reason I bring this up here is because perhaps some mention should be made 
in XSLT 2.0 about how much is expected of an XSLT processor with respect to 
document identity.

For example, is an XSLT processor in error if the root node of a document with 
URI file://localhost/foo.xml is not the same node as the root node of a 
document with URI file:///foo.xml? How about http://somehost/~user/foo.xml vs 
HTTP://SomeHost/%7euser/foo.xml?

My feeling is that it should be an error, to maintain compatibility with the 
specs and to allow same-document URI references (most commonly the empty 
string, or just a fragment) to be implemented properly, even though for most 
folks it's a purely academic exercise.


Mike

-- 
  Mike J. Brown   |  http://skew.org/~mike/resume/
  Denver, CO, USA |  http://skew.org/xml/

Received on Monday, 6 January 2003 20:37:19 UTC