RE: XSLT 2.0: document identity, URI equivalence from Kay, Michael on 2003-01-07 (public-qt-comments@w3.org from January 2003)

From: Kay, Michael <Michael.Kay@softwareag.com>
Date: Tue, 7 Jan 2003 15:59:45 +0100
To: Mike Brown <mike@skew.org>, public-qt-comments@w3.org
Message-ID: <DFF2AC9E3583D511A21F0008C7E621060453DF1D@daemsg02.software-ag.de>

The current rules permit an implementation to use a more intelligent
definition of URI equivalence for the document function, but they do not
require it. (I say this on the basis that there's nothing in the spec that
prohibits it, and what is not prohibited is implicitly permitted).
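
To make the permissiveness concrete, here is a minimal sketch in Python (not
XSLT, and not taken from any real processor). It assumes a processor keeps a
pool of loaded documents keyed on the URI, and that the key function is
pluggable. The names DocumentPool, strip_localhost and fake_load are all
hypothetical, invented for illustration only.

```python
from typing import Callable, Dict


class DocumentPool:
    """Hypothetical cache of loaded documents, keyed on a URI-derived key."""

    def __init__(self, key: Callable[[str], str] = lambda uri: uri) -> None:
        # Default key is the URI string itself: plain string comparison.
        self._key = key
        self._docs: Dict[str, object] = {}

    def document(self, uri: str, load: Callable[[str], object]) -> object:
        """Return the already-loaded document for this key, loading it once."""
        k = self._key(uri)
        if k not in self._docs:
            self._docs[k] = load(uri)
        return self._docs[k]


def strip_localhost(uri: str) -> str:
    """Assumed 'smarter' key: treat file://localhost/ like file:///."""
    prefix = "file://localhost/"
    return "file:///" + uri[len(prefix):] if uri.startswith(prefix) else uri


def fake_load(uri: str) -> object:
    """Stand-in for parsing; every call yields a distinct root object."""
    return object()


strict = DocumentPool()                       # string-equal URIs only
lenient = DocumentPool(key=strip_localhost)   # one extra equivalence rule
```

With the strict pool, file://localhost/foo.xml and file:///foo.xml produce two
distinct root objects; with the lenient pool they produce the same one. Both
behaviours fit the "same URI, same document" wording, which is the point.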

I don't think the spec should require a more intelligent definition to be
used, because the rules are so open-ended. For example, file:///a.xml is
equivalent to file:///A.XML on some operating systems but not on others. The
chances are that

  http://example.com/servlet?a=x&b=y 

is equivalent to

  http://example.com/servlet?b=y&a=x

but you have to do some very careful reading of poorly-written RFCs to be
sure.
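
As a rough illustration of how open-ended this gets, the following Python
sketch builds a comparison key from a URI. Everything in it is an assumption
made for the example: the names comparison_key and equivalent are
hypothetical, the rule set is a small selection (lower-case scheme and host,
file://localhost/ treated as file:///, escapes of unreserved characters
decoded), and the query reordering is strictly opt-in because nothing
guarantees that a server ignores parameter order.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Characters whose percent-encoded and literal forms are treated as equal.
_UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def _fix_escape(match):
    """Decode %XX if it encodes an unreserved character, else upper-case it."""
    ch = chr(int(match.group(1), 16))
    return ch if ch in _UNRESERVED else "%" + match.group(1).upper()


def comparison_key(uri, sort_query=False):
    """Return a tuple to compare instead of the raw URI string.

    The fragment is deliberately left out: document identity concerns the
    URI without its fragment.
    """
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = parts.netloc.lower()
    if scheme == "file" and host == "localhost":
        host = ""  # file://localhost/ treated the same as file:///
    path = re.sub(r"%([0-9A-Fa-f]{2})", _fix_escape, parts.path)
    if sort_query:
        # Assumption: the server does not care about parameter order.
        query = tuple(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    else:
        query = parts.query
    return (scheme, host, path, query)


def equivalent(a, b, sort_query=False):
    """True if the two URIs share a comparison key under the rules above."""
    return comparison_key(a, sort_query) == comparison_key(b, sort_query)
```

Under these rules http://somehost/~user/foo.xml and
HTTP://SomeHost/%7euser/foo.xml compare equal, file:///a.xml and
file:///A.XML do not (the path is left case-sensitive), and the two servlet
URIs compare equal only when sort_query=True is passed. Each of those
choices is debatable, which is why it is hard to mandate any one of them.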

Most other specs have dodged this issue for similar reasons; one example is
the definition of equality between xs:anyURI values in XML Schema.

I would have no objection (personally) to the addition of a note pointing
out the permissiveness of the current spec, and perhaps encouraging
implementations to move in that direction.

Michael Kay


> -----Original Message-----
> From: Mike Brown [mailto:mike@skew.org] 
> Sent: 07 January 2003 01:37
> To: public-qt-comments@w3.org
> Subject: XSLT 2.0: document identity, URI equivalence
> 
> 
> 
> XSLT 1.0, in the description of document(), says "Two 
> documents are treated as 
> the same document if they are identified by the same URI. 
> [...] One root node 
> is treated as the same node as another root node if the two 
> nodes are from the 
> same document."
> 
> While there's nothing difficult to understand about that, URI schemes 
> sometimes mandate additional criteria for the equivalence of 
> URIs (and, 
> implicitly, of documents). For example, RFC 1738 says that 
> the 'file' scheme 
> considers file://localhost/ and file:/// to be equivalent, 
> regardless of what 
> localhost might otherwise normally mean when mapped to an 
> OS-specific path. I 
> took the file://localhost/ exception into account when 
> writing the conformance 
> test at http://skew.org/xml/stylesheets/doc-id/, but I didn't 
> worry about 
> performing similar checks for the 'http' scheme, which, 
> according to RFC 2616, 
> *should* be implemented such that quite a few exceptions 
> are made to the 
> usual character/bytewise comparison of URI equivalency.
> 
> The reason URI equivalence is important is for the processing of URI 
> references found within a document. These are processed by 
> first resolving 
> them to absolute form by merging them with some base (which 
> is typically, but 
> not always, the URI of the document containing the 
> reference), and then seeing 
> if the reference can be satisfied within the context of the current 
> representation of the document -- i.e., if the absolutized 
> reference (minus 
> its fragment) is equivalent to the current document's URI, 
> then you are *not* 
> to go out looking for a new representation of the document. You only 
> dereference (fetch, resolve, whatever you want to call it) 
> the resulting URI 
> if it's definitely not the current document. And then you 
> deal with the 
> fragment part of the reference, if any.
> 
> The reason I bring this up here is because perhaps some 
> mention should be made 
> in XSLT 2.0 about how much is expected of an XSLT processor 
> with respect to 
> document identity.
> 
> For example, is an XSLT processor in error if the root node 
> of a document with 
> URI file://localhost/foo.xml is not the same node as the root 
> node of a 
> document with URI file:///foo.xml? How about 
> http://somehost/~user/foo.xml vs 
> HTTP://SomeHost/%7euser/foo.xml?
> 
> My feeling is that it should be an error, to maintain compatibility with the
> specs and to allow same-document URI references (most commonly the empty 
> string, or just a fragment) to be implemented properly, even though for most
> folks it's a purely academic exercise.
> 
> Mike
> 
> -- 
>   Mike J. Brown   |  http://skew.org/~mike/resume/
>   Denver, CO, USA |  http://skew.org/xml/
Received on Tuesday, 7 January 2003 10:00:03 UTC