RE: [F&O] 11.1, 7.3, 15.4.4, and 15.4.5 from Kay, Michael on 2003-05-15 (public-qt-comments@w3.org from May 2003)

From: Kay, Michael <Michael.Kay@softwareag.com>
Date: Thu, 15 May 2003 05:23:46 +0200
To: Kian-Tat Lim <ktl@ktlim.com>, public-qt-comments@w3.org
Message-ID: <DFF2AC9E3583D511A21F0008C7E62106073DCE35@daemsg02.software-ag.de>
Your comment repeats some points made in the message that you added to the
fn:doc() thread, so my reply will repeat some of the points I made in my
reply to that message. This is a provisional reply - we are all very busy at
W3C meetings this week, and doing our email in snatched moments. This
comment was too late to be included in the agenda for this week's meetings,
so it may be some weeks before you get an official response.

> 
> Section 11.1 of "XQuery 1.0 and XPath 2.0 Functions
> and Operators" (F&O), describing fn:resolve-uri,
> states that the function "resolves the relative URI
> $relative against the base-uri $base and returns
> the resulting absolute URI".  It does not give
> an algorithm for doing so.  Since RFC 2396 is
> cited as a normative reference in section A.1,
> it would seem that the algorithm given there in
> section 5.2, "Resolving Relative References to
> Absolute Form", is the appropriate one for executing
> this function.

I agree with you: it would be appropriate to reference RFC 2396 for the
definition of this algorithm. We might have to add some careful caveats,
though. For example, we disallow resolving a relative URI against another
relative URI (I don't recall why), although RFC 2396 allows it.
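
As a rough illustration of that caveat, here is a minimal Python sketch. The
function name resolve_against_absolute is hypothetical, and treating "no
scheme" as "relative" is my simplifying assumption, not wording from F&O.
Python's urljoin stands in for the RFC algorithm.

```python
from urllib.parse import urljoin, urlsplit


def resolve_against_absolute(relative: str, base: str) -> str:
    """Hypothetical stand-in for fn:resolve-uri($relative, $base)."""
    # Assumption: a base with no scheme component is a relative URI,
    # and we refuse to resolve against it.
    if not urlsplit(base).scheme:
        raise ValueError(f"base URI is not absolute: {base!r}")
    # urljoin does the purely syntactic merge of the two strings.
    return urljoin(base, relative)
```

A caller passing "b/c" as the base would get a ValueError here, whereas the
RFC algorithm on its own would go ahead and merge the two relative references.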

RFC 2396 unfortunately mixes its description of the algorithm for resolving
a relative URI against a base URI (which is purely a syntactic operation on
two strings) with a great deal of material about the intended meaning of the
two URIs and the intended purpose of the operation. This makes it quite
difficult to reference the parts of the RFC that are actually relevant.
Section 5.1 of the RFC, which discusses different ways of obtaining a base
URI, is not relevant to our resolve-uri() function, because in our case the
base URI is either supplied as an explicit argument to the function, or
obtained as a property of a node in a way that is specified by the Data
Model. Section 5.2 is largely relevant, but occasionally relies on concepts
defined elsewhere in the RFC in an inappropriate way.
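
To make the "purely syntactic operation on two strings" point concrete, here
is a small Python sketch that checks a few of the normal examples from
Appendix C of the RFC. Note the assumption: Python's urllib.parse.urljoin
follows the later RFC 3986 rules rather than RFC 2396, so I have only picked
cases where, as far as I know, the two agree. The helper name mismatches is
mine.

```python
from urllib.parse import urljoin

# Base URI used by the examples in Appendix C of the RFC.
BASE = "http://a/b/c/d;p?q"

# A handful of the "normal" examples: relative reference -> expected result.
EXPECTED = {
    "g": "http://a/b/c/g",
    "./g": "http://a/b/c/g",
    "g/": "http://a/b/c/g/",
    "/g": "http://a/g",
    "../g": "http://a/b/g",
    "../../g": "http://a/g",
}


def mismatches():
    """Return the relative references whose resolution differs from EXPECTED."""
    return [rel for rel, want in EXPECTED.items() if urljoin(BASE, rel) != want]
```

Nothing in this needs to know what the URIs identify or how the base was
obtained; the inputs are two strings and the output is one string.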

> 
> That algorithm states, in part:
> 
> For each URI reference, the following steps are performed in order:
> 
>     1) The URI reference is parsed into the potential four
>        components and fragment identifier, as described in
>        Section 4.3.
> 
>     2) If the path component is empty and the scheme, authority, and
>        query components are undefined, then it is a reference to the
>        current document and we are done.  Otherwise, [...]
> 
> In Appendix C of that RFC, "Examples of Resolving Relative
> URI References", Section C.2, "Abnormal Examples", states
> explicitly:
> 
>      An empty reference refers to the start of the current document.
>           <>            =  (current document)
> 
> Both of these appear to be in conflict with the last 
> paragraph of F&O section 11.1 (before the Note), which states:
> 
>     If the $relativeURI is the zero-length string, returns the
>     value of the base-uri property from the static context in
>     the first form and $base in the second form.

The RFC unfortunately uses the term "current document" without defining it.
I think the general style of the RFC makes it fairly clear that section 5.2
is describing URI resolution as a function that takes two URIs (character
strings) as input, and produces a single URI as output. I think therefore
that "a reference to the current document" can only be interpreted as
meaning "the value of the base URI". Certainly in our data model we have no
concept of "the current document"; our nearest equivalent to that concept is
the base URI from the static context. 
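
A minimal Python sketch of that reading, covering both forms of the function.
The names resolve_uri and STATIC_BASE_URI, and the example URI value, are
assumptions made for illustration; urljoin again stands in for the RFC
algorithm.

```python
from typing import Optional
from urllib.parse import urljoin

# Assumed value standing in for the base-uri property of the static context.
STATIC_BASE_URI = "http://example.org/queries/main.xq"


def resolve_uri(relative: str, base: Optional[str] = None) -> str:
    """Sketch of the one-argument and two-argument forms of fn:resolve-uri."""
    # First form: no explicit base, so use the static context's base URI.
    effective_base = STATIC_BASE_URI if base is None else base
    # Zero-length $relative: "the current document" is read as the base URI.
    if relative == "":
        return effective_base
    return urljoin(effective_base, relative)
```

With these assumptions, resolve_uri("") gives back STATIC_BASE_URI, and
resolve_uri("", b) gives back b unchanged for any explicit base b.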

> 
> Section 2.1.1 of "XML Path Language (XPath) 2.0" does not 
> provide for the current document's URI in the static context, 
> only an environment-specified base URI.

RFC 2396 uses a great deal of language which reveals its origins as a spec
for hyperlinking between static documents. In an environment where a query
or XPath expression may be contained in a Java or XSLT source program, and
compiled into an object form which is executed on a completely different
machine, we can't get away with loose concepts like "the current document".
This is why the "Base URI" is formalized and abstracted as a part of the
static context for expression evaluation. We simply can't assume that
everyone is operating in a simple interpretive world where the XPath
expression is always contained in a stylesheet and the stylesheet is always
available for read access at execution time. There are already situations
where compiled stylesheets have been included in commercial software
products, and the purchasers of those products have no more access to the
source stylesheet than they have to any other part of the source code. If
you are operating in a simple interpretive world, then it's likely that the
Base URI will indeed be the URI of the XML entity containing the relevant
part of the stylesheet, but we have to allow for other possibilities.

> 
> There appear to be two alternatives:
> 
> 1) Add text to section 11.1 stating that the URI resolution 
> algorithm to be used differs from that in RFC 2396 in the 
> case of a zero-length string $relativeURI, and that the base 
> URI is the result instead of the current document's URI.

Yes, we can add such text, but I don't think our algorithm differs. All we
need to say is that the concept of a "current document", which RFC 2396 leaves
undefined, is defined in our environment as meaning the base URI from the
static context.
> 
> Finally, a typo: section 15.4.4 of F&O specifies
> "fn:doc($uri as xs:string?) as document?", but its
> third paragraph begins "If $srcval is the empty
> sequence".  This should be replaced with "If $uri
> is the empty sequence".

Thanks for this. And for the comments generally.

Michael Kay
Received on Wednesday, 14 May 2003 23:24:39 UTC