RE: XML Base and XPath absolutizing of URIs from John Boyer on 2000-06-09 (w3c-ietf-xmldsig@w3.org from April to June 2000)

From: John Boyer <jboyer@PureEdge.com>
Date: Thu, 8 Jun 2000 18:39:42 -0700
To: "Paul Grosso" <pgrosso@arbortext.com>, "Joseph M. Reagle Jr." <reagle@w3.org>
Cc: "XML DSig" <w3c-ietf-xmldsig@w3.org>, <elm@east.sun.com>, <www-xpath-comments@w3.org>, <www-xml-linking-comments@w3.org>, <Daniel.Veillard@w3.org>, <connolly@w3.org>
Message-ID: <BFEDKCINEPLBDLODCODKIEDLCDAA.jboyer@PureEdge.com>
Hi Paul,

Good, I am glad we agree on so much.

It seems we agree XPath is currently missing the following requirements:

A) the parser that provides the XML document to XPath via the initial
context node must provide the document in the form defined by infoset.

B) the infoset must provide absolutized URIs.

My earlier assertion that applications were free to do or not to do URI
absolutization was based on the assumption that one could use any XML
processor to construct the initial context node-set (because the above
requirements are not stated in XPath).  Because, as we agree, the XPath
evaluator has no access to a base URI, it has no ability to enforce the
absolutization, which is why the implementations I've encountered (not from
my company) do not do it.
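
To make the division of labour concrete, here is a minimal sketch of the
external step described above, in which the software that builds the context
node resolves namespace URI references before the XPath evaluator sees them.
The function name, the base URI, and the sample declarations are all
hypothetical. Python's urljoin follows RFC 3986, the successor to RFC 2396,
which I assume resolves these simple cases the same way.

```python
from urllib.parse import urljoin


def absolutize_namespaces(base_uri, declarations):
    """Resolve each namespace URI reference against base_uri.

    This is the work done outside the XPath evaluator: the evaluator is
    only ever handed the resulting strings, never the base URI itself.
    """
    return {prefix: urljoin(base_uri, uri_ref)
            for prefix, uri_ref in declarations.items()}


# Hypothetical base URI of the document and its in-scope declarations.
base = "http://example.com/docs/order.xml"
decls = {
    "a": "schema/items",            # relative reference
    "b": "http://example.org/ns",   # already absolute, left unchanged
    "c": "../common",               # relative reference with a parent step
}

resolved = absolutize_namespaces(base, decls)
for prefix, uri in sorted(resolved.items()):
    print(prefix, uri)
```

If the external code skips this call, the evaluator receives "schema/items"
as-is and has no way to tell that anything is missing.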

Most importantly, though, I am sure you are aware that the latest infoset
draft (20 Dec 1999) does not require absolutized URIs to be available.  The
infoset is free to provide either the literal string or the absolutized URI.
Hopefully this will change because, as it stands, I prefer XPath's choice to
always absolutize.  Every place in infoset where it says 'implementations
can do one thing or another and we won't nail down which' makes it that much
harder to create a canonical form that can be fed to a digest algorithm.
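
As a rough illustration of why this matters for digests, the sketch below
hashes a toy serialization of one namespace declaration twice: once with the
literal URI reference and once with the absolutized one. The serialization
format, the base URI, and the helper name are assumptions made for the
example, not anything defined by the infoset or canonicalization drafts.

```python
import hashlib
from urllib.parse import urljoin


def digest_of_declaration(prefix, ns_uri):
    """SHA-1 over a toy serialization of a single namespace declaration."""
    serialized = 'xmlns:%s="%s"' % (prefix, ns_uri)
    return hashlib.sha1(serialized.encode("utf-8")).hexdigest()


# Hypothetical document base URI and relative namespace reference.
base = "http://example.com/docs/order.xml"
literal = "schema/items"
absolute = urljoin(base, literal)

literal_digest = digest_of_declaration("a", literal)
absolute_digest = digest_of_declaration("a", absolute)

# Two conforming processors that made different choices would disagree here.
print(literal_digest == absolute_digest)
```

A signer and a verifier whose processors chose differently would compute
different digests for the same document.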

John Boyer
Software Development Manager
PureEdge Solutions Inc. (formerly UWI.Com)
Creating Binding E-Commerce
jboyer@PureEdge.com


-----Original Message-----
From: w3c-ietf-xmldsig-request@w3.org
[mailto:w3c-ietf-xmldsig-request@w3.org]On Behalf Of Paul Grosso
Sent: Thursday, June 08, 2000 5:08 PM
To: John Boyer; Joseph M. Reagle Jr.
Cc: XML DSig; elm@east.sun.com; www-xpath-comments@w3.org;
www-xml-linking-comments@w3.org; Daniel.Veillard@w3.org; connolly@w3.org
Subject: RE: XML Base and XPath absolutizing of URIs


[I removed xml-uri from the distribution.]

At 16:33 2000 06 08 -0700, John Boyer wrote:
>Yes, absolutely no problem with XBase. ...

>As for whether XPath defines a method for specifying a base URI, it does
>not.

You are right--I misread you to say that XPath doesn't specify
an absolutization algorithm, and I suggested it does by reference
to 2396.  Also by same reference, XPath assumes 2396 ways of
determining the base URI, but you are correct that it does not
specify any way to do so via the document content (per section
"5.1.1. Base URI within Document Content" of 2396).

>[XPath] says that a namespace declaration can be a URI reference, and
>that URI-references are defined by RFC2396.  The conversion from relative to
>absolute URIs is claimed to occur during namespace processing. The
>namespaces spec does not define this!

You are right, it would be the Infoset that specifies this, and
the Infoset is stuck right now.

>Moreover, the problem with claiming
>that RFC 2396 defines how to do this is that RFC2396 only describes the
>rules for establishing a base URL for a document and how to convert from
>relative to absolute URI *given a base URL* (sections 5.1 and 5.2
>respectively).  There is nothing to say how an XPath evaluation is supposed
>to receive the base URL.

I'm not sure what it means for XPath "to receive the base URL".
XPath works on a data model that was described within XPath
only because the Infoset wasn't yet ready, but it was supposed
to match that of the Infoset.  The right thing to happen is for
the absolutized URI to be in the infoset and for XPath to work
off the infoset.  Then XPath doesn't need to concern itself
with this issue at all.

>Put another way, consider the following quote from the XPath
>Recommendation:
>
>"Expression evaluation occurs with respect to a context. XSLT and XPointer
>specify how the context is determined for XPath expressions used in XSLT and
>XPointer respectively. The context consists of:
>
>a node (the context node)
>a pair of non-zero positive integers (the context position and the context
>size)
>a set of variable bindings
>a function library
>the set of namespace declarations in scope for the expression"
>
>Where is the base URL in this input specification?

Nowhere.  It shouldn't be.  Rather, "the set of namespace declarations
in scope for the expression" should all be in already absolutized form.

>The only thing I can think of is that software external to an XPath
>implementation must know the base URL using the rules established by RFC
>2396.  Further, since there is no way to communicate the base URL to XPath,
>the external software must apply the relative-to-absolute conversion rules
>defined in RFC2396 to the data structures it creates in support of setting
>up the context node.  Therefore, by the time you get to code that is
>actually part of the XPath implementation, the namespace absolutization has
>already been done by the external code, and the XPath implementation just
>treats them like strings.

Precisely.

>Conclusion: Since there is no way defined by the XPath spec to provide the
>base URL as part of the initial evaluation context, there is no way for the
>XPath evaluation to enforce absolute URIs.  They're just strings to the
>XPath evaluator.

Correct.

>Thus, the external, application-dependent code that must
>absolutize can also choose not to do it.

Huh?  It's not application-dependent code, it would be the
underlying parser layer that generates the infoset, and it
can't choose how to do it, it has to do it however we decide
it gets done.

>Since XPath is in violation of the
>namespaces spec anyway for trying to absolutize URIs, the feature should be
>removed by an erratum.

This is the issue up for discussion on xml-uri.  What XPath is doing
wrong is assuming that it has anything to do with absolutization
instead of just relying on what's in the Infoset.  (But since XPath
was written before the Infoset, this isn't surprising.)

>Alternately, XPath could be modified by an erratum
>to indicate either that the base URL is provided by XBase or as an
>additional component of the evaluation context.

XPath should never need the base URI.

>One way or the other, something about XPath needs to be changed.

Once this namespace question is resolved, the Infoset can be
completed, and then XPath should probably be rewritten in terms
of the infoset.

paul
Received on Thursday, 8 June 2000 21:39:48 UTC