More XPath and namespace musings

We're currently designing an API based around XPath for a legacy
database system (Model 204, if you're interested) and have generally
found XPath to be a wonderful way of mapping XML documents.
Unfortunately, as we've started looking at namespace support,
we've suddenly been filled with concerns and trepidation. First, I'll
summarize our concerns (one of which I mentioned in a previous
note), and then I'll offer a proposal for how they can be met.

The concerns:

A. XPath's reliance (largely) on prefixes to get at namespaced nodes makes
   it very difficult to get at nodes in the default namespace. The default
   namespace in scope for a node is used to generate the node's expanded name,
   but it is not used when evaluating node tests, so one has to use a prefix
   to get at any node in the default namespace. But the document might not
   declare any such prefix. As I noted previously, the seemingly innocuous act
   of adding an explicit default namespace declaration to an XML document
   breaks existing XPath applications written against that document.

B. XPath's reliance on prefixes encourages heavy dependence on what I would
   consider an artifact of the XML document's representation, namely the
   prefixes. Prefixes are simply handles for referring to the underlying URIs,
   but with XPath they take on a life of their own. Even worse, a prefix might
   mean different things at different levels of a document, and its meaning
   can change at any level without XPath giving any warning.
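To illustrate B, here is a minimal sketch using Python's standard xml.etree.ElementTree (not an XPath 1.0 engine; the URIs are made up) in which the prefix p is bound to two different URIs at two levels of one document. Once parsed, ElementTree keeps only the URI in each name, written as '{uri}local', so the prefix really is just a handle.

```python
import xml.etree.ElementTree as ET

# The prefix "p" means one (made-up) namespace on grandpa and another on pa.
DOC = ('<p:grandpa xmlns:p="http://example.com/a">'
       '<p:pa xmlns:p="http://example.com/b"/>'
       '</p:grandpa>')

root = ET.fromstring(DOC)
print(root.tag)     # {http://example.com/a}grandpa
print(root[0].tag)  # {http://example.com/b}pa

# A query that trusts the outer binding of "p" finds nothing...
print(root.find("p:pa", {"p": "http://example.com/a"}))  # None
# ...while a query phrased in terms of the URI finds the element.
print(root.find("{http://example.com/b}pa") is root[0])  # True
```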

Both of these issues present major forward-compatibility problems for
applications written with XPath. It seems that one of the appeals of
namespaces is that I can write an application that processes a document,
and if someone later decides to add information to the document, perhaps
with local names and prefixes that conflict with the ones already there,
that should be OK as long as the new information is in a different namespace.
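The breakage described in A can be reproduced with a small sketch. It uses Python's xml.etree.ElementTree rather than a full XPath 1.0 engine (an assumption of this illustration: ElementTree supports only a subset of XPath, but like XPath 1.0 it treats an unprefixed name in a path as being in no namespace). The namespace URI is made up.

```python
import xml.etree.ElementTree as ET

# The same document before and after someone adds an explicit default
# namespace declaration; "http://example.com/family" is a made-up URI.
BEFORE = "<grandpa><pa><youngun/></pa></grandpa>"
AFTER = '<grandpa xmlns="http://example.com/family"><pa><youngun/></pa></grandpa>'

def count_younguns(xml_text):
    """Run the 'existing application' query: an unprefixed path from the
    document element. Unprefixed names match only no-namespace elements."""
    root = ET.fromstring(xml_text)
    return len(root.findall("pa/youngun"))

print(count_younguns(BEFORE))  # 1
print(count_younguns(AFTER))   # 0: the declaration broke the query
```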

Note that the namespace-uri function makes it possible to write applications
that are robust and forward compatible, but XPath's insistence that an
unprefixed name in a node test always refers to the null namespace makes this
largely unworkable, forcing one to use expressions like

  /*[namespace-uri()="http://sirus-software.com/namespaces" and local-name()="whatever"]

And that's just for one level in the tree. Oy.
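To make the "Oy" concrete, here is a hypothetical Python helper (not part of any XPath API) that spells out a multi-level path in the prefix-free XPath 1.0 style above, with one namespace-uri()/local-name() predicate per step.

```python
def prefix_free_path(uri, local_names):
    """Build an absolute XPath 1.0 expression that walks a chain of child
    elements by namespace URI and local name, so no prefix is needed.
    Every step has to repeat both tests."""
    steps = (f'*[namespace-uri()="{uri}" and local-name()="{name}"]'
             for name in local_names)
    return "/" + "/".join(steps)

NS = "http://sirus-software.com/namespaces"
print(prefix_free_path(NS, ["whatever"]))
print(prefix_free_path(NS, ["grandpa", "pa", "youngun"]))
```

The one-step call reproduces the expression above exactly; the three-step call already runs to well over 200 characters.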

To my mind, XPath's utility in a namespace-intensive world rests on its
ability to hide from an application the information in a document that belongs
to namespaces the application is not interested in. I think that, with some
relatively minor tweaking, the XPath spec could accomplish this. I understand
that some of what I'm proposing is backward incompatible with the existing
XPath spec, but I would posit that, because of these forward-compatibility
issues, the current XPath spec will make adoption of namespaces very difficult
wherever XPath is in use.

So the proposal:

1. The context node at every location step would have a context namespace
   associated with it. The context namespace would be the default namespace
   (the namespace used when a bare NCName appears in the node test) for
   evaluating that location step. This seems more XPathy than the current
   approach: XPath is highly oriented toward location-step context, but it
   seems to drop namespace context far too eagerly.

2. One exception to 1 would be location steps along the attribute axis,
   where the default namespace would be null, since an attribute's expanded
   name has a null namespace URI unless one is explicitly specified via a prefix.

3. The prefix * would mean "any namespace" as in /*:grandpa/*:pa/*:youngun.

4. A bare "*" would mean any local name in the context namespace.

5. "*:*" would mean any local-name in any namespace.

6. The context namespace of the root node would be defined as the namespace
   of the document element. This would allow an expression like "//youngun"
   to return, by default, only youngun elements in the document element's
   namespace. If you want youngun elements in any namespace in such an
   expression, simply specify "//*:youngun".
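The proposal above can be sketched in a few dozen lines of Python over xml.etree.ElementTree. This is a minimal illustration of rules 1 and 3 through 6, restricted to absolute paths along the child axis (so rule 2 and the open issues below don't arise). The function names, the prefix-map argument, and the reading that each selected element's own namespace becomes the context namespace for the next step are all assumptions of this sketch, not part of any spec.

```python
import xml.etree.ElementTree as ET

def split_tag(tag):
    """Split an ElementTree '{uri}local' tag into (uri, local); '' is the null namespace."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag

def step_matches(test, elem, context_ns, prefixes):
    """Hypothetical node-test semantics from the proposal:
    'name'   local name in the context namespace      (rule 1)
    '*'      any local name in the context namespace  (rule 4)
    '*:name' local name in any namespace              (rule 3)
    '*:*'    any element in any namespace             (rule 5)
    'p:name' local name in the namespace bound to p in `prefixes`"""
    uri, local = split_tag(elem.tag)
    if ":" in test:
        prefix, name = test.split(":", 1)
        ns_ok = prefix == "*" or uri == prefixes[prefix]
    else:
        name, ns_ok = test, uri == context_ns
    return ns_ok and name in ("*", local)

def select(doc_elem, path, prefixes=None):
    """Evaluate an absolute child-axis path such as '/pa/youngun'."""
    prefixes = prefixes or {}
    # Each context pairs candidate children with the namespace used for bare
    # names among them. The root node's only element child is the document
    # element, and its context namespace is the document element's (rule 6).
    contexts = [([doc_elem], split_tag(doc_elem.tag)[0])]
    matched = []
    for test in path.strip("/").split("/"):
        matched = [child for children, ns in contexts for child in children
                   if step_matches(test, child, ns, prefixes)]
        # Assumption: a selected element's own namespace becomes the
        # context namespace for the next step.
        contexts = [(list(el), split_tag(el.tag)[0]) for el in matched]
    return matched

plain = ET.fromstring("<pa><youngun/></pa>")
family = ET.fromstring('<pa xmlns="http://example.com/family"><youngun/>'
                       '<youngun xmlns="http://troublemakers.com/junk"/></pa>')
print(len(select(plain, "/pa/youngun")))     # 1
print(len(select(family, "/pa/youngun")))    # 1: the junk youngun stays hidden
print(len(select(family, "/pa/*:youngun")))  # 2
```

The same "/pa/youngun" keeps working whether or not the document declares a default namespace, which is the forward-compatibility property the proposal is after.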

There are a few open issues that I think are debatable:

7. Should the parent, ancestor, and ancestor-or-self axes use a default
   namespace of "*"?

8. Depending on the answer to 7, should ".." be changed to mean "parent::*:*"?

9. How would one insist on the null namespace? With ":name"? While I don't
   believe this produces any parsing ambiguities, it does produce ugly-looking
   things like "child:::name". Another option is to use a character that cannot
   start an NCName as a placeholder meaning "null namespace", as in
   "child::.:name". OK, not particularly pretty either.
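Item 9's claim that a leading colon parses unambiguously can be checked with a sketch like the following hypothetical Python function, which handles a single location step with an optional axis (it does not handle "." or ".."). It accepts both spellings floated in item 9, a leading colon and a "." placeholder. The way it encodes the namespace part of the result is an assumption of this illustration.

```python
def parse_step(step):
    """Split one location step into (axis, namespace_spec, local_name).
    namespace_spec: None -> the axis's default (context namespace, or null
    on the attribute axis), '*' -> any namespace, '' -> null namespace,
    anything else -> a prefix to look up."""
    axis, sep, test = step.partition("::")
    if not sep:                # no explicit axis: abbreviated child step
        axis, test = "child", step
    if test.startswith(".:"):  # placeholder spelling, e.g. 'child::.:name'
        return axis, "", test[2:]
    if test.startswith(":"):   # leading-colon spelling, e.g. 'child:::name'
        return axis, "", test[1:]
    prefix, colon, local = test.partition(":")
    if not colon:              # bare name or '*'
        return axis, None, test
    return axis, prefix, local

print(parse_step("child:::name"))   # ('child', '', 'name')
print(parse_step("child::.:name"))  # ('child', '', 'name')
print(parse_step("*:youngun"))      # ('child', '*', 'youngun')
```

Because the axis separator "::" is split off first, the extra colon in "child:::name" can only belong to the node test.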

In any case, I think this approach allows XPath apps that are not
namespace-aware to keep working even when namespaces are introduced. The
expression "/pa/youngun" in a document without namespaces simply uses the
null namespace as the context namespace at every step. If a default namespace
is added to the document element, no problem: the context namespace for each
step simply becomes that namespace. If a "youngun" element is added in
a different namespace, it will be invisible to the old app until the app is
fixed with something like:

   /pa/*:youngun[namespace-uri()="http://troublemakers.com/junk"]

or, to my mind a little less nicely

   /pa/trouble:youngun

Less nicely because it adds a dependency on the rather arbitrary prefix
used in the document. I think this approach makes the namespace-uri function
more palatable, because once that function has established a namespace
context, the namespace doesn't have to be respecified unless it changes again.
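For comparison, Python's xml.etree.ElementTree (3.8 and later) already offers two pieces that resemble this proposal, though its default namespace is supplied by the caller rather than taken from the context node: a '' entry in the namespaces map puts unprefixed tag names into a namespace, and '{*}name' matches a local name in any (or no) namespace, much like the proposed "*:name". A sketch of the youngun scenario, with a made-up URI for the family namespace:

```python
import xml.etree.ElementTree as ET

FAMILY = "http://example.com/family"      # made-up default namespace
TROUBLE = "http://troublemakers.com/junk"

pa = ET.fromstring(
    f'<pa xmlns="{FAMILY}" xmlns:trouble="{TROUBLE}">'
    '<youngun/><trouble:youngun/></pa>')

# The old query, with bare names placed in the family namespace ('' key).
ours = pa.findall("youngun", namespaces={"": FAMILY})
# Any namespace, similar to the proposed "*:youngun".
everyone = pa.findall("{*}youngun")
# The fixed query selects by URI rather than by the document's prefix.
theirs = pa.findall(f"{{{TROUBLE}}}youngun")

print(len(ours), len(everyone), len(theirs))  # 1 2 1
```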

Again, I understand this approach is backward incompatible with XPath
1.0, but we believe the XPath 1.0 approach is so fraught with
forward-compatibility problems for namespaces that we might provide this
incompatible behavior as an option to our users, and maybe even make it
the default.

The reason I'm sending this to this listserv is that, while namespaces
are probably still at an early stage of adoption, XPath implementers
must be a bit worried about the implications of namespace adoption by
users writing XPath-based applications, and must be scratching their
heads over some of the issues that concern us (especially default
namespaces).

Thanks, and sorry about the long note.

Alex Kodat
Sirius Software
Cambridge, MA

Received on Tuesday, 6 November 2001 10:56:18 UTC