Re: Single Node XPath Evaluation Is Ambiguous

Hi Ray,

Thanks for the response. Yes, I completely agree with what you are saying.
You definately do not always want to have to return the first node in
document order, that would be very inefficient.

But, I am making a completely different point. I am saying, that when the
result is not returned in document order, then it should be returned in
evaluation order. Here, I am defining evaluation order as being the order
that a node would be found if the XPath is evaluated as the spec reads; the
way all but very exotic XPath processors will have been implemented.

Left to there own devices, it is likely that *most* implementations *will*
return the first node in evaluation order. However, it *may* occur, that
some intelligent, highly optimized XPath processors will find it more
efficient to provide the result in some other order. I am asking that this
behaviour be explicitly stated, as it is already for XSLT 1.0 processors.

If this does not happen, then an XPath processor that works differently to
the rest, will mean the node returned from an unordered, single node
evaluation will be ambiguous. For example, if I want the first ancestor that
satisfies a given requirement, then the second or third ancestor that
satisfies the same requirement will not be useful to me.

I am just asking that DOM XPath is aligned with XSLT for its unsorted,
single node behaviour.

Oh, thank you for the XPath implementation in Mozilla if that is your great
work!


Message-ID: <3DC26FC6.3050904@netscape.com>
Date: Fri, 01 Nov 2002 04:12:54 -0800
From: rayw@netscape.com (Ray Whitmer)
To: www-dom@w3.org
Subject: Re: Single Node XPath Evaluation Is Ambiguous


Dominic Chambers wrote:

>The DOM Level 3 XPath module allows single nodes to be selected,
>where the returned node can be either the first node in document
>order, or any node from the result set. The ability to select
>the first node in document order is certainly useful, but the
>second definition is definately not.
>
>
I believe you missed common valid use cases.

If I search for a node that I know occurs once in a document (because of
control over the application that created it, validity checking, or
whatever), it is not necessary to require that it is the first node that
occurs in document order of the result set, which may potentially be
quite complex and require significantly more processing to produce.

If the implementation is free to return the first node it finds that
matches the criteria, rather than guaranteeing that a
potentially-complex expression does not return in later processing some
other node that preceeds this one in document order, then it may be able
to avoid significant processing.  This may vary greatly depending upon
the implementation and the particular expression being evaluated.

Let me try to give more-concrete examples.

1.  Let's say I have a simple xpath expression that looks for a
particular node in two different places.  If due to constraints on the
document, I know either that a. The item does not exist at both places,
or b. the information derived from either location is equivalent, then I
can stop searching if I find the item at the first location.  If, on the
other hand, the implementation is required to return the first node
sorted in document order, then the implementation will be obligated to
look for the information in both places.

2.  Let's say that I am trying to make a simple implementation of this
API without building the logic to know whether a particular expression
may naturally produce nodes out of order.  In this case, if I receive a
request that requires the first node, sorted in document order, then I
will require the expression to completely process to make sure it did
not produce another node that preceeded the first one produced.  The
implementation may well not be sophisticated enough to tell the
difference between a simple expression that could have guaranteed order
by the way it did the processing and a complex one that did not.  If I
know that the application does not care, for a variety of reasons
including, that the document is valid and contains only one, the pieces
of information are equivalent, etc. then I can stop when I find the
first one and return it.

In my experience, if I am looking for a single piece of information,
finding two different pieces of information would present me with a huge
headache of deciding which piece of information to use.  This is why I
would rely on a model where there cannot be contradictory information in
the document.  Once I have guaranteed this through validation or some
other means, I want my XPath evaluation to just look until it finds the
one piece of information and not care whether there might be another
piece of information that precedes it in document order.  This is why
this use case seems more common to me than the case where I want to do
the extra processing that may be required to also ensure that it is the
first node found within the document.

I do not doubt that others may have use cases that they consider more
common where they want to do the extra processing required, perhaps to
completely process the expression in a simple implementation, to
guarantee that the node they found was the first one.  But just because
XSLT forces the first result to be returned does not mean that that is
all that should be made available to DOM applications.  If I am writing
a DOM application, I will use XPath if I know it is both efficient and
convenient.  But if I think it may search the entire document just
because of the ordering constraint for something that a
more-conventional search would terminate as soon as it found the first
one, then I will hand-code it rather than using XPath API.  The point is
to make it so that even simple implementations can be expected to
terminate instantaneously when they find a matching result rather than
worrying about order.

Clearly the single result that produces an object not in any order would
not be used in an XSLT implementation.  Also, the simple examples you
cited would not typically be used in a DOM application except
occasionally parametrically, because simpler accessors exist to achieve
those results.

Ray Whitmer
rayw@netscape.com

Received on Wednesday, 6 November 2002 16:17:51 UTC