Re: Document Object Model (DOM) Level 3 XPath Specification from Bob Foster on 2001-06-26 (www-dom@w3.org from April to June 2001)

From: Bob Foster <bob.foster@webgain.com>
Date: Tue, 26 Jun 2001 04:55:37 -0400 (EDT)
To: <www-dom@w3.org>
Message-ID: <046201c0fe21$2c7b2e20$75181eac@itools.symantec.com>
Very nice to see this! As an implementor of a DOM-based XPath, it is always
nice to think that someday I will be able to stop maintaining it. I tried to
make my comments brief, but there are 10 of them.

1) [1.2.1 Text Nodes] This is very imprecise. Does the section describe the
behavior of the text() node test or are there other incompatibilities? Each
incompatibility should be specifically identified.

2) Irrespective of the data model differences, the XPath text() node test is
defined as true for all text children, not for "the text child" and
certainly not for "the first text child". Arbitrarily redefining text() in
this way fixes nothing and introduces further incompatibilities. If the
definition of text() is left alone, a normalized DOM document written without
CDATA sections should be processed compatibly.

3) If you are going to add a method to the DOM, it would be far better to
introduce a variant of normalize() that coalesces adjacent text (that is,
Text and CDATA) nodes exactly as described in the XPath specification.
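
To make the suggestion in 3) concrete, here is a minimal sketch of such a
coalescing pass. It is written against Python's xml.dom.minidom purely for
illustration; the function name coalesce_text is hypothetical and is not a
proposed DOM method.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

TEXT_LIKE = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)

def coalesce_text(node):
    """Merge each run of adjacent Text/CDATASection children into one Text node.

    Hypothetical helper: a normalize() variant that also folds CDATA
    sections, so each run of character data is a single text node.
    """
    doc = node.ownerDocument or node  # a Document has no ownerDocument
    child = node.firstChild
    while child is not None:
        nxt = child.nextSibling
        if child.nodeType in TEXT_LIKE:
            run = [child]
            # Extend the run over every directly following text-like sibling.
            while nxt is not None and nxt.nodeType in TEXT_LIKE:
                run.append(nxt)
                nxt = nxt.nextSibling
            text = "".join(n.data for n in run)
            if text:  # the XPath data model has no empty text nodes
                node.insertBefore(doc.createTextNode(text), run[0])
            for old in run:
                node.removeChild(old)
        else:
            coalesce_text(child)  # recurse into elements and other containers
        child = nxt

doc = minidom.parseString("<a>one<![CDATA[ & two]]>three<b/>x</a>")
coalesce_text(doc)
print([c.nodeName for c in doc.documentElement.childNodes])
```

After the pass, the Text, CDATASection, Text run before <b/> is a single
Text node, which is the shape the XPath data model expects.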

4) [Interface XPathEvaluator] There should be a single evaluate() method
returning an Object. The number of possible types from XPath 2.0 expressions
will make enumerating them unproductive; might as well get it right the
first time. A general-purpose method may have no way to determine the type
beforehand, and would need an inelegant switch statement to make use of the
type if it knew it. I did it wrong in a similar way myself, thinking it
would be a convenience for programmers before discovering that a) the
Object-returning variant is needed in any case and b) in most cases the
convenience amounts to the absence of a cast and, for nodeset values, a call
to getFirst(). Programmers can provide this level of sugar for themselves.
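
As a rough sketch of what 4) means by "this level of sugar": one entry
point returns whatever the expression yields, and callers add their own
thin typed helpers. Everything here is an assumption for illustration.
evaluate(), first_node() and as_number() are hypothetical names, and the
toy evaluator only understands ElementTree's limited path subset plus a
hand-rolled count(...).

```python
import xml.etree.ElementTree as ET

def evaluate(expression, context):
    """Hypothetical single entry point; the result type depends on the expression."""
    if expression.startswith("count(") and expression.endswith(")"):
        # Toy handling of count(path): a number result.
        return float(len(context.findall(expression[len("count("):-1])))
    # Anything else is treated as a location path: a node-set result.
    return context.findall(expression)

def first_node(result):
    """Caller-side sugar: first node of a node-set result, or None if empty."""
    if not isinstance(result, list):
        raise TypeError("expression did not yield a node-set")
    return result[0] if result else None

def as_number(result):
    """Caller-side sugar: check that the result is a number and return it."""
    if not isinstance(result, float):
        raise TypeError("expression did not yield a number")
    return result

root = ET.fromstring("<list><item>a</item><item>b</item></list>")
print(as_number(evaluate("count(item)", root)))
print(first_node(evaluate("item", root)).text)
```

The helpers are a few lines each, which is the point: nothing in them needs
to live in the evaluator interface itself.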

6) Some respondents seem to think that the Node-returning variant is meant as
a hint to XPath that at most one node need be returned. If this is the sly
intention ("There is nothing to stop an XPath implementor from taking
advantage...") it should be made explicit. I agree that this is a common
case and a useful optimization (you can slap a /.[1] at the end of any node
locator, but you can't stop most XPath implementations from grinding out and
testing all n nodes). It just shouldn't sneak in the back door.
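
The optimization in 6) can be illustrated with a lazy selector that stops
at the first match. This is only a sketch under stated assumptions: the
function names are hypothetical, the "expression" is reduced to a tag-name
match over the context node and its descendants, and the visited list
exists solely to show how much work was done.

```python
import xml.etree.ElementTree as ET

def select_matching(context, tag, visited):
    """Lazily yield matching elements in document order, logging each node tested."""
    for element in context.iter():  # context itself, then its descendants
        visited.append(element.tag)
        if element.tag == tag:
            yield element

def select_first(context, tag, visited):
    """Explicit 'at most one node' request: stop as soon as a match is found."""
    return next(select_matching(context, tag, visited), None)

root = ET.fromstring("<r><a/><b/><a/><c/></r>")

visited_first = []
select_first(root, "a", visited_first)
print(len(visited_first))  # nodes tested when only the first match is wanted

visited_all = []
list(select_matching(root, "a", visited_all))
print(len(visited_all))    # nodes tested when the whole node-set is built
```

An interface that states the single-node intent explicitly lets an
implementation take the first path by design rather than by accident.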

7) [Interface ActiveNodeSet] For simplicity and concurrency reasons,
ActiveNodeSet should be eliminated entirely in favor of StaticNodeSet.
Without explicit synchronization of access to the DOM, the useful lifetime
of an ActiveNodeSet cannot be determined. It is possible a returned instance
might already be invalid.

Obviously, if this were done, some methods should transfer to StaticNodeSet,
esp. getDocumentOrderedSet().
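
The lifetime problem in 7) shows up even without threads. In the sketch
below, minidom's childNodes stands in for a live result and a tuple copy
stands in for a static one; neither is the draft's ActiveNodeSet or
StaticNodeSet, and live_and_snapshot is a hypothetical helper.

```python
from xml.dom import minidom

def live_and_snapshot(element):
    """Return (live view, static copy) of an element's children."""
    return element.childNodes, tuple(element.childNodes)

doc = minidom.parseString("<r><a/><b/><c/></r>")
root = doc.documentElement
live, snapshot = live_and_snapshot(root)

# A mutation made after both results were handed to the caller.
root.removeChild(root.firstChild)

print(len(live))      # 2: the live view changed underneath its holder
print(len(snapshot))  # 3: the static copy still holds what was selected
```

The static copy can go stale, but its contents never change without the
holder's knowledge, which is what makes it the simpler contract.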

8) [Interface StaticNodeSet] A getFirst() method defined as returning the
first element or null would be a handy addition. A getOnly() method defined
like getFirst(), but throwing if there is more than one node in the set,
might be even better.
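
A minimal sketch of the two accessors suggested in 8), assuming a static
set is just an immutable sequence of nodes. The class below is
illustrative only; the Python method names and the choice of ValueError are
assumptions, not part of the draft.

```python
class StaticNodeSet:
    """Illustrative stand-in for a static (snapshot) node set."""

    def __init__(self, nodes):
        self._nodes = tuple(nodes)  # copy, so later changes cannot leak in

    def __len__(self):
        return len(self._nodes)

    def get_first(self):
        """Return the first node, or None if the set is empty."""
        return self._nodes[0] if self._nodes else None

    def get_only(self):
        """Like get_first(), but raise if the set holds more than one node."""
        if len(self._nodes) > 1:
            raise ValueError(
                "expected at most one node, found %d" % len(self._nodes))
        return self.get_first()
```

get_only() turns "this expression should identify a single node" from an
unchecked assumption into a checked one.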

9) [Issue getDocumentOrdered-1] DOM ordering and XPath ordering should be
the same. For better or worse, document order is fundamental to correct
XPath evaluation. As the editor correctly points out, this would clear up
the namespace node issues, as well.

10) [1.2.2. Namespace Nodes] Seems like the wrong answer, esp. in light of
the ordering issue. A separate interface to find the replicated namespace
nodes could be provided, maintaining compatibility. A good DOM
implementation wouldn't instantiate the nodes unless they were needed. It
could, however, arrange to correctly _count_ the virtual nodes in a
space-efficient way as the DOM is constructed.
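
One way the counting in 10) might look, as a sketch only. It walks a
minidom tree and totals the namespace nodes the XPath data model would
attach to each element (one per in-scope prefix, including the implicit
xml prefix) without creating any node objects. The function name is
hypothetical, and the in-scope table is copied only where an element
actually declares a namespace.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# The xml prefix is implicitly in scope on every element.
XML_ONLY = {"xml": "http://www.w3.org/XML/1998/namespace"}

def count_namespace_nodes(element, in_scope=XML_ONLY):
    """Total XPath namespace nodes for element and its descendants."""
    scope = in_scope
    for name, value in element.attributes.items():
        if name == "xmlns":
            prefix = ""                     # default namespace
        elif name.startswith("xmlns:"):
            prefix = name[len("xmlns:"):]
        else:
            continue                        # ordinary attribute
        if scope is in_scope:
            scope = dict(in_scope)          # copy on first declaration only
        if value:
            scope[prefix] = value
        else:
            scope.pop(prefix, None)         # xmlns="" undeclares the default
    total = len(scope)                      # virtual nodes on this element
    for child in element.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            total += count_namespace_nodes(child, scope)
    return total

doc = minidom.parseString(
    '<r xmlns:a="urn:a"><x xmlns="urn:d"><y/></x><z/></r>')
print(count_namespace_nodes(doc.documentElement))
```

Elements that declare nothing share their parent's table, so the extra
storage grows with the number of declarations rather than the number of
elements.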

Bob Foster
WebGain
Received on Tuesday, 26 June 2001 13:04:05 UTC