Re: The DOM XPath interface -- what is its status? from Frans Englich on 2005-05-10 (www-dom@w3.org from April to June 2005)

From: Frans Englich <frans.englich@telia.com>
Date: Tue, 10 May 2005 03:00:15 +0000
To: Robin Berjon <robin.berjon@expway.fr>
Cc: Philippe Le Hegaret <plh@w3.org>, www-dom@w3.org
Message-Id: <200505100300.16301.frans.englich@telia.com>
I apologize for following up on this only now. Also, I don't know if this is 
an appropriate time to bring these issues up (but hey, it's so cheap to send 
mail nowadays).


On Monday 04 April 2005 22:18, Robin Berjon wrote:
> Philippe Le Hegaret wrote:
> > On Mon, 2005-04-04 at 21:36 +0200, Robin Berjon wrote:
> >>We discussed it briefly in November and indeed we may have some. In
> >>particular, our users tend to think that it's overly complicated,

I too am of the opinion that the interface is complex to use, specifically 
the XPathResult interface. Accessing the result "feels" cumbersome, and I 
don't perceive it as straightforward. One way to view the DOM XPath 
interface is as a convenient means of document navigation in hand-written 
code, which promotes XML adoption, and complicated result introspection 
defeats that. I have no solutions to suggest, nor can I discuss the various 
approaches to one.

> >>which
> >>we have attributed (perhaps wrongly) to trying to be forward-compatible
> >>with XPath 2.0.
> >
> > I don't believe the current complexity should be attributed to XPath 2.0
> > (which is a fine and compelling technology for XSLT 2.0 btw but this is
> > not the place for XPath 2.0 advertisement :).
>
> I didn't mean to say anything wrong about XP2, simply that adding
> support for it seemed *perhaps* a touch premature :)
>
> > The only complexity added by XPath 2.0 is the result of the
> > XPathEvaluator.evaluate method. It is a DOMObject instead of a
> > XPathResult object.
>
> Ok, that makes sense. It's an extra level of indirection but I guess it
> makes no difference in Ecmascript so our primary constituency won't ever
> care, or in fact notice. I'll note however that where our users use Java
> instead of Ecmascript, it tends to be on mobile devices and there
> generally try to avoid casting as it's an operation with measurable cost
> in those environments. We also tend to try to contain interface
> proliferation so as to be as implementable on limited devices as
> possible (see our work on the SVG uDOM for instance).
>
> > The other added complexities were:
> > - the type and result parameters on the XPathEvaluator.evaluate method;
> > - the iterator vs snapshots in the XPathResult interface;
> > - the XPathExpression object;
> > - the support for XPath namespace nodes.
> > The first three were done due to performance consideration. The last one
> > was done for full XPath 1.0 support.
>
> I'd forgotten about those, thanks, they came up too. Some serious
> concerns were raised during discussion that the iterator results were
> rarely available in existing XPath implementations that exposed an API.
> For the most part, SVG implementors are happy to add DOM 3 XPath if most
> of the work is in fact mostly glueing an existing implementation into
> theirs. If those are dropped, we can then drop the type parameter to
> XPathEvaluator.evaluate. This in turn drops a bunch of iterator-related
> fields, and the need for the implementation to maintain iterator validity.
>
> We can then look at the result field, which is also something that isn't
> always available, and strikes me as the kind of optimization that users
> either ignore entirely or misuse.
>
> The XPathExpression object is ok, though I'm not sure it's such a huge
> gain in performance. An implementation that doesn't want to recompile
> XPath expressions every time can cache them itself. XML::XPath for
> instance caches its own internal version of XPathExpression in a hash
> table keyed on the XPath expression it's being fed and check that before
> compiling. It's hardly rocket science, and it works much more reliably
> than putting the onus on the user to remember to pre-compile her
> expression.

That is one of my arguments too, and it was initially what led me to think 
that the XPathExpression interface could be skipped. However, there is the 
XPathEvaluator.evaluate(DOMString, ...) function, so the user already has 
the alternative of not using XPathExpression.

Nevertheless, I think it is valid to ask why we should have it at all. Here 
is my reasoning for keeping it:

* It does provide a mechanism for optimizing (more specifically, for saving a 
dictionary/hash lookup). This might seem negligible, but dictating what users 
will need, and being right about it, seldom succeeds. Who knows, perhaps a 
client-side JavaScript XSLT engine pops up...

* While not a reason in itself, it is my experience that implementations 
already have some form of internal representation of an expression, meaning 
the interface can easily be implemented (and even if they do not).
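
To make the trade-off concrete, here is a minimal Python sketch of the two 
options: an evaluator that caches compiled expressions in a dictionary keyed 
on the expression string, and a caller that keeps the compiled object and so 
skips the lookup. All names here (CompiledExpression, CachingEvaluator, 
create_expression, lookup) are hypothetical stand-ins, not the DOM XPath 
interfaces, and "compiling" is reduced to splitting the path on "/".

```python
class CompiledExpression:
    """Hypothetical stand-in for a compiled XPath expression."""

    def __init__(self, source):
        self.source = source
        # Stand-in for real parsing: split the path into its steps.
        self.steps = tuple(step for step in source.split("/") if step)


class CachingEvaluator:
    """Hypothetical evaluator that caches compiled expressions by string."""

    def __init__(self):
        self._cache = {}
        self.compilations = 0  # counts how often we really compile

    def create_expression(self, source):
        # Explicit pre-compilation: the caller keeps the returned object.
        self.compilations += 1
        return CompiledExpression(source)

    def lookup(self, source):
        # Implicit caching: one dictionary lookup per call, compile on a miss.
        expr = self._cache.get(source)
        if expr is None:
            expr = self.create_expression(source)
            self._cache[source] = expr
        return expr


evaluator = CachingEvaluator()
first = evaluator.lookup("/doc/chapter/title")
second = evaluator.lookup("/doc/chapter/title")
```

The second call compiles nothing, but it still pays for the dictionary 
lookup; a caller holding the object returned by create_expression pays for 
neither.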

>
> The namespace bit is perfectly logical and not a big deal.
>
> Hey, it looks like I could implement this atop XML::LibXML in a couple
> hours, docs included ;)
>
> Now that thanks to you refreshing my memory I've had fun with my
> chainsaw, there's one thing that we found to be lacking in the
> interfaces: a way to register functions. Many implementations support
> that feature, and it's a very, very useful one (eg it would allow us to
> implement SVG's Extensions to XPath directly, as well as all the XForms
> functions).

My experience confirms that too. It has also been expressed on xml-dev that a 
standardized way of registering functions is of interest.

How would such an interface look?

I'm writing an XPath implementation exposed via the DOM API, and it has a 
factory for creating "FunctionCallImpl" instances (AST nodes). Hence, 
extension would happen by inheriting from the factory and registering the 
derived factory for use in expression compilation (which then returns 
FunctionCallImpl instances for custom functions).
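
A rough Python sketch of that factory approach follows. Apart from the name 
FunctionCallImpl, everything here is an assumption made for illustration 
(FunctionCallFactory, ExtendedFactory, Compiler, register_factory, and the 
"my:upper" function); it is not the code of my implementation.

```python
class FunctionCallImpl:
    """Hypothetical AST node for a function call."""

    def __init__(self, name, args, impl):
        self.name = name
        self.args = args
        self._impl = impl

    def evaluate(self):
        return self._impl(*self.args)


class FunctionCallFactory:
    """Hypothetical factory that knows only built-in functions."""

    def create(self, name, args):
        if name == "concat":
            return FunctionCallImpl(name, args, lambda *parts: "".join(parts))
        raise LookupError("unknown function: " + name)


class ExtendedFactory(FunctionCallFactory):
    """Extension by inheritance: handle custom names, defer the rest."""

    def create(self, name, args):
        if name == "my:upper":
            return FunctionCallImpl(name, args, lambda s: s.upper())
        return super().create(name, args)


class Compiler:
    """Hypothetical compiler that asks the registered factory for nodes."""

    def __init__(self):
        self._factory = FunctionCallFactory()

    def register_factory(self, factory):
        self._factory = factory

    def compile_call(self, name, args):
        return self._factory.create(name, args)


compiler = Compiler()
compiler.register_factory(ExtendedFactory())
custom = compiler.compile_call("my:upper", ["svg"])
builtin = compiler.compile_call("concat", ["a", "b"])
```

The compiler never learns about "my:upper" itself; it only asks whichever 
factory is registered, and the derived factory falls back to the base class 
for everything it does not recognize.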

However, I find this a complex discussion, because it easily gets close to 
the implementation approach. With the approach described above, such an 
"XPathFunctionCall" interface should, from what I can tell, describe its 
function signature: the return value (if any) and its arguments (their data 
types, and which ones are optional). Hence, it must be able to express 
itself in some kind of data model. If the data model is not close to XPath 
2.0, it would require heavy extension (but that could, on the other hand, 
easily be fixed with a new interface). Perhaps it could be close to a 
potential XPath 2.0 FunctionCall interface, but restricted (say, in the 
allowed QNames for data types).
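
As one possible shape for such a self-describing signature, here is a small 
Python sketch. The names Argument, FunctionSignature, accepts_arity, 
ALLOWED_TYPES and is_restricted are hypothetical, and the set of allowed type 
names is an arbitrary example of a restricted data model, not a proposal.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Argument:
    type_name: str          # a QName-like type name (assumed naming scheme)
    optional: bool = False


@dataclass(frozen=True)
class FunctionSignature:
    name: str
    arguments: Tuple[Argument, ...]
    return_type: Optional[str] = None  # None means no return value

    def accepts_arity(self, count):
        # Valid calls supply every required argument and no extras.
        required = sum(1 for arg in self.arguments if not arg.optional)
        return required <= count <= len(self.arguments)


# Restricting the data model: only these type names are allowed here.
ALLOWED_TYPES = {"xs:string", "xs:double", "xs:boolean"}


def is_restricted(signature):
    """True if every type the signature mentions is in ALLOWED_TYPES."""
    names = [arg.type_name for arg in signature.arguments]
    if signature.return_type is not None:
        names.append(signature.return_type)
    return all(name in ALLOWED_TYPES for name in names)


# XPath 1.0's substring(string, number, number?) expressed in this model.
substring = FunctionSignature(
    name="substring",
    arguments=(Argument("xs:string"), Argument("xs:double"),
               Argument("xs:double", optional=True)),
    return_type="xs:string",
)
```

With this, a compiler could reject a call with the wrong number of arguments 
before evaluation, and an implementation could refuse to register a function 
whose signature uses types outside the restricted set.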

To briefly mention another topic:

In some cases it is of interest to restrict which XPath expressions are 
allowed, such as in W3C XML Schema's identity constraints or XSLT's match 
patterns. In the factory approach I described above, I implemented this as 
an "ExpressionFilter" registered on the factory, similar to Traversal's 
NodeFilter, which is asked which XPath components are allowed. I do not have 
full insight into what the requirements for the XPath API are; if it is 
merely for "user programming", I doubt this is needed.
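
A minimal Python sketch of the filter idea follows. Only the name 
ExpressionFilter comes from the description above; the accept method, the 
(kind, name) component pairs, and the particular restriction are assumptions 
chosen for illustration, and the real rules for identity constraints or 
match patterns are not reproduced here.

```python
class ExpressionFilter:
    """Hypothetical filter, asked once per expression component."""

    def accept(self, kind, name):
        return True


class ChildAndAttributeOnly(ExpressionFilter):
    """Allow only the child and attribute axes, and no function calls."""

    ALLOWED_AXES = {"child", "attribute"}

    def accept(self, kind, name):
        if kind == "axis":
            return name in self.ALLOWED_AXES
        if kind == "function":
            return False
        return True


def check(components, expression_filter):
    """Return the components the filter rejects (empty list: allowed)."""
    return [(kind, name) for kind, name in components
            if not expression_filter.accept(kind, name)]


# Components as a parser might report them (hand-written for this sketch).
allowed = [("axis", "child"), ("axis", "attribute")]
rejected = [("axis", "child"), ("axis", "ancestor"), ("function", "position")]

strict = ChildAndAttributeOnly()
```

As with NodeFilter, the filter only answers yes or no per component; the 
compiler decides what to do with a rejection, for example raising an error 
that names the offending axis or function.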

>
> >> Noise was made according to which it would be simpler to
> >>drop XP2 support entirely now and revisit it later, and add (optional)
> >>CSS support instead.
> >
> > I doubt you would simplify the current proposal by dropping XP2 and
> > adding CSS, unless you reuse the XPathResult for CSS results as well.
>
> Again, this is preliminary as I haven't had time to do full research
> into this yet but so far I have no reason to believe that the
> XPathResult couldn't be reused for CSS as well (though a different name
> may be an option :). The only exceptions I can think of are some of the
> pseudos, though an obvious mapping could be found. For first-letter and
> first-line a string could be returned, and for ::after and ::before the
> matching element with perhaps an additional field in XPathResult. But
> even the latter seems overkill and could simply be declared unsupported
> without anyone complaining about it (famous last words ;).
>
> You would need some extra wording indicating that some queries won't
> ever return anything in some implementations (eg the dynamic ones such
> as :hover in non interactive agents and the rendered ones like
> ::first-line in implementations that don't have a rendering) and also to
> provide the simple mapping between the two models (CSS doesn't talk
> about node sets even though they're there) but neither changes the
> actual implementation. This part would be optional and may be deferred
> to a later version depending on support for it (though bridging XPath
> and CSS is on my list of things I'd like to see more consensus about
> this year).

This sounds foreign to me; where can I find more information about it? 
Why/how does SVG need XPath? (Is it documented somewhere?) I understand XBL 
uses XPath and/or CSS for its selector mechanisms, but I don't see how that 
would require DOM interfaces.


Regards,

                Frans


Frans Englich
KDE Developer
Received on Tuesday, 10 May 2005 02:50:57 UTC