
Re: Document Object Model (DOM) Level 3 XPath Specification

From: Ray Whitmer <rayw@netscape.com>
Date: Fri, 22 Jun 2001 11:02:10 -0700
Message-ID: <3B338822.8060606@netscape.com>
To: Christian Nentwich <c.nentwich@cs.ucl.ac.uk>
CC: Philippe Le Hegaret <plh@w3.org>, bradford@dbxmlgroup.com, www-dom@w3.org
Christian Nentwich wrote:

>>>>I have further requirements that require the repeated evaluation of
>>>>XPaths (>= 100000 times). I find the Xalan approach quite useful here,
>>>>which parses the path once to be evaluated multiple times. I see quite a
>>>>bad performance hit coming along if I use the interfaces from the draft
>>>>
The feedback is good and must be weighed.  If it could be shown that 
the gain were worth the added complexity for most users, a factory-based 
approach to compiled expressions could be workable.  If, on the other 
hand, this is a usage pattern from which most anticipated usages or 
implementations would not benefit, then the cost must be considered.
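To make the trade-off concrete, here is a minimal sketch of the parse-once, evaluate-many pattern under discussion.  The names (`XPathEvaluator`, `create_expression`) and the toy path matching are purely illustrative assumptions, not the draft's actual interfaces:

```python
# Hypothetical sketch of a factory-based compiled-expression API.
# The class names and the trivial slash-path "parser" are illustrative
# stand-ins, not the interfaces defined in the DOM Level 3 XPath draft.

class CompiledExpression:
    def __init__(self, source):
        self.source = source
        # Parsing happens once, at creation time.  A real implementation
        # would build an expression tree here; we just split path steps.
        self.steps = [s for s in source.split('/') if s]

    def evaluate(self, context):
        # Reuse the pre-parsed steps on every call; no re-parsing.
        node = context
        for step in self.steps:
            node = node.get(step)
        return node

class XPathEvaluator:
    def create_expression(self, source):
        # Factory method: pay the parse cost once, reuse the result.
        return CompiledExpression(source)

evaluator = XPathEvaluator()
expr = evaluator.create_expression("/a/b")
doc = {"a": {"b": 42}}
for _ in range(3):
    result = expr.evaluate(doc)   # parse cost is not paid per call
```

The point of contention is whether this explicit factory step is worth exposing to every user, or whether an implementation-side cache achieves the same savings transparently.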

>>>>
>>Either of these cases can be improved in the implementation by very
>>simple caching schemes:
>>
>
>I agree completely. However, 100000 hash table lookups take longer than 0 
>hashtable lookups. More importantly, the application programmer will
>know a lot better when a parsed path object is not needed anymore. I would 
>like to be able to control this behaviour, not have it hidden behind the 
>interface. I can already see myself editing the processor source code because 
>it discards my objects too frequently :) One size does not fit all...
>
This is the premise of manual allocation and deallocation schemes -- 
that the application will manage the resource better by hand.  That is 
quite likely to NOT be true.  Clearly one size never fits all. 
Standardization is about making some compromises that handle a good 
deal of the cases well.  A robust cache might adjust itself based upon 
actual usage and make a better decision than even a conscientious 
application coder.  It might never discard entries until space got 
tight.  This obviously makes the application coder's life easier and 
the implementation coder's life harder, if he produces a quality 
implementation.  This is the way it should be in many cases.
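Such a cache can be sketched in a few lines.  This is an assumption about what an implementation might do, not anything the draft specifies: a size-bounded LRU cache keyed by the expression string, which discards entries only when space gets tight.

```python
from collections import OrderedDict

class ExpressionCache:
    """Hypothetical sketch of an implementation-side cache.

    Entries are keyed by the expression source string and discarded
    only when the cache is full -- approximating a cache that never
    evicts until space gets tight.
    """

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.parse_count = 0   # for illustration: how often we truly parse

    def get_or_parse(self, source, parse):
        if source in self.entries:
            self.entries.move_to_end(source)    # mark as recently used
            return self.entries[source]
        self.parse_count += 1
        compiled = parse(source)
        self.entries[source] = compiled
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict least recently used
        return compiled

cache = ExpressionCache(capacity=16)
parse = lambda s: ("compiled", s)               # stand-in for a real parser
for _ in range(100000):
    expr = cache.get_or_parse("/a/b", parse)    # 1 parse, 99999 cache hits
```

The application coder does nothing special here; repeated evaluation of the same path costs one parse plus a hash lookup per use.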

I do not think that 100,000 hash-table string-key lookups will be 
significant compared to 100,000 XPath expression evaluations.  This 
reminds me of problems in the Java libraries with font creation, which 
is an extremely expensive operation.  But that does not stop 
applications from creating fonts in loops, in event handlers, or 
wherever they happen to need them, without keeping the fonts around, 
which slows many applications down considerably.  A little bit of 
caching can go a very long way toward solving the problem.

If it were decided that common use cases made a factory mechanism for 
compiled expressions a good idea, it would still be very prudent for 
the factory implementation to cache on behalf of the majority of 
applications that do not keep compiled expressions between uses but 
often produce the same expressions.  For users who do not carefully 
manage their compiled expressions, the hash-table lookup cost would be 
more than worth the gain.
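The two approaches compose: a sketch, under the same illustrative assumptions as above, of a factory whose implementation caches internally, so even a caller who recompiles the same expression on every use pays only a hash lookup after the first parse.

```python
class CachingEvaluator:
    """Hypothetical sketch: a factory that caches behind the interface.

    The API still hands out compiled expressions for callers to reuse,
    but the factory also caches by source string, so careless callers
    who recompile the same path on every use pay only a hash lookup
    rather than a re-parse.  Names and parse stand-in are illustrative.
    """

    def __init__(self):
        self._cache = {}
        self.parses = 0

    def create_expression(self, source):
        expr = self._cache.get(source)
        if expr is None:
            self.parses += 1                # a real parse would happen here
            expr = ("compiled", source)     # stand-in for parse output
            self._cache[source] = expr
        return expr

evaluator = CachingEvaluator()
# A careless caller recompiles the same path on every iteration:
for _ in range(100000):
    expr = evaluator.create_expression("/purchase/item/price")
```

Careful callers keep the returned object and skip even the lookup; everyone else still parses only once.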

Ray Whitmer
rayw@netscape.com
Received on Friday, 22 June 2001 13:58:15 GMT
