Re: Walking the DOM (was: XML APIs)

John Cowan <cowan@locke.ccil.org> writes:

> Stephen R. Savitzky wrote:
> 
> > [T]he classic algorithm for traversing a tree is:
> > 
> > traverse(node) {
> >   visit(node);
> >   if (node.firstChild != null) traverse(node.firstChild);
> >   if (node.nextSibling != null) traverse(node.nextSibling);
> > }
> 
> The trouble with that algorithm is that it is recursive.  It will
> blow up if the tree is sufficiently deep.  Indeed, in
> languages that cannot be relied on to do tail recursion, like
> Java, it will blow up if the tree is merely sufficiently wide.

It is trivial to transform this into a non-recursive algorithm, which in
effect I do in the next paragraph where I define an iterator.  The
iterative algorithm then becomes:

  TreeIterator i = new TreeIterator(node);
  while (node != null) { visit(node); node = i.nextNode(); }

Given a few more methods on TreeIterator, here omitted in the interest of
brevity, it is trivial to extend this to include end-of-node processing.
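For concreteness, here is a minimal sketch of such a TreeIterator.  The Node
and TreeIterator classes below are illustrative stand-ins of my own devising,
not the actual DOM interfaces nor the Ricoh implementation; the point is only
that the iterator's entire state is one reference to the current node:

```java
public class TreeWalkDemo {
    // Illustrative stand-in for a DOM node: just the structural links.
    static class Node {
        final String name;
        Node firstChild, nextSibling, parent;
        Node(String name) { this.name = name; }
        Node appendChild(Node c) {
            c.parent = this;
            if (firstChild == null) { firstChild = c; }
            else {
                Node n = firstChild;
                while (n.nextSibling != null) n = n.nextSibling;
                n.nextSibling = c;
            }
            return c;
        }
    }

    // Non-recursive pre-order iterator; state is one node reference.
    static class TreeIterator {
        private Node current;
        TreeIterator(Node start) { current = start; }
        // Pre-order successor: first child if any, otherwise the next
        // sibling of the nearest ancestor-or-self that has one.
        // Constant space; no stack to blow up on deep or wide trees.
        Node nextNode() {
            if (current == null) return null;
            if (current.firstChild != null) return current = current.firstChild;
            Node n = current;
            while (n != null && n.nextSibling == null) n = n.parent;
            return current = (n == null) ? null : n.nextSibling;
        }
    }

    // The loop from the message above, with visit() appending the name.
    static String preorder(Node node) {
        StringBuilder sb = new StringBuilder();
        TreeIterator i = new TreeIterator(node);
        while (node != null) {
            sb.append(node.name).append(' ');   // visit(node)
            node = i.nextNode();
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Node a = new Node("A");
        Node b = a.appendChild(new Node("B"));
        a.appendChild(new Node("C"));
        b.appendChild(new Node("D"));
        b.appendChild(new Node("E"));
        System.out.println(preorder(a));   // prints: A B D E C
    }
}
```

This visits nodes in exactly the order of the recursive algorithm quoted
above, without recursion.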

> Furthermore, if there is any end-of-node processing to do, such as
> emitting an end tag indication, then the algorithm is no longer
> even partly tail recursive and will blow up on both depth and
> width even in safe-tail-recursion languages.

I find this quite insulting.  My intent was to illustrate a rhetorical
point, not to give an optimal, industrial-strength algorithm for tree
traversal under all circumstances.  

You _do_ further illustrate my point, however: the algorithm you give fails
as miserably as mine if your visit or revisit function modifies the tree.
Under the assumptions you have expressed elsewhere, then, it could not be
used as the basis for the implementation of an Iterator.

My main point is that I _would_ like it to be usable as the basis for
implementing iterators.  (The iterators I actually implement are somewhat more elaborate
because they take into account the possibility that the entire tree may not
be resident in memory at any one time.)

> Because of the reliability of this algorithm vis-a-vis the recursive
> one, I believe it should be the standard way of walking DOM trees,
> and therefore it is essential that DOM implementations make the
> structural access methods fast.

We are in agreement here, but I think you may have missed something.  An
iterator, the state of which consists of a reference to the ``current
node'', and all of whose methods are inlineable, can be compiled by a good
optimizing compiler into the exact equivalent of your algorithm.
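To see what such a compiler would produce, hand-inline the iterator yourself:
the current-node reference becomes a local variable and nextNode() becomes
pure pointer chasing.  (A sketch, again assuming illustrative
firstChild/nextSibling/parent links rather than any particular DOM
implementation.)

```java
public class InlinedWalk {
    // Illustrative stand-in for a DOM node's structural accessors.
    static class Node {
        final String name;
        Node firstChild, nextSibling, parent;
        Node(String name) { this.name = name; }
    }

    // The iterator loop after inlining: no iterator object, no call
    // overhead, one local reference of state.
    static String walk(Node node) {
        StringBuilder out = new StringBuilder();
        while (node != null) {
            out.append(node.name);                  // visit(node)
            if (node.firstChild != null) {
                node = node.firstChild;             // descend
            } else {
                Node n = node;                      // climb until a sibling exists
                while (n != null && n.nextSibling == null) n = n.parent;
                node = (n == null) ? null : n.nextSibling;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Node a = new Node("A");
        Node b = new Node("B"), c = new Node("C");
        a.firstChild = b; b.parent = a; c.parent = a; b.nextSibling = c;
        System.out.println(walk(a));   // prints: ABC
    }
}
```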

An iterator that has to take structural changes into account _cannot_ be so
compiled; it is necessarily going to be _FAR_ less efficient.  Similarly,
your algorithm cannot be used as the basis for a live NodeList.  I believe
it is also essential that DOM implementations make structural _modification_
methods fast, which is also incompatible with the kind of non-locality of
effects imposed by live NodeLists and ``robust'' (in your definition)
iterators. 

QED.
-- 
 Stephen R. Savitzky   Chief Software Scientist, Ricoh Silicon Valley, Inc., 
<steve@rsv.ricoh.com>                            California Research Center
 voice: 650.496.5710   fax: 650.854.8740    URL: http://rsv.ricoh.com/~steve/
  home: <steve@starport.com> URL: http://www.starport.com/people/steve/

Received on Tuesday, 3 November 1998 14:54:37 UTC