Re: Type-safe iteration over the DOM in DOM 2 & 3? from Peter Meyer on 2001-03-20 (www-dom@w3.org from January to March 2001)

From: Peter Meyer <petermeyer99@hotmail.com>
Date: Tue, 20 Mar 2001 19:06:09 -0000
To: www-dom@w3.org
Message-ID: <F129qZr5chn3g2CNqNQ00006230@hotmail.com>

Thanks for the answer. Much appreciated.

Please see below...


> >I am sure it has been considered by the W3C
> >to add the ability to use a visitor pattern
>
>Yes, we did consider the Visitor pattern. It didn't seem to fit the use
>cases we were considering at the time.
>
>The Gang Of Four book says one of the indications of when to use visitor is
>that you have many different interfaces intermixed and want to perform
>operations that depend on the concrete classes. In our case, we have a
>shared interface, Node, from which the others are subclassed, and the
>nodetypes are clearly self-identifying, so this advantage is largely
>negated.
>

I am not sure I agree with that. As soon as an application uses a class
factory to create subclasses at runtime, I do not necessarily know the
classes that I need to react to. This might be especially true if I have
software that operates extensibly and generically on DOM structures
generated from a variety of XML files.
The main advantage of the visitor pattern, as I see it, is that it uses the
actual object type at runtime to make dispatching decisions, instead of
relying on comparisons of runtime field values in objects with constants
that I need to know in advance, at compile time.
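
For concreteness, the field-comparison style under discussion looks roughly
like the following in the Java binding. This is only a minimal sketch: the
class name NodeTypeSwitch and the helpers describe() and parse() are made up
for illustration, while getNodeType() and the Node.*_NODE constants are the
standard org.w3c.dom ones.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;

// Illustrative only: NodeTypeSwitch, describe() and parse() are hypothetical names.
class NodeTypeSwitch {

    // Dispatch by comparing the nodeType field against compile-time constants.
    static String describe(Node n) {
        switch (n.getNodeType()) {
            case Node.ELEMENT_NODE:
                return "element <" + n.getNodeName() + ">";
            case Node.TEXT_NODE:
                return "text \"" + n.getNodeValue() + "\"";
            case Node.COMMENT_NODE:
                return "comment";
            default:
                // Every node kind not listed above ends up here.
                return "other (type " + n.getNodeType() + ")";
        }
    }

    // Parse an XML string and return its document element.
    static Node parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        Node root = parse("<a>hi<!--c--></a>");
        System.out.println(describe(root));
        for (Node c = root.getFirstChild(); c != null; c = c.getNextSibling()) {
            System.out.println(describe(c));
        }
    }
}
```

Every node kind the application cares about has to appear as a case label
that is known when describe() is compiled; anything else falls through to
the default branch.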



>Outside of coding style, there really isn't a lot of difference between
>"traverse, switch, and call appropriate subroutine method" versus "visit,
>accept(), and call back to appropriate subroutine method". Performance is
>likely to actually be better with the switch, especially if the particular
>DOM implementation is using a single class for multiple nodetypes and
>switching internally to decide which behavior to apply.
>


I agree that you can implement the same functionality using a switch 
statement and using a visitor pattern. What I personally dislike about the 
switch statement is that I have to rely on information stored in a field to 
switch, instead of type information of the classes I am traversing.
If I never create my own classes for nodes (i.e. I rely on the basic DOM
classes), this works well, but the approach tends to be fragile if I need
application-dependent node subclasses based, for example, on element
types. In this case, I would much rather rely on the polymorphic mechanism
of the language than on a switch statement that always needs to be
maintained.
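
A rough Java sketch of the situation described here, under loud assumptions:
SimpleNode, ElementNode, TextNode, InvoiceElement, NodeVisitor,
InvoiceVisitor and Describers are all hypothetical names for a simplified
model, not the org.w3c.dom interfaces, and the instanceof fallback in
InvoiceElement.accept() is just one possible way to keep older visitors
working.

```java
// Hypothetical, simplified node model; these are NOT the org.w3c.dom interfaces.
interface NodeVisitor {
    String visitElement(ElementNode e);
    String visitText(TextNode t);
}

// An application can extend the visitor alongside its node subclasses.
interface InvoiceVisitor extends NodeVisitor {
    String visitInvoice(InvoiceElement e);
}

abstract class SimpleNode {
    static final short ELEMENT_NODE = 1;
    static final short TEXT_NODE = 3;

    abstract short getNodeType();
    abstract String accept(NodeVisitor v);
}

class ElementNode extends SimpleNode {
    final String tagName;
    ElementNode(String tagName) { this.tagName = tagName; }
    @Override short getNodeType() { return ELEMENT_NODE; }
    @Override String accept(NodeVisitor v) { return v.visitElement(this); }
}

class TextNode extends SimpleNode {
    final String data;
    TextNode(String data) { this.data = data; }
    @Override short getNodeType() { return TEXT_NODE; }
    @Override String accept(NodeVisitor v) { return v.visitText(this); }
}

// Application-specific subclass: it still reports ELEMENT_NODE as its node type.
class InvoiceElement extends ElementNode {
    InvoiceElement() { super("invoice"); }
    @Override String accept(NodeVisitor v) {
        // Visitors that know about invoices get the specific callback;
        // all others keep working through the generic element callback.
        if (v instanceof InvoiceVisitor) {
            return ((InvoiceVisitor) v).visitInvoice(this);
        }
        return super.accept(v);
    }
}

class Describers {
    // Switch-based dispatch: sees only the node type field.
    static String bySwitch(SimpleNode n) {
        switch (n.getNodeType()) {
            case SimpleNode.ELEMENT_NODE: return "element";
            case SimpleNode.TEXT_NODE:    return "text";
            default:                      return "unknown";
        }
    }

    // Visitor-based dispatch: the runtime class of the node picks the callback.
    static final InvoiceVisitor BY_VISITOR = new InvoiceVisitor() {
        public String visitElement(ElementNode e) { return "element"; }
        public String visitText(TextNode t) { return "text"; }
        public String visitInvoice(InvoiceElement e) { return "invoice"; }
    };

    public static void main(String[] args) {
        SimpleNode n = new InvoiceElement();
        System.out.println(bySwitch(n));             // generic element branch
        System.out.println(n.accept(BY_VISITOR));    // invoice-specific callback
    }
}
```

The switch cannot tell an InvoiceElement from any other element unless
someone adds a further test (for example on the tag name) by hand, whereas
the accept() override travels with the subclass itself.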




>Note that you could easily implement an ObjectStructure mechanism on top of
>the current Traversal objects which accepted Visitor objects and dispatched
>to them, with results essentially indistinguishable (for a basic DOM) from
>those of implementing ConcreteElement and accept() on the nodes.
>

This, I believe, is only true as long as I do not subclass the node classes. 
As soon as I do this, I need to hand-maintain the switch statement, which is 
what I want to avoid in the first place.




>The only real difference would arise if you wanted to change the
>dispatching. In the Visitor mechanism you would override that by
>subclassing the nodes and changing where accept() calls back to -- which
>actually is a significant DISADVANTAGE in that it risks breaking other
>visitors to the same data structure. In the Traverse-and-call-back
>alternative, the dispatching is encapsulated in that interface object --
>and hence you can subclass that and create an extended version without
>adverse effect on those operations which want to use the basic behavior.
>

I am not sure I understand you here. Are you referring to whether node
classes with children dispatch in pre- or postorder? If so, the GoF book
proposes having the visitor handle the actual traversal of the children,
which is an easy way to avoid the fragility. It puts a little more burden
on the visitor classes, although a lot can be done in a common base class.
It would still retain the main reason for the visitor pattern: to be able to
defer decisions on dispatching to the runtime system, which can make the
decision based on actual object type, instead of value fields that I have to
know in advance.
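
As a minimal sketch of that arrangement (again with hypothetical names:
TreeNode, Elem, Txt, TreeVisitor, TraversingVisitor and the two concrete
visitors are invented for illustration and are not DOM interfaces), accept()
only dispatches, and a common visitor base class owns the descent into the
children:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified node model; these are NOT the org.w3c.dom interfaces.
interface TreeVisitor {
    void visitElement(Elem e);
    void visitText(Txt t);
}

abstract class TreeNode {
    abstract void accept(TreeVisitor v);
}

class Elem extends TreeNode {
    final String name;
    final List<TreeNode> children;
    Elem(String name, TreeNode... children) {
        this.name = name;
        this.children = new ArrayList<>(Arrays.asList(children));
    }
    // accept() only dispatches; it does not walk the children.
    @Override void accept(TreeVisitor v) { v.visitElement(this); }
}

class Txt extends TreeNode {
    final String data;
    Txt(String data) { this.data = data; }
    @Override void accept(TreeVisitor v) { v.visitText(this); }
}

// Common base class: owns the descent and exposes enter/leave hooks.
abstract class TraversingVisitor implements TreeVisitor {
    @Override public void visitElement(Elem e) {
        enter(e);                               // hook called before the children
        for (TreeNode child : e.children) {
            child.accept(this);
        }
        leave(e);                               // hook called after the children
    }
    @Override public void visitText(Txt t) { }
    protected void enter(Elem e) { }
    protected void leave(Elem e) { }
}

// Records element names in preorder.
class PreOrderNames extends TraversingVisitor {
    final List<String> names = new ArrayList<>();
    @Override protected void enter(Elem e) { names.add(e.name); }
}

// Records element names in postorder.
class PostOrderNames extends TraversingVisitor {
    final List<String> names = new ArrayList<>();
    @Override protected void leave(Elem e) { names.add(e.name); }
}

class TraversalDemo {
    static Elem sample() {
        return new Elem("a", new Elem("b", new Txt("hi")), new Elem("c"));
    }

    public static void main(String[] args) {
        PreOrderNames pre = new PreOrderNames();
        sample().accept(pre);
        System.out.println(pre.names);          // [a, b, c]

        PostOrderNames post = new PostOrderNames();
        sample().accept(post);
        System.out.println(post.names);         // [b, c, a]
    }
}
```

Because the traversal order lives in the visitor, a visitor that wants a
different order overrides a hook in its own class and does not touch the
node classes, so other visitors over the same structure are unaffected.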


>If you've got a specific use case that the Traverse/callback approach would
>not address, or if you can show that my concerns about fragility of Visitor
>as behaviors are extended are unfounded, I'm certainly willing to
>reconsider this. I can see the aesthetic attraction of the Visitor pattern.


The main use case I am interested in is DOM-based applications where the
DOM classes are subclassed heavily to provide application-specific
functionality. In this case, I believe, the case for visitors is very strong
- using the inherent type information of the system is just much more robust
and maintainable than relying on compile-time values.

This is aggravated in situations where the DOM engine in my application does 
not know about the classes it might encounter at runtime, because they are 
generated by a class factory, and they may even be supplied by plugin 
modules (together with the appropriate visitors for the set of classes to be 
used). Of course this could be overcome, as you describe, with traversal 
mechanisms that are also pluggable, but I, for my part, find that a 
relatively inelegant and unsafe solution.

>But I really think it's the wrong level of abstraction given the DOM's
>design and the realities of how the DOM is being used
>

I do not quite understand (although I would love to hear your thoughts about
it) why the level of abstraction is wrong. Using field values to switch
instead of using built-in polymorphism seems to me almost always a
disadvantage, especially if providing the ability to use polymorphism does
not in any way preclude any other method a programmer might choose (for
example switch statements on field values). This is similar to the "goto"
situation in procedural languages: as complexity rises, it is much safer to
rely on the built-in mechanism of the language runtime to handle the
detailed dispatching.
I would feel more confident using the DOM model as the base for the
in-memory data structures of my application (which uses XML as its basic
out-of-memory storage format) if I could avoid the use of explicit
compile-time type information. As it stands, I know many authoring
applications just use DOM for reading in and parsing XML, but translate into
private in-memory representations. I think that DOM could be great for
in-memory models even in very complex applications, if I could fit it more
easily into the dynamic object models I am using anyway.

What do you think?

Thanks again for your answer, I enjoy this discussion :-)




>[DOM WG: Do we need a FAQ on this topic?]
>
>______________________________________
>Joe Kesselman  / IBM Research
>

Received on Tuesday, 20 March 2001 14:06:46 UTC