critique of WD-DOM-19980416

I've just looked over the most recent DOM draft. It seems like a good start,
but the design appears to be weak in quite a few places. No doubt this
is the result of an attempt to make conflicting views of a
document fit into the same model.

Here are a few of my concerns:


UNNECESSARILY WIDE TYPES

None of the containers in the DOM are typesafe. Most of them deal with
very high-level abstractions such as "Node". This complicates the interface
for the implementor, introduces the possibility of programmer error,
and creates a performance problem:

      Implementation is more difficult, since the API must perform
      extra checks to determine whether an object passed as an
      argument actually has a valid type (e.g., setParent).

      Programmer error becomes possible wherever a method returns an
      object of an overly abstract type, and wherever programmers can
      unknowingly pass an object of an invalid type to a method.

      Excessive type casting may also create a performance problem.
      Type casts and "instanceof" tests are not free in languages
      such as Java; they are run-time checks with a real, if
      modest, cost.
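A minimal Java sketch of the three problems above. The types here (Node, WideElement, WideText, WideDocument) are hypothetical simplifications chosen to show the shape of the issue, not the draft's actual interfaces: because every slot is typed as Node, the implementation has to re-check types at run time, and the caller has to downcast what it gets back.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified types: every slot in this "wide" API is typed
// as Node. This mirrors the shape of the problem, not the draft's exact API.
interface Node {}

class WideDocument implements Node {}

class WideText implements Node {
    final String data;
    WideText(String data) { this.data = data; }
}

class WideElement implements Node {
    private Node parent;                          // declared as the widest type
    private final List<Node> children = new ArrayList<>();

    // The signature accepts any Node, so the implementor must re-check
    // at run time what a narrower parameter type would have guaranteed.
    void setParent(Node parent) {
        if (parent != null && !(parent instanceof WideElement)) {
            throw new IllegalArgumentException("parent must be an element");
        }
        this.parent = parent;
    }

    Node getParent() { return parent; }

    // Same problem for children: a document must be rejected by hand.
    void appendChild(Node child) {
        if (child instanceof WideDocument) {
            throw new IllegalArgumentException("a document cannot be a child");
        }
        children.add(child);
    }

    // Callers get back a Node and have to downcast it themselves.
    Node getChild(int index) { return children.get(index); }
}
```

A caller that wants the text of the first child has to write `((WideText) e.getChild(0)).data`, and if it guesses the type wrong the mistake only surfaces as a ClassCastException at run time.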

It would be much better if argument and return types were specified as
narrowly as possible. This would reduce the number of dangerous casts
performed by users, reduce the number of error conditions that API
implementors must test for, and eliminate some of the performance
problems associated with run-time type identification and casting.

The primary culprits in the DOM are the containers: things that hold
other objects routinely use "Node" as the type of their children. But
methods such as getParent, setParent, and getDocument also return or
accept a very abstract type (Node), even though that doesn't seem necessary.



INCORRECT ABSTRACTIONS

Narrowing the type scheme, however, wouldn't fit in well with the general
structure of the DOM as it stands now. "Node" appears to be overloaded
with two separate meanings: (1) the root object type for the DOM, and (2)
the generic type for objects that can appear in an XML document.

Note that these are not the same thing: for example, it makes no
sense to have a Document inside an Element.

Node is at the wrong level of abstraction, and has been given far
too many responsibilities. It appears to exist for the sole purpose of
holding methods that didn't fit anywhere else. This is not good.

It would be better to interpose an "XML_Node" class as the supertype of
Element, Text, PI, Comment, and possibly CDATA. This type plays
the role of "things that can be children". As such, it is the
natural type to use for the arguments of methods that get and set
children in Element. Similarly, every XML_Node object would have a
getParent() method that naturally returns an object of type Element.
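The proposed hierarchy can be sketched in Java roughly as follows. XML_Node, Element, Text, Comment and Document follow the names used above, but the method set (appendChild, getChildren, setParent) and the AbstractXMLNode helper are assumptions made for illustration. The point is that the compiler, not the implementation, now enforces what may be a child, and getParent() needs no cast.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the proposed hierarchy. XML_Node is the type of
// "things that can be children"; Document deliberately does not implement it.
interface XML_Node {
    Element getParent();              // only an Element can contain a child
    void setParent(Element parent);
}

// Shared parent bookkeeping for every kind of child (illustrative helper).
abstract class AbstractXMLNode implements XML_Node {
    private Element parent;
    public Element getParent() { return parent; }
    public void setParent(Element parent) { this.parent = parent; }
}

class Element extends AbstractXMLNode {
    final String tagName;
    private final List<XML_Node> children = new ArrayList<>();

    Element(String tagName) { this.tagName = tagName; }

    // The parameter type alone rules out a Document (or any non-child);
    // this minimal sketch does not detach the child from an old parent.
    void appendChild(XML_Node child) {
        child.setParent(this);
        children.add(child);
    }

    List<XML_Node> getChildren() { return Collections.unmodifiableList(children); }
}

class Text extends AbstractXMLNode {
    final String data;
    Text(String data) { this.data = data; }
}

class Comment extends AbstractXMLNode {
    final String data;
    Comment(String data) { this.data = data; }
}

// Not an XML_Node: it holds the root Element but cannot itself be a child.
class Document {
    Element documentElement;
}
```

With these types, `root.appendChild(new Document())` is rejected by the compiler rather than at run time, and `text.getParent()` yields an Element directly.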

The general principle here is to push methods and properties as far down
the inheritance tree as possible. This reduces the number of points in the
code where dangerous casts occur, and makes the types returned by methods
more exact.


CONCATENATED APIS

The DOM has thrown together two obviously different views of the structure
of a document, without doing anything at all to integrate them. It appears
as though two separate APIs were simply concatenated together to produce 
much of the content of the DOM. 

It would be better to try to integrate these views. Having multiple views
makes the DOM seem unnecessarily complex, and will only confuse (and probably
scare away) newcomers. One of the goals of XML was to provide a simple
language that unsophisticated users could exploit. I think that the goal
of the DOM should be similar: provide a simple, straightforward,
unconfusing interface to objects.

Looking at the way Attributes are handled by Element is a good
example -- there are two totally different views here of what an
Attribute is, and how one should be manipulated.

In one version, attributes are manipulated directly through the Element API.
In the other version, you must get an instance of an AttributeList and
manipulate that. Either of these seems equally good to me, and clearly
if you had one you could implement the other trivially.

So why have both?

I can't think of any good reason. The argument, "Some of our WG members 
have done it one way, and others have done it the other in their 
existing products" doesn't fly. They're both going to have to go and
implement the other version in terms of what they already have in order
to support the DOM. Picking one version over the other reduces the 
amount of work, overall, that must be done in order to have multiple 
compliant implementations. 

And more significantly: People starting from scratch now have a larger 
(thus more time-consuming) project ahead of them if they wish to create
a DOM implementation.
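To illustrate how little the second attribute view adds once the first exists, here is a minimal Java sketch. AttrElement and AttributeListView are hypothetical names, and attribute values are plain Strings rather than Attribute objects to keep it short: the element-level methods hold the state, and the list-style view is a thin adapter that delegates back to them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: view 1, where attributes are read and written
// directly through the element.
class AttrElement {
    private final Map<String, String> attributes = new HashMap<>();

    String getAttribute(String name) { return attributes.get(name); }
    void setAttribute(String name, String value) { attributes.put(name, value); }
    void removeAttribute(String name) { attributes.remove(name); }

    // View 2 is just an adapter over view 1; no new state is needed.
    AttributeListView getAttributes() { return new AttributeListView(this); }
}

// Hypothetical view 2: a separate attribute-list object that delegates
// every operation back to the owning element.
class AttributeListView {
    private final AttrElement owner;

    AttributeListView(AttrElement owner) { this.owner = owner; }

    String get(String name) { return owner.getAttribute(name); }
    void set(String name, String value) { owner.setAttribute(name, value); }
    void remove(String name) { owner.removeAttribute(name); }
}
```

The adapter could just as easily run in the other direction (element methods delegating to a list object), which is why standardizing both views buys users nothing while adding work for every implementor.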



DOCUMENT FRAGMENTS ARE UPSIDE DOWN AND TOO EARLY

Obviously the document fragment concept has been stuck into the DOM because
people expect its interface to be spelled out more clearly in some
future version. But then it would be better to leave it out of the
specification entirely until that future version.

What if it turns out that you have got it in the wrong place? By specifying
it now, you're only going to cause trouble later on.

I think that it may even be fairly likely that it is in the wrong place.
It seems to me that a Document is NOT a type of DocumentFragment. I am not
sure it is sensible to view this as an inheritance relationship at all. Or,
if it is, it might make more sense to consider a DocumentFragment to be a
type of Document.

If none of this can be determined yet because none of it is well thought
out enough -- well then just leave the whole thing out. It can't hurt. It
won't break anything to introduce it in a future version as either an 
extra interface above Document, or a new subclass of Document, or something
with no blood relationship to Document that just happens to know about a 
Document.

The same comments apply to DocumentContext -- it has no useful methods as
yet, so it doesn't really make sense to include it in the API. Putting 
it in now restricts future design choices, without providing any 
current benefit.



XMLNode IS TEDIOUS

Why do all these methods in XMLNode ask me to specify, every single time
I want to do something, whether I want entities expanded or not?
Why can't I just set this once and be done with it?

While I can see that I might want to expand some entities but not others, 
I think I am unlikely to want to do this based on where I am in the tree.
Instead I am likely to want to do this based on what type the entity is.

This is not a big deal in the design sense. I just think it is strange
that I have to keep passing the same flag down with every call. It's likely
to make programming with these methods somewhat tedious.
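A minimal Java sketch of the "set it once, decide by entity type" alternative. Everything here is hypothetical (EntityKind, TreeNode, EntityRefNode, Navigator, and the two-way INTERNAL/EXTERNAL split); it only shows that a policy supplied once, as a predicate on the entity, can replace a boolean passed on every call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical entity categories; a real policy might distinguish more.
enum EntityKind { INTERNAL, EXTERNAL }

// A tiny hypothetical tree, just enough to show the idea.
abstract class TreeNode {
    final List<TreeNode> children = new ArrayList<>();
}

class TextNode extends TreeNode {
    final String data;
    TextNode(String data) { this.data = data; }
}

class ElementNode extends TreeNode {
    ElementNode(TreeNode... kids) { Collections.addAll(children, kids); }
}

// An entity reference whose replacement content is already known.
class EntityRefNode extends TreeNode {
    final String name;
    final EntityKind kind;
    EntityRefNode(String name, EntityKind kind, TreeNode... replacement) {
        this.name = name;
        this.kind = kind;
        Collections.addAll(children, replacement);
    }
}

// The expansion decision is made once, when the navigator is created,
// and depends on the entity itself rather than on the calling site.
class Navigator {
    private final Predicate<EntityRefNode> expand;

    Navigator(Predicate<EntityRefNode> expand) { this.expand = expand; }

    // Children as seen through the policy: references the policy expands
    // are replaced by their (recursively filtered) content.
    List<TreeNode> getChildren(TreeNode node) {
        List<TreeNode> out = new ArrayList<>();
        for (TreeNode child : node.children) {
            if (child instanceof EntityRefNode && expand.test((EntityRefNode) child)) {
                out.addAll(getChildren(child));
            } else {
                out.add(child);
            }
        }
        return out;
    }
}
```

For example, `new Navigator(ref -> ref.kind == EntityKind.INTERNAL)` expands internal entities everywhere and leaves external references in place, without any per-call flag.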



ATTRIBUTE LIST HAS SOME PROBLEMS

First, as a minor point, why has getSpecified()/setSpecified() crept in here when
all of the other DTD-oriented material has been kept at bay? If users of
Elements are to know whether an attribute was specified or not, shouldn't they
also know whether elements are required? Whether Text is parsed or not? I think
the best route is to leave ALL of this material out at this level, including
whether or not an attribute is "specified".

Secondly, I see no good reason for the "Attribute item(int index)" method
of AttributeList. It raises some troubling questions, since it
fundamentally asserts that an attribute list has a well-defined order,
and my understanding is that it does not. If an element has several
implied attributes, who is to say what order they should be
represented in? Unless this is a well-defined concept, the Attribute API
should make no assumption that a list of attributes has an order.

Note that this method effectively rules out using a plain hashtable to
represent an attribute list.
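A minimal Java sketch of an attribute list that makes no ordering promise. UnorderedAttributeList and its methods are hypothetical names, and values are plain Strings for brevity. Lookup is by name only, enumeration returns a Set of names with no meaningful order, and so a plain HashMap is sufficient as the backing store.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: an attribute list with no positional access.
// Because nothing promises an order, a plain HashMap is enough.
class UnorderedAttributeList {
    private final Map<String, String> byName = new HashMap<>();

    String get(String name) { return byName.get(name); }
    void set(String name, String value) { byName.put(name, value); }
    String remove(String name) { return byName.remove(name); }
    int getLength() { return byName.size(); }

    // Callers can iterate over the names, but the API makes no claim
    // that the iteration order means anything.
    Set<String> getNames() { return Collections.unmodifiableSet(byName.keySet()); }
}
```

Adding an item(int index) method would force the implementation to define and maintain an order, for instance by switching to a LinkedHashMap (insertion order) and walking it, or by keeping a parallel list; either way the ordering question the method raises would have to be answered by the implementation.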


Justin Wells
Semiotek Inc. 

(I am subscribed and posting via my list reader account. To mail
me personally, address mail to the user justin at the same domain. I do 
read my list mail, but I will not respond in as timely a manner to mail
addressed to reader. All this to avoid spam.)

Received on Saturday, 27 June 1998 02:32:15 UTC