- From: Justin Wells <reader@semiotek.com>
- Date: Sat, 27 Jun 1998 02:32:36 -0400
- To: www-dom@w3.org
I've just looked over the most recent DOM. It seems like a good start, but
the design appears to be weak in quite a few places. No doubt this
is the result of an attempt to try to make conflicting views of a
document fit into the same model.
Here are a few of my concerns:
UNNECESSARILY WIDE TYPES
None of the containers in the DOM are typesafe. Most of them deal with
really high level abstractions like "Node". This complicates the interface
for the implementor, introduces the possibility of programmer error,
and creates a performance problem:
Implementation is more difficult since the API must perform
extra checks to determine whether an object actually has a
valid type when passed as an argument. (eg: setParent).
Programmer error is possible wherever an object of too abstract
a type is returned from a method; and programmers unknowingly
pass invalid types to methods.
A performance problem may be created by excessive type casting. Type
casts and calls to "instanceof" are not free in languages
such as Java, but instead are actually somewhat costly
operations.
It would be much better if argument and return values were specified as
narrowly as possible. This would reduce the number of dangerous casts
performed by users, reduce the number of error conditions that API
implementors must test for, and eliminate some of the performance
problems associated with run time type identification and casting.
The primary culrit in the DOM are the containers: things which hold
other objects routinely use "Node" as the type of their children. But
also methods such as getParent and setParent, getDocument, etc., return
a very abstract type (Node) even though that doesn't seem necessary.
INCORRECT ABSTRACTIONS
Narrowing the type scheme, however, wouldn't fit in well with the general
structure of the DOM as it stands now. "Node" appears to be overloaded
with two separate meanings: (1) a root object type for the DOM, (2) the
generic type for objects which can appear in an XML document.
Note that these are not the same thing, for example it doesn't make any
sense to have a Document inside an Element.
Node is at the wrong level of abstraction, and has been given far
too many responsibilities. It appears to be a class which exists for the sole
purpose of containing some method which didn't fit anywhere else. This is not
good.
It would be better to intercede an "XML_Node" class as the supertype of
Element, Text, PI, Comment, and possibly CDATA. This type plays
the role of "things that can be children". As such, it is the
natural type to use as arguments for methods that get and set
children in Element. Similarly, all XML_Node objects would have a
getParent() method that would naturally return an object of type Element.
The general principle here is to push methods and properties as far down
the inheritence tree as possible. This reduces the number of points in the
code where dangerous casts occur, and increases the exactness of the types
returns by a methods.
CONCATENATED APIS
The DOM has thrown together two obviously different views of the structure
of a document, without doing anything at all to integrate them. It appears
as though two separate APIs were simply concatenated together to produce
much of the content of the DOM.
It would be better to try and integrate these views. Having multiple views
makes the DOM seem unnecessarily complex, and will only confuse (and probably
scare away) newcomers. One of the goals of XML was to provide a simple
language that unsophisticated users could exploit. I think that the goal
of the DOM should be similar: provide a simple, straight forward,
unconfusing interface to objects.
Looking at the way Attributes are handled by Element is a good
example -- there are two totally different views here of what an
Attribute is, and how one should be manipulated.
In one version, attributes are directly manipulated through the Element API.
In the other version, you must get an intance of an AttributeList and
manipulate that. Either of these seem equally good to me, and clearly
if you had one you could implement the other trivially.
So why have both?
I can't think of any good reason. The argument, "Some of our WG members
have done it one way, and others have done it the other in their
existing product" doesn't fly. They're both going to have to go and
implement the other version in terms of what they already have in order
to support the DOM. Picking one version over the other reduces the
amount of work, overall, that must be done in order to have multiple
compliant implementations.
And more significantly: People starting from scratch now have a larger
(thus more time-consuming) project ahead of them if they wish to create
a DOM implementation.
DOCUMENT FRAGMENTS ARE UPSIDE DOWN AND TOO EARLY
Obviously the document fragment concept has been stuck into DOM because
people think its interface will be spelled out a bit more clearly in some
future version. But then it would be better just to leave it out of the
specification entirely until some future version.
What if it turns out that you have got it in the wrong place? By specifying
it now, you're only going to cause trouble later on.
I think that it may even be fairly likely that it is in the wrong place.
It seems to me that a Document is NOT a type of DocumentFragment. I am not
sure it is sensible to view this as an inheritence relationship at all. Or
if it is, it might make more sense to consider a DocumentFragment to be a
type of Document.
If none of this can be determined yet because none of it is well thought
out enough -- well then just leave the whole thing out. It can't hurt. It
won't break anything to introduce it in a future version as either an
extra interface above Document, or a new subclass of Document, or something
with no blood relationship to Document that just happens to know about a
Document.
The same comments apply to DocumentContext -- it has no useful methods as
yet, so it doesn't really make sense to include it in the API. Putting
it in now restricts future design choices, without providing any
current benefit.
XMLNode is tedious
Why are all these methods in XMLNode there asking me to specify, each and
every time I want to do something, whether I want entities expanded or not?
Why can't I just set this once and be done with it?
While I can see that I might want to expand some entities but not others,
I think I am unlikely to want to do this based on where I am in the tree.
Instead I am likely to want to do this based on what type the entity is.
These are not a big deal in the design sense. I just think that it's kind
of strange that I would have to keep passing it down. It's likely to make
programming with these methods somewhat tedious.
ATTRIBUTE LIST HAS SOME PROBLEMS
Also as a minor point, why has getSpecified()/setSpecified() crept in here when
all of the other DTD oriented material has been kept at bay? If the users of
Elements are to know whether the attribute was specified or not, shouldn't they
also know whether elements are required? Whether Text is parsed or not? I think
the best route is to leave ALL of this material out at this level, including
whether or not an attribute is "specified".
Secondly, I see no good reason for the "Attribute item(int index)" method
of AttributeList. It seems to raise some troubling questions, since it
fundamentally asserts that there is a well defined order to an attribute
list, and my understanding is that there is not. If you have a bunch of
implied attributes on an element, who is to say what order they should be
represented in? Unless this is a well defined concept, the Attribute API
should make no assumption that a list of attributes has an order.
Note that this method eliminate the use of a hashtable to represent
an attribute list.
Justin Wells
Semiotek Inc.
(I am subscribed and posting via my list reader account. To mail
me personally, address mail to the user justin at the same domain. I do
read my list mail, but I will not respond in as timely a manner to mail
addressed to reader. All this to avoid spam.)
Received on Saturday, 27 June 1998 02:32:15 UTC