Re: latest DOM spec 19980720

On the whole the new DOM specification seems to be a great improvement, but
I'm sorry to see the DTD stuff go.  My group has spent the last
several months building a large document-processing application using the
DOM as its basis; we had to make several extensions to the core to get it to
handle generic SGML.  I think it's very important to be able to represent
_any_ SGML document using the DOM core. 

The reason for this is simple: if any SGML document can be represented by
the core, extending the DOM for specific document types (e.g. HTML) becomes
a convenience rather than a necessity.  (And by the way, this appears to be
why the core is now sufficient to represent XML.) It would mean that _any_
document would be representable without having to create a new, extended API
for it, and would make it much easier to produce SGML-to-XML conversion
utilities (for example).  The main missing node type, I believe, is
Declaration.

A few more specific notes:

o There is no type-safe way to convert a Node to any of its major
  subclasses.  The newly added attributes (nodeName, nodeValue, and
  attributes) help a great deal, but it would be good to have conversion
  methods as well.  We have, e.g., "asElement", which returns the node if it
  is an Element, null otherwise.  This is _much_ more efficient than
  casting, which involves run-time type checking in Java. 

o I'm very sorry to see NodeIterator, TreeIterator, and their create methods
  disappear.  It's easier to create iterators when you know the type (and
  hence the implementation details) of the objects you're iterating over;
  the resulting type-specific iterator can be much more efficient than a
  generic one, and its class need not be exposed to the programmer.

o If the nodeName of an Element is the tagname, why do you need the
  getTagName method?  (One possible justification is for HTMLElement, where
  the tag name is supposed to be returned in uppercase.)

o There doesn't seem to be any way to distinguish an HTMLCollection indexed
  by name from one indexed by ID.  In any case, shouldn't the item and
  namedItem methods return HTMLElement rather than Node?

o The previous point is closely related to the more general problem of
  NamedNodeMap not permitting what the specification calls `aliasing'.  You
  probably need both NamedNodeMap and a more generic associative array.

o Making the parent of an Attribute refer to the Element that contains it is
  almost certainly a mistake when coupled with the idea that the value of
  the attribute is its children.  We tried it.  The problem is that when an
  attribute has a default value, you have to copy the entire tree from the
  DTD to each Element where the attribute appears.  The best solution would
  be to return the effective value of an Attribute as a NodeList rather than
  as a wstring.  

o A similar (but worse) problem occurs with EntityReference: the entity's
  content would have to be copied to every place it is referenced.

o Because of the previous two problems, it would be best if there were two
  different ways of getting a node's value: as a wstring and as a NodeList.
  (Perhaps two attributes: nodeValue (the NodeList) and nodeData (the
  wstring).  This would even make sense for Text; the NodeList could have
  character entity references unexpanded, which would significantly simplify
  output conversion of Text nodes.)
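
To make the first note above concrete, here is a minimal Java sketch of the
asElement style of conversion.  It is only an illustration: the class names
(SketchNode, SketchElement, SketchText, AsElementDemo) and the asText and
countElements methods are invented for this example and are not taken from
the DOM specification or from any existing implementation.  The base class
answers null to every conversion, and each subclass overrides only the one
conversion that applies to it, so callers never need a cast.

```java
// Hypothetical names throughout; none of these types come from the DOM spec.
final class AsElementDemo {
    // Counts the elements in a mixed array of nodes without a single cast.
    static int countElements(SketchNode[] nodes) {
        int count = 0;
        for (int i = 0; i < nodes.length; i++) {
            if (nodes[i].asElement() != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        SketchNode[] nodes = {
            new SketchElement("P"),
            new SketchText("hello"),
            new SketchElement("EM")
        };
        for (int i = 0; i < nodes.length; i++) {
            // asElement returns the node itself if it is an element, else null.
            SketchElement e = nodes[i].asElement();
            System.out.println(e != null ? "element " + e.getTagName()
                                         : "not an element");
        }
        System.out.println("elements: " + countElements(nodes));
    }
}

// Stand-in for Node: by default a node is none of the major subclasses.
abstract class SketchNode {
    SketchElement asElement() { return null; }
    SketchText asText() { return null; }
}

// Stand-in for Element: overrides only the conversion that applies to it.
final class SketchElement extends SketchNode {
    private final String tagName;
    SketchElement(String tagName) { this.tagName = tagName; }
    String getTagName() { return tagName; }
    SketchElement asElement() { return this; }
}

// Stand-in for Text.
final class SketchText extends SketchNode {
    private final String data;
    SketchText(String data) { this.data = data; }
    String getData() { return data; }
    SketchText asText() { return this; }
}
```

A caller that only cares about elements can test the result of asElement
against null, as countElements does, and simply skip everything else.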

Finally, I note that there are no comments in the Java bindings.  While this
is well-optimized for the appendix to the specification, it would be best if
the compiled version (javabindings.zip) had the comments, so that JavaDoc
and other documentation-extraction and source-code-browsing software could
make use of them. 

-- 
 Stephen R. Savitzky   Chief Software Scientist, Ricoh Silicon Valley, Inc., 
<steve@rsv.ricoh.com>                            California Research Center
 voice: 650.496.5710   fax: 650.854.8740    URL: http://rsv.ricoh.com/~steve/
  home: <steve@starport.com> URL: http://www.starport.com/people/steve/

Received on Monday, 20 July 1998 18:48:51 UTC