- From: Stephen R. Savitzky <steve@crc.ricoh.com>
- Date: 20 Jul 1998 15:52:46 -0700
- To: www-dom@w3.org
On the whole the new DOM specification seems to be a great improvement, but I'm sorry to see the DTD stuff go.

My group has spent the last several months building a large document-processing application using the DOM as its basis; we had to make several extensions to the core to get it to handle generic SGML.

I think it's very important to be able to represent _any_ SGML document using the DOM core. The reason for this is simple: if any SGML document can be represented by the core, extending the DOM for specific document types (e.g. HTML) becomes a convenience rather than a necessity. (And by the way, this appears to be why the core is now sufficient to represent XML.)

It would mean that _any_ document would be representable without having to create a new, extended API for it, and would make it much easier to produce SGML-to-XML conversion utilities (for example). The main missing node type, I believe, is Declaration.

A few more specific notes:

o There is no type-safe way to convert a Node to any of its major subclasses. The newly-added nodeName, nodeValue, and attributes attributes help a great deal, but it would be good to have conversion methods as well. We have, e.g., "asElement", which returns the node if it is an Element, null otherwise. This is _much_ more efficient than casting, which involves run-time type checking in Java.

o I'm very sorry to see NodeIterator, TreeIterator, and their create methods disappear. It's easier to create iterators when you know the type (and hence the implementation details) of the objects you're iterating over; the resulting type-specific iterator can be much more efficient than a generic one, and its class need not be exposed to the programmer.

o If the nodeName of an Element is the tagname, why do you need the getTagName method? (One possible justification is for HTMLElement, where the tag name is supposed to be returned in uppercase.)
o There doesn't seem to be any way to distinguish an HTMLCollection indexed by name from one indexed by ID. In any case, shouldn't the item and namedItem methods return HTMLElement rather than Node?

o This is closely related to the more generic problem of NamedNodeMap not permitting what the specification calls `aliasing'. You probably need both NamedNodeMap and a more generic associative array.

o Making the parent of an Attribute refer to the Element that contains it is almost certainly a mistake when coupled with the idea that the value of the attribute is its children. We tried it. The problem is that when an attribute has a default value, you have to copy the entire tree from the DTD to each Element where the attribute appears. The best solution would be to return the effective value of an Attribute as a NodeList rather than as a wstring.

o A similar (but worse) problem occurs with EntityReference.

o Because of the previous two problems, it would be best if there were two different ways of getting a node's value: as a wstring and as a NodeList. (Perhaps two attributes: nodeValue (the NodeList) and nodeData (the String). This would even make sense for Text; the nodelist could have character entity references unexpanded, which would significantly simplify output conversion of Text nodes.)

Finally, I note that there are no comments in the Java bindings. While this is well-optimized for the appendix to the specification, it would be best if the compiled version (javabindings.zip) had the comments, so that JavaDoc and other documentation-extraction and source-code-browsing software could make use of them.

--
Stephen R. Savitzky
Chief Software Scientist, Ricoh Silicon Valley, Inc., California Research Center
<steve@rsv.ricoh.com>
voice: 650.496.5710 fax: 650.854.8740
URL: http://rsv.ricoh.com/~steve/
home: <steve@starport.com> URL: http://www.starport.com/people/steve/
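
To illustrate the "asElement" conversion method mentioned in the notes above, here is a minimal Java sketch. The Node and Element interfaces shown are simplified, hypothetical stand-ins rather than the DOM bindings themselves, and SimpleElement, SimpleText, AsElementDemo, and describe are names invented for this example. Only the behavior of asElement (return the node if it is an Element, null otherwise) comes from the message.

```java
// Demo driver, declared first so single-file launchers find main().
class AsElementDemo {
    // Branches on the conversion method instead of using instanceof plus a cast.
    static String describe(Node node) {
        Element e = node.asElement();
        return (e != null) ? "element:" + e.getTagName()
                           : "other:" + node.getNodeName();
    }

    public static void main(String[] args) {
        System.out.println(describe(new SimpleElement("P")));
        System.out.println(describe(new SimpleText()));
    }
}

// Hypothetical, cut-down Node: only what the example needs.
interface Node {
    String getNodeName();

    // Conversion method: null unless the node really is an Element.
    default Element asElement() {
        return null;
    }
}

// Hypothetical, cut-down Element: overrides the conversion to return itself.
interface Element extends Node {
    String getTagName();

    @Override
    default Element asElement() {
        return this;
    }
}

// Assumed concrete element whose node name is its tag name.
final class SimpleElement implements Element {
    private final String tagName;

    SimpleElement(String tagName) {
        this.tagName = tagName;
    }

    public String getNodeName() {
        return tagName;
    }

    public String getTagName() {
        return tagName;
    }
}

// Assumed concrete non-element node; inherits the null-returning asElement().
final class SimpleText implements Node {
    public String getNodeName() {
        return "#text";
    }
}
```

Callers test the result of asElement() against null rather than casting, so a failed conversion cannot raise a ClassCastException. Whether this is actually faster than a cast depends on the Java implementation in use; the sketch only shows the shape of the API.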
Received on Monday, 20 July 1998 18:48:51 UTC