Re: New DOM Level 2 Working Draft from Stephen R. Savitzky on 1999-09-23 (www-dom@w3.org from July to September 1999)

From: Stephen R. Savitzky <steve@rsv.ricoh.com>
Date: 23 Sep 1999 14:46:36 -0700
To: www-dom@w3.org
Message-ID: <qcvh91uwqr.fsf@congo.crc.ricoh.com>
Congratulations!  After seeing the previous version I was very worried about
NodeIterator and TreeWalker; the present implementation definitely does the
right thing, defining them exactly the way an implementor would expect. 

Now for the criticisms:

1. The ownerElement method of Attr conflicts with the statement that
   "attributes are not part of the document tree".  Unless I'm missing
   something, ownerElement is essentially the same as parentNode, and means
   that Attr's cannot be shared. 

2. There doesn't seem to be any way to create an Iterator from a NodeList or
   NamedNodeMap.  In this regard it is also unfortunate that NamedNodeMap is
   not an extension of NodeList. 

3. Ideally, all methods that return NodeList should be deprecated and
   replaced by methods that return an equivalent Iterator.

4. A Document should be able to import a TreeWalker.

5. The semantics of nextSibling on TreeWalker are unclear if the next
   sibling exists but does not match the filter criteria.

6. There still doesn't seem to be any way to represent all of the
   information in a DTD. Among other things, shouldn't there be a "notation"
   attribute of "Attr"?  I believe that not including all of the DTD
   information violates the claim that "... For XML, this is specified by
   the W3C XML Information Set [Infoset].  The DOM is simply an API to this
   information set", since the XML Information Set includes the complete DTD.

And finally, one comment:

7. Consider the consequences if Iterator and TreeWalker added the following
   methods derived from attributes of the reference Node:

	unsigned short getNodeType()
	DOMString getNodeName()
	DOMString getNodeValue()
	NamedNodeMap getAttributes()
	NodeIterator getAttributeList()
	DOMString getNamespaceURI()
	DOMString getPrefix()
	DOMString getLocalName()

   It would then be possible to traverse and process a document of arbitrary
   size without ever instantiating the whole tree, and indeed without
   creating any Node objects at all!  (Of course, calling getNode() on such
   an iterator would force creation of a subtree, at least.)  

   This technique also neatly sidesteps all issues of structure sharing,
   liveness, lazy evaluation, and so on.  In effect, the entire document
   tree is lazily traversed.  It also has significant advantages in a system
   with limited memory (e.g. an embedded application), or in which large
   documents are accessed in document order over a network.

   My current document-processing application is based on a somewhat more
   restricted version of this kind of extended TreeWalker: it can only go
   forward (i.e. it lacks lastChild, lastSibling, previousSibling, and
   previousNode), and getNode is guaranteed only to return the equivalent of
   referenceNode.cloneNode(false).  For example, my parser uses this
   interface. 

   To avoid proliferating interfaces, it would be sufficient to extend the
   semantics of the backwards-going methods on TreeWalker to allow them to
   return null if the previous node was simply inaccessible rather than
   nonexistant. 

   My application also includes a ``virtual tree constructor'' interface
   that allows arbitrary-sized documents to be generated. in document order,
   again without necessarily instantiating any nodes -- it simply has all of
   the node-construction methods of Document, plus methods to start and end
   the children of the node under construction.  This interface is used both
   for tree construction and for output.

   Note that these two interfaces between them have exactly the same
   expressive power as the entire DOM (i.e. they can represent the same
   document information), but can be considerably simpler to implement.
   They can also be used in applications (limited memory, large documents,
   databases, streams (e.g. log files), networks, etc., for which the DOM is
   unsuited. 

Obviously I don't expect these interfaces to show up in DOM Level 2 (though
I wouldn't object if they did!), or indeed in _any_ future level; they
merely represent an alternative view of documents that works better for
certain kinds of application.

For the curious, the entire application mentioned above is available as open
source at <http://RiSource.org/PIA/>.

-- 
Stephen R. Savitzky  <steve@rsv.ricoh.com>  <http://rsv.ricoh.com/~steve/>
Quote of the month:  Death is nature's way of telling you to slow down.
Chief Software Scientist, Ricoh Silicon Valley, Inc. Calif. Research Center
 voice: 650.496.5710  front desk: 650.496.5700  fax: 650.854.8740 
  home: <steve@theStarport.org> URL: http://theStarport.org/people/steve/
Received on Thursday, 23 September 1999 17:47:14 UTC