Re: The DOM is not a model, it is a library! from Stephen R. Savitzky on 1999-10-05 (www-dom@w3.org from October to December 1999)

From: Stephen R. Savitzky <steve@rsv.ricoh.com>
Date: 05 Oct 1999 13:55:32 -0700
To: "Michael Champion" <michael_champion@ameritech.net>
Cc: <www-dom@w3.org>
Message-ID: <qcd7utsf23.fsf@congo.crc.ricoh.com>
"Michael Champion" <michael_champion@ameritech.net> writes:

> > THE DOM IS NOT AN OBJECT MODEL!  It is a specification (API) for a class
> > library.  Specifically it is the API for the class library of Javascript.
> > The Infoset is much closer to being a real object model, in that it
> > specifies the necessary and sufficient set of interfaces that _any_
> > implementation of documents must, somehow, provide.
> 
> That's quite true. On the other hand, the Infoset activity came into being
> partly because the DOM working group struggled with the fact -- which was
> not clear to us a couple of years ago -- that XML itself did not define a
> specific object model (in your sense, the result of an analysis phase that
> defines the basic set of objects and relationships, but not the API).

The problem is that all of the DOM literature, including the introduction,
suggests that the DOM is a great deal more universal than it actually is.
It is no longer a basic set of objects and relationships, it's just an API.

> > For the near term I will continue
> > to base my application on my partial implementation of the DOM, and
> because
> > of its architecture it will always be able to manipulate DOM trees, but
> > eventually my internal representation will cease to look anything like the
> > DOM.
> 
> I think this is more or less what EVERYONE who's implementing the DOM on top
> of an existing application does.

That's not what I was doing.  What I was trying to do was take an existing
document-processing application and make it use the DOM as its API for
documents.  This included changing my internal representation to one
matching the DOM's expectations, and changing all my interfaces to an
extension of the DOM.  It turns out that this represented a disastrous
mismatch. 

My goal is to totally revamp my internal interfaces so that that nobody will
mistake what I have for an implementation of the DOM, which it never really
was.  I will be able to read and create DOM documents, but I won't ever have
a representation for them, and I won't be trying to maintain a set of
classes that attempt to implement the DOM interfaces. 

> There is no assumption, or even suggestion, that the DOM interfaces should
> look anything like the underlying data structures.

That's true in theory, but not in practice.  The DOM is specified in a way
that makes the choice of underlying data structures quite narrow if you want
anything approaching efficiency.  In some cases it's probably
overconstrained to the point where no efficient implementation exists.

> That's what distinguishes one DOM implementation from another -- the
> suitability of the internal representation and manipulation of the XML
> data for some specific type of task, be it browsing, editing, workflow,
> database, etc.  It makes sense to me to have a common API for common
> operations in disparate applications, but it makes no sense to me to try
> to find a common INTERNAL representation that meets such a range of
> requirements.

The DOM's API is insufficient for many of these tasks, and overspecified for
the rest.  Just to take one example, an editor needs access to the DTD, and
it needs to be able to modify entity definitions.  It needs to be able to
redefine the default value of an attribute.  It may need to let the author
specify whether to leave off a redundant end tag in HTML.  And on and on. 

Sure, I can do all of these things using the DOM _interfaces_, and I do, but
my implementation does not conform to the DOM _specification_, and there's
no way that it can or ever will.

> > The
> > reference set of applications should be specified -- applications outside
> > this set _might_ be able to use the DOM, but if their requirements differ
> > from those of the reference set their needs will simply not be considered.
> > It should be made _very_ clear that extensions of any sort are not
> > encouraged, perhaps not even permitted, and that implementors with a
> > different set of requirements should seek elsewhere.
> 
> Here I pretty much totally disagree. How could one usefully specify a
> "reference set of applications" in an industry that's evolving as rapidly as
> the Internet?  Applications, and even the terms "e-commerce" and "portal"
> were not in common use when work on the DOM began, but I submit that the DOM
> is very relevant in both these areas.

That's not what I mean by the reference set.  I mean the types of programs
for which the API is suitable: scripting languages, mostly in browsers,
operating on a limited number of small documents all of which use the same
well-known DTD.  The DOM contains many components, features, and behavioral
constraints that are not merely useless but totally wrong for applications
that involve databases, large documents, small memory, streaming, editing,
and so on.

> Virtually all DOM implementors have added extensions (and I respectfully
> disagree with Arnaud that this is "useless"), although this obviously
> limits interoperability.

It's Arnaud's view that seems to have prevailed.  The DOM is pervaded by its
need to be the specification for a library.  Interoperability is a great
thing, but it limits the range of applications.

Even extensions don't work, because most of the things I want to do require
_removing_ functionality and constraints from the DOM, not extending it by
adding more.  I don't want to add something like ownerElement to Attr, I
want to allow structure sharing wherever it makes sense.  I don't want magic
children on EntityReference, I want firstChild to return null so that I can
look up the Entity myself.  I don't want the insane machinery that goes
along with live nodelists.  I want to be able to use a NodeIterator to
iterate through the rules of a style sheet.  And so on.

> As many people have noted, one must use proprietary extensions to the DOM
> to load and save data!  I assure everyone that it was explicitly assumed
> that implementors would extend the DOM to provide access to operations --
> such as DTD navigation -- that the DOM doesn't (yet) support.

I have seen no evidence of this, and the DOM has consistently evolved to
make this kind of thing less and less feasible.  Putting it another way, the
DOM evolves by _adding_ features and constraints.  Each new addition further
constrains the set of feasible implementations.

  <aside> This needs proof, of course, or at least support.  OK:

          Adding ownerElement precludes structure sharing for Attr nodes,
          constraining the implementation.  It is impossible to _add_
          anything that _relaxes_ an existing constraint, so the set of
          constraints must be monotonically increasing.
  </aside>

> All I suggest is that implementors who have functionality that overlaps
> the DOM functionality provide DOM interfaces, add proprietary APIs where
> their functionality exceeds the DOM's, and work with the W3C to
> standardize the APIs as the base of overlapping functionality grows.

The problem is that overlap isn't good enough.  Every time some new
constraint gets added to the DOM, some piece of your implementation is going
to become non-compliant.  Then you have to explain to whoever is maintaining
the application that, no, the interfaces are DOM level 1 with a couple of
extensions, but don't use getElementsByTagName because it's hopelessly
inefficient. 

The DOM is OK if you are building an application from scratch that can live
within the DOM's constraints, and you have an existing DOM implementation to
start with.  If you need to relax any of the DOM's constraints it's a
mistake to base your implementation on the DOM's interfaces (as I did),
because it creates the expectation that you will eventually come up with a
full implementation.

-- 
Stephen R. Savitzky  <steve@rsv.ricoh.com>  <http://rsv.ricoh.com/~steve/>
Platform for Information Applications:      <http://RiSource.org/PIA/>
Chief Software Scientist, Ricoh Silicon Valley, Inc. Calif. Research Center
 voice: 650.496.5710  front desk: 650.496.5700  fax: 650.854.8740 
  home: <steve@theStarport.org> URL: http://theStarport.org/people/steve/
Received on Tuesday, 5 October 1999 16:56:06 UTC