The DOM is not a model; it is a library!

I am finally beginning to understand what Arnaud Le Hors <lehors@w3.org>
meant when he wrote:

> Jeff Mackay wrote:
> > 
> > Are implementors allowed to extend the NodeType and Exception lists?
> 
> Implementors can do whatever they want. However, the whole purpose of
> the DOM is to provide users with an interoperable API. Implementing
> and/or using any extension makes this pretty useless.

THE DOM IS NOT AN OBJECT MODEL!  It is a specification (API) for a class
library.  Specifically, it is the API for the class library of Javascript.
The Infoset is much closer to being a real object model, in that it
specifies the necessary and sufficient set of interfaces that _any_
implementation of documents must, somehow, provide.

An object model is the product of the analysis phase of an OO project; an
API is the product of the design phase.  An API is the specification for a
library, which is the product of the implementation phase.

In a library's specification, like the Java class library or the DOM, it is
vitally important to specify a complete set of functions and to nail down
the implementation as much as possible, so that application-writers have a
rich set of operations with semantics they can count on.  That's what an API
(the programmer's view of a library) is all about.  Extensions and
experimentation are out of place in this context.  The typical application-
writer is using a canned library supplied by a language vendor, and expects
a consistent environment on every platform of interest.  That's a good
thing.

In a _real_ object model like a GUI toolkit or the Infoset, on the other
hand, it is important to provide only the _minimum_ interface, and to
constrain the implementation as _little_ as possible, so as to provide for
the widest possible range of applications.  A real object model is
essentially the basis for a framework; extensions are allowed for and indeed
expected.  With a framework, application writers are expected to get their
hands dirty and at least _look_ at the code, if not modify it; usually the
application and the classes that implement the object model are written by
the same person or group.  The object model's main function is to ensure
that no important details are left out of the implementation.  That's a good
thing, too, but it's a _different_ thing.  


An object model is a specification, just as an API is, but at a different
level.  It is further removed from the implementation, and is not directly
usable by an application-writer.


A good object model simply defines the set of interfaces that are necessary
and sufficient in order to _represent_ the data (documents, in this case)
being modeled -- the objects and attributes that any implementation must
provide, and that any application can count on having available.  The object
model, in other words, specifies the objects' attributes, very little about
their behavior, and as little as possible about their implementation.

An object model should make no claims about whether nodelists are ``live''
or static, about whether or not nodes can be freely moved between documents,
about whether documents may be traversed in any particular order, about
whether structure can be shared, or whether a node remains accessible after
an application has abandoned all references to it.  It should simply ensure
that, if you are looking at a node in a document, you can tell what sort of
node it is and determine _all_ of the relevant information about it.
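
To make the distinction concrete, here is a small sketch in Python.  All of
the names in it (DocNode, LiveNode, SnapshotNode, describe) are hypothetical,
invented for illustration, and come from no standard.  The interface says
only what a node is: its kind, name, attributes and children.  One
implementation hands back its own child list, so the list is "live"; the
other hands back a static copy.  Code written against the interface alone
works with either.

```python
from typing import Protocol, Sequence


class DocNode(Protocol):
    """Hypothetical minimal object model for a document node.

    It states what a node *is* (kind, name, attributes, children) and says
    nothing about storage, sharing, or whether child lists are live.
    """

    kind: str
    name: str

    def attributes(self) -> dict[str, str]: ...

    def children(self) -> Sequence["DocNode"]: ...


class LiveNode:
    """One possible implementation: children() returns the node's own list,
    so later insertions show up in a list obtained earlier ("live")."""

    def __init__(self, kind: str, name: str, **attrs: str) -> None:
        self.kind = kind
        self.name = name
        self._attrs = dict(attrs)
        self._children: list[DocNode] = []

    def attributes(self) -> dict[str, str]:
        return dict(self._attrs)

    def children(self) -> Sequence[DocNode]:
        return self._children  # the same list object, not a copy

    def append(self, child: DocNode) -> None:
        self._children.append(child)


class SnapshotNode(LiveNode):
    """Another implementation: identical data, but children() returns a
    static copy taken at call time."""

    def children(self) -> Sequence[DocNode]:
        return tuple(self._children)


def describe(node: DocNode, depth: int = 0) -> list[str]:
    """Walk any conforming node using only the interface above."""
    attrs = " ".join(f'{k}="{v}"' for k, v in sorted(node.attributes().items()))
    lines = ["  " * depth + f"{node.kind}:{node.name} {attrs}".rstrip()]
    for child in node.children():
        lines.extend(describe(child, depth + 1))
    return lines


for cls in (LiveNode, SnapshotNode):
    root = cls("element", "doc")
    before = root.children()  # obtained before the child is added
    root.append(cls("element", "p", id="1"))
    # LiveNode: `before` now holds 1 child; SnapshotNode: it still holds 0.
    print(cls.__name__, len(before), describe(root))
```

Both classes satisfy the same interface, and describe() yields the same
lines for either; only the live-versus-static behaviour, which the object
model deliberately leaves open, differs.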


The DOM, by contrast, makes no real attempt to be a complete object model
for documents.  Converting a document to a DOM tree loses information; it is
no longer possible to recover the original document.  It is impossible to
create an arbitrary XML or HTML document, say inside of an editor, and write
it out as its author intended.  There may be some documents that cannot be
represented at all, perhaps due to their size or to their dynamic nature.
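
The information loss is easy to observe with any DOM implementation that
parses and then re-serializes.  As an illustration only (the post names no
particular implementation), here is Python's xml.dom.minidom: two source
texts that differ in attribute quoting, spacing inside the start tag, and a
character reference produce the same tree, so no serializer can tell which
one the author actually wrote.

```python
from xml.dom.minidom import parseString

# Two source texts an author might consider different: they differ in
# attribute quoting, spacing inside the start tag, and a character reference.
as_written = "<note  lang='en'>&#65;BC</note>"
alternative = '<note lang="en">ABC</note>'

# Parse both into DOM trees and serialize the root elements back out.
out_written = parseString(as_written).documentElement.toxml()
out_alternative = parseString(alternative).documentElement.toxml()

print(out_written)                     # <note lang="en">ABC</note>
print(out_written == out_alternative)  # True: the tree does not record which text it came from
print(out_written == as_written)       # False: the author's original spelling is gone
```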

There are many plausible representations for documents that do not conform
to the DOM but are nevertheless useful, and which would benefit from a
unifying standard to guide their implementors.  In fact, the DOM itself
would have benefitted greatly from such a standard, not to mention a
vigorous application of Occam's Razor.


All of this suggests that, for my own sanity and for the sake of my
application, I should probably abandon any hope or pretense of using the
DOM.  I need DTDs, I need SGML, I need late-bound entities and entity
references without content, I need application-specific, strongly-typed
metadata, I need the ability to stream large documents through a document
processor with limited memory, and so on.  For the near term I will continue
to base my application on my partial implementation of the DOM, and because
of its architecture it will always be able to manipulate DOM trees, but
eventually my internal representation will cease to look anything like the
DOM.  As the official Javascript class library for browsers, the DOM is
simply irrelevant for an XML-based extensible server.

Most of my comments on this list over the last year or so have been based on
the mistaken belief that the DOM was an object model.  I think a great deal
of confusion could have been avoided if the DOM's introduction had clearly
stated that, although the DOM may be moderately language-neutral, it is far
from implementation-neutral, and that its primary goal is to provide a stable
class library optimized for a certain specific class of applications.  The
reference set of applications should be specified -- applications outside
this set _might_ be able to use the DOM, but if their requirements differ
from those of the reference set their needs will simply not be considered.
It should be made _very_ clear that extensions of any sort are not
encouraged, perhaps not even permitted, and that implementors with a
different set of requirements should seek elsewhere.


It is far too late to rename DOM -> Browser Scripting Document API, but that
name would have been far more accurate.

I think that a document object model (note the indefinite article and
lower-case letters) would be a good idea, and I will gladly support and
contribute to an effort to construct one.

-- 
Stephen R. Savitzky  <steve@rsv.ricoh.com>  <http://rsv.ricoh.com/~steve/>
Platform for Information Applications:      <http://RiSource.org/PIA/>
Chief Software Scientist, Ricoh Silicon Valley, Inc. Calif. Research Center
 voice: 650.496.5710  front desk: 650.496.5700  fax: 650.854.8740 
  home: <steve@theStarport.org> URL: http://theStarport.org/people/steve/

Received on Tuesday, 5 October 1999 14:51:35 UTC