Re: The DOM is not a model, it is a library! from Stephen R. Savitzky on 1999-10-06 (www-dom@w3.org from October to December 1999)

From: Stephen R. Savitzky <steve@rsv.ricoh.com>
Date: 06 Oct 1999 08:38:44 -0700
To: Philippe Le Hegaret <plh@w3.org>
Cc: WWW DOM <www-dom@w3.org>
Message-ID: <qc1zb8sdmj.fsf@congo.crc.ricoh.com>
Philippe Le Hegaret <plh@w3.org> writes:

> Stephen R. Savitzky wrote:
> > THE DOM IS NOT AN OBJECT MODEL!  It is a specification (API) for a class
> > library.
> 
> In http://www.w3.org/TR/REC-DOM-Level-1/introduction.html 
> "The Document Object Model (DOM) is an application programming interface
>  (API) for HTML and XML documents."

QED.

> > The Infoset is much closer to being a real object model, in that it
> > specifies the necessary and sufficient set of interfaces that _any_
> > implementation of documents must, somehow, provide.
> 
> In http://www.w3.org/TR/xml-infoset#intro
> 
>  "This document specifies an abstract data set called the XML information set
> (Infoset), a description of the information available in a well-formed XML
> document"
> 
>  So, it's not _any_ implementation of documents, but _any_ implementation
> of XML documents.

Point taken.  It's still a lot closer to a general object model for
documents than the DOM is. 

> > It is impossible to create an arbitrary XML or HTML document, say inside of
> > an editor, and write it out as its author intended.
>   Do you have an example?

Sure.  As a web author, I might try to foil spammers by representing my
e-mail address as &lt;steve&#x40;rsv.ricoh.com&gt; -- note also the
symmetric use of &lt; and &gt;.  A conforming DOM implementation will
render this as &lt;steve@rsv.ricoh.com>, defeating my intentions.
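A minimal sketch of that round trip, using Python's xml.dom.minidom as a
stand-in for a conforming DOM implementation (an assumption: DOM Level 1
itself defines no serializer, so escaping details vary by implementation).
The parser expands &#x40; before the tree is built, so nothing in the tree
remembers how the '@' was spelled:

```python
from xml.dom import minidom

# The address as the author typed it: '@' hidden behind a character
# reference, '<' and '>' written symmetrically as entity references.
SOURCE = "<p>&lt;steve&#x40;rsv.ricoh.com&gt;</p>"


def round_trip(markup: str) -> tuple[str, str]:
    """Parse markup into a DOM tree, then serialize it again.

    Returns (character data held by the tree, re-serialized markup).
    """
    doc = minidom.parseString(markup)
    para = doc.documentElement
    # The parser expands every reference before the tree is built, so the
    # Text node holds only the resulting characters, not their spelling.
    text = "".join(child.data for child in para.childNodes
                   if child.nodeType == child.TEXT_NODE)
    return text, para.toxml()


if __name__ == "__main__":
    text, markup = round_trip(SOURCE)
    print(repr(text))   # the characters only
    print(markup)       # the serializer has to choose its own escapes
```

minidom happens to escape '>' as well, so it keeps a &gt; that the
serializer described above would drop; what no serializer working from the
tree alone can restore is the &#x40;.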

To take another example, I may want HTML lists to be output in the
``traditional'' format with omitted end tags:

  <ol>
      <li>
      <li>
  </ol>

The DOM has no way to represent the fact that the end tags have been
omitted.  For various reasons I may wish to omit end tags in one place, but
keep them in another (perhaps as a flag to some string-based Perl script
that modifies the file in some way).  This is a perfectly legitimate thing
to do in a text editor, but it's impossible in an editor based on the DOM.
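To make the missing piece concrete, here is a toy tree builder on top of
Python's html.parser that records, for each <li>, whether its end tag
actually appeared in the source.  The Element class and its end_tag_omitted
flag are hypothetical; they stand for the extra bit of state a DOM Element
would need, and only the <li> omission rule is handled:

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Element:
    tag: str
    children: list["Element"] = field(default_factory=list)
    # Hypothetical extra state; a DOM Level 1 Element has no such field.
    end_tag_omitted: bool = False


class ListBuilder(HTMLParser):
    """Toy tree builder that understands only <li>'s end-tag omission rule."""

    def __init__(self) -> None:
        super().__init__()
        self.root = Element("#root")
        self.stack = [self.root]

    def _close_implicitly(self) -> None:
        self.stack.pop().end_tag_omitted = True

    def handle_starttag(self, tag, attrs):
        # A new <li> implicitly ends a still-open <li>.
        if tag == "li" and self.stack[-1].tag == "li":
            self._close_implicitly()
        elem = Element(tag)
        self.stack[-1].children.append(elem)
        self.stack.append(elem)

    def handle_endtag(self, tag):
        if not any(e.tag == tag for e in self.stack[1:]):
            return  # stray end tag: ignore it
        # Anything still open inside the closed element lost its end tag.
        while self.stack[-1].tag != tag:
            self._close_implicitly()
        self.stack.pop()  # explicit end tag: flag stays False

    def close(self) -> None:
        super().close()
        # Elements still open at end of input never had an end tag either.
        while len(self.stack) > 1:
            self._close_implicitly()


def end_tags_omitted(markup: str) -> list[bool]:
    """For each child of the first top-level element, was its end tag omitted?"""
    builder = ListBuilder()
    builder.feed(markup)
    builder.close()
    ol = builder.root.children[0]
    return [li.end_tag_omitted for li in ol.children]
```

A serializer walking this tree could write </li> only where end_tag_omitted
is False, so a file that mixes both styles would come back out the way it
went in.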

Similarly, for stylistic reasons I may wish to distinguish between XML
elements that are declared as empty, and those that are not but are simply
empty ``by accident''.  I would represent the first as <foo/> and the second
as <bar></bar>.  This has no effect on the semantics of the document, of
course, but as an author using an editor it serves as an invaluable reminder
of which empty elements it is permissible to fill in later. 
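Again using Python's xml.dom.minidom as an example DOM implementation (an
assumption; another serializer might choose <bar></bar> for both), the
distinction disappears on the first parse:

```python
from xml.dom import minidom

# <foo/> was written as declared-empty; <bar></bar> is empty "by accident".
SOURCE = "<doc><foo/><bar></bar></doc>"

doc = minidom.parseString(SOURCE)
foo, bar = doc.documentElement.childNodes

# Both parse to an Element with no children and no record of which syntax
# was used, so the serializer has to pick one form for both.
print(len(foo.childNodes), len(bar.childNodes))  # -> 0 0
print(doc.documentElement.toxml())               # -> <doc><foo/><bar/></doc>
```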

> > There may be some documents that cannot be represented at all, perhaps
> > due to their size or to their dynamic nature.
>   If you mean a document which is not XML or HTML, you're right. It's out
> of the scope of the DOM.

No.  In the first case I mean a document which is too large for its tree to
fit in memory.  It may even be effectively infinite; for example, the output
of a process such as a web crawler. 
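A minimal sketch of the streaming alternative, using Python's
xml.etree.ElementTree.XMLPullParser rather than a DOM (a different API,
chosen here only because it can discard finished subtrees).  The
crawler_output generator is a hypothetical stand-in for an effectively
infinite source; its root element never closes, yet pages are handled as
they arrive:

```python
import xml.etree.ElementTree as ET
from itertools import islice
from typing import Iterable, Iterator


def crawler_output() -> Iterator[str]:
    """Hypothetical endless producer, e.g. a web crawler reporting pages."""
    yield "<results>"
    n = 0
    while True:  # the root element is never closed
        yield f"<page n='{n}'>http://example.com/{n}</page>"
        n += 1


def stream_pages(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the text of each <page> as soon as its end tag has been parsed.

    Finished <page> subtrees are removed from the root, so the in-memory tree
    never holds more than the page currently being parsed.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    root = None
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start" and root is None:
                root = elem  # remember <results> so children can be pruned
            elif event == "end" and elem.tag == "page":
                yield elem.text
                root.remove(elem)  # discard the finished subtree


if __name__ == "__main__":
    for url in islice(stream_pages(crawler_output()), 3):
        print(url)
```

A DOM-style parser given the same input would typically wait for the end of
the document before handing back a tree, and here that end never comes.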

In the second case, I mean a document in which external entities may have
their value changed because of the actions of some other process.  Possibly
the simplest example of this is &time;, which I might want to reflect the
exact time when the entity is expanded.  Another example might involve an
external file.
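Late binding can be sketched as a string-level expander that calls a
producer function each time a reference is expanded, instead of storing the
replacement text once at parse time, as a DOM tree typically does.  The
expand function, the LATE_BOUND table, and the &tick; entity are
hypothetical illustrations, not part of any DOM or XML API:

```python
import re
from datetime import datetime, timezone
from itertools import count
from typing import Callable, Mapping

# Matches a general entity reference such as &time; (simplified Name rule).
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def expand(text: str, entities: Mapping[str, Callable[[], str]]) -> str:
    """Replace each &name; by calling its producer at expansion time.

    References with no producer are left untouched, so built-ins such as
    &lt; pass through for a later stage to handle.
    """
    def replace(match: re.Match) -> str:
        producer = entities.get(match.group(1))
        return producer() if producer is not None else match.group(0)

    return ENTITY_REF.sub(replace, text)


ticks = count()
LATE_BOUND = {
    # Re-evaluated on every expansion, like the &time; example.
    "time": lambda: datetime.now(timezone.utc).isoformat(timespec="seconds"),
    # Hypothetical counter that makes the late binding easy to observe.
    "tick": lambda: str(next(ticks)),
}

if __name__ == "__main__":
    print(expand("Generated at &time; (&tick;, &tick;) &lt;ok&gt;", LATE_BOUND))
```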

>   But if you really want to add your <% script %> node to the DOM,
> write an extension; it's very easy to do:
> 
> interface StephenNode : Node {
>    readonly attribute unsigned short stephenType;
> };
> 
> const unsigned short SCRIPT_NODE = 0;
> // ...
> 
> interface ScriptNode : StephenNode {
>    // whatever you want
> };
> 
>  I don't see a statement in the DOM about "you should not create your own
> extension based on the DOM core".

Then I would have to rewrite my application to cast all nodes to StephenNode
and test stephenType instead of nodeType.  It's ugly.
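A minimal Python sketch of what that dispatch looks like, with
xml.dom.minidom supplying the ordinary nodes.  ScriptNode is hypothetical,
and borrowing PROCESSING_INSTRUCTION_NODE as its nodeType is an assumption
made only because DOM Level 1's list of node codes is closed; the point is
that every dispatch site needs the extra probe:

```python
from xml.dom import Node, minidom

SCRIPT_NODE = 0  # stephenType code from the quoted IDL


class ScriptNode:
    """Hypothetical <% script %> node (not part of any DOM implementation).

    The set of nodeType codes is fixed, so this sketch borrows the
    processing-instruction code and records its real kind in stephenType.
    """
    nodeType = Node.PROCESSING_INSTRUCTION_NODE
    stephenType = SCRIPT_NODE

    def __init__(self, code: str) -> None:
        self.code = code


def kind(node) -> str:
    # Every dispatch site must first probe for the extension interface...
    if getattr(node, "stephenType", None) == SCRIPT_NODE:
        return "script"
    # ...before the ordinary switch on nodeType can run.
    if node.nodeType == Node.ELEMENT_NODE:
        return "element"
    if node.nodeType == Node.TEXT_NODE:
        return "text"
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        return "processing instruction"
    return "other"


if __name__ == "__main__":
    para = minidom.parseString("<p>hi<?pi data?></p>").documentElement
    nodes = [para, *para.childNodes, ScriptNode("print(1)")]
    print([kind(n) for n in nodes])
```

Drop the first test in kind() and the script node is silently reported as a
processing instruction.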

> > I need DTDs,
> 
>   It's in our requirements.
>   See http://www.w3.org/TR/WD-DOM/requirements#ID-1072425801

But if I can't, within the specification, define new node types, I can't
write experimental code that won't have to be rewritten if you finally get
around to fulfilling those requirements.  Also, a frozen set of node types
will push you toward brand-new interface classes that don't descend from
Node (which has already happened with CSSRule and CSSRuleList).  Why aren't
these descended from Node?

> > I need SGML
> 
>   It's out of our scope.
>   See http://www.w3.org/TR/REC-DOM-Level-1/introduction.html

Exactly.  The scope of the DOM is too limited; I need an object model that
can be extended to handle other situations and still be compliant with its
specification. 

> > I need late-bound entities and entity references without content, I need
> > application-specific, strongly-typed metadata
>   Once again, it's in our list. But how can we address strongly-typed
> metadata without a Recommendation? The XML Schema datatypes specification
> is not yet a Recommendation:
>   http://www.w3.org/TR/xmlschema-2/

If you had the ability to define new application-specific node types, you
could simply add

  attribute NodeList metadata;

to Node and let the application take care of it. 

> > I need the ability to stream large documents through a document processor
> > with limited memory, and so on.
> 
> In http://www.w3.org/TR/REC-DOM-Level-1/introduction.html 
> 
>   "One important objective for the Document Object Model is to provide a
> standard programming interface that can be used in a wide variety of
> environments and applications."
>   Our main goal is interoperability, not memory. But if we can have both,
> it's better.

This is exactly my point.  It's no longer possible to have both -- the DOM
has taken a memory-intensive path in order to provide a rich interface.
There needs to be an alternative for those of us who want to make different
design decisions without giving up compliance with _some_ non-DOM standard.

> > It is far too late to rename DOM -> Browser Scripting Document API, but it
> > would have been far more accurate.
> 
>  Browser vendors account for 10% of the participants in the DOM WG. There
> are several implementations of the DOM in Java, C++, Delphi, Perl, Python, C.
> The DOM is definitely not only a Browser Scripting Document API.
>  Browser scripting is one of our goals, but not the only one.

It is the _reference_ goal.  Whatever the DOM becomes, one of its ironclad
requirements is that it has to remain the document-processing API for
Javascript.  Everything else may be subject to reconsideration, but not
that.  That's as it should be: Javascript needs an API for documents, the
DOM is it, and if any other application finds it useful, that's great.

But don't expect _every_ application to find it a good match.

-- 
Stephen R. Savitzky  <steve@rsv.ricoh.com>  <http://rsv.ricoh.com/~steve/>
Platform for Information Applications:      <http://RiSource.org/PIA/>
Chief Software Scientist, Ricoh Silicon Valley, Inc. Calif. Research Center
 voice: 650.496.5710  front desk: 650.496.5700  fax: 650.854.8740 
  home: <steve@theStarport.org> URL: http://theStarport.org/people/steve/
Received on Wednesday, 6 October 1999 11:39:21 UTC