Re: The DOM is not a model, it is a library! from keshlam@us.ibm.com on 1999-10-06 (www-dom@w3.org from October to December 1999)

From: <keshlam@us.ibm.com>
Date: Wed, 6 Oct 1999 13:35:48 -0400
To: www-dom@w3.org
Message-ID: <85256802.0060A921.00@D51MTA03.pok.ibm.com>
>A conforming DOM implementation will render this as  &lt;steve@rsv.ricoh.com>,
>defeating my intentions.

I presume you meant <steve@rsv.ricoh.com> ... XML's definition says that
character references and the predefined entities such as &lt; are converted to
characters as the document is read in. Presumably the parser will have made that
replacement before the DOM ever sees it. Even if you define your own entities,
the DOM allows the parser to flatten entity references, so some parsers may
expand those too before the DOM has any say in the matter. If you want to
preserve the distinction between &lt; and the < character, you have to process
your document as something other than XML.
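
A minimal sketch of that behaviour, assuming Python's xml.dom.minidom as the DOM
implementation (chosen only for illustration; it is not named in this thread).
The element name "addr" is hypothetical. The point is that the text node holds
the expanded characters, and the escaped form reappears only because the
serializer re-escapes on output.

```python
from xml.dom.minidom import parseString

# The source text uses the escaped form, as well-formed XML requires.
source = "<addr>&lt;steve@rsv.ricoh.com&gt;</addr>"
doc = parseString(source)

root = doc.documentElement
root.normalize()  # merge adjacent text nodes, in case the parser split them
text = root.firstChild

# The DOM holds only the expanded characters; no trace of "&lt;" is stored.
print(text.nodeType == text.TEXT_NODE)
print(text.data)

# The serializer escapes again on output, so the markup survives a round trip
# even though the node itself never recorded which form the author typed.
print(root.toxml())
```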

I don't know if HTML entity expansion follows similar rules; I'm not an
authority on HTML4 by any means. If it does, there's nothing the DOM can do
about it, and you simply have a proposed solution that doesn't work in today's
web tools. If the HTML rules _do_ allow late resolution of these entities,
there's no reason the parser couldn't create EntityReference nodes in the DOM
for LT and GT. Whether your parser will agree to do that for you or not is
outside the DOM's scope.

>The DOM has no way to represent the fact that the end tags have been
>omitted.

That's true. The solution I'd recommend for this, if it's important to you, is
to (a) use an output routine that understands how to omit end-tags, and (b)
extend the DOM by adding a custom flag to Elements which indicates whether in
this case you'd like the end-tag omitted (if possible) or not.

A similar flag could be used to distinguish "empty by accident" and "empty by
intention".

The distinction here is that you're extending your model implementation, but not
changing the behavior of its DOM API. The flags shouldn't affect normal DOM
operation, but will produce the right result when output. Code that wants to
take advantage of the custom features will of course be tied to your
implementation, but only that code is affected.
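
One way the flag idea might look, sketched with Python's xml.dom.minidom. The
names FlaggedElement, omit_end_tag, create_flagged and serialize are all
hypothetical and belong to no DOM specification. The subclass adds a flag that
only the custom output routine reads; the standard DOM calls behave as before.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape, quoteattr


class FlaggedElement(minidom.Element):
    """Hypothetical Element subclass; the DOM API itself is untouched."""
    omit_end_tag = False  # custom, non-DOM flag: a hint for the serializer only


def create_flagged(doc, tag_name, omit_end_tag=False):
    # Does by hand what Document.createElement does, but with the subclass.
    elem = FlaggedElement(tag_name)
    elem.ownerDocument = doc
    elem.omit_end_tag = omit_end_tag
    return elem


def serialize(node):
    """Output routine that honours the flag; unflagged nodes get end tags."""
    if node.nodeType == node.TEXT_NODE:
        return escape(node.data)
    if node.nodeType != node.ELEMENT_NODE:
        return ""  # comments, PIs and so on are skipped in this sketch
    attrs = "".join(f" {name}={quoteattr(value)}"
                    for name, value in node.attributes.items())
    body = "".join(serialize(child) for child in node.childNodes)
    end = "" if getattr(node, "omit_end_tag", False) else f"</{node.tagName}>"
    return f"<{node.tagName}{attrs}>{body}{end}"


doc = minidom.Document()
body = doc.createElement("body")  # ordinary DOM element, no flag
doc.appendChild(body)
for words in ("first", "second"):
    p = create_flagged(doc, "p", omit_end_tag=True)
    p.appendChild(doc.createTextNode(words))
    body.appendChild(p)

html_out = serialize(body)   # custom routine: end tags of flagged <p> omitted
xml_out = body.toxml()       # standard routine: unaffected by the flag
print(html_out)
print(xml_out)
```

Only serialize and create_flagged know about the flag, so code written against
the plain DOM interfaces keeps working and only the output step is tied to this
implementation.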

I'm sure that if we work hard enough we can come up with an example that really
does absolutely require alterations to the DOM API's behavior, but I think a lot
of these problems can be solved via either clever application of the existing
DOM or non-DOM behaviors attached to DOM nodes.

>Then I would have to rewrite my application to cast all nodes as StephenNode
>and test stephenType instead of nodeType.  It's ugly.

It's common practice for custom extensions on top of standard APIs... Though
this might also be an example of why I want to add the ability to attach
user-supplied objects to DOM nodes; that can be thought of as a kind of
"lightweight subclassing".

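The "attach user-supplied objects" idea can be approximated today with a side
table kept outside the nodes. The sketch below assumes Python's xml.dom.minidom;
NodeAnnotations and its set/get methods are hypothetical names, not part of any
DOM API. The application keeps testing nodeType as usual and consults the table
only where it cares about the extra information.

```python
from xml.dom.minidom import parseString


class NodeAnnotations:
    """Hypothetical side table associating arbitrary objects with DOM nodes."""

    def __init__(self):
        # Keyed by id(node); the node is stored as well so that it stays alive
        # and its id cannot be reused by another object.
        self._table = {}

    def set(self, node, key, value):
        self._table.setdefault(id(node), (node, {}))[1][key] = value

    def get(self, node, key, default=None):
        entry = self._table.get(id(node))
        return default if entry is None else entry[1].get(key, default)


doc = parseString("<list><item/><item></item></list>")
first, second = doc.getElementsByTagName("item")

notes = NodeAnnotations()
notes.set(first, "empty", "by intention")  # user-supplied data, no subclass

# Both nodes are still plain Elements; no cast or custom type test is needed.
kinds = [notes.get(item, "empty", "unknown") for item in (first, second)]
print(kinds)
```

Because the annotations live beside the tree rather than in it, the same table
works with any node the parser produces, at the cost of managing the table's
lifetime separately from the document's.
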
>But don't expect _every_ application to find it a good match.

Speaking only for myself, that's not my expectation. I do expect it to be a good
match for a wide range of applications, and an adequate match for others,
bolstered by the benefits of modularity and code reuse. But I also expect there
to be problems for which a SAX stream is the right answer, others for which
simply processing the document as text makes more sense, and there may be some
where the concept of a document isn't a good fit in the first place.

______________________________________
Joe Kesselman  / IBM Research
Received on Wednesday, 6 October 1999 13:35:59 UTC