Re: DOM Level 1 doesn't do everything

Lauren Wood <lauren@sqwest.bc.ca> writes:

> At 28/07/98 11:56 AM , Stephen R. Savitzky wrote:
> 
> >My point is that there is more than one kind of application.  Applications
> >that _consume_ XML files do not need to "see" entity references.  However,
> >applications that _produce_ or _edit_ XML files _must_ be able to manipulate
> >entity references directly. 
> 
> An XML editor can add to the specification. You're not limited to the Level
> 1 DOM interfaces.

The specification clearly states: 

  EntityReference objects are inserted into the DOM by the XML processor
  whenever the processor sees a reference to an entity other than the
  pre-defined character entities in the XML specification. The replacement
  value, if it is available, will appear in the child list of the
  EntityReference.

One might hedge this by saying that the replacement value of an entity is
``not available'' to an editor even if it's in the DTD, but that's a pretty
severe distortion of the spec.  The requirement to replace and re-insert
character entities is also a pain in the neck.  We're ignoring it, of
course, in the interests of efficiency, but it's another distortion, and
another reason our application simply doesn't conform.

> >What's even worse, an XML editor cannot make use of an XML parser or DOM
> >class library that conforms to the specification (unless I have seriously
> >misread something).  I think that this fact should be sufficient to prove
> >that the specification is inadequate.

> I don't think that this is true - my company is working on an XML editor,
> for example, and there are other companies working on the DOM specification
> that are also working on XML or SGML editors. It is more likely that we
> haven't documented it all properly. (But of course, it's not impossible
> that we completely missed something vital).

So your editor lets you define entities in the DTD and put corresponding
references in your document.  What does it do when it encounters a reference
in the course of saving or displaying the document?  The only correct thing
according to the spec would appear to be to replace it.  (I can see
replacing the entity when you do a search; this just demonstrates that both
behaviors have to be permitted.)  

It would be sufficient to mention in the spec that not every application is
_required_ to put the replacement value of every EntityReference in its
children, nor to invisibly replace references to character entities.

> We know that an XML editor would need to add functionality to the DOM
> interfaces to be able to be an editor. And it is planned to add this
> functionality to the DOM as soon as we can. 

The problem is not _adding_ functionality to the DOM, but _removing_ it.  In
the interests of efficiency and of correct behavior in a document-processing
application, we're simply _not doing_ many of the things the specification
says we have to do.  If they were optional instead of mandatory, we would be
conforming to the spec.  

We will be able to implement every interface (they're a subset of what we
need), but not all of the methods will have the specified behavior.  That's
probably what you are doing, too.

> But for now, we need to get the current spec out, limited though it may be
> in functionality. We can always add functionality later. It's harder to
> undo the mistakes that would likely happen if we tried to do too much too
> fast. 

This is correct, and what I'm saying is that there is specified behavior that
is both difficult to implement and incorrect for some applications.  Remove
it, and add it back _correctly_ later.

> No DOM implementation is *limited* to the DOM interfaces - you are free to
> add what you think you need. When we start work on the rest of the DOM, it
> would be good if you could then share with us any information as to what
> worked, and what didn't, in your implementations. And what you think we
> should add, or not add.

I'll be glad to.  There's an excellent chance (>90% probability) that our
application will be released as open source later this year.  At that point
we'll have 50-100K lines of Java code to share, and some interesting war
stories. 

-- 
 Stephen R. Savitzky   Chief Software Scientist, Ricoh Silicon Valley, Inc., 
<steve@rsv.ricoh.com>                            California Research Center
 voice: 650.496.5710   fax: 650.854.8740    URL: http://rsv.ricoh.com/~steve/
  home: <steve@starport.com> URL: http://www.starport.com/people/steve/

Received on Tuesday, 28 July 1998 16:43:32 UTC