Re: XML, DOM and "ignorable" whitespace from David Brownell on 2000-06-02 (www-dom@w3.org from April to June 2000)

From: David Brownell <david-b@pacbell.net>
Date: Fri, 2 Jun 2000 08:37:56 -0700
To: "DOM" <www-dom@w3.org>
Message-ID: <010b01bfcca8$85f967b0$6500000a@brownell.org>

> The short answer: No support until DOM Level 3.

... because everything related to creating a DOM tree is still
left to proprietary, or at least nonstandard, APIs.  That includes
the policy on whether ignorable whitespace is kept.
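For comparison, the JAXP API (`javax.xml.parsers`) that ships with current JDKs puts this policy on the tree builder rather than in DOM itself. The sketch below assumes that API; the class name `WhitespacePolicyDemo` and the sample DTD are hypothetical. It parses the same document twice, once keeping and once dropping element-content whitespace, and reports how many children `<list>` ends up with.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class WhitespacePolicyDemo {
    // Hypothetical sample: the DTD says <list> holds only <item> children,
    // so the newlines and indentation between items are element-content whitespace.
    static final String XML =
        "<!DOCTYPE list [\n"
        + "  <!ELEMENT list (item*)>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "]>\n"
        + "<list>\n  <item>a</item>\n  <item>b</item>\n</list>";

    // Builds a DOM tree and returns how many children <list> ended up with.
    static int listChildCount(boolean dropIgnorableWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Deciding which whitespace is "ignorable" needs the DTD content
            // models, so this flag is documented as requiring validating mode.
            factory.setValidating(true);
            factory.setIgnoringElementContentWhitespace(dropIgnorableWhitespace);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Fail loudly on validity errors instead of printing to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            Document doc = builder.parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(listChildCount(false)); // 5: text, item, text, item, text
        System.out.println(listChildCount(true));  // 2: item, item
    }
}
```

The point of the design is that the DOM tree itself has no opinion: the same input produces either tree, depending only on a switch in the (nonstandard at the time) building step.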


> "Ignorable" whitespace is, unfortunately, a misnomer. The more useful
> concept is whitespace-in-element-context (ie, whitespace that is not
> expected by the DTD grammar and hence not a meaningful part of the
> document's contents.)
> 
> The XML spec _requires_ that this whitespace be passed along by the XML
> Processor -- which is usually taken to mean the parser and the DOM.
                        ^^^^^^^                          ^^^^^^^^^^^
Usually-to-never ...

If you look at the requirements the XML spec places on what
an "XML Processor" does, you'll notice that parser APIs like
SAX2 are almost a perfect match.  But DOM adds nothing there,
and it has fundamental omissions (such as error reporting,
whitespace, catalog hooks, and much more).
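As an illustration of how a SAX2 parser passes this whitespace along, here is a minimal sketch assuming the JAXP `SAXParserFactory` entry point; the class name `IgnorableWhitespaceCounter` and the sample document are hypothetical. The `ContentHandler.ignorableWhitespace` callback itself is standard SAX2, and a validating parser routes element-content whitespace there instead of to `characters`.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class IgnorableWhitespaceCounter extends DefaultHandler {
    int contentChars;    // characters reported through characters()
    int ignorableChars;  // characters reported through ignorableWhitespace()

    @Override
    public void characters(char[] ch, int start, int length) {
        contentChars += length;
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        ignorableChars += length;
    }

    // DefaultHandler silently ignores validity errors; surface them instead.
    @Override
    public void error(SAXParseException e) throws SAXParseException {
        throw e;
    }

    // Parses the document with a validating SAX2 parser and tallies both callbacks.
    static IgnorableWhitespaceCounter count(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            IgnorableWhitespaceCounter handler = new IgnorableWhitespaceCounter();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE list [\n"
            + "  <!ELEMENT list (item*)>\n"
            + "  <!ELEMENT item (#PCDATA)>\n"
            + "]>\n"
            + "<list>\n  <item>a</item>\n  <item>b</item>\n</list>";
        IgnorableWhitespaceCounter c = count(xml);
        System.out.println(c.contentChars + " " + c.ignorableChars);
    }
}
```

With the sample document this prints `2 7`: the two item texts go to `characters`, while the three newline-and-indent runs between elements (3 + 3 + 1 characters) go to `ignorableWhitespace`. The information the XML spec asks for is delivered, and what to do with it is left to the application.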

I've found it much more useful to view DOM as a library, which
can be used with or without an XML processor.  Notice that the
DOM implementations which are backed by databases have no real
need for an XML processor; they can be populated and accessed
without ever interacting with one.
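To make the "DOM as a library" view concrete, here is a minimal sketch that populates a tree straight from program data, the way a database-backed implementation might, with no markup parsed at all. It assumes JAXP only to obtain an empty `Document`; the class name, record layout, and element names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomWithoutParser {
    // Builds a DOM tree from in-memory (id, name) records. Nothing is parsed,
    // so the tree holds exactly the nodes created below and nothing else.
    static Document fromRecords(String[][] records) {
        try {
            // newDocument() returns an empty Document; it reads no markup.
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
            Element root = doc.createElement("people");
            doc.appendChild(root);
            for (String[] rec : records) {
                Element person = doc.createElement("person");
                person.setAttribute("id", rec[0]);
                person.appendChild(doc.createTextNode(rec[1]));
                root.appendChild(person);
            }
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical rows standing in for a database query result.
        String[][] rows = { {"1", "Ada"}, {"2", "Grace"} };
        Element root = fromRecords(rows).getDocumentElement();
        System.out.println(root.getChildNodes().getLength()); // 2
        System.out.println(root.getTextContent());            // AdaGrace
    }
}
```

Because no XML processor was involved, questions like "was there ignorable whitespace here?" simply never come up for this tree.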

That perspective makes it a lot easier to use DOM, too ...
the presence or absence of annoying features like "ignorable"
whitespace, entity/entityRef nodes, CDATA nodes, and so on
is just a policy for filtering parser output when building a
DOM tree corresponding to some data syntax.
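Under that view, each of those node types is just a switch on whatever builds the tree. As a sketch, assuming the JAXP `DocumentBuilderFactory` flags `setExpandEntityReferences` and `setCoalescing` (the class name `TreeShapePolicy` and the sample document are hypothetical), the same input yields differently shaped trees:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TreeShapePolicy {
    // Hypothetical input: one internal entity and one CDATA section.
    static final String XML =
        "<!DOCTYPE p [<!ENTITY who \"world\">]>"
        + "<p>Hello &who;, <![CDATA[<b>raw</b>]]></p>";

    // Parses XML into a DOM tree under the given filtering policy.
    static Element parse(boolean expandEntityRefs, boolean coalesceCdata) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // false: keep EntityReference nodes; true (default): splice in their content
            factory.setExpandEntityReferences(expandEntityRefs);
            // true: CDATA becomes ordinary Text; false (default): keep CDATASection nodes
            factory.setCoalescing(coalesceCdata);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // True if any direct child of the element has the given DOM node type.
    static boolean hasChildOfType(Element e, short type) {
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == type) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Element keepAll = parse(false, false);
        Element flatten = parse(true, true);
        System.out.println(hasChildOfType(keepAll, Node.ENTITY_REFERENCE_NODE)); // true
        System.out.println(hasChildOfType(keepAll, Node.CDATA_SECTION_NODE));    // true
        System.out.println(hasChildOfType(flatten, Node.ENTITY_REFERENCE_NODE)); // false
        System.out.println(hasChildOfType(flatten, Node.CDATA_SECTION_NODE));    // false
    }
}
```

An application that only cares about the data can pick the flattened shape and never write code for EntityReference or CDATASection nodes at all.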

- Dave

Received on Friday, 2 June 2000 11:38:04 UTC