Re: "Empty" Text Nodes from Arkin on 1999-03-01 (www-dom@w3.org from January to March 1999)

From: Arkin <arkin@trendline.co.il>
Date: Mon, 01 Mar 1999 14:57:59 -0500
To: Lauren Wood <lauren@sqwest.bc.ca>
CC: www-dom@w3.org
Message-ID: <36DAF147.35E0DFF4@trendline.co.il>

Lauren Wood wrote:
> 
> David Brownell wrote:
> >
> > I'd say it's clear that as written, DOM is attached to the application
> > rather than to the "XML Processor" (not parser!) level.
> 
> Terminology time: the DOM talks about the hosting implementation
> (e.g. browser, editor, server) and the client application (e.g. the
> script). The DOM defines the interface between these (some of it, at
> least). The DOM does not define what an XML parser/processor does;
> it can't define whether an XML processor chooses to pass on
> ignorable whitespace, for example, or comments. I could imagine a
> browser that does not pass on comments, or that completely expands
> entity references before the DOM tree is even built, so that the DOM
> interfaces have no idea that the comments or parsed entities were
> present in the original source document.

To quote the XML specification:

"An XML processor must always pass all characters in a document that are
not markup through to the application. A validating XML processor must
also inform the application which of these characters constitute white
space appearing in element content."

The application needs some way to learn whether whitespace appearing in
element content is significant, and ideally to control whether the
parser delivers it. The XML specification does not cover this because it
does not describe an API; the DOM does not cover it because it does not
cover the parser/processor; and the SAX API covers it only as far as
document tree building goes.
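To make the event-level side of this concrete, here is a rough sketch
using Python's standard xml.sax module. The sample document and the
WhitespaceAudit class are my own illustrations, not anything from the
specifications. The handler only tallies which callback each run of
whitespace arrives through; which callback a given parser picks is
exactly the part that varies.

```python
import xml.sax

# Hypothetical sample document: whitespace appears only between elements.
SOURCE = b"<list>\n  <item>one</item>\n  <item>two</item>\n</list>"


class WhitespaceAudit(xml.sax.ContentHandler):
    """Tallies whitespace-only character data by the callback that reported it."""

    def __init__(self):
        super().__init__()
        self.blank_via_characters = 0  # whitespace passed as ordinary text
        self.blank_via_ignorable = 0   # whitespace flagged as ignorable

    def characters(self, content):
        # Count only chunks that consist entirely of whitespace.
        if content.strip() == "":
            self.blank_via_characters += len(content)

    def ignorableWhitespace(self, whitespace):
        self.blank_via_ignorable += len(whitespace)


audit = WhitespaceAudit()
xml.sax.parseString(SOURCE, audit)
total_blank = audit.blank_via_characters + audit.blank_via_ignorable
print(audit.blank_via_characters, audit.blank_via_ignorable)
```

With no DTD the parser has no way to know that this whitespace sits in
element content, so I would expect it to arrive as ordinary character
data; the application is left to guess.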

In the end, this is an open question that is handled differently by
various parsers. The result is that my code does not work with your
parser and vice versa, a common integration problem in the software
industry.
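A small sketch of the mismatch at the tree level, using Python's
xml.dom.minidom as a stand-in for "a parser that keeps the whitespace".
The sample document and the strip_blank_text helper are hypothetical,
and the helper imitates a whitespace-dropping parser only
heuristically: without a DTD it cannot tell element content from mixed
content, so it simply discards every whitespace-only text node.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# Hypothetical sample document, indented the way people usually write XML.
SOURCE = "<list>\n  <item>one</item>\n  <item>two</item>\n</list>"


def is_blank_text(node):
    """True for a text node that contains nothing but whitespace."""
    return node.nodeType == Node.TEXT_NODE and node.data.strip() == ""


def strip_blank_text(node):
    """Recursively remove whitespace-only text nodes below `node`."""
    # Iterate over a copy, since removeChild mutates childNodes.
    for child in list(node.childNodes):
        if is_blank_text(child):
            node.removeChild(child)
            child.unlink()
        else:
            strip_blank_text(child)


doc = minidom.parseString(SOURCE)
root = doc.documentElement

kept = len(root.childNodes)     # text, item, text, item, text
strip_blank_text(root)
dropped = len(root.childNodes)  # item, item

print(kept, dropped)
```

Code that reaches for the first <item> by position sees a text node in
the first tree and an element in the second, which is precisely the
"works with my parser, not with yours" problem.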

I do believe the DOM specification should clear up some issues that also
touch the parser/processor, or at least define an expected default
behavior. It should state whether such spaces are delivered or ignored
by a conforming default implementation, so that if I just pick a parser
at random for my application, or get the tree from some third-party
code, I can expect consistent behavior.

Arkin

Received on Monday, 1 March 1999 15:03:53 UTC