Re: "ignorable white space" from Eric Richardson on 2000-06-12 (www-dom@w3.org from April to June 2000)

From: Eric Richardson <maxwell@telesoft.com>
Date: Mon, 12 Jun 2000 10:35:00 -0700
To: DOM <www-dom@w3.org>
Message-ID: <39451F44.8FD9D287@telesoft.com>

keshlam@us.ibm.com wrote:

> >DOM Level 3 calls for:
> >
> >     whitespace in element content (aka "ignorable
> >     whitespace"). Does this Text node only contain
> >     whitespace in element content?
> >
> >How will this be accomplished?  Will the Text class have
> >an "ignorable()" method, or will the interface hierarchy
> >change:
>
> It is EXTREMELY unlikely that the interface hierarchy will change. Current
> inclination is to have a pair of tests:
>      Via the Text node: Does this contain only whitespace?
>      Via the Element or the Element Declaration: Can this Element
>           contain only other Elements.

After really looking at this interesting question closely, I've come to the
following conclusions.

1. If the document is validated, it is easy to declare the content "ignorable"
since it is not in the content model.
2. Documents that are not validated could mark TEXT nodes as "whitespace only
" but you have to know the DTD to know if it can be ignored totally depending
on the application.
3. This makes the whole issue complex and truly makes the DOM harder to use.
4. Even filtering or load and store options would have different semantics
based on whether the document is validated or not.

Overall, this is a tough issue.
Eric :-)

Sorry to hear this from David,

> ... even though in key places, the XML 1.0 spec had made
> clear that CDATA just uses an alternate set of delimiters.
> The errata have gone to some work to preclude that reading.
> More SGML legacy has slipped into XML doctrine.
>
> Lesson for anyone doing syntax design in the future:  there
> is a reason for avoiding such context-sensitive mechanisms.
> In one phrase:  needless complexity.
>
>
>

Received on Monday, 12 June 2000 13:38:55 UTC