Re: XML, DOM and "ignorable" whitespace

Hi Joe,

keshlam@us.ibm.com wrote:

> The short answer: No support until DOM Level 3.

Yesterday I had the pleasure of reading the reams of messages on this subject
(from just before I started following this list). I realize that the XML
parser needs to pass the linefeeds along. I also realize that if DOM users need
them, then the DOM can't ignore them. A method along the lines of
Element::normalize would do the trick, but it is not elegant and does not
really scale. Why walk the whole tree you just built? And with lazy
initialization it is much worse, since the walk touches every node.
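
For concreteness, here is a rough sketch of what such a normalize-style pass
might look like, using Python's standard xml.dom.minidom. The ELEMENT_ONLY set
and the function name are hypothetical; the set merely stands in for DTD
knowledge that the DOM does not expose portably. Note that the pass visits
every node, which is exactly the cost complained about above.

```python
from xml.dom import minidom, Node

# Hypothetical stand-in for DTD knowledge: names of elements whose content
# model allows only child elements (no character data).
ELEMENT_ONLY = {"list"}

# The characters XML treats as whitespace: space, tab, CR, LF.
XML_WHITESPACE = " \t\r\n"

def strip_element_content_whitespace(node, element_only=ELEMENT_ONLY):
    """Recursively remove whitespace-only Text children of element-only elements."""
    # Iterate over a copy, because children are removed during the walk.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE:
            if (node.nodeType == Node.ELEMENT_NODE
                    and node.tagName in element_only
                    and child.data.strip(XML_WHITESPACE) == ""):
                node.removeChild(child)
        else:
            strip_element_content_whitespace(child, element_only)

doc = minidom.parseString("<list>\n  <item> a </item>\n  <item>b</item>\n</list>")
strip_element_content_whitespace(doc)
print(doc.documentElement.toxml())
```

Whitespace inside <item> is left alone because "item" is not listed as
element-only in this sketch.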

The next big issue is that validated and non-validated parses could give
different results if the node is only marked in the DTD-validated version.
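
A small sketch of that discrepancy, again with xml.dom.minidom. It assumes the
two-query approach described in the quoted message (one test for element
context, one for whitespace-only) and assumes that a non-validating parse
simply has no content-model information. All function names and the
element_only parameter are hypothetical, not part of any DOM API.

```python
from xml.dom import minidom, Node

XML_WHITESPACE = " \t\r\n"

def is_whitespace_only(text_node):
    """Query 1: does the Text node contain only XML whitespace?"""
    return text_node.data.strip(XML_WHITESPACE) == ""

def in_element_content(text_node, element_only):
    """Query 2: is the node's current parent declared element-only?

    element_only is a set of element names (hypothetical DTD stand-in),
    or None when no DTD information is available.
    """
    parent = text_node.parentNode
    if element_only is None or parent is None:
        return False
    if parent.nodeType != Node.ELEMENT_NODE:
        return False
    return parent.tagName in element_only

def is_ignorable(text_node, element_only):
    """Convenience test combining both queries."""
    return is_whitespace_only(text_node) and in_element_content(text_node, element_only)

doc = minidom.parseString("<list>\n<item>a</item></list>")
ws = doc.documentElement.firstChild          # the "\n" Text node

with_dtd = is_ignorable(ws, {"list"})        # content model known
without_dtd = is_ignorable(ws, None)         # no DTD information
print(with_dtd, without_dtd)

# Because the answer is computed from the node's current position rather
# than from a flag set at parse time, it changes when the node is moved.
doc.documentElement.lastChild.appendChild(ws)
after_move = is_ignorable(ws, {"list"})
print(after_move)
```

The same Text node is classified differently depending on whether the content
model is known, which is the validated versus non-validated mismatch.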

The long and the short of it is that this is a tough problem, but it is
nonetheless disappointing from a basic developer's point of view.

Thanks,
Eric

>
>
> "Ignorable" whitespace is, unfortunately, a misnomer. The more useful
> concept is whitespace-in-element-context (i.e., whitespace that is not
> expected by the DTD grammar and hence not a meaningful part of the
> document's contents).
>
> The XML spec _requires_ that this whitespace be passed along by the XML
> Processor -- which is usually taken to mean the parser and the DOM.
>
> The spec also requires that this whitespace be easily recognizable as being
> in element context.  The DOM hasn't addressed this yet.
>
> Some parsers attempt to solve this by marking Text nodes that contain
> element-context whitespace at the time the node is created. That isn't
> really reliable -- if the DOM is edited and this node is moved to a new
> location, the parser can't help you keep this flag set properly.  And of
> course the flag is a custom feature, so it's nonportable.
>
> A better solution is to ask the DTD support whether a particular Text node
> is in Element Context, then ask the Text node if it contains only
> whitespace. Unfortunately the DOM hasn't yet designed DTD support. Some
> individual DOMs may have a custom feature for DTDs... but again, that's
> nonportable at this time.
>
> DOM Level 3's "Content Model" chapter is expected to address this. We're
> still not sure whether it'll be two separate queries (one for Element
> Context, one for whitespace-only), or whether the latter is something you
> should determine yourself (probably not), or whether a convenience function
> should be provided that performs both tests.
>
> ______________________________________
> Joe Kesselman  / IBM Research

Received on Friday, 2 June 2000 10:01:02 UTC