Re: XML, DOM and "ignorable" whitespace

The short answer: No support until DOM Level 3.


"Ignorable" whitespace is, unfortunately, a misnomer. The more useful
concept is whitespace-in-element-context (i.e., whitespace that is not
expected by the DTD grammar and hence not a meaningful part of the
document's contents).

The XML spec _requires_ that this whitespace be passed along by the XML
Processor -- which is usually taken to mean the parser and the DOM.

The spec also requires that this whitespace be easily recognizable as being
in element context.  The DOM hasn't addressed this yet.


Some parsers attempt to solve this by marking Text nodes that contain
element-context whitespace at the time the node is created. That isn't
really reliable -- if the DOM is edited and this node is moved to a new
location, the parser can't help you keep this flag set properly.  And of
course the flag is a custom feature, so it's nonportable.
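
To make the failure concrete, here is a small sketch in Python using
xml.dom.minidom. The set flagged_at_parse_time is a hypothetical stand-in
for whatever custom flag a parser might record; it is not a feature of
minidom or of any DOM.

```python
from xml.dom import minidom

doc = minidom.parseString("<list>\n<item>text</item></list>")
root = doc.documentElement
whitespace = root.firstChild                    # the "\n" between <list> and <item>
item = root.getElementsByTagName("item")[0]

# Hypothetical parser-side flag, recorded once when the node was created.
flagged_at_parse_time = {id(whitespace)}

# Edit the tree: appendChild moves the same Text node into <item>,
# where it now sits next to real character data.
item.appendChild(whitespace)

# Nothing updated the flag, so it still claims element-context whitespace.
still_flagged = id(whitespace) in flagged_at_parse_time
now_in_item = whitespace.parentNode is item
```

After the move, the node is flagged as element-context whitespace even
though its new parent holds text, which is exactly the stale state
described above.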

A better solution is to ask the DTD support whether a particular Text node
is in Element Context, then ask the Text node if it contains only
whitespace. Unfortunately the DOM hasn't yet designed DTD support. Some
individual DOMs may have a custom feature for DTDs... but again, that's
nonportable at this time.
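
A rough sketch of that two-step test, again in Python with
xml.dom.minidom. Since the DOM has no DTD support, the set ELEMENT_ONLY
is a hypothetical stand-in for what a content-model query would report
(elements declared with element-only content), and the function names are
invented for illustration.

```python
from xml.dom import minidom, Node

# Assumption: "list" is declared in the DTD with element-only content.
# A real implementation would get this from DTD support, not a literal.
ELEMENT_ONLY = {"list"}

XML_WHITESPACE = " \t\r\n"   # the four whitespace characters XML defines


def in_element_context(text_node, element_only_names):
    """Step 1: is the node's current parent an element-only element?"""
    parent = text_node.parentNode
    return (
        parent is not None
        and parent.nodeType == Node.ELEMENT_NODE
        and parent.tagName in element_only_names
    )


def is_whitespace_only(text_node):
    """Step 2: does the node contain nothing but XML whitespace?"""
    return text_node.data.strip(XML_WHITESPACE) == ""


def is_element_context_whitespace(text_node, element_only_names):
    """Both tests together, evaluated against the tree as it is now."""
    return (in_element_context(text_node, element_only_names)
            and is_whitespace_only(text_node))


doc = minidom.parseString("<list>\n  <item> </item>\n</list>")
root = doc.documentElement
item = root.getElementsByTagName("item")[0]

between_items = [n for n in root.childNodes if n.nodeType == Node.TEXT_NODE]
inside_item = item.firstChild   # whitespace-only, but <item> is not element-only
```

Because the answer is computed from the node's current parent, it stays
correct if the tree is edited. Note that the single space inside <item>
is whitespace-only but is not in element context, so both tests are
needed.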


DOM Level 3's "Content Model" chapter is expected to address this. We're
still not sure whether it'll be two separate queries (one for Element
Context, one for whitespace-only), or whether the latter is something you
should determine yourself (probably not), or whether a convenience function
should be provided that performs both tests.




______________________________________
Joe Kesselman  / IBM Research

Received on Friday, 2 June 2000 09:16:21 UTC