- From: Arkin <arkin@trendline.co.il>
- Date: Fri, 26 Feb 1999 11:32:22 -0500
- CC: www-dom@w3.org
Oliver Becker wrote: > > Ok, let's summarize: > > The XML spec states that "An XML processor must always pass all > characters in a document that are not markup through to the > application". > > My question: on which level the DOM is aligned? > Is it attached to a XML parser, or is it more > attached to a XML application? I think this is one point where the XML/DOM specification totally blew it off. The above specification works well if your application is responsible for constructing the document tree from a parsed document. That's how the SAX parser API works, and you application can accept or reject whitespaces. The parser will let your application know when text is just whitespace. In real life, most applications will just call some parser or other black box function and get a full DOM document tree in return. They will then iterate through the tree and either meet or won't meet whitespaces. I believe this specific use of the DOM is going to be very common, and is not clearly defined. Assume that you write an application that retrieves a DOM document tree in some way (say, from a CORBA server). You then start iterating through it looking for interesting information. When testing the application, you use a validating parser, so whitespaces are removed from the document. You then run the application but notice a degradation in performance, so you turn the validation off, figuring all documents are correct anyway. Now it breaks, because the DOM document tree contains whitespaces that previously did not exist, and your application does not skip them. It could even break if you opted to use a different parser, or reformatted the document. It's not big deal to either transform the DOM tree to lose whitespaces, use a suitable iterator to skip them (org.w3c.dom.fi.NodeIterator), or simply expect whitespaces. Only problem is, it does not appear in the specs, and so will most likely not appear in any books, articles or code examples we're likely to read, so we'll never know about it until the first time we write code. I think the W3C has the obligation to strictly clarify this point in the specification, so that: a) all parsers behave consistently, b) any undefined behavior is left as an option, not the default, c) applications can be written with more confidence. From the specification the issue will likely propogate to books, articles and knowledge bases, where even those not technically inclined to reading terse RFCs will find it. Arkin > In the latter case (validating parser with known DTD) the object > structure should not contain text nodes with whitespaces if the > content model of the element doesn't contain #PCDATA and xml:space > is not "preserve". > > On the other hand it's no big effort transforming a DOM tree with > whitespace text nodes into another one without them ... > > Regards, > Oliver > > /-------------------------------------------------------------------\ > | ob|do Dipl.Inf. Oliver Becker | > | --+-- E-Mail: obecker@informatik.hu-berlin.de | > | op|qo WWW: http://www.informatik.hu-berlin.de/~obecker | > \-------------------------------------------------------------------/
Received on Friday, 26 February 1999 11:38:23 UTC