- From: Paul Grosso <pgrosso@arbortext.com>
- Date: Fri, 21 May 1999 18:43:34 -0500
- To: www-dom@w3.org
At 15:05 1999 05 21 -0400, Arkin wrote:
>> I'm afraid I remain unclear on why you think an RFC about whitespace
>> in XML parsing is necessary or even a good idea. What about the XML
>> spec are you trying to change (and why)? Or, if you're not trying to
>> change something, what's the point of the RFC?
>
>I am not trying to change anything about the XML specs; they are fine as
>they are.
>Here's a very good example of what I mean. Suppose you build an
>application that extracts a book list from a book catalog. It does so by
>getting the first item in the node list. The input is:
>
>  <book-list><book>Moby Dick</book><book>Ulysses</book></book-list>
>
>The application does getChildNodes().item(0) and gets the Moby Dick book
>element. Now, suppose I format the same document to look different (but
>still convey the exact same information):
>
>  <book-list>
>    <book>Moby Dick</book>
>    <book>Ulysses</book>
>  </book-list>

I still think your use of the word "format" to refer to the source document is confusing, even to yourself, because it makes you think that those spaces in some sense "don't count": that they are "only there for formatting," and "formatting" isn't really part of the document content. You're wrong about that. The input is the input; spaces in the data content of a document have nothing to do with "formatting," and those spaces really are there.

>The application does getChildNodes().item(0) and gets an empty text
>node. Not a book. It has to check for the empty text node and skip to
>the next book. To what purpose?

The solution is to use some kind of "filter" in the DOM to ask for the next element node if that's what you want.
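As a rough sketch of what such a filter can look like, the Java below parses the indented book list, confirms that item(0) really is a whitespace text node, and then uses a TreeWalker created with NodeFilter.SHOW_ELEMENT so that only element nodes are visible. TreeWalker and NodeFilter come from DOM Level 2 Traversal, a later W3C recommendation than this message; the class and method names (WalkerDemo, parse, childElementTexts) are hypothetical, and the cast to DocumentTraversal assumes the DOM implementation supports Traversal (the JDK's default parser does in practice, but that is an assumption about the implementation).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

class WalkerDemo {
    // Hypothetical helper: parse an XML string with a non-validating parser,
    // so whitespace between elements is kept as ordinary text nodes.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Walk the children of `parent` with a TreeWalker that only "sees"
    // element nodes; text, comments, and PIs are skipped by the filter.
    static List<String> childElementTexts(Document doc, Node parent) {
        TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
            parent,                  // root of the walk
            NodeFilter.SHOW_ELEMENT, // only element nodes are visible
            null,                    // no additional custom NodeFilter
            true);                   // descend into entity reference nodes
        List<String> texts = new ArrayList<>();
        for (Node n = walker.firstChild(); n != null; n = walker.nextSibling()) {
            texts.add(n.getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) {
        String indented = "<book-list>\n  <book>Moby Dick</book>\n"
            + "  <book>Ulysses</book>\n</book-list>";
        Document doc = parse(indented);
        Node first = doc.getDocumentElement().getChildNodes().item(0);
        System.out.println(first.getNodeType() == Node.TEXT_NODE); // true
        System.out.println(childElementTexts(doc, doc.getDocumentElement())); // [Moby Dick, Ulysses]
    }
}
```

Passing true for the last createTreeWalker argument lets the walker descend into entity reference nodes, which matters if a parser is configured to keep them rather than expand them in place.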
Just pretending the spaces aren't there--even if that made sense--wouldn't solve your problem given that things like comments and PIs could also be "in the way" between the elements you wish to see (to say nothing of the mess you'd have if some of your <book>...</book> elements really got there by being the replacement text of some entities).
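To make the comment-and-PI point concrete, here is a minimal Java sketch contrasting two hypothetical helpers on a book list that has a comment and a processing instruction before the first book: firstNonWhitespaceChild, which "just pretends the spaces aren't there," and firstChildElement, which filters on node type. The helper names, the class name SkipDemo, and the sample comment and PI are illustrative assumptions, not part of any DOM API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SkipDemo {
    // Hypothetical helper: non-validating parse; comments and PIs are kept
    // because DocumentBuilderFactory does not ignore comments by default.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Hypothetical "ignore the spaces" approach: skip only whitespace-only
    // text nodes and return whatever node comes next.
    static Node firstNonWhitespaceChild(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            boolean blankText = n.getNodeType() == Node.TEXT_NODE
                && n.getNodeValue().trim().isEmpty();
            if (!blankText) {
                return n;
            }
        }
        return null;
    }

    // Element filter: skips text, comments, and PIs alike.
    static Element firstChildElement(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String xml = "<book-list>\n  <!-- classics -->\n  <?page-break?>\n"
            + "  <book>Moby Dick</book>\n  <book>Ulysses</book>\n</book-list>";
        Element list = parse(xml).getDocumentElement();
        // Ignoring whitespace alone still lands on the comment, not a book.
        System.out.println(firstNonWhitespaceChild(list).getNodeType() == Node.COMMENT_NODE); // true
        // Filtering on element type reaches the first book.
        System.out.println(firstChildElement(list).getTextContent()); // Moby Dick
    }
}
```

The design choice is simply to ask for what the application wants (an element) rather than to enumerate everything it does not want.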
Received on Friday, 21 May 1999 19:43:38 UTC