Re: RFC: White Space Handling In XML Parsing

At 15:05 1999 05 21 -0400, Arkin wrote:
>> I'm afraid I remain unclear on why you think an RFC about whitespace
>> in XML parsing is necessary or even a good idea.  What about the XML
>> spec are you trying to change (and why)?  Or, if you're not trying to
>> change something, what's the point of the RFC?
>
>I am not trying to change anything about the XML specs, they are fine as
>they are.

>Here's a very good example of what I mean. Suppose you build an
>application that extracts a book list from a book catalog. It does so by
>getting the first item in the node list. The input is:
>
>  <book-list><book>Moby Dick</book><book>Ulysses</book></book-list>
>
>The application does getChildNodes().item(0) and gets the Moby Dick book
>element. Now, suppose I format the same document to look different (but
>still convey the exact same information):
>
>  <book-list>
>    <book>Moby Dick</book>
>    <book>Ulysses</book>
>  </book-list>

I still think your use of the word "format" to refer to the source
document is confusing--even to yourself--because it's making you
think that those spaces, in some sense, "don't count": that they
are "only there for formatting" and that "formatting" isn't really
part of the document content.

You're wrong about that.  The input is the input: spaces in the data
content of a document have nothing to do with "formatting," and those
spaces are really there.
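
To make that concrete, here is a minimal sketch of the two inputs as a
DOM sees them.  It assumes Python's xml.dom.minidom purely as a
convenient stand-in for whatever DOM implementation you are using; the
helper name first_child is made up for illustration.

```python
from xml.dom.minidom import parseString

# The same two <book> elements, with and without whitespace between them.
compact = "<book-list><book>Moby Dick</book><book>Ulysses</book></book-list>"
indented = """<book-list>
  <book>Moby Dick</book>
  <book>Ulysses</book>
</book-list>"""

def first_child(xml_text):
    """Return the first child node of the document element (hypothetical helper)."""
    root = parseString(xml_text).documentElement
    return root.childNodes.item(0)

a = first_child(compact)
b = first_child(indented)

# In the compact input the first child is the <book> element.
print(a.nodeType == a.ELEMENT_NODE, a.nodeName)
# In the indented input it is a text node holding the newline and indent.
print(b.nodeType == b.TEXT_NODE, repr(b.data))
```

The second node is not "empty": its data is the newline and the
indentation that precede the first <book> start tag.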

>The application does getChildNodes().item(0) and gets an empty text
>node. Not a book. It has to check for the empty text node and skip to
>the next book. To what purpose?

The solution is to use some kind of "filter" in the DOM to ask for the
next element node if that's what you want.  Just pretending the spaces
aren't there--even if that made sense--wouldn't solve your problem given
that things like comments and PIs could also be "in the way" between the
elements you wish to see (to say nothing of the mess you'd have if some
of your <book>...</book> elements really got there by being the replacement
text of some entities).
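
As a rough sketch of the kind of filter I mean (again assuming Python's
xml.dom.minidom as a stand-in, with child_elements being a hypothetical
helper rather than part of any DOM API), you ask for element children
by node type instead of by position.  The comment and PI in the sample
input are invented for illustration, and the sketch does not attempt to
cover the entity case.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

def child_elements(parent, tag=None):
    """Yield the element children of parent, optionally only those named tag.

    Text nodes (including whitespace), comments, and processing
    instructions are skipped because they are not ELEMENT_NODEs.
    """
    for node in parent.childNodes:
        if node.nodeType == Node.ELEMENT_NODE and (tag is None or node.tagName == tag):
            yield node

# Sample input with whitespace, a comment, and a PI ahead of the first book.
doc = parseString("""<book-list>
  <!-- classics -->
  <?sort by-title?>
  <book>Moby Dick</book>
  <book>Ulysses</book>
</book-list>""")

books = list(child_elements(doc.documentElement, "book"))
first_title = books[0].firstChild.data
print(first_title)
```

The application then takes the first item of books rather than item(0)
of the raw child list, and it gets the same answer however the source
happens to be laid out.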

Received on Friday, 21 May 1999 19:43:38 UTC