- From: Arkin <arkin@trendline.co.il>
- Date: Fri, 21 May 1999 15:05:05 -0400
- To: Paul Grosso <pgrosso@arbortext.com>
- CC: www-dom@w3.org
> I'm afraid I remain unclear on why you think an RFC about whitespace > in XML parsing is necessary or even a good idea. What about the XML > spec are you trying to change (and why)? Or, if you're not trying to > change something, what's the point of the RFC? I am not trying to change anything about the XML specs, they are fine as they are. There are XML developers that work on XML technologies, for example an XML editor or XSL processor. They see XML documents at a very technical level. This RFC is not intended for them. So all arguments about how XML editors will work with it are rather moot. But there is a growing majority of software developers that use XML as part of something else. They are mostly interested in getting the information model from an XML document, or communicting between two applications. For them, XML is usually represented in a DOM structure or similar content model. They do not care much for how it looks on paper, except when they have to manually write XML documents or debug them. The problem that I see is that XML documents need to contain redundant white spaces to make them readable and structured to the human eye when used with Notepad. The application, on the other end, does not require these spaces, it only requires the meaningful pieces of information that are conveyed in the document. Again, I am talking about applications that use XML, not XML-specific programs. When an application such as that recieves redundant white spaces, it has to discard them or ignore them. That requires extra code for getting around white space. It complicates the application, thus increasing the likelihood of bugs and making it more expensive to maintain. Imagine for a second that an application recieves the spaces between attributes. So, when it needs a specific attribute value, it has to wade its way through the spaces to extract the attribute value. Instead of one getAttribute() you have conditions to make sure you recieve the right value. The application becomes more complex, harder to maintain and more likely to contain a bug. Here's a very good example of what I mean. Suppose you build an application that extracts a book list from a book catalog. It does so by getting the first item in the node list. The input is: <book-list><book>Moby Dick</book><book>Ulysess</book></book-list> The application does getChildNodes().item(0) and gets the Moby Disk book element. Now, suppose I format the same document to look different (but still convey the exact same information): <book-list> <book>Moby Dick</book> <book>Ulysess</book> </book-list> The application does getChildNodes().item(0) and gets an empty text node. Not a book. It has to check for the empty text node and skip to the next book. To what purpose? Again, assuming that this is not an XML editor or XSL processor, it is clear that the white spaces do not convey any meaningful information to the application, or serve any purpose other than formatting the source document. And, since this application is not an XML editor, it does not care about preserving that original format. Arkin
Received on Friday, 21 May 1999 15:15:37 UTC