RE: RFC: White Space Handling In XML Parsing from Larry Watanabe on 1999-05-21 (www-dom@w3.org from April to June 1999)

From: Larry Watanabe <LWatanab@JetForm.com>
Date: Fri, 21 May 1999 15:24:38 -0400
To: Arkin <arkin@trendline.co.il>, Paul Grosso <pgrosso@arbortext.com>
Cc: www-dom@w3.org
Message-ID: <111CF63B7D2ED211830000805F65A2FFE6EFF5@OTTMAIL2>
> -----Original Message-----
> From:	Arkin [SMTP:arkin@trendline.co.il]
> Sent:	Friday, May 21, 1999 3:05 PM
> To:	Paul Grosso
> Cc:	www-dom@w3.org
> Subject:	Re: RFC: White Space Handling In XML Parsing
> 
> But there is a growing majority of software developers that use XML as
> part of something else. They are mostly interested in getting the
> information model from an XML document, or communicting between two
> applications. For them, XML is usually represented in a DOM structure or
> similar content model. They do not care much for how it looks on paper,
> except when they have to manually write XML documents or debug them.
> 
	[[Larry Watanabe]  
	I think that any XML that could have a) potentially have been
created by a user, or
	by a program whose input was created by a user,
	and b) whose output may be seen by a user, or whose output may be
processed by
	a program whose output may be seen by a user, should preserve
whitespace.

	This is by no means limited to editors or XSL processors. It could
be something as basic 
	as reading a set of preferences, and saving changed preferences out
again in XML format. 

	The application may not care about how the XML looks on paper, but
the user may care 
	very much.
	---------------------------

> The problem that I see is that XML documents need to contain redundant
> white spaces to make them readable and structured to the human eye when
> used with Notepad. The application, on the other end, does not require
> these spaces, it only requires the meaningful pieces of information that
> are conveyed in the document. Again, I am talking about applications
> that use XML, not XML-specific programs.
> 
> When an application such as that recieves redundant white spaces, it has
> to discard them or ignore them. That requires extra code for getting
> around white space. It complicates the application, thus increasing the
> likelihood of bugs and making it more expensive to maintain.
> 
> Imagine for a second that an application recieves the spaces between
> attributes. So, when it needs a specific attribute value, it has to wade
> its way through the spaces to extract the attribute value. Instead of
> one getAttribute() you have conditions to make sure you recieve the
> right value. The application becomes more complex, harder to maintain
> and more likely to contain a bug.
	[Larry Watanabe]  
	The code does not need to be made more complicated in general, using

	the usual software techniques. 
	-----------
> Here's a very good example of what I mean. Suppose you build an
> application that extracts a book list from a book catalog. It does so by
> getting the first item in the node list. The input is:
> 
>   <book-list><book>Moby Dick</book><book>Ulysess</book></book-list>
> 
> The application does getChildNodes().item(0) and gets the Moby Disk book
> element. Now, suppose I format the same document to look different (but
> still convey the exact same information):
> 
>   <book-list>
>     <book>Moby Dick</book>
>     <book>Ulysess</book>
>   </book-list>
> 
> The application does getChildNodes().item(0) and gets an empty text
> node. Not a book. It has to check for the empty text node and skip to
> the next book. To what purpose?
	[Larry Watanabe]  
	This is  easy enought to abstract away in various ways.
	-----------------

> Again, assuming that this is not an XML editor or XSL processor, it is
> clear that the white spaces do not convey any meaningful information to
> the application, or serve any purpose other than formatting the source
> document. And, since this application is not an XML editor, it does not
> care about preserving that original format.
> 
	[Larry Watanabe]  
	Each application may only be a single step in a complex chain of
processing, and 
	if the application discards its whitespace it may be unusable
because of 
	requirements for the process as a whole to retain whitespace.
	--------------------

> Arkin
Received on Friday, 21 May 1999 15:33:23 UTC