Whitespace, PRE, etc

I've checked the list archives since July, the FAQ, etc., but I find no answer 
to my question.

This has to do with the earlier queries about DOM implementations filling in 
missing parts of HTML 4.0 documents, e.g. HTML, HEAD, BODY, etc.

Now the XML spec is fairly descript about differentiating significant from 
non-significant whitespace in parsed files.  The HTML 4.0 spec doesn't seem to 
have anything on this matter.  The part about white-space only details the 
unicode chars that are HTML white-space and what a user interface should do to 
display such whitespace.

So if I am parsing an HTML doc into a DOM tree, e.g. (DOCTYPE omitted)

<HTML>
	<HEAD>
		<TITLE>A            Document</TITLE>
	</HEAD>
	<BODY>
		<PRE>This           is           a          document</PRE>
	</BODY>
</HTML>

Do I generate text nodes for, say the "\n\t" between "<HTML>" and "<HEAD>"?  I 
already know that I can't generate text nodes for any whitespace outside the 
<HTML> tag, because the DOM spec states that the Document node can't contain 
Text nodes.

Do I act as a user interface should do and collapse the "A            
Document" to "A document" before inserting into the tree as a text node?

I know that in any case, I should leave the text in the PRE element as is, but 
if so, wouldn't that create a special case for text nodes that are children of 
a PRE element, with attendant implementation complications?

Does anyone else see how this complicates the automatic completion of HTML 
documents?  If there is no <HEAD> tag, but a <TITLE> tag, and I have a bunch 
of text nodes representing pure white-space, where do I place the interpolated 
<HEAD> tag?  What if there are comment tags?

Thanks.

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org

Received on Thursday, 7 January 1999 14:01:14 UTC