Re: "Empty" Text Nodes

Hi Arkin,

> A question that still baffles me, so comments on the following are more
> than welcome:

OK, I have some remarks
(even if they are not DOM-specific ...)

> 1. An HTML processor is a very specific case of an XML processor and
> must know something about HTML to process it right. This information
> falls outside the XML DTD. 

Strictly speaking, an HTML processor is at present a specific SGML
processor. That means, for example, that (according to the HTML DTD) some
start or end tags of elements may be omitted.

> For example, a Web browser may assume that
> any text inside a table belongs in some row, thus,
> 
> <TABLE>
>   <TR>
>     Explicit row
>   </TR>
>   Implicit row
> </TABLE>
> 
> is equivalent to
> 
> <TABLE><TR>Explicit row</TR><TR>Implicit row</TR></TABLE>

Well, your example is incorrect HTML. The start tag <TR> must appear.
But what should a browser do when reading such code? Printing an
error message like a compiler does not seem very reasonable. So it
tries to do its best ...

If we have an HTML DTD in XML, then all tags must appear. Omitting tags
is no longer allowed. For browsers this is again a theoretical
demand: what to do if an author doesn't play by the rules?
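
To make the difference concrete, here is a minimal sketch. It uses
Python's standard xml.etree.ElementTree purely for illustration; the
helper name is_well_formed and the sample table are my own assumptions,
not taken from any spec. The first string omits end tags in a way the
HTML 4 DTD permits, yet an XML parser rejects it; the second spells out
every tag and is accepted.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if `markup` parses as a well-formed XML document."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# End tags of TR and TD omitted (allowed by the SGML-based HTML 4 DTD).
with_omitted_tags = "<TABLE><TR><TD>one<TR><TD>two</TABLE>"

# The same table with every start and end tag written out.
with_all_tags = "<TABLE><TR><TD>one</TD></TR><TR><TD>two</TD></TR></TABLE>"

print(is_well_formed(with_omitted_tags))
print(is_well_formed(with_all_tags))
```

Note that this checks well-formedness only, not validity against a DTD.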

> 2. PRE, STYLE and SCRIPT are specific cases in HTML, unlike other
> elements. They are whitespace preserving and do not process elements in
> their content. 

Sorry, that's not correct. For example, PRE may contain special elements
like A or BR (though not IMG, which the HTML 4 DTD excludes from PRE),
phrase elements like EM and STRONG, and even form control elements.

> 4. With a validating XML processor, XML elements should preserve
> whitespaces only if the 'xml:space' attribute has a value of 'preserve',
> otherwise they may lose whitespaces by ignoring the trailing and leading
> whitespaces and consolidating multiple whitespaces to a single space
> (&#x20;). Again, whitespace is assumed to be for human readability.
> 
[...]
> 
> 5. With a validating XML processor, XML elements that have non-mixed
> content type (only elements, no text) should ignore all whitespaces and
> flag an error for any other text that appears in between elements.
> 
> 6. Without a validating XML processor, XML elements should attempt to
> ignore as much whitespace as possible, regarding it as human readable
> whitespace.

I agree.
But as I see from other postings, opinions on whether whitespace should
be reported or not differ quite a bit.

I should think about it a little while ...
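
As a basis for that discussion, points 4 and 5 could be sketched roughly
as follows. This is only a sketch with Python's xml.dom.minidom; the
function names normalize_space and strip_ignorable_whitespace are
hypothetical. Since minidom does not validate, the caller has to state
which elements have element-only content (a validating processor would
take that from the DTD), and the handling of xml:space is simplified to
"preserve protects the whole subtree".

```python
import re
from xml.dom import minidom, Node

XML_WS = " \t\r\n"  # the four characters XML 1.0 treats as white space


def normalize_space(text: str) -> str:
    """Collapse runs of XML whitespace to one space and trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


def strip_ignorable_whitespace(element, element_only):
    """Drop whitespace-only text nodes inside element-only content.

    `element_only` is a set of tag names ASSUMED to have element-only
    content; it stands in for what a DTD would declare.
    """
    if element.getAttribute("xml:space") == "preserve":
        return  # simplified: leave this element's whole subtree untouched
    for child in list(element.childNodes):
        if child.nodeType == Node.TEXT_NODE:
            if element.tagName in element_only and not child.data.strip(XML_WS):
                element.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_ignorable_whitespace(child, element_only)


doc = minidom.parseString(
    "<list>\n  <item>  some   text\n</item>\n  <item>b</item>\n</list>"
)
root = doc.documentElement
strip_ignorable_whitespace(root, element_only={"list"})

print(len(root.childNodes))
print(normalize_space(root.firstChild.firstChild.data))
```

Whether the DOM should report the removed nodes at all is exactly the
open question; the sketch only shows what "ignoring" them would mean.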

Cheers,
Oliver


/-------------------------------------------------------------------\
|  ob|do        Dipl.Inf. Oliver Becker                             |
|  --+--        E-Mail: obecker@informatik.hu-berlin.de             |
|  op|qo        WWW:    http://www.informatik.hu-berlin.de/~obecker |
\-------------------------------------------------------------------/

Received on Thursday, 25 February 1999 08:57:05 UTC