W3C home > Mailing lists > Public > public-qt-comments@w3.org > February 2003

RE: HTML output method: indenting limitations

From: Kay, Michael <Michael.Kay@softwareag.com>
Date: Thu, 13 Feb 2003 16:24:36 +0100
Message-ID: <DFF2AC9E3583D511A21F0008C7E621060453E075@daemsg02.software-ag.de>
To: Mike Brown <mike@skew.org>, public-qt-comments@w3.org

I think I would like to move (as we have done recently with the XML
indenting rules) to a more prescriptive statement of exactly where an HTML
indenter may add whitespace, rather than describing it in terms of the
effect on visual appearance.

The rules, I think, are:

(a) whitespace must only be added before or after an element, or adjacent to
an existing whitespace character

(b) whitespace must not be added adjacent to an inline element, the inline
elements being those included in the %inline category in the HTML DTD

(c) whitespace must not be added inside a formatted element, the formatted
elements being pre, script, style, and textarea

and as you point out, "whitespace" in the above should mean "white space as
defined in HTML 4.01 section 9.1.

Michael Kay

> -----Original Message-----
> From: Mike Brown [mailto:mike@skew.org] 
> Sent: 13 February 2003 07:22
> To: public-qt-comments@w3.org
> Subject: HTML output method: indenting limitations
> 
> 
> 
> XSLT 2.0 section 20.3 says:
> 
>   "If the indent attribute has the value yes, then the html 
> output method
>   may add or remove whitespace as it outputs the result tree, 
> so long as it
>   does not change how an HTML user agent would render the output."
> 
> As I mentioned in a previous message (XSLT 1.1 era) [1], this 
> imposes undue burden on the XSLT processor to anticipate how 
> an HTML user agent "should"  
> (I disagree with the use of the word "would") render the 
> output. In order to follow the guideline to the letter, the 
> XSLT processor must be aware of what script (writing system) 
> is in use where the whitespace would be added -- an 
> impossible feat, given the overlap of character repertoires 
> among scripts -- so that it can guess at how the resulting 
> inter-word space would be rendered.  
> See http://www.w3.org/TR/REC-html40/struct/text.html#h-9.1 
> for details.
> 
> Along the same lines, I also mentioned:
> 
>   There is some ambiguity [in the HTML spec] about how to 
> determine where
>   an inter-word space needs to be rendered (for example, if 
> it appears on
>   one side of an inline image), so HTML user agents are not entirely
>   consistent in this regard.
> 
> Further, HTML has a different notion than XML of what 
> constitutes "white space" (note the, ahem, space between the 
> words there). These are the HTML "white space" characters:
>   #x9
>   #xA
>   #xC (form feed; disallowed in XML!)
>   #xD
>   #x20
>   #x200B (zero-width space)
> ...so you may want to clarify whether the reference to 
> "whitespace" that can 
> be added to HTML output includes those FF and ZWS characters.
> 
> In addition, HTML output rendering is also affected by CSS, 
> scripting, and non-HTML elements and processing instructions. 
> These factors should be mentioned.
> 
> Finally, the HTML spec makes various non-normative 
> recommendations of "good practice" for user agents to follow, 
> as mentioned in the third paragraph of HTML 4.01 section 4. 
> Some of these recommendations affect rendering, so you may 
> want to clarify whether they should be taken into account in 
> the determination of how an HTML user agent "should" render 
> the output.
> 
> In conclusion, it would leave less room for interpretation 
> and variance among the output produced by XSLT processors if 
> the guidelines for indenting HTML were changed to something 
> like the following:
> 
>   If the indent attribute has the value yes, then the html 
> output method
>   may add or remove HTML "white space" characters as defined 
> in HTML 4.01
>   section 9.1 as it outputs the result tree, so long as it does not
>   significantly change how an HTML user agent should render 
> the output.
> 
>   The html output method should assume that the user agent 
> will follow the
>   HTML 4.01 recommendations for good practice, and will 
> follow the SGML
>   line break rules mentioned in HTML 4.01 section B.3.1. The 
> method should
>   disregard unpredictable rendering factors such as CSS, client-side
>   scripting, and the effects of non-HTML elements and processing
>   instructions.
> 
>   It is not recommended that the form feed (#xC) character, which is
>   disallowed in XML, be used for indenting, even though it is 
> allowed in
>   HTML.
> 
>   The default value of the indent attribute is yes.
> 
> Thanks.
> 
> [1] 
> http://lists.w3.org/Archives/Public/xsl-> editors/2000JulSep/0041.html
> 
> Mike
> 
> -- 
>   Mike J. Brown   |  http://skew.org/~mike/resume/
>   Denver, CO, USA |  http://skew.org/xml/
> 
> 
Received on Thursday, 13 February 2003 10:24:44 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 27 March 2012 18:14:24 GMT