- From: Mike Brown <mike@skew.org>
- Date: Thu, 13 Feb 2003 00:22:26 -0700 (MST)
- To: public-qt-comments@w3.org
XSLT 2.0 section 20.3 says: "If the indent attribute has the value yes, then the html output method may add or remove whitespace as it outputs the result tree, so long as it does not change how an HTML user agent would render the output." As I mentioned in a previous message (XSLT 1.1 era) [1], this imposes undue burden on the XSLT processor to anticipate how an HTML user agent "should" (I disagree with the use of the word "would") render the output. In order to follow the guideline to the letter, the XSLT processor must be aware of what script (writing system) is in use where the whitespace would be added -- an impossible feat, given the overlap of character repertoires among scripts -- so that it can guess at how the resulting inter-word space would be rendered. See http://www.w3.org/TR/REC-html40/struct/text.html#h-9.1 for details. Along the same lines, I also mentioned: There is some ambiguity [in the HTML spec] about how to determine where an inter-word space needs to be rendered (for example, if it appears on one side of an inline image), so HTML user agents are not entirely consistent in this regard. Further, HTML has a different notion than XML of what constitutes "white space" (note the, ahem, space between the words there). These are the HTML "white space" characters: #x9 #xA #xC (form feed; disallowed in XML!) #xD #x20 #x200B (zero-width space) ...so you may want to clarify whether the reference to "whitespace" that can be added to HTML output includes those FF and ZWS characters. In addition, HTML output rendering is also affected by CSS, scripting, and non-HTML elements and processing instructions. These factors should be mentioned. Finally, the HTML spec makes various non-normative recommendations of "good practice" for user agents to follow, as mentioned in the third paragraph of HTML 4.01 section 4. Some of these recommendations affect rendering, so you may want to clarify whether they should be taken into account in the determination of how an HTML user agent "should" render the output. In conclusion, it would leave less room for interpretation and variance among the output produced by XSLT processors if the guidelines for indenting HTML were changed to something like the following: If the indent attribute has the value yes, then the html output method may add or remove HTML "white space" characters as defined in HTML 4.01 section 9.1 as it outputs the result tree, so long as it does not significantly change how an HTML user agent should render the output. The html output method should assume that the user agent will follow the HTML 4.01 recommendations for good practice, and will follow the SGML line break rules mentioned in HTML 4.01 section B.3.1. The method should disregard unpredictable rendering factors such as CSS, client-side scripting, and the effects of non-HTML elements and processing instructions. It is not recommended that the form feed (#xC) character, which is disallowed in XML, be used for indenting, even though it is allowed in HTML. The default value of the indent attribute is yes. Thanks. [1] http://lists.w3.org/Archives/Public/xsl-editors/2000JulSep/0041.html Mike -- Mike J. Brown | http://skew.org/~mike/resume/ Denver, CO, USA | http://skew.org/xml/
Received on Thursday, 13 February 2003 02:22:22 UTC