- From: Mike Brown <mike@skew.org>
- Date: Tue, 29 Aug 2000 22:54:21 -0600 (MDT)
- To: xsl-editors@w3.org
- CC: xsl-list@mulberrytech.com
I have reported to xsl-editors a few output related issues that I would like to see receive some attention in XSLT 1.1. When I mentioned these before, no discussion followed. I will summarize and restate them here.

I believe these issues all fall into the realm of more fully specifying non-erroneous behaviors. Addressing these issues would serve to standardize features already implemented in several XSLT 1.0 processors, and thus they are within the scope of the stated general requirements. I understand that these are lower priority than the other issues in the requirements doc.

On to the issues...

1. What HTML calls "white space" and what the XML and XSLT recommendations call "whitespace" are two different things. XML has 3 whitespace characters; HTML has 6 characters and 1 pair of characters that are considered white space. Caution must be used when inserting white space characters when indenting, because in most HTML elements, sequences of consecutive white space characters are collapsed into a single inter-word space, which is rendered according to the appropriate human language script for the adjacent spans of text. There is some ambiguity about how to determine where an inter-word space needs to be rendered (for example, if it appears on one side of an inline image), so HTML user agents are not entirely consistent in this regard.

It would leave less room for interpretation and variance among the output produced by XSLT processors if the following guideline for indenting HTML were changed. I suggest changing this phrase in http://www.w3.org/TR/xslt#section-HTML-Output-Method:

"If the indent attribute has the value yes, then the html output method may add or remove whitespace as it outputs the result tree, so long as it does not change how an HTML user agent would render the output. The default value is yes."
to

"If the indent attribute has the value yes, then the html output method may add or remove HTML white space as it outputs the result tree, as long as it does not significantly change how an HTML user agent following the HTML specification's informative recommendations for good practice should render the output. The default value is yes."

2. <script> and <style> elements are recommended as having output escaping disabled when emitted via the "html" output method, but no recommendation is made for "script data"-type attribute values -- attributes whose content model appears as %Script; in the HTML DTDs. There are too many of these to enumerate here, but they should be included in the recommendation. Again, this affects portability of stylesheets because processors could choose to escape attribute values with script content.

3. XSLT document authors often want to construct URI strings with XPath/XSLT functions and put them in certain attributes. It is not just limited to HTML; there are also various applications where URIs need to be used as the values of reserved attributes in XML based languages. RFC 2396 mandates that URI strings be escaped per certain conventions. Using pure XSLT there is no way to effectively achieve proper escaping when constructing the URI strings.* Consequently, a demand exists for XSLT processors to make some effort to perform URI escaping on the values of certain attributes, at least when the output method is "html". Implementors and users of XSLT processors have been debating how to achieve this, resulting in differing implementations and in turn, making stylesheets less portable, because output may be useless if all the href and src attributes are munged.
If a pure XSLT solution for performing URI escaping on a given string (intended to be used while constructing URI strings, not after the fact) cannot be achieved in this next revision, then an informative statement should be added to the Output section of the XSLT 1.1 spec, saying something like this:

Escaping of URI strings

URI strings are by definition already escaped; if a string contains characters that are not allowed to exist in a URI, then it is not a URI. It is the responsibility of the document author to perform the appropriate escaping when constructing the string. Since XSLT does not offer a convenient mechanism for performing URI escaping, extension functions are necessary to achieve this goal. As a workaround, XSLT processors may, but are not required to, attempt to perform some degree of URI escaping, as specified in [RFC 2396], when outputting the values of certain attributes that are required to be URIs. For example, when the output method is "html", attributes whose content model appears as %URI; in the appropriate HTML DTD may be escaped upon output. Because such an attribute value may already be a properly escaped URI, double escaping may occur, possibly changing the meaning of the URI. Therefore, if an XSLT processor can perform automatic escaping, it should also provide a mechanism for disabling this behavior.

Perhaps this suggestion, too, is insufficient?

Original posts where these issues are explained further:
http://lists.w3.org/Archives/Public/xsl-editors/1999OctDec/0033.html
http://lists.w3.org/Archives/Public/xsl-editors/2000AprJun/0069.html

However, please consider the suggestions as they are worded in this message to be more current than those in the old messages.

Thanks and respect,

- Mike
____________________________________________________________________
Mike J. Brown, software engineer at webb.net in Denver, Colorado, USA
My XML/XSL resources: http://www.skew.org/xml/

* Actually it is not impossible to do URI escaping in pure XSLT, but after experimenting a bit I came to the conclusion that it would require building a lookup string consisting of all 1.2 million characters that can be in an XML document. Their relative positions in the string could then be used to deduce their Unicode scalar values, from which a UTF-8 octet sequence can be derived and converted to %xx escapes. A unicode-scalar() function for converting the first character of a given string to a number that is its Unicode scalar value would be most helpful, as would a hex() function for converting a number to a hexadecimal string equivalent. Then it would be a matter of pretty simple arithmetic to convert the scalar values to the appropriate %xx sequences...
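
For illustration only, the arithmetic described in the footnote can be sketched outside XSLT. The Python below is a rough sketch, not part of the proposal: unicode_scalar() and hex_byte() are hypothetical stand-ins for the unicode-scalar() and hex() functions wished for above, and the choice to leave only the RFC 2396 "unreserved" characters unescaped is an assumption made to keep the example small.

```python
# Assumption: only the RFC 2396 "unreserved" characters pass through as-is.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "-_.!~*'()"
)


def unicode_scalar(s):
    """Hypothetical unicode-scalar(): scalar value of the first character."""
    return ord(s[0])


def hex_byte(n):
    """Hypothetical hex(): a number in 0..255 as two uppercase hex digits."""
    return format(n, "02X")


def utf8_octets(cp):
    """Derive the UTF-8 octet sequence for a scalar value by arithmetic."""
    if cp < 0x80:
        return [cp]
    if cp < 0x800:
        return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]
    if cp < 0x10000:
        return [
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ]
    return [
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ]


def escape_uri_component(text):
    """Replace every character outside UNRESERVED with %xx escapes."""
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)
        else:
            for octet in utf8_octets(unicode_scalar(ch)):
                out.append("%" + hex_byte(octet))
    return "".join(out)
```

Because "%" is itself escaped in this sketch, feeding it a string that is already escaped turns "%20" into "%2520", which is the double-escaping hazard mentioned in the suggested spec text.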
Received on Wednesday, 30 August 2000 00:54:34 UTC