- From: Henry Zongaro <zongaro@ca.ibm.com>
- Date: Mon, 8 Sep 2003 16:21:22 -0400
- To: xsl-editors@w3.org
Hello,

Consider applying the following stylesheet to any input XML document. Note the end-of-line that is part of the content of the xsl:text element.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <out>
      <xsl:attribute name="attr">
        <xsl:text>&#x9;
</xsl:text>
      </xsl:attribute>
    </out>
  </xsl:template>
</xsl:stylesheet>

Which of the following are ways in which a processor should serialize the "attr" attribute? The form "[U+xxxx]" indicates that the actual Unicode character appears at that point in the serialized result, as opposed to a character reference.

  (i)  attr="[U+0009]&#xA;"
  (ii) attr="&#x9;&#xA;"

According to Section 7.1.3 of XSLT 1.0 [1], "Note: When an xsl:attribute contains a text node with a newline, then the XML output must contain a character reference. . . . This is because XML 1.0 requires newline characters in attribute values to be normalized into spaces but requires character references to newline characters not to be normalized."

Is this note intended to be an exhaustive list of the situations in which character references must be used because of the Attribute-Value Normalization rules of XML 1.0 [2]? Some read it as exhaustive, and believe that either serialized form for the attribute is admissible; others read it as simply an example, and believe that only the form marked (ii) should be used to serialize the result, as the first form would not yield a document with the same Infoset.

Would the answer be different for a stylesheet like the following, which has no xsl:attribute element?

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <out attr="&#x9;&#xA;"/>
  </xsl:template>
</xsl:stylesheet>

Section 4 of the XSLT 2.0 and XQuery 1.0 Serialization draft [3], of course, is explicit, stating in part that "certain whitespace characters should be output as character references, to ensure that they survive the round trip through serialization and parsing.
Specifically, CR characters in text nodes should be written as &#xD; or an equivalent; while CR, NL, and TAB characters in attribute nodes should be output respectively as &#xD;, &#xA;, and &#x9;, or their equivalents."

But it's not clear whether that's a change in behaviour or a clarification of something that was not clearly described in XSLT 1.0.

Thanks,

Henry

[1] http://www.w3.org/TR/xslt#creating-attributes
[2] http://www.w3.org/TR/2000/REC-xml-20001006#AVNormalize
[3] http://www.w3.org/TR/xslt-xquery-serialization/#xml-output

------------------------------------------------------------------
Henry Zongaro      Xalan development
IBM SWS Toronto Lab   T/L 969-6044;  Phone +1 905 413-6044
mailto:zongaro@ca.ibm.com
Received on Monday, 8 September 2003 16:21:50 UTC
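
The Infoset concern behind forms (i) and (ii) can be checked by parsing each serialized form and comparing the attribute values that come back. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser (a tool not mentioned in the message) and hexadecimal character references; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Form (i): a literal TAB character, then a character reference to newline.
form_i = '<out attr="\t&#xA;"/>'
# Form (ii): character references for both TAB and newline.
form_ii = '<out attr="&#x9;&#xA;"/>'

# Parsing applies XML 1.0 attribute-value normalization: literal white
# space becomes a space, while characters supplied through character
# references are kept as they are.
value_i = ET.fromstring(form_i).get("attr")
value_ii = ET.fromstring(form_ii).get("attr")

print(repr(value_i))
print(repr(value_ii))
print(value_i == value_ii)
```

Under these assumptions, the literal TAB in form (i) comes back as a space, so only form (ii) returns the original TAB and newline after a round trip through serialization and parsing.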