RE: Attribute normalization

I came to the conclusion that the normative statement is the one in XSLT
1.0 section 16.1, which says (in effect) that serialization as XML should
round-trip. The note in 7.1.3 is just one example of what you have to do
to achieve this, and it is not exhaustive. The statement in the XSLT 2.0
draft is simply a more complete exposition of the consequences of the
round-tripping rule; the basic rule was already present in 1.0.
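
As a rough illustration of the round-trip argument, here is a minimal
sketch. It assumes Python and its standard xml.etree.ElementTree parser,
neither of which is mentioned by the specifications; escape_attr and
parsed_attr are hypothetical helper names invented for this example. It
parses the two candidate serializations from the original question and
reports the attribute value a conforming parser hands back.

```python
import xml.etree.ElementTree as ET


def escape_attr(value: str) -> str:
    """Hypothetical helper: escape an attribute value so that TAB, LF and CR
    are written as character references and survive re-parsing."""
    out = value.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")
    for ch in "\t\n\r":
        out = out.replace(ch, f"&#{ord(ch)};")
    return out


def parsed_attr(serialized_value: str) -> str:
    """Hypothetical helper: parse <out attr="..."/> and return the value of
    "attr" as reported by the parser after attribute-value normalization."""
    return ET.fromstring(f'<out attr="{serialized_value}"/>').get("attr")


# The value the stylesheet constructs: TAB followed by LF.
original = "\t\n"

form_i = "\t&#10;"      # form (i): literal TAB, character reference for LF
form_ii = "&#9;&#10;"   # form (ii): character references for both

print(repr(parsed_attr(form_i)))    # literal TAB is normalized by the parser
print(repr(parsed_attr(form_ii)))   # character references are left alone
print(parsed_attr(escape_attr(original)) == original)
```

Under these assumptions, only the form that uses character references for
both characters parses back to the value the stylesheet built.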

Michael Kay

> -----Original Message-----
> From: Henry Zongaro [mailto:zongaro@ca.ibm.com] 
> Sent: 08 September 2003 21:22
> To: xsl-editors@w3.org
> Subject: Attribute normalization
>
> Hello,
> 
>      Consider applying the following stylesheet to any input XML
> document.  Note the end-of-line that is part of the content of the
> xsl:text element.
> 
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 version="1.0">
>   <xsl:template match="/">
>     <out>
>       <xsl:attribute name="attr">
>         <xsl:text>&#9;
> </xsl:text>
>       </xsl:attribute>
>     </out>
>   </xsl:template>
> </xsl:stylesheet>
> 
>      Which of the following are ways in which a processor should
> serialize the "attr" attribute?  The form "[U+xxxx]" indicates that the
> actual Unicode character appears at that point in the serialized result,
> as opposed to a character reference.
> 
> (i)   attr="[U+0009]&#10;"
> (ii)  attr="&#9;&#10;"
> 
>      According to Section 7.1.3 of XSLT 1.0 [1], "Note:  When an
> xsl:attribute contains a text node with a newline, then the XML output
> must contain a character reference. . . .  This is because XML 1.0
> requires newline characters in attribute values to be normalized into
> spaces but requires character references to newline characters not to
> be normalized."
> 
>      Is this note intended to be an exhaustive list of the situations in
> which character references must be used because of the Attribute-Value
> Normalization rules of XML 1.0 [2]?  Some read it as exhaustive, and
> believe that either serialized form for the attribute is admissible;
> others read it as simply an example, and believe that only the form
> marked (ii) should be used to serialize the result, as the first form
> would not yield a document with the same Infoset.
> 
>      Would the answer be different for a stylesheet like the following,
> which has no xsl:attribute element?
> 
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 version="1.0">
>   <xsl:template match="/">
>     <out attr="&#9;&#10;"/>
>   </xsl:template>
> </xsl:stylesheet>
>
>      Section 4 of the XSLT 2.0 and XQuery 1.0 Serialization draft [3],
> of course, is explicit, stating in part that "certain whitespace
> characters should be output as character references, to ensure that
> they survive the round trip through serialization and parsing.
> Specifically, CR characters in text nodes should be written as &#xD; or
> an equivalent; while CR, NL, and TAB characters in attribute nodes
> should be output respectively as &#xD;, &#xA;, and &#x9;, or their
> equivalents."  But it's not clear whether that's a change in behaviour
> or a clarification of something that was not clearly described in
> XSLT 1.0.
> 
> Thanks,
> 
> Henry
> [1] http://www.w3.org/TR/xslt#creating-attributes
> [2] http://www.w3.org/TR/2000/REC-xml-20001006#AVNormalize
> [3] http://www.w3.org/TR/xslt-xquery-serialization/#xml-output
> ------------------------------------------------------------------
> Henry Zongaro      Xalan development
> IBM SWS Toronto Lab   T/L 969-6044;  Phone +1 905 413-6044
> mailto:zongaro@ca.ibm.com
> 

Received on Monday, 8 September 2003 18:42:23 UTC