If a serialized XML document contains:

    <!--This is a comment &#8212; pbg-->

or

    <?myproc pseudoatt="this is part of a pi &#8212; pbg"?>

then, when it is read by an XML processor, is the &#8212; considered to be a seven-character string or the Unicode em-dash character?

More precisely, in the infoset of such a document, when considering the [content] property of the comment or PI info item, would the length of the "string representing the content" be calculated with the "&#8212;" part contributing 1 or 7 to the length?

Put another way, if the following XSLT template matched the above comment, should the xsl:if test succeed or fail:

    <xsl:template match="comment()">
      <xsl:if test="string(.)='This is a comment - pbg'">
        <!-- The above line's em-dash is the single U+2014 character -->
      </xsl:if>
    </xsl:template>

paul

Received on Friday, 7 December 2007 21:53:18 GMT
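One way to probe the question empirically is to ask a concrete parser what it reports. The following is only a minimal sketch, not an answer from the specifications: it assumes the seven-character string in question is the numeric character reference &#8212;, it uses Python's standard xml.dom.minidom (backed by expat), and the <doc/> root element and the helper name comment_and_pi_text are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical test document: the comment and PI from the question,
# followed by a made-up <doc/> root element so the input is well-formed.
SOURCE = (
    '<!--This is a comment &#8212; pbg-->'
    '<?myproc pseudoatt="this is part of a pi &#8212; pbg"?>'
    '<doc/>'
)

def comment_and_pi_text(xml_text):
    """Return (comment text, PI data) exactly as the parser reports them."""
    dom = minidom.parseString(xml_text)
    comment = pi = None
    for node in dom.childNodes:
        if node.nodeType == node.COMMENT_NODE:
            comment = node.data
        elif node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
            pi = node.data
    return comment, pi

comment, pi = comment_and_pi_text(SOURCE)

# Length and content of the comment as seen by the application.
print(len(comment), repr(comment))
print(repr(pi))

# Analogue of the xsl:if test, with a real U+2014 in the comparison string.
print(comment == "This is a comment \u2014 pbg")
```

With this particular parser the reference appears to be left unexpanded in both the comment and the PI, so it contributes 7 to the length and the comparison is false. That shows what one implementation does; it does not by itself settle what the infoset requires.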