XML question for the experts

If a serialized XML document contains:

<!--This is a comment &#x2014; pbg-->

or

<?myproc pseudoatt="this is part of a pi &#x2014; pbg"?>

then when that is read by an XML processor, is the
&#x2014; considered to be an eight-character string
or the single Unicode em-dash character?

More precisely, in the infoset of such a document,
when considering the [content] property of the comment or PI
information item, would the length of the "string representing
the content" be calculated with the "&#x2014;" part contributing
1 or 8 to the length?
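For anyone who wants to see what one particular parser reports, here is a rough sketch using Python's xml.dom.minidom (which sits on top of expat). The <doc> wrapper element and the helper name comment_and_pi_content are my own assumptions, added only so the two fragments live in a well-formed document. This shows the behaviour of a single implementation, not what the Infoset spec requires.

```python
from xml.dom import minidom

# Hypothetical wrapper: the <doc> root element is an assumption, added
# only so the comment and PI sit inside a well-formed document.
SOURCE = (
    '<doc>'
    '<!--This is a comment &#x2014; pbg-->'
    '<?myproc pseudoatt="this is part of a pi &#x2014; pbg"?>'
    '</doc>'
)


def comment_and_pi_content(xml_text):
    """Return (comment content, PI content) as exposed by the DOM."""
    root = minidom.parseString(xml_text).documentElement
    comment = next(n for n in root.childNodes
                   if n.nodeType == n.COMMENT_NODE)
    pi = next(n for n in root.childNodes
              if n.nodeType == n.PROCESSING_INSTRUCTION_NODE)
    return comment.data, pi.data


comment_text, pi_text = comment_and_pi_content(SOURCE)

for label, text in (("comment", comment_text), ("pi", pi_text)):
    # Report the length, and whether the content holds the literal
    # reference text or the single U+2014 character.
    print(label, len(text),
          "has '&#x2014;':", "&#x2014;" in text,
          "has U+2014:", "\u2014" in text)
```

If the length printed for the comment counts the reference as 8 characters, the processor is passing the text through verbatim; if it counts 1, the reference was expanded.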

Put another way, if the following XSLT template matched
the above comment, should the xsl:if test succeed or fail:

<xsl:template match="comment()">
  <xsl:if test="string(.)='This is a comment — pbg'">
    <!-- The above line's em-dash is the single U+2014 character -->
  </xsl:if>
</xsl:template>

paul

Received on Friday, 7 December 2007 21:53:18 UTC