XML question for the experts

From: Grosso, Paul <pgrosso@ptc.com>
Date: Fri, 7 Dec 2007 16:52:19 -0500
Message-ID: <CF83BAA719FD2C439D25CBB1C9D1D30209AE9A29@HQ-MAIL4.ptcnet.ptc.com>
To: <public-xml-core-wg@w3.org>

If a serialized XML document contains:

<!--This is a comment &#x2014; pbg-->

or

<?myproc pseudoatt="this is part of a pi &#x2014; pbg"?>

then when that is read by an XML processor, is the
&#x2014; considered to be an eight-character string
or the Unicode em-dash character?

More precisely, in the infoset of such a document,
when considering the comment's or PI's [content] property,
would the length of the "string representing the content"
be calculated with the "&#x2014;" part contributing 1 or 8
to the length?
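
One way to check this empirically is to ask a parser what it reports. The following is a minimal sketch, assuming Python's standard xml.dom.minidom (which sits on top of expat) is representative; the <doc> wrapper element and the helper name comment_and_pi_content are made up for the illustration. It relies on the XML 1.0 rule that character references are not recognized inside comments or processing instructions, so the literal characters should come back unchanged.

```python
from xml.dom import minidom

# Hypothetical wrapper document; the comment and PI are copied from the message.
SOURCE = (
    '<doc>'
    '<!--This is a comment &#x2014; pbg-->'
    '<?myproc pseudoatt="this is part of a pi &#x2014; pbg"?>'
    '</doc>'
)


def comment_and_pi_content(xml_text):
    """Return (comment text, PI data) exactly as the parser reports them."""
    root = minidom.parseString(xml_text).documentElement
    comment = next(n for n in root.childNodes
                   if n.nodeType == n.COMMENT_NODE)
    pi = next(n for n in root.childNodes
              if n.nodeType == n.PROCESSING_INSTRUCTION_NODE)
    return comment.data, pi.data


comment_text, pi_data = comment_and_pi_content(SOURCE)

# Show the raw content, its length, and whether a real U+2014 is present.
print(repr(comment_text), len(comment_text), "\u2014" in comment_text)
print(repr(pi_data), "\u2014" in pi_data)
```

If the parser follows the spec here, the reference stays as literal characters in both strings, and a comparison against a string containing a real U+2014 would not match. Other processors may deserve the same check.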

Put another way, if the following XSLT template matched
the above comment, should the xsl:if test succeed or fail:

<xsl:template match="comment()">
  <xsl:if test="string(.)='This is a comment — pbg'">
    <!-- The em-dash in the line above is the single U+2014 character -->
  </xsl:if>
</xsl:template>

paul
Received on Friday, 7 December 2007 21:53:18 UTC
