- From: Grosso, Paul <pgrosso@ptc.com>
- Date: Fri, 7 Dec 2007 16:52:19 -0500
- To: <public-xml-core-wg@w3.org>
If a serialized XML document contains:
<!--This is a comment &#8212; pbg-->
or
<?myproc pseudoatt="this is part of a pi &#8212; pbg"?>
then when that is read by an XML processor, is the
&#8212; considered to be a seven-character string
or the Unicode em-dash character?
More precisely, in the infoset of such a document,
when considering the [content] property of the comment
or PI information item, would the length of the
"string representing the content" be calculated with
the "&#8212;" part contributing 1 or 7 to the length?
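One way to see what a particular processor actually reports is to parse a small document and measure the strings it hands back. The sketch below uses Python's standard-library `xml.dom.minidom` (which sits on top of expat) purely as a convenient example processor. The sample document and the helper name `reported_strings` are hypothetical, made up for illustration. For contrast, it also puts the same reference in ordinary element content, where character references are expanded.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical sample: the same character reference appears in a comment,
# in a processing instruction, and (for contrast) in ordinary element content.
SAMPLE = (
    "<doc>"
    "<!--This is a comment &#8212; pbg-->"
    '<?myproc pseudoatt="this is part of a pi &#8212; pbg"?>'
    "<text>This is text &#8212; pbg</text>"
    "</doc>"
)


def reported_strings(xml_text):
    """Return (comment content, PI content, element text) as minidom reports them."""
    root = parseString(xml_text).documentElement
    comment = next(n for n in root.childNodes if n.nodeType == Node.COMMENT_NODE)
    pi = next(n for n in root.childNodes
              if n.nodeType == Node.PROCESSING_INSTRUCTION_NODE)
    text_elem = root.getElementsByTagName("text")[0]
    # Join every text child so the result does not depend on how the
    # parser chunks character data.
    text = "".join(n.data for n in text_elem.childNodes)
    return comment.data, pi.data, text


if __name__ == "__main__":
    for label, value in zip(("comment", "pi", "text"), reported_strings(SAMPLE)):
        # len() counts code points: an unexpanded reference contributes 7,
        # a decoded U+2014 contributes 1.
        print(f"{label}: length {len(value)}, {value!r}")
```

With expat underneath, the comment and PI strings keep the literal seven-character reference, while the element text contains the single decoded character. Other processors can be checked the same way by swapping in their parse call.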
Put another way, if the following XSLT template matched
the above comment, should the xsl:if test succeed or fail:
<xsl:template match="comment()">
  <xsl:if test="string(.)='This is a comment — pbg'">
    <!-- The above line's em-dash is the single U+2014 character -->
  </xsl:if>
</xsl:template>
paul
Received on Friday, 7 December 2007 21:53:18 UTC