
Re: XML question for the experts

From: John Cowan <cowan@ccil.org>
Date: Fri, 7 Dec 2007 17:03:52 -0500
To: "Grosso, Paul" <pgrosso@ptc.com>
Cc: public-xml-core-wg@w3.org
Message-ID: <20071207220352.GE3346@mercury.ccil.org>

Grosso, Paul scripsit:

> If a serialized XML document contains:
> <!--This is a comment &#x2014; pbg-->
> or
> <?myproc pseudoatt="this is part of a pi &#x2014; pbg"?>
> then when that is read by an XML processor, is the
> &#x2014; considered to be an eight-character string
> or the Unicode em-dash character?

Clearly the former.  The Comment and PI productions are built from Char
alone, with no Reference alternative, so numeric character references
(NCRs) are not recognized in them.  Compare productions 15 (Comment) and
16 (PI) with 10 (AttValue) and 43 (content), both of which allow Reference.
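
A quick way to check this behaviour is to parse a small document and look
at what the processor reports.  The following is only a sketch, assuming
Python's standard xml.dom.minidom (expat-based) parser; the <root> wrapper,
its "att" attribute, the trailing text, and the variable names are made up
for illustration and are not part of the original question.

```python
from xml.dom import minidom

# Hypothetical test document: the same reference appears in an attribute
# value, a comment, a PI, and element content.
DOC = (
    '<root att="&#x2014;">'
    '<!--This is a comment &#x2014; pbg-->'
    '<?myproc pseudoatt="this is part of a pi &#x2014; pbg"?>'
    'text &#x2014; here'
    '</root>'
)

dom = minidom.parseString(DOC)
root = dom.documentElement

# Pick out the comment and the PI among the root's children.
comment = next(n for n in root.childNodes
               if n.nodeType == n.COMMENT_NODE)
pi = next(n for n in root.childNodes
          if n.nodeType == n.PROCESSING_INSTRUCTION_NODE)

# Join all text nodes, in case the parser delivers the text in pieces.
text = "".join(n.data for n in root.childNodes
               if n.nodeType == n.TEXT_NODE)

# In the comment and the PI the reference is left as literal characters.
print(repr(comment.data), len(comment.data))
print(repr(pi.data), len(pi.data))

# In the attribute value and in content it is replaced by U+2014.
print(repr(root.getAttribute("att")))
print(repr(text))
```

With this parser the comment and PI data still contain the literal text
"&#x2014;", while the attribute value and the element text contain the
single character U+2014.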

> More precisely, in the infoset of such a document,
> when considering the comment or PI's [content] info item,
> would the length of the "string representing the content"
> be calculated with the "&#x2014;" part contributing 1 or 8
> to the length?


> Put another way, if the following XSLT template matched
> the above comment, should the xsl:if test succeed or fail:
> <xsl:template match="comment()">
>   <xsl:if test="string(.)='This is a comment — pbg'">
>     <!-- The above line's em-dash is the single U+2014 character -->
>   </xsl:if>
> </xsl:template>


John Cowan
                I am a member of a civilization. --David Brin
Received on Friday, 7 December 2007 22:04:02 UTC
