- From: John Cowan <cowan@ccil.org>
- Date: Fri, 7 Dec 2007 17:03:52 -0500
- To: "Grosso, Paul" <pgrosso@ptc.com>
- Cc: public-xml-core-wg@w3.org
Grosso, Paul scripsit:

> If a serialized XML document contains:
>
>   <!--This is a comment &#8212; pbg-->
>
> or
>
>   <?myproc pseudoatt="this is part of a pi &#8212; pbg"?>
>
> then when that is read by an XML processor, is the
> &#8212; considered to be a seven character string
> or the Unicode em-dash character?

Clearly the former.  Comments and PIs contain simply Chars, which
means that NCRs are not recognized in them.  Compare productions
15 (Comment) and 16 (PI) with 10 (AttValue) and 43 (Content).

> More precisely, in the infoset of such a document,
> when considering the comment or PI's [content] info item,
> would the length of the "string representing the content"
> be calculated with the "&#8212;" part contributing 1 or 7
> to the length?

Seven.

> Put another way, if the following XSLT template matched
> the above comment, should the xsl:if test succeed or fail:
>
> <xsl:template match="comment()">
>   <xsl:if test="string(.)='This is a comment - pbg'">
>     <!-- The above line's em-dash is the single U-2014 character -->
>   </xsl:if>
> </xsl:template>

Fail.

--
John Cowan   cowan@ccil.org
I am a member of a civilization.  --David Brin
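The behaviour described above can be checked with any conforming parser. The following is a minimal sketch using Python's standard xml.dom.minidom, not part of the original exchange. The sample document, the `inspect` helper, and the `att` attribute are made up for illustration, and the reference is assumed to be written as the seven-character decimal form `&#8212;`. The same reference appears in an attribute value, in element content, in a comment, and in a PI; only the first two should be expanded.

```python
from xml.dom import minidom

# Hypothetical sample document. "&#8212;" is assumed to be the decimal
# numeric character reference for U+2014 (em-dash).
DOC = (
    '<doc att="a &#8212; b">'
    '<!--This is a comment &#8212; pbg-->'
    '<?myproc pseudoatt="this is part of a pi &#8212; pbg"?>'
    'text &#8212; here'
    '</doc>'
)


def inspect(xml_text):
    """Return the strings the parser reports for each kind of node."""
    root = minidom.parseString(xml_text).documentElement
    result = {"att": root.getAttribute("att"), "text": ""}
    for node in root.childNodes:
        if node.nodeType == node.COMMENT_NODE:
            result["comment"] = node.data
        elif node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
            result["pi"] = node.data
        elif node.nodeType == node.TEXT_NODE:
            # Accumulate in case the parser delivers text in several pieces.
            result["text"] += node.data
    return result


info = inspect(DOC)

# Attribute values and element content: the reference is expanded.
print(repr(info["att"]))
print(repr(info["text"]))

# Comments and PIs: the seven characters are kept literally.
print(repr(info["comment"]))
print(repr(info["pi"]))

# What the XSLT test compares against: a string with a real U+2014.
print(info["comment"] == "This is a comment \u2014 pbg")
```

The final comparison corresponds to the xsl:if test in the question: the comment's string value still holds the literal reference, so it does not equal a string containing the single em-dash character.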
Received on Friday, 7 December 2007 22:04:02 UTC