
Re: Working Group Decision on ISSUE-103 srcdoc-xml-escaping

From: Julian Reschke <julian.reschke@gmx.de>
Date: Thu, 14 Oct 2010 12:02:31 +0200
Message-ID: <4CB6D537.8030104@gmx.de>
To: Simon Pieters <simonp@opera.com>
CC: HTML WG <public-html@w3.org>, Sam Ruby <rubys@intertwingly.net>

On 14.10.2010 11:35, Simon Pieters wrote:
> ...
> The new text says that U+0020 needs to be escaped.
>
> <p class="note">Due to restrictions of <span>the XML syntax</span>,
> - in XML a number of other characters need to be escaped also to
> - ensure correctness.</p>
> + in XML the U+003C LESS-THAN SIGN character (&lt;) needs to be
> + escaped as well. In order to prevent <a
> + href="http://www.w3.org/TR/REC-xml/#AVNormalize">attribute-value
> + normalization</a>, XML's whitespace characters &mdash; U+0009
> + CHARACTER TABULATION (HT), U+000A LINE FEED (LF), U+000D CARRIAGE
> + RETURN (CR) and U+0020 SPACE &mdash; also need to be escaped. <a
> + href="#refsXML">[XML]</a></p>
>
> My reading of the XML spec suggests space does not need to be escaped.
>
> http://www.w3.org/TR/REC-xml/#AVNormalize
>
> "For a white space character (#x20, #xD, #xA, #x9), append a space
> character (#x20) to the normalized value."
>
> i.e. a literal space and an escaped space results in the same thing.
>
> The paragraph "If the attribute type is not CDATA, then the XML
> processor MUST further process the normalized attribute value by
> discarding any leading and trailing space (#x20) characters, and by
> replacing sequences of space (#x20) characters by a single space (#x20)
> character." does not apply since srcdoc is a CDATA attribute.
>
> Should I file a bug report?
> ...

That's why I was asking Henri, and I agree with that conclusion.
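
For anyone who wants to check the normalization behaviour quoted above, here is a minimal sketch. It assumes Python's xml.etree.ElementTree (expat-based, non-validating) and no DTD, so srcdoc is treated as CDATA. The helper name srcdoc_value is made up for illustration and is not taken from any spec or proposal.

```python
import xml.etree.ElementTree as ET


def srcdoc_value(raw_attr: str) -> str:
    """Wrap raw_attr in <iframe srcdoc="..."/> and return the attribute
    value as the XML parser reports it after normalization.

    Hypothetical helper for illustration only. raw_attr is pasted in
    verbatim, so it must not contain '"', '<' or a bare '&'.
    """
    doc = '<iframe srcdoc="' + raw_attr + '"/>'
    return ET.fromstring(doc).get("srcdoc")


# Pairs of (literal form, character-reference form) of the same character.
cases = [
    ("a  b", "a&#x20;&#x20;b"),  # two U+0020 SPACE
    ("a\tb", "a&#x9;b"),         # U+0009 CHARACTER TABULATION
    ("a\nb", "a&#xA;b"),         # U+000A LINE FEED
    ("a\rb", "a&#xD;b"),         # U+000D CARRIAGE RETURN
]

for literal, escaped in cases:
    # repr() makes the difference between ' ', '\t', '\n' and '\r' visible.
    print(repr(srcdoc_value(literal)), "vs", repr(srcdoc_value(escaped)))
```

With a parser that follows section 3.3.3 of the XML spec, the literal and escaped spaces should compare equal (and adjacent spaces should not be collapsed), whereas a literal tab, LF or CR should come back as a space and only the character-reference form should survive unchanged.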

I was going to file a bug once this is understood, but go ahead if you
want to raise it :-)

Note that the text wrt whitespace was added based on Philip's feedback 
in <http://lists.w3.org/Archives/Public/public-html/2010Mar/0429.html>:

> But in attribute values, U+000D and U+000A and U+0009 must be escaped
> too. (Depending on DTD you might also need to escape any leading or
> trailing U+0020 and at least one of any adjacent pair of U+0020s, I think.)

So the text in the CP may have been too conservative: it allows for the
case where a DTD changes the whitespace handling.

Best regards, Julian
Received on Thursday, 14 October 2010 10:03:12 GMT
