- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Thu, 14 Oct 2010 12:02:31 +0200
- To: Simon Pieters <simonp@opera.com>
- CC: HTML WG <public-html@w3.org>, Sam Ruby <rubys@intertwingly.net>
On 14.10.2010 11:35, Simon Pieters wrote:
> ...
> The new text says that U+0020 needs to be escaped.
>
> <p class="note">Due to restrictions of <span>the XML syntax</span>,
> - in XML a number of other characters need to be escaped also to
> - ensure correctness.</p>
> + in XML the U+003C LESS-THAN SIGN character (<) needs to be
> + escaped as well. In order to prevent <a
> + href="http://www.w3.org/TR/REC-xml/#AVNormalize">attribute-value
> + normalization</a>, XML's whitespace characters — U+0009
> + CHARACTER TABULATION (HT), U+000A LINE FEED (LF), U+000D CARRIAGE
> + RETURN (CR) and U+0020 SPACE — also need to be escaped. <a
> + href="#refsXML">[XML]</a></p>
>
> My reading of the XML spec suggests space does not need to be escaped.
>
> http://www.w3.org/TR/REC-xml/#AVNormalize
>
> "For a white space character (#x20, #xD, #xA, #x9), append a space
> character (#x20) to the normalized value."
>
> i.e. a literal space and an escaped space results in the same thing.
>
> The paragraph "If the attribute type is not CDATA, then the XML
> processor MUST further process the normalized attribute value by
> discarding any leading and trailing space (#x20) characters, and by
> replacing sequences of space (#x20) characters by a single space (#x20)
> character." does not apply since srcdoc is a CDATA attribute.
>
> Should I file a bug report?
> ...

That's why I was asking Henri, and I agree with that conclusion. I was going to file a bug once this is understood; but go ahead if you want to raise it :-)

Note that the text wrt whitespace was added based on Philip's feedback in <http://lists.w3.org/Archives/Public/public-html/2010Mar/0429.html>:

> But in attribute values, U+000D and U+000A and U+0009 must be escaped
> too. (Depending on DTD you might also need to escape any leading or
> trailing U+0020 and at least one of any adjacent pair of U+0020s, I think.)
So the text in the CP may have been too conservative, taking into account the case that there may be a DTD changing the whitespace handling.

Best regards, Julian
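
For readers who want to check the normalization behaviour discussed in this message, here is a minimal Python sketch that is not part of the original thread. It assumes an expat-backed, non-validating parser (the standard library's xml.etree.ElementTree) and no DTD, so the attribute is treated as CDATA. The element name `e`, the attribute name `a` and the helper `attr_value` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def attr_value(raw: str) -> str:
    """Return attribute `a` of <e a="RAW"/> as the application sees it after parsing."""
    return ET.fromstring(f'<e a="{raw}"/>').get("a")


# Each case pairs a literal whitespace character with the same character
# written as a numeric character reference.
cases = {
    "space": ("x y", "x&#x20;y"),
    "tab": ("x\ty", "x&#x9;y"),
    "line feed": ("x\ny", "x&#xA;y"),
    "carriage return": ("x\ry", "x&#xD;y"),
}

for name, (literal, escaped) in cases.items():
    # Show what survives attribute-value normalization for each spelling.
    print(f"{name:16} literal -> {attr_value(literal)!r:8} escaped -> {attr_value(escaped)!r}")
```

Under these assumptions, a literal tab, LF or CR should come back as a single space, while the character-reference forms should come back unchanged. For U+0020 the two spellings should be indistinguishable, which is consistent with Simon's reading that space needs no escaping in a CDATA attribute.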
Received on Thursday, 14 October 2010 10:03:12 UTC