Re: Working Group Decision on ISSUE-103 srcdoc-xml-escaping

On 14.10.2010 11:35, Simon Pieters wrote:
> ...
> The new text says that U+0020 needs to be escaped.
> <p class="note">Due to restrictions of <span>the XML syntax</span>,
> - in XML a number of other characters need to be escaped also to
> - ensure correctness.</p>
> + in XML the U+003C LESS-THAN SIGN character (&lt;) needs to be
> + escaped as well. In order to prevent <a
> + href="">attribute-value
> + normalization</a>, XML's whitespace characters &mdash; U+0009
> + CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000D CARRIAGE
> + RETURN (CR) and U+0020 SPACE &mdash; also need to be escaped. <a
> + href="#refsXML">[XML]</a></p>
> My reading of the XML spec suggests space does not need to be escaped.
> "For a white space character (#x20, #xD, #xA, #x9), append a space
> character (#x20) to the normalized value."
> i.e. a literal space and an escaped space results in the same thing.
> The paragraph "If the attribute type is not CDATA, then the XML
> processor MUST further process the normalized attribute value by
> discarding any leading and trailing space (#x20) characters, and by
> replacing sequences of space (#x20) characters by a single space (#x20)
> character." does not apply since srcdoc is a CDATA attribute.
> Should I file a bug report?
> ...

That's why I was asking Henri, and I agree with that conclusion.

I was going to file a bug once this is understood, but go ahead if you 
want to raise it :-)

Note that the text wrt whitespace was added based on Philip's feedback 
in <>:

> But in attribute values, U+000D and U+000A and U+0009 must be escaped
> too. (Depending on DTD you might also need to escape any leading or
> trailing U+0020 and at least one of any adjacent pair of U+0020s, I think.)

So the text in the CP may have been too conservative: it allows for the 
case where a DTD changes the whitespace handling.
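
For what it's worth, the no-DTD case is easy to check with a 
non-validating parser. Below is a minimal sketch, assuming Python's 
standard xml.etree.ElementTree (expat underneath) and no DTD, so srcdoc 
is treated as CDATA. The helper name srcdoc_value is made up for 
illustration.

```python
import xml.etree.ElementTree as ET


def srcdoc_value(attr_source: str) -> str:
    """Return the srcdoc value the XML parser reports.

    attr_source is the raw text placed between the attribute quotes.
    Hypothetical helper, for illustration only.
    """
    doc = '<iframe srcdoc="%s"/>' % attr_source
    return ET.fromstring(doc).get("srcdoc")


# A literal space and an escaped space yield the same value.
space_literal = srcdoc_value("a b")
space_escaped = srcdoc_value("a&#x20;b")

# With no DTD the attribute is CDATA, so leading, trailing and
# repeated spaces are kept as written.
padded = srcdoc_value("  a  b  ")

# Literal tab, LF and CR are each normalized to a single space ...
ws_literal = srcdoc_value("a\tb\nc\rd")

# ... while character references for them survive unchanged.
ws_escaped = srcdoc_value("a&#x9;b&#xA;c&#xD;d")

print(repr(space_literal), repr(space_escaped))
print(repr(padded))
print(repr(ws_literal), repr(ws_escaped))
```

This only covers the CDATA case; a DTD that declares srcdoc with a 
different attribute type, as Philip mentions, is outside this sketch.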

Best regards, Julian

Received on Thursday, 14 October 2010 10:03:12 UTC