- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Thu, 30 Sep 2010 16:42:14 +0200
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: Philip Taylor <pjt47@cam.ac.uk>, Anne van Kesteren <annevk@opera.com>, public-html@w3.org
Julian Reschke, Thu, 30 Sep 2010 15:23:15 +0200:
> On 30.09.2010 13:06, Leif Halvard Silli wrote:
>> Julian Reschke, Thu, 18 Mar 2010 13:09:36 +0100:
>>
>>> "Note: Due to restrictions of the XML syntax, in XML the U+003C
>>> LESS-THAN SIGN (<) needs be escaped as well. Also, XML's whitespace
>>> characters -- U+0009 CHARACTER TABULATION (HT), U+000A LINE FEED
>>> (LF), U+000D CARRIAGE RETURN (CR) and U+0020 SPACE -- need to be
>>> escaped in order to prevent attribute-value normalization ([XML],
>>> Section 3.3.3)."
>>
>> (This is a follow-up to my reply in the poll.)
>
> For the record: I don't see a reply from you here:
> <http://www.w3.org/2002/09/wbs/40318/issue-103-objection-poll/results>

That must be an error in the system: I got a WBS Mailer confirmation
some minutes before the deadline of the poll. I hope that gets fixed.

>> To say that all XML white space characters have to be escaped, seems
>> more complicated than what is correct.
>>
>> 1 #xA will, in CDATA attributes (and @srcdoc is CDATA) be
>>   normalized to #x20. Thus, if white space is significant, then
>>   #xA must be escaped. The same goes for #x9. But if it is not
>>   significant, then lack of escaping is no danger.
>> 2 when it comes to #xD, then it is in principle not
>>   regulated by Section 3.3.3. of XML 1.0 but by section 2.3:
>>   ]] all #xD characters literally present in an XML document are
>>   either removed or replaced by #xA [[
>>   Thus it is "a black sheep" which is generally treated as #xA.
>>   If one really needs to avoid the default of being treated as
>>   a non-escaped #xA, then it must be escaped.
>> 3 however, it is not true that one needs to escape U+0020, see
>>   Henri's last two comments in bug 9965 (against Polyglot spec).
>> ...
>
> Well, we certainly wouldn't want to put all of this into the note.
> Would saying "significant whitespace" address your concern?

Yes, that should work, I think. As long as you also remove #x20 from
the list of characters that need to be escaped.
(Both "&#x20;" and "<space>" get normalized to "<space>".)

> Keep in mind that the advice is for people who already have a
> character sequence, and need to figure out what to do in order to put
> it into the attribute. At this point, it's not trivial to distinguish
> between significant and insignificant anymore.

But perhaps it is rather trivial to tell whether it is part of an
attribute or not? If one can escape all #xA, #xD and #x9, without any
side effects, then fine. I guess it will work, since it is parsed
twice: first "&#xA;" becomes normalized to <line-feed>, and then, in
step 2, <line-feed> (if it is part of a CDATA attribute) becomes
normalized to <space>
-- 
leif halvard silli
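
For illustration, here is a minimal sketch of what a single XML parse does to white space in an attribute value. It assumes Python's standard xml.etree.ElementTree (backed by expat) as the XML parser; the helper name attr_after_parse and the bare iframe element are made up for this example and are not taken from any spec text. It only shows one parse of the outer document, not a second parse of the srcdoc content.

```python
import xml.etree.ElementTree as ET


def attr_after_parse(raw_value: str) -> str:
    """Return the srcdoc value as reported by the parser.

    raw_value is pasted verbatim between the quotes of the attribute,
    so it must already be well-formed attribute content.
    (Hypothetical helper, for illustration only.)
    """
    doc = '<iframe srcdoc="' + raw_value + '"/>'
    return ET.fromstring(doc).get("srcdoc")


# Literal #xA, #x9 and #xD each come out as a single space.
for literal in ("a\nb", "a\tb", "a\rb"):
    print(repr(literal), "->", repr(attr_after_parse(literal)))

# The same characters written as character references are preserved.
for escaped in ("a&#xA;b", "a&#x9;b", "a&#xD;b"):
    print(repr(escaped), "->", repr(attr_after_parse(escaped)))

# A literal space and &#x20; are indistinguishable after parsing,
# so escaping U+0020 gains nothing.
print(attr_after_parse("a b") == attr_after_parse("a&#x20;b"))
```

Under these assumptions, escaping only matters for #x9, #xA and #xD, and only when the white space is significant.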
Received on Thursday, 30 September 2010 14:42:52 UTC