Re: Change proposal for issue 103, was: ISSUE-103 change proposal

On 30.09.2010 13:06, Leif Halvard Silli wrote:
> Julian Reschke, Thu, 18 Mar 2010 13:09:36 +0100:
>> "Note: Due to restrictions of the XML syntax, in XML the U+003C
>> LESS-THAN SIGN (<) needs be escaped as well. Also, XML's whitespace
>> characters -- U+0009 CHARACTER TABULATION (HT), U+000A LINE FEED
>> (LF), U+000D CARRIAGE RETURN (CR) and U+0020 SPACE -- need to be
>> escaped in order to prevent attribute-value normalization ([XML],
>> Section 3.3.3)."
> (This is a follow-up to my reply in the poll.)

For the record: I don't see a reply from you here: 

> To say that all XML white space characters have to be escaped seems
> more complicated than what is correct.
> 1 #xA will, in CDATA attributes (and @srcdoc is CDATA) be
>    normalized to #x20. Thus, if white space is significant, then
>    #xA must be escaped. The same goes for #x9. But if it is not
>    significant, then lack of escaping is no danger.
> 2 when it comes to #xD, then it is in principle not
>    regulated by Section 3.3.3. of XML 1.0 but by section 2.3:
>    ]] all #xD characters literally present in an XML document are
>       either removed or replaced by #xA [[
>    Thus it is "a black sheep" which is generally treated as #xA.
>    If one really needs to avoid the default of being treated as
>    a non-escaped #xA, then it must be escaped.
> 3 however, it is not true that one needs to escape U+0020, see
>    Henri's last two comments in bug 9965 (against Polyglot spec).
> ...

Well, we certainly wouldn't want to put all of this into the note. Would 
saying "significant whitespace" address your concern?

Keep in mind that the advice is for people who already have a character
sequence and need to figure out what to do in order to put it into the
attribute. At this point, it's no longer trivial to distinguish between
significant and insignificant whitespace.
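
As an illustration of that situation, here is a minimal Python sketch of
someone who already has a character sequence and wants it to survive
attribute-value normalization. The helper names (escape_attr_value,
roundtrip) are hypothetical and not taken from any spec. The sketch escapes
HT, LF and CR but leaves U+0020 alone, following point 3 of the quoted
message; that choice is an assumption of the sketch, not a conclusion of
this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical escape table for this sketch. "&", '"' and "<" are escaped
# for XML well-formedness; HT, LF and CR are written as character
# references so that the parser does not normalize them to a space.
# U+0020 is deliberately left unescaped (assumption, see lead-in).
_ESCAPES = {
    "&": "&amp;",
    '"': "&quot;",
    "<": "&lt;",
    "\t": "&#9;",
    "\n": "&#10;",
    "\r": "&#13;",
}


def escape_attr_value(value: str) -> str:
    """Escape a character sequence for a double-quoted XML attribute."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def roundtrip(attr_text: str) -> str:
    """Place attr_text into srcdoc="..." and return what the parser reports."""
    doc = '<iframe srcdoc="' + attr_text + '"/>'
    return ET.fromstring(doc).get("srcdoc")


# Literal HT and LF are replaced by spaces during normalization.
unescaped_result = roundtrip("a\tb\nc")

# With character references, the original sequence comes back intact.
original = '<p title="x">a\tb\nc\rd e</p>'
escaped_result = roundtrip(escape_attr_value(original))

print(repr(unescaped_result))
print(repr(escaped_result))
```

Escaping every HT, LF and CR unconditionally avoids having to decide which
whitespace is significant, at the cost of a less readable attribute value.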

Best regards, Julian

Received on Thursday, 30 September 2010 13:23:54 UTC