Re: Change proposal for issue 103, was: ISSUE-103 change proposal

Julian Reschke, Thu, 18 Mar 2010 13:09:36 +0100:
> "Note: Due to restrictions of the XML syntax, in XML the U+003C 
> LESS-THAN SIGN (<) needs be escaped as well. Also, XML's whitespace 
> characters -- U+0009 CHARACTER TABULATION (HT), U+000A LINE FEED 
> (LF), U+000D CARRIAGE RETURN (CR) and U+0020 SPACE -- need to be 
> escaped in order to prevent attribute-value normalization ([XML], 
> Section 3.3.3)."

(This is a follow-up to my reply in the poll.)

Saying that all XML white space characters have to be escaped seems 
more complicated than what is actually correct.

1 #xA will, in CDATA attributes (and @srcdoc is CDATA), be
  normalized to #x20. Thus, if white space is significant, then
  #xA must be escaped. The same goes for #x9. But if it is not
  significant, then the lack of escaping does no harm.
2 #xD is, in principle, not regulated by Section 3.3.3 of
  XML 1.0 but by Section 2.3:
  ]] all #xD characters literally present in an XML document are
     either removed or replaced by #xA [[
  Thus it is "a black sheep" which is generally treated as #xA.
  If one really needs to keep it from being treated, by default,
  like a non-escaped #xA, then it must be escaped.
3 however, it is not true that one needs to escape U+0020, see
  Henri's last two comments in bug 9965 (against Polyglot spec).
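
A minimal sketch of the points above, assuming Python's standard
xml.etree.ElementTree (backed by expat) as the XML processor and no DTD,
so that every attribute is treated as CDATA. The helper name
srcdoc_after_parse and the bare <iframe> wrapper are only illustrative;
they are not taken from any spec text.

```python
import xml.etree.ElementTree as ET


def srcdoc_after_parse(raw_attribute_text: str) -> str:
    """Wrap the text in <iframe srcdoc="..."/> and return the attribute
    value as the XML parser reports it (hypothetical helper)."""
    document = '<iframe srcdoc="' + raw_attribute_text + '"/>'
    return ET.fromstring(document).get("srcdoc")


# Literal #xA, #x9 and #xD are all reported as #x20.
print(repr(srcdoc_after_parse("a\nb")))      # 'a b'
print(repr(srcdoc_after_parse("a\tb")))      # 'a b'
print(repr(srcdoc_after_parse("a\rb")))      # 'a b'

# Written as character references, the same characters survive.
print(repr(srcdoc_after_parse("a&#xA;b")))   # 'a\nb'
print(repr(srcdoc_after_parse("a&#x9;b")))   # 'a\tb'
print(repr(srcdoc_after_parse("a&#xD;b")))   # 'a\rb'

# Literal U+0020 needs no escaping: a CDATA attribute keeps every space.
print(repr(srcdoc_after_parse("a  b")))      # 'a  b'
```

Other XML processors should behave the same way for CDATA attributes,
but I have only sketched it against this one.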

Btw, the problematisation of white space in XML seems coupled with a 
simplification of the situation in HTML5. (Note how the spec says that 
one "only" needs to ...). But can anyone say how &#xD; (the character 
reference) is rendered in HTML5?

  # leading &#xD; (NCR) in @alt is treated as a line break in Opera
    and IE8. But not in Firefox and WebKit.
  # I just discovered that certain combinations of escaped white 
    space and
    can set IE8 into quirks mode ... 
leif halvard silli

Received on Thursday, 30 September 2010 11:07:01 UTC