
Re: Change proposal for issue 103, was: ISSUE-103 change proposal

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Thu, 30 Sep 2010 13:06:09 +0200
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Philip Taylor <pjt47@cam.ac.uk>, Anne van Kesteren <annevk@opera.com>, public-html@w3.org
Message-ID: <20100930130609894343.f5a691f9@xn--mlform-iua.no>
Julian Reschke, Thu, 18 Mar 2010 13:09:36 +0100:
 
> "Note: Due to restrictions of the XML syntax, in XML the U+003C 
> LESS-THAN SIGN (<) needs be escaped as well. Also, XML's whitespace 
> characters -- U+0009 CHARACTER TABULATION (HT), U+000A LINE FEED 
> (LF), U+000D CARRIAGE RETURN (CR) and U+0020 SPACE -- need to be 
> escaped in order to prevent attribute-value normalization ([XML], 
> Section 3.3.3)."

(This is a follow-up to my reply in the poll.)

Saying that all XML white space characters have to be escaped makes
the situation sound more complicated than it actually is:

1 In CDATA attributes (and @srcdoc is CDATA), a literal #xA is
  normalized to #x20. Thus, if white space is significant, then
  #xA must be escaped. The same goes for #x9. But if it is not
  significant, then leaving it unescaped does no harm.
2 #xD is, in principle, not governed by Section 3.3.3 of XML 1.0
  but by Section 2.3:
    "all #xD characters literally present in an XML document are
    either removed or replaced by #xA"
  Thus #xD is the odd one out: a literal #xD is generally treated
  as #xA (and so, inside an attribute value, it ends up as #x20).
  If a carriage return really has to be preserved, it must be
  escaped.
3 However, it is not true that U+0020 needs to be escaped: for a
  CDATA attribute, normalization maps #x20 to itself. See Henri's
  last two comments in bug 9965 (against the Polyglot spec).
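The three points can be checked against a real XML parser. The sketch below is only an illustration: it assumes Python's standard xml.etree.ElementTree (backed by expat) as a stand-in for a conforming XML processor, and it uses no DTD, so every attribute is treated as CDATA. The helpers escape_xml_attr and parsed_attr are hypothetical, written just for this example.

```python
import xml.etree.ElementTree as ET


def escape_xml_attr(value: str) -> str:
    """Hypothetical helper: escape a string for a double-quoted XML attribute.

    &, < and " are escaped because the XML syntax requires it; TAB, LF and
    CR are escaped because attribute-value normalization would otherwise
    turn them into spaces. U+0020 SPACE is left alone, since normalizing a
    CDATA attribute maps it to itself.
    """
    replacements = {
        "&": "&amp;",
        "<": "&lt;",
        '"': "&quot;",
        "\t": "&#x9;",
        "\n": "&#xA;",
        "\r": "&#xD;",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def parsed_attr(raw_attr: str) -> str:
    """Hypothetical helper: parse <e a="..."/> with raw_attr inserted verbatim."""
    return ET.fromstring(f'<e a="{raw_attr}"/>').get("a")


# Point 1 and 2: literal TAB, LF, CRLF and lone CR each become one space.
print(repr(parsed_attr("a\tb\nc\r\nd\re")))       # 'a b c d e'
# Character references survive normalization.
print(repr(parsed_attr("a&#x9;b&#xA;c&#xD;d")))   # 'a\tb\nc\rd'
# Point 3: runs of literal spaces are kept in a CDATA attribute.
print(repr(parsed_attr("a  b")))                  # 'a  b'

# Escaping TAB/LF/CR (but not SPACE) round-trips the value exactly.
original = 'x < y & "z"\n\tindented\r\n  two spaces'
assert parsed_attr(escape_xml_attr(original)) == original
```

Note that a CRLF pair collapses to a single space, which follows from #xD being handled as a line break before attribute-value normalization runs.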

Btw, the problematisation of white space in XML seems to go hand in
hand with a simplification of the situation in HTML5. (Note how the
spec says that one "only" needs to ...). But can anyone say how &#xD;
(the character reference) is rendered in HTML5?
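A parser cannot answer the rendering question, but it can show what reaches the DOM. In HTML5, a literal CR in the input stream is normalized to LF before tokenization, whereas the reference &#xD; is a parse error that still yields an actual U+000D. As a rough sketch, assuming Python's html.unescape (documented as following the HTML5 character-reference rules) is an adequate stand-in for the tokenizer step:

```python
import html

# Numeric references to CR and LF produce the raw characters; &#xD; is
# not dropped or turned into LF, even though HTML5 flags it as an error.
print(repr(html.unescape("&#xD;")))         # '\r'
print(repr(html.unescape("&#xA;")))         # '\n'
print(repr(html.unescape("a&#xD;&#xA;b")))  # 'a\r\nb'
```

So the attribute value does contain U+000D; what a browser then draws for it is the open question.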

  # a leading &#xD; (NCR) in @alt is treated as a line break in
    Opera and IE8, but not in Firefox and WebKit.
  # I just discovered that certain combinations of escaped white
    space and
       *[attribute-with-escaped-white-space]:before
         {
           content:attr(attribute-with-escaped-white-space);
         }
    can put IE8 into quirks mode ...
-- 
leif halvard silli
Received on Thursday, 30 September 2010 11:07:01 GMT
