[css-syntax] Changes from CSS 2.1 and Selectors 3 from Simon Sapin on 2013-05-27 (www-style@w3.org from May 2013)

From: Simon Sapin <simon.sapin@exyr.org>
Date: Mon, 27 May 2013 16:25:33 +0800
To: www-style list <www-style@w3.org>
Message-ID: <51A3187D.2090709@exyr.org>

Hi,

The current ED has some changes that are not noted in a Changes section. 
I agree with most of these changes, but they should still be noted, and 
might need a WG resolution.

I believe this should make the Changes sections complete for the 
2013-05-24 ED.


Section 3.2: Character encoding detection has changed a lot since 2.1. 
Changes include:

* Don’t try to detect @charset with ASCII-incompatible patterns of bytes
* Ignore @charset if it specifies an ASCII-incompatible encoding (which 
would make the @charset rule itself decode as garbage.)
* Don’t "ignore style sheets in unknown encodings." (whatever that 
means, since even 2.1 specifies UTF-8 as a fallback.)
* Refer to the WHATWG Encoding standard rather than IANA, and as a 
consequence: (These might not need to be listed explicitly.)
   - A BOM takes precedence over anything else.
   - Drop support for UTF-32, EBCDIC, IBM1026 and GSM 03.38.
   - Disallow supporting more than the specified finite list of 
encodings and labels.
   - Specify decoding error handling. (The default, which css-syntax 
does not override, is to insert U+FFFD REPLACEMENT CHARACTER and recover.)
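As a rough illustration of the detection order described above, here is a 
minimal Python sketch. It is not the ED's algorithm verbatim: the function 
names are made up for this example, Python's codec registry stands in for 
the Encoding standard's list of labels (so it accepts more labels than the 
standard allows), and the 1024-byte limit on the @charset search is an 
assumption.

```python
import codecs

# BOMs checked first, since a BOM takes precedence over anything else.
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

CHARSET_PREFIX = b'@charset "'


def is_ascii_compatible(codec_name: str) -> bool:
    """True if the codec encodes the @charset prefix as plain ASCII bytes."""
    try:
        return '@charset "'.encode(codec_name) == CHARSET_PREFIX
    except (LookupError, ValueError):
        # Not a text encoding, or it cannot encode the prefix at all.
        return False


def sniff_encoding(data: bytes, fallback: str = "utf-8") -> str:
    """Return a Python codec name for a style sheet given as raw bytes."""
    # 1. A BOM wins over any @charset rule.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # 2. Look for @charset only as an ASCII byte pattern.
    if data.startswith(CHARSET_PREFIX):
        end = data.find(b'";', len(CHARSET_PREFIX), 1024)  # limit is assumed
        if end != -1:
            label = data[len(CHARSET_PREFIX):end].decode("ascii", "replace")
            try:
                name = codecs.lookup(label.strip()).name
            except (LookupError, ValueError):
                name = None
            # 3. Ignore @charset naming an ASCII-incompatible encoding.
            if name is not None and is_ascii_compatible(name):
                return name
    # 4. Unknown or unusable label: fall back instead of dropping the sheet.
    return fallback
```

With this sketch, a sheet starting with `@charset "utf-16be";` as ASCII 
bytes falls through to the fallback, because that rule could not have been 
read as ASCII had the sheet really been UTF-16.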


Any U+0000 character in the CSS source is replaced by U+FFFD. A 
hexadecimal escape that would decode as U+0000 (e.g. \00) instead decodes 
as U+FFFD. CSS 2.1 makes one or both of these two cases explicitly 
undefined, although it is unclear which.

I think this covers the same security concerns that led Mozilla to 
decode such escapes to U+0030 zero.
https://bugzilla.mozilla.org/show_bug.cgi?id=228856
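To make the two U+0000 cases concrete, here is a small Python sketch. The 
function names are hypothetical. The handling of surrogates and of values 
above U+10FFFF is not part of the change described in this message; it is 
included only so that the helper never raises.

```python
REPLACEMENT = "\uFFFD"


def preprocess(source: str) -> str:
    """Case 1: a literal U+0000 in the source becomes U+FFFD."""
    return source.replace("\x00", REPLACEMENT)


def decode_hex_escape(hex_digits: str) -> str:
    """Case 2: decode the hex digits of an escape such as \\00 or \\41."""
    code_point = int(hex_digits, 16)
    if code_point == 0:
        return REPLACEMENT
    # Beyond what this message covers: surrogates and out-of-range values
    # are also mapped to U+FFFD here so that chr() cannot fail.
    if 0xD800 <= code_point <= 0xDFFF or code_point > 0x10FFFF:
        return REPLACEMENT
    return chr(code_point)
```

Either way, no U+0000 reaches the rest of the parser.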


The definition of "non-ASCII" was changed from "U+00A0 and up" to the 
same as everyone outside of CSS, which is "U+0080 and up".
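The difference only affects U+0080 through U+009F. A tiny Python sketch 
(both function names are made up for illustration):

```python
def is_non_ascii(ch: str) -> bool:
    """New definition: U+0080 and up."""
    return ord(ch) >= 0x80


def is_non_ascii_css21(ch: str) -> bool:
    """CSS 2.1 definition: U+00A0 and up."""
    return ord(ch) >= 0xA0


# Code points on which the two definitions disagree.
changed = [cp for cp in range(0x100)
           if is_non_ascii(chr(cp)) != is_non_ascii_css21(chr(cp))]
```

Here `changed` covers exactly the C1 control range, U+0080 to U+009F.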


BAD_COMMENT tokens are now considered the same as normal comments, and 
neither is actually emitted by the tokenizer.


The <unicode-range> token now is more restrictive. Maybe it doesn’t need 
to be, now that css3-fonts considers any "empty range" as invalid and 
drops the declaration. (Although I think we also still need a resolution 
on *that* change.)


EOF in the middle of a quoted string or url() is not an error anymore, 
and produces a <string> or <url> token rather than BAD_STRING or BAD_URI.

However such "bad" tokens were not actually errors in 2.1, according to 
the EOF error handling rule:

http://www.w3.org/TR/CSS21/syndata.html#unexpected-eof

IMO this inconsistency in 2.1 is a bug that should be fixed, the way 
Syntax 3 does.
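For the string case, the EOF behavior could be sketched in Python roughly 
as follows. This is a simplified illustration, not the ED's tokenizer: 
`consume_string` and the token tuples are made up, hex escapes are not 
decoded, and url() is left out. The unescaped-newline branch is how I read 
the rest of the string rules; it is not one of the changes listed here.

```python
def consume_string(source: str, start: int):
    """Consume a quoted string whose opening quote is at source[start].

    Returns (token_type, value, index_after_token).
    """
    quote = source[start]
    value = []
    i = start + 1
    while i < len(source):
        ch = source[i]
        if ch == quote:
            return ("string", "".join(value), i + 1)
        if ch == "\n":
            # Unescaped newline: still a bad string; newline not consumed.
            return ("bad-string", "", i)
        if ch == "\\":
            if i + 1 >= len(source):
                i += 1  # backslash right before EOF is dropped
                continue
            following = source[i + 1]
            if following != "\n":  # backslash-newline is a continuation
                value.append(following)  # simplified: no hex escapes
            i += 2
            continue
        value.append(ch)
        i += 1
    # EOF before the closing quote: a normal string token.
    return ("string", "".join(value), len(source))
```

So `"abc` at the end of a style sheet yields the same token as `"abc"`.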


Lists of declarations now also accept at-rules. I support this change, 
see http://lists.w3.org/Archives/Public/www-style/2013Apr/0506.html


<an+b> is less restrictive with whitespace than in Selectors 3.


Cheers,
-- 
Simon Sapin

Received on Monday, 27 May 2013 08:26:05 UTC