[css2] Character encoding errata from Simon Sapin on 2014-04-08 (www-style@w3.org from April 2014)

From: Simon Sapin <simon.sapin@exyr.org>
Date: Tue, 08 Apr 2014 15:05:53 +0100
To: www-style <www-style@w3.org>
Message-ID: <53440241.7080403@exyr.org>

I’ll assume here that the behavior described in CSS Syntax Level 3 is 
the one we want. If you disagree, please start a new thread with the 
[css-syntax] tag as that’s a separate issue.

We currently have an erratum for CSS 2 about the character encoding of 
stylesheets:

http://www.w3.org/Style/css2-updates/REC-CSS2-20110607-errata.html#s.4.4

This erratum is both incorrect and insufficient.

Incorrect because it makes a BOM take precedence only "If rule 1 above (an 
HTTP "charset" parameter or similar) yields a character encoding and it 
is one of UTF-8, UTF-16 or UTF-32". Level 3 makes a BOM always take 
precedence. Detecting a BOM is essentially a "rule 0" that comes before 
"rule 1".

Insufficient because the behavior described in Level 2 still differs in 
a number of ways from Level 3, which:

* Only accepts a more restricted set of encoding names/labels. (I.e. 
refer to the Encoding standard rather than the IANA registry.)

* Only looks for ASCII-compatible @charset declarations. (I.e. reduce 
the byte pattern table to only the "40 63 68 61 72 73 65 74 20 22 (XX)* 
22 3B" row.)

* Falls back to the next encoding hint or UTF-8 instead of ever ignoring 
the stylesheet.

(This list may not be exhaustive.)
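
To make the ordering concrete, here is a rough Python sketch of the 
behavior as summarized in this message: BOM first ("rule 0"), then the 
protocol label, then an ASCII-compatible @charset, then UTF-8. The 
function names (known_label, stylesheet_encoding) are made up for 
illustration, and Python's codec registry is only a stand-in for the 
label table in the Encoding standard, so the accepted labels will not 
match exactly. Other details of Level 3 are not modeled.

```python
import codecs
import re

# Byte order marks, checked before any other encoding hint ("rule 0").
BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
)

# The single remaining byte pattern row:
# 40 63 68 61 72 73 65 74 20 22 (XX)* 22 3B, i.e. @charset "…";
# XX is taken here to be any ASCII byte other than 0x22 (the quote).
CHARSET_RE = re.compile(rb'@charset "([\x00-\x21\x23-\x7f]*)";')


def known_label(label):
    """Return the lowercased label if Python knows it, else None.

    Assumption: codecs.lookup() stands in for the Encoding standard's
    label lookup; the two sets of labels differ.
    """
    if not label:
        return None
    try:
        codecs.lookup(label.strip())
    except (LookupError, ValueError):
        return None
    return label.strip().lower()


def stylesheet_encoding(data, protocol_label=None):
    """Pick an encoding for stylesheet bytes; never rejects the sheet."""
    # Rule 0: a BOM always wins, whatever the protocol says.
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding

    # The @charset rule only counts at the very start of the stylesheet.
    match = CHARSET_RE.match(data[:1024])
    charset = match.group(1).decode("ascii") if match else None

    # Unknown labels are skipped in favor of the next hint.
    for label in (protocol_label, charset):
        known = known_label(label)
        if known:
            return known

    # Last resort: UTF-8 instead of ignoring the stylesheet.
    return "utf-8"
```

With this sketch, a stylesheet starting with a UTF-8 BOM but served with 
charset=iso-8859-1 comes out as "utf-8", and one declaring an unknown 
@charset label with no other hint also comes out as "utf-8" rather than 
being dropped.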

-- 
Simon Sapin

Received on Tuesday, 8 April 2014 14:06:17 UTC