Re: UTF-8 signature / BOM in CSS

Richard Ishida wrote to <mailto:www-international@w3.org> on 27 November 
2003 in "New test page: UTF-8 signature / BOM" 
(<mid:004c01c3b4fe$6e7414e0$6601a8c0@w3c40upc3ma3j2>):

>[ Note that I've also seen the first line or so of external CSS style
>sheets fail if a utf-8 signature is present.  If I can remember how to
>replicate the failure, I'll write another test file to cover that. ]

The effect that you observe with Cascading Style Sheets is not a failure 
according to the CSS2 Recommendation. In short, the byte order mark (U+FEFF 
zero width no-break space) counts as an identifier component.

CSS level 2 specifies that any character from U+00A1 upward can appear 
unescaped anywhere in an identifier, including as its first character 
[CAC2]. Level 2.1 (a work in progress as of 27 November 2003) has the same 
allowance [CAC21]. Suppose I have a single-ruleset style sheet:

td { padding: 1ex; }

Now suppose that my CSS editor prepends a BOM to the style sheet. According 
to specification, the effect should be the same as if the style sheet were:

\FEFFtd { padding: 1ex; }

In other words, the CSS engine has a selector that matches against any 
element whose element-type name is the sequence

U+FEFF, U+0074, U+0064.

The selector must not match against "td" elements.
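
As a rough illustration of the point above, here is a small Python sketch 
(not taken from any CSS specification or implementation). The helper 
first_identifier is hypothetical and far cruder than a real CSS tokenizer: 
it just reads characters up to the first white space or "{". It shows that 
a decoder which does not treat the BOM as a signature hands U+FEFF on as 
ordinary text, so it ends up in front of "td".

```python
import codecs

# The one-ruleset style sheet from above, saved by an editor that prepends a BOM.
css = "td { padding: 1ex; }"
raw = codecs.BOM_UTF8 + css.encode("utf-8")

# A plain UTF-8 decode does not strip the signature; it becomes U+FEFF.
text = raw.decode("utf-8")


def first_identifier(sheet):
    """Hypothetical helper: characters up to the first white space or '{'."""
    chars = []
    for ch in sheet:
        if ch.isspace() or ch == "{":
            break
        chars.append(ch)
    return "".join(chars)


ident = first_identifier(text)
code_points = [f"U+{ord(c):04X}" for c in ident]
# code_points is ['U+FEFF', 'U+0074', 'U+0064'], so ident is not equal to "td"
```

Under these assumptions the type selector is three characters long, which 
is why it cannot match a "td" element.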

The syntax module in level 3 (a work in progress as of 27 November 2003) 
[SYN3] is adapting to the times by allowing an initial U+FEFF as an 
encoding signature rather than as an identifier character:

"A byte order mark (BOM), as described in section 2.7 of [UNICODE310], that 
begins the sequence of characters should not be considered, for purposes of 
applying the grammar below, as a part of the style sheet."
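
A minimal Python sketch of what that sentence asks for, assuming the style 
sheet has already been decoded to a string. The function name 
strip_signature is my own and does not come from the draft. Only a U+FEFF 
that begins the character sequence is dropped; one that appears later is 
left alone and is still treated as ordinary text.

```python
def strip_signature(sheet: str) -> str:
    """Drop a single leading U+FEFF, if present, before applying the grammar."""
    if sheet.startswith("\ufeff"):
        return sheet[1:]
    return sheet


with_bom = "\ufefftd { padding: 1ex; }"
without_bom = "td { padding: 1ex; }"

# Both inputs yield the same text for the parser to work on.
same = strip_signature(with_bom) == strip_signature(without_bom)
```

When starting from bytes, Python's "utf-8-sig" codec has the same effect 
for UTF-8 input: it removes a leading signature while decoding.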

CSS level 1 didn't allow U+FEFF to appear in style sheets (although its 
representation through numeric escapes was permitted) [SYN1]. This is 
mostly a historical footnote; CSS level 1, although officially a 
Recommendation, has the effective status of a superseded Candidate 
Recommendation.

[CAC2]
Bert Bos; Håkon Wium Lie; Chris Lilley; Ian Jacobs.
"Characters and case", section 4.1.3 of CSS level 2 specification.
W3C Recommendation.
12 May 1998.
<http://www.w3.org/TR/REC-CSS2/syndata.html#q4>.

[CAC21]
Bert Bos; Tantek Çelik; Ian Hickson; Håkon Wium Lie.
"Characters and case", section 4.1.3 of CSS level 2.1 specification.
W3C Working Draft.
15 September 2003.
<http://www.w3.org/TR/2003/WD-CSS21-20030915/syndata.html#q6>.

[SYN3]
L. David Baron, editor.
"CSS style sheet representation", section 3 of CSS3 syntax module.
W3C Working Draft.
13 August 2003.
<http://www.w3.org/TR/2003/WD-css3-syntax-20030813/#css-style>.

[SYN1]
Håkon Wium Lie; Bert Bos.
"CSS1 grammar", Appendix B of revised CSS1 specification.
W3C Recommendation.
11 January 1999.
<http://www.w3.org/TR/REC-CSS1#appendix-b>.

Received on Saturday, 29 November 2003 10:10:43 UTC