RE: UTF-8 signature / BOM in CSS

Etan,

Many thanks for this clear exposé.

I wonder whether the CSS WG could introduce a change to CSS2.1 at this
stage to clarify that the BOM, and in particular any UTF-8 signature,
should not be considered part of the text that follows it.
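
As a rough illustration of that behaviour (a sketch only, not wording
from any CSS specification), the following Python drops a leading UTF-8
signature, the bytes EF BB BF, before the style sheet text reaches a
parser. The function name strip_utf8_signature is hypothetical.

```python
import codecs


def strip_utf8_signature(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature (EF BB BF), if present."""
    # codecs.BOM_UTF8 is the three-byte sequence b'\xef\xbb\xbf'.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# A style sheet as an editor might save it, with the signature prepended.
sheet = codecs.BOM_UTF8 + b"td { padding: 1ex; }"
text = strip_utf8_signature(sheet).decode("utf-8")
print(text)
```

Python's built-in "utf-8-sig" codec has the same effect when decoding:
sheet.decode("utf-8-sig") also yields text that starts with "td".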

Comments from the CSS WG are welcome.

RI

============
Richard Ishida
W3C

contact info: http://www.w3.org/People/Ishida/ 

http://www.w3.org/International/ 
http://www.w3.org/International/geo/ 

W3C Internationalization FAQs
http://www.w3.org/International/questions.html
RSS feed: http://www.w3.org/International/questions.rss



> -----Original Message-----
> From: www-international-request@w3.org 
> [mailto:www-international-request@w3.org] On Behalf Of Etan 
> Wexler (by way of Martin Duerst <duerst@w3.org>)
> Sent: 29 November 2003 14:15
> To: www-international@w3.org
> Subject: Re: UTF-8 signature / BOM in CSS
> 
> 
> Richard Ishida wrote to <mailto:www-international@w3.org> on 
> 27 November 
> 2003 in "New test page: UTF-8 signature / BOM" 
> (<mid:004c01c3b4fe$6e7414e0$6601a8c0@w3c40upc3ma3j2>):
> 
> >[ Note that I've also seen the first line or so of external 
> CSS style 
> >sheets fail if a utf-8 signature is present.  If I can 
> remember how to 
> >replicate the failure, I'll write another test file to cover that. ]
> 
> The effect that you observe with Cascading Style Sheets is 
> not a failure 
> according to the CSS2 Recommendation. In short, the byte 
> order mark (U+FEFF 
> zero width no-break space) counts as an identifier component.
> 
> CSS level 2 specifies that any character from U+00A1 to 
> U+FFFFFF can appear 
> bare in an identifier or starting an identifier [CAC2]. Level 
> 2.1 (a work 
> in progress as of 27 November 2003) has the same allowance 
> [CAC21]. Suppose 
> I have a single-ruleset style sheet:
> 
> td { padding: 1ex; }
> 
> Now suppose that my CSS editor prepends a BOM to the style 
> sheet. According 
> to specification, the effect should be the same as if the 
> style sheet were:
> 
> \FEFFtd { padding: 1ex; }
> 
> In other words, the CSS engine has a selector that matches 
> against any 
> element whose element-type name is the sequence
> 
> U+FEFF, U+0074, U+0064.
> 
> The selector must not match against "td" elements.
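
The effect described above can be reproduced outside a CSS engine. The
following Python sketch decodes a BOM-prefixed style sheet as plain
UTF-8 and lists the code points of its first identifier. The helper
name leading_identifier is hypothetical, and its identifier test is a
simplification of the CSS2 rule: it ignores escapes and does not
enforce the restrictions on an identifier's first character.

```python
import codecs


def leading_identifier(text: str) -> str:
    """Collect the leading characters that may appear bare in an identifier.

    Simplified rule (an assumption for this sketch): ASCII letters,
    digits, hyphen, or any character at U+00A1 and above.
    """
    chars = []
    for ch in text:
        if ch.isascii() and (ch.isalnum() or ch == "-"):
            chars.append(ch)
        elif ord(ch) >= 0xA1:
            chars.append(ch)
        else:
            break
    return "".join(chars)


raw = codecs.BOM_UTF8 + b"td { padding: 1ex; }"
# The plain "utf-8" codec keeps the signature as U+FEFF in the decoded text.
ident = leading_identifier(raw.decode("utf-8"))
code_points = ["U+%04X" % ord(ch) for ch in ident]
print(code_points)  # ['U+FEFF', 'U+0074', 'U+0064']
```

Under these assumptions the first identifier is three characters long,
so it is not equal to "td".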
> 
> The syntax module in level 3 (a work in progress as of 27 
> November 2003) 
> [SYN3] is adapting to the times by allowing an initial U+FEFF as an 
> encoding signature rather than as an identifier character:
> 
> "A byte order mark (BOM), as described in section 2.7 of 
> [UNICODE310], that 
> begins the sequence of characters should not be considered, 
> for purposes of 
> applying the grammar below, as a part of the style sheet."
> 
> CSS level 1 didn't allow U+FEFF to appear in style sheets 
> (although its 
> representation through numeric escapes was permitted) [SYN1]. This is 
> mostly a historical footnote; CSS level 1, although officially a 
> Recommendation, has the effective status of a superseded Candidate 
> Recommendation.
> 
> [CAC2]
> Bert Bos; Håkon Wium Lie; Chris Lilley; Ian Jacobs.
> "Characters and case", section 4.1.3 of CSS level 2 
> specification. W3C Recommendation. 12 May 1998. 
> <http://www.w3.org/TR/REC-CSS2/syndata.html#q4>.
> 
> [CAC21]
> Bert Bos; Tantek Çelik; Ian Hickson; Håkon Wium Lie.
> "Characters and case", section 4.1.3 of CSS level 2.1 specification. W3C
> Working Draft. 15 September 2003.
> <http://www.w3.org/TR/2003/WD-CSS21-20030915/syndata.html#q6>.
> 
> [SYN3]
> L. David Baron, editor.
> "CSS style sheet representation", section 3 of CSS3 syntax module. W3C
> Working Draft. 13 August 2003.
> <http://www.w3.org/TR/2003/WD-css3-syntax-20030813/#css-style>.
> 
> [SYN1]
> Håkon Wium Lie; Bert Bos.
> "CSS1 grammar", Appendix B of revised CSS1 specification.
> W3C Recommendation. 11 January 1999.
> <http://www.w3.org/TR/REC-CSS1#appendix-b>.

Received on Tuesday, 2 December 2003 09:53:59 UTC