
Re: UTF-8 signature / BOM in CSS

From: François Yergeau <francois@yergeau.com>
Date: Sun, 7 Dec 2003 21:08:58 -0500 (EST)
Message-id: <3FD3DD3D.3000209@yergeau.com>
To: Etan Wexler <ewexler@stickdog.com>
Cc: www-style@w3.org, w3c-i18n-ig@w3.org



Etan Wexler wrote:
> Yes. The codepoint U+FEFF is currently allowed in and at the start of 
> identifiers in the CSS 2 Recommendation, the CSS 2.1 Working Draft, and 
> the CSS3 syntax module Working Draft.

Ouch, that's pretty bad.  Too late for 2.0, of course, and perhaps also 
for 2.1, but please fix that in 3.0.

> The latter has a token type "BOM", 
> but since the "BOM" production comes after the "IDENT" production, a 
> U+FEFF codepoint would always end up as an "IDENT" or part of an "IDENT".

Ouch again!  What's the point of having the BOM production, then?
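
To illustrate the ordering problem, here is a minimal sketch. It assumes a Lex-style scanner (longest match wins, and the earliest production wins a tie) and uses two heavily simplified, hypothetical stand-ins for the IDENT and BOM productions. The names PRODUCTIONS and first_token are invented for this example and are not taken from any draft.

```python
import re

# Simplified stand-in for the "nonascii" range, which includes U+FEFF.
NONASCII = "\u0080-\U0010FFFF"

# Hypothetical, reduced productions listed in the order described: IDENT, then BOM.
PRODUCTIONS = [
    ("IDENT", re.compile(f"[A-Za-z_{NONASCII}][A-Za-z0-9_{NONASCII}-]*")),
    ("BOM", re.compile("\ufeff")),
]


def first_token(text, productions):
    """Return (name, lexeme): longest match wins, earliest production wins ties."""
    best = None
    for name, pattern in productions:
        match = pattern.match(text)
        # Strict '>' keeps the earlier production when two lexemes tie in length.
        if match and (best is None or len(match.group()) > len(best[1])):
            best = (name, match.group())
    return best


# A lone U+FEFF ties between both productions, so IDENT (listed first) wins.
lone = first_token("\ufeff", PRODUCTIONS)

# U+FEFF followed by letters is swallowed into one longer IDENT.
prefixed = first_token("\ufeffbody", PRODUCTIONS)
```

Under these assumptions, listing BOM first would rescue only the lone U+FEFF. A U+FEFF followed by identifier characters would still scan as a single IDENT, because the longer match wins regardless of order.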

> Maybe I'm not so dull, after all.

:-) :-)

> If we accept Chris Lilley's assertion that U+FEFF is not 
> a character, stripping occurrences of U+FEFF before tokenization seems 
> very reasonable. If U+FEFF is a character (and I don't care to enter 
> that theological debate), stripping it may still be the sensible option. 

I think U+FEFF anywhere but as a BOM should be some kind of error, but 
like Tex I'm quite allergic to just stripping it, not having access to 
the proper medication.  Smacks too much of silent error recovery and 
invitation to sloppiness.

IIRC CSS has well-defined rules for ignoring a whole rule if there's an 
error in the selector(s) and ignoring a property if there's an error in 
it.  I guess these long-standing rules should just apply, nothing new.
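
A rough sketch of what I mean, assuming a single leading U+FEFF is accepted as the signature and any later one is treated as an ordinary error. The helpers split_signature and keep_valid_declarations are hypothetical, and splitting on ";" is a deliberate simplification that ignores strings, comments and nested blocks.

```python
BOM = "\ufeff"


def split_signature(stylesheet):
    """Remove one leading U+FEFF (the signature); leave any others in place."""
    return stylesheet[1:] if stylesheet.startswith(BOM) else stylesheet


def keep_valid_declarations(block):
    """Drop each declaration containing a stray U+FEFF; keep the others."""
    # str.strip() does not remove U+FEFF, since it is not whitespace in Python.
    declarations = [d.strip() for d in block.split(";") if d.strip()]
    return [d for d in declarations if BOM not in d]


# The signature is consumed once, at the very start only.
sheet = split_signature("\ufeffp { color: red }")

# The declaration with an embedded U+FEFF is ignored, not silently repaired.
kept = keep_valid_declarations("color: red; mar\ufeffgin: 0; font-size: 12px")
```

The point of the design is that the bad declaration is dropped as a whole, exactly as for any other error in it, instead of having the U+FEFF stripped and the declaration quietly accepted.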

Regards,

-- 
François
Received on Monday, 8 December 2003 05:23:44 GMT
