Re: UTF-8 signature / BOM in CSS from François Yergeau on 2003-12-08 (www-style@w3.org from December 2003)

From: François Yergeau <francois@yergeau.com>
Date: Sun, 7 Dec 2003 21:08:58 -0500 (EST)
To: Etan Wexler <ewexler@stickdog.com>
Cc: www-style@w3.org, w3c-i18n-ig@w3.org
Message-id: <3FD3DD3D.3000209@yergeau.com>

Etan Wexler wrote:
> Yes. The codepoint U+FEFF is currently allowed in and at the start of 
> identifiers in the CSS 2 Recommendation, the CSS 2.1 Working Draft, and 
> the CSS3 syntax module Working Draft.

Ouch, that's pretty bad.  Too late for 2.0, of course, and perhaps also 
for 2.1, but please fix that in 3.0.

> The latter has a token type "BOM", 
> but since the "BOM" production comes after the "IDENT" production, a 
> U+FEFF codepoint would always end up as an "IDENT" or part of an "IDENT".

Ouch again!  What's the point of having the BOM production, then?
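
To make the ordering problem concrete, here is a minimal sketch of a lex-style scanner (longest match wins, earlier rule wins a tie). The two rules are a deliberately simplified, hypothetical stand-in for the real grammar (no escapes, no other tokens); the only point is that "nonascii" covers U+FEFF, so IDENT always gets there first.

```python
import re

# Simplified, hypothetical token rules; NOT the full CSS grammar.
# "nonascii" covers every code point from U+0080 up, so it includes U+FEFF.
NONASCII = r"[\u0080-\U0010FFFF]"
NMSTART = rf"(?:[_a-zA-Z]|{NONASCII})"
NMCHAR = rf"(?:[_a-zA-Z0-9-]|{NONASCII})"

# Order matters: as in a lex-style scanner, the earlier rule wins a tie.
RULES = [
    ("IDENT", re.compile(rf"{NMSTART}{NMCHAR}*")),
    ("BOM", re.compile("\ufeff")),
]

def first_token(text):
    """Return (name, lexeme) for the longest match; earlier rules win ties."""
    best = None
    for name, pattern in RULES:
        m = pattern.match(text)
        # Strictly longer is required, so a later rule never beats an
        # earlier rule that matched the same number of characters.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

# U+FEFF followed by letters is swallowed into one identifier.
print(first_token("\ufeffbody"))
# A lone U+FEFF matches both rules with length 1; IDENT is listed first.
print(first_token("\ufeff {"))
```

Under these assumptions the BOM rule can never fire, whatever follows the U+FEFF.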

> Maybe I'm not so dull, after all.

:-) :-)

> If we accept Chris Lilley's assertion that U+FEFF is not 
> a character, stripping occurrences of U+FEFF before tokenization seems 
> very reasonable. If U+FEFF is a character (and I don't care to enter 
> that theological debate), stripping it may still be the sensible option. 

I think U+FEFF anywhere but as a BOM should be some kind of error, but 
like Tex I'm quite allergic to just stripping it, not having access to 
the proper medication.  Smacks too much of silent error recovery and an
invitation to sloppiness.

IIRC CSS has well-defined rules for ignoring a whole rule if there's an 
error in the selector(s) and ignoring a property if there's an error in 
it.  I guess these long-standing rules should just apply, nothing new.
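
A rough sketch of what that could look like, assuming a stray U+FEFF is treated as an error (which is the suggestion here, not something the current specs say). The function name and the naive split-based parsing are hypothetical and only handle flat "selector { decl; decl }" rules; a real parser would work on tokens.

```python
BOM = "\ufeff"

def filter_sheet(text):
    """Apply the usual drop rules, treating a stray U+FEFF as an error.

    Hypothetical helper: handles only flat 'selector { decl; decl }' rules.
    """
    if text.startswith(BOM):
        text = text[1:]  # a leading U+FEFF is the signature, not content
    kept = []
    for chunk in text.split("}"):
        if "{" not in chunk:
            continue  # trailing text after the last rule
        selector, body = chunk.split("{", 1)
        if BOM in selector:
            continue  # error in the selector: ignore the whole rule
        # Error in a declaration: ignore just that declaration.
        decls = [d.strip() for d in body.split(";")
                 if d.strip() and BOM not in d]
        kept.append((selector.strip(), decls))
    return kept

sheet = ("\ufeffp { color: red; mar\ufeffgin: 0 } "
         "h\ufeff1 { color: blue } "
         "em { font-style: italic }")
print(filter_sheet(sheet))
```

Here the signature at the very start is accepted, the "margin" declaration and the whole "h1" rule are dropped, and nothing is silently repaired.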

Regards,

-- 
François

Received on Monday, 8 December 2003 05:23:44 UTC