Re: UTF-8 signature / BOM in CSS from Ernest Cline on 2003-12-08 (www-style@w3.org from December 2003)

From: Ernest Cline <ernestcline@mindspring.com>
Date: Sun, 7 Dec 2003 20:55:33 -0500
To: www-style@w3.org
Cc: w3c-i18n-ig@w3.org, "Etan Wexler" <ewexler@stickdog.com>
Message-ID: <410-22003121815533562@mindspring.com>

> [Original Message]
> From: Etan Wexler <ewexler@stickdog.com>
>
> François Yergeau wrote to <mailto:w3c-i18n-ig@w3.org>:
>
>
> > The BOM is a non-breaking space, quite the opposite of a separator.
> >
> > The BOM is really in a class in itself, my proposal was to name it 
> > explicitly in the grammar, appearing only at the very start of the 
> > stylesheet.
>
> What happens when a tokenizer finds a U+FEFF somewhere else in a style 
> sheet? The codepoint may be invalid there, granted, but the direction 
> that the CSS Working Group is heading is to specify error handling for 
> every scenario. If we accept Chris Lilley's assertion that U+FEFF is 
> not a character, stripping occurrences of U+FEFF before tokenization 
> seems very reasonable. If U+FEFF is a character (and I don't care to 
> enter that theological debate), stripping it may still be the sensible 
> option. What's the Yergeau recommendation? The Davis recommendation?

Well, my recommendation is that, aside from special-casing its use as
the BOM at the start of data, U+FEFF be treated the same as any other
class Cf character, such as the soft hyphen. Doing otherwise risks
invalidating under CSS3 a stylesheet that was perfectly acceptable
(if a bit odd) under previous versions of CSS. Given the nature of CSS,
that is a big no-no. Making stuff that was acceptable earlier
unacceptable should only be done when there is a compelling
reason to do so. Other than the theological debate over whether U+FEFF
is a character, I see no reason to do so, and that reason is not
compelling to me.
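
To make that concrete, here is a rough Python sketch of the handling
I have in mind. It is only an illustration under my own assumptions:
the names strip_leading_bom and is_format_char are hypothetical and
are not taken from any CSS specification or implementation. A single
U+FEFF at the very start is dropped as the signature, and any later
occurrence is left in place, to be handled like any other class Cf
character.

```python
import unicodedata

BOM = "\ufeff"  # U+FEFF, used as the UTF-8 signature / byte order mark


def strip_leading_bom(css_text: str) -> str:
    """Drop one U+FEFF at the very start of the style sheet text.

    Occurrences anywhere else are deliberately left untouched.
    """
    if css_text.startswith(BOM):
        return css_text[len(BOM):]
    return css_text


def is_format_char(ch: str) -> bool:
    """True if the character is in Unicode general category Cf."""
    return unicodedata.category(ch) == "Cf"


if __name__ == "__main__":
    sheet = BOM + "p { color: red }"
    print(repr(strip_leading_bom(sheet)))
    # U+FEFF and the soft hyphen (U+00AD) share the same general category.
    print(is_format_char(BOM), is_format_char("\u00ad"))
```

What a tokenizer then does with a Cf character in mid-stream is a
separate question; the sketch only shows that the start-of-data case
can be special-cased without stripping U+FEFF everywhere.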

Received on Sunday, 7 December 2003 20:55:32 UTC