Re: UTF-8 signature / BOM in CSS from Etan Wexler on 2003-12-07 (www-international@w3.org from October to December 2003)

From: Etan Wexler <ewexler@stickdog.com>
Date: Sat, 6 Dec 2003 19:08:19 -0800
To: François Yergeau <francois@yergeau.com>, Chris Lilley <chris@w3.org>, David Baron <dbaron@dbaron.org>, www-international@w3.org, w3c-css-wg@w3.org, w3c-i18n-ig@w3.org, www-style@w3.org
Message-Id: <9C0E219C-2862-11D8-9BCD-000502CB1B77@stickdog.com>

François Yergeau wrote to <mailto:www-international@w3.org>, 
<mailto:w3c-css-wg@w3.org>, <mailto:w3c-i18n-ig@w3.org>, and 
<mailto:www-style@w3.org> on 6 December 2003 in "Re: UTF-8 signature / 
BOM in CSS" (<mid:3FD23453.6000009@yergeau.com>):

> [...] another way is to consider [the BOM] a character and to bring it 
> squarely in the grammar of a language, like I proposed recently for 
> CSS:
>
>  EncodingDecl = [BOM][@charset=<foobar>]
>
> with the additional constraint that EncodingDecl must occur at the 
> start of the stylesheet.
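
For concreteness, the quoted production could be sketched roughly as below. This is only an illustration under assumptions: it works on already-decoded text, it uses the CSS2 form `@charset "name";` rather than the informal `@charset=<foobar>` notation above, and the names `ENCODING_DECL` and `split_encoding_decl` are hypothetical.

```python
import re

# Optional BOM (U+FEFF), then an optional @charset rule, anchored at the
# very start of the style sheet because re.match only matches at position 0.
# Assumption: the CSS2 spelling '@charset "name";' is used here.
ENCODING_DECL = re.compile(r'(\ufeff)?(?:@charset "([^"]*)";)?')


def split_encoding_decl(text):
    """Return (has_bom, charset_name_or_None, rest_of_style_sheet)."""
    m = ENCODING_DECL.match(text)  # always matches; both parts are optional
    return m.group(1) is not None, m.group(2), text[m.end():]


if __name__ == "__main__":
    print(split_encoding_decl('\ufeff@charset "utf-8";body{}'))
    print(split_encoding_decl('body{}'))
```

A U+FEFF anywhere other than the first position is left in the remainder untouched, which is where the tokenization question raised in the reply comes in.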

Is the BOM to be considered an identifier character? That's possible. 
Then an identifier consisting solely of one U+FEFF would be allowed at 
the beginning of a style sheet. But the codepoint U+FEFF could just as 
well be tokenized as its own type and grouped with "S" (space tokens) 
and comments as a separator of other tokens. This latter approach is 
not backwards compatible in a formal sense, but how many existing 
Cascading Style Sheets make use of U+FEFF in identifiers? About zero, 
I'd guess.
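
To make the two options concrete, here is a minimal sketch of a toy scanner that can run either way. It is not the CSS grammar: the token set is drastically reduced, the identifier rule assumes that any non-ASCII code point is an identifier character, and the names `build_scanner` and `tokenize` are hypothetical.

```python
import re

BOM = "\ufeff"


def build_scanner(bom_as_separator):
    """Compile a toy scanner; the flag selects how U+FEFF is classified."""
    if bom_as_separator:
        # Every non-ASCII code point except U+FEFF may appear in identifiers.
        nonascii = r"\u0080-\ufefe\uff00-\U0010FFFF"
    else:
        # U+FEFF is an ordinary (non-ASCII) identifier character.
        nonascii = r"\u0080-\U0010FFFF"
    rules = [
        ("COMMENT", r"/\*.*?\*/"),
        ("S", r"[ \t\r\n\f]+"),
    ]
    if bom_as_separator:
        rules.append(("BOM", BOM))  # its own token type, grouped with S
    rules.append(("IDENT", rf"[A-Za-z_{nonascii}][A-Za-z0-9_\-{nonascii}]*"))
    rules.append(("DELIM", r"."))  # any other single character
    pattern = "|".join(f"(?P<{name}>{pat})" for name, pat in rules)
    return re.compile(pattern, re.DOTALL)


def tokenize(text, bom_as_separator):
    """Return a list of (token_type, lexeme) pairs covering the whole input."""
    scanner = build_scanner(bom_as_separator)
    return [(m.lastgroup, m.group()) for m in scanner.finditer(text)]


if __name__ == "__main__":
    sheet = '\ufeff@charset "utf-8";'
    print(tokenize(sheet, bom_as_separator=False)[0])  # BOM as identifier
    print(tokenize(sheet, bom_as_separator=True)[0])   # BOM as separator
```

Under the first option the leading U+FEFF comes out as a one-character IDENT; under the second it is a BOM token that a parser could skip the same way it skips S and comments. The formal incompatibility shows up in an input such as "a", U+FEFF, "b", which is one identifier under the first option and two under the second.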

-- 
Etan Wexler.

Received on Saturday, 6 December 2003 22:07:34 UTC