Re: UTF-8 signature / BOM in CSS

From: François Yergeau <francois@yergeau.com>
Date: Sat, 06 Dec 2003 14:56:03 -0500
To: Chris Lilley <chris@w3.org>
Cc: Etan Wexler <ewexler@stickdog.com>, Tex Texin <tex@i18nguy.com>, Richard Ishida <ishida@w3.org>, www-international@w3.org, w3c-css-wg@w3.org, w3c-i18n-ig@w3.org, www-style@w3.org
Message-id: <3FD23453.6000009@yergeau.com>

Chris Lilley wrote:
> Almost correct. There are various byte sequences, all of which encode
> U+FEFF, which is a byte order mark and not a character.

That's one way to see it, but another way is to consider it a character 
and to bring it squarely in the grammar of a language, like I proposed 
recently for CSS:

  EncodingDecl = [BOM][@charset=<foobar>]

with the additional constraint that EncodingDecl must occur at the start 
of the stylesheet.
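Here is a minimal Python sketch of how a parser might apply this rule, treating the BOM as an ordinary leading character. It assumes the bytes have already been decoded without stripping the BOM. Python's 'utf-8' codec keeps U+FEFF, while 'utf-8-sig' removes it. The names parse_encoding_decl and EncodingDecl are hypothetical. The @charset pattern assumes the exact CSS 2.1 form, @charset "name"; and is an illustration, not a conforming CSS tokenizer.

```python
import re
from typing import NamedTuple, Optional

BOM = "\ufeff"
# Assumed exact form (as in CSS 2.1): @charset "name";
CHARSET_RE = re.compile(r'@charset "([^"]*)";')


class EncodingDecl(NamedTuple):
    has_bom: bool           # was U+FEFF the first character?
    charset: Optional[str]  # encoding name from @charset, if present
    rest: str               # the stylesheet after the EncodingDecl


def parse_encoding_decl(sheet: str) -> EncodingDecl:
    """Parse EncodingDecl = [BOM][@charset ...] at the very start of sheet."""
    pos = 0
    has_bom = sheet.startswith(BOM)
    if has_bom:
        pos = 1
    # match() is anchored at pos, so a BOM or @charset that appears
    # later in the sheet is not treated as part of the EncodingDecl.
    m = CHARSET_RE.match(sheet, pos)
    charset = None
    if m:
        charset = m.group(1)
        pos = m.end()
    return EncodingDecl(has_bom, charset, sheet[pos:])


if __name__ == "__main__":
    raw = b'\xef\xbb\xbf@charset "UTF-8";\nbody { color: red }'
    print(parse_encoding_decl(raw.decode("utf-8")))
```

Because the BOM is just the first optional item of the grammar, no separate pre-parsing layer between bytes and characters is needed to account for it.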

The BOM is a pretty mysterious beast for many, with a somewhat fuzzy 
status, and the above has the advantage of making it and its role 
explicit, instead of living in some strange layer somewhere between
byte sequences and character sequences.

