Re: UTF-8 signature / BOM in CSS from François Yergeau on 2003-12-06 (www-international@w3.org from October to December 2003)

From: François Yergeau <francois@yergeau.com>
Date: Sat, 06 Dec 2003 14:56:03 -0500
To: Chris Lilley <chris@w3.org>
Cc: Etan Wexler <ewexler@stickdog.com>, Tex Texin <tex@i18nguy.com>, Richard Ishida <ishida@w3.org>, www-international@w3.org, w3c-css-wg@w3.org, w3c-i18n-ig@w3.org, www-style@w3.org
Message-id: <3FD23453.6000009@yergeau.com>

Chris Lilley wrote:
> Almost correct. There are various byte sequences, all of which encode
> U+FEFF, which is a byte order mark and not a character.

That's one way to see it, but another way is to consider it a character
and to bring it squarely into the grammar of a language, as I proposed
recently for CSS:

  EncodingDecl = [BOM][@charset=<foobar>]

with the additional constraint that EncodingDecl must occur at the start 
of the stylesheet.
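
To make that concrete, here is a rough Python sketch of how a parser
might recognise such a production once the bytes have already been
decoded to characters. It is only an illustration under assumptions:
the function name parse_encoding_decl and the returned dictionary are
hypothetical, and "@charset=<foobar>" above is read as shorthand for
the CSS2 form @charset "foobar"; rather than as literal syntax.

```python
import re

# Optional U+FEFF, then an optional @charset rule. Both parts are optional,
# mirroring EncodingDecl = [BOM][@charset=<foobar>].
# Assumption: the rule is spelled in the CSS2 form  @charset "name";
_ENCODING_DECL = re.compile(r'(?P<bom>\ufeff)?(?:@charset "(?P<name>[^"]*)";)?')


def parse_encoding_decl(stylesheet: str) -> dict:
    """Split a decoded stylesheet into its EncodingDecl and the remainder.

    re.match anchors at position 0, which enforces the constraint that
    EncodingDecl may only occur at the very start of the stylesheet.
    """
    m = _ENCODING_DECL.match(stylesheet)  # always matches, possibly empty
    return {
        "has_bom": m.group("bom") is not None,
        "charset": m.group("name"),      # None when no @charset rule is present
        "rest": stylesheet[m.end():],    # what the ordinary CSS grammar then sees
    }
```

Calling parse_encoding_decl('\ufeff@charset "utf-8";body{}') reports the
BOM and the charset name and hands only 'body{}' on to the rest of the
parser. A U+FEFF anywhere other than position 0 is not consumed here,
which is the point of putting it in the grammar.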

The BOM is a pretty mysterious beast for many, with a somewhat fuzzy
status, and the above has the advantage of making it and its role
explicit, instead of leaving it in some strange layer somewhere between
byte sequences and character sequences.

-- 
François

Received on Saturday, 6 December 2003 14:58:04 UTC