Re: UTF-8 signature / BOM in CSS from François Yergeau on 2003-12-08 (www-international@w3.org from October to December 2003)

From: François Yergeau <francois@yergeau.com>
Date: Sun, 07 Dec 2003 20:45:12 -0500
To: Etan Wexler <ewexler@stickdog.com>
Cc: www-international@w3.org, www-style@w3.org
Message-id: <3FD3D7A8.2030608@yergeau.com>

Etan Wexler wrote:
> That given,
> does my processing model make sense? I'll repeat for convenience:
> 
> The encoding scheme has been detected and noted, including any 
> significant endian-ness. No BOM is necessary for the tokenizer. The BOM 
> is stripped from the internal representation of the style sheet. The 
> remaining byte stream moves along to the tokenizer. The tokenizer 
> consults the noted encoding scheme in order to properly interpret the 
> bytes.

It does make sense, IMHO.  It has the advantage of cleanliness that you 
point out (cleanliness from the PoV of the parser, which doesn't see a 
character that has become meaningless at this point).
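
As a rough illustration of the quoted processing model, here is a minimal
Python sketch: detect the encoding scheme from the BOM, note it, strip the
BOM, and hand only the remaining bytes to the decoding step that feeds the
tokenizer. The names BOMS, detect_and_strip and decode_for_tokenizer are
hypothetical, and the UTF-8 fallback when no BOM is present is an assumption
made for the example; a real CSS processor would also consult @charset and
any transport-level charset information.

```python
import codecs

# Hypothetical lookup table for illustration. The UTF-32 BOMs are listed
# first because the UTF-32LE BOM starts with the same two bytes as the
# UTF-16LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_and_strip(data: bytes, default: str = "utf-8"):
    """Return (encoding, remaining bytes) with any leading BOM removed."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            # The encoding scheme, endianness included, is noted here;
            # the BOM itself is dropped from the byte stream.
            return encoding, data[len(bom):]
    # Assumption: fall back to a default when there is no BOM.
    return default, data


def decode_for_tokenizer(data: bytes) -> str:
    """Decode the BOM-free bytes using the noted encoding scheme."""
    encoding, body = detect_and_strip(data)
    return body.decode(encoding)
```

With this arrangement the tokenizer never sees U+FEFF at the start of the
style sheet, which is the cleanliness referred to above.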

But it loses the advantage of making the BOM more explicit and therefore 
less mysterious.  Stripping the BOM may also cause problems for 
processes that depend on the "file" remaining intact, such as digital 
signatures or more simply just character counting.  Not an 
insurmountable obstacle of course, but an issue to keep in mind before 
giving too much weight to cleanliness for just one process.
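
To make the concern concrete, here is a small Python sketch comparing a
style sheet with and without its UTF-8 signature. The sample content is made
up for the example, and SHA-256 merely stands in for whatever digest a
signature scheme might compute over the file.

```python
import codecs
import hashlib

# Made-up sample: a tiny style sheet that begins with the UTF-8 signature.
original = codecs.BOM_UTF8 + b"p { color: red }\n"
stripped = original[len(codecs.BOM_UTF8):]

# A digest computed over the stored file no longer matches once the
# BOM has been stripped.
digest_original = hashlib.sha256(original).hexdigest()
digest_stripped = hashlib.sha256(stripped).hexdigest()

# Counts drift as well: three bytes, or one character (U+FEFF). The plain
# "utf-8" codec keeps the BOM as a character, unlike "utf-8-sig".
byte_difference = len(original) - len(stripped)
char_difference = len(original.decode("utf-8")) - len(stripped.decode("utf-8"))
```

Any process that verifies the digest or reports character offsets against
the original file would have to account for that difference.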

Regards,

-- 
François

Received on Sunday, 7 December 2003 20:53:40 UTC