Re: UTF-8 signature / BOM in CSS

On Friday, December 5, 2003, 10:30:37 PM, Etan wrote:

EW> Tex Texin wrote on 2 December 2003 in "Re: UTF-8 signature /
EW> BOM in CSS":

>> I am not sure I would agree with stripping non-characters. I would
>> rather reject documents with junk in them than silently clean them up.

EW> I used to be of the junk-rejection mentality. Ian Hickson, time, and
EW> probably some brain-altering medication have convinced me of the case
EW> for parsing at all costs.

Probably the influence of too much HTML.

I refer you to the TAG Architecture document:

Principle: Error recovery

  Silent recovery from error is harmful.

>> In the case of the UTF-8 BOM, I would not object to simply stripping
>> it,

The BOM is not an error. Nor is it a character, invalid or otherwise.

Invalid characters are errors.

These should be treated separately.
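
To make the distinction concrete, here is a minimal sketch in Python of handling the two cases separately. It is an illustration only, not what any CSS specification mandates: the function name decode_stylesheet is hypothetical, the input is assumed to be UTF-8, and treating Unicode noncharacters as the "invalid characters" is my assumption for the example.

```python
import codecs


def is_noncharacter(cp: int) -> bool:
    # Unicode noncharacters: U+FDD0..U+FDEF, plus the last two
    # code points of every plane (U+xxFFFE and U+xxFFFF).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def decode_stylesheet(raw: bytes) -> str:
    """Hypothetical helper: accept a leading UTF-8 signature, reject junk."""
    # Case 1: a leading signature is not an error; drop it quietly.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]

    # Case 2: malformed byte sequences are errors. Strict decoding
    # raises UnicodeDecodeError (a subclass of ValueError).
    text = raw.decode("utf-8", errors="strict")

    # Case 2, continued: noncharacters are reported, not silently removed.
    for index, ch in enumerate(text):
        if is_noncharacter(ord(ch)):
            raise ValueError(f"noncharacter U+{ord(ch):04X} at index {index}")
    return text
```

The signature is removed without complaint, while anything actually invalid surfaces as an exception rather than being cleaned up silently.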

EW> I assumed that the CSS engine would make use of out-of-band information
EW> to indicate the detected encoding scheme.

Please check the definition of that out-of-band information, in
particular what it says about when a BOM must be present.

EW> They're not just based on U+FEFF, they are U+FEFF. There are various
EW> byte sequences, yes, but each encodes the same character.

Almost correct. There are various byte sequences, all of which encode
U+FEFF, which is a byte order mark and not a character.
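
For reference, a small sketch in Python listing those byte sequences. The constants come from the standard codecs module; the dictionary name BOMS and the function decoded_boms are just names chosen for this example.

```python
import codecs

# The encoded forms of U+FEFF in each encoding scheme.
BOMS = {
    "utf-8": codecs.BOM_UTF8,          # EF BB BF
    "utf-16-be": codecs.BOM_UTF16_BE,  # FE FF
    "utf-16-le": codecs.BOM_UTF16_LE,  # FF FE
    "utf-32-be": codecs.BOM_UTF32_BE,  # 00 00 FE FF
    "utf-32-le": codecs.BOM_UTF32_LE,  # FF FE 00 00
}


def decoded_boms() -> dict:
    # Decode each byte sequence with its own scheme. These codecs do
    # not consume a signature, so the code point itself comes back.
    return {name: raw.decode(name) for name, raw in BOMS.items()}


if __name__ == "__main__":
    for name, text in decoded_boms().items():
        print(f"{name:10} {BOMS[name].hex(' ')} -> U+{ord(text):04X}")
```

Every entry decodes to the single code point U+FEFF; only the bytes differ.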


Received on Saturday, 6 December 2003 10:48:23 UTC