Re: UTF-8 signature / BOM in CSS

On Friday, December 5, 2003, 10:30:37 PM, Etan wrote:


EW> Tex Texin wrote to <mailto:www-international@w3.org>, 
EW> <mailto:w3c-css-wg@w3.org>, <mailto:w3c-i18n-ig@w3.org>, and 
EW> <mailto:www-style@w3.org> on 2 December 2003 in "Re: UTF-8 signature /
EW> BOM in CSS" (<mid:3FCD6609.7C5A8F4F@i18nguy.com>):

>> I am not sure I would agree with stripping non-characters. I would
>> rather reject documents with junk in them than silently clean them up.

EW> I used to be of the junk-rejection mentality. Ian Hickson, time, and
EW> probably some brain-altering medication have convinced me of the case
EW> for parsing at all costs.

Probably the influence of too much HTML.

I refer you to the TAG Architecture document
http://www.w3.org/TR/webarch/#error-handling

Principle: Error recovery

  Silent recovery from error is harmful.



>> In the case of the UTF-8 BOM, I would not object to simply stripping
>> it,

The BOM is not an error. Nor is it a character, invalid or otherwise.

Invalid characters are errors.

These should be treated separately.
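
To make the distinction concrete, here is a rough sketch (an illustration
only, not what any CSS engine or specification actually does). It assumes
UTF-8 input, and it assumes "invalid" means either malformed bytes or
Unicode noncharacters. The names decode_stylesheet and is_noncharacter
are made up for this example.

```python
import codecs


def is_noncharacter(cp: int) -> bool:
    # Unicode noncharacters: U+FDD0..U+FDEF, plus the last two code
    # points of every plane (U+xxFFFE and U+xxFFFF).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def decode_stylesheet(raw: bytes) -> str:
    # Hypothetical helper. A leading UTF-8 signature is not an error,
    # so it is dropped before decoding.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    # Strict decoding: malformed byte sequences raise UnicodeDecodeError
    # instead of being silently repaired.
    text = raw.decode("utf-8", errors="strict")
    # Assumption: noncharacters are rejected rather than stripped.
    for ch in text:
        if is_noncharacter(ord(ch)):
            raise ValueError(f"noncharacter U+{ord(ch):04X} in input")
    return text
```

The signature is handled in one place and the errors in another, so a
policy change for one does not affect the other.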

EW> I assumed that the CSS engine would make use of out-of-band information
EW> to indicate the detected encoding scheme.

Please check the definition of that out-of-band information, in
particular what it says about when a BOM must be present.


EW> They're not just based on U+FEFF, they are U+FEFF. There are various
EW> byte sequences, yes, but each encodes the same character.

Almost correct. There are various byte sequences, all of which encode
U+FEFF, which is a byte order mark and not a character.
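
For reference, a small sketch that lists those byte sequences by
serializing the code point U+FEFF under each encoding scheme. It uses
only Python's standard codecs; the names SIGNATURES and bom_bytes are
made up for this example.

```python
BOM = "\ufeff"  # the code point U+FEFF


def bom_bytes() -> dict:
    # The explicit-endian codecs do not prepend a BOM of their own, so
    # each result is exactly the serialization of U+FEFF.
    schemes = ["utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"]
    return {name: BOM.encode(name) for name in schemes}


SIGNATURES = bom_bytes()

if __name__ == "__main__":
    for name, raw in SIGNATURES.items():
        print(f"{name:10} {raw.hex(' ')}")
```

In UTF-8 this gives EF BB BF, in UTF-16 FE FF or FF FE depending on byte
order, and in UTF-32 00 00 FE FF or FF FE 00 00.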


-- 
 Chris                            mailto:chris@w3.org

Received on Saturday, 6 December 2003 10:48:23 UTC