- From: Chris Lilley <chris@w3.org>
- Date: Sat, 6 Dec 2003 16:48:22 +0100
- To: Etan Wexler <ewexler@stickdog.com>
- Cc: Tex Texin <tex@i18nguy.com>, Richard Ishida <ishida@w3.org>, www-international@w3.org, <w3c-css-wg@w3.org>, <w3c-i18n-ig@w3.org>, <www-style@w3.org>
On Friday, December 5, 2003, 10:30:37 PM, Etan wrote:

EW> Tex Texin wrote to>>, <mailto:www-international@w3.org>,
EW> <mailto:w3c-css-wg@w3.org>, <mailto:w3c-i18n-ig@w3.org>, and
EW> <mailto:www-style@w3.org> on 2 December 2003 in "Re: UTF-8 signature /
EW> BOM in CSS" (<mid:3FCD6609.7C5A8F4F@i18nguy.com>):

>> I am not sure I would agree with stripping non-characters. I would
>> rather reject documents with junk in them than silently clean them up.

EW> I used to be of the junk-rejection mentality. Ian Hickson, time, and
EW> probably some brain-altering medication have convinced me of the case
EW> for parsing at all costs.

Probably the influence of too much HTML. I refer you to the TAG
Architecture document

http://www.w3.org/TR/webarch/#error-handling

  Principle: Error recovery
  Silent recovery from error is harmful.

>> In the case of the UTF-8 BOM, I would not object to simply stripping
>> it,

The BOM is not an error. Nor is it a character, invalid or otherwise.
Invalid characters are errors. These should be treated separately.

EW> I assumed that the CSS engine would make use of out-of-band information
EW> to indicate the detected encoding scheme.

Please check the definition of that out-of-band information, in
particular what it says about when a BOM must be present.

EW> They're not just based on U+FEFF, they are U+FEFF. There are various
EW> byte sequences, yes, but each encodes the same character.

Almost correct. There are various byte sequences, all of which encode
U+FEFF, which is a byte order mark and not a character.

--
 Chris                            mailto:chris@w3.org
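
To make the point about the various byte sequences concrete, here is a minimal Python sketch. It lists the byte sequences that encode U+FEFF in the UTF-8, UTF-16 and UTF-32 encoding schemes, strips a leading one, and then decodes strictly so that malformed input is rejected rather than silently cleaned up. The names `BOMS`, `split_bom` and `decode_stylesheet` and the UTF-8 fallback are assumptions made for this illustration only; this is not how any particular CSS engine is specified to behave.

```python
import codecs

# Byte sequences that encode U+FEFF, one per encoding scheme.
# UTF-32 is checked before UTF-16 because the UTF-32LE sequence
# (FF FE 00 00) begins with the UTF-16LE sequence (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def split_bom(data: bytes):
    """Return (encoding, rest) if data starts with a BOM, else (None, data)."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):]
    return None, data


def decode_stylesheet(data: bytes, fallback: str = "utf-8") -> str:
    """Strip a leading BOM, then decode strictly.

    The BOM is treated as a signature and removed without complaint.
    Malformed byte sequences are a separate matter: errors="strict"
    raises UnicodeDecodeError instead of repairing them.
    The fallback encoding is an assumption of this sketch.
    """
    encoding, body = split_bom(data)
    return body.decode(encoding or fallback, errors="strict")


if __name__ == "__main__":
    # Every sequence in the table decodes to the same code point.
    for bom, encoding in BOMS:
        print(encoding, bom.hex(), repr(bom.decode(encoding)))
```

Keeping BOM removal and error handling in separate steps mirrors the distinction drawn above: the signature is expected and stripped, while invalid input surfaces as an exception.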
Received on Saturday, 6 December 2003 10:48:23 UTC