Re: [CSS21] response to issue 115 (and 44) from Jukka K. Korpela on 2004-02-23 (www-style@w3.org from February 2004)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Mon, 23 Feb 2004 16:09:43 +0200 (EET)
To: WWW Style <www-style@w3.org>
Message-ID: <Pine.GSO.4.58.0402231601250.9066@korppi.cs.tut.fi>

On Mon, 23 Feb 2004, Henri Sivonen wrote:

> On Feb 21, 2004, at 00:26, Bert Bos wrote:
>
> >  4) If all else fails, assume UTF-8.
>
> Why not windows-1252 (with the few undefined bytes mapped to
> *something* so that all byte streams can be converted to some
> "characters")?

Either guess is bound to be wrong in some cases. And if the guess turns
out to result in something containing undefined octets, I think we can
fairly safely conclude that the guess was wrong.
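
A minimal sketch of that point in Python, not taken from any browser: a
strict decode fails as soon as the data contains an octet sequence that is
undefined in the guessed encoding. The helper name decodes_cleanly is
hypothetical, and the sketch assumes Python's "cp1252" codec as a stand-in
for windows-1252 (it leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined).

```python
def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if a strict decode of data in the guessed encoding succeeds."""
    try:
        data.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        # Undefined or malformed octets: the guess was probably wrong.
        return False
    return True


# A lone 0xE4 is "ä" in windows-1252 but a truncated sequence in UTF-8.
print(decodes_cleanly(b"\xe4", "utf-8"))
print(decodes_cleanly(b"\xe4", "cp1252"))
# 0x81 is one of the octets Python's cp1252 codec leaves undefined.
print(decodes_cleanly(b"\x81", "cp1252"))
# UTF-8 "ä" (0xC3 0xA4) also decodes under cp1252, just to the wrong characters.
print(decodes_cleanly(b"\xc3\xa4", "cp1252"))
```

A failed decode is good evidence that the guess was wrong, but as the last
call shows, a successful decode does not prove that the guess was right.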

> Anyway, it's just
> plain stupid to use non-ASCII outside comments in a style sheet that
> doesn't have a character encoding label and doesn't have a BOM, so in
> the relatively rare cases where this heuristic fails, the author would
> have only him/herself to blame.

Indeed. And currently most style sheets contain ASCII only.

This is all about error processing, unless I'm missing something.
And it seems that it's about a small minority of cases (_within_ the
current minority of style sheets for which this is relevant at all).
I think it would be best to simply state that if the encoding cannot
be determined in the three given steps, browsers
a) may apply whatever error processing they find suitable
b) should assume ASCII, if the style sheet
contains only octets with the most significant bit set to zero.
(Should browsers give a warning? Maybe. But this is debatable,
and need not be stated in a specification.)
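
Rule b) could be sketched as follows in Python. This is only an
illustration under stated assumptions: the function name fallback_encoding
is hypothetical, and returning None stands for rule a), where the choice of
error processing is left to the browser.

```python
from typing import Optional


def fallback_encoding(data: bytes) -> Optional[str]:
    """Fallback for a style sheet whose encoding could not be determined.

    Returns "ascii" when every octet has its most significant bit set to
    zero (value below 0x80); otherwise returns None, leaving the error
    processing to the caller.
    """
    if all(octet < 0x80 for octet in data):
        return "ascii"
    return None


print(fallback_encoding(b"body { color: red }"))
print(fallback_encoding(b"q:before { content: '\xe4' }"))
```

The first call sees only octets below 0x80; the second contains 0xE4, so the
sketch declines to name an encoding.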

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Monday, 23 February 2004 09:09:45 UTC