Re: [CSS21] response to issue 115 (and 44)

On Tue, 24 Feb 2004, Henri Sivonen wrote:

> On Feb 23, 2004, at 22:27, Jungshik Shin wrote:
>
> >>>>  4) If all else fails, assume UTF-8.

> comments contain non-ASCII bytes that don't form valid UTF-8 sequences,
> the CSS spec needs to require either a recovering UTF-8 decoder or a
> default encoding that otherwise makes all byte streams valid.
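To make Henri's two options concrete, here is a minimal Python sketch. It is not from the CSS spec, and the sample bytes and the `decode_sheet` helper are hypothetical. A strict UTF-8 decoder rejects a stray Windows-1252 byte such as 0xE9 in a comment. A recovering decoder substitutes U+FFFD and keeps going. An ISO-8859-1 default maps every one of the 256 byte values, so no byte stream is ever invalid.

```python
# Hypothetical style sheet bytes: "café" saved as Windows-1252 inside a comment.
SHEET = b"/* caf\xe9 */\nbody { color: black }"

def decode_sheet(data: bytes, mode: str) -> str:
    """Decode style sheet bytes under one of three illustrative policies."""
    if mode == "strict-utf8":
        # Raises UnicodeDecodeError on the malformed 0xE9 sequence.
        return data.decode("utf-8")
    if mode == "recovering-utf8":
        # Replaces each malformed sequence with U+FFFD and continues.
        return data.decode("utf-8", errors="replace")
    if mode == "latin1":
        # ISO-8859-1 maps all 256 byte values, so decoding cannot fail.
        return data.decode("iso-8859-1")
    raise ValueError(f"unknown mode: {mode}")
```

Either way the rule after the comment survives. Only the strict decoder loses the whole sheet.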

  Note that '#4' is only the last resort. Assuming the character
encoding of the linking document usually works (when a stylesheet is
linked from an HTML or XML document).
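The precedence under discussion might be sketched as follows. This is a hypothetical illustration: steps 1 to 3 of the proposal are not quoted in full here, so they are collapsed into a single `explicit_label` parameter. The function and parameter names are invented for this example.

```python
from typing import Optional

def choose_encoding(explicit_label: Optional[str],
                    linking_doc_encoding: Optional[str]) -> str:
    """Hypothetical fallback chain for picking a stylesheet's encoding."""
    # Earlier steps: any explicit label wins (steps 1-3 not detailed here).
    if explicit_label:
        return explicit_label
    # Heuristic: inherit the encoding of the HTML/XML document that links it.
    if linking_doc_encoding:
        return linking_doc_encoding
    # Step 4: if all else fails, assume UTF-8.
    return "utf-8"
```

So an unlabeled sheet linked from an EUC-KR page is read as EUC-KR, and UTF-8 is used only when there is neither a label nor a linking document.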


> >> Indeed. And currently most style sheets contain ASCII only.
> >
> >   True in Western Europe and most other parts of the world. Not true in
> > Japan, China and Korea. I'm not talking about comments here. A number
> > of stylesheets list font-family names in Chinese, Japanese and Korean
> > in legacy encodings (GB2312, Big5, Shift_JIS, EUC-JP, EUC-KR, etc.).
>
> So why on earth don't they label their style sheets with the
> appropriate character encoding label? The UTF-8 default guess does not
> help at all with GB2312, Big5, Shift_JIS, EUC-JP, EUC-KR, etc.

  As already pointed out by others, for exactly the same reason that
many Western European stylesheets are not properly labeled as
'ISO-8859-1' or 'Windows-1252' even though they contain non-ASCII
characters (albeit only in comments). For cases like this, Boris's
heuristic of assuming the encoding of the linking document helps a lot.
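A small Python sketch shows why the linking-document heuristic matters for the CJK font-family case. The rule below and the `decode_with` helper are hypothetical. A GB2312-encoded font name decoded with the UTF-8 last-resort guess does not come back intact. Decoded with the (assumed) GB2312 encoding of the linking page, it round-trips exactly.

```python
# Hypothetical rule: a Chinese font-family name saved in GB2312, unlabeled.
RULE = 'body { font-family: "宋体" }'
raw = RULE.encode("gb2312")

def decode_with(data: bytes, encoding: str) -> str:
    # errors="replace" so a wrong guess yields U+FFFD instead of an exception.
    return data.decode(encoding, errors="replace")

as_utf8 = decode_with(raw, "utf-8")      # the UTF-8 last-resort guess
as_linking = decode_with(raw, "gb2312")  # the linking document's encoding
print(as_utf8)
print(as_linking)
```

The ASCII parts of the rule survive either way. Only the font name depends on guessing the right legacy encoding.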

> For the cases you're using as the counter examples for windows-1252,
> UTF-8 is a wrong guess, too.

  Absolutely, but at least it's not 'culturally biased' the way
'Windows-1252' is. I admit that this may sound silly, but we're dealing
with an I18N issue here. Moreover, the UTF-8 default only comes into
play when everything else fails.

  Jungshik

Received on Tuesday, 24 February 2004 03:25:40 UTC