- From: Jungshik Shin <jshin@i18nl10n.com>
- Date: Tue, 24 Feb 2004 17:25:37 +0900 (KST)
- To: www-style@w3.org
On Tue, 24 Feb 2004, Henri Sivonen wrote:

> On Feb 23, 2004, at 22:27, Jungshik Shin wrote:
>
> >>>> 4) If all else fails, assume UTF-8.
>
> comments contain non-ASCII bytes that don't form valid UTF-8 sequences,
> the CSS spec needs to require either a recovering UTF-8 decoder or a
> default encoding that otherwise makes all bytes streams valid.

Note that '#4' was the last resort. Assuming the character encoding of
the linking document usually works (when stylesheets are associated with
html/xml documents).

> >> Indeed. And currently most style sheets contain Ascii only.
> >
> > True in Western Europe and most other parts of the world. Not true in
> > Japan, China and Korea. I'm not talking about comments here. A number
> > of stylesheets list font-family names in Chinese, Japanese and Korean
> > in legacy encodings (GB2312, Big5, Shift_JIS, EUC-JP, EUC-KR, etc).
>
> So why on earth don't they label their style sheets with the
> appropriate character encoding label? The UTF-8 default guess does not
> help at all with GB2312, Big5, Shift_JIS, EUC-JP, EUC-KR, etc.

As already pointed out by others, for exactly the same reason that many
Western European stylesheets are not properly tagged as 'ISO-8859-1' or
'Windows-1252' even though they contain non-ASCII characters, albeit only
in comments. For cases like this, Boris' heuristic of assuming the
encoding of the 'linking document' helps a lot.

> For the cases you're using as the counter examples for windows-1252,
> UTF-8 is a wrong guess, too.

Absolutely, but at least it's not 'culturally biased' the way
'Windows-1252' is. I admit that this may sound silly, but we're dealing
with an I18N issue here. Moreover, the 'utf-8 default' only comes into
play when everything else fails.

Jungshik
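
To make the fallback order concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not what the CSS spec or any browser does: the function name `decode_stylesheet` and its parameters are hypothetical, and the only steps taken from this thread are "use the encoding of the linking document" and "if all else fails, assume UTF-8" with a recovering decoder.

```python
def decode_stylesheet(raw, declared=None, linking_doc_encoding=None):
    """Hypothetical helper: decode stylesheet bytes using a fallback chain.

    Order (an assumption for illustration):
      1. an explicit encoding label, if one was supplied
      2. the encoding of the linking html/xml document
      3. last resort: UTF-8 with a recovering decoder
    Returns a (text, encoding_used) tuple.
    """
    for enc in (declared, linking_doc_encoding):
        if not enc:
            continue
        try:
            return raw.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            # Bytes invalid for this encoding, or unknown label: try the next one.
            continue
    # Recovering decoder: invalid sequences become U+FFFD instead of failing,
    # so every byte stream yields some result.
    return raw.decode("utf-8", errors="replace"), "utf-8"
```

With this ordering, an unlabeled EUC-KR stylesheet linked from an EUC-KR page decodes correctly at step 2, and the UTF-8 guess is reached only when neither a label nor a linking document is available.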
Received on Tuesday, 24 February 2004 03:25:40 UTC