Re: [CSS21] response to issue 115 (and 44) from Jungshik Shin on 2004-02-23 (www-style@w3.org from February 2004)

From: Jungshik Shin <jshin@i18nl10n.com>
Date: Tue, 24 Feb 2004 05:46:27 +0900 (KST)
To: www-style@w3.org
Message-ID: <Pine.LNX.4.58.0402240529560.9245@jshin.net>
On Mon, 23 Feb 2004, Chris Lilley wrote:

> On Monday, February 23, 2004, 6:32:06 PM, Boris wrote:
>
>
> >> people who aren't clued about character encodings are more likely to
> >> serve style sheets that work if treated as windows-1252 than to serve
> >> UTF-8.
>
> BZ> Only in Western Europe.
>
> Only in those parts of Western Europe that don't speak Greek or
> Turkish and don't use Macs.

  I didn't know 'Western Europe' was that large ;-)


> >> Also, for HTML browsers tend to default to windows-1252 regardless of the
> >> specs.
>
> BZ> What gave you this idea?  Again, only in Western Europe, even if true (which I
> BZ> do not believe it is).
>
> I gather that some browsers treat 8859-1 as CP-1252 to catch the pages
> which are actually CP-1252 but mislabelled as 8859-1.

 No, Boris wasn't talking about that (ISO-8859-1 vs Windows-1252). He
meant that Japanese users set the default encoding in their browser
to Shift_JIS or EUC-JP, Korean users set that to EUC-KR, Greek users set
it to ISO-8859-7 (or windows-1253, its Windows code page counterpart), etc.

> >> Using this heuristic also in case 3 instead of looking at the linking
> >> document would improve the cacheability of parsed style sheets with
> >> negligible actual breakage.
>
> BZ> Using this instead of looking at the linking document will break
> BZ> Japanese pages that use Shift_JIS and Japanese classnames and
> BZ> don't specify the encoding (lots and lots of those). In fact, such
> BZ> pages were the reason Mozilla added the "look at the linking
> BZ> document" thing, if I recall correctly....
>
> Interesting. Of course, HTML browsers for Japanese speakers are set to
> autodetect among the few encodings used by Japanese language material
> (so they get, for example, 8859-1 pages all wrong) because the HTML

  Well, some browsers have a 'universal' encoding detector in addition to
language/script-specific encoding detectors.

> files are typically served without any encoding information, too.

  I wonder how typical is typical. Do you have any hard numbers?
I don't think it's that bad.  I usually turn off the auto-detection with
the default encoding set to EUC-KR. I rarely have to override the encoding
manually. Well, my web browsing is mostly limited to English and Korean...

> So the CSS file gets set based on the encoding of a document, which
> was set by sniffing the byte stream and looking for characteristic
> patterns and byte frequencies.

  Auto-detection is just one of several methods by which the encoding of
a document can be determined. Users can manually set the encoding of a
linking document and the result is propagated to linked-in documents.
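A minimal Python sketch of that propagation. The function names, parameters, and precedence are assumptions for illustration, loosely modelled on the precedence list in CSS 2.1 section 4.4 (HTTP charset, then BOM/@charset, then link metadata, then the referring document, then a fallback), with a user override assumed to beat anything declared or autodetected for the document itself. It is not any particular browser's actual code.

```python
from typing import Optional


def linking_document_encoding(user_override: Optional[str],
                              declared: Optional[str],
                              autodetected: Optional[str],
                              default: str) -> str:
    """Hypothetical: how the linking document's encoding gets decided."""
    # Assumed order: manual override, then declared label, then detector.
    for candidate in (user_override, declared, autodetected):
        if candidate:
            return candidate
    return default  # the user's configured default encoding


def stylesheet_encoding(http_charset: Optional[str],
                        bom_or_at_charset: Optional[str],
                        link_charset: Optional[str],
                        referrer_encoding: Optional[str],
                        fallback: str = "utf-8") -> str:
    """Hypothetical: the first available signal wins, in priority order."""
    for candidate in (http_charset, bom_or_at_charset,
                      link_charset, referrer_encoding):
        if candidate:
            return candidate
    return fallback  # assumed final fallback
```

With the user's override set to EUC-KR and no other labels anywhere, the style sheet inherits EUC-KR. That is the path that keeps unlabelled Shift_JIS style sheets with Japanese class names working: the sheet has no label of its own, so it is decoded the same way as the page that links to it.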


> It would also be nice if the algorithm for XML and the algorithm for
> CSS were identical except for s/encoding declaration/@charset/g
>
> http://www.w3.org/TR/2004/REC-xml11-20040204/#sec-guessing

  I can't agree with you more on this.
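  A rough Python sketch of what "XML Appendix F with s/encoding declaration/@charset/g" could look like. The function name sniff_css_encoding, the 1024-byte look-ahead, and the rule that a BOM wins over a declared name are assumptions. The sketch only handles the UTF-8/UTF-16 BOMs, BOM-less UTF-16, and ASCII-compatible encodings. It returns None when the bytes carry no signal, so a caller can fall back to the linking document's encoding.

```python
from typing import Optional

# Byte order marks, checked first (in the spirit of XML 1.1 Appendix F).
_BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
)

_PREFIX = '@charset "'


def _name_after_prefix(text: str) -> Optional[str]:
    """Return NAME from a leading '@charset "NAME";', or None."""
    if not text.startswith(_PREFIX):
        return None
    end = text.find('";', len(_PREFIX))
    if end == -1:
        return None
    return text[len(_PREFIX):end]


def sniff_css_encoding(data: bytes) -> Optional[str]:
    """Guess a style sheet's encoding from its first bytes; None if no signal."""
    head = data[:1024]  # assumed look-ahead window
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name  # assumption: the BOM wins over any @charset
    # BOM-less UTF-16: '@charset "' with interleaved zero bytes.
    for codec in ("utf-16-be", "utf-16-le"):
        if head.startswith(_PREFIX.encode(codec)):
            even = head[: len(head) - len(head) % 2]
            return _name_after_prefix(even.decode(codec, errors="ignore"))
    # ASCII-compatible: latin-1 maps each byte to one char, so ASCII survives.
    return _name_after_prefix(head.decode("latin-1"))
```

For example, a file starting with `@charset "Shift_JIS";` yields "Shift_JIS", while an unlabelled `body { color: red }` yields None and is left to the linking document.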

  Jungshik
Received on Monday, 23 February 2004 15:46:29 UTC