Re: [CSS21] response to issue 115 (and 44) from Chris Lilley on 2004-02-23 (www-style@w3.org from February 2004)

From: Chris Lilley <chris@w3.org>
Date: Mon, 23 Feb 2004 19:30:21 +0100
To: Boris Zbarsky <bzbarsky@MIT.EDU>
Cc: Henri Sivonen <hsivonen@iki.fi>, "WWW Style" <www-style@w3.org>
Message-ID: <936899584.20040223193021@w3.org>

On Monday, February 23, 2004, 6:32:06 PM, Boris wrote:

>> people who aren't clued about character encodings are more likely to
>> serve style sheets that work if treated as windows-1252 than to serve
>> UTF-8.

BZ> Only in Western Europe.

Only in those parts of Western Europe that don't speak Greek or
Turkish and don't use Macs.

>> Also, for HTML browsers tend to default to windows-1252 regardless of the
>> specs.

BZ> What gave you this idea?  Again, only in Western Europe, even if true (which I
BZ> do not believe it is).

I gather that some browsers treat 8859-1 as CP-1252 to catch the pages
which are actually CP-1252 but mislabelled as 8859-1.
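
As a rough illustration of that behaviour (a sketch only; the function
name and the alias list are assumptions made for this example, not any
browser's actual code), a decoder could quietly promote an 8859-1 label
to windows-1252:

```python
# Hypothetical alias list; real browsers recognise many more labels.
LATIN1_ALIASES = {"iso-8859-1", "iso8859-1", "latin-1", "latin1", "l1"}


def decode_lenient(data: bytes, label: str) -> str:
    """Decode bytes, treating a declared 8859-1 label as windows-1252."""
    name = label.strip().lower()
    if name in LATIN1_ALIASES:
        # Assumption: content labelled 8859-1 is really CP-1252.
        name = "cp1252"
    # A few bytes (0x81, 0x8D, ...) are undefined in Python's cp1252,
    # so replace them rather than raise.
    return data.decode(name, errors="replace")


if __name__ == "__main__":
    print(decode_lenient(b"\x93quoted\x94", "ISO-8859-1"))
```

Bytes 0x93 and 0x94 are curly quotation marks in CP-1252 but C1 control
characters in true 8859-1, which is why the promotion rescues
mislabelled pages.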

>> Using this heuristic also in case 3 instead of looking at the linking
>> document would improve the cacheability of parsed style sheets with
>> negligible actual breakage.

BZ> Using this instead of looking at the linking document will break
BZ> Japanese pages that use Shift_JIS and Japanese classnames and
BZ> don't specify the encoding (lots and lots of those). In fact, such
BZ> pages were the reason Mozilla added the "look at the linking
BZ> document" thing, if I recall correctly....

Interesting. Of course, HTML browsers for Japanese speakers are set to
autodetect among the few encodings used by Japanese language material
(so they get, for example, 8859-1 pages all wrong) because the HTML
files are typically served without any encoding information, too.

So the CSS file gets set based on the encoding of a document, which
was set by sniffing the byte stream and looking for characteristic
patterns and byte frequencies.
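
To make the sniffing step concrete, here is a deliberately crude
stand-in. It is an assumption-laden sketch: real detectors weigh
characteristic patterns and byte frequencies, whereas this one simply
tries a hypothetical candidate list with strict decoding and takes the
first encoding that does not fail.

```python
from typing import Optional

# Hypothetical candidate list, tried in this order.
JAPANESE_CANDIDATES = ("iso2022_jp", "euc_jp", "shift_jis")


def guess_japanese_encoding(data: bytes) -> Optional[str]:
    """Return the first candidate that decodes the bytes without error."""
    for name in JAPANESE_CANDIDATES:
        try:
            data.decode(name)  # strict: raises on invalid sequences
        except UnicodeDecodeError:
            continue
        return name
    return None


if __name__ == "__main__":
    sample = "日本語".encode("shift_jis")
    print(guess_japanese_encoding(sample))
```

The encoding guessed this way for the HTML document is then what the
linked style sheet would inherit.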

BZ> It really would be nice to only have to implement _one_ algorithm for this, of
BZ> course....

Yes.

It would also be nice if the algorithm for XML and the algorithm for
CSS were identical except for s/encoding declaration/@charset/g

http://www.w3.org/TR/2004/REC-xml11-20040204/#sec-guessing
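
A minimal sketch of what the CSS side of such a shared algorithm might
look like, assuming an ASCII-compatible encoding and an @charset rule
at the very start of the file. The function name and the accepted label
characters are assumptions for this example; neither specification
defines them this way.

```python
import codecs
import re
from typing import Optional

# Assumption: the rule must be the first bytes of the sheet, written
# exactly as @charset "label"; with an ASCII label.
_CHARSET_RULE = re.compile(rb'@charset "([A-Za-z0-9._:+-]+)";')


def sniff_css_charset(data: bytes) -> Optional[str]:
    """Guess a style sheet's encoding from a BOM or a leading @charset."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    match = _CHARSET_RULE.match(data)
    if match:
        return match.group(1).decode("ascii")
    return None  # no internal label found


if __name__ == "__main__":
    print(sniff_css_charset(b'@charset "Shift_JIS";\nbody { margin: 0 }'))
```

Swapping the regular expression for one that matches an XML encoding
declaration would give the XML counterpart, which is the point of the
s/// above.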

Lastly, as you can tell from the non-normative nature of Appendix E
and the amusing fragment name, the best and correct way to indicate
the encoding is by internal labelling; this should be the case in CSS
as well. The presence of an override from a network protocol such as
HTTP is a special case. For CSS, there are three sources of
stylesheets and only one of those comes over HTTP, and that not all of
the time.
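
One way to picture how these sources could be combined, purely as a
sketch: the ordering below (protocol override, then internal label,
then the linking document discussed earlier in the thread) and the
UTF-8 fallback are assumptions for illustration, not settled text from
any specification.

```python
from typing import Optional


def choose_encoding(
    http_charset: Optional[str],
    internal_label: Optional[str],
    linking_doc_encoding: Optional[str],
    fallback: str = "utf-8",  # hypothetical default
) -> str:
    """Pick the first available encoding hint, in an assumed priority order."""
    for candidate in (http_charset, internal_label, linking_doc_encoding):
        if candidate:
            return candidate
    return fallback


if __name__ == "__main__":
    # A local user style sheet: no HTTP header, no linking document.
    print(choose_encoding(None, "iso-8859-7", None))
```

For a style sheet that never travels over HTTP, the first argument is
always absent, so the internal label is the only hint that travels with
the file itself.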

-- 
 Chris Lilley                    mailto:chris@w3.org
 Chair, W3C SVG Working Group
 Member, W3C Technical Architecture Group

Received on Monday, 23 February 2004 13:30:20 UTC