- From: Chris Lilley <chris@w3.org>
- Date: Mon, 23 Feb 2004 19:30:21 +0100
- To: Boris Zbarsky <bzbarsky@MIT.EDU>
- Cc: Henri Sivonen <hsivonen@iki.fi>, "WWW Style" <www-style@w3.org>
On Monday, February 23, 2004, 6:32:06 PM, Boris wrote:

>> people who aren't clued about character encodings are more likely to
>> serve style sheets that work if treated as windows-1252 than to serve
>> UTF-8.

BZ> Only in Western Europe.

Only in those parts of Western Europe that don't speak Greek or Turkish
and don't use Macs.

>> Also, for HTML browsers tend to default to windows-1252 regardless of the
>> specs.

BZ> What gave you this idea? Again, only in Western Europe, even if true (which I
BZ> do not believe it is).

I gather that some browsers treat 8859-1 as CP-1252 to catch the pages
which are actually CP-1252 but mislabelled as 8859-1.

>> Using this heuristic also in case 3 instead of looking at the linking
>> document would improve the cacheability of parsed style sheets with
>> negligible actual breakage.

BZ> Using this instead of looking at the linking document will break
BZ> Japanese pages that use Shift_JIS and Japanese classnames and
BZ> don't specify the encoding (lots and lots of those). In fact, such
BZ> pages were the reason Mozilla added the "look at the linking
BZ> document" thing, if I recall correctly....

Interesting. Of course, HTML browsers for Japanese speakers are set to
autodetect among the few encodings used by Japanese-language material
(so they get, for example, 8859-1 pages all wrong), because the HTML
files are typically served without any encoding information, too.

So the encoding of the CSS file is set from the encoding of the linking
document, which was itself set by sniffing the byte stream and looking
for characteristic patterns and byte frequencies.

BZ> It really would be nice to only have to implement _one_ algorithm for this, of
BZ> course....

Yes.
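The Shift_JIS breakage Boris describes can be illustrated with a minimal sketch. This is not how Mozilla or any other browser implements it. The function name `decode_stylesheet`, its parameters, and the sample class name are all hypothetical and chosen only for illustration. It assumes an unlabelled style sheet and compares decoding it with the linking document's encoding against a fixed windows-1252 fallback.

```python
from typing import Optional

# Hypothetical sample: a Japanese class name ("heading") in an unlabelled sheet.
CLASS_NAME = "見出し"


def decode_stylesheet(raw: bytes,
                      linking_doc_encoding: Optional[str] = None,
                      fallback: str = "windows-1252") -> str:
    """Decode a style sheet that carries no encoding label of its own.

    Uses the linking document's encoding when one is known, otherwise
    the fixed fallback. Undecodable bytes become U+FFFD instead of raising.
    """
    encoding = linking_doc_encoding or fallback
    return raw.decode(encoding, errors="replace")


# The bytes as a Japanese author's editor would have saved them.
raw_css = f".{CLASS_NAME} {{ color: red }}".encode("shift_jis")

with_linking = decode_stylesheet(raw_css, linking_doc_encoding="shift_jis")
with_fallback = decode_stylesheet(raw_css)

print(CLASS_NAME in with_linking)   # True: the selector still matches
print(CLASS_NAME in with_fallback)  # False: the class name became mojibake
```

With the fallback alone, the selector no longer matches the class name in the HTML, so the rule silently stops applying.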
It would also be nice if the algorithm for XML and the algorithm for CSS
were identical except for

  s/encoding declaration/@charset/g

http://www.w3.org/TR/2004/REC-xml11-20040204/#sec-guessing

Lastly, as you can tell from the non-normative nature of Appendix E and
the amusing fragment name, the best and correct way to indicate the
encoding is by internal labelling; this should be the case in CSS as
well. The presence of an override from a network protocol such as HTTP
is a special case. For CSS, there are three sources of stylesheets and
only one of those comes over HTTP, and that not all of the time.

--
 Chris Lilley                    mailto:chris@w3.org
 Chair, W3C SVG Working Group
 Member, W3C Technical Architecture Group
Received on Monday, 23 February 2004 13:30:20 UTC