- From: Bert Bos <bert@w3.org>
- Date: Mon, 23 Feb 2004 02:13:17 +0100
- To: "WWW Style" <www-style@w3.org>
Bjoern Hoehrmann writes:

> * Bert Bos wrote:
> >So, if we assume that we can change the browsers in time, what do we
> >want in CSS3? I'd say this:
> >
> > 1) Trust the HTTP header (or similar out-of-band information in other
> >    protocols). If the file then appears to start with a U+FEFF
> >    character, ignore it. If there is a @charset at the start or after
> >    that U+FEFF, ignore it. Otherwise, start parsing at the first
> >    character.
>
> I think this is wrong (0xhh refers to octet hh), suppose you have
>
>   Content-Type: text/css;charset=MacThai
>
>   0xDB p { color: white }
>
> This is currently equivalent to
>
>   \00FEFF p { color: white }

Is 0xDB in MacThai a "zero-width no-break space," or rather a U+2060
"word joiner"? I think Unicode recommends against using the zero-width
no-break space for anything other than a BOM.

But apart from that, I agree that my text can be misread. Replace it
with something like: "If the file then starts with the BOM for that
encoding, [...]."

The concept of a BOM only exists in the two "pseudo-encodings" UTF-32
and UTF-16, and, strangely enough, in UTF-8 and UTF-7. In any other
encoding, a zero-width no-break space is just that, a zero-width
no-break space. So that takes care of MacThai.

> It should also be pointed out that (at least for HTTP and MIME)
> explicit information in the header is required, otherwise processors
> would never read a BOM or @charset because the encoding has already
> been determined as ISO-8859-1 (HTTP) or US-ASCII (MIME) (and in fact,
> a processor that chooses to adhere to CSS must violate HTTP/MIME...)

Well, HTML already set the precedent of explicitly contradicting HTTP.
I think CSS will have to side with HTML on this one: in the absence of
information in the HTTP headers, don't assume ISO-8859-1, but use
auto-detection.

> >But what about CSS 2.1?
> >
> >If we use the above in CSS 2.1 also, the question becomes if we will
> >have two implementations in the next few months.
> >Because for CSS 2.1 to make any sense, it should become a
> >Recommendation soon, say before October. Otherwise we might as well
> >skip it and wait for CSS3.
>
> IMO, this is not acceptable, CSS 2.1 and CSS 3.0 must use the same
> rules for, after all, the same thing. Maybe I can live with the same
> rules but different requirement levels, say, processors are STRONGLY
> RECOMMENDED to do this and will be required in CSS 3.0; though I
> consider such tricks of little use for interoperability, it just adds
> complexity and confusion which probably rather reduces
> interoperability at some point.

Sure, it would be good if documents on the Web (not just CSS documents)
were properly labeled with a MIME type and encoding. It's much easier
and quicker to do things based on a HEAD request than by sniffing the
contents of the document. CSS 2.1 isn't doing anything to help that,
but it isn't doing anything contrary either. It just has more important
things (for CSS) to do.

> You did not address what to do if the processor encounters an
> encoding error.

Do I have to? CSS gives rules for parsing things that might someday
cease to be errors. But I don't see what these non-characters can ever
be useful for, so why should I say how to parse them?

Bert

--
 Bert Bos                                 ( W 3 C ) http://www.w3.org/
 http://www.w3.org/people/bos/                               W3C/ERCIM
 bert@w3.org                              2004 Rt des Lucioles / BP 93
 +33 (0)4 92 38 76 92             06902 Sophia Antipolis Cedex, France
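For concreteness, the priority order discussed in the thread (trust the
out-of-band charset outright, then skip a matching BOM and any leading
@charset rather than letting them override the header) could be
sketched roughly as below. The function name, the BOM table, and the
Python codec names are illustrative, not anything from the thread; note
that only the Unicode encoding forms get a BOM entry, so an encoding
like MacThai never has a leading byte stripped:

```python
# Illustrative sketch of the proposed CSS3 charset rule:
# the HTTP header (or other out-of-band info) always wins.

# BOMs exist only for the Unicode encoding forms; in any other
# encoding, a leading U+FEFF is just an ordinary character.
BOMS = {
    "utf-8": b"\xef\xbb\xbf",
    "utf-16-be": b"\xfe\xff",
    "utf-16-le": b"\xff\xfe",
}

def decode_stylesheet(raw: bytes, http_charset: str) -> str:
    """Decode a style sheet, trusting the out-of-band charset."""
    enc = http_charset.lower()
    bom = BOMS.get(enc)
    if bom and raw.startswith(bom):
        raw = raw[len(bom):]          # rule 1: ignore a matching BOM
    text = raw.decode(enc)
    if text.startswith('@charset "'):
        end = text.find('";')
        if end != -1:
            text = text[end + 2:]     # rule 1: ignore @charset too
    return text
```

Under this sketch, a UTF-8 file served with `charset=UTF-8` loses its
BOM and any @charset line before parsing, while a MacThai file keeps
its leading 0xDB byte as content, exactly the distinction made above.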
Received on Sunday, 22 February 2004 20:13:44 UTC