- From: Bert Bos <bert@w3.org>
- Date: Mon, 23 Feb 2004 02:13:17 +0100
- To: "WWW Style" <www-style@w3.org>
Bjoern Hoehrmann writes:

> * Bert Bos wrote:
> >So, if we assume that we can change the browsers in time, what do we
> >want in CSS3? I'd say this:
> >
> > 1) Trust the HTTP header (or similar out-of-band information in other
> >    protocols). If the file then appears to start with a U+FEFF
> >    character, ignore it. If there is a @charset at the start or after
> >    that U+FEFF, ignore it. Otherwise, start parsing at the first
> >    character.
>
> I think this is wrong (0xhh refers to octet hh), suppose you have
>
>   Content-Type: text/css;charset=MacThai
>
>   0xDB p { color: white }
>
> This is currently equivalent to
>
>   \00FEFF p { color: white }

Is 0xDB in MacThai a "zero-width no-break space," or rather a U+2060
"word joiner"? I think Unicode recommends against using the zero-width
no-break space for anything other than a BOM.

But apart from that, I agree that my text can be misread. Replace it
with something like: "If the file then starts with the BOM for that
encoding, [...]."

The concept of a BOM only exists in the two "pseudo-encodings" UTF-32
and UTF-16, and, strangely enough, in UTF-8 and UTF-7. In any other
encoding, a zero-width no-break space is just that, a zero-width
no-break space. So that takes care of MacThai.

> It should also be pointed out that (at least for HTTP and MIME)
> explicit information in the header is required, otherwise processors
> would never read a BOM or @charset because the encoding has already
> been determined as ISO-8859-1 (HTTP) or US-ASCII (MIME) (and in fact,
> a processor that chooses to adhere to CSS must violate HTTP/MIME...)

Well, HTML already set the precedent of explicitly contradicting HTTP.
I think CSS will have to side with HTML on this one: in the absence of
information in the HTTP headers, don't assume ISO-8859-1, but use
auto-detection.

> >But what about CSS 2.1?
> >
> >If we use the above in CSS 2.1 also, the question becomes if we will
> >have two implementations in the next few months.
> >Because for CSS 2.1 to make any sense, it should become a
> >Recommendation soon, say before October. Otherwise we might as well
> >skip it and wait for CSS3.
>
> IMO, this is not acceptable, CSS 2.1 and CSS 3.0 must use the same
> rules for, after all, the same thing. Maybe I can live with the same
> rules but different requirement levels, say, processors are STRONGLY
> RECOMMENDED to do this and will be required in CSS 3.0; though I
> consider such tricks of little use for interoperability, it just adds
> complexity and confusion which probably rather reduces
> interoperability at some point.

Sure, it would be good if documents on the Web (not just CSS documents)
were properly labeled with a MIME type and encoding. It's much easier
and quicker to do things based on a HEAD request than by sniffing the
contents of the document. CSS 2.1 isn't doing anything to help that,
but it isn't doing anything contrary either. It just has more important
things (for CSS) to do.

> You did not address what to do if the processor encounters an
> encoding error.

Do I have to? CSS gives rules for parsing things that might someday
cease to be errors. But I don't see what these non-characters can ever
be useful for, so why should I say how to parse them?

Bert

--
 Bert Bos                                 ( W 3 C ) http://www.w3.org/
 http://www.w3.org/people/bos/                               W3C/ERCIM
 bert@w3.org                              2004 Rt des Lucioles / BP 93
 +33 (0)4 92 38 76 92             06902 Sophia Antipolis Cedex, France
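For concreteness, the priority order discussed in the thread (trust the
out-of-band charset outright, then skip a matching BOM and any leading
@charset rather than letting them override the header) could be
sketched roughly as below. The function name, the BOM table, and the
Python codec names are illustrative, not anything from the thread; note
that only the Unicode encoding forms get a BOM entry, so an encoding
like MacThai never has a leading byte stripped:

```python
# Illustrative sketch of the proposed CSS3 charset rule:
# the HTTP header (or other out-of-band info) always wins.

# BOMs exist only for the Unicode encoding forms; in any other
# encoding, a leading U+FEFF is just an ordinary character.
BOMS = {
    "utf-8": b"\xef\xbb\xbf",
    "utf-16-be": b"\xfe\xff",
    "utf-16-le": b"\xff\xfe",
}

def decode_stylesheet(raw: bytes, http_charset: str) -> str:
    """Decode a style sheet, trusting the out-of-band charset."""
    enc = http_charset.lower()
    bom = BOMS.get(enc)
    if bom and raw.startswith(bom):
        raw = raw[len(bom):]          # rule 1: ignore a matching BOM
    text = raw.decode(enc)
    if text.startswith('@charset "'):
        end = text.find('";')
        if end != -1:
            text = text[end + 2:]     # rule 1: ignore @charset too
    return text
```

Under this sketch, a UTF-8 file served with `charset=UTF-8` loses its
BOM and any @charset line before parsing, while a MacThai file keeps
its leading 0xDB byte as content, exactly the distinction made above.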
Received on Sunday, 22 February 2004 20:13:44 UTC