Re: [CSS21] response to issue 115 (and 44) from Bert Bos on 2004-02-21 (www-style@w3.org from February 2004)

From: Bert Bos <bert@w3.org>
Date: Sat, 21 Feb 2004 01:51:59 +0100
To: "WWW Style" <www-style@w3.org>
Message-ID: <16438.43951.414332.953912@lanalana.inria.fr>

Boris Zbarsky writes:

> > I also omitted the CHARSET parameter of the LINK element in HTML. Is
> > that a problem?
> 
> It's only a problem if someone wants to link to a sheet they don't control that
> has no BOM/@charset/HTTP header and is not in the same encoding as the
> originating document....

I don't mind having that rule. I was just wondering if it was worth
adding. XHTML 2, e.g., doesn't have CHARSET on LINK anymore.

> 
> > The algorithm for (2) would be as follows:
> > 
> >   2a) If the first bytes are 00 00 FE FF, use UCS-4 (1234 order).
> >       Remove those bytes. If they are followed by "@charset
> >       <anything>;" remove that as well.
> 
> What is the rationale for removing the @charset part, if I may ask?  (Here by
> "remove" you mean "do not generate an @charset rule in the CSSOM," not "do not
> consider it in determining the charset to use", I assume?).  This may be
> difficult to do depending on how sheets are parsed, and seems unnecessary....

I haven't thought about the CSSOM. I just meant "remove" in the sense
that the CSS parser may start parsing after the @charset, since it has
already been dealt with (namely by ignoring it).

What goes into the CSSOM I don't know. Does @charset belong in the
CSSOM? Once the style sheet is parsed and stored in memory structures,
the notion of character encoding doesn't exist anymore. At that point,
you can't do anything to the @charset rule that changes the way the
document is displayed. Unless you are writing an editor, why should
you store the @charset? It just takes up space in RAM.
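
For illustration, here is a minimal Python sketch of step 2a read in the
"parser starts after the @charset" sense described above. It is not part
of the proposal. The function name, the regular expression used for
"@charset <anything>;", and the choice of Python's "utf-32-be" codec to
stand in for UCS-4 in 1234 order are all assumptions made for the example.

```python
import re

# Byte order mark for UCS-4 in 1234 (big-endian) order, as in step 2a.
UCS4_1234_BOM = b"\x00\x00\xfe\xff"

# Assumption: "@charset <anything>;" means everything up to the first ';'.
CHARSET_RULE = re.compile(r"@charset[^;]*;")


def decode_ucs4_1234(data: bytes):
    """Hypothetical helper: return the text to parse, or None if 2a does not apply."""
    if not data.startswith(UCS4_1234_BOM):
        return None
    # Drop the four BOM bytes, then decode the rest as big-endian UCS-4.
    # Assumption: Python's "utf-32-be" codec is an acceptable stand-in.
    text = data[len(UCS4_1234_BOM):].decode("utf-32-be")
    # The encoding is already decided by the BOM, so a leading @charset
    # rule is skipped rather than interpreted.
    match = CHARSET_RULE.match(text)
    if match:
        text = text[match.end():]
    return text
```

The sketch returns decoded text for the parser and stores nothing about
the @charset rule, which matches the view that the rule has no further
use once the encoding is known.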

> 
> >   2e) If the first bytes are FE FF xx, where xx is not 00, use UTF-16-BE.
> >       Remove the first two bytes. If they are followed by "@charset
> >       <anything>;", remove that as well.
> 
> "xx" corresponds to two bytes here, I assume?

One byte is enough, I think.

> 
> >   2h) For all encodings X that the UA knows, starting with UTF-8,
> >       UTF-16-BE and UTF-16-LE, if the first bytes correspond to
> >       '@charset "X";' (case-insensitive) in encoding X, use that
> >       encoding X and remove those bytes.
> 
> This is the only really hard part....

:-) That's why I hid it all in one sentence.
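
To make the one sentence a little more concrete, here is a rough Python
sketch of step 2h. It is only an illustration under stated assumptions:
the function name and the list of known encodings are hypothetical, the
encoding labels are taken to be Python codec names, and the rule is
assumed to be spelled exactly '@charset "X";' with a single space and
double quotes, with only letter case allowed to vary.

```python
def match_charset_rule(data: bytes, known=("utf-8", "utf-16-be", "utf-16-le")):
    """Hypothetical helper: return (encoding, remaining bytes), or None.

    Tries each known encoding X in order and checks whether the sheet
    starts with '@charset "X";' as encoded in X itself.
    """
    for name in known:
        rule = f'@charset "{name}";'
        # Number of bytes the rule occupies when written in encoding X.
        # Assumption: the codec does not emit a BOM when encoding.
        size = len(rule.encode(name))
        head = data[:size]
        if len(head) < size:
            continue
        try:
            text = head.decode(name)
        except UnicodeDecodeError:
            # The first bytes are not valid in X, so X cannot be the answer.
            continue
        # Case-insensitive comparison, as 2h asks for.
        if text.lower() == rule.lower():
            return name, data[size:]
    return None
```

A real UA would have to run this over every encoding it knows, including
ones whose labels have aliases, which this sketch does not attempt.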

Bert
-- 
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/people/bos/                              W3C/ERCIM
  bert@w3.org                             2004 Rt des Lucioles / BP 93
  +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France

Received on Friday, 20 February 2004 19:52:23 UTC