- From: Bert Bos <bert@w3.org>
- Date: Sat, 21 Feb 2004 01:51:59 +0100
- To: "WWW Style" <www-style@w3.org>
Boris Zbarsky writes:

> > I also omitted the CHARSET parameter of the LINK element in HTML. Is
> > that a problem?
>
> It's only a problem if someone wants to link to a sheet they don't
> control that has no BOM/@charset/HTTP header and is not in the same
> encoding as the originating document....

I don't mind having that rule. I was just wondering if it was worth
adding. XHTML 2, e.g., doesn't have CHARSET on LINK anymore.

> > The algorithm for (2) would be as follows:
> >
> > 2a) If the first bytes are 00 00 FE FF, use UCS-4 (1234 order).
> >     Remove those bytes. If they are followed by "@charset
> >     <anything>;" remove that as well.
>
> What is the rationale for removing the @charset part, if I may ask?
> (Here by "remove" you mean "do not generate an @charset rule in the
> CSSOM," not "do not consider it in determining the charset to use", I
> assume?). This may be difficult to do depending on how sheets are
> parsed, and seems unnecessary....

I haven't thought about the CSSOM. I just meant "remove" in the sense
that the CSS parser may start parsing after the @charset, since it has
already been dealt with (namely by ignoring it). What goes into the
CSSOM I don't know.

Does @charset belong in the CSSOM? Once the style sheet is parsed and
stored in memory structures, the notion of character encoding doesn't
exist anymore. At that point, you can't do anything to the @charset
rule that changes the way the document is displayed. Unless you are
writing an editor, why should you store the @charset? It just takes up
space in RAM.

> > 2e) If the first bytes are FE FF xx, where xx is not 00, use UTF-16-BE.
> >     Remove the first two bytes. If they are followed by "@charset
> >     <anything>;", remove that as well.
>
> "xx" corresponds to two bytes here, I assume?

One byte is enough, I think.
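To make the sense of "remove" in step 2a concrete, here is a minimal
sketch in Python of a reader that drops the 00 00 FE FF signature and
then lets parsing begin after any leading "@charset <anything>;". It
covers step 2a only. The function name, the regular expression, and the
use of Python's "utf-32-be" codec to stand in for UCS-4 (1234 order) are
assumptions made for illustration, not part of the proposal.

```python
import re
from typing import Optional

# Signature from step 2a: 00 00 FE FF.
UCS4_BE_BOM = b"\x00\x00\xfe\xff"

# Assumption: "@charset <anything>;" is read as "@charset", then any
# characters up to and including the first semicolon.
CHARSET_RULE = re.compile(r"\A@charset[^;]*;")


def text_after_step_2a(data: bytes) -> Optional[str]:
    """Return the text the CSS parser would start from, or None.

    None means step 2a does not apply and a later step has to decide.
    """
    if not data.startswith(UCS4_BE_BOM):
        return None
    # Assumption: Python's utf-32-be codec is used for UCS-4 (1234 order).
    text = data[len(UCS4_BE_BOM):].decode("utf-32-be", errors="replace")
    # The @charset is skipped, not consulted: the signature already
    # settled the encoding.
    return CHARSET_RULE.sub("", text, count=1)


sheet = UCS4_BE_BOM + '@charset "iso-8859-1"; p { color: red }'.encode("utf-32-be")
print(repr(text_after_step_2a(sheet)))
```

Nothing here stores the @charset anywhere; whether a CSSOM should keep
it is a separate question from where the parser starts.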

> > 2h) For all encodings X that the UA knows, starting with UTF-8,
> >     UTF-16-BE and UTF-16-LE, if the first bytes correspond to
> >     '@charset "X";' (case-insensitive) in encoding X, use that
> >     encoding X and remove those bytes.
>
> This is the only really hard part.... :-)

That's why I hid it all in one sentence.

Bert
-- 
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/people/bos/                              W3C/ERCIM
  bert@w3.org                             2004 Rt des Lucioles / BP 93
  +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France

Received on Friday, 20 February 2004 19:52:23 UTC