- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Fri, 20 Feb 2004 18:20:21 -0500
- To: Bert Bos <bert@w3.org>
- Cc: "WWW Style" <www-style@w3.org>
> (1) it seems F 0.8 doesn't read utf-16 style sheets that have @charset...

Something's wrong there....  I'll look into it...

> (2) it seems F 0.8 ignores the style sheet if the BOM and @charset conflict

Actually, no.  It just goes with @charset over the BOM.  In this case, it
tries to treat the sheet as ISO-8859-1.  This corrupts the class name (so
the style is not applied) and causes the body rule to be discarded,
because the bytes of the UTF-8 BOM are treated as part of a selector that
then fails to parse (it contains things like '@', '"', and ';').  So the
parser skips the whole declaration block.

> I also omitted the CHARSET parameter of the LINK element in HTML. Is
> that a problem?

It's only a problem if someone wants to link to a sheet they don't control
that has no BOM/@charset/HTTP header and is not in the same encoding as the
originating document....

> The algorithm for (2) would be as follows:
>
> 2a) If the first bytes are 00 00 FE FF, use UCS-4 (1234 order).
> Remove those bytes. If they are followed by "@charset
> <anything>;" remove that as well.

What is the rationale for removing the @charset part, if I may ask?  (By
"remove" here you mean "do not generate an @charset rule in the CSSOM",
not "do not consider it in determining the charset to use", I assume?)
This may be difficult to do depending on how sheets are parsed, and seems
unnecessary....

> 2e) If the first bytes are FE FF xx, where xx is not 00, use UTF-16-BE.
> Remove the first two bytes. If they are followed by "@charset
> <anything>;", remove that as well.

"xx" corresponds to two bytes here, I assume?

> 2h) For all encodings X that the UA knows, starting with UTF-8,
> UTF-16-BE and UTF-16-LE, if the first bytes correspond to
> '@charset "X";' (case-insensitive) in encoding X, use that
> encoding X and remove those bytes.

This is the only really hard part....
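To make the quoted steps concrete, here is a rough Python sketch of how a UA might apply them. It is a minimal illustration under stated assumptions, not Mozilla's or Opera's code: `decode_sheet` and the encoding list are hypothetical; only steps 2a, 2e and 2h are covered because the other steps were trimmed from the quote; "xx" in 2e is read as two bytes; and the UTF-8 BOM check is an added assumption, not one of the quoted steps. Because every BOM test runs before the @charset test, a BOM wins when the two conflict.

```python
import re

# Hypothetical list of encodings tried in step 2h, in the order the quote
# gives ("starting with UTF-8, UTF-16-BE and UTF-16-LE"); ISO-8859-1 is an
# assumed extra entry.
KNOWN_ENCODINGS = ("UTF-8", "UTF-16BE", "UTF-16LE", "ISO-8859-1")

# "@charset <anything>;" at the very start of the decoded text.
LEADING_CHARSET = re.compile(r"@charset[^;]*;")


def _strip_charset_rule(text: str) -> str:
    """Drop a leading @charset rule, as steps 2a and 2e propose."""
    m = LEADING_CHARSET.match(text)
    return text[m.end():] if m else text


def decode_sheet(data: bytes, fallback: str = "UTF-8") -> tuple[str, str]:
    """Return (encoding, text) for a style sheet's raw bytes.

    BOM checks run before the @charset check, so a BOM wins on conflict.
    """
    # 2a: 00 00 FE FF -> UCS-4, 1234 order (big-endian).
    if data[:4] == b"\x00\x00\xfe\xff":
        return "UTF-32BE", _strip_charset_rule(data[4:].decode("utf-32-be"))
    # Assumption, not a quoted step: EF BB BF -> UTF-8.
    if data[:3] == b"\xef\xbb\xbf":
        return "UTF-8", _strip_charset_rule(data[3:].decode("utf-8"))
    # 2e, reading "xx" as two bytes: FE FF not followed by 00 00.
    # (With a one-byte reading, FE FF 00 40, i.e. a BOM then "@", would fail.)
    if data[:2] == b"\xfe\xff" and data[2:4] != b"\x00\x00":
        return "UTF-16BE", _strip_charset_rule(data[2:].decode("utf-16-be"))
    # 2h: does the sheet open with '@charset "X";' written in encoding X?
    for enc in KNOWN_ENCODINGS:
        rule = f'@charset "{enc}";'
        prefix = data[: len(rule.encode(enc))]
        try:
            matched = prefix.decode(enc).lower() == rule.lower()
        except UnicodeDecodeError:
            continue  # prefix is not valid in this encoding
        if matched:
            return enc, data[len(prefix):].decode(enc)
    # Hypothetical stand-in for the later steps (LINK charset, referring
    # document's encoding, ...), which the quote does not include.
    return fallback, data.decode(fallback, errors="replace")
```

For a sheet that starts with a UTF-8 BOM followed by `@charset "ISO-8859-1";`, this returns "UTF-8" with the rule stripped. Decoding the BOM bytes EF BB BF as ISO-8859-1 instead gives "ï»¿", which is how they end up glued to the first selector. The loop for step 2h also shows why that step is the costly one: every known encoding has to be tried in turn.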
> If we use the above in CSS 2.1 also, the question becomes if we will
> have two implementations in the next few months. Because for CSS 2.1
> to make any sense, it should become a Recommendation soon, say before
> October. Otherwise we might as well skip it and wait for CSS3.
>
> But so far, only Opera passes my little test.

Quite frankly, most of the options we're discussing are very close to what
Mozilla does already.  Apart from the two comments I had on your algorithm
above, the rest would be rather minor modifications.  So as long as we
decide on _something_ that works with existing content, I think Mozilla
will end up implementing it fairly quickly...

I've thought about it some more, by the way, and I agree that in the
presence of a BOM we should use it over the value of the @charset rule.

Thank you for doing the testing work, Bert!

Boris

-- 
"Why can one call the time component of the preceding 4-vector by the name
energy?  For two reasons: First, because this time component has the
correct units -- the units of mass..."
  -- From "Spacetime Physics" by Taylor and Wheeler
Received on Friday, 20 February 2004 18:20:22 UTC