Re: [CSS21] response to issue 115 (and 44) from Boris Zbarsky on 2004-02-20 (www-style@w3.org from February 2004)

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Fri, 20 Feb 2004 18:20:21 -0500
To: Bert Bos <bert@w3.org>
Cc: "WWW Style" <www-style@w3.org>
Message-Id: <200402202320.i1KNKL3U015522@no-knife.mit.edu>

> (1) it seems F 0.8 doesn't read utf-16 style sheets that have @charset...

Something's wrong there....  I'll look into it...

> (2) it seems F 0.8 ignores the style sheet if the BOM and @charset conflict

Actually, no.  It just goes with @charset over the BOM.  In this case, it tries
to treat the sheet as ISO-8859-1.

This causes the classname to be corrupted (such that the style is not applied)
and the body rule to be discarded, because the bytes of the UTF-8 BOM are
treated as part of a selector that subsequently fails to parse (due to its
having things like '@', '"', and ';' in it).  So the parser skips the whole
declaration block.
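
To make the failure concrete, here is a minimal Python sketch.  The sheet
contents and the class name are made-up examples (not the actual test file);
the point is only what happens to the BOM bytes when the sheet is decoded as
ISO-8859-1:

```python
# Hypothetical sheet: UTF-8 BOM, a conflicting @charset, then a body rule.
sheet = b'\xef\xbb\xbf@charset "ISO-8859-1";\nbody { color: green }\n'

# Going with @charset over the BOM means decoding every byte as ISO-8859-1.
text = sheet.decode("iso-8859-1")

# The three BOM bytes survive as ordinary characters in front of "@charset",
# so the sheet no longer starts with an @charset rule; everything up to the
# first "{" is read as one (unparseable) selector.
prefix = text[:3]
starts_with_charset = text.startswith("@charset")
bogus_selector = text.split("{", 1)[0]

# A non-ASCII class name (hypothetical) written in UTF-8 is mangled the same way.
mangled = ".caf\u00e9".encode("utf-8").decode("iso-8859-1")
```

Here bogus_selector contains '@', '"' and ';', which is why the selector fails
to parse and the body rule is dropped along with it.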

> I also omitted the CHARSET parameter of the LINK element in HTML. Is
> that a problem?

It's only a problem if someone wants to link to a sheet they don't control that
has no BOM/@charset/HTTP header and is not in the same encoding as the
originating document....

> The algorithm for (2) would be as follows:
> 
>   2a) If the first bytes are 00 00 FE FF, use UCS-4 (1234 order).
>       Remove those bytes. If they are followed by "@charset
>       <anything>;" remove that as well.

What is the rationale for removing the @charset part, if I may ask?  (By
"remove" I assume you mean "do not generate an @charset rule in the CSSOM," not
"do not consider it in determining the charset to use"?)  This may be
difficult to do depending on how sheets are parsed, and seems unnecessary....

>   2e) If the first bytes are FE FF xx, where xx is not 00, use UTF-16-BE.
>       Remove the first two bytes. If they are followed by "@charset
>       <anything>;", remove that as well.

"xx" corresponds to two bytes here, I assume?

>   2h) For all encodings X that the UA knows, starting with UTF-8,
>       UTF-16-BE and UTF-16-LE, if the first bytes correspond to
>       '@charset "X";' (case-insensitive) in encoding X, use that
>       encoding X and remove those bytes.

This is the only really hard part....
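
A minimal Python sketch of the loop that 2h describes, under some loud
assumptions: the candidate list is a tiny hypothetical one, Python codec names
are treated as if they were the labels used in the sheet (real sheets may use
aliases, which this ignores), and sniff_charset_rule is an illustrative name,
not anything Mozilla has:

```python
# Hypothetical candidate list; a real UA would iterate over every encoding it
# knows, starting with these three as step 2h says.
CANDIDATES = ["utf-8", "utf-16-be", "utf-16-le", "iso-8859-1"]

def sniff_charset_rule(data: bytes, candidates=CANDIDATES):
    """Return (encoding, rule_length_in_bytes) if the sheet starts with
    '@charset "X";' encoded in X itself, else None."""
    for name in candidates:
        expected = f'@charset "{name}";'
        n = len(expected.encode(name))  # byte length of the rule in encoding X
        try:
            head = data[:n].decode(name)
        except UnicodeDecodeError:
            continue  # the first bytes are not even valid in this encoding
        if head.lower() == expected.lower():  # case-insensitive match
            return (name, n)
    return None
```

Note that the order of the candidates matters: an all-ASCII '@charset "utf-8";'
also decodes cleanly as ISO-8859-1, so trying UTF-8 first is what makes the
label win.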

> If we use the above in CSS 2.1 also, the question becomes if we will
> have two implementations in the next few months. Because for CSS 2.1
> to make any sense, it should become a Recommendation soon, say before
> October. Otherwise we might as well skip it and wait for CSS3.
> 
> But so far, only Opera passes my little test.

Quite frankly, most of the options we're discussing are very close to what
Mozilla does already.  Apart from the two comments I had on your algorithm
above, the rest would be rather minor modifications.  So as long as we decide
on _something_ that works with existing content I think Mozilla will end up
implementing it fairly quickly...

I've thought about it some more, by the way, and I agree that in the presence
of a BOM we should use that over the value of the @charset rule.
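
In sketch form, the precedence I'm agreeing to looks roughly like this.  The
function name pick_encoding and the idea of passing the @charset label in as a
plain string are assumptions made for illustration; the BOM constants are the
ones in Python's codecs module:

```python
import codecs

# Longer BOMs are checked first, because the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

def pick_encoding(data: bytes, charset_label: str) -> str:
    """Return the encoding to decode with: a BOM wins, else the @charset label."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return charset_label
```

So a sheet with a UTF-8 BOM and a conflicting '@charset "ISO-8859-1";' would be
decoded as UTF-8, and the label would only be used when no BOM is present.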

Thank you for doing the testing work, Bert!

Boris
-- 
"Why can one call the time component of the preceding 4-vector 
by the name energy?  For two reasons:  First, because this time 
component has the correct units -- the units of mass..."
             -- From "Spacetime Physics" by Taylor and Wheeler

Received on Friday, 20 February 2004 18:20:22 UTC