- From: Mark Moore <mark.moore@notlimited.com>
- Date: Thu, 15 Jul 2004 13:44:45 -0700
- To: <www-style@w3.org>
> From: www-style-request@w3.org [mailto:www-style-request@w3.org] On Behalf
> Of Boris Zbarsky
> Sent: Thursday, July 15, 2004 11:46 AM
>
> No, it will not. It will not even be able to tokenize the sheet. The step
> right before tokenization is to convert the sheet to Unicode and then work
> with the Unicode character stream, not the byte stream. If the conversion
> to Unicode cannot be performed, tokenization cannot even start.

I appreciate the point you raise, Boris. I hadn't considered the requirements of section 4.4.1, "Conforming user agents must correctly map to Unicode all characters in any character encodings that they recognize (or they must behave as if they did)." I don't see how tokenization will be prevented, though, because A) the CR doesn't specify what should happen if the conforming UA *doesn't* recognize the character encoding, and B) the UA is only required to "behave as if" it performed the conversion. Is there some other requirement I'm missing that specifies that conformant UAs must terminate tokenization when presented with a style sheet in an unrecognized character encoding?

> The only way to attempt to deal short of discarding the sheet is to assume
> some other charset and use that. Say take the charset from the next step
> of the charset selection algorithm.

I agree with you absolutely! This is where I think section 4.4 needs further clarification. The *best* strategy (IMHO) would be to specify that conformant UAs "ignore" style sheets whose character encoding is not recognized by the UA (where the charset is determined by the documented 5 step prioritization algorithm). Your idea of allowing (or requiring) the UA to take the next step in the 5 step selection algorithm would be a reasonable "best effort" attempt as long as the UA was restricted from advancing past step 3. Any other algorithm that allowed a conformant UA to try to guess what character encoding might be similar to the unrecognized encoding sounds very unpredictable.
For instance, if the HTTP header specified a valid but unrecognized charset, and the UA used a UTF-8 mapping to decode the byte stream (as would be required by step 5), the resultant character and token stream would almost certainly be bollixed. The best bet would be to require the UA to toss style sheets with specified but unrecognized character encodings.

> > In this case, the @charset rule should be considered invalid, and the UA
> > should continue parsing immediately after the terminating semicolon (or
> > block) as described in section 4.1.5.
>
> This is not specified anywhere in the spec. Are you suggesting that it be
> specified?

Yes, I am suggesting some added clarification. If I'm reading the CR correctly, this requirement is partially specified in sections 4.1.5 and 4.1.1: "A CSS user agent that encounters an unrecognized at-rule must ignore the whole of the at-rule and continue parsing after it." Specifically, if the @charset rule doesn't parse as the token string "ATKEYWORD STRING ;", it's hard to see how a conformant UA can do anything but completely ignore the unrecognized at-rule and continue parsing the style sheet immediately after the semicolon or block that terminates the invalid @charset production (e.g. "@charset 'UTF-8' screen;" might be a reasonable future extension).

The unspecified case is when the STRING token cannot possibly represent a valid IANA character set name. This would be the case if the charset name were more than 40 characters long, empty, or contained a character outside the printable US-ASCII character codes.

The invalid @charset rule is more interesting (IMHO) than the invalid IANA character set name, since the bad IANA ID is likely to be caught during development. On the other hand, unless the CR specifies more completely how conformant UAs should handle invalid @charset rules, it limits future CSS expansion.
As currently specified, conformant UAs are required to ignore the invalid @charset rule and continue parsing the remainder of the style sheet using a character mapping that may or may not be related to the one the style sheet uses. At the very least, it's ambiguous whether an invalid @charset rule requires the UA to continue trying to determine the style sheet's character encoding. Requiring conformant UAs to discard/ignore style sheets when either the encoding is unrecognized or the @charset rule is invalid makes things more predictable and flexible, and allows a very simple implementation.
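
To make the proposal concrete, here is a rough Python sketch of the "discard" policy argued for above. It is only an illustration under stated assumptions, not text from the CR: the function names, the RECOGNIZED set, and the representation of steps 1 through 4 as a list of optional labels are all made up for this example, and "printable US-ASCII" is assumed to mean codes 0x21 through 0x7E (space excluded).

```python
from typing import Optional, Sequence

# Hypothetical set of encodings this illustrative UA knows how to decode.
RECOGNIZED = {"utf-8", "iso-8859-1", "us-ascii"}


def plausible_charset_name(name: str) -> bool:
    """Could this string possibly be an IANA charset name?

    Applies the three conditions from the message: not empty, at most
    40 characters, and only printable US-ASCII characters (assumed here
    to be 0x21-0x7E).
    """
    return 0 < len(name) <= 40 and all(0x21 <= ord(c) <= 0x7E for c in name)


def choose_encoding(declared: Sequence[Optional[str]]) -> Optional[str]:
    """Pick the encoding for a style sheet, or None to ignore the sheet.

    `declared` holds whatever charset label each of steps 1-4 of the
    selection algorithm produced, in priority order (None where a step
    specifies nothing). How each step obtains its label is not modeled.
    """
    for label in declared:
        if label is None:
            continue  # this step says nothing; try the next one
        normalized = label.lower()
        if not plausible_charset_name(label) or normalized not in RECOGNIZED:
            # Specified but invalid or unrecognized: discard the sheet
            # rather than guess at some other encoding.
            return None
        return normalized
    # No step specified a charset, so step 5 applies: assume UTF-8.
    return "utf-8"
```

With this policy, an unrecognized label at a higher-priority step never falls through to a lower one: `choose_encoding(["x-unknown", "utf-8", None, None])` ignores the sheet even though a later step names a usable encoding. The "next step, but no further than step 3" variant would replace the `return None` with a `continue` for the first steps only.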
Received on Thursday, 15 July 2004 16:47:26 UTC