- From: Mark Moore <mark.moore@notlimited.com>
- Date: Thu, 15 Jul 2004 10:30:02 -0700
- To: <www-style@w3.org>
Justin,

Sorry if I wasn't clear. I'm definitely not suggesting a UA should "parse what [it] cannot parse."

According to the CSS21 CR spec, UA's are supposed to "read CSS 2.1 style sheets and discard parts they don't understand" [1]. The section on at-rules specifically says conformant UA's must ignore any unrecognized at-rules, and parsing must continue just after the terminating semicolon or block. [2] At-rules (including the @charset rule) start from the '@' character and consist of "everything up to and including the next semicolon (;) or the next block, whichever comes first."

Although the detailed description of the @charset rule is extremely clear on how UA's unambiguously determine the charset of a given style sheet, it is surprisingly silent on how conformant UA's should treat style sheets encoded in a charset the UA doesn't understand.

Let's consider an example from the CR where it's suggested that an author use the "ISO-8859-7" charset if "the style sheet contains a lot of Greek characters." [3] A UA that doesn't understand the Greek charset (ISO-8859-7) will find the style sheet perfectly syntactically correct. It will be able to parse the sheet, but the results will almost certainly not be what the author or user expects (given the style sheet has "a lot of Greek characters" for a reason).

The spec documents a consistent philosophy of ignoring property names and property values that are unrecognized, ignoring unrecognized at-rules, and silently discarding rules with unrecognized constructs, all of which is specifically designed to allow future enhancements in a predictably backward compatible manner. Given this philosophy, the only consistent way for a UA to handle a style sheet encoded in a future charset is to ignore the entire style sheet.

The previous discussion assumes the @charset rule is properly formed, but the IANA charset name is unrecognized by the UA.
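To make the two behaviours above concrete, here is a rough Python sketch, not taken from the spec or from any UA. The function names are hypothetical, Python's codec registry merely stands in for "charsets the UA understands", and the at-rule scanner deliberately ignores strings, comments, and escapes, which a real tokenizer would have to handle.

```python
import codecs


def skip_at_rule(css: str, start: int) -> int:
    """Return the index just past the at-rule beginning at css[start] ('@').

    The rule ends at the next top-level semicolon or at the end of the
    next block, whichever comes first. Nested braces are tracked;
    strings, comments, and escapes are NOT handled (simplifying assumption).
    """
    depth = 0
    i = start
    while i < len(css):
        ch = css[i]
        if ch == ";" and depth == 0:
            return i + 1            # statement-style at-rule: "@foo bar;"
        if ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                return i + 1        # block-style at-rule: "@foo { ... }"
        i += 1
    return len(css)                 # unterminated rule: consume the rest


def charset_is_understood(name: str) -> bool:
    """Stand-in check: does this (hypothetical) UA know the charset?"""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


def should_ignore_sheet(declared_charset: str) -> bool:
    """The handling proposed in this message: a well-formed but
    unrecognized charset means the whole style sheet is ignored."""
    return not charset_is_understood(declared_charset)
```

With this sketch, `skip_at_rule` applied to `@foo bar; p { color: red }` resumes at ` p { color: red }`, while a sheet declaring a charset the codec registry has never heard of would be dropped entirely rather than parsed with the wrong decoding.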
The situation is different for a malformed @charset rule, including an @charset rule that has a malformed IANA charset identifier. (IANA specifies that valid character set names consist of 1 - 40 characters from the printable characters of US-ASCII. [4]) In this case, the @charset rule should be considered invalid, and the UA should continue parsing immediately after the terminating semicolon (or block) as described in section 4.1.5. [2]

[1] http://www.w3.org/TR/2004/CR-CSS21-20040225/intro.html#q6
[2] http://www.w3.org/TR/2004/CR-CSS21-20040225/syndata.html#at-rules
[3] http://www.w3.org/TR/2004/CR-CSS21-20040225/syndata.html#q24
[4] http://www.iana.org/assignments/character-sets

> -----Original Message-----
> From: www-style-request@w3.org [mailto:www-style-request@w3.org] On Behalf Of Justin Wood
> Sent: Wednesday, July 14, 2004 10:52 PM
> To: W3C Style List
> Subject: Re: Handling unrecognized or unsupported charset
>
> Mark Moore wrote:
>
> >I can't find anything that specifies what a UA should do when it encounters
> >a style sheet with an unsupported charset, or what it should do when the
> >charset identifier is malformed.
> >
> >My assumption is that the entire stylesheet should be ignored, but I didn't
> >see it covered.
>
> If a UA cannot parse a style-sheet due to malformed charset, or other
> means available to it, what else is to be expected?, we definately
> cannot say "You must parse that which you cannot parse".
>
> I fail to see the need for clarification in this, not all UA's can do
> charset-interpret on malformed ones, nor can we expect them to (imo).
> If you could propose an interoperable way to do anything other than
> "ignore" I would personally love to hear the suggestion for an errate
> change.
>
> ~Justin Wood (non W3C WG member)
Received on Thursday, 15 July 2004 13:33:17 UTC