RE: Handling unrecognized or unsupported charset from Mark Moore on 2004-07-15 (www-style@w3.org from July 2004)

From: Mark Moore <mark.moore@notlimited.com>
Date: Thu, 15 Jul 2004 10:30:02 -0700
To: <www-style@w3.org>
Message-Id: <20040715173317.715ACA1698@frink.w3.org>

Justin,

Sorry if I wasn't clear.  I'm definitely not suggesting a UA should "parse
what [it] cannot parse."

According to the CSS 2.1 CR spec, UAs are supposed to "read CSS 2.1 style
sheets and discard parts they don't understand" [1].

The section on at-rules specifically says conformant UAs must ignore any
unrecognized at-rules, and parsing must continue just after the terminating
semicolon or block. [2]

At-rules (including the @charset rule) start from the '@' character and
consist of "everything up to and including the next semicolon (;) or the
next block, whichever comes first."
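
To make that skipping rule concrete, here is a rough Python sketch of how a
parser might find where an ignored at-rule ends.  The function name
skip_at_rule is my own invention, and the sketch assumes the input contains
no strings, comments, or escapes (a real tokenizer has to handle those).

```python
def skip_at_rule(css: str, start: int) -> int:
    """Return the index just past the at-rule that begins at css[start].

    Hypothetical helper: it scans to the next top-level semicolon or to the
    end of the next block, whichever comes first. Strings, comments, and
    escapes are deliberately not handled in this sketch.
    """
    if css[start] != "@":
        raise ValueError("expected '@' at the start of an at-rule")
    depth = 0  # current nesting level of { } blocks
    i = start + 1
    while i < len(css):
        ch = css[i]
        if ch == ";" and depth == 0:
            return i + 1  # rule ended with a semicolon
        if ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                return i + 1  # rule ended with its block
        i += 1
    return len(css)  # unterminated rule: everything was consumed


sheet = "@foo bar; h1 { color: red }"
rest = sheet[skip_at_rule(sheet, 0):]  # parsing would resume at " h1 { ... }"
```

Parsing then resumes at the returned index, so the rest of the sheet is
still applied.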

Although the detailed description of the @charset rule is extremely clear on
how UAs unambiguously determine the charset of a given style sheet, it is
surprisingly silent on how conformant UAs should treat style sheets encoded
in a charset the UA doesn't understand.

Let's consider an example from the CR where it's suggested that an author
use the "ISO-8859-7" charset if "the style sheet contains a lot of Greek
characters." [3]

A UA that doesn't understand the Greek charset (ISO-8859-7) will find the
style sheet perfectly syntactically correct.  It will be able to parse the
sheet, but the results will almost certainly not be what the author or user
expects (given the style sheet has "a lot of Greek characters" for a reason).

The spec documents a consistent philosophy of ignoring property names and
property values that are unrecognized, ignoring unrecognized at-rules, and
silently discarding rules with unrecognized constructs, all of which is
specifically designed to allow future enhancements in a predictably
backward-compatible manner.

Given this philosophy, the only consistent way for a UA to handle a style
sheet encoded in a future charset is to ignore the entire style sheet.

The previous discussion assumes the @charset rule is properly formed, but
the IANA charset name is unrecognized by the UA.

The situation is different for a malformed @charset rule, including an
@charset rule that has a malformed IANA charset identifier.  (IANA specifies
that valid character set names consist of 1 - 40 characters from the
printable characters of US-ASCII. [4])
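
As a rough illustration, a well-formedness check along those lines might
look like the Python below.  The function name is hypothetical, and treating
"printable" as the range 0x21 to 0x7E (that is, excluding the space
character) is my assumption, not something I'm quoting from IANA.

```python
def is_well_formed_charset_name(name: str) -> bool:
    """Check the length and character-range limits on a charset name.

    Assumption: "printable US-ASCII" is taken to mean 0x21..0x7E, which
    excludes space and all control characters.
    """
    return 1 <= len(name) <= 40 and all(0x21 <= ord(c) <= 0x7E for c in name)
```

Note this only checks the form of the name.  It says nothing about whether
the name is registered or whether the UA can decode it.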

In this case, the @charset rule should be considered invalid, and the UA
should continue parsing immediately after the terminating semicolon (or
block) as described in section 4.1.5. [2]
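
Putting the two cases side by side, the handling I'm proposing could be
sketched in Python as follows.  This is only my reading of what would be
consistent with the spec, not text from it; the names Action and
charset_action are made up for the example, the caller supplies the set of
charsets the UA supports, and "printable" is again assumed to mean 0x21 to
0x7E.

```python
from enum import Enum


class Action(Enum):
    DECODE_AND_PARSE = "decode with the named charset and parse"
    IGNORE_SHEET = "ignore the entire style sheet"
    IGNORE_RULE = "ignore the @charset rule and continue parsing"


def charset_action(name: str, supported: set[str]) -> Action:
    """Decide how to treat a style sheet given the name in its @charset rule.

    Hypothetical sketch of the proposal in this message:
      - malformed name           -> drop only the @charset rule
      - well-formed but unknown  -> drop the whole sheet
      - well-formed and known    -> decode and parse normally
    """
    # Assumed reading of "printable US-ASCII": 0x21..0x7E, 1 to 40 chars.
    well_formed = 1 <= len(name) <= 40 and all(
        0x21 <= ord(c) <= 0x7E for c in name
    )
    if not well_formed:
        return Action.IGNORE_RULE
    # IANA charset names are compared without regard to case.
    if name.lower() not in {s.lower() for s in supported}:
        return Action.IGNORE_SHEET
    return Action.DECODE_AND_PARSE


ua_charsets = {"UTF-8", "ISO-8859-1"}  # example set for a limited UA
decision = charset_action("ISO-8859-7", ua_charsets)  # Action.IGNORE_SHEET
```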


[1] http://www.w3.org/TR/2004/CR-CSS21-20040225/intro.html#q6
[2] http://www.w3.org/TR/2004/CR-CSS21-20040225/syndata.html#at-rules
[3] http://www.w3.org/TR/2004/CR-CSS21-20040225/syndata.html#q24
[4] http://www.iana.org/assignments/character-sets


> -----Original Message-----
> From: www-style-request@w3.org [mailto:www-style-request@w3.org] On Behalf
> Of Justin Wood
> Sent: Wednesday, July 14, 2004 10:52 PM
> To: W3C Style List
> Subject: Re: Handling unrecognized or unsupported charset
> 
> 
> Mark Moore wrote:
> 
> >I can't find anything that specifies what a UA should do when it
> encounters
> >a style sheet with an unsupported charset, or what it should do when the
> >charset identifier is malformed.
> >
> >My assumption is that the entire stylesheet should be ignored, but I
> didn't
> >see it covered.
> >
> >
> >
> >
> >
> If a UA cannot parse a style-sheet due to malformed charset, or other
> means available to it, what else is to be expected?, we definately
> cannot say "You must parse that which you cannot parse".
> 
> I fail to see the need for clarification in this, not all UA's can do
> charset-interpret on malformed ones, nor can we expect them to (imo).
> If you could propose an interoperable way to do anything other than
> "ignore" I would personally love to hear the suggestion for an errate
> change.
> 
> ~Justin Wood (non W3C WG member)
Received on Thursday, 15 July 2004 13:33:17 UTC