Re: Handling unrecognized or unsupported charset

Boris,

Boris> ...The step right before tokenization is to convert the sheet to
Unicode ...

Is this mandatory, or defined somewhere as a "must"? I hope not...

We found that such pre-conversion (for both CSS and HTML) is pretty
expensive, both computationally and in terms of resources.
So we do "inline conversion" instead: we blindly assume 7-bit ASCII/UTF-8
until the first @charset.
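
A minimal sketch of what such inline conversion could look like, in Python.
This is an illustration only, not the code of any particular engine: the
names inline_decode, PREFIX and CHARSET_RULE are hypothetical, the input is
assumed to arrive as an iterable of byte chunks, and the @charset rule is
assumed to sit at the very start of the sheet in the exact form
@charset "name"; as CSS 2.1 describes. An unrecognized name simply keeps the
UTF-8 default.

```python
import codecs
import re
from typing import Iterable, Iterator

# Hypothetical names, for illustration only.
PREFIX = b'@charset "'
# Charset label: printable ASCII without space or double quote.
CHARSET_RULE = re.compile(rb'@charset "([\x21\x23-\x7e]+)";')


def inline_decode(chunks: Iterable[bytes], default: str = "utf-8") -> Iterator[str]:
    """Decode a style sheet chunk by chunk, without a separate pre-pass.

    Bytes are assumed to be ASCII/UTF-8 unless the sheet starts with an
    @charset rule naming a codec this runtime knows.
    """
    encoding = default
    head = b""
    it = iter(chunks)
    for chunk in it:
        head += chunk
        n = min(len(head), len(PREFIX))
        if head[:n] != PREFIX[:n]:
            break  # the sheet cannot start with an @charset rule
        if b";" in head or len(head) > 1024:
            m = CHARSET_RULE.match(head)
            if m:
                try:
                    encoding = codecs.lookup(m.group(1).decode("ascii")).name
                except LookupError:
                    pass  # unrecognized charset: keep the default
            break

    # From here on, bytes are converted only as the consumer asks for them.
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    text = decoder.decode(head)
    if text:
        yield text
    for chunk in it:
        text = decoder.decode(chunk)
        if text:
            yield text
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

A tokenizer would pull from this generator as it goes, for example
"".join(inline_decode(chunks)) for a quick check. Note that Python's codec
registry also contains a few non-text codecs, so a real implementation
would more likely map charset labels through its own table.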

Andrew Fedoniouk.
http://terrainformatica.com


----- Original Message ----- 
From: "Boris Zbarsky" <bzbarsky@MIT.EDU>
To: "Mark Moore" <mark.moore@notlimited.com>
Cc: <www-style@w3.org>
Sent: Thursday, July 15, 2004 11:45 AM
Subject: Re: Handling unrecognized or unsupported charset


>
> Mark Moore wrote:
> > A UA that doesn't understand the Greek charset (ISO-8859-7) will find the
> > style sheet perfectly syntactically correct.  It will be able to parse the
> > sheet
>
> No, it will not.  It will not even be able to tokenize the sheet.  The step
> right before tokenization is to convert the sheet to Unicode and then work
> with the Unicode character stream, not the byte stream.  If the conversion
> to Unicode cannot be performed, tokenization cannot even start.
>
> The only way to attempt to deal short of discarding the sheet is to assume
> some other charset and use that.  Say take the charset from the next step
> of the charset selection algorithm.
>
> > In this case, the @charset rule should be considered invalid, and the UA
> > should continue parsing immediately after the terminating semicolon (or
> > block) as described in section 4.1.5. [2]
>
> This is not specified anywhere in the spec.  Are you suggesting that it be
> specified?
>
> -Boris
>

Received on Thursday, 15 July 2004 17:37:50 UTC