Re: Handling unrecognized or unsupported charset

From: Andrew Fedoniouk <news@terrainformatica.com>
Date: Thu, 15 Jul 2004 14:36:33 -0700
Message-ID: <000601c46ab3$cd642880$eb01000a@AFedoniouk>
To: "Boris Zbarsky" <bzbarsky@MIT.EDU>, "Mark Moore" <mark.moore@notlimited.com>
Cc: <www-style@w3.org>


Boris> ...The step right before tokenization is to convert the sheet to
Unicode ...

Is this mandatory, or defined somewhere as a "must"? I hope not...

We found that such pre-conversion (for both CSS and HTML) is pretty expensive
from a computational and resource point of view.
So we do "inline conversion": a blind ASCII-7/UTF-8 assumption until the
first @charset.
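
To make the idea concrete, here is a minimal Python sketch of what such
"inline conversion" might look like. It is not our actual implementation:
the names sniff_charset and decode_inline are hypothetical, and it assumes
that the @charset rule, if present, sits at the very start of the sheet and
fits inside the first chunk of bytes. Unknown charset labels simply keep the
blind UTF-8 default, which is a choice made for this sketch only.

```python
import codecs
import re
from typing import Iterable, Iterator

# Matches a leading @charset rule in the raw bytes. The byte pattern is valid
# under the blind ASCII assumption, so no decoding is needed to find it.
_CHARSET_RE = re.compile(rb'^@charset "([A-Za-z0-9._-]+)";')


def sniff_charset(head: bytes, default: str = "utf-8") -> str:
    """Return the codec named by a leading @charset rule, else the default."""
    match = _CHARSET_RE.match(head)
    if not match:
        return default
    try:
        return codecs.lookup(match.group(1).decode("ascii")).name
    except LookupError:
        # Unrecognized label: keep the blind default (assumption of this sketch).
        return default


def decode_inline(chunks: Iterable[bytes], default: str = "utf-8") -> Iterator[str]:
    """Decode the sheet chunk by chunk instead of converting it all up front."""
    it = iter(chunks)
    first = next(it, b"")
    decoder = codecs.getincrementaldecoder(sniff_charset(first, default))(errors="replace")
    yield decoder.decode(first)
    for chunk in it:
        yield decoder.decode(chunk)
    # Flush any bytes still buffered from an incomplete multi-byte sequence.
    yield decoder.decode(b"", final=True)
```

A tokenizer could then pull text from decode_inline() as it needs it, so
only the chunk currently being tokenized has to exist in decoded form.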

Andrew Fedoniouk.

----- Original Message ----- 
From: "Boris Zbarsky" <bzbarsky@MIT.EDU>
To: "Mark Moore" <mark.moore@notlimited.com>
Cc: <www-style@w3.org>
Sent: Thursday, July 15, 2004 11:45 AM
Subject: Re: Handling unrecognized or unsupported charset

> Mark Moore wrote:
> > A UA that doesn't understand the Greek charset (ISO-8859-7) will find the
> > style sheet perfectly syntactically correct.  It will be able to parse the
> > sheet
> No, it will not.  It will not even be able to tokenize the sheet.  The step
> right before tokenization is to convert the sheet to Unicode and then work on
> the Unicode character stream, not the byte stream.  If the conversion to
> Unicode cannot be performed, tokenization cannot even start.
> The only way to attempt to deal short of discarding the sheet is to assume
> some other charset and use that.  Say take the charset from the next step of
> the charset selection algorithm.
> > In this case, the @charset rule should be considered invalid, and the UA
> > should continue parsing immediately after the terminating semicolon (or
> > block) as described in section 4.1.5. [2]
> This is not specified anywhere in the spec.  Are you suggesting that it be
> specified?
> -Boris
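
The fallback Boris describes above (if the declared charset cannot be used,
take the charset from the next step of the selection algorithm) could be
sketched roughly as follows in Python. This is only an illustration: the
function name decode_with_fallback is hypothetical, and the order of the
candidate labels is left to the caller instead of being taken from any
particular draft of the spec.

```python
import codecs
from typing import Iterable, Optional, Tuple


def decode_with_fallback(sheet: bytes,
                         candidates: Iterable[Optional[str]]) -> Tuple[str, str]:
    """Try each candidate charset in order; return (text, codec name used).

    A candidate of None means that step of the selection algorithm supplied
    no charset. Unrecognized or unsupported labels are skipped.
    """
    for label in candidates:
        if not label:
            continue
        try:
            info = codecs.lookup(label)
            return sheet.decode(info.name, errors="replace"), info.name
        except LookupError:
            # Charset not recognized or not usable for text: try the next step.
            continue
    # Nothing usable: conversion to Unicode is impossible, so tokenization
    # cannot start and the sheet would have to be discarded.
    raise LookupError("no usable charset for this style sheet")
```

For example, with an unsupported declared charset, an absent second step,
and ISO-8859-7 from a later step, the sheet is decoded as ISO-8859-7.
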
Received on Thursday, 15 July 2004 17:37:50 UTC
