- From: Andrew Fedoniouk <news@terrainformatica.com>
- Date: Thu, 15 Jul 2004 14:36:33 -0700
- To: "Boris Zbarsky" <bzbarsky@MIT.EDU>, "Mark Moore" <mark.moore@notlimited.com>
- Cc: <www-style@w3.org>
Boris,

Boris> ...The step right before tokenization is to convert the sheet to Unicode ...

Is this mandatory, or defined somewhere as a "must"? I hope not...

We found that such pre-conversion (for CSS and HTML) is pretty expensive in terms of both computation and resources. So we are doing "inline conversion": a blind ASCII-7/UTF-8 assumption until the first @charset.

Andrew Fedoniouk.
http://terrainformatica.com

----- Original Message -----
From: "Boris Zbarsky" <bzbarsky@MIT.EDU>
To: "Mark Moore" <mark.moore@notlimited.com>
Cc: <www-style@w3.org>
Sent: Thursday, July 15, 2004 11:45 AM
Subject: Re: Handling unrecognized or unsupported charset

>
> Mark Moore wrote:
> > A UA that doesn't understand the Greek charset (ISO-8859-7) will find the
> > style sheet perfectly syntactically correct. It will be able to parse the
> > sheet
>
> No, it will not. It will not even be able to tokenize the sheet. The step
> right before tokenization is to convert the sheet to Unicode and then work with
> the Unicode character stream, not the byte stream. If the conversion to Unicode
> cannot be performed, tokenization cannot even start.
>
> The only way to attempt to deal short of discarding the sheet is to assume some
> other charset and use that. Say take the charset from the next step of the
> charset selection algorithm.
>
> > In this case, the @charset rule should be considered invalid, and the UA
> > should continue parsing immediately after the terminating semicolon (or
> > block) as described in section 4.1.5. [2]
>
> This is not specified anywhere in the spec. Are you suggesting that it be
> specified?
>
> -Boris
>
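For illustration only, the "blind ASCII/UTF-8 assumption until the first @charset" idea might be sketched as below. This is a whole-buffer sketch under stated assumptions, not the streaming implementation described in the message: the function names `sniff_charset` and `decode_sheet` are hypothetical, the pattern assumes the sheet's encoding is ASCII-compatible, and the fallback on an unknown charset follows Boris's suggestion of assuming some other charset rather than anything the spec requires.

```python
import codecs
import re

# Assumption: the encoding is ASCII-compatible, so a leading @charset rule
# can be read straight from the raw bytes before any conversion happens.
_CHARSET_RE = re.compile(rb'^@charset\s+"([A-Za-z0-9._:\-]+)"\s*;')


def sniff_charset(data: bytes, default: str = "utf-8") -> str:
    """Return the charset named by a leading @charset rule, else `default`."""
    match = _CHARSET_RE.match(data)
    if match is None:
        return default
    return match.group(1).decode("ascii")


def decode_sheet(data: bytes, fallback: str = "utf-8") -> str:
    """Decode a style sheet, using `fallback` if the declared charset is unknown."""
    name = sniff_charset(data, fallback)
    try:
        codecs.lookup(name)  # raises LookupError for unsupported charsets
    except LookupError:
        name = fallback      # "assume some other charset and use that"
    # Undecodable bytes are replaced rather than aborting the whole sheet.
    return data.decode(name, errors="replace")
```

A UA without a Greek decoder would take the `LookupError` branch for a sheet that declares ISO-8859-7; whether it should then discard the sheet or carry on with the fallback is exactly the question under discussion in this thread.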
Received on Thursday, 15 July 2004 17:37:50 UTC