Re: Handling unrecognized or unsupported charset

Boris Zbarsky wrote:
> 
> Mark Moore wrote:
> > A UA that doesn't understand the Greek charset (ISO-8859-7) will find the
> > style sheet perfectly syntactically correct.  It will be able to parse the
> > sheet
> 
> No, it will not.  It will not even be able to tokenize the sheet.  The step
> right before tokenization is to convert the sheet to Unicode and then work with
> the Unicode character stream, not the byte stream.  If the conversion to Unicode
> cannot be performed, tokenization cannot even start.

I don't see this in the standard.
A style sheet is a sequence of Unicode characters, but the standard is quite
clear that processors can work with other encodings.

Also, if this logic were true, what does it imply for the case where the
encoding is correctly identified but the sheet contains characters in that
encoding for which the processor doesn't have a mapping (e.g., the encoding was
expanded to include more characters after the processor was released)? If the
processor detects a character it can't map, should it give up on the entire
conversion? Or should it ignore the characters it can't map, ignore the blocks
they are used in, and keep processing?

> The only way to attempt to deal short of discarding the sheet is to assume some
> other charset and use that.  Say take the charset from the next step of the
> charset selection algorithm.

I don't see this in the standard.
Another alternative is to simply parse what makes sense and ignore the rest,
consistent with other aspects of CSS.
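
As a rough sketch of that alternative (in Python, and deliberately naive): the
undecodable bytes are replaced with U+FFFD, and any rule that ends up
containing one is dropped while the rest of the sheet is kept. The function
name keep_clean_rules is hypothetical, the ASCII codec merely stands in for a
processor that cannot map a byte, and splitting on "}" ignores nesting,
strings and comments, so this is not a real CSS parser.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def keep_clean_rules(sheet_text: str) -> str:
    """Drop every top-level rule whose text contains U+FFFD.

    Naive on purpose: splits on "}" and ignores nesting, strings, comments.
    """
    kept = []
    for chunk in sheet_text.split("}"):
        if not chunk.strip():
            continue  # trailing whitespace after the last rule
        if REPLACEMENT in chunk:
            continue  # rule touched an unmappable character: ignore it
        kept.append(chunk.strip() + " }")
    return "\n".join(kept)


raw = b'p { content: "\xa4 5" }\nh1 { color: red }\n'
# Stand-in for a processor with no mapping for 0xA4: the ASCII codec
# replaces each byte it cannot decode with U+FFFD.
text = raw.decode("ascii", errors="replace")
print(keep_clean_rules(text))
```

Here the p rule is discarded and the h1 rule is kept, much as an unknown
property is skipped without throwing away the whole sheet.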



-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
                         
XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------

Received on Thursday, 15 July 2004 22:51:12 UTC