- From: Tex Texin <tex@XenCraft.com>
- Date: Thu, 15 Jul 2004 22:50:01 -0400
- To: Boris Zbarsky <bzbarsky@MIT.EDU>
- Cc: Mark Moore <mark.moore@notlimited.com>, www-style@w3.org
Boris Zbarsky wrote:
>
> Mark Moore wrote:
> > A UA that doesn't understand the Greek charset (ISO-8859-7) will find the
> > style sheet perfectly syntactically correct. It will be able to parse the
> > sheet
>
> No, it will not. It will not even be able to tokenize the sheet. The step
> right before tokenization is to convert the sheet to Unicode and then work with
> the Unicode character stream, not the byte stream. If the conversion to Unicode
> cannot be performed, tokenization cannot even start.

I don't see this in the standard. A style sheet is a sequence of Unicode
characters, but the standard is quite clear that processors can work with other
encodings.

Also, if this logic were true, what does it imply for the case where the
encoding is correctly identified but the sheet contains characters in that
encoding for which the processor doesn't have a mapping? (EG the encoding was
expanded to include more characters after the processor was released.)

If the processor detects a character it can't map should it give up on the
entire conversion? Or ignore the characters it can't map, ignore the blocks they
are used in, and keep processing?

> The only way to attempt to deal short of discarding the sheet is to assume some
> other charset and use that. Say take the charset from the next step of the
> charset selection algorithm.

I don't see this in the standard. Another alternative is to simply parse what
makes sense and ignore the rest, consistent with other aspects of CSS.

--
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------
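The two behaviours debated in this message (give up on the whole conversion, or map what can be mapped and keep going) can be sketched roughly as follows. This is only an illustration in Python, not anything taken from the CSS specification: the function names and the sample sheet are hypothetical, and decoding ISO-8859-7 bytes as ASCII merely stands in for "a processor that has no mapping for some characters".

```python
from typing import Optional

# Hypothetical sample sheet, encoded in the Greek charset under discussion.
GREEK_SHEET = 'p { font-family: "Ελληνικά"; color: red }'.encode("iso-8859-7")


def decode_strict(data: bytes, charset: str) -> Optional[str]:
    """Give up on the whole sheet if the charset is unknown or any byte is unmappable."""
    try:
        return data.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # UnicodeDecodeError: a byte has no mapping; LookupError: unknown charset.
        return None


def decode_lenient(data: bytes, charset: str) -> str:
    """Map what can be mapped and substitute U+FFFD for everything else."""
    try:
        return data.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset: assume ASCII (an assumption made only for this sketch)
        # and replace every non-ASCII byte.
        return data.decode("ascii", errors="replace")


if __name__ == "__main__":
    print(decode_strict(GREEK_SHEET, "iso-8859-7"))  # charset known: full text
    print(decode_strict(GREEK_SHEET, "ascii"))       # unmappable bytes: None
    print(decode_lenient(GREEK_SHEET, "ascii"))      # ASCII parts survive
```

In the lenient case the ASCII-range syntax of the sheet (braces, colons, property names) comes through intact, so a tokenizer could still run and discard only the declarations whose values contain replacement characters. Whether a conforming processor is allowed to do that is exactly the open question in this thread.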
Received on Thursday, 15 July 2004 22:51:12 UTC