- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Tue, 24 Feb 2004 07:34:51 +0200 (EET)
- To: WWW Style <www-style@w3.org>
On Mon, 23 Feb 2004, Henri Sivonen wrote:

> > Either guess is bound to be wrong in some cases. And if the guess turns
> > out to result in something containing undefined octets, I think we can
> > relatively safely guess that the guess was wrong.
>
> Yes, but restarting the parser at that point is expensive.

So what? This is about error processing in presumably few cases. I don't
think efficiency is then more important than doing sensible things. If you
play a guessing game, you should really restart if you can be practically
certain that you have made a wrong guess.

> > Indeed. And currently most style sheets contain Ascii only.
>
> Except non-ASCII occurs in comments

Non-Ascii may occur in comments, but still the point stands that _most_
style sheets contain Ascii only.

> In order to be useful in practice, the last resort
> needs to handle the case with declarations are in ASCII but the
> comments contain non-ASCII gremlins. Assuming UTF-8 and using a
> draconian UTF-8 decoder would cause perceived breakage.

There's nothing that prevents a user agent from using whatever last resort
it finds suitable, if no encoding is declared and it turns out that the
encoding cannot be Ascii. I don't think it is reasonable to _require_ any
particular error processing in such cases.

If you set up strict rules for error processing, that will be taken (by
many) as actually changing the rules. What's the real distinction between
_required_ error processing and making the error compliant? Actually,
required error processing sounds like a canonicalization of what browsers
really do (we know that this isn't true for CSS in many cases, but many
authors wouldn't know that), whereas required compliance in processing
conforming documents often sounds like wishful thinking (and is).

> > This is all about error processing, unless I'm missing something.
>
> Not exactly if the guessing is made part of an official sniffing
> algorithm. (In XML, for example, the UTF-8 default is not about error
> processing but about defaulting.)

Setting an explicit default and using it is not guessing. It might be a bad
move sometimes, but it has to be distinguished from a guessing game.

> > a) may apply whatever error processing they find suitable
>
> But, as Ian Hickson pointed out, then the spec would be less useful and
> everyone would have to just reverse engineer the market leader.

How would that reduce the usefulness of a specification? A specification
tells authors and browser programmers what to do, how to comply. If you
tell authors how browsers are required to treat authors' errors, you are
more or less telling them they ain't no errors really. And if other
browsers need to imitate the market leader, the need will not be removed by
making the market leader's behavior a standard, still less by making some
different error processing a standard. In the latter case, things would
just get more difficult for the other browsers.

> > b) should assume Ascii, if the style sheet
> > contains only octets with most significant bit set to zero.
>
> Why would assuming ASCII be more useful than assuming windows-1252?

Because we can be practically 100% certain that if the data comes with no
encoding specified and it looks like Ascii, it is Ascii. Not so with
windows-1252 or iso-8859-1. It could just as well be iso-8859-2, for
example, and there you go.

--
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
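
The two points argued above (treat data as Ascii when every octet has its most significant bit set to zero, and restart with another guess when a guess yields undefined octets) could be sketched roughly as follows. This is only an illustration, not anything taken from a CSS specification or from a real browser: the function name `decode_undeclared` and the order of the fallback guesses are assumptions made for the example, and a user agent remains free to pick a different last resort.

```python
from typing import Sequence, Tuple


def decode_undeclared(
    data: bytes,
    guesses: Sequence[str] = ("utf-8", "windows-1252", "iso-8859-1"),
) -> Tuple[str, str]:
    """Decode a style sheet that arrived without a declared encoding.

    Hypothetical helper: the name and the default guess order are
    assumptions for this sketch, not required behavior.
    Returns (encoding name, decoded text).
    """
    # Point (b): only octets with the most significant bit set to zero,
    # so we can be practically certain the data is Ascii.
    if all(octet < 0x80 for octet in data):
        return "ascii", data.decode("ascii")

    # Otherwise play the guessing game, one candidate at a time.
    for name in guesses:
        try:
            # Strict decoding fails on octets undefined in the candidate.
            return name, data.decode(name)
        except UnicodeDecodeError:
            # The guess was wrong: restart from the beginning with the next.
            continue

    raise ValueError("no candidate encoding could decode the data")
```

With these assumed defaults, a comment containing a lone octet 0xE4 is rejected by the strict UTF-8 decoder and then accepted as windows-1252, which is the "restart" step under discussion. Note that the result is still only a guess: the same octets would decode without error as iso-8859-2 as well.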
Received on Tuesday, 24 February 2004 00:34:53 UTC