Re: Guessing "correct" character set (was: [CSS21] response to issue 115 (and 44)) from Mikko Rantalainen on 2004-02-23 (www-style@w3.org from February 2004)

From: Mikko Rantalainen <mira@cc.jyu.fi>
Date: Mon, 23 Feb 2004 05:47:07 -0500 (EST)
To: www-style@w3.org
Message-ID: <4039DA01.4060708@cc.jyu.fi>

David Woolley wrote:
>>cause problems to you as a document author in case you cannot fix 
>>the HTTP header. Just transcode from your current character set to 
>>UTF-8. Shouldn't be a problem while authoring NEW documents.
> 
> I might agree if you said to new standards, but I think we are talking
> about people authoring to de facto CSS2 implementations being viewed
> in CSS3 capable browsers.  I think the only thing you could really do
> it is to defer the warning until a CSS3 feature is invoked, but then
> you've already done the hard work!

A broken file is a broken file. As we have no method to tell that
some stylesheet is version CSS 3.2 instead of de-facto-CSS2, we're
in a situation where user agents either *try* to automatically fix
the problems or just report them and possibly ask for further advice
from the user. Delaying the problem reporting just because some
old implementations were able to read the style sheet correctly,
because they automagically fixed the errors, isn't a very
future-proof setup. The fact is, if the document character set
cannot be known, it must either be guessed or asked from somebody.
An automated guess-machine is going to fail sometimes, and the only
one to ask is the user, not the author, no matter how much I'd like
to ask the author.
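
To make the "known, guessed or asked" distinction concrete, here is a
minimal sketch of the "known" part only. The function name
declared_charset and its parameters are my own illustration, not
anything from a spec or a real user agent, and the @charset pattern
is simplified. It returns None when nothing is declared, so the
caller has to decide explicitly whether to guess or to ask the user.

```python
import codecs
import re

# Simplified pattern: an @charset rule must be the very first thing in the sheet.
CHARSET_RULE = re.compile(rb'^@charset "([A-Za-z0-9._:-]+)";')


def declared_charset(raw, http_charset=None):
    """Return the charset that is actually declared, or None if there is none.

    Hypothetical helper: it never guesses. None means "unknown".
    """
    if http_charset:                       # charset parameter of the HTTP header
        return http_charset
    if raw.startswith(codecs.BOM_UTF8):    # UTF-8 byte order mark
        return "utf-8"
    match = CHARSET_RULE.match(raw)        # @charset "..."; at the start
    if match:
        return match.group(1).decode("ascii")
    return None                            # nothing declared: guess or ask


declared = declared_charset(b'@charset "ISO-8859-1";\np { color: red }')
unknown = declared_charset(b"p { color: red }")
```

The point of returning None instead of a default is that the
guessing step stays visible in the calling code rather than being
buried inside the detection.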

>>can make the document author feel that he gets all the blame, the 
>>faster he'll fix the document. If he doesn't care, it might be that 
> 
> The past experience in this area is that the browser developer gets
> the blame for faulting documents that are "clearly valid" because they
> work "perfectly" in the competing product.

That's because user agents usually drop the style/broken content
silently. For example, when Opera.com was rendered "incorrectly" by
Mozilla because the stylesheet was UTF-8 but contained an illegal
byte sequence, the Mozilla users thought it was a fault in Mozilla,
because the browser reported nowhere that it had dropped the
stylesheet as corrupted. If the user agent clearly informs the user
that the page cannot be fully rendered because the page contains
errors, nobody is going to blame the user agent.
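
As a rough illustration of reporting instead of dropping silently,
the sketch below decodes a stylesheet strictly as UTF-8 and, on
failure, produces a message that says where the illegal byte is. The
function name decode_stylesheet and the message wording are
assumptions of mine; this is not how Mozilla or any other browser
actually does it.

```python
def decode_stylesheet(raw):
    """Return (text, error); exactly one of the two is None.

    Hypothetical helper: strict decoding, so a corrupted sheet is
    reported with the offset of the first offending byte.
    """
    try:
        return raw.decode("utf-8", errors="strict"), None
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded
        message = (
            "stylesheet dropped: illegal UTF-8 byte "
            f"0x{raw[exc.start]:02x} at offset {exc.start}"
        )
        return None, message


good_text, good_error = decode_stylesheet(b"p { color: red }")
# 0xE4 is a Latin-1 "a with diaeresis"; on its own it is not valid UTF-8
bad_text, bad_error = decode_stylesheet(b"p { content: '\xe4' }")
```

Decoding with errors="replace" instead would hide the problem
behind replacement characters, and the original error would be much
harder to locate afterwards.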

Automagic machine correction is great as long as it works. Once it 
fails, it's usually *really* hard to find the original error that 
was covered by the automagic fixing feature.

-- 
Mikko

Received on Monday, 23 February 2004 18:30:04 UTC