Re: Fallback to UTF-8

On 24-Apr-08, at 11:38 PM, Andreas Prilop wrote:
> Which kind of patch do you mean?
> I just ask to change the default from UTF-8 to ISO-8859-1.

In a few years of developing various software projects, I have learned
to be very wary of the word "just". Any occurrence of a suggestion that
software "just has to do this or that" usually means a lot of
complexity and difficulty for whoever actually has to implement it. I
suggest banning this term from your RFEs or bug reports.

That said, as the long thread has shown, there are a number of  
candidates for default:
* utf-8, because it is the forward-looking encoding, and appropriate
for most international content. It is also what authors are strongly
encouraged to use today, and as such, the validator is a tool that
should favor this practice.
* windows-1252, which appears to be a safe default for a lot of
content on the web today, and which the HTML5 specification suggests
as a fallback for UAs trying to parse legacy content.
* iso-8859-1, not because it's a proper encoding for most languages,  
but because it has (unfortunately) been set as default in a number of  
specifications.

We can either argue forever about which default is the right one (as
parts of this thread, and many a sterile discussion before it, have
shown, alas) or have implementations try all three in turn. The latter
is obviously not very efficient, but should hopefully be helpful for
document authors.

I think (as I already stated in the past) that the latter may be a  
fair solution. I have therefore tried implementing it in the  
development validator.
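
To illustrate the idea of trying the three candidates in turn, here is
a minimal sketch in Python. It is not the code from the patch; the
names CANDIDATES and decode_with_fallback are made up for this example,
and the order (strictest encoding first) is an assumption of the sketch.

```python
# Hypothetical illustration only: try each candidate encoding in order
# and keep the first one that decodes the document without error.
CANDIDATES = ("utf-8", "windows-1252", "iso-8859-1")


def decode_with_fallback(data: bytes, candidates=CANDIDATES):
    """Return (text, encoding_name) for the first candidate that works."""
    for name in candidates:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            # This candidate cannot represent the bytes; try the next one.
            continue
    raise ValueError("no candidate encoding could decode the input")


if __name__ == "__main__":
    text, used = decode_with_fallback(b"caf\xe9")
    print(used, text)
```

In this sketch utf-8 goes first because it rejects most byte sequences
that were written in another encoding, and iso-8859-1 goes last because
Python's codec for it accepts any byte sequence and would otherwise
hide the other two.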

The patch is at:
http://lists.w3.org/Archives/Public/www-validator-cvs/2008Apr/0065.html
http://lists.w3.org/Archives/Public/www-validator-cvs/2008Apr/0064.html

And can be tested, as usual, at:
http://qa-dev.w3.org/wmvs/HEAD/

Any constructive feedback would be welcome.

Regards,
-- 
olivier

Received on Monday, 28 April 2008 03:48:07 UTC