Re: HTML5 Issue 11 (encoding detection): I18N WG response...

On Oct 12, 2009, at 07:14, Mark Davis ☕ wrote:

>  • Test if the bytes are valid UTF-8. If they are, return  
> that encoding, with the confidence tentative, and abort these steps.
>   • [include note about UTF-8 patterns, maybe reworded a bit.]
>  • The user agent may attempt to autodetect the character encoding  
> [include rest of #5]

So you are suggesting making UTF-8 autodetection mandatory while  
leaving the rest of chardet optional? Do any of the top 5 browsers do  
that?

>  • Otherwise, return an implementation-defined or user-specified  
> default character encoding, with the confidence tentative. Due to  
> its widespread use as a default in legacy content, windows-1252 is  
> recommended as a default in the absence of other information.
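
For concreteness, the two quoted steps (UTF-8 validity check, then
fallback to a default) could be sketched roughly as below. This is only
an illustration in Python, not proposed spec text: the name
sniff_encoding, the (encoding, confidence) return shape, and the choice
to tolerate a multi-byte sequence cut off at the end of the sniffed
prefix are all assumptions made for the example. The optional chardet
step in between is left out.

```python
import codecs


def sniff_encoding(prefix: bytes, default: str = "windows-1252") -> tuple[str, str]:
    """Return (encoding, confidence) for the sniffed byte prefix.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    # Use an incremental decoder so that a multi-byte sequence truncated
    # at the very end of the prefix is not treated as an error.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        decoder.decode(prefix, final=False)
    except UnicodeDecodeError:
        # Not valid UTF-8: fall back to the implementation-defined or
        # user-specified default, still with tentative confidence.
        return default, "tentative"
    # Note: pure ASCII input also passes this check.
    return "utf-8", "tentative"
```

Note that a pure-ASCII prefix passes the check too, which is presumably
what the note about UTF-8 patterns would need to address.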


I think it would be useful to include a table listing the locales for  
which browsers have traditionally shipped with a non-Windows-1252  
default, along with the default encoding for each.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Monday, 12 October 2009 07:43:25 UTC