Re: HTML5 Issue 11 (encoding detection): I18N WG response... from Henri Sivonen on 2009-10-12 (public-html@w3.org from October 2009)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Mon, 12 Oct 2009 10:42:47 +0300
To: Mark Davis ☕ <mark@macchiato.com>
Cc: Ian Hickson <ian@hixie.ch>, Larry Masinter <masinter@adobe.com>, Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, "Phillips, Addison" <addison@amazon.com>, Andrew Cunningham <andrewc@vicnet.net.au>, Richard Ishida <ishida@w3.org>, "public-html@w3.org" <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-Id: <B83EAF8F-F167-4C2E-8618-93823B6084BA@iki.fi>

On Oct 12, 2009, at 07:14, Mark Davis ☕ wrote:

>  • Test if the bytes are valid UTF-8. If they are, return that  
> encoding, with the confidence tentative, and abort these steps.
>   • [include note about UTF-8 patterns, maybe reworded a bit.]
>  • The user agent may attempt to autodetect the character encoding  
> [include rest of #5]

So you are suggesting making UTF-8 autodetect mandatory while leaving  
the rest of chardet optional? Does any one of the 5 top browsers do  
that?

>  • Otherwise, return an implementation-defined or user-specified  
> default character encoding, with the confidence tentative. Due to  
> its widespread use as a default in legacy content, windows-1252 is  
> recommended as a default in the absence of other information.

I think it would be useful to include a table listing the locales for  
which browsers traditionally ship with a non-Windows-1252 default,  
together with the default encoding for each.
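
As a rough sketch of the quoted steps (not taken from any browser), a
mandatory UTF-8 validity check followed by a locale-dependent fallback
could look like the Python below. The function name, the locale keys,
and the table entries are illustrative assumptions, and the optional
autodetection step is left out.

```python
from typing import Dict, Tuple

# Hypothetical, incomplete table of locale -> fallback encoding.
# The entries are illustrative assumptions, not an authoritative list.
LOCALE_DEFAULTS: Dict[str, str] = {
    "ja": "shift_jis",
    "ru": "windows-1251",
    "ko": "euc-kr",
}

def sniff_encoding(data: bytes, locale: str = "en") -> Tuple[str, str]:
    """Return (encoding, confidence) for a byte stream with no declared encoding."""
    # Step 1: if the bytes are valid UTF-8, return UTF-8 as tentative.
    # Pure ASCII input is also valid UTF-8, so it takes this branch.
    try:
        data.decode("utf-8", errors="strict")
        return ("utf-8", "tentative")
    except UnicodeDecodeError:
        pass
    # Step 2 (optional autodetection, i.e. chardet) is omitted here.
    # Step 3: fall back to the locale default, or windows-1252 if the
    # locale has no entry in the table.
    return (LOCALE_DEFAULTS.get(locale, "windows-1252"), "tentative")
```

For example, sniff_encoding(b"\xe9", "ru") falls through the UTF-8
check, because a lone 0xE9 byte is not valid UTF-8, and returns the
table entry for "ru".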

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Monday, 12 October 2009 07:43:26 UTC