
Re: HTML5 Issue 11 (encoding detection): I18N WG response...

From: Henri Sivonen <hsivonen@iki.fi>
Date: Mon, 12 Oct 2009 10:42:47 +0300
Cc: Ian Hickson <ian@hixie.ch>, Larry Masinter <masinter@adobe.com>, Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, "Phillips, Addison" <addison@amazon.com>, Andrew Cunningham <andrewc@vicnet.net.au>, Richard Ishida <ishida@w3.org>, "public-html@w3.org" <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-Id: <B83EAF8F-F167-4C2E-8618-93823B6084BA@iki.fi>
To: Mark Davis ☕ <mark@macchiato.com>

On Oct 12, 2009, at 07:14, Mark Davis ☕ wrote:

> 	• Test if the bytes are valid UTF-8. If they are, return
> that encoding, with the confidence tentative, and abort these steps.
> 		• [include note about UTF-8 patterns, maybe reworded a bit.]
> 	• The user agent may attempt to autodetect the character encoding  
> [include rest of #5]

So you are suggesting making UTF-8 autodetect mandatory while leaving  
the rest of chardet optional? Does any one of the 5 top browsers do  

> 	• Otherwise, return an implementation-defined or user-specified  
> default character encoding, with the confidence tentative. Due to  
> its widespread use as a default in legacy content, windows-1252 is  
> recommended as a default in the absence of other information.

I think it would be useful to include a table of the locales for which
browsers have traditionally shipped with a default other than
Windows-1252, together with the default encoding for each locale.
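
For illustration only, the quoted steps could be sketched roughly as follows. This is a minimal sketch and not the spec's algorithm: the function name sniff_encoding, the detector callback, and the default parameter are all hypothetical names introduced here, and it assumes the whole byte sequence is available at once (a real user agent would typically see only a prefix of the stream, where a truncated trailing multi-byte sequence would need extra care).

```python
from typing import Callable, Optional, Tuple

# Hypothetical helper: none of these names come from the spec text.
def sniff_encoding(
    data: bytes,
    default: str = "windows-1252",
    detector: Optional[Callable[[bytes], Optional[str]]] = None,
) -> Tuple[str, str]:
    """Return (encoding, confidence) following the three quoted steps."""
    # Step 1: mandatory check for valid UTF-8.
    # Note that pure ASCII input is also valid UTF-8 and lands here.
    try:
        data.decode("utf-8", errors="strict")
        return ("utf-8", "tentative")
    except UnicodeDecodeError:
        pass

    # Step 2: optional autodetection (the "may" step). The detector is
    # an assumed callback that returns an encoding label or None.
    if detector is not None:
        guess = detector(data)
        if guess:
            return (guess, "tentative")

    # Step 3: fall back to the implementation-defined or user-specified
    # default, which could be chosen per locale by the caller.
    return (default, "tentative")
```

A caller wanting a locale-specific fallback would pass it as default, for example sniff_encoding(data, default="windows-1251"); which default belongs to which locale is exactly the information the table suggested above would have to supply.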

Henri Sivonen
Received on Monday, 12 October 2009 07:43:26 UTC
