
Re: Auto-detect and encodings in HTML5

From: Maciej Stachowiak <mjs@apple.com>
Date: Sun, 31 May 2009 18:54:19 -0700
Cc: "M.T. Carrasco Benitez" <mtcarrascob@yahoo.com>, Travis Leithead <Travis.Leithead@microsoft.com>, Erik van der Poel <erikv@google.com>, "public-html@w3.org" <public-html@w3.org>, "www-international@w3.org" <www-international@w3.org>, Richard Ishida <ishida@w3.org>, Ian Hickson <ian@hixie.ch>, Chris Wilson <Chris.Wilson@microsoft.com>, Harley Rosnow <Harley.Rosnow@microsoft.com>
Message-id: <BF628922-CF69-4DE1-B8AB-24EB2BB159E4@apple.com>
To: Larry Masinter <masinter@adobe.com>

On May 31, 2009, at 3:45 PM, Larry Masinter wrote:

>
>
> Changing the default charset from *something
> well known* to *something else* would be a bad
> idea -- that would be "default charset switching".
>
> But changing the charset from "unknown, please guess"
> to "UTF-8" doesn't seem like it is "default
> charset switching", it's "default charset
> setting".

HTML4.01 says that UAs MUST NOT have a default charset. But the de  
facto standard is that the default charset must be WinLatin1. If no  
charset is explicitly specified, some UAs will use heuristic charset  
autodetection, or in some cases have a locale-specific default. But
effectively the default is WinLatin1 (Windows-1252). Removing the  
heuristic processing when the HTML doctype is present might be  
appropriate, since a fairly small proportion of sites depend on it and  
generally modernizing a site will include properly declaring the  
charset. However, changing the baseline default from WinLatin1 to  
UTF-8 would be problematic for exactly the reasons I cited.
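
To make the concern concrete, here is a minimal Python sketch of the
fallback order described above and of what happens to an undeclared
legacy page if the baseline moves to UTF-8. The helper name
pick_charset, its parameters, and the exact ordering of the fallbacks
are assumptions made for illustration; they are not taken from any
spec or from any particular UA.

```python
# Hypothetical helper: the name, parameters, and fallback order are
# illustrative assumptions, not any browser's actual algorithm.
FALLBACK_CHARSET = "windows-1252"  # the de facto default (WinLatin1)


def pick_charset(declared=None, sniffed=None, locale_default=None):
    """Return the first charset that is available, in priority order.

    declared:       charset given explicitly (HTTP header or meta element)
    sniffed:        result of heuristic autodetection, if the UA ran one
    locale_default: locale-specific default, if the UA has one
    """
    for candidate in (declared, sniffed, locale_default):
        if candidate:
            return candidate
    return FALLBACK_CHARSET


# A legacy page with no charset declaration, authored in Windows-1252.
legacy_bytes = "caf\u00e9".encode("windows-1252")  # b'caf\xe9'

# Today's effective behavior: the text round-trips correctly.
as_winlatin1 = legacy_bytes.decode(pick_charset())

# With a UTF-8 baseline, the lone 0xE9 byte is not valid UTF-8 and
# gets replaced by U+FFFD, so the accented letter is lost.
as_utf8 = legacy_bytes.decode("utf-8", errors="replace")

print(repr(as_winlatin1))
print(repr(as_utf8))
```

Pure ASCII content decodes identically under both charsets, so the
breakage only shows up on pages that actually use bytes above 0x7F
without declaring how they are encoded.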

Was my message unclear on the fact that the current effective default  
is WinLatin1? I cited it specifically. Or are you saying that there's  
no default only in the legalistic sense?

Regards,
Maciej
Received on Monday, 1 June 2009 01:56:02 UTC
