
[whatwg] Default encoding to UTF-8?

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Mon, 5 Dec 2011 19:55:43 +0100
Message-ID: <20111205195543123445.3e1ac21b@xn--mlform-iua.no>
>> (And HTML5 defines it the same.)
> 
> No. As far as I understand, HTML5 defines US-ASCII to be the default and
> requires that any other encoding is explicitly declared. I do like this
> approach.

We are here discussing the default *user agent behaviour* - we are not 
specifically discussing how web pages should be authored.

For user agents, please be aware that HTML5 maintains a table of 
'suggested default encodings': 
http://dev.w3.org/html5/spec/parsing.html#determining-the-character-encoding
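To make the table's role concrete: it maps the user's locale to the encoding a user agent falls back to when a page declares nothing. The sketch below assumes an abridged table with only two rows plus the catch-all "other" row (which covers English locales); the names SUGGESTED_DEFAULTS and default_encoding are hypothetical, and the spec itself has the full list.

```python
# Hypothetical, heavily abridged excerpt of HTML5's "suggested default
# encoding" table; only two rows are shown for illustration.
SUGGESTED_DEFAULTS = {
    "ja": "Shift_JIS",
    "ru": "windows-1251",
}
FALLBACK = "windows-1252"  # the "other" row, which covers English locales


def default_encoding(locale: str) -> str:
    """Return the suggested fallback encoding for a tag such as 'ru-RU'."""
    primary = locale.split("-")[0].lower()  # only the primary language subtag
    return SUGGESTED_DEFAULTS.get(primary, FALLBACK)


print(default_encoding("en-US"))  # windows-1252
print(default_encoding("ru-RU"))  # windows-1251
```

This is exactly the fallback that a switch to UTF-8 for English locales would change: only the FALLBACK row for those users.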

As for 'requires': HTML5 of course recommends that you declare the 
encoding (via HTTP or another higher-level protocol, via the BOM 
'sideshow', or via <meta charset=UTF-8>). I also just discovered that 
Validator.nu issues an error message if it finds none of these *and* 
the document contains non-ASCII characters. (I don't know, however, 
whether this error message is just something Henri added at his own 
discretion; it would be nice to have it literally in the spec too.)
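The condition described above (no declaration by any of the three mechanisms, *and* non-ASCII bytes present) can be sketched roughly as follows. This is not Validator.nu's actual code; the function names are hypothetical, and the regular expression is a crude stand-in for the real HTML5 prescan algorithm. It only looks at the first 1024 bytes because HTML5 requires the declaring <meta> element to fit there.

```python
import re
from typing import Optional

# Byte order marks for UTF-8, UTF-16LE and UTF-16BE.
BOMS = (b"\xef\xbb\xbf", b"\xff\xfe", b"\xfe\xff")

# Crude approximation: matches <meta charset=...> as well as
# <meta http-equiv=... content="...; charset=...">.
META_CHARSET = re.compile(rb"<meta\s[^>]*charset\s*=", re.IGNORECASE)


def has_encoding_declaration(body: bytes, content_type: Optional[str] = None) -> bool:
    """True if the encoding is declared via HTTP, a BOM, or a <meta> element."""
    if content_type and "charset=" in content_type.lower():
        return True  # declared by the transport (Content-Type header)
    if body.startswith(BOMS):
        return True  # declared by a byte order mark
    return META_CHARSET.search(body[:1024]) is not None


def undeclared_non_ascii(body: bytes, content_type: Optional[str] = None) -> bool:
    """The error condition: nothing declared *and* at least one non-ASCII byte."""
    if has_encoding_declaration(body, content_type):
        return False
    return any(byte > 0x7F for byte in body)
```

A pure-ASCII page without a declaration passes, while b"caf\xe9" without one is flagged.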

(The problem is, of course, that many English pages end up needing the 
whole "Unicode alphabet" even if they contain only US-ASCII at first.)

HTML5 says that validators *may* issue a warning if UTF-8 is *not* the 
encoding. So far, however, Validator.nu has not picked that up.
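Such an optional warning is simple to express once the declared label is known. A minimal sketch, assuming Python's codec registry as a stand-in for label resolution (its aliases are not identical to the labels browsers accept); utf8_warning is a hypothetical name:

```python
import codecs
from typing import Optional


def utf8_warning(declared_label: str) -> Optional[str]:
    """Return a warning unless the declared label resolves to UTF-8."""
    try:
        # Python's alias table, assumed here as a stand-in for browser labels.
        name = codecs.lookup(declared_label).name
    except LookupError:
        return f"Unknown encoding label: {declared_label!r}"
    if name != "utf-8":
        return f"Document is encoded as {name}; consider using UTF-8."
    return None  # UTF-8: nothing to warn about
```

Because it is a *may*-level warning, a validator could reasonably surface it with lower severity than the undeclared-encoding error.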
 
> We should also lobby for authoring tools (as recommended by HTML5) to
> default their output to UTF-8 and make sure the encoding is declared.

HTML5 already says: "Authoring tools should default to using UTF-8 for 
newly-created documents. [RFC3629]" 
http://dev.w3.org/html5/spec/semantics.html#charset
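For an authoring tool, following that advice means two things at once: encoding the bytes as UTF-8 *and* declaring it, so the two cannot disagree. A minimal sketch; the template and the new_document name are hypothetical:

```python
import html

# Hypothetical minimal template; the declaration comes first so that it
# lands well inside the first 1024 bytes.
TEMPLATE = """<!DOCTYPE html>
<meta charset=UTF-8>
<title>{title}</title>
"""


def new_document(title: str) -> bytes:
    """Serialize a new document that is UTF-8 encoded *and* says so."""
    text = TEMPLATE.format(title=html.escape(title))
    return text.encode("utf-8")  # must match the declared charset
```

A title such as "Blåbær" then comes out as UTF-8 bytes behind a matching declaration.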

> As
> so many pages, supposedly (I have not researched this), use the incorrect
> encoding, it makes no sense to try to clean this mess by messing with
> existing defaults. It may fix some pages and break others. Browsers have
> the ability to override an incorrect encoding and this [is] a reasonable
> workaround.

Do you use an English-locale computer? If you do, without being a native 
English speaker, then you are some kind of geek... Why can't you work 
around the troubles, as you are used to doing anyway?

Starting a switch to UTF-8 as the default UA encoding for English-locale 
users should *only* affect how English-locale users experience pages in 
languages which *both* need non-ASCII characters *and* have historically 
used Windows-1252 as the default encoding, *and* which additionally do 
not include any encoding declaration.
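The narrowness of that change can be checked directly: for a page that reaches the fallback (no declaration at all), the switch matters only if decoding its bytes as Windows-1252 and as UTF-8 gives different text. A minimal sketch, assuming the caller has already established that the page is undeclared; decodes_differently is a hypothetical name:

```python
def decodes_differently(body: bytes) -> bool:
    """Would changing the fallback from windows-1252 to UTF-8 alter this
    (assumed undeclared) page?"""
    as_1252 = body.decode("windows-1252", errors="replace")
    as_utf8 = body.decode("utf-8", errors="replace")
    return as_1252 != as_utf8


print(decodes_differently(b"Hello, world"))                # False: pure ASCII
print(decodes_differently("café".encode("utf-8")))         # True: 'cafÃ©' vs 'café'
print(decodes_differently("café".encode("windows-1252")))  # True: 'café' vs 'caf\ufffd'
```

Pure-ASCII pages are unaffected; undeclared UTF-8 pages are fixed, and only undeclared non-ASCII Windows-1252 pages get worse.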
-- 
Leif Halvard Silli
Received on Monday, 5 December 2011 10:55:43 UTC
