
Re: Auto-detect and encodings in HTML5

From: Leif Halvard Silli <lhs@malform.no>
Date: Mon, 01 Jun 2009 03:46:23 +0200
Message-ID: <4A2332EF.1080709@malform.no>
To: Larry Masinter <masinter@adobe.com>
CC: Maciej Stachowiak <mjs@apple.com>, "M.T. Carrasco Benitez" <mtcarrascob@yahoo.com>, Travis Leithead <Travis.Leithead@microsoft.com>, Erik van der Poel <erikv@google.com>, "public-html@w3.org" <public-html@w3.org>, "www-international@w3.org" <www-international@w3.org>, Richard Ishida <ishida@w3.org>, Ian Hickson <ian@hixie.ch>, Chris Wilson <Chris.Wilson@microsoft.com>, Harley Rosnow <Harley.Rosnow@microsoft.com>
Larry Masinter On 09-06-01 00.45:
> Changing the default charset from *something
> well known* to *something else* would be a bad
> idea -- that would be "default charset switching".
> But changing the charset from "unknown, please guess"
> to "UTF-8" doesn't seem like it is "default
> charset switching", it's "default charset 
> setting".


> Setting default charset setting may not be
> a good reason for a version indicator, but
> it's a supporting reason.


> If there were other reasons for having a version
> indicator (e.g., to support authoring requirements),
> the version indicator could also indicate default
> charset UTF-8.


Maciej Stachowiak Sunday, May 31, 2009 3:35 PM

>> I think it would be pretty poor if some indicator of the document  
>> version (e.g. the doctype or as suggested by someone else a version  
>> parameter in the Content-Type header) changed the default charset.  
>> There are two reasons I say this:
>> 1) It goes against our desire to allow for gradual adoption. If  
>> changing your doctype declaration could have the side effect of  
>> changing your charset from Windows-1252 ("Windows Latin-1") to UTF-8,  
>> that would be a serious risk of breaking upgraded documents.

How so? Wouldn't this rather /encourage/ gradual adoption by 
attracting authors to it? One would probably find that authors 
switch doctype /only/ to get this effect, even if they do not 
otherwise rework their pages. Why would a Windows Latin-1 
document be switched to the HTML 5 doctype if doing so otherwise 
had no effect? In fact, this change could prevent doctype 
switches made purely to be "cool".

The HTML 5 doctype saves authors some typing. A UTF-8 default 
would save many of them from typing the charset as well.

Such a change would also be very much in line with the "support 
world languages" principle. [1]
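
To make the breakage risk in the quoted point 1 concrete, here is 
a small Python sketch of what happens when bytes written as 
Windows-1252 are read under a UTF-8 default. The sample word is 
my own illustration, and the lenient error handling is an 
assumption; real browsers have their own recovery rules.

```python
# A word with one non-ASCII character, saved by the author as Windows-1252.
legacy_bytes = "café".encode("cp1252")  # the "é" becomes the single byte 0xE9

# Read with the charset the author intended, the text round-trips.
as_legacy = legacy_bytes.decode("cp1252")

# Read under a UTF-8 default, the lone 0xE9 byte is not valid UTF-8.
# A lenient decoder (assumed here) substitutes U+FFFD for it.
as_utf8 = legacy_bytes.decode("utf-8", errors="replace")

print(repr(as_legacy))
print(repr(as_utf8))
```

Pure ASCII pages are unaffected either way, so only documents 
with unlabelled non-ASCII bytes would see a difference.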

>> 2) Doctype and Content-type parameter are both opt-in mechanisms. But  
>> there's already explicit ways to opt in to UTF-8: the charset  
>> parameter on Content-type, or a <meta> tag in the document. Explicit  
>> opt-in seems better to me than implicit, since it's more likely the  
>> author will be making a change intentionally. [...]

What do you mean by saying that the DOCTYPE is an opt-in 
mechanism? The draft says that "A DOCTYPE is a mostly useless, 
but required, header."
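
For reference, the precedence being argued about can be sketched 
in Python as below. This is not the encoding algorithm from the 
HTML 5 draft, only a simplified illustration: the function names, 
the doctype_sets_default flag, and the fixed Windows-1252 fallback 
are all assumptions of mine (the fallback is taken from the 
example in the quoted text). Explicit declarations win; the 
doctype only matters when nothing is declared.

```python
import re


def explicit_charset(content_type, head_bytes):
    """Return an explicitly declared charset, or None (hypothetical helper)."""
    # 1. charset parameter on the Content-Type header
    m = re.search(r'charset\s*=\s*"?([\w.:-]+)', content_type or "", re.I)
    if m:
        return m.group(1).lower()
    # 2. a <meta> tag carrying a charset (both meta forms are matched loosely)
    m = re.search(rb'<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)', head_bytes, re.I)
    if m:
        return m.group(1).decode("ascii").lower()
    return None


def effective_charset(content_type, head_bytes, doctype_sets_default=False):
    """Pick a charset; the doctype-based default is the proposal, not current behaviour."""
    declared = explicit_charset(content_type, head_bytes)
    if declared:
        return declared
    # Assumption: only the bare <!DOCTYPE html> counts as the version indicator.
    is_html5 = re.match(rb"\s*<!doctype\s+html\s*>", head_bytes, re.I) is not None
    if doctype_sets_default and is_html5:
        return "utf-8"
    # Assumed legacy fallback, as in the Windows-1252 example quoted above.
    return "windows-1252"


page = b"<!DOCTYPE html><title>x</title>"
print(effective_charset("text/html", page))
print(effective_charset("text/html", page, doctype_sets_default=True))
```

With the flag off, the unlabelled page keeps the legacy default; 
with it on, the same page is read as UTF-8, while any page with a 
header or <meta> declaration is unaffected.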


leif halvard silli
Received on Monday, 1 June 2009 01:47:04 UTC
