
Re: Fallback to UTF-8

From: olivier Thereaux <ot@w3.org>
Date: Mon, 28 Apr 2008 10:43:01 +0900
Cc: W3C Validator Community <www-validator@w3.org>
Message-Id: <2DF059F6-EA37-45DB-82F6-AA331D75D7E6@w3.org>
To: Henri Sivonen <hsivonen@iki.fi>

Hi Henri, all.

Thanks for all your thoughts on this thread. I am disappointed by some
of the name-calling, but overall I believe this has been an
interesting and informative discussion.

On 24-Apr-08, at 5:10 PM, Henri Sivonen wrote:
> More precisely for text/html:
> http://www.w3.org/html/wg/html5/#determining
>
> Step 7. defines Windows-1252 as the general default which can be  
> different in non-Western browser installations. Global online apps  
> like validators should probably stick to Windows-1252.

Henri, this is an interesting and important statement in the HTML5
spec. How does the group feel about the inconsistency this creates
between the spec and the defaults stated by other specifications, such as

http://www.ietf.org/rfc/rfc2854.txt
“Section 3.7.1, defines that "media subtypes of the 'text' type are
defined to have a default charset value of 'ISO-8859-1'".”
(ditto RFC 2616)

This is the inconsistency at the core of the issue, isn't it?

I heard that the group working on HTTPbis had considered changing the
default, but had not yet reached consensus. Is the HTML WG
considering updating RFC 2854?

>  (The mention of UTF-8 there is a token gesture; the Web is a legacy  
> system, so UTF-8 for non-legacy does not apply.)

This sounds like a rather subjective statement, which I would be wary
of. Of course, the HTML5 spec is here to fix things in a
backward-compatible way, but specifications look forward, not just
back, and checkers are here in part to help move the landscape forward.
Or, at least, so I am told all the time by the likes of timbl :).

I also note in the HTML5 specification:
“Authors are encouraged to use UTF-8. Conformance checkers may advise  
against authors using legacy encodings.”

So is this a question of a forward-looking default (UTF-8) versus a
conservative default (Windows-1252)? If so, I would argue that a checker
should favor UTF-8 first, and fall back to Windows-1252 second, no?
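
To make the ordering above concrete, here is a minimal Python sketch,
assuming no charset was declared in the HTTP headers or in the document
itself. The function name decode_with_fallback is hypothetical, and this
is only an illustration of the idea, not how the validator actually
behaves.

```python
from typing import Tuple


def decode_with_fallback(raw: bytes) -> Tuple[str, str]:
    """Decode undeclared bytes: try UTF-8 first, then Windows-1252.

    Returns (text, encoding_used). Hypothetical helper for illustration.
    """
    try:
        # Strict UTF-8 decoding rejects most legacy single-byte content,
        # so a successful decode is a reasonably strong signal.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Windows-1252 leaves a few bytes undefined (e.g. 0x81), so
        # replace those rather than failing a second time.
        return raw.decode("windows-1252", errors="replace"), "windows-1252"
```

A checker following this order could also report which encoding it ended
up using, so that authors relying on the fallback are told about it.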

Thanks.
-- 
olivier
Received on Monday, 28 April 2008 01:43:38 GMT
