Re: Auto-detect and encodings in HTML5

From: Henri Sivonen <hsivonen@iki.fi>
Date: Mon, 15 Jun 2009 13:44:00 +0300
Cc: public-html@w3.org
Message-Id: <461F9FAE-F737-4405-A401-42FA79116504@iki.fi>
To: Ian Hickson <ian@hixie.ch>

On Jun 12, 2009, at 02:16, Ian Hickson wrote:

> On Wed, 3 Jun 2009, Henri Sivonen wrote:
>> *Of course* authoring tools
>> should use UTF-8 *and declare it* for any new documents.
>> HTML5 already says: "Authors are encouraged to use UTF-8."
>> http://www.whatwg.org/specs/web-apps/current-work/#charset
> I could make this stronger if people think that would be helpful.

I think it would be helpful to add an informative note about the bad
consequences for form submission and URL query parts when this advice
is not followed.
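
A minimal sketch of the kind of consequence meant here, assuming Python's urllib.parse as a stand-in for a browser that percent-encodes query values in the page's encoding. The helper name encode_query_value and the choice of windows-1252 as the legacy encoding are illustrative assumptions, not taken from the spec text:

```python
from urllib.parse import quote, unquote


def encode_query_value(value: str, page_encoding: str) -> str:
    """Percent-encode a query value in the page's encoding (hypothetical helper)."""
    return quote(value, safe="", encoding=page_encoding)


# The same character produces different bytes on the wire
# depending on the encoding of the page that contains the form.
utf8_form = encode_query_value("ä", "utf-8")
legacy_form = encode_query_value("ä", "windows-1252")

# A server that assumes UTF-8 cannot recover the character
# from the legacy-encoded bytes; it gets a replacement character.
decoded = unquote(legacy_form, encoding="utf-8", errors="replace")

print(utf8_form, legacy_form, repr(decoded))
```

The server sees only the percent-encoded bytes, so unless it knows which encoding the page used, it has to guess how to decode them.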

> On Wed, 3 Jun 2009, Henri Sivonen wrote:
>> My counter-argument is that it's useful for a validator to whine in
>> the ASCII-only case, because the validator user may be testing a CMS
>> template that is ASCII-only at the time of testing but gets filled
>> with arbitrary content at deployment time.
> I think it is reasonable for a validator to warn if a document is
> US-ASCII without a declaration.

Validator.nu already does this. However, it means that when error
reporting is compiled in, the parser has to burn cycles on every code
unit to detect this situation, so I think it is unlikely that the
warning could make it into browsers even if their parsers otherwise
started reporting HTML5 parse errors.
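
To illustrate where the per-code-unit cost comes from, here is a rough sketch of such a check. It is not Validator.nu's actual code: the function name and the has_declaration flag are assumptions for illustration, and it scans bytes rather than the code units a real parser works on:

```python
def needs_encoding_warning(data: bytes, has_declaration: bool) -> bool:
    """Return True if the document is pure US-ASCII and declares no encoding.

    Hypothetical helper: every byte has to be inspected before the
    document can be called ASCII-only, which is the per-unit cost.
    """
    if has_declaration:
        return False
    for byte in data:
        if byte >= 0x80:
            # A non-ASCII byte means the ASCII-only warning does not apply.
            return False
    return True


print(needs_encoding_warning(b"<p>template placeholder</p>", False))
print(needs_encoding_warning("<p>\u00e4</p>".encode("utf-8"), False))
```

The comparison has to run on every unit of an ASCII-only document, since only reaching the end of the input proves that no non-ASCII unit occurs.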

Henri Sivonen
Received on Monday, 15 June 2009 10:44:41 UTC
