Re: Auto-detect and encodings in HTML5 from Henri Sivonen on 2009-06-15 (public-html@w3.org from June 2009)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Mon, 15 Jun 2009 13:44:00 +0300
To: Ian Hickson <ian@hixie.ch>
Cc: public-html@w3.org
Message-Id: <461F9FAE-F737-4405-A401-42FA79116504@iki.fi>

On Jun 12, 2009, at 02:16, Ian Hickson wrote:

> On Wed, 3 Jun 2009, Henri Sivonen wrote:
>>
>> *Of course* authoring tools
>> should use UTF-8 *and declare it* for any new documents.
>>
>> HTML5 already says: "Authors are encouraged to use UTF-8."
>> http://www.whatwg.org/specs/web-apps/current-work/#charset
>
> I could make this stronger if people think that would be helpful.

I think it would be helpful to add an informative note about the bad
consequences for form submission and URL query parts if this advice is
not followed.
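
To illustrate the kind of consequence meant here, the following is a rough sketch in Python, not browser code. It assumes a simplified model in which a form value is encoded in the document's encoding and then percent-encoded, with characters the encoding cannot represent replaced by numeric character references. The function name `encode_query_value` is hypothetical.

```python
from urllib.parse import quote


def encode_query_value(value: str, document_encoding: str) -> str:
    """Percent-encode a form value the way a page in the given encoding
    might submit it (simplified model, assumed for illustration)."""
    # Characters that the encoding cannot represent become numeric
    # character references such as "&#26085;", which loses information:
    # the server cannot tell them apart from text the user typed literally.
    raw = value.encode(document_encoding, errors="xmlcharrefreplace")
    return quote(raw, safe="")


# The same text yields different bytes on the wire per page encoding.
print(encode_query_value("café", "utf-8"))         # caf%C3%A9
print(encode_query_value("café", "windows-1252"))  # caf%E9
print(encode_query_value("日", "windows-1252"))     # %26%2326085%3B
```

A server that receives `caf%E9` has to know, or guess, which encoding the page used. With UTF-8 declared on every page, that ambiguity and the lossy fallback both go away.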

> On Wed, 3 Jun 2009, Henri Sivonen wrote:
>>
>> My counter-argument is that it's useful for a validator to whine in
>> the ASCII-only case, because the validator user may be testing a CMS
>> template that is ASCII-only at the time of testing but gets filled
>> with arbitrary content at deployment time.
>
> I think it is reasonable for a validator to warn if a document is
> US-ASCII without a declaration.

Validator.nu already does this. However, this means that when error
reporting is compiled in, the parser needs to burn cycles on every
code unit to detect this situation, so I think it is unlikely that the
warning could make it into browsers even if their parsers otherwise
started reporting HTML5 parse errors.
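
As a rough sketch of where that cost comes from (this is not Validator.nu's actual code, and the function names are hypothetical): deciding that a document is ASCII-only means looking at every code unit, because a single non-ASCII byte anywhere in the input changes the answer.

```python
def is_ascii_only(data: bytes) -> bool:
    """Return True if no byte in the document is outside US-ASCII."""
    # Every code unit has to be inspected; this is the per-code-unit
    # work that a parser without error reporting could skip entirely.
    for byte in data:
        if byte >= 0x80:
            return False
    return True


def needs_encoding_warning(data: bytes, has_declaration: bool) -> bool:
    """Warn when an undeclared document happens to be pure ASCII."""
    return not has_declaration and is_ascii_only(data)


template = b"<!DOCTYPE html><title>CMS template</title><p>{{content}}</p>"
print(needs_encoding_warning(template, has_declaration=False))
print(needs_encoding_warning(template, has_declaration=True))
```

The early return only helps for documents that do contain non-ASCII bytes; the ASCII-only case that triggers the warning always pays for a full pass.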

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Monday, 15 June 2009 10:44:41 UTC