Re: Auto-detect and encodings in HTML5

On 09-06-12 01.16, Ian Hickson wrote:

> On Wed, 3 Jun 2009, Henri Sivonen wrote:
>> *Of course* authoring tools
>> should use UTF-8 *and declare it* for any new documents.
>>
>> HTML5 already says: "Authors are encouraged to use UTF-8."
>> http://www.whatwg.org/specs/web-apps/current-work/#charset
> 
> I could make this stronger if people think that would be helpful.

That would be a good thing. It should say that conforming
authoring tools (please do not say only 'authors' in
this case) MUST _default_ to UTF-8.
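To make "default to UTF-8 *and declare it*" concrete, here is a
minimal sketch of how a hypothetical authoring tool might create a
new document. The function name `new_document` and its parameters
are illustrative assumptions, not taken from any real editor. The
point is that the bytes are encoded with the same encoding that the
META element declares, and the declaration sits near the top of the
file, well within the first 1024 bytes that browsers scan.

```python
from html import escape

# Assumed default for new documents, per the recommendation above.
DEFAULT_ENCODING = "utf-8"


def new_document(title: str, body: str, encoding: str = DEFAULT_ENCODING) -> bytes:
    """Return a minimal HTML5 document that is encoded in, and declares, `encoding`.

    `body` is treated as ready-made markup; only the title is escaped.
    """
    markup = (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        f'<meta charset="{encoding}">\n'  # declaration placed early in the file
        f"<title>{escape(title)}</title>\n"
        "</head>\n"
        f"<body>{body}</body>\n"
        "</html>\n"
    )
    # Serialize with the declared encoding so declaration and bytes agree.
    return markup.encode(encoding)


doc = new_document("Blåbær", "<p>Smørbrød</p>")
```

A tool that lets the user pick another encoding would pass it as
`encoding`; the declaration and the serialization then change
together instead of drifting apart.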

The developers of the (partly) W3C-sponsored Amaya editor claim
that there are reasons for having ISO-8859-1 as the default
charset.[1] (It seems the concern is that some web servers set
ISO-8859-1 as the default for documents with the .html extension.)

I have also previously (in 2007) mentioned on this list that
authoring tools, including Web browsers, should support
character encoding suffixes (file.html.utf8), as Apache does.
I think that, when reading file:/// URLs, user agents could use
these charset extensions to mimic the charset header sent by
web servers. Authors and authoring tools would thus get a simple
way to experience how HTTP headers take precedence over what
the META element specifies.
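A rough sketch of the suffix idea: look for a charset suffix in the
file name and build the Content-Type value a server would have sent
for it. The table below is an assumption, loosely modeled on the
kind of mappings Apache's mod_mime `AddCharset` directive sets up
(e.g. `.utf8`); it is illustrative, not Apache's actual default
list. The names `CHARSET_SUFFIXES`, `charset_from_suffix` and
`synthetic_content_type` are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

# Illustrative suffix-to-charset table (assumed, not authoritative).
CHARSET_SUFFIXES = {
    ".utf8": "UTF-8",
    ".latin1": "ISO-8859-1",
    ".iso8859-1": "ISO-8859-1",
    ".sjis": "Shift_JIS",
    ".euc-jp": "EUC-JP",
}


def charset_from_suffix(filename: str) -> Optional[str]:
    """Return the charset named by a suffix such as file.html.utf8, if any."""
    # Check every dot-separated extension, not just the last one, since
    # Apache also treats multiple extensions as order-independent.
    for suffix in PurePosixPath(filename).suffixes:
        charset = CHARSET_SUFFIXES.get(suffix.lower())
        if charset is not None:
            return charset
    return None


def synthetic_content_type(filename: str) -> str:
    """Build the Content-Type value a server might send for this file."""
    charset = charset_from_suffix(filename)
    return f"text/html; charset={charset}" if charset else "text/html"
```

For example, `synthetic_content_type("page.html.latin1")` gives
`text/html; charset=ISO-8859-1`, while a plain `index.html` gets no
charset parameter and would fall back to the document's own
declaration.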

Thus, if a file is named "file.html.utf8", UAs should, when
reading that file via a file: URL, give precedence to the
encoding expressed by the file suffix.
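The proposed precedence order could be sketched as follows:
suffix first (playing the role of the HTTP charset parameter),
then the META declaration, then a default. This is a simplification
under stated assumptions. `suffix_charset` is whatever a suffix
lookup produced. The regex is a crude stand-in for HTML5's much
more involved prescan algorithm. BOM sniffing is ignored. The
UTF-8 fallback is assumed here, whereas real browsers use a
locale-dependent default. `meta_charset` and `resolve_encoding`
are hypothetical names.

```python
import re
from typing import Optional

# Crude stand-in for the HTML5 prescan: finds <meta charset=...> or a
# content="...; charset=..." attribute within the first 1024 bytes.
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_:.-]+)', re.IGNORECASE
)


def meta_charset(data: bytes) -> Optional[str]:
    """Return the charset declared by a META element, if one is found."""
    match = _META_CHARSET.search(data[:1024])
    return match.group(1).decode("ascii") if match else None


def resolve_encoding(suffix_charset: Optional[str], data: bytes,
                     default: str = "UTF-8") -> str:
    """Pick the encoding for a file read via a file: URL.

    A charset suffix acts like an HTTP charset parameter and wins;
    otherwise the META declaration is used; otherwise the default.
    """
    if suffix_charset:
        return suffix_charset
    declared = meta_charset(data)
    if declared:
        return declared
    return default
```

So a file saved as `page.html.utf8` whose META element (wrongly)
says `windows-1252` would be read as UTF-8, just as it would be if
a server sent `charset=UTF-8` in the HTTP header.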

Thus, I would suggest that HTML 5 a) specify the file suffixes
for all the encodings that it endorses (building on those that
Apache uses by default), and b) recommend that Web browsers
recognize these suffixes when reading files via file://.

[1] http://lists.w3.org/Archives/Public/www-amaya/2007OctDec/0169
-- 
leif halvard silli

Received on Friday, 12 June 2009 02:48:55 UTC