
Re: Auto-detect and encodings in HTML5

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 10 Jul 2009 22:09:21 +0000 (UTC)
To: Leif Halvard Silli <lhs@malform.no>, Henri Sivonen <hsivonen@iki.fi>
Cc: public-html@w3.org
Message-ID: <Pine.LNX.4.62.0907102143090.23663@hixie.dreamhostps.com>
On Fri, 12 Jun 2009, Leif Halvard Silli wrote:
> Ian Hickson On 09-06-12 01.16:
> > On Wed, 3 Jun 2009, Henri Sivonen wrote:
> > > *Of course* authoring tools should use UTF-8 *and declare it* for 
> > > any new documents.
> > > 
> > > HTML5 already says: "Authors are encouraged to use UTF-8." 
> > > http://www.whatwg.org/specs/web-apps/current-work/#charset
> > 
> > I could make this stronger if people think that would be helpful.
> That would be a good thing. It should say that conforming authoring 
> tools (please do not only say 'authors' in this case) MUST _default_ to 
> using UTF-8.

I've made it say SHOULD default. (MUST would be inappropriate here, since 
there might well be valid reasons to default to something else, e.g. 
ASCII if the tool is outputting HTML that will be sent through 
non-8-bit-safe environments, like pasting into a terminal.)
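To make the SHOULD-default concrete, here is a minimal sketch of what an authoring tool's serializer might do. The function name `serialize_html` and its `ascii_safe` flag are hypothetical. The sketch defaults to UTF-8 and declares it with `<meta charset>`. In ASCII-safe mode (for output headed through a non-8-bit-safe channel) it writes non-ASCII characters as numeric character references instead. Because a pure-ASCII byte stream is also valid UTF-8, the declaration can stay the same in both modes.

```python
def serialize_html(body: str, ascii_safe: bool = False) -> bytes:
    """Hypothetical authoring-tool serializer (illustrative only).

    `body` is assumed to already be HTML markup. The output always
    declares UTF-8; with ascii_safe=True every non-ASCII character is
    written as a numeric character reference, so the bytes are pure
    ASCII (which is still valid UTF-8).
    """
    doc = (
        "<!DOCTYPE html>\n"
        '<meta charset="utf-8">\n'
        "<title>Example</title>\n"
        f"{body}\n"
    )
    if ascii_safe:
        # "xmlcharrefreplace" turns e.g. "é" into "&#233;"
        return doc.encode("ascii", errors="xmlcharrefreplace")
    return doc.encode("utf-8")


print(serialize_html("<p>café</p>", ascii_safe=True).decode("ascii"))
```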

> I have also previously (2007) mentioned on this list that authoring 
> tools, including Web browsers, should support character encoding 
> suffixes (file.html.utf8), such as Apache has. I think that, when 
> reading file:/// URLs, user agents could use these charset extensions 
> to mimic the charset header of web servers. Thus authors and authoring 
> tools would get a simple way to experience how the HTTP headers take 
> precedence over what the META element specifies.
> Thus, if a file has the name "file.html.utf8", then UAs should, when 
> reading that file via the file URL protocol, give precedence to the 
> encoding expressed by the file suffix.
> Thus, I would suggest that HTML 5 a) specifies the file suffixes for all 
> the encodings that it endorses (building on those that Apache uses by 
> default), b) recommends Web browsers to recognize these suffixes when 
> reading files via file://
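The suffix proposal above can be sketched as a small lookup, purely for illustration. The suffix table, `charset_from_suffix`, and `effective_charset` are hypothetical names; the table is only loosely modelled on Apache's AddCharset idea and is not defined by HTML5 or any file:// spec. A charset found in the file name wins over the META element, the same way an HTTP charset header does.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical suffix table, loosely modelled on Apache's AddCharset
# directives; not defined by HTML5.
CHARSET_SUFFIXES = {
    ".utf8": "utf-8",
    ".latin1": "iso-8859-1",
    ".sjis": "shift_jis",
}


def charset_from_suffix(filename: str) -> Optional[str]:
    """Return the charset implied by a suffix like 'file.html.utf8', if any."""
    for suffix in PurePath(filename).suffixes:  # e.g. ['.html', '.utf8']
        charset = CHARSET_SUFFIXES.get(suffix.lower())
        if charset is not None:
            return charset
    return None


def effective_charset(filename: str, meta_charset: Optional[str]) -> Optional[str]:
    """Suffix outranks <meta>, mirroring how an HTTP header outranks <meta>."""
    return charset_from_suffix(filename) or meta_charset
```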

I haven't added this; I think that this is something that the file:// spec 
should cover, not HTML5. (I'm not really convinced we need to worry about 
interop with file:// anyway.)

On Mon, 15 Jun 2009, Henri Sivonen wrote:
> I think it would be helpful to informatively mention the bad 
> consequences on form submission and URL query parts if this advice is 
> not followed.

I added something brief and non-specific. If you have specific things 
you'd like me to mention, let me know.
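One concrete example of the form-submission problem Henri mentions: browsers percent-encode submitted form values using the page's character encoding, so the same character arrives as different bytes depending on how the page was served. The sketch below simulates this with Python's `urllib.parse`; the helper `submitted_value` is hypothetical, and real form encoding differs in details (for instance, characters the page encoding cannot represent).

```python
from urllib.parse import quote_plus, unquote_plus


def submitted_value(value: str, page_encoding: str) -> str:
    """Percent-encode a form value roughly as a browser would for a page
    served in page_encoding (simplified application/x-www-form-urlencoded)."""
    return quote_plus(value, encoding=page_encoding)


# A page served as windows-1252 sends "é" as the single byte %E9...
legacy = submitted_value("é", "windows-1252")
# ...while a UTF-8 page sends the two-byte sequence %C3%A9.
modern = submitted_value("é", "utf-8")

# A server that assumes UTF-8 cannot decode the lone 0xE9 byte and,
# with unquote_plus's default errors="replace", gets U+FFFD instead.
print(legacy, "->", repr(unquote_plus(legacy)))
print(modern, "->", repr(unquote_plus(modern)))
```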

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 10 July 2009 22:10:05 UTC
