
Re: Auto-detect and encodings in HTML5

From: Maciej Stachowiak <mjs@apple.com>
Date: Mon, 01 Jun 2009 18:23:42 -0700
Cc: Leif Halvard Silli <lhs@malform.no>, "public-html@w3.org" <public-html@w3.org>, "www-international@w3.org" <www-international@w3.org>
Message-id: <870C5609-BD64-4610-9219-229D59A6FA3F@apple.com>
To: Boris Zbarsky <bzbarsky@mit.edu>

On Jun 1, 2009, at 6:07 PM, Boris Zbarsky wrote:

> Leif Halvard Silli wrote:
>
>> There is one aspect that you are - again - forgetting, and that is   
>> authoring tools and web servers.
>
> I don't think Maciej forgot anything like that.  He's talking about  
> the proposal that was made: that HTML consumers (not producers)  
> default to UTF-8 whenever they see "<!DOCTYPE html>".  He is clearly  
> talking about the case "unless the author explicitly declares an  
> encoding", where "author" is anything that's producing HTML.   
> "declares an encoding" could take the form of an HTTP header or a  
> <meta> tag in the HTML.
>
>> If complying authoring tools had to default to UTF-8 whenever  
>> someone chooses to create an HTML 5 document (much the same way that  
>> XML defaults to UTF-8/-16), then that would be a bonus and  
>> simplification and _motivation_ for using HTML 5.
>
> Presumably by "default" you mean encode it as UTF-8 and then include  
> the appropriate <meta> tag?  That sounds like a pretty good idea to  
> me.
>
>> The next level should be that web servers default to sending a  
>> charset header which says "UTF-8" whenever they see the HTML 5  
>> doctype.
>
> Very few web servers look inside the document content when deciding  
> on headers.  I don't believe the two most common ones (Apache and  
> IIS) do so by default....
>
>> Thus we could leave the Web browser behaviour as drafted, but  
>> require UTF-8 as the default from servers and authoring tools.
>
> I doubt you'll hear any browser developers complaining about this!   
> I certainly have no objections to it.  If authoring tools do in fact  
> behave this way, then maybe at some point (decades from now, I  
> suspect) we'll get to a world where we can start dropping support  
> for encodings that are no longer in use because the documents have  
> been transcoded to UTF-8 in the meantime.... Would be nice.

Agreed. I have no problem with authoring tools or servers producing  
UTF-8 by default, as long as they explicitly flag it. In fact, HTML  
tooling defaulting to UTF-8 would be great! But as I understand it,  
the proposal on the table was to change the behavior of HTML  
consumers, and that I would object to.
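
To make the producer/consumer distinction concrete, here is a minimal
Python sketch, not taken from any spec or browser. The function names
(produce_html, declared_encoding, choose_encoding) are hypothetical, and
the windows-1252 fallback is only an assumption standing in for whatever
legacy default a given consumer already uses. The producer writes UTF-8
and flags it explicitly; the consumer honors an explicit declaration
(HTTP header first, then <meta>) and does not treat the doctype as one.

```python
import html
import re

# Assumed stand-in for a consumer's existing legacy default.
LEGACY_FALLBACK = "windows-1252"

_HEADER_CHARSET = re.compile(r'charset\s*=\s*"?([^\s;"]+)', re.IGNORECASE)
_META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def produce_html(body_text):
    """Producer side: encode as UTF-8 and flag it explicitly with <meta>."""
    doc = (
        "<!DOCTYPE html>\n"
        '<meta charset="utf-8">\n'
        "<title>example</title>\n"
        f"<p>{html.escape(body_text)}</p>\n"
    )
    return doc.encode("utf-8")


def declared_encoding(content_type, raw):
    """Consumer side: return the explicitly declared encoding, or None."""
    if content_type:
        m = _HEADER_CHARSET.search(content_type)
        if m:
            return m.group(1).lower()
    # Only scan the start of the document for a <meta> declaration.
    m = _META_CHARSET.search(raw[:1024])
    if m:
        return m.group(1).decode("ascii").lower()
    return None


def choose_encoding(content_type, raw):
    """Use the declaration if present; the doctype alone changes nothing."""
    return declared_encoding(content_type, raw) or LEGACY_FALLBACK
```

With this split, a document written by produce_html decodes as UTF-8
everywhere, while an undeclared legacy page that happens to start with
"<!DOCTYPE html>" still gets the consumer's existing fallback.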

Regards,
Maciej
Received on Tuesday, 2 June 2009 01:24:23 GMT
