Re: Auto-detect and encodings in HTML5

On Jun 1, 2009, at 6:07 PM, Boris Zbarsky wrote:

> Leif Halvard Silli wrote:
>> There is one aspect that you are - again - forgetting, and that is   
>> authoring tools and web servers.
> I don't think Maciej forgot anything like that.  He's talking about  
> the proposal that was made: that HTML consumers (not producers)  
> default to UTF-8 whenever they see "<!DOCTYPE html>".  He is clearly  
> talking about the case "unless the author explicitly declares an  
> encoding", where "author" is anything that's producing HTML.   
> "declares an encoding" could take the form of an HTTP header or a  
> <meta> tag in the HTML.
>> If complying authoring tools had to default to UTF-8 whenever
>> someone selects to create an HTML 5 document (much the same way that
>> XML defaults to UTF-8/-16), then that would be a bonus and
>> simplification and _motivation_ for using HTML 5.
> Presumably by "default" you mean encode it as UTF-8 and then include  
> the appropriate <meta> tag?  That sounds like a pretty good idea to  
> me.
>> The next level should be that web servers default to sending a
>> charset header which says "UTF-8" whenever they see the HTML 5
>> doctype.
> Very few web servers look inside the document content when deciding  
> on headers.  I don't believe the two most common ones (Apache and  
> IIS) do so by default....
>> Thus we could leave the Web browser behaviour as drafted, but
>> require utf-8 as default from servers and authoring tools.
> I doubt you'll hear any browser developers complaining about this!   
> I certainly have no objections to it.  If authoring tools do in fact  
> behave this way, then maybe at some point (decades from now, I  
> suspect) we'll get to a world where we can start dropping support  
> for encodings that are no longer in use because the documents have  
> been transcoded to UTF-8 in the meantime.... Would be nice.

Agreed. I have no problem with authoring tools or servers producing  
UTF-8 by default, as long as they explicitly flag it. In fact, HTML  
tooling defaulting to UTF-8 would be great! But as I understand it,  
the proposal on the table was to change the behavior of HTML  
consumers, and that I would object to.
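
To make the producer/consumer distinction concrete, here is a minimal
sketch, not anyone's actual implementation. The function names, the
regular expressions, and the "windows-1252" fallback are all assumptions
chosen for illustration; real consumers use a more involved prescan and
a locale-dependent default. The producer encodes as UTF-8 and flags it
explicitly; the consumer honors explicit declarations and does not treat
"<!DOCTYPE html>" as an encoding signal.

```python
import re

# Assumed legacy fallback, for illustration only.
LEGACY_DEFAULT = "windows-1252"

# Simplified patterns; they do not cover every declaration form.
HEADER_CHARSET = re.compile(r'charset\s*=\s*"?([A-Za-z0-9_.:-]+)', re.IGNORECASE)
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def produce_html(body_text):
    """Producer side: encode as UTF-8 and flag it in header and <meta>."""
    markup = (
        "<!DOCTYPE html>\n"
        '<meta charset="utf-8">\n'
        "<title>example</title>\n"
        f"<p>{body_text}</p>\n"
    )
    headers = {"Content-Type": "text/html; charset=utf-8"}
    return headers, markup.encode("utf-8")


def declared_encoding(headers, payload):
    """Consumer side: HTTP header first, then <meta>, else legacy default."""
    match = HEADER_CHARSET.search(headers.get("Content-Type", ""))
    if match:
        return match.group(1).lower()
    # Only scan the start of the document for a <meta> declaration.
    match = META_CHARSET.search(payload[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # No explicit declaration: the doctype alone changes nothing here.
    return LEGACY_DEFAULT
```

With this split, an undeclared document that merely starts with
"<!DOCTYPE html>" still gets the legacy default, which is the consumer
behavior the proposal would have changed.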


Received on Tuesday, 2 June 2009 01:24:22 UTC