- From: Leif Halvard Silli <lhs@malform.no>
- Date: Tue, 02 Jun 2009 03:34:18 +0200
- To: Boris Zbarsky <bzbarsky@MIT.EDU>
- CC: "public-html@w3.org" <public-html@w3.org>, "www-international@w3.org" <www-international@w3.org>
Boris Zbarsky On 09-06-02 03.07:
> Leif Halvard Silli wrote:
>> Maciej Stachowiak On 09-06-02 00.38:
>>> Making the doctype switch the default from Windows-1252 to UTF-8 will
>>> mean only ASCII documents work correctly in both older and newer user
>>> agents, unless the author explicitly declares an encoding.
> [etc]
>
>> There is one aspect that you are - again - forgetting, and that is
>> authoring tools and web servers.
>
> I don't think Maciej forgot anything like that. He's talking about the
> proposal that was made: that HTML consumers (not producers) default to
> UTF-8 whenever they see "<!DOCTYPE html>". He is clearly talking about
> the case "unless the author explicitly declares an encoding", where
> "author" is anything that's producing HTML. "declares an encoding"
> could take the form of an HTTP header or a <meta> tag in the HTML.

My comment was related to what Larry said [1]:

>If there were other reasons for having a version
>indicator (e.g., to support authoring requirements),
>the version indicator could also indicate default
>charset UTF8.

Larry has repeatedly spoken about the needs of authoring tools, e.g.
w.r.t. versioning.

>> If complying authoring tools had to default to UTF-8 whenever someone
>> selects to create an HTML 5 document (much the same way that XML
>> defaults to UTF-8/-16), then that would be a bonus and simplification
>> and _motivation_ for using HTML 5.
>
> Presumably by "default" you mean encode it as UTF-8 and then include the
> appropriate <meta> tag? That sounds like a pretty good idea to me.

Yes, indeed. As Larry said [2]: "Yes, supplying explicit charset is
preferable, but ..." The spec also talks about relying on the BOM as an
alternative - I guess /that/ should be conforming/required authoring
tool behaviour as well?

>> The next level should be that web servers default to sending a
>> charset header that says "UTF-8" whenever they see the HTML 5 doctype.
>
> Very few web servers look inside the document content when deciding on
> headers. I don't believe the two most common ones (Apache and IIS) do
> so by default....

Perhaps Sam or Roy or someone from Microsoft can enlighten us on whether
such a thing would be possible in Apache and IIS?

>> Thus we could leave the Web browser behaviour as drafted, but require
>> UTF-8 as the default from servers and authoring tools.
>
> I doubt you'll hear any browser developers complaining about this! I
> certainly have no objections to it. If authoring tools do in fact
> behave this way, then maybe at some point (decades from now, I suspect)
> we'll get to a world where we can start dropping support for encodings
> that are no longer in use because the documents have been transcoded to
> UTF-8 in the meantime.... Would be nice.

Indeed. :-)

[1] http://lists.w3.org/Archives/Public/public-html/2009May/0654
[2] http://lists.w3.org/Archives/Public/public-html/2009Jun/0036
-- 
leif halvard silli
Received on Tuesday, 2 June 2009 01:35:00 UTC