- From: Leif Halvard Silli <lhs@malform.no>
- Date: Tue, 02 Jun 2009 03:34:18 +0200
- To: Boris Zbarsky <bzbarsky@MIT.EDU>
- CC: "public-html@w3.org" <public-html@w3.org>, "www-international@w3.org" <www-international@w3.org>
Boris Zbarsky On 09-06-02 03.07:
> Leif Halvard Silli wrote:
>> Maciej Stachowiak On 09-06-02 00.38:
>>> Making the doctype switch the default from Windows-1252 to UTF-8 will
>>> mean only ASCII documents work correctly in both older and newer user
>>> agents, unless the author explicitly declares an encoding.
> [etc]
>
>> There is one aspect that you are - again - forgetting, and that is
>> authoring tools and web servers.
>
> I don't think Maciej forgot anything like that. He's talking about the
> proposal that was made: that HTML consumers (not producers) default to
> UTF-8 whenever they see "<!DOCTYPE html>". He is clearly talking about
> the case "unless the author explicitly declares an encoding", where
> "author" is anything that's producing HTML. "declares an encoding"
> could take the form of an HTTP header or a <meta> tag in the HTML.
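
To make the compatibility point above concrete, here is a minimal sketch of a consumer that uses a declared encoding when there is one and otherwise falls back to a default. The function name `decode_html` and its parameters are hypothetical, and this is a simplification, not the encoding detection algorithm from the HTML 5 draft.

```python
from typing import Optional

def decode_html(body: bytes, declared: Optional[str], default: str) -> str:
    """Decode with the declared encoding (HTTP header or <meta>), else the default.

    Hypothetical helper for illustration only; real consumers do more sniffing.
    """
    return body.decode(declared or default, errors="replace")

# A pure-ASCII document decodes identically under either default.
ascii_doc = b"<!DOCTYPE html><p>plain</p>"
same = decode_html(ascii_doc, None, "windows-1252") == decode_html(ascii_doc, None, "utf-8")

# A UTF-8 document without a declaration is garbled by a Windows-1252 default...
utf8_doc = "<!DOCTYPE html><p>\u00e9</p>".encode("utf-8")
garbled = decode_html(utf8_doc, None, "windows-1252")

# ...but decodes correctly once the author declares the encoding.
correct = decode_html(utf8_doc, "utf-8", "windows-1252")
```

Only the undeclared, non-ASCII case depends on which default the consumer picks, which is the case the proposal would change.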
My comment was related to what Larry said [1]:
> If there were other reasons for having a version
> indicator (e.g., to support authoring requirements),
> the version indicator could also indicate default
> charset UTF8.
Larry has repeatedly spoken about the needs of authoring tools,
e.g. w.r.t. versioning.
>> If complying authoring tools had to default to UTF-8 whenever someone
>> selects to create an HTML 5 document (much the same way that XML defaults
>> to UTF-8/-16), then that would be a bonus and simplification and
>> _motivation_ for using HTML 5.
>
> Presumably by "default" you mean encode it as UTF-8 and then include the
> appropriate <meta> tag? That sounds like a pretty good idea to me.
Yes, indeed. As Larry said[2]: "Yes, supplying explicit charset is
preferable, but ..."
The spec also talks about relying on a BOM as an alternative - I
guess /that/ should be conforming/required authoring tool
behaviour as well?
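
As an illustration of the authoring tool behaviour discussed here, the following is a rough sketch of a tool that always serializes a new HTML 5 document as UTF-8, includes an explicit <meta> declaration, and can optionally prepend a BOM. The function name `save_html5_document` and its parameters are made up for this example; no existing tool is claimed to work this way.

```python
import codecs
import html

def save_html5_document(title: str, body_html: str, with_bom: bool = False) -> bytes:
    """Return the bytes an authoring tool could write to disk (hypothetical helper)."""
    doc = (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'            # explicit declaration inside the document
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"{body_html}\n"
        "</body>\n</html>\n"
    )
    data = doc.encode("utf-8")                # the tool's default encoding is UTF-8
    if with_bom:
        data = codecs.BOM_UTF8 + data         # BOM as an additional in-band signal
    return data

plain = save_html5_document("Bl\u00e5b\u00e6r", "<p>gr\u00f8t</p>")
marked = save_html5_document("Bl\u00e5b\u00e6r", "<p>gr\u00f8t</p>", with_bom=True)
```

With either variant the document carries its own encoding information, so it does not depend on the consumer's default.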
>> The next level should be that web servers default to sending a
>> charset header saying "UTF-8" whenever they see the HTML 5 doctype.
>
> Very few web servers look inside the document content when deciding on
> headers. I don't believe the two most common ones (Apache and IIS) do
> so by default....
Perhaps Sam or Roy or someone from Microsoft can enlighten us if
such a thing would be possible in Apache and IIS?
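
Independently of what Apache or IIS can do, the idea itself can be sketched in a few lines: peek at the start of the document and pick the Content-Type header from what is found there. This is only a sketch under stated assumptions. The function `content_type_for`, the 1024-byte window, and the regular expression are all invented for the example, and nothing here reflects how either server actually behaves.

```python
import codecs
import re

# Matches "<!DOCTYPE html>" (any case), but not the longer HTML 4 doctypes.
HTML5_DOCTYPE = re.compile(rb"\s*<!doctype\s+html\s*>", re.IGNORECASE)

def content_type_for(body: bytes) -> str:
    """Choose a Content-Type header value by sniffing the doctype (hypothetical)."""
    head = body[:1024]                         # assumed sniffing window
    if head.startswith(codecs.BOM_UTF8):
        head = head[len(codecs.BOM_UTF8):]     # skip a leading UTF-8 BOM
    if HTML5_DOCTYPE.match(head):
        return "text/html; charset=utf-8"
    return "text/html"                         # no charset: leave it to the consumer

html5_header = content_type_for(b"<!DOCTYPE html>\n<p>hello</p>")
html4_header = content_type_for(
    b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">\n<p>hello</p>'
)
```

Note that a server doing this would mislabel any document that uses the HTML 5 doctype but is stored in another encoding, so it only makes sense together with authoring tools that really save as UTF-8.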
>> Thus we could leave the Web browser behaviour as drafted, but require
>> utf-8 as default from servers and authoring tools.
>
> I doubt you'll hear any browser developers complaining about this! I
> certainly have no objections to it. If authoring tools do in fact
> behave this way, then maybe at some point (decades from now, I suspect)
> we'll get to a world where we can start dropping support for encodings
> that are no longer in use because the documents have been transcoded to
> UTF-8 in the meantime.... Would be nice.
Indeed. :-)
[1] http://lists.w3.org/Archives/Public/public-html/2009May/0654
[2] http://lists.w3.org/Archives/Public/public-html/2009Jun/0036
--
leif halvard silli
Received on Tuesday, 2 June 2009 01:35:00 UTC