Re: Default charset in HTML5

From: Michael[tm] Smith <mike@w3.org>
Date: Fri, 10 Mar 2017 05:22:48 +0900
To: Nick <halbtaxabo-temp4@yahoo.com>
Cc: "www-validator@w3.org" <www-validator@w3.org>
Message-ID: <20170309202248.24tqwnavlavvy7yh@sideshowbarker.net>
Nick <halbtaxabo-temp4@yahoo.com>, 2017-03-09 13:46 +0000:
> Archived-At: <http://www.w3.org/mid/53672439.2120582.1489067212112@mail.yahoo.com>
> >Michael[tm] Smith <mike@w3.org>:
...
> >And for legacy backward-compat, if a document doesn’t declare an encoding,
> >then browsers are required to parse it using windows-1252 as the encoding.
> 
> Really? Which current standards document says that?

https://html.spec.whatwg.org/#determining-the-character-encoding:concept-encoding-confidence-8

> Otherwise, return an implementation-defined or user-specified default
> character encoding, with the confidence tentative.
...
> In other environments, the default encoding is typically dependent on the
> user's locale (an approximation of the languages, and thus often
> encodings, of the pages that the user is likely to frequent). The
> following table gives suggested defaults based on the user's locale, for
> compatibility with legacy content.

In that table, windows-1252 is the default for all user locales other than
the ones explicitly listed. When a document is checked with the HTML checker,
there is no user locale to examine, so the checker uses windows-1252.

But as you can see from that table, the encoding that browsers will use for a
document that doesn’t declare an encoding changes based on the user’s locale.
For example, if the user’s locale is Japanese, browsers will use Shift_JIS.
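
To make the locale dependence concrete, here is a rough Python sketch of that
fallback step. It is only an illustration, not how any browser or the checker
implements it. The names FALLBACK_BY_LANGUAGE and fallback_encoding are made
up for this example, the mapping contains only the one Japanese entry
mentioned above rather than the full table from the spec, and Python's
"cp1252" and "shift_jis" codecs are assumed to be close enough stand-ins for
the spec's windows-1252 and Shift_JIS decoders.

```python
# Hypothetical, deliberately incomplete mapping from the user's language to
# the fallback encoding used when a document declares no encoding.
# The real table in the spec lists many more locales.
FALLBACK_BY_LANGUAGE = {
    "ja": "shift_jis",
}

# Python's codec name for windows-1252 (an approximation of the spec's decoder).
DEFAULT_FALLBACK = "cp1252"


def fallback_encoding(locale=None):
    """Return the fallback encoding for a locale tag such as 'ja' or 'ja-JP'.

    With no locale to examine (the checker's situation), use the default.
    """
    if not locale:
        return DEFAULT_FALLBACK
    # Keep only the primary language subtag: 'ja-JP' or 'ja_JP' becomes 'ja'.
    language = locale.replace("_", "-").split("-")[0].lower()
    return FALLBACK_BY_LANGUAGE.get(language, DEFAULT_FALLBACK)


# The same undeclared bytes decode differently depending on the locale.
raw = "日本語".encode("shift_jis")
for locale in (None, "en-US", "ja-JP"):
    encoding = fallback_encoding(locale)
    print(locale, encoding, raw.decode(encoding, errors="replace"))
```

Only the Japanese-locale run recovers the original text; the other two decode
the same bytes as windows-1252 and produce mojibake, which is why declaring
the encoding in the document is the safer choice.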

  —Mike

-- 
Michael[tm] Smith https://sideshowbarker.net/

Received on Thursday, 9 March 2017 20:23:18 UTC
