Re: Default charset in HTML5

From: Michael[tm] Smith <mike@w3.org>
Date: Thu, 9 Mar 2017 21:05:03 +0900
To: Nick <halbtaxabo-temp4@yahoo.com>
Cc: "www-validator@w3.org" <www-validator@w3.org>
Message-ID: <20170309120503.vmdiiwmel3kl3m2v@sideshowbarker.net>
Nick <halbtaxabo-temp4@yahoo.com>, 2017-03-09 09:35 +0000:
> Archived-At: <http://www.w3.org/mid/399303649.2033694.1489052134241@mail.yahoo.com>
> If an HTML5 document doesn't specify a charset, the validator flags an error like this:
> "Error: The character encoding was not declared. Proceeding using windows-1252"
> and then proceeds to flag further errors ("Unmappable byte sequence")
> when it encounters utf-8 encodings of characters not in the windows-1252
> set. Isn't utf-8 the default for HTML5?

It’s not the default if by that you mean you don’t need to declare it.

Per the Encoding spec, conforming documents are required both to use UTF-8
as their encoding and to declare UTF-8 explicitly, either in a Content-Type
header or with a <meta> element.
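As a rough illustration of those two places a declaration can live, here is a minimal sketch in Python. The helper name `declared_encoding` is hypothetical, and it is a simplified regex approximation, not the full prescan algorithm from the HTML spec: it ignores byte order marks, comments, and other edge cases, and only checks the Content-Type header and a `<meta>` element within the first 1024 bytes of the document.

```python
import re
from typing import Optional

# Hypothetical helper: a simplified approximation of where an HTML document
# can declare its encoding. NOT the HTML spec's full prescan algorithm.
_HEADER_CHARSET = re.compile(r"charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)
_META_CHARSET = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)


def declared_encoding(content_type: Optional[str], body: bytes) -> Optional[str]:
    """Return the declared charset label (lowercased), or None if undeclared."""
    # 1. Transport-level declaration, e.g. "Content-Type: text/html; charset=utf-8".
    if content_type:
        m = _HEADER_CHARSET.search(content_type)
        if m:
            return m.group(1).lower()
    # 2. In-document declaration; the <meta> element must fit in the first 1024 bytes.
    m = _META_CHARSET.search(body[:1024])
    if m:
        return m.group(1).decode("ascii").lower()
    # No declaration found: the document is non-conforming.
    return None


print(declared_encoding("text/html; charset=utf-8", b"<!DOCTYPE html>"))  # utf-8
print(declared_encoding(None, b'<!DOCTYPE html><meta charset="utf-8">'))  # utf-8
print(declared_encoding("text/html", b"<p>caf\xc3\xa9</p>"))              # None
```

Returning `None` rather than guessing keeps "not declared" distinct from "declared as UTF-8", which is exactly the distinction the validator error is about.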

So it’s non-conforming for a document to not declare an encoding, but
browsers are still required to process documents that don’t declare one.

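And for backward compatibility with legacy content, if a document doesn’t
declare an encoding, browsers fall back to a legacy default encoding. For
English and most Western European locales that default is windows-1252,
which is also what the validator assumes.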
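To see why UTF-8 bytes trip up that windows-1252 fallback, here is a small Python sketch. The function name `decode_with_fallback` is hypothetical, and Python's `cp1252` codec is only a stand-in: it leaves five bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, whereas the Encoding spec's windows-1252 maps them to C1 control characters, and the validator's own decoder may classify bytes differently again. It illustrates the mechanism, not the validator's exact behavior.

```python
def decode_with_fallback(data: bytes) -> str:
    """Decode undeclared bytes as windows-1252 (Python's "cp1252" codec).

    Strict decoding: bytes cp1252 leaves undefined raise UnicodeDecodeError.
    """
    return data.decode("cp1252")


# "café" saved as UTF-8 (63 61 66 C3 A9) decodes without error, but as mojibake:
# 0xC3 -> "Ã" and 0xA9 -> "©".
print(decode_with_fallback("café".encode("utf-8")))  # cafÃ©

# "Á" (U+00C1) is C3 81 in UTF-8, and 0x81 has no cp1252 mapping.
try:
    decode_with_fallback("Á".encode("utf-8"))
except UnicodeDecodeError as err:
    print("unmappable byte:", hex(err.object[err.start]))  # unmappable byte: 0x81
```

Either way the text comes out wrong; declaring UTF-8 is what avoids the fallback altogether.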


Michael[tm] Smith https://sideshowbarker.net/

Received on Thursday, 9 March 2017 12:05:32 UTC