
Re: Incorrect behaviour with utf-16 meta declaration

From: Michael[tm] Smith <mike@w3.org>
Date: Mon, 4 Jul 2011 16:11:26 +0900
To: Richard Ishida <ishida@w3.org>
Cc: www-validator@w3.org
Message-ID: <20110704071125.GA25700@sideshowbarker>

Richard Ishida <ishida@w3.org>, 2011-07-03 10:32 +0100:

> Checking http://www.w3.org/International/tests/i18n-checker/utf16/utf16le-charset-html5.html
> 
> I get the following error messages:
> 
> [[
> Error Line 5, Column 70: Internal encoding declaration specified utf-16
> which is not an ASCII superset. Continuing as if the encoding had been
> utf-8.
> 
> <meta http-equiv="Content-Type" content="text/html; charset=utf-16" />
> 
> Error Line 5, Column 70: Internal encoding declaration utf-8 disagrees with
> the actual encoding of the document (utf-16).
> 
> <meta http-equiv="Content-Type" content="text/html; charset=utf-16" />
> ]]
> 
> It is incorrect to parse the document as utf-8, since the document actually
> *is* a utf-16 document. You can report that use of the utf-16 meta
> declaration is against the spec in utf-16 documents, but not assume that the
> encoding is wrong.

According to the HTML5 spec, it is correct to parse the document as UTF-8.
In fact, the spec requires that behavior; see step 5.1.13 of the algorithm
in the "Determining the character encoding" section of the spec:

  "If charset is a UTF-16 encoding, change the value of charset to UTF-8."
  http://dev.w3.org/html5/spec/parsing.html#determining-the-character-encoding
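
As a rough illustration of that step only (not the spec's full prescan algorithm, and not the validator.nu code), here is a minimal Python sketch. The function names and the set of labels treated as "a UTF-16 encoding" are assumptions made for this example.

```python
import re

# Assumption for illustration: the labels treated as "a UTF-16 encoding".
UTF16_LABELS = {"utf-16", "utf-16le", "utf-16be"}


def charset_from_content(content):
    """Pull the charset value out of a meta content attribute (hypothetical helper)."""
    match = re.search(r'charset\s*=\s*["\']?([^\s;"\']+)', content, re.IGNORECASE)
    return match.group(1).lower() if match else None


def effective_prescan_encoding(declared):
    """Apply the quoted step: a declared UTF-16 charset is changed to UTF-8."""
    if declared is None:
        return None
    label = declared.strip().lower()
    if label in UTF16_LABELS:
        return "utf-8"
    return label


# The declaration from the test page in the quoted report.
declared = charset_from_content("text/html; charset=utf-16")
effective = effective_prescan_encoding(declared)
```

With the meta element from the report, `declared` is "utf-16" and `effective` is "utf-8", which corresponds to the first error message quoted above.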

The validator.nu backend includes a parser that conforms to the HTML5 spec
(which incidentally is the same parser that Firefox now uses). And both of
the error messages you cite above are being emitted by that parser, during
the parsing phase, before the backend actually gets around to starting the
validation stage at all.

Note also that any browser which conforms to the HTML5 spec will exhibit
this same behavior (that is, changing the charset from UTF-16 to UTF-8).

So as far as the spec goes, those messages are both correct and expected --
as well as being consistent with parsing behavior in browsers.

  --Mike

-- 
Michael[tm] Smith
http://people.w3.org/mike
Received on Monday, 4 July 2011 07:11:29 GMT
