
Re: Auto-detect and encodings in HTML5

From: Geoffrey Sneddon <foolistbar@googlemail.com>
Date: Mon, 1 Jun 2009 22:09:17 +0100
Cc: Anne van Kesteren <annevk@opera.com>, Chris Wilson <Chris.Wilson@microsoft.com>, Maciej Stachowiak <mjs@apple.com>, "M.T. Carrasco Benitez" <mtcarrascob@yahoo.com>, Travis Leithead <Travis.Leithead@microsoft.com>, Erik van der Poel <erikv@google.com>, "public-html@w3.org" <public-html@w3.org>, "www-international@w3.org" <www-international@w3.org>, Richard Ishida <ishida@w3.org>, Ian Hickson <ian@hixie.ch>, Harley Rosnow <Harley.Rosnow@microsoft.com>
Message-Id: <02AFC71A-9E5D-435B-833C-ABCE9FC2D666@googlemail.com>
To: Larry Masinter <masinter@adobe.com>

On 1 Jun 2009, at 19:37, Larry Masinter wrote:

> New behavior: IF you see, say, <doctype html5> THEN assume default
> charset is UTF8, rather than applying heuristics to guess charset.

See it how? To see such a string you must already have decoded the
bytes, which means already having picked an encoding.
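
To make the circularity concrete, here is a minimal sketch. It assumes
a detector that scans the raw bytes for the marker as ASCII; the marker
string is the hypothetical one from the quoted message, and
marker_visible is a made-up helper, not anything from a spec or a
browser.

```python
# Hypothetical marker taken from the quoted message, not a real doctype.
MARKER = "<doctype html5>"


def marker_visible(raw: bytes) -> bool:
    """Return True if the marker can be found by an ASCII byte scan."""
    return MARKER.encode("ascii") in raw


# Encode the same marker three ways and scan each byte stream.
for encoding in ("utf-8", "cp1252", "utf-16-le"):
    raw = MARKER.encode(encoding)
    print(encoding, marker_visible(raw))
```

The scan succeeds for the ASCII-compatible encodings and fails for
UTF-16, where every character is followed by a zero byte, so the rule
only helps for encodings you have in effect already guessed.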

> Yes, supplying explicit charset is preferable, but what would break
> if such a new rule were supplied?

The problem is that any HTML 5 content served as text/html that
declares no charset would be treated as Windows-1252 by all existing
user agents and as UTF-8 by new ones. That inconsistency will cause
breakage, because people tend to test in only one browser and assume
that whatever works there works everywhere.
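
A small sketch of what that mismatch looks like in practice. The sample
text and variable names are my own, chosen only for illustration: the
same bytes are decoded once as UTF-8 (the proposed new default) and
once as Windows-1252 (what existing user agents would assume).

```python
# Sample page text containing one non-ASCII character (an assumption
# made for illustration; any non-ASCII text shows the same effect).
text = "café"

# The author saves the page as UTF-8 and declares no charset.
raw = text.encode("utf-8")

# A new user agent applying the proposed rule decodes as UTF-8.
as_utf8 = raw.decode("utf-8")

# An existing user agent falls back to Windows-1252 (cp1252 in Python).
as_windows_1252 = raw.decode("cp1252")

print(as_utf8)
print(as_windows_1252)
print(as_utf8 == as_windows_1252)
```

An author who only tests in the new user agent never sees the second
rendering, which is exactly the failure mode described above.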


--
Geoffrey Sneddon
<http://gsnedders.com/>
Received on Monday, 1 June 2009 21:10:09 GMT
