Re: Auto-detect and encodings in HTML5

From: Anne van Kesteren <annevk@opera.com>
Date: Mon, 01 Jun 2009 20:13:28 +0200
To: "Larry Masinter" <masinter@adobe.com>, "Chris Wilson" <Chris.Wilson@microsoft.com>, "Maciej Stachowiak" <mjs@apple.com>
Cc: "M.T. Carrasco Benitez" <mtcarrascob@yahoo.com>, "Travis Leithead" <Travis.Leithead@microsoft.com>, "Erik van der Poel" <erikv@google.com>, "public-html@w3.org" <public-html@w3.org>, "www-international@w3.org" <www-international@w3.org>, "Richard Ishida" <ishida@w3.org>, "Ian Hickson" <ian@hixie.ch>, "Harley Rosnow" <Harley.Rosnow@microsoft.com>
Message-ID: <op.uuux8qvl64w2qv@anne-van-kesterens-macbook.local>
On Mon, 01 Jun 2009 19:44:23 +0200, Larry Masinter <masinter@adobe.com> wrote:
> Chris, in your note below you claim that the "current de facto" value  
> was "Win1252" which seems to contradict what I thought was claimed in  
> another message that the "de facto" default was "unknown" (which was my  
> understanding, i.e., that browsers used a wide variety of heuristics to  
> determine charset).

If the heuristics fail, the final fallback is typically windows-1252. See  
also the section "Determining the character encoding" in HTML5.
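
To make the "final fallback" idea concrete, here is a rough sketch of the kind of decision chain involved. It is an illustrative simplification, not the HTML5 algorithm itself: the function name `sniff_encoding`, the 1024-byte scan window, the regular expression, and the order of the checks are all assumptions made for this example, and real browsers apply further heuristics before giving up.

```python
import codecs
import re

# Last-resort default when nothing else identifies the encoding.
FALLBACK_ENCODING = "windows-1252"

# Byte order marks checked by this sketch (a deliberately short list).
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)

# Crude pattern for a charset declared inside a <meta> tag.
_META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)",
    re.IGNORECASE,
)


def sniff_encoding(body, http_charset=None):
    """Return a lowercase encoding label for an HTML byte string."""
    # 1. An explicit charset from the transport layer (HTTP Content-Type).
    if http_charset:
        return http_charset.strip().lower()
    # 2. A byte order mark at the very start of the document.
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    # 3. A <meta> declaration near the top of the document.
    match = _META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Nothing worked: fall back to the legacy default.
    return FALLBACK_ENCODING
```

With no HTTP charset, no byte order mark, and no `<meta>` declaration, a call such as `sniff_encoding(b"<html>hello")` reaches step 4 and returns the windows-1252 fallback.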

> I'm interested in reducing ambiguity and making web transactions more  
> reliable, and associating a new version indicator (DOCTYPE) with a more  
> constrained default (charset default UTF8, rather than 'unknown') is  
> reasonable, while I also would be opposed to making an incompatible  
> change with actual current behavior.

Isn't that contradictory?

If people want a better encoding, why can't they simply specify it along  
with the DOCTYPE? Or specify it at the HTTP level? Letting the DOCTYPE  
have more side effects than it already has seems harmful.
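
As a small illustration of the two options mentioned above (declaring the encoding in the document next to the DOCTYPE, or at the HTTP level), the following sketch builds both declarations. The helper names `content_type_header` and `html_document` are hypothetical and exist only for this example; how the header is actually sent depends on the server in use.

```python
def content_type_header(encoding="utf-8"):
    """Return an HTTP Content-Type header line that names the encoding."""
    return f"Content-Type: text/html; charset={encoding}"


def html_document(body, encoding="utf-8"):
    """Return HTML bytes that declare their own encoding after the DOCTYPE."""
    markup = (
        "<!DOCTYPE html>\n"
        f'<meta charset="{encoding}">\n'
        f"{body}\n"
    )
    # Encode with the same encoding the document declares, so the
    # declaration and the actual bytes agree.
    return markup.encode(encoding)
```

Either declaration on its own tells the consumer which encoding to use, without the DOCTYPE itself changing the default.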

Anne van Kesteren
Received on Monday, 1 June 2009 18:14:35 UTC
