RE: Auto-detect and encodings in HTML5

I've read what I think are contradictory opinions about the current state
of charset determination.

Previously, HTTP (not HTML) specified (de jure) a default charset of ISO-8859-1.
Some effort was made to remove that default, since it didn't match actual practice
(browsers guessing, or letting users set the charset explicitly), and to recommend
an explicit charset instead.

Chris, in your note below you claim that the "current de facto" value is "Win1252",
which seems to contradict what I understood another message to claim: that the
"de facto" default is "unknown". That was also my understanding, i.e., that browsers
use a wide variety of heuristics to determine the charset.

I'm interested in reducing ambiguity and making web transactions more reliable,
and associating a new version indicator (DOCTYPE) with a more constrained default
(charset default UTF-8, rather than 'unknown') seems reasonable. At the same time,
I would be opposed to a change that is incompatible with actual current behavior.
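
To make the compatibility concern concrete, here is a minimal sketch in Python.
It is my own illustration, not a description of what any browser does: the helper
name and the sample bytes are assumptions, and it only contrasts two fixed
defaults, with no heuristics involved.

```python
# Illustration only: the helper name and sample bytes are hypothetical,
# not taken from any browser implementation.

def decode_with_default(data: bytes, default: str) -> str:
    """Decode unlabeled bytes using a single fixed default charset."""
    return data.decode(default)

# The same word, "cafe" with an acute accent, served with no charset label.
legacy_page = "caf\u00e9".encode("windows-1252")  # one byte for the accented letter
utf8_page = "caf\u00e9".encode("utf-8")           # two bytes for the accented letter

# A legacy page read with the legacy default round-trips cleanly.
legacy_as_1252 = decode_with_default(legacy_page, "windows-1252")

# The same legacy page read with a UTF-8 default is not valid UTF-8.
try:
    legacy_as_utf8 = decode_with_default(legacy_page, "utf-8")
except UnicodeDecodeError:
    legacy_as_utf8 = None

# A UTF-8 page read with the legacy default decodes without error,
# but the accented letter comes out as two wrong characters.
utf8_as_1252 = decode_with_default(utf8_page, "windows-1252")
```

Either fixed default misreads unlabeled content written for the other, which is
why the choice of default matters for existing pages.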

My understanding of IE and "encoding" determination is limited, but I do see
"auto-select" as a menu item in IE, and have always assumed that it leads to
a heuristic determination.


-----Original Message-----
From: Chris Wilson
Sent: Monday, June 01, 2009 9:36 AM
To: Maciej Stachowiak; Larry Masinter
Cc: M.T. Carrasco Benitez; Travis Leithead; Erik van der Poel; Richard Ishida; Ian Hickson; Harley Rosnow
Subject: RE: Auto-detect and encodings in HTML5

Maciej Stachowiak wrote:
>I think it would be pretty poor if some indicator of the document  
>version (e.g. the doctype or as suggested by someone else a version  
>parameter in the Content-Type header) changed the default charset.  

As much as I'm on record as favoring a versioned doctype, I agree that changing the default charset from its current de facto Win1252 value would be a bad idea, largely because of the desire for gradual adoption that Maciej mentioned.


Received on Monday, 1 June 2009 17:45:17 UTC