- From: Chris Wilson <Chris.Wilson@microsoft.com>
- Date: Tue, 2 Jun 2009 11:09:24 -0700
- To: Henri Sivonen <hsivonen@iki.fi>, Larry Masinter <masinter@adobe.com>
- CC: Maciej Stachowiak <mjs@apple.com>, "M.T. Carrasco Benitez" <mtcarrascob@yahoo.com>, Travis Leithead <Travis.Leithead@microsoft.com>, Erik van der Poel <erikv@google.com>, "public-html@w3.org" <public-html@w3.org>, "www-international@w3.org" <www-international@w3.org>, Richard Ishida <ishida@w3.org>, Ian Hickson <ian@hixie.ch>, Harley Rosnow <Harley.Rosnow@microsoft.com>
+1 (since Henri answered your question so nicely :)

-----Original Message-----
From: Henri Sivonen [mailto:hsivonen@iki.fi]
Sent: Tuesday, June 02, 2009 1:11 AM
To: Larry Masinter
Cc: Chris Wilson; Maciej Stachowiak; M.T. Carrasco Benitez; Travis Leithead; Erik van der Poel; public-html@w3.org; www-international@w3.org; Richard Ishida; Ian Hickson; Harley Rosnow
Subject: Re: Auto-detect and encodings in HTML5

On Jun 1, 2009, at 20:44, Larry Masinter wrote:
> Chris, in your note below you claim that the "current de facto"
> value was "Win1252"
> which seems to contradict what I thought was claimed in another
> message that the
> "de facto" default was "unknown" (which was my understanding, i.e.,
> that browsers
> used a wide variety of heuristics to determine charset).

The de facto default is Windows-1252 except for locales where it isn't. If a user mostly browses pages written in Simplified Chinese, it makes sense to make GBK the default (GBK is to GB2312 what Windows-1252 is to ISO-8859-1), at least when heuristics are turned off.

At least for the U.S. locale, Firefox and IE8 default to heuristics off. The user can enable heuristics for CJK and Cyrillic (in various groupings: all, both kinds of Chinese only, only Simplified Chinese, only Russian, etc.). Firefox (but not IE8) also supports a grouping for all of CJK (excluding Cyrillic).

In Opera, the heuristic detector may be enabled for any one of C, J and K or Cyrillic. (Opera doesn't seem to have Russian-only, Ukrainian-only, Traditional Chinese-only or Simplified Chinese-only modes.) It's unclear whether Opera's default behavior is heuristics off or universal heuristics on. Perhaps someone from Opera could enlighten us.

Safari doesn't have a heuristic selection UI. It is unclear whether Safari has no heuristics or always-on heuristics by default. Chrome's UI is ambiguous in a slightly different way.
(For clarity: above, I'm using the word "heuristics" to mean exclusively the frequency and chaining analysis on bytes.)

> I'm interested in reducing ambiguity and making web transactions
> more reliable,
> and associating a new version indicator (DOCTYPE) with a more
> constrained default
> (charset default UTF8, rather than 'unknown') is reasonable, while I
> also would
> be opposed to making an incompatible change with actual current
> behavior.

We already have 3 reliable version indicators for the encoding axis of versioning:

  charset=utf-8 on the HTTP layer
  charset=utf-8 in <meta>
  the UTF-8 BOM

We don't need a new indicator that wouldn't be as compatible with existing user agents as the indicators we already have. (Consider the Degrade Gracefully principle.)

On Jun 2, 2009, at 03:48, Leif Halvard Silli wrote:
> Is it the choice of UTF-8 as default you don't understand? If so,
> then I'd like to quote the "Support World Languages" principle.

The Support World Languages principle is satisfied by HTML5 allowing authors to opt in to UTF-8 easily. It has to be opt-in due to the Support Existing Content and Degrade Gracefully principles.

--
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
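The three opt-in indicators listed in the thread, combined with the locale-dependent fallback described for the heuristics-off case, can be sketched as a simple decision procedure. This is a minimal Python sketch under stated assumptions, not the HTML5 encoding-sniffing algorithm: the names `pick_encoding`, `http_charset` and `LOCALE_FALLBACK` are hypothetical, checking the BOM before the HTTP charset is an assumed precedence, the `<meta>` check is a naive regular expression rather than the spec's prescan, and byte-frequency heuristics are deliberately omitted.

```python
import re
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"

# Hypothetical locale -> legacy fallback table. Only Windows-1252 (the usual
# default) and GBK (for Simplified Chinese readers) come from the thread.
LOCALE_FALLBACK = {"en-US": "windows-1252", "zh-CN": "gbk"}

# Naive match for <meta charset=...> or content="...; charset=...".
# This is NOT the HTML5 prescan algorithm, only an illustration.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def http_charset(content_type: Optional[str]) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, if any."""
    if not content_type:
        return None
    for param in content_type.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"').lower()
    return None


def pick_encoding(body: bytes, content_type: Optional[str], locale: str) -> str:
    """Pick an encoding label: author opt-in first, locale fallback last."""
    # 1. UTF-8 byte order mark (checked first here; the precedence is assumed).
    if body.startswith(UTF8_BOM):
        return "utf-8"
    # 2. charset parameter on the HTTP layer.
    charset = http_charset(content_type)
    if charset:
        return charset
    # 3. charset declared in a <meta> element near the start of the document.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. No opt-in: fall back to a locale-dependent legacy default
    #    (heuristics off, so no byte-frequency analysis is attempted).
    return LOCALE_FALLBACK.get(locale, "windows-1252")


if __name__ == "__main__":
    page = b'<meta charset="UTF-8"><p>caf\xc3\xa9</p>'
    print(pick_encoding(page, "text/html", "en-US"))  # utf-8
```

A page carrying none of the three indicators gets whatever legacy encoding the reader's locale implies, which is why an opt-in indicator that existing user agents already honor degrades more gracefully than a new DOCTYPE-based default would.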
Received on Tuesday, 2 June 2009 18:11:51 UTC