- From: Ian Hickson <ian@hixie.ch>
- Date: Mon, 12 Oct 2009 11:45:29 +0000 (UTC)
- To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>, Mark Davis ☕ <mark@macchiato.com>, Henri Sivonen <hsivonen@iki.fi>, Maciej Stachowiak <mjs@apple.com>
- Cc: Martin J. Dürst <duerst@it.aoyama.ac.jp>, "Phillips, Addison" <addison@amazon.com>, Andrew Cunningham <andrewc@vicnet.net.au>, Richard Ishida <ishida@w3.org>, "public-html@w3.org" <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>, Larry Masinter <masinter@adobe.com>
- Message-ID: <Pine.LNX.4.62.0910121138010.25383@hixie.dreamhostps.com>
On Mon, 12 Oct 2009, Leif Halvard Silli wrote:
> Ian Hickson On 09-10-11 21.23:
> > On Sun, 11 Oct 2009, Leif Halvard Silli wrote (reordered):
> > >
> > > The choice of character set - alphabet - for instance, has always
> > > been a political matter, and still is.
> >
> > Ok, then it seems sensible to use a political way of speaking to refer
> > to the choice of alphabet.
> >
> > > "Western this-and-that" is predominantly a political way of
> > > speaking.
> >
> > Good, then it is appropriate terminology.
>
> Appropriate for what?

For the spec. Using political ways of speaking to talk about political
matters.

> "Western European Language [environments]" as Addison suggested is a
> reasonable neutral term, btw, despite use of "Western". It also gives
> the reader many more hints about what the politics involved ...

"European" has no place in this term, as far as I can tell.

> > > Therefore it is wrong to use a wording that causes readers to think in
> > > political terms.
> >
> > But you agree that it _is_ a political matter.
>
> Which "it" are you referring to now?

The choice of character set - alphabet.

> "Western demographics" is a term that leaves the job of finding out
> which those areas are to the reader, anyhow.

If we can instead have a table mapping languages to default encodings, I
would much rather have that. Is the data for such a table available?


On Mon, 12 Oct 2009, Henri Sivonen wrote:
>
> It probably wouldn't make sense to build an exhaustive list of locales
> where browsers default to Windows-1252, but wouldn't it be feasible to
> build an exhaustive list of the locales where browsers *don't* default
> to Windows-1252 (e.g. by grepping Firefox localization files)?

If such data is available, I'd be happy to include it instead of the
current text.
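Henri's suggestion (list only the locales whose default is *not* windows-1252, and let every other locale fall back to windows-1252) amounts to a small lookup table. A minimal Python sketch, assuming locale tags like "ja-JP" or "pt_BR": the function name `default_encoding` is hypothetical, and the three table entries are illustrative placeholders, not data gathered from any browser's localization files.

```python
# Hypothetical exception table: locales whose default is NOT windows-1252.
# These entries are illustrative placeholders only; real data would have to
# be collected, e.g. from browser localization files.
NON_1252_DEFAULTS: dict[str, str] = {
    "ja": "Shift_JIS",
    "ru": "windows-1251",
    "el": "ISO-8859-7",
}

FALLBACK = "windows-1252"


def default_encoding(locale: str) -> str:
    """Return the default encoding for a locale tag such as 'ja-JP' or 'ru'.

    The full tag is tried first, then its primary language subtag; any
    locale not listed in the exception table gets windows-1252.
    """
    tag = locale.replace("_", "-").lower()
    if tag in NON_1252_DEFAULTS:
        return NON_1252_DEFAULTS[tag]
    primary = tag.split("-", 1)[0]
    return NON_1252_DEFAULTS.get(primary, FALLBACK)
```

Keeping only the exceptions means the table stays short, and an unlisted or unknown locale still gets the windows-1252 default rather than an error.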
On Sun, 11 Oct 2009, Mark Davis ☕ wrote:
>
> But focusing on advice to developers, I'd suggest replacing steps 6 and 7 in
> http://dev.w3.org/html5/spec/Overview.html#determining-the-character-encoding,
> by the following 3 numbered items.
>
> - Test if the bytes are valid UTF-8. If they are, return that
> encoding, with the
> confidence <http://dev.w3.org/html5/spec/Overview.html#concept-encoding-confidence>
> *tentative*, and abort these steps.
> - *[include note about UTF-8 patterns, maybe reworded a bit.]*
> - The user agent may attempt to autodetect the character encoding *[include
> rest of #5]*
> - Otherwise, return an implementation-defined or user-specified default
> character encoding, with the
> confidence <http://dev.w3.org/html5/spec/Overview.html#concept-encoding-confidence>
> *tentative*. Due to its widespread use as a default in legacy content,
> windows-1252 is recommended as a default in the absence of other
> information.

On Mon, 12 Oct 2009, Henri Sivonen wrote:
>
> So you are suggesting making UTF-8 autodetect mandatory while leaving
> the rest of chardet optional? Does any one of the 5 top browsers do
> that?

Mark, could you elaborate on your reasoning for this proposal and on the
intent of browser vendors to follow those requirements?


On Mon, 12 Oct 2009, Maciej Stachowiak wrote:
> On Oct 11, 2009, at 12:23 PM, Ian Hickson wrote:
> >
> > What phrase best approximates the areas of the world where _today_ UAs
> > are shipping with a 1252 default encoding?
>
> "locales that predominantly use the Latin script"

Given that windows-1252 is a Latin-script encoding, that seems circular.

> Or you could say:
>
> "locales that predominantly use the Latin script, and whose primary
> languages are completely or almost completely covered by Windows-1252."

I'd rather just have an explicit table, if we can.

> Note: in the browsers that vary this, it is always determined by
> "locale", not "demographic" (which is not a computing concept). I don't
> think using the term "demographic" makes sense in this context.

Fair enough. Changed to "locale".

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
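Mark Davis's proposed replacement for steps 6 and 7 (check for valid UTF-8 first, then optionally autodetect, then fall back to a default, each with *tentative* confidence) can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function name, the pluggable `autodetect` callback, and the string "tentative" standing in for the spec's confidence value are hypothetical. It also assumes the earlier sniffing steps (BOM, transport-layer charset, meta prescan) have already found nothing and that the whole byte sequence is available.

```python
from typing import Callable, Optional, Tuple

# (encoding label, confidence); "tentative" stands in for the spec's term.
Result = Tuple[str, str]


def sniff_after_meta_prescan(
    data: bytes,
    autodetect: Optional[Callable[[bytes], Optional[str]]] = None,
    default: str = "windows-1252",
) -> Result:
    # Step 1: if the bytes are valid UTF-8, return UTF-8 (tentative).
    # Python's strict UTF-8 decoder rejects overlong forms and surrogates.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        pass
    else:
        return ("UTF-8", "tentative")
    # Step 2 (optional): a user-agent-specific autodetector may guess.
    if autodetect is not None:
        guess = autodetect(data)
        if guess is not None:
            return (guess, "tentative")
    # Step 3: implementation-defined or user-specified default.
    return (default, "tentative")
```

Two consequences are worth noting. Pure-ASCII input is also valid UTF-8, so an ASCII-only document would be labeled UTF-8. That is harmless for display, but the chosen encoding also affects things such as form submission. Second, a streaming implementation would have to tolerate a multi-byte sequence cut off at the end of its buffer, which this whole-buffer sketch ignores.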
Received on Monday, 12 October 2009 11:34:55 UTC