- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Mon, 31 Aug 2009 03:27:44 +0200
- To: public-html@w3.org
- CC: "Phillips, Addison" <addison@amazon.com>
On Sun, 30 Aug 2009 02:37:13 +0000 (UTC) Ian Hickson wrote:
> On Wed, 19 Aug 2009, Phillips, Addison wrote:
> > We remain concerned about the text in Step 7 in this section:
> >
> > http://dev.w3.org/html5/spec/Overview.html#determining-the-character-encoding
> >
> > Our concerns about this text are:
> >
> > 1. It isn't clear what constitutes a "legacy" or "non-legacy
> > environment".
>
> The Web is a legacy environment. Non-legacy environments are new walled
> gardens.

Any new document created on a computer is thus a non-legacy environment. This should be pointed out. Should file:// URLs come into consideration as "non-legacy"? Should date of creation come into consideration?

> > The sentence starting "Due to its use..." mentions "predominantly
> > Western demographics", which we find troublesome, especially given that
> > it is associated with the keyword "recommended".
>
> Why?

I agree with Addison that the text is unclear. Some comments on the following paragraph:

<txt>Otherwise, return an implementation-defined or user-specified default character encoding, with the confidence tentative. In non-legacy environments, the more comprehensive UTF-8 encoding is recommended. Due to its use in legacy content, windows-1252 is recommended as a default in predominantly Western demographics instead.</txt>

(1) The text says "... windows-1252 is recommended as a default ....". I also suggest saying "... UTF-8 encoding is recommended _as_a_default._" in the preceding sentence about "non-legacy environments".

(2) The last word is "... instead". In your debate with Addison, you seem to draw a clear line between "legacy" and "non-legacy". But here, in the text, the word "instead" seems to link back to the sentence about "non-legacy" content, thus making it seem as if there is a link from "non-legacy" to "legacy".
Hence the advice could be interpreted like this: "for legacy content - but only for 'Western' legacy content, then windows-1252 is recommended as a default, instead of using the recommended non-legacy environment encoding". Another possible interpretation: "for Western _non-legacy_ environments, then due to the state of Western _legacy_ _content_, win-1252 is actually recommended as default" ...

(3) I suggest replacing the phrase "Western demographics", because "Western" is a political word. For instance, Japan is sometimes referred to as "Western". In addition, the phrase "Western demographics" is used fewer than a thousand times on the Web, according to Google [1]. I don't think it is necessary to invent a phrase to express what is meant here. (Are there any "demographics" that predominantly use the Western European character set, but for which Windows-1252 is _not_ recommended as fallback encoding?)

(4) Should it not be mentioned that other defaults may be recommended - whether specified or not - for non-Western Latin hemispheres?

(5) You talk about "non-legacy environments" versus "legacy content". I wonder what the difference between "environment" and "content" is. Is "legacy content" the same as "old" content? Can timestamps be used to decide the best default ...?

(6) "Default" in plain English means "fallback". Can "fallback" be used instead of "default" in this paragraph? (And, by all means, wherever else you wish.) "Default" has so many unfortunate interpretations ... For instance, some might interpret "Windows 1252 is default for Western European languages" as "Windows 1252 is the recommended encoding for Western European languages".

(7) I wonder if "locale" or localization could be used instead of "demographics". The text speaks about an "implementation-defined or user-specified" fallback encoding. Browser vendors will perhaps need to define "demographics". But this is a seldom used word.
Locale - or locales (plural) - is much better known, to most parties, I think.

All in all, here is a suggested improvement, as far as I've understood ...

<txt>Otherwise, return an implementation-defined or user-specified fallback encoding, with the confidence tentative. In non-legacy environments, the more comprehensive UTF-8 encoding <ins>is recommended as a fallback encoding</ins>. <ins>For</ins> legacy content, <ins>then the dominating legacy encoding of one or several text encoding related locale(s), is often recommendable as a fallback encoding. For instance, for legacy content of locales that predominantly use the Western European Latin character set, then </ins> Windows-1252 is recommended as a fallback encoding.</txt>

--
leif halvard silli
Received on Monday, 31 August 2009 09:24:38 UTC