- From: Ian Hickson <ian@hixie.ch>
- Date: Sun, 4 Oct 2009 11:28:35 +0000 (UTC)
- To: "Phillips, Addison" <addison@amazon.com>
- Cc: Andrew Cunningham <andrewc@vicnet.net.au>, Richard Ishida <ishida@w3.org>, "public-html@w3.org" <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
On Mon, 31 Aug 2009, Phillips, Addison wrote:
> > >
> > > Our concerns about this text are:
> > >
> > > 1. It isn't clear what constitutes a "legacy" or "non-legacy
> > > environment".
> >
> > The Web is a legacy environment. Non-legacy environments are new
> > walled gardens.
>
> I understand. But I think that the average reader might not. Using UTF-8
> as a default authoring choice in your walled garden is a Good Thing, but
> really this would make a better FAQ or Best Practice than a
> "recommended" case? If your goal is to inform people writing user agents
> for these cases, perhaps say instead:
>
> --
> In controlled environments or in cases where the encoding of documents
> can be prescribed, the UTF-8 character encoding is recommended.
> --

Ok, I've changed "non-legacy" to something more like the above.


> > > The sentence starting "Due to its use..." mentions "predominantly
> > > Western demographics", which we find troublesome, especially given
> > > that it is associated with the keyword "recommended".
> >
> > Why?
>
> This is really two points.
>
> First, I think that the demographics phrase isn't very well defined or
> is imprecise. You should be more specific with the recommendation so
> that implementers will know how to evaluate it. The problem here, as
> we've discussed before, is that down this path is a list of
> recommendations (one per "demographic"), something that I think better
> to avoid in HTML5.

This seems to be two problems:

 - "Western demographics" not being very clear for implementors. In
   practice, I think implementors understand this pretty well, so I'm
   not convinced that's a problem.

 - The slippery slope of needing to define this for all demographics. I
   would actually like to include details for other major demographics,
   but I don't think there's a slippery slope here, given that in the
   years of this text being present, we have not added requirements for
   other demographics.

> Second, "recommended" is a 2119 keyword with a normative meaning. While
> windows-1252 is probably the best default when the users are most often
> accessing Latin-1 resources, it really should be an example of how an
> implementation-defined default is chosen (user defined defaults being
> the user's business). "Western demographics" combined with this
> normative meaning might produce confusion (is Poland, a primarily
> Latin-2 environment, Western? Etc. etc.). You are trying to say that the
> best default for various language/regional audiences depends on the
> audience. Browsers in the main do the right thing here, keying off
> system locale or browser localization.

I've changed "recommended" to "suggested".


> > I haven't added this, as I don't want this step to turn into a long
> > list of possible algorithms to use. However, if you have other papers
> > I should reference in addition to [UNIVCHARDET], I'm happy to add
> > references.
>
> I don't think you should add a lot of possible algorithms. It is just
> that the special nature of UTF-8 and the relative simplicity of
> bit-sniffing for it is a useful strategy, at least on the server side. I
> suggested a special mention, given that I have seen browser vendors
> saying that they are removing the optional step 6 support as time goes
> on. If browsers don't do full chardet, they may still get some utility
> by including the UTF-8 sniff. I'll dig up an appropriate reference if
> you prefer.

If you have a reference for this, that would be preferable, yes. Thanks.


> My real issue was that in step 6 you allowed for bit sniffing. And then
> you allow it again with:
>
> > Since these encodings can in many cases be distinguished by
> > inspection, a user agent may heuristically decide which to use as a
> > default.
>
> If what you meant to suggest here was that the default might be
> something like "Japanese auto-detect", you should probably say that more
> directly.
>
> However, it's not that important.

I've since removed that quoted text.


> > > http://www.w3.org/Bugs/Public/show_bug.cgi?id=7381
> > > "Clarify default encoding wording and add some examples for
> > > non-latin locales."
> >
> > Thanks. I will get to these in due course.
>
> Thanks. Please let I18N WG know if we can assist you with this. I think
> that the text suggested further down the thread marks a useful
> improvement both on the existing text and on the original proposal.

This bug is currently awaiting elaboration from the reporter.

Cheers,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
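
The "UTF-8 sniff" mentioned in the thread can be sketched roughly as
follows. This is only an illustration in Python, not text from the HTML5
draft or from any browser. The function name sniff_utf8 and the decision
to treat pure-ASCII input as "no evidence either way" are assumptions
made for the example. It relies on the observation that non-ASCII text
in legacy single-byte encodings rarely happens to form valid UTF-8
multi-byte sequences.

```python
import codecs


def sniff_utf8(prefix: bytes) -> bool:
    """Return True if `prefix` looks like UTF-8 (hypothetical helper).

    `prefix` is assumed to be the first bytes of a resource, so a
    multi-byte sequence cut off at the very end is tolerated.
    """
    # An incremental decoder with final=False buffers an incomplete
    # trailing sequence instead of rejecting it, but still raises on
    # bytes that can never be valid UTF-8.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        text = decoder.decode(prefix, final=False)
    except UnicodeDecodeError:
        return False
    # Pure ASCII is valid UTF-8 but says nothing about the real
    # encoding, so require at least one fully decoded non-ASCII
    # character before claiming a match.
    return any(ord(ch) > 0x7F for ch in text)
```

A caller would apply this only as a late fallback, after any BOM,
transport-level charset and <meta> declarations have been checked, and
would fall back to the locale-dependent default when it returns False.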
Received on Sunday, 4 October 2009 11:19:40 UTC