- From: Martin J. Duerst <duerst@w3.org>
- Date: Sun, 28 Nov 1999 17:15:29 +0900
- To: "Sean M. Burke" <sburke@netadventure.net>
- Cc: "www international" <www-international@w3.org>
Forwarded by the list moderator.

At 00:42 1999/11/28 -0500, Sean M. Burke wrote:
> At 10:32 AM 1999-11-26 -0800, you wrote:
> >The biggest downside to this list
> [ apparently referring to
> http://www.din.de/gremien/nas/nabd/iso3166ma/codlstp1 ]
> >is that it is only the region-specific codes, even though
> >applications like Netscape and IE use the language and region
> >when there is a need for clarity.
>
> That's like saying that the downside to ketchup is that it's not a
> fissionable material.
> The "downside" is not in any shortcoming of the product (country codes, or
> ketchup), but in its unfitness for an unintended purpose (localization, or
> nuclear fission, resp.).
>
> The fact that countries aren't the same thing as locales or languages is a
> well-known problem, and it's why we have locale IDs and language codes.
> Anyone who tries to represent a locale with a country code, or a country
> with a language code, etc., is obviously misguided, and shouldn't be
> localizing anything.
>
> >More importantly, in cases where you might not localize into
> >(for example) every single region that speaks Arabic, you
> >must deal with the fact that any of
> [...the language codes...]
> >(ar, ar-ae, ar-bh, ar-dz,
> >ar-eg, ar-iq, ar-jo, ar-kw, ar-lb, ar-ly, ar-ma, ar-om,
> >ar-qa, ar-sa, ar-sy, ar-tn, ar-ye) may come back to you
> >from a browser or elsewhere, yet almost all people localize
> >into the "sa" version
> ...by which I assume you meant not "sa" but "ar-sa"...
> >and so "ar-sa" or "ar" is what you
> >will want to show.
>
> If a user agent requests an object while expressing a preference for
> "ar-ma" (Moroccan Arabic), and the server sees that the closest thing it
> has is an object tagged as being in "ar" ("generic" Arabic), then yes,
> serving that object would probably be the best thing under the
> circumstances.
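The single-tag fallback just described (serve the generic "ar" object when "ar-ma" is requested and there is no exact match) can be sketched in Python. This is a minimal illustration, not code from the thread. The function names are hypothetical. Tags are compared case-insensitively. Only the full tag and its primary subtag are tried, so this is a simplified relative of the "Lookup" scheme that RFC 4647 later standardized. A sibling regional variant such as "ar-sa" is deliberately never substituted.

```python
from __future__ import annotations


def primary_subtag(tag: str) -> str:
    """Return the primary language subtag, e.g. 'ar' for 'ar-MA'."""
    return tag.split("-", 1)[0].lower()


def best_variant(requested: str, available: set[str]) -> str | None:
    """Choose a variant for a single requested tag (hypothetical helper).

    1. exact (case-insensitive) match, e.g. 'ar-ma' for 'ar-MA';
    2. otherwise the generic primary-language tag, e.g. 'ar';
    3. otherwise nothing. A sibling regional tag such as 'ar-sa'
       is deliberately never substituted for 'ar-ma'.
    """
    have = {tag.lower() for tag in available}
    wanted = requested.lower()
    if wanted in have:
        return wanted
    generic = primary_subtag(wanted)
    if generic in have:
        return generic
    return None


if __name__ == "__main__":
    print(best_variant("ar-MA", {"ar", "ar-sa", "fr"}))  # ar
    print(best_variant("ar-MA", {"ar-sa", "fr"}))        # None
```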
>
> However, answering an "ar-ma" request with an "ar-sa" (Saudi Arabic) object
> seems decidedly less of a good idea; if the object in question is an audio
> object, the Moroccan speaker might find it quite unintelligible.
>
> I'm aware of no single good solution to this problem -- particularly not
> for Arabic or Chinese, where what "ar" or "zh" means varies so greatly
> depending on the medium in question.
>
> However, having language-negotiation mechanisms interpret "ar" to mean
> "Arabic in a dialect intelligible to the average international
> speaker/reader of Arabic" does go a long way toward clarifying these
> things. What I personally do is this: if a program I write receives an
> object request with this list of languages (in decreasing order of
> preference):
>
> en-us, ar-kw, fr
>
> I impute it to mean:
> en-us, ar-kw, fr, en, ar
> I.e., the "generic international" codes are appended to the end, for each
> more specific code specified in the preferences list. (This violates
> RFC1766's rule that language-tags should be considered atomic, but I use it
> just as a fallback and a heuristic.)
>
> Granted, that means that if the object requested is available in forms
> tagged as being in "fr", "en", and "ar", the user will get the "fr"
> version. This is passable, if potentially suboptimal.
>
> Moreover, it comes about only because of two problems:
> 1) The server's resources are not labelled right. The English version
> should be marked as being in whatever dialect it's in, in addition to the
> fact it's in a form of English intelligible to the notional "average
> international English-speaker/reader".
> Ditto for the Arabic version.
> 2) The user should specify his preferences, in order, for
> such "international" variants.
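The preference-expansion heuristic above (append the generic code for each more specific code, at the end of the list) can be sketched as follows. This sketch makes some assumptions. The Accept-Language value has already been split into tags in decreasing order of preference. Quality values and the "*" range are ignored. `expand_preferences` and `negotiate` are hypothetical names, not part of any library.

```python
from __future__ import annotations


def expand_preferences(prefs: list[str]) -> list[str]:
    """Append, in order, the generic primary-language tag of each listed
    tag, skipping any tag already present. A heuristic fallback only:
    it treats tags as decomposable rather than atomic."""
    result = [p.lower() for p in prefs]
    seen = set(result)
    for tag in list(result):  # iterate over the user's original tags only
        generic = tag.split("-", 1)[0]
        if generic not in seen:
            result.append(generic)
            seen.add(generic)
    return result


def negotiate(prefs: list[str], available: set[str]) -> str | None:
    """Return the first tag in the expanded preference list that the
    server actually has, or None if nothing matches."""
    have = {a.lower() for a in available}
    for tag in expand_preferences(prefs):
        if tag in have:
            return tag
    return None


if __name__ == "__main__":
    print(expand_preferences(["en-us", "ar-kw", "fr"]))
    # ['en-us', 'ar-kw', 'fr', 'en', 'ar']
    print(negotiate(["en-us", "ar-kw", "fr"], {"fr", "en", "ar"}))  # fr
```

With "fr", "en" and "ar" available, `negotiate` returns "fr". This reproduces the passable but potentially suboptimal outcome noted above, because the appended "en" ranks below the explicit "fr".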
>
> For example, the user who specifies
> en-us, ar-kw, fr
> might mean this:
> en-us, en, ar-kw, fr, ar
> or might mean this:
> en-us, en, ar-kw, ar, fr
>
> I'm not sure which is less realistic -- expecting users to configure their
> user agents correctly, or expecting content providers to label things
> correctly.
> Presumably the former task could be simplified by having the installers for
> user agents give Americans a default Accept-Language of "en-US, en",
> Mexicans a default Accept-Language of "es-MX, es", and so on; anyone
> unhappy with these defaults would be welcome to edit them. The defaults
> for Arabic-language and Chinese-language versions of user agents could
> differ from country to country. User agents being correctly configured by
> default would save content providers from having to jump through hoops to
> deal with the effects of misconfiguration.
>
> This all presumes the existence of "generic/international" variants of
> languages with many variants. Unfortunately that's a notably problematic
> assumption for Arabic and Chinese, to a degree that depends on the medium
> of the object in question.
>
> In these specific and problematic cases, I'd suppose that implementors
> could handle them specially by consulting a table somewhere expressing the
> extent to which the average speaker of ar-X would accept an object in ar-Y.
> It's my guess that you'd need at least three tables for different media:
> writing, audio, video (that is, video without writing -- unlike Chinese TV
> shows I see that are in spoken Mandarin, but subtitled in written Chinese
> for the benefit of people who can read Chinese, but can't understand spoken
> Mandarin).
> Moreover, the concept of "average speaker of ar-X" may also be fishy, or
> may change greatly over time.
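The per-medium table idea above can be sketched as a lookup keyed by medium and by a (wanted, offered) tag pair. Every part of this is an assumption made for illustration. That covers the table layout, the 0.0 to 1.0 acceptability scale, the threshold, and the function names. It especially covers the numbers, which are placeholders and not real intelligibility data.

```python
from __future__ import annotations

# HYPOTHETICAL placeholder values, for illustration only; these are NOT
# measured intelligibility figures. ACCEPTABILITY[medium][(wanted, offered)]
# guesses how acceptable an object tagged `offered` is to the "average"
# speaker of `wanted`, on a 0.0-1.0 scale.
ACCEPTABILITY: dict[str, dict[tuple[str, str], float]] = {
    "writing": {("ar-ma", "ar"): 0.9, ("ar-ma", "ar-sa"): 0.8},
    "audio": {("ar-ma", "ar"): 0.6, ("ar-ma", "ar-sa"): 0.3},
    "video": {("ar-ma", "ar"): 0.5, ("ar-ma", "ar-sa"): 0.2},
}


def acceptability(wanted: str, offered: str, medium: str) -> float:
    """Score one offered variant. An exact match is always fully
    acceptable; unknown pairs or media score 0.0."""
    wanted, offered = wanted.lower(), offered.lower()
    if wanted == offered:
        return 1.0
    return ACCEPTABILITY.get(medium, {}).get((wanted, offered), 0.0)


def rank_variants(wanted: str, available: set[str], medium: str,
                  threshold: float = 0.5) -> list[str]:
    """Return the available variants scoring at least `threshold`,
    best first (ties broken alphabetically for determinism)."""
    scored = [(acceptability(wanted, tag, medium), tag.lower()) for tag in available]
    good = [(s, t) for s, t in scored if s >= threshold]
    good.sort(key=lambda st: (-st[0], st[1]))
    return [t for _, t in good]


if __name__ == "__main__":
    offers = {"ar", "ar-sa"}
    print(rank_variants("ar-ma", offers, "writing"))  # ['ar', 'ar-sa']
    print(rank_variants("ar-ma", offers, "audio"))    # ['ar']
```

Keeping one table per medium lets the same pair of tags rank differently for text and for speech. This mirrors the point that a written variant may be acceptable where the spoken one is not.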
>
> While IANA/ISO language-negotiation protocols do not (as far as I know)
> currently see heavy and crucial use in negotiating the serving of variant
> audio/video resources in Arabic or Chinese, one never knows what tomorrow
> may bring. I suppose the hard part is in not overcomplicating the
> protocols for everyone else merely to accommodate content negotiation for
> Chinese and Arabic.
>
> --
> Sean M. Burke sburke@netadventure.net http://www.netadventure.net/~sburke/
>

#-#-# Martin J. Dürst, World Wide Web Consortium
#-#-# mailto:duerst@w3.org http://www.w3.org
Received on Sunday, 28 November 1999 03:30:31 UTC