
Re: [Moderator Action] Re: Official ISO 3166 country codes online

From: Martin J. Duerst <duerst@w3.org>
Date: Sun, 28 Nov 1999 17:15:29 +0900
Message-Id: <199911280830.RAA15813@sh.w3.mag.keio.ac.jp>
To: "Sean M. Burke" <sburke@netadventure.net>
Cc: "www international" <www-international@w3.org>
Forwarded by the list moderator.

At 00:42 1999/11/28 -0500, Sean M. Burke wrote:
> At 10:32 AM 1999-11-26 -0800, you wrote:
> >The biggest downside to this list 
> [ apparently referring to
>   http://www.din.de/gremien/nas/nabd/iso3166ma/codlstp1 ]
> >is that it is only the region-specific codes, even though
> >applications like Netscape and IE use the language and region
> >when there is a need for clarity.
> That's like saying that the downside to ketchup is that it's not a
> fissionable material.
> The "downside" is not in any shortcoming of the product (country codes, or
> ketchup), but in its unfitness for an unintended purpose (localization, or
> nuclear fission, resp.).
> The fact that countries aren't the same thing as locales or languages is a
> well-known problem; and it's why we have locale IDs and language codes.
> Anyone who tries to represent a locale with a country code, or a country
> with a language code, etc., is obviously misguided, and shouldn't be
> localizing anything.
> >More importantly, in cases where you might not localize into
> >(for example) every single region that speaks Arabic, you
> >must deal with the fact that any of 
>  [...the language codes...]
> >(ar, ar-ae, ar-bh, ar-dz,
> >ar-eg, ar-iq, ar-jo, ar-kw, ar-lb, ar-ly, ar-ma, ar-om,
> >ar-qa, ar-sa, ar-sy, ar-tn, ar-ye) may come back to you
> >from a browser or elsewhere, yet almost all people localize
> >into the "sa" version
>  ...by which I assume you meant not "sa" but "ar-sa"...
> >and so "ar-sa" or "ar" is what you
> >will want to show.
> If a user agent requests an object while expressing a preference for
> "ar-ma" (Moroccan Arabic), and the server sees that the closest thing it has
> is an object tagged as being in "ar" ("generic" Arabic), then yes, serving
> that would probably be the best thing under the circumstances.
> However, answering an "ar-ma" request with an "ar-sa" (Saudi Arabic) object
> seems decidedly less of a good idea; if the object in question is an audio
> object, the Moroccan-speaker might find it quite unintelligible.
> I'm aware of no single good solution to this problem -- particularly not
> for Arabic or Chinese, where what "ar" or "zh" means varies so greatly
> depending on the medium in question.
> However, having language-negotiation mechanisms interpret "ar" to mean
> "Arabic in a dialect intelligible to the average international
> speaker/reader of Arabic" does go a long way toward clarifying these
> things.  What I personally do is that if a program I write receives an
> object request with this list of languages (in decreasing order of
> preference):
>   en-us, ar-kw, fr
> I impute it to mean:
>   en-us, ar-kw, fr,    en, ar
> I.e., the "generic international" codes are appended to the end, for each
> more specific code specified in the preferences list.  (This violates
> RFC1766's rule that language-tags should be considered atomic, but I use it
> just as a fallback and a heuristic.)
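
The fallback heuristic described above can be sketched in a few lines of Python. This is only an illustration of the idea as stated in the message, not code from the original author; the function names `with_generic_fallbacks` and `pick_variant` are hypothetical, and the sketch assumes tags are compared case-insensitively.

```python
def with_generic_fallbacks(preferences):
    """Return the preference list with each tag's primary subtag appended.

    The generic codes go at the very end, in the order their specific
    tags appeared, and are skipped if already present in the list.
    """
    expanded = list(preferences)
    seen = {tag.lower() for tag in expanded}
    for tag in preferences:
        primary = tag.split("-", 1)[0].lower()
        if primary not in seen:
            expanded.append(primary)
            seen.add(primary)
    return expanded


def pick_variant(preferences, available):
    """Return the first expanded preference that the server has, or None."""
    on_offer = {tag.lower() for tag in available}
    for tag in with_generic_fallbacks(preferences):
        if tag.lower() in on_offer:
            return tag
    return None


if __name__ == "__main__":
    print(with_generic_fallbacks(["en-us", "ar-kw", "fr"]))
    print(pick_variant(["en-us", "ar-kw", "fr"], ["fr", "en", "ar"]))
```

Because the generic codes are only appended after every explicit preference, an explicitly listed language always outranks an imputed one.
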
> Granted, that means that if the object requested is available in forms
> tagged as being in "fr", "en", and "ar", the user will get the "fr"
> version.  This is passable, if potentially suboptimal.
> Moreover, it comes about only because of two problems:
> 1) The server's resources are not labelled right.  The English version
> should be marked as being in whatever dialect it's in, in addition to the
> fact it's in a form of English intelligible to the notional "average
> international English-speaker/reader".
> Ditto for the Arabic version.
> 2) The user should specify his preferences, in order, for
>  such "international" variants.
> For example, the user who specifies
>   en-us, ar-kw, fr
> might mean this:
>   en-us, en, ar-kw, fr, ar
> or might mean this:
>   en-us, en, ar-kw, ar, fr
> I'm not sure which is less realistic -- expecting users to configure their
> user agents correctly, or expecting content providers to label things
> correctly.
> Presumably the former task could be simplified by having the installers for
> user-agents give Americans a default Accept-Language of "en-US, en",
> Mexicans a default Accept-Language of "es-MX, es", and so on; anyone
> unhappy with these defaults would be welcome to edit them.  The defaults
> for Arabic-language and Chinese-language versions of user-agents could
> differ from country to country.  User-agents being correctly configured by
> default would save the content-providers from having to jump thru hoops to
> deal with the effects of misconfiguration.
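
As a rough sketch of the installer default suggested above, a user agent could derive its initial Accept-Language value from the locale chosen at install time. The function name `default_accept_language` is hypothetical, and the sketch assumes the locale is already given as a language tag such as "en-US". It does not cover the per-country special cases mentioned for Arabic and Chinese.

```python
def default_accept_language(locale_tag):
    """Build a default Accept-Language value: the specific tag, then its
    generic primary language. A tag with no region is returned unchanged."""
    primary = locale_tag.split("-", 1)[0]
    if primary.lower() == locale_tag.lower():
        return locale_tag
    return f"{locale_tag}, {primary}"


if __name__ == "__main__":
    for tag in ("en-US", "es-MX", "fr"):
        print(tag, "->", default_accept_language(tag))
```
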
> This all presumes the existence of "generic/international" variants of
> languages with many variants.  Unfortunately that's a notably problematic
> assumption for Arabic and Chinese, to a degree that depends on the medium
> of the object in question.
> In these specific and problematic cases, I'd suppose that implementors
> could specially treat them by access to a table somewhere expressing the
> extent to which the average speaker of ar-X would accept an object in ar-Y.
> It's my guess that you'd need at least three tables for different media:
> writing, audio, video (that is, video without writing -- unlike Chinese TV
> shows I see that are in spoken Mandarin, but subtitled in written Chinese
> for the benefit of people who can read Chinese, but can't understand spoken
> Mandarin).
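
One possible shape for such per-medium tables is sketched below. Everything in it is an assumption made for illustration: the names `ACCEPTABILITY`, `acceptability`, and `best_offer`, the 0.0 to 1.0 scale, the threshold, and above all the scores, which are placeholders for the "ar-X" and "ar-Y" of the message and not measurements of any real dialect pair.

```python
# Placeholder scores only: how acceptable an object in the offered variant
# is to a reader or listener who asked for the wanted variant, per medium.
ACCEPTABILITY = {
    "writing": {("ar-x", "ar-y"): 0.9},
    "audio": {("ar-x", "ar-y"): 0.2},
    "video": {("ar-x", "ar-y"): 0.2},
}


def acceptability(medium, wanted, offered):
    """Look up the score for (wanted, offered) in the table for this medium.

    An exact match always scores 1.0; unknown pairs score 0.0.
    """
    wanted, offered = wanted.lower(), offered.lower()
    if wanted == offered:
        return 1.0
    return ACCEPTABILITY.get(medium, {}).get((wanted, offered), 0.0)


def best_offer(medium, wanted, offers, threshold=0.5):
    """Return the most acceptable offered variant, or None if none clears
    the (hypothetical) threshold."""
    best = max(offers, key=lambda tag: acceptability(medium, wanted, tag),
               default=None)
    if best is None or acceptability(medium, wanted, best) < threshold:
        return None
    return best


if __name__ == "__main__":
    print(best_offer("writing", "ar-X", ["ar-Y"]))
    print(best_offer("audio", "ar-X", ["ar-Y"]))
```

Keeping one table per medium lets the same pair of variants be a good substitute in writing and a poor one in audio.
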
> Moreover, the concept of "average speaker of ar-X" may also be fishy, or
> may change greatly over time.
> While IANA/ISO language-negotiation protocols do not (as far as I know)
> currently see heavy and crucial use in negotiating the serving of variant
> audio/video resources in Arabic or Chinese, one never knows what tomorrow
> may bring.  I suppose the hard part is in not overcomplicating the
> protocols for everyone else merely to accommodate content-negotiation of
> Chinese and Arabic.
> --
> Sean M. Burke sburke@netadventure.net http://www.netadventure.net/~sburke/

#-#-#  Martin J. Du"rst, World Wide Web Consortium
#-#-#  mailto:duerst@w3.org   http://www.w3.org
Received on Sunday, 28 November 1999 03:30:31 UTC
