Re: valid locales ---> was Re: bilingual websites from Peter_Constable@sil.org on 2001-11-07 (www-international@w3.org from October to December 2001)

From: <Peter_Constable@sil.org>
Date: Wed, 7 Nov 2001 08:15:53 -0600
To: www-international@w3.org
Message-ID: <OF976A214B.0078475D-ON06256AFD.004AD05D@sil.org>

On 10/31/2001 12:47:09 PM David_Possin@i2.com wrote:

>So far we have ISO codes for language (I prefer language group) and for country
>(I prefer region). But there is not standard definition that tells me which
>combinations are valid. Therefore I assume that any combination is valid and
>legal and can be used. WRONG!

[snip]

>Let me describe 2 simple workflows our customers require. A major online
>bookseller wants to display the site in the user's language and the user's
>currency... The bookseller wants to
>offer Spanish titles with Mexican preferences in US dollars.

The problem is potentially even more complex; once we have solved that 
problem, we may one day want to be able to offer Japanese titles to a 
Spanish speaker while quoting prices in US dollars, or deliver information 
in Korean on Australian train schedules reporting times in the Sydney time 
zone but formatting them using Thai conventions (e.g. a Korean working in 
Thailand planning a trip to Australia).


>Therefore we had to ignore locale identifiers for our application, write our
>own language, region, time zone, and currency APIs, and maintain all ourselves.
>Even obvious "globalized" Java standards were useless, because they were
>inconsistent between the platforms. Our locales are now defined internally as
>        language_country_timezone_base-currency

I think this reflects a better understanding of what a locale is: it's a 
bundle of default values for culturally-related user-interface parameters. 
It has language (but see below) as a property, but also other properties. 
Simply using language and country to distinguish one from another is not 
adequate.
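
As a rough illustration of the "bundle of defaults" idea, here is a minimal 
Python sketch. The LocaleBundle class, its field names, and the sample 
values are assumptions made up for this example; they are not David's 
actual API, and the underscore-joined identifier only mimics the 
language_country_timezone_base-currency form quoted above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LocaleBundle:
    """Hypothetical bundle of culturally related UI defaults."""
    language: str   # ISO 639 language code, e.g. "es"
    region: str     # ISO 3166 country code, e.g. "MX"
    time_zone: str  # time zone name, e.g. "America/Mexico_City"
    currency: str   # ISO 4217 currency code, e.g. "MXN"

    def identifier(self) -> str:
        # Assumed serialization: the four properties joined by underscores.
        return "_".join(
            [self.language, self.region, self.time_zone, self.currency]
        )


# Defaults for a Spanish-speaking user in Mexico.
es_mx = LocaleBundle("es", "MX", "America/Mexico_City", "MXN")

# The bookseller case: same preferences, but prices quoted in US dollars.
es_mx_usd = replace(es_mx, currency="USD")
```

Because each property is a separate field, any one of them can be 
overridden without inventing a new language/country pair to carry it.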

Note, by the way, that *language* is not really what is relevant for most 
current implementations; it's orthography, which is a particular usage of 
a particular writing system for a particular language. In the future, we 
will probably want locales to handle settings for both text and voice. If 
so, then we may want both orthography and dialect to be properties of a 
locale.

There is a current problem in that the key systems for "language" 
identification, ISO 639 and RFC 3066 (but the same is true of things like 
MS LANGIDs and LCIDs) do not have an adequate model of what they are 
identifying. It has been assumed that "language" is the thing being 
described, but we are encountering increasing confusion because they are 
starting to be used to distinguish several types of categories. Locale 
identification needs fixing, but it can't be fully fixed until the 
"language" identification problem is fixed. I'm trying to do some work on 
that front (with discussions happening on some lists other than this one -- 
don't worry, Martin, I don't want to start that discussion here as well 
:-) ).


Tex: 

You wrote:

>Well, I have not seen an alternative proposed and I 
>don't have one at the ready, but I don't mind taking 
>a shot at improving the current situation.

I'd be glad to discuss with you my ideas on the "language" identification 
problem when you get to working on that. I'll be attending a meeting of TC 
37/SC 2/WG 1 in late Jan or early Feb to discuss a new work project for 
ISO 639 that is intended to solve some of the langid problems, and I have 
in mind to draft a proposal for that meeting.


- Peter


---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>
Received on Wednesday, 7 November 2001 09:23:21 UTC