- From: <Peter_Constable@sil.org>
- Date: Wed, 7 Nov 2001 08:15:53 -0600
- To: www-international@w3.org
- Message-ID: <OF976A214B.0078475D-ON06256AFD.004AD05D@sil.org>
On 10/31/2001 12:47:09 PM David_Possin@i2.com wrote:

>So far we have ISO codes for language (I prefer language group) and for country
>(I prefer region). But there is not standard definition that tells me which
>combinations are valid. Therefore I assume that any combination is valid and
>legal and can be used. WRONG!

[snip]

>Let me describe 2 simple workflows our customers require. A major online
>bookseller wants to display the site in the user's language and the user's
>currency... The bookseller wants to
>offer Spanish titles with Mexican preferences in US dollars.

The problem is potentially even more complex; once we have solved that problem, we may one day want to be able to offer Japanese titles to a Spanish speaker while quoting prices in US dollars, or deliver information in Korean on Australian train schedules reporting times in the Sydney time zone but formatting them using Thai conventions (e.g. a Korean working in Thailand planning a trip to Australia).

>Therefore we had to ignore locale identifiers for our application, write our
>own language, region, time zone, and currency APIs, and maintain all ourselves.
>Even obvious "globalized" Java standards were useless, because they were
>inconsistent between the platforms. Our locales are now defined internally as
> language_country_timezone_base-currency

I think this reflects a better understanding of what a locale is: it's a bundle of default values for culturally-related user-interface parameters. It has language (but see below) as a property, but also other properties. Simply using language and country to distinguish one from another is not adequate.

Note, by the way, that *language* is not really what is relevant for most current implementations; it's orthography, which is a particular usage of a particular writing system for a particular language. In the future, we will probably want locales to handle settings for both text and voice. If so, then we may want both orthography and dialect to be properties of a locale.

There is a current problem in that the key systems for "language" identification, ISO 639 and RFC 3066 (but the same is true of things like MS LANGIDs and LCIDs), do not have an adequate model of what they are identifying. It has been assumed that "language" is the thing being described, but we are encountering increasing confusion because these identifiers are starting to be used to distinguish several types of categories.

Locale identification needs fixing, but it can't be fully fixed until the "language" identification problem is fixed. I'm trying to do some work on that front (with discussions happening on some lists other than this -- don't worry, Martin, I don't want to start that discussion here as well :-)

Tex: You wrote

>Well, I have not seen an alternative proposed and I
>don't have one at the ready, but I don't mind taking
>a shot at improving the current situation.

I'd be glad to discuss with you my ideas on the "language" identification problem when you get to working on that. I'll be attending a meeting of TC 37/SC 2/WG 1 in late Jan or early Feb to discuss a new work project for ISO 639 that is intended to solve some of the langid problems, and I have in mind to draft a proposal for that meeting.

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>
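
Editorial illustration: the idea in the message above, that a locale is a bundle of independent defaults rather than a language/country pair, might be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the class name `LocaleBundle`, its field names, and the underscore-joined `identifier()` are hypothetical and are not taken from i2's implementation or from any standard. The time zone in the first example and the currency in the second are likewise assumed for illustration, since the message does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocaleBundle:
    """Hypothetical bundle of independent defaults (not a real library type)."""
    language: str       # language of the content, e.g. an ISO 639 code
    country: str        # region whose formatting conventions apply (ISO 3166)
    timezone: str       # time zone in which times are reported
    base_currency: str  # currency in which prices are quoted (ISO 4217)

    def identifier(self) -> str:
        # Loosely mirrors the quoted language_country_timezone_base-currency
        # shape. The exact separator rules are an assumption, and no attempt
        # is made to parse such a string back into its parts.
        return "_".join(
            [self.language, self.country, self.timezone, self.base_currency]
        )


# Spanish titles, Mexican preferences, prices in US dollars.
# The time zone is an assumption added for illustration.
bookseller = LocaleBundle("es", "MX", "America/Monterrey", "USD")

# Korean text, Thai formatting conventions, Sydney time zone.
# The currency is an assumption added for illustration.
traveller = LocaleBundle("ko", "TH", "Australia/Sydney", "THB")

print(bookseller.identifier())
print(traveller.identifier())
```

Keeping each property as a separate field means that any combination can be expressed without first deciding which language/country pairs count as valid, which is the difficulty raised in the quoted message.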
Received on Wednesday, 7 November 2001 09:23:21 UTC