Re: locales

From: Tex Texin <texin@progress.com>
Date: Thu, 08 Nov 2001 11:26:46 -0500
Message-ID: <3BEAB246.5886748@progress.com>
To: Thierry Sourbier <webmaster@i18ngurus.com>
CC: www-international@w3.org, NE Localization SIG <nelocsig@egroups.com>

Thierry,

I think we know it is a hard problem. An algorithmic solution may not be
possible, but we might be able to offer an improved heuristic over what
we have now. I don't think we are trying to eliminate custom coding.
There is always a need to override the defaults.

For myself, I am not concerned whether a particular locale is exactly
right for some percentage of a region. I would just be happy if I could
specify a locale and know what it entailed, and if, when I gave the same
locale to two different software modules, they both used the same
definitions for that locale.

Earlier, I used the term "invalid", but I didn't mean that the locale
would be inaccurate for the region. I meant that it would not be
self-consistent, such as using a character for currency that is not
represented in the character set. Another example might be using the
"space" character as both the list item separator and the thousands
separator in numbers (making it impossible to have an unambiguous list
of numbers larger than a thousand).
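
As a rough illustration of the kind of self-consistency check meant
here, the following Python sketch flags the two problems above. The
LocaleDef record and its field names are hypothetical and exist only
for this example; they are not taken from ICU, POSIX, or any other
locale implementation. It also assumes the charset name is one that
Python's codec registry recognizes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LocaleDef:
    """Hypothetical, minimal locale record (field names are illustrative)."""
    name: str
    charset: str
    currency_symbol: str
    list_separator: str
    thousands_separator: str


def find_inconsistencies(loc: LocaleDef) -> List[str]:
    """Return descriptions of internal contradictions in a locale record."""
    problems = []
    # The currency symbol must be representable in the locale's charset.
    try:
        loc.currency_symbol.encode(loc.charset)
    except UnicodeEncodeError:
        problems.append(
            f"currency symbol {loc.currency_symbol!r} is not in {loc.charset}"
        )
    # A shared separator makes a list of grouped numbers ambiguous.
    if loc.list_separator == loc.thousands_separator:
        problems.append("list separator equals thousands separator")
    return problems


# Euro sign with ISO 8859-1 (no euro) versus ISO 8859-15 (has euro).
euro_latin1 = LocaleDef("fr_FR", "iso-8859-1", "\u20ac", ";", " ")
euro_latin9 = LocaleDef("fr_FR", "iso-8859-15", "\u20ac", ";", " ")
# Space used both between list items and between digit groups.
space_clash = LocaleDef("xx_XX", "iso-8859-1", "$", " ", " ")

for loc in (euro_latin1, euro_latin9, space_clash):
    print(loc.name, loc.charset, find_inconsistencies(loc))
```

A check like this says nothing about whether a locale is right for a
region; it only reports definitions that contradict themselves.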


hth
tex


Thierry Sourbier wrote:
> 
> While I fully understand the limitations of locales as they are currently
> defined, I'm very doubtful that the situation can be improved in the near
> future, given that:
> 
> 1. It is hardly possible to define *scientifically* what a locale is. Even
> the candidates for the *base* have shaky definitions (e.g. language,
> region -why country?-, time zone, ...).
> 
> if we pass this hurdle:
> 
> 2. It is hardly possible to decide what is a *valid* locale (This is where
> David started). Shall we base it on the number of people it targets? In that
> case, for example, a locale such as es_US (22 million people) should be *more
> valid* than fr_CA (7 million people). How can we prevent the lurking
> combinatorial explosion? Some quick maths show that technically there are
> more locale candidates than character candidates for Unicode (dooh!).
> 
> if we pass this hurdle:
> 
> 3. It will be impossible for each application to support ALL valid locales.
> How, then, should the fallback mechanisms work? Say that the es_US locale is
> not present in my system: shall I default to Spain Spanish or US English? I
> guess you will say a bit of both... (side question then, how to prevent Mr
> QA guy from going postal?)
> 
> if we pass this hurdle:
> 
> 4. As Tex pointed out, it is not even obvious what locales are to be used
> for. Some candidates include selecting the content to display, formatting
> rules, collation rules, time zone, calendar, address format, units of
> measure, currency (shall we limit to one?), but I'm sure we can find many
> more (e.g. basic privacy rules, sales tax information, ...).
> 
> and last but not least:
> 
> 5. It won't be an easy thing to make it simple to use, so that people are at
> least tempted to look at it. How to make it a standard so our locales will be
> portable to all platforms? Shall a "Unilocale consortium" be created :).
> 
> The point of these questions is certainly not to get answers, but to show
> that without a given application framework it is impossible to reach closure
> on this topic. Sorry if this is bad news for some, but I don't really see how
> custom coding could be avoided in the foreseeable future for applications for
> which the current locales are not enough (this is what I believe triggered
> this entire discussion).
> 
> Don't get me wrong, I'm all for a better world, but to echo Martin Duerst's
> comment, rather than criticizing current models why not present ideas on how
> they could be improved? For those who have implemented their own solutions,
> why not make them into an open source project (Universal Locale Components?)
> to try to get it to become a de-facto standard like tz? - I'll be the first
> to advertise it-.
> 
> My 2 Euro cents,
> 
> Thierry.
> (who moved back to France to see the Euro mess first hand :).
> 
> <><><><><><><><><><><><><><><><><><><><><><>
> www.i18ngurus.com - Open Internationalization Resources Directory
> 
> ----- Original Message -----
> From: "Tex Texin" <texin@progress.com>
> To: "Carl W. Brown" <cbrown@xnetinc.com>
> Cc: <www-international@w3.org>
> Sent: Thursday, November 08, 2001 1:07 AM
> Subject: Re: locales
> 
> > Thanks Carl.
> >
> > I take this to mean that you are proposing that the language, country,
> > character set, time zone, and variant, represent 5 orthogonal attributes
> > which uniquely describe a "locale" and which are sufficient to describe
> > a user.
> >
> > I think I would like "variant" to go away, or at least not be required
> > to meet most needs.
> > I know it is used for Euro, I am not sure what other general purpose
> > usages it has.
> >
> > I wonder if we should add currency to your list of orthogonal values.
> >
> > Also, I note that language, country, and time zone are not sufficient to
> > determine which calendar is being used.
> > Perhaps timezone should be replaced with something representing
> > calendar+date+time formats and timezone?
> >
> > I am not sure what to say about possibly "invalid" combinations such as
> > euro currency and ISO 8859-1 character set (since it doesn't have the
> > euro symbol)...
> >
> > Perhaps this leads us to defining locale as a collection of names for
> > formats associated with basic datatypes-
> > (text, calendar, currency...)
> >
> > It then becomes more precise, but less useful as an easy to use
> > nomenclature...
> >
> > tex
> >
> > "Carl W. Brown" wrote:
> > >
> > > Tex,
> > >
> > > In xIUA I use the following format:
> > >
> > > >      Format: (no spaces)
> > > >      ll[_CC][.MM][@VV][#TT]
> > > >
> > > >      ll = lang, CC = ctry, MM = charmap, VV = Variant, TT = Time Zone
> > > >
> > > > For example:
> > > >
> > > > en_US.iso-8859-1#America/Los_Angeles
> > > >
> > > > or
> > > >
> > > > fr_FR.iso-8859-15@EURO#Europe/Paris
> > > >
> > > > It works well with ICU.  The conversion both ways is very simple and
> > > > straightforward.
> > >
> > > Carl
> > >
> > > > -----Original Message-----
> > > > From: Tex Texin [mailto:texin@progress.com]
> > > > Sent: Wednesday, November 07, 2001 11:54 AM
> > > > To: David_Possin@i2.com
> > > > Cc: cbrown@xnetinc.com; www-international@w3.org;
> > > > www-international-request@w3.org
> > > > Subject: locales
> > > >
> > > >
> > > > David,
> > > >
> > > > > If you would set up an archived forum, that would be great. It will
> > > > > save me trying to identify which messages are relevant and saving
> > > > > them all on my drive.
> > > >
> > > > > Mentioning time zones will, I am sure, ensure a blast from Carl. (;-) I
> > > > > look forward to it.)
> > > > > One point is that a locale may include more than one zone (e.g. the US
> > > > > goes from EST through CST to PST) and so is ambiguous, and we may go
> > > > > down the trail of how the changes to daylight saving time may vary
> > > > > within a locale.
> > > >
> > > > A key question for me is which of the many variables for
> > > > internationalization belong in a locale and which belong in some other
> > > > structure?
> > > >
> > > > Maybe time and calendar should not be a function of locale...
> > > > Maybe currency should not be.
> > > >
> > > > Which variables are best associated with the locale, which with the
> > > > data, and which with the application?
> > > > For example, since I develop database products, and I cannot have
> > > > indexes changing on me, I always include the rules for sorting in the
> > > > database, with the data.
> > > >
> > > > I don't generally worry about hyphenation, I would probably keep rules
> > > > for that with the application (the choice being influenced but not
> > > > defined by locale).
> > > >
> > > > tex
> > > >
> > > >
> > > >
> > > > David_Possin@i2.com wrote:
> > > > >
> > > > > > I would propose to open a discussion forum for locales in the
> > > > > > yahoo.groups like many other globalization people have done for
> > > > > > other issues. It will be tough keeping up to date with all the
> > > > > > threads starting to pop up, and all are extremely important to me
> > > > > > and my job. Here are the issues I have been trying to monitor and
> > > > > > even reply to, adding my 2 cents:
> > > > >
> > > > > >   1. Locale definition - what is a locale?
> > > > > >   2. Locale identification - how many parameters are needed for a
> > > > > >      default minimal locale description?
> > > > > >   3. Language identification - how can we identify languages that
> > > > > >      are not included in the ISO 639 language group standard?
> > > > > >      (Current locale identifiers use the 2-letter code, not the
> > > > > >      3-letter code)
> > > > > >   4. Time zones - There is no standard, the tz database is as close
> > > > > >      as I can get to a standard and it is not officially tied to a
> > > > > >      locale. This only touches the need for a standard global time &
> > > > > >      date display.
> > > > > >   5. Currencies - Locales have only one currency tied to them, and
> > > > > >      European locales still all have their national currencies
> > > > > >      implied.
> > > > > >   6. Euro - The big problem is not the display, but how to use it.
> > > > > >      The EC has strict requirements on how to do currency
> > > > > >      triangulation with the euro. We discovered that rounding
> > > > > >      problems popped up everywhere, especially when using euro
> > > > > >      precision for calculation and had to display the value in a
> > > > > >      currency without decimals. It would be a dream to have this in
> > > > > >      ICU.
> > > > > >   7. Even when the euro becomes standard for a country, older
> > > > > >      transactions will still have to be working with old currencies
> > > > > >      and/or triangulation. We can't just convert them.
> > > > >
> > > > >      This only lists what has been mentioned in the last few days,
> > > > >      there is much more to be mentioned. I am trying to make PMs,
> > > > >      Devs, QA, etc globally aware here, but it is very hard to get
> > > > >      official requirements written up when there are no standards I
> > > > >      can show as reference.
> > > > >
> > > > > >      And my biggest proposal is to break the tie between language
> > > > > >      and country when selecting a locale.
> > > > >
> > > > >      Dave
> > > > >
> > > > > >      From: "Tex Texin" <texin@progress.com>
> > > > > >      Sent by: www-international-request@w3.org
> > > > > >      Date: 11/07/01 12:15 PM
> > > > > >      To: "Carl W. Brown" <cbrown@xnetinc.com>
> > > > > >      cc: www-international@w3.org
> > > > > >      Subject: Re: Euro mess (Was: valid locales ---> was bilingual websites
> > > > >
> > > > >      Carl,
> > > > >
> > > > > >      I hope the locales issue doesn't fan out into thousands of
> > > > > >      other threads, I won't be able to track them.
> > > > >
> > > > >      With respect to the Euro, there are several different issues.
> > > > >
> > > > > >      a) Of course the Euro is important and having proper support
> > > > > >      for the Euro is required.
> > > > >
> > > > > >      b) ISO 8859-15 does not seem to be getting much adoption, which
> > > > > >      is a good thing. Since 8859-15 and 8859-1 are incompatible, and
> > > > > >      if you adopt 8859-15 you likely still need to interchange text
> > > > > >      with users of 8859-1 (as they both support the same languages
> > > > > >      more or less), the world would be very difficult if there were
> > > > > >      a lot of adoption of -15.
> > > > >
> > > > >      Anyone considering -15, should instead be considering Unicode.
> > > > >
> > > > > >      And there are other alternatives if the only requirement is to
> > > > > >      support the Euro character and continue with a single byte
> > > > > >      codepage. Spelling out "Eur" or "Euro" is acceptable if there
> > > > > >      is space. And inventing mechanisms (e.g. escape sequences, or
> > > > > >      other specialized encodings) to print the Euro symbol is also
> > > > > >      possible.
> > > > >
> > > > > >      c) The issue relative to locales is that there is no standard
> > > > > >      handling for the Euro. So my understanding is some software
> > > > > >      will change the currency of their European locales from native
> > > > > >      monetary units to Euro on Jan. 1. This may be useful for some,
> > > > > >      but will likely break many applications as well.
> > > > >
> > > > > >      Others will create new locales specific to the Euro and/or
> > > > > >      specific to the old native currency. But which nomenclature you
> > > > > >      use when you are integrating software with different
> > > > > >      technologies and different locale naming conventions is a
> > > > > >      mystery to me.
> > > > >
> > > > > >      So now if I say fr_fr I do not know which currency I get and it
> > > > > >      may change from Dec 31 2001 to Jan 1 2002.
> > > > > >      If I use an application that integrates technologies with
> > > > > >      different rules for locales, it could get very messy.
> > > > >
> > > > >      I presume reading monetary data created before 2002 may also be
> > > > >      interpreted differently after 2002.
> > > > >
> > > > > >      And minor upgrades of software may in fact invoke these locale
> > > > > >      changes, so what should be a minor patch may in fact be a large
> > > > > >      change to monetary handling.
> > > > >
> > > > > >      d) I don't know why there isn't more of an outcry over this.
> > > > > >      Maybe there is a reason the problems I cite in (c) won't happen
> > > > > >      that I don't understand. (I am by no means an expert on the
> > > > > >      subject. Most of my own software has explicit regional settings
> > > > > >      and doesn't follow the locale model.) It will be interesting to
> > > > > >      know what people find if they change their system clock to 2002
> > > > > >      and do some application testing.
> > > > >
> > > > >      hth
> > > > >      tex
> > > > >
> > > > >      "Carl W. Brown" wrote:
> > > > >      >
> > > > >      > Tex,
> > > > >      >
> > > > > >      > I wonder why no one seems to care about the Euro?  Are sites
> > > > > >      > going to continue to use iso-8859-1?  How many browsers and
> > > > > >      > systems support iso-8859-15?
> > > > >      >
> > > > >      > Carl
> > > > >      >
> > > > >      > > -----Original Message-----
> > > > > >      > > From: www-international-request@w3.org
> > > > > >      > > [mailto:www-international-request@w3.org]On Behalf Of Tex Texin
> > > > > >      > > Sent: Tuesday, November 06, 2001 7:42 PM
> > > > > >      > > To: Martin Duerst
> > > > > >      > > Cc: David_Possin@i2.com; Karl Ove Hufthammer;
> > > > > >      > > www-international@w3.org
> > > > > >      > > Subject: Re: valid locales ---> was Re: bilingual websites
> > > > >      > >
> > > > >      > >
> > > > >      > > Martin,
> > > > >      > >
> > > > > >      > > You mean I can't just grouse and take potshots from the
> > > > > >      > > sidelines? ;-)
> > > > > >      > >
> > > > > >      > > Well, I have not seen an alternative proposed and I don't
> > > > > >      > > have one at the ready, but I don't mind taking a shot at
> > > > > >      > > improving the current situation. However, I am crunching now
> > > > > >      > > thru the end of the year, so I will give it a go in the new
> > > > > >      > > year.
> > > > > >      > > In the meantime, I would be happy to collect both
> > > > > >      > > suggestions for requirements and suggestions for solutions
> > > > > >      > > on this list or privately.
> > > > > >      > >
> > > > > >      > > The new year should be interesting, as the switch to the
> > > > > >      > > new Euro currency will demonstrate some of the chaos with
> > > > > >      > > locales.
> > > > >      > >
> > > > >      > > tex
> > > > >      > >
> > > > >      > > Martin Duerst wrote:
> > > > >      > > >
> > > > > >      > > > Tex - Could you write up (short), or point to, any
> > > > > >      > > > proposal for how to do better than currently?
> > > > >      > > >
> > > > >      > > > Regards,  Martin.
> > > > >      > > >
> > > > >      > > > At 14:57 01/10/31 -0500, Tex Texin wrote:
> > > > >      > > > >David,
> > > > >      > > > >
> > > > > >      > > > >FWIW, I thoroughly agree that locales as we currently
> > > > > >      > > > >define and implement them, do not work.
> > > > > >      > > > >As a naming convention it is inadequate, and when you
> > > > > >      > > > >select a name, you are not sure what behavior you will
> > > > > >      > > > >get.
> > > > > >      > > > >
> > > > > >      > > > >I have mentioned this before, and the response is always
> > > > > >      > > > >"Yes, it's broken, but it is the best we have at the
> > > > > >      > > > >moment.".
> > > > > >      > > > >
> > > > > >      > > > >It is rather unfortunate that we have this methodology
> > > > > >      > > > >therefore, and that it is accepted, since it won't be
> > > > > >      > > > >fixed as long as this response continues.
> > > > >      > > > >
> > > > >      > > > >tex
> > > > >      > > > >
> > > > >      > > > >--
> > > > >      > > >
> > > > >      >-------------------------------------------------------------
> > > > >      > > > >Tex Texin                    Director, International
> > > > >      Business
> > > > >      > > > >mailto:Texin@Progress.com    Tel: +1-781-280-4271
> > > > >      > > > >the Progress Company         Fax: +1-781-280-4655
> > > > >      > > >
> > > > >      >-------------------------------------------------------------
> > > > >      > >
> > > > >      > > --
> > > > >      >
> > -------------------------------------------------------------
> > > > >      > > Tex Texin                    Director, International
> Business
> > > > >      > > mailto:Texin@Progress.com    Tel: +1-781-280-4271
> > > > >      > > the Progress Company         Fax: +1-781-280-4655
> > > > >      >
> > -------------------------------------------------------------
> > > > >      > >
> > > > >
> > > > >      --
> > > > >      -------------------------------------------------------------
> > > > >      Tex Texin                    Director, International Business
> > > > >      mailto:Texin@Progress.com    Tel: +1-781-280-4271
> > > > >      the Progress Company         Fax: +1-781-280-4655
> > > > >      -------------------------------------------------------------
> > > >
> > > > --
> > > > -------------------------------------------------------------
> > > > Tex Texin                    Director, International Business
> > > > mailto:Texin@Progress.com    Tel: +1-781-280-4271
> > > > the Progress Company         Fax: +1-781-280-4655
> > > > -------------------------------------------------------------
> > > > "When choosing between two evils, I always like to try the
> > > > one I've never tried before."- -Mae West
> >

-- 
-------------------------------------------------------------
Tex Texin                    Director, International Business
mailto:Texin@Progress.com    Tel: +1-781-280-4271
the Progress Company         Fax: +1-781-280-4655
-------------------------------------------------------------
"When choosing between two evils, I always like to try the
one I've never tried before."- -Mae West
Received on Thursday, 8 November 2001 11:26:54 GMT