Re: Localization and Internationalization

On Mon, 4 Aug 1997 Iain.URQUHART@lux.dg13.cec.be wrote:

> There is a related problem in electronic commerce, which is about
> locale as well as geography. An electronic commerce server should
> have some way of knowing e.g what currency and/or weights & measurements
> system a given user prefers. (which may of course be independent of
> geographical location, although a geographical location might also
> be relevant). It's probably possible to do all this with cookies
> but something like an Accept-Locale header would be a bit less trouble.

Hello Iain,

Many thanks for your comment, which gives me a chance to say something
about languages and locales that I have wanted to say for quite some time.

As one of the authors of RFC 2070 (HTML i18n), I have several times been
contacted by people asking "You have language, but what about locale?"
and also by people mentioning that "In POSIX, LANG means locale, so please
don't use LANG to mean language in HTML; people might get confused".

My standard answer to the first comment was usually: Yes, we planned
to include it, and had it in our first draft, but we got under heavy
fire about this, and decided to remove it in order to save the rest.
(We actually had included it from the beginning as a strategy to give
people something to complain about, so they wouldn't complain too much
about the rest :-).

My standard answer to the second comment was usually: Too bad POSIX
chose LANG to mean locale; that's not our fault, and things wouldn't get
better if we called language LOCALE or something like that.

I was never really satisfied with my answers, but also never really
succeeded in understanding why people would need locale information.

Recently, I have come to the conclusion that everything might be
easier than we all thought :-). Let's look at a typical POSIX locale
tag, assigned to the LANG environment variable. It is something
like ja_JP or en_US (sorry if I got the syntax wrong). And let's
have a look at the typical RFC 1766 language tag, used in HTML LANG
attributes and in HTTP. It is something like ja-JP (or just ja) or
en-US.
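
To make the similarity concrete, here is a minimal sketch (in Python) of
how a POSIX-style locale name could be mapped to an RFC 1766-style
language tag. The function name posix_to_language_tag is made up for
this illustration, and the sketch assumes the usual
language[_TERRITORY][.codeset][@modifier] layout of locale names; it is
not taken from any specification.

```python
def posix_to_language_tag(locale_name):
    """Map a POSIX-style locale name (e.g. 'ja_JP.eucJP') to a
    language tag in the RFC 1766 style (e.g. 'ja-JP').

    Hypothetical helper for illustration only. Returns None for the
    'C' / 'POSIX' locales, which name no particular language.
    """
    # Drop an optional modifier ('@euro') and codeset ('.eucJP').
    base = locale_name.split("@", 1)[0].split(".", 1)[0]
    if base in ("", "C", "POSIX"):
        return None
    parts = base.split("_", 1)
    language = parts[0].lower()
    if len(parts) == 2 and parts[1]:
        # Language tags join subtags with a hyphen instead of an underscore.
        return language + "-" + parts[1].upper()
    return language
```

Under these assumptions, en_US maps to en-US and a bare ja stays ja; the
codeset part has no counterpart in the language tag and is dropped.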

Now what does that mean in each case? en_US as a locale tag means
that the language of (error) messages is (US) English, and number/
date/... formatting is done according to US conventions. en-US means
that the document is (requested to be, or actually is) in (US)
English. As part of that, if the document is really US English,
we should also be entitled to expect that the number/date/...
formatting conforms to the conventions used in US English.

If you have detected a certain similarity in the descriptions
above, that was my intention. As a conclusion, we
might say that we most probably don't need cookies for this,
nor an Accept-Locale header; we just have to make
Accept-Language work correctly. Maybe some note about this
in the HTML spec would be necessary?

There may be cases where Accept-Language doesn't exactly work
as expected. In particular, assume that I have requested en-US
but that the server only has a single generic English version,
which conforms more to British than to US conventions. Reading
through that document, I may not become aware of the fact that
British conventions are used, which might be a problem.
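
The fallback described here can be sketched as follows (in Python). This
is only an illustration under simple assumptions: choose_variant is a
hypothetical name, only one requested tag is considered (no quality
values or lists), and the fallback is a plain cut to the primary
language subtag.

```python
def choose_variant(requested, available):
    """Pick the document variant to serve for one requested language tag.

    Hypothetical helper for illustration only. Returns a pair
    (served_tag, exact_match); served_tag is None if nothing fits.
    """
    # Language tags are compared case-insensitively.
    by_lower = {tag.lower(): tag for tag in available}
    wanted = requested.lower()
    if wanted in by_lower:
        return by_lower[wanted], True
    # Fall back to the primary language only: 'en-US' -> 'en'.
    primary = wanted.split("-", 1)[0]
    if primary in by_lower:
        return by_lower[primary], False
    return None, False
```

With only a generic English version available, a request for en-US gets
("en", False). A server could then at least label the response with
Content-Language: en, so that a client has a chance to notice that the
document is not specifically US English.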

But this can't be remedied by an Accept-Locale header. If the
server can't serve an en-US version when receiving
	Accept-Language: en-US
it won't be able to do so even if it gets
	Accept-Language: en-US
	Accept-Locale: en-US
The only chance would be to handle this at the client, for which
we would need markup for dates, measures, numbers, etc., but which
wouldn't need Accept-Locale because it's purely a client-side matter.
But I doubt that this is necessary. (I would like to hear about
cases where this may become necessary or useful, if there are
some.)


Regards,	Martin.

Received on Monday, 4 August 1997 08:16:07 UTC