Re: Localization and Internationalization

Martin J. Duerst wrote:
> 
> On Mon, 4 Aug 1997 Iain.URQUHART@lux.dg13.cec.be wrote:
> 
> > There is a related problem in electronic commerce, which is about
> > locale as well as geography. An electronic commerce server should
> > have some way of knowing, e.g., what currency and/or weights &
> > measurements system a given user prefers (which may of course be
> > independent of geographical location, although a geographical
> > location might also be relevant). It's probably possible to do all
> > this with cookies, but something like an Accept-Locale header would
> > be a bit less trouble.
> 
> Hello Iain,
> 
> Many thanks for your comment, giving me a chance to say something
> about languages and locales I wanted to say for quite some time.
> 
> As one of the authors of RFC 2070 (HTML i18n), I have several times
> been contacted by people asking "You have language, but what about
> locale?" and also by people mentioning that "In POSIX, LANG means
> locale, so please don't use LANG to mean language in HTML; people
> might get confused".

Sorry, but while POSIX may call the LANG variable a locale variable, it
does in fact cover language, territory, and charset.  And while many
implementations of the POSIX locale will not allow all combinations of
language and territory, that doesn't mean that they're not
implementable.

In addition, many other renditions of a "locale" variable account for a
variant, which could be anything.  In one example, it was used for
state, province, or principality of the territory.  In another example
it was used for computer platform.  (X/Open uses a modifier; I'm not
sure whether that is part of POSIX.)
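
To make the components concrete, here is a minimal sketch of splitting a
locale name into its parts. It assumes the X/Open-style layout
language[_territory][.codeset][@modifier]; the names PosixLocale and
parse_posix_locale are hypothetical, and real implementations may accept
other spellings.

```python
import re
from typing import NamedTuple, Optional


class PosixLocale(NamedTuple):
    """Parts of a locale name (hypothetical container for this sketch)."""
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


# Assumed layout: language[_territory][.codeset][@modifier]
_LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9_-]+))?"
    r"(?:@(?P<modifier>[A-Za-z0-9_=-]+))?$"
)


def parse_posix_locale(name: str) -> PosixLocale:
    """Split a locale name into language, territory, codeset, modifier."""
    match = _LOCALE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized locale name: {name!r}")
    # Optional groups that did not match come back as None.
    return PosixLocale(**match.groupdict())


if __name__ == "__main__":
    print(parse_posix_locale("ja_JP.eucJP"))
    print(parse_posix_locale("en"))
```

The shortest accepted form is the language alone, and anything after "@"
is kept as an opaque modifier, since its meaning varies between systems.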

Java uses the class Locale.  Windows uses LANG and LOCALE.  Oracle uses
NLS_LANG.  Let's face it, if people weren't already confused, then the
HTML parameter won't confuse them any more.

> 
> My standard answer to the second comment was usually: Too bad POSIX
> chose LANG to mean locale, that's not our fault, and wouldn't get
> better if we called language LOCALE or something like that.

True, it would not be better.  Language isn't locale and locale isn't
language.  I don't think there's an easy term to pick, so long as
territories have more than one language as the official (or commonly
used) one.  The LANG variable's shortest form is language-only; in that
sense the term is descriptive of its own syntax.

> 
> I was never really satisfied with my answers, but also never really
> succeeded in understanding why people would need locale information.
> 
> Recently, I have come to the conclusion that everything might be
> easier than we all thought :-). Let's look at a typical POSIX locale
> tag, assigned to the LANG environment variable. It is something
> like ja_JP or en_US (sorry if I got the syntax wrong). And let's
> have a look at the typical RFC 1766 language tag, used in HTML LANG
> attributes and in HTTP. It is something like ja-JP (or just ja) or
> en-US.
> 
> Now what does that mean in each case. en_US as a locale tag means
> that the language of (error) messages is (US) English, and number/
> date/... formatting is done according to US conventions. en-US means
> that the document is (requested to be, or actually is) in (US)
> English. As part of that, if the document is really US English,
> we should also be entitled to expect that the number/date/...
> formatting conforms to the conventions used in US English.
> 
> If you have detected a certain similarity in the descriptions
> above, that would have been my intention. As a conclusion, we
> might say that we most probably don't need cookies for this,
> and neither an Accept-Locale header; we just have to make
> Accept-Language work correctly. Maybe some note about this
> in the HTML spec would be necessary?

I think if HTML follows the POSIX syntax it would make life a lot easier
for us implementers, regardless of what the parameter name is.
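
For the common language-plus-country case the two syntaxes differ only in
the separator, as this rough sketch shows. It assumes the POSIX-style name
is language[_territory][.codeset][@modifier] and that the RFC 1766 tag is a
primary tag with an optional two-letter country subtag; both function names
are hypothetical, and other subtags are simply dropped.

```python
def posix_to_language_tag(locale_name: str) -> str:
    """Turn e.g. 'ja_JP.eucJP' into 'ja-JP' (codeset and modifier dropped)."""
    base = locale_name.split("@", 1)[0].split(".", 1)[0]
    language, _, territory = base.partition("_")
    if not (language.isascii() and language.isalpha()):
        raise ValueError(f"no language in locale name: {locale_name!r}")
    if language in ("C", "POSIX"):
        # The C/POSIX locale names no natural language.
        raise ValueError(f"no language in locale name: {locale_name!r}")
    if territory:
        return f"{language.lower()}-{territory.upper()}"
    return language.lower()


def language_tag_to_posix(tag: str) -> str:
    """Turn e.g. 'en-US' into 'en_US'; non-country subtags are ignored."""
    primary, _, rest = tag.partition("-")
    if not (primary.isascii() and primary.isalpha()):
        raise ValueError(f"bad language tag: {tag!r}")
    if primary.lower() in ("i", "x"):
        # IANA-registered and private-use tags have no locale equivalent here.
        raise ValueError(f"cannot map tag to a locale name: {tag!r}")
    first_subtag = rest.split("-", 1)[0]
    if len(first_subtag) == 2 and first_subtag.isalpha():
        return f"{primary.lower()}_{first_subtag.upper()}"
    return primary.lower()


if __name__ == "__main__":
    print(posix_to_language_tag("ja_JP.eucJP"))
    print(language_tag_to_posix("en-us"))
```

Note that the mapping loses information in one direction: the codeset and
modifier of the locale name have no place in the language tag.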

> 
> There may be cases where Accept-Language doesn't exactly work
> as expected. In particular, assume that I have requested en-US
> but that the server only has a single generic English version,
> which conforms more to British than to US conventions. Reading
> through that document, I may not become aware of the fact that
> British conventions are used, which might be a problem.
> 
> But this can't be remedied by an Accept-Locale header. If the
> server can't serve an en-US version when receiving
>         Accept-Language: en-US
> it won't be able to do so even if it gets
>         Accept-Language: en-US
>         Accept-Locale: en-US
> The only chance would be to handle this at the client, for which
> we would need markup for dates, measures, numbers,..., but which
> wouldn't need Accept-Locale because it's a client-only business.
> But I doubt that this is necessary. (I would like to hear about
> cases where this may become necessary or useful, if there are
> some.)

The entire locale is helpful at the server end in the world of
electronic commerce, as Iain pointed out, though I'm not sure that
products would leave the display decision up to an Accept-Language
header.  In any case, the data that appears on our products' Web pages
comes from a database at the server end.  If the server doesn't know what
language/locale's data to retrieve, this creates a problem.
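
As an illustration only, a server could derive the key for that lookup
from Accept-Language along these lines. The sketch assumes HTTP/1.1-style
q-values and a hypothetical set of catalog names standing in for whatever
keys the database uses; falling back from "en-US" to "en" by trimming
subtags is a design choice of the sketch, not something either header
mandates.

```python
from typing import Iterable, List


def parse_accept_language(header: str) -> List[str]:
    """Return the requested language tags, highest q-value first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip()
        if not tag:
            continue
        quality = 1.0  # a tag without a q parameter counts as q=1
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q as "not acceptable"
        if quality > 0:
            # Sort by descending quality, then by order of appearance.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def choose_catalog(header: str, available: Iterable[str], default: str) -> str:
    """Pick the catalog name to query, trimming subtags until one matches."""
    by_lowercase = {name.lower(): name for name in available}
    for tag in parse_accept_language(header):
        candidate = tag.lower()
        while candidate:
            if candidate in by_lowercase:
                return by_lowercase[candidate]
            # 'en-us' -> 'en' -> '' (the loop stops on the empty string)
            candidate = candidate.rpartition("-")[0]
    return default


if __name__ == "__main__":
    print(choose_catalog("en-US, fr;q=0.8", ["en", "fr"], "en"))
```

With only a generic "en" catalog available, a request for en-US ends up
with the generic English data, which is exactly the case Martin describes:
the server has to decide what to do when the territory cannot be honored.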

As an implementer, I find the POSIX syntax useful and straightforward.

> 
> Regards,        Martin.

Received on Monday, 4 August 1997 14:25:18 UTC