Re: FW: Determining Locale in a Browser for Web 2.0 Applications

Thank you Addison for the interesting topic and Najib for the excellent 
example.

I think the advances in i18n technologies for locale support, such as
BCP 47, CLDR, WS-I18N and LTLI, are really great. At the same time, it
is unfortunate that we still often see nonconforming behavior on the Web
in cases where these specifications apply. It is also unclear to me why
Google uses location to choose the default UI language. But that is a
drop in the bucket; there are many more problems caused by inadequate
locale negotiation.

I suppose this situation persists because supporting implementations
are not widely available. For example, Java is two generations behind
BCP 47 and is trying to catch up in the locale enhancement project
<http://sites.google.com/site/openjdklocale/Home>. I wish this project
covered more requirements, especially the matching part of BCP 47: RFC
4646 and CLDR are covered, but little of RFC 4647. My Firefox doesn't
allow me to enter a BCP 47 tag, while IE lets me enter one as a
user-defined tag.

One of the most painful i18n problems in Java Web applications is
neglected locale negotiation. A developer may think that taking the
locale from the request object or the JSF view locale and passing it to
ResourceBundle, which implements its own locale determination
algorithm, is all it takes. This model sometimes works fine when serving
a small number of languages or locales, but it has a fundamental flaw:
the resource bundle instances have no way to tell which locale to use,
because they don't know the user's language preferences. For example,
for an application serving Najib, the appropriate algorithm looks for
an en resource first, fr next, then ar, as defined in RFC 4647.
However, this is not implemented anywhere in Java. JSF view locale
determination comes close, but its model is often ineffective because a
single negotiated locale cannot properly serve as both the translation
language and the locale for other locale-sensitive operations such as
date/time and number formatting. For example, a conventional Moroccan
date format may be preferred even if the UI language is English, French
or another foreign language.
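
To make the lookup order concrete, here is a minimal sketch in Python
(not Java, and not taken from any existing library) of the RFC 4647
"lookup" scheme applied to a priority list such as Najib's. The function
names, the q-values in the sample header and the list of available
translations are my own assumptions for illustration; a real
implementation would need more careful header validation.

```python
def parse_accept_language(header):
    """Return the language ranges of an Accept-Language value,
    ordered by descending q-value (ties keep their header order)."""
    ranges = []
    for item in header.split(","):
        parts = item.strip().split(";")
        tag = parts[0].strip()
        if not tag:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        if q > 0:
            ranges.append((tag, q))
    ranges.sort(key=lambda r: -r[1])  # Python's sort is stable
    return [tag for tag, _ in ranges]


def lookup(priority_list, available, default=None):
    """RFC 4647 lookup: try each range in priority order, truncating it
    from the end until it equals one of the available tags."""
    by_lower = {tag.lower(): tag for tag in available}
    for language_range in priority_list:
        if language_range == "*":
            continue  # assumption: this sketch simply skips the wildcard
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()
            # A single-character subtag (e.g. "x") is removed together
            # with the subtag that followed it.
            while subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


# Hypothetical case: preferences en, fr, ar; the site has only ar and fr.
preferences = parse_accept_language("en,fr;q=0.8,ar;q=0.6")
print(lookup(preferences, ["ar", "fr"], default="ar"))  # prints "fr"
```

The point of the sketch is that the whole priority list takes part in
the match, so French is chosen before Arabic when English is missing.
The same function could be called a second time with a separate
preference (say, a formatting locale of ar-MA) so that the translation
language and the formatting locale are resolved independently.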

And in the growing number of scenarios where web content is generated
by a remote service, there is a big question mark over how the service
determines the right locale for producing language- or locale-sensitive
content. Proprietary solutions are used today, but WS-I18N is defining
a standard locale determination mechanism to resolve these problems.
:-)  (yet to be defined & implemented)

Regards,
-Dan

Najib Tounsi wrote:
> Phillips, Addison wrote:
>> For those not on the unicode@ mailing list, you may find this note to be of interest.
>>
>> And yes the beach was very nice.
>>
>> Addison
>>
>> Addison Phillips
>> Globalization Architect -- Lab126
>>
>> Internationalization is not a feature.
>> It is an architecture.
>>
>>
>> -----Original Message-----
>> From: Phillips, Addison 
>> Sent: Monday, April 20, 2009 9:33 PM
>> To: 'Peter Krefting'; Unicode Mailing List
>> Cc: cldr-users@unicode.org
>> Subject: RE: Determining Locale in a Browser for Web 2.0 Applications
>>
>> A few notes on this thread. Note that these are *personal* comments, notwithstanding my .sig.
>>
>> "Language preference" isn't quite the same thing as "locale", 
>
> Yes.
> For example (same for other parts of the world, I imagine (historical 
> reasons?)), I am in Morocco, my "locale" (native/official language) is 
> supposed to be Arabic, but I browse the web in English and French.
> My "Language Preference" is set to the En, then Fr, then Ar.  So, 
> these values are used as "A-L" by my browser.
> When I know there is an Arabic version of a site, I go to it 
> explicitly (e.g. News, e-gov infos...).
>
> I don't always agree with sites assuming "I am in Morocco, so I 
> want Arabic Web pages". This is not straightforward.
> When I type google.com, I am redirected to google.co.ma. Well, I can 
> set my "Interface Language" to English (Arabic is default).
>
> NB:
> - with http://www.google.co.ma/ I get also
> "Google.co.ma offered in: Français 
> <http://www.google.co.ma/setprefs?sig=0_vmu-277buNDn_k70ZpuPvfVHVC4=&hl=fr> 
> العربية 
> <http://www.google.co.ma/setprefs?sig=0_vmu-277buNDn_k70ZpuPvfVHVC4=&hl=ar> 
> ", *both* Arabic and French, when interface language is English
> "Le domaine Google.co.ma est disponible en : العربية 
> <http://www.google.co.ma/setprefs?sig=0_KIl2jBrZrxzklJw4eI07JXiGFcA=&hl=ar> 
> ", when interface language is French
> When interface language is Arabic, I am offered (only?) the French 
> interface language.
> Well, Morocco is also a Francophone country.
>
> - About google-translation page. My usual translations are from 
> English to Arabic. I wonder why the  default selection is set to 
> "spanish to english"?
>
> Regards,
>
> Najib.
>> although they are closely related. Locale is a programming concept useful in many ways, but mostly to do with APIs.
>>   
>> The Accept-Language header was intended to do language negotiation, but since implementation of it is inconsistent and since managing it is quite arcane, language negotiation via Accept-Language (A-L) alone is usually not fully satisfying. Sites that rely solely on A-L eventually tend to migrate to some form of personalization scheme (such as cookies) to track the actual user preference---even Google does this today. [Implementers should read and understand RFC 4647 and the "lookup" algorithm to avoid spotty performance such as that cited by Peter Krefting below. RFC 2616 is just too vague to make an effective algorithm.]
>>
>> "Navigator.language" is sometimes a synonym for A-L, however, knowing the language isn't all that useful in the browser, since JavaScript-the-language has no locale facet and locale-based formatting is not under programmatic control. Typically the JavaScript locale matches the system or user default locale where the browser is running, so locale-specific formatting ends up being a server-side task (or it risks being inconsistent with the server-side content). In XmlHttpRequests (for "REST" style or so-called "Web 2.0" interactions) one often sees the locale being transmitted using the A-L header, since programmers assume that's what the header is for, with the value poked into the header being stored as a session variable of some kind.
>>
>> Geolocation is not as bad as Peter Krefting makes it sound below. I know my general reaction to it has been negative---just because I'm in the Frankfurt airport doesn't mean I want German content, to pay in Euros, etc. However, geolocation can be exceedingly useful for finding "locality", local resources, or when all else fails (uncookied A-L-free browser pointed at a generic URI).
>>
>> Most sites that do language/locale negotiation end up providing some form of user interaction for managing the language following the negotiation process (hence the prevalence of cookie-ing or URL-rewriting) so that users can get what they want. 
>> With multiple ways of getting it wrong, you have to allow the user to adapt.
>>   
>> Overall, the whole thing is a bit of a patchwork mess. Each Web technology seems to choose a different approach, none of which are wholly wrong. And, indeed, there is work to try and address this at W3C. Specifically, the I18N WG is trying to complete work on two documents: "LTLI" (Language Tags and Locale Identifiers) and WS-I18N, promoting other standards (CLDR! IETF BCP 47!) and trying to lobby other W3C working groups (for example, WebApps, sometimes with some success) to provide for consistent approaches.
>>
>> Do note that the latest BCP 47 (RFC4646bis) is in last call at the IETF right now. One thing browser vendors could do is implement it, since that would address some gaps in language coverage as well as the problem of script identification in locale identifiers.
>>
>> And anyone interested should really consider participating in the W3C Internationalization WG. We could use the help.
>>
>> Regards,
>>
>> Addison
>>
>> Addison Phillips
>> Globalization Architect -- Lab126
>> Chair -- W3C Internationalization WG
>>
>> Internationalization is not a feature.
>> It is an architecture.
>>
>>   
>>> -----Original Message-----
>>> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]
>>> On Behalf Of Peter Krefting
>>> Sent: Monday, April 20, 2009 1:57 AM
>>> To: Unicode Mailing List
>>> Cc: cldr-users@unicode.org
>>> Subject: Re: Determining Locale in a Browser for Web 2.0 Applications
>>>
>>> Hi!
>>>
>>>     
>>>> Will HTTP Accept-Language ever give you any more information than
>>>> Javascript's Navigator.language provides?
>>>>       
>>> It may, or may not. It might even be the same, depending on what browser
>>> you are using (I'm no JavaScript expert, so I cannot tell). Mine is
>>> currently set to "sv-SE,sv;q=0.9,nb;q=0.8,da;q=0.7,en;q=0.6" (I used to
>>> include "de" with a really low score as well, but some buggy servers then
>>> always sent me German instead of English, so I stopped doing that).
>>>
>>> Whether or not you will have a country code or just a language code
>>> depends on the browser, its user and the system it is on.
>>>
>>>     
>>>> So I am just wondering if anyone has been thinking about exposing more
>>>> specific locale information inside of web browsers?  For example, a
>>>> browser could just read the OS's locale information and expose that in a
>>>> relevant object accessible via Javascript.
>>>>       
>>> But then you run into the problem of trying to figure out which setting
>>> is authoritative. I am currently running an English OS, but it was
>>> initially installed with a Norwegian locale (I live and work in Norway)
>>> and my user is set up for Swedish. Depending on what data software looks
>>> at, programs prompt me in either English, Norwegian or Swedish, seemingly
>>> randomly. A bit annoying.
>>>
>>> For web applications, the Accept-Language is usually the one that is most
>>> correct, as people tend to set it up to get Google to work properly,
>>> except if it says just "en" or "en-US", in which case the user didn't
>>> care to change the default.
>>>
>>> Using Geolocation is usually bad, but it depends on what type of
>>> information you provide. I'm quite happy to get Swedish text (from
>>> Accept-Language) with prices in Norwegian currency (from geolocation)
>>> when I browse flights at my friendly local airline operator. But I'm
>>> equally unhappy with sites assuming that I want Norwegian *text* just
>>> because I'm in Norway (and with countries having more than one official
>>> language, that becomes even more fun).
>>>
>>> One thing you especially should *not* look at when deciding language is
>>> either the operating system or browser UI language. I know a lot of
>>> people using either or both in English, but wanting another language for
>>> regular text (or being forced to use English because the software isn't
>>> localised for their language).
>>>
>>> --
>>> \\// Peter - http://www.softwolves.pp.se/
>>>     
>>
>>   
>
> -- 
> Najib TOUNSI (mailto:tounsi @ w3.org)
> W3C Office in Morocco (http://www.w3c.org.ma/)
> École Mohammadia d'Ingénieurs, BP 765 Agdal-RABAT Maroc (Morocco)
> Phone : +212 (0) 537 68 71 50 (P1711)  Fax : +212 (0) 537 77 88 53
> Mobile: +212 (0) 661 22 00 30 

Received on Saturday, 25 April 2009 00:32:02 UTC