Re: Proposal: Locale Preferences API from Norbert Lindenberg on 2013-07-30 (www-international@w3.org from July to September 2013)

From: Norbert Lindenberg <w3@norbertlindenberg.com>
Date: Mon, 29 Jul 2013 18:32:31 -0700
To: Marcos Caceres <w3c@marcosc.com>
Cc: Norbert Lindenberg <w3@norbertlindenberg.com>, www-international@w3.org
Message-Id: <034B9749-8D43-450C-A81B-7470BCED28D1@norbertlindenberg.com>

Hi Marcos,

Thank you for writing this proposal - I agree with you that obtaining the user's language preferences is a problem for many applications, and your proposal is a reasonable approach to solving the problem.

In addition to the language vs. locale discussion, I have a few more comments below.

Thanks,
Norbert

On Jul 26, 2013, at 1:09, Marcos Caceres <w3c@marcosc.com> wrote:

> ## Abstract
> 
> This document proposes an extension to HTML's `Navigator` interface to enable
> dynamic localization of content. The idea is to expose to script the language
> tags that represents the user's locale preferences (akin to the language tags
> that are normally sent with HTTP's `Accept-Languages` header).

The value of the languages attribute should directly reflect the contents of Accept-Language, minus the q values and plus canonicalization. "Akin" is too weak to describe this - Richard's wording is better.
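
To illustrate "minus the q values and plus canonicalization", here is a rough sketch that derives such a list from an Accept-Language header value. The function name languagesFromAcceptLanguage is hypothetical and not part of the proposal. A user agent would presumably build both the header and the attribute from the same preference list rather than parse its own header, and the sketch assumes an engine that provides Intl.getCanonicalLocales.

```javascript
// Hypothetical helper (not part of the proposal): turn an Accept-Language
// header value into an ordered list of canonicalized language tags.
function languagesFromAcceptLanguage(header) {
  const entries = header
    .split(",")
    .map((item, index) => {
      const [range, ...params] = item.trim().split(";");
      let q = 1; // HTTP default when no q parameter is given
      for (const param of params) {
        const match = /^\s*q\s*=\s*([0-9.]+)\s*$/i.exec(param);
        if (match) q = parseFloat(match[1]);
      }
      return { tag: range.trim(), q, index };
    })
    // "*" is a wildcard range, not a language tag; q=0 means "not acceptable".
    .filter((e) => e.tag !== "" && e.tag !== "*" && e.q > 0);

  // Highest q first; keep header order among equal q values.
  entries.sort((a, b) => b.q - a.q || a.index - b.index);

  // Canonicalizes each tag and drops duplicates; throws a RangeError if a
  // tag is not structurally well-formed.
  return Intl.getCanonicalLocales(entries.map((e) => e.tag));
}

console.log(languagesFromAcceptLanguage("da, en-gb;q=0.8, en;q=0.7"));
// → [ 'da', 'en-GB', 'en' ]
```

The q values only determine the order and then disappear, which is the sense in which the attribute would "directly reflect" the header.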

Also, the HTTP header name is "Accept-Language" - please fix throughout the proposal.

> ## Extensions to Navigator interface

> Note: We've received feedback that TC39 is not in favor of API's using frozen
> /read-only arrays. Alternatives to the above attribute are:

Can you point me to that feedback? TC39 did approve the supportedLocalesOf methods in ECMA-402, which return semi-frozen arrays.

> ## The `languages` attribute
> 
> When getting, the languages attribute returns a read only platform Array
> [WebIDL] of valid language tags in canonical form [BCP47]. The array is ordered
> from most preferred to least preferred, where the first item is the language tag
> that represents the user's most preferred language.

The definition of "valid" in BCP 47 section 2.2.9 includes checking subtags against the IANA Language Subtag Registry. ECMA-402 uses "structurally well-formed", which is the BCP 47 "valid" except for that check. The working group felt that keeping an up-to-date copy of the registry around is an unnecessary burden on implementations - what really matters to users is whether a language is supported or not. You might use "structurally well-formed" here as well, with a reference to ECMA-402 section 6.2.2.

BCP 47 has several optional steps in its section on canonicalization (4.5). ECMA-402 section 6.2.3 says for most of them whether they should be applied or not - I recommend using that section as a normative reference.
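
As a minimal sketch of the difference between a structural check and a registry check, the snippet below relies on Intl.getCanonicalLocales, which is a later addition to ECMA-402 and not in the edition cited above, so its availability is an assumption. The wrapper name canonicalizeIfWellFormed is hypothetical. Canonicalization in current engines may also differ in details from what section 6.2.3 specifies.

```javascript
// Hypothetical wrapper: return the canonical form of a language tag, or
// undefined if the tag is not structurally well-formed.
function canonicalizeIfWellFormed(tag) {
  try {
    return Intl.getCanonicalLocales(tag)[0];
  } catch (e) {
    // Malformed tags are reported as a RangeError; rethrow anything else.
    if (e instanceof RangeError) return undefined;
    throw e;
  }
}

console.log(canonicalizeIfWellFormed("en-us"));      // → en-US
console.log(canonicalizeIfWellFormed("zh-hans-cn")); // → zh-Hans-CN
console.log(canonicalizeIfWellFormed("en_US"));      // → undefined (underscore is not allowed)

// "xx" and "YY" are not in the IANA registry, but the tag has the right
// shape, so it passes the structural check without any registry lookup.
console.log(canonicalizeIfWellFormed("xx-yy") !== undefined); // → true
```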

> ## Privacy considerations
> 
> As with navigator.language, there are privacy implications in exposing the
> user's language preferences, as it can potentially be used to infer both the
> physical location (to at least a country level) and potentially the user's
> ethnic background (in those that choose have explicitly selected more than one
> language preference). These values can also be exploited, together, with other
> data to uniquely identify users.

I think applications usually get much better location information from the Geolocation API or from the IP address, so I'd drop the part about inferring the physical location.

> However, these values are already shared with servers with every HTTP request,
> thus this API does not exacerbate the finger-printing situation.

RFC 2616 section 15.1.4 actually describes how a user agent could protect the user's privacy by sensing whether the information would be useful and then prompting the user whether to send it. However, I don't know of any browser that implements this.

> Regardless, implementors are encouraged to reflect the value of
> navigator.language unless the user has explicitly indicated that the site in
> question is allowed access to the information.

I think the current spec of navigator.language makes the wrong recommendation in this respect, and have filed a bug:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=22835
If a user agent doesn't want to provide language information, the attribute value should be undefined, not a value that looks like valid information but is wrong for most users.
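
For illustration, an application could then detect the missing value and fall back to its own default instead of trusting a made-up one. This is only a sketch: the function name preferredLanguages and the fallback parameter are hypothetical, and the object passed in stands in for navigator.

```javascript
// Hypothetical helper: pick the user's language list from a navigator-like
// object, treating undefined or empty values as "no information provided".
function preferredLanguages(nav, fallback = []) {
  if (Array.isArray(nav.languages) && nav.languages.length > 0) {
    return [...nav.languages]; // copy, since the original may be read-only
  }
  if (typeof nav.language === "string" && nav.language !== "") {
    return [nav.language];
  }
  return [...fallback]; // the user agent withheld the information
}

console.log(preferredLanguages({ languages: ["de-CH", "fr"], language: "de-CH" }));
// → [ 'de-CH', 'fr' ]
console.log(preferredLanguages({ languages: undefined, language: undefined }, ["en"]));
// → [ 'en' ]
```

In a page, the call would be preferredLanguages(navigator, ["en"]), with the fallback chosen by the application.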

Received on Tuesday, 30 July 2013 01:32:59 UTC