- From: Marcos Caceres <w3c@marcosc.com>
- Date: Wed, 31 Jul 2013 20:40:21 +0100
- To: Norbert Lindenberg <w3@norbertlindenberg.com>
- Cc: www-international@w3.org
On Tuesday, July 30, 2013 at 2:32 AM, Norbert Lindenberg wrote:

> Hi Marcos,
>
> Thank you for writing this proposal - I agree with you that obtaining the user's language preferences is a problem for many applications, and your proposal is a reasonable approach to solving the problem.

Happy to hear that :)

> On Jul 26, 2013, at 1:09 , Marcos Caceres <w3c@marcosc.com> wrote:
>
> > ## Abstract
> >
> > This document proposes an extension to HTML's `Navigator` interface to enable
> > dynamic localization of content. The idea is to expose to script the language
> > tags that represents the user's locale preferences (akin to the language tags
> > that are normally sent with HTTP's `Accept-Languages` header).
>
> The value of the languages attribute should directly reflect the contents of Accept-Language, minus the q values and plus canonicalization. "Akin" is too weak to describe this - Richard's wording is better.

Agreed.

> Also, the HTTP header name is "Accept-Language" - please fix throughout the proposal.

Fixed.

> > ## Extensions to Navigator interface
> >
> > Note: We've received feedback that TC39 is not in favor of API's using frozen
> > /read-only arrays. Alternatives to the above attribute are:
>
> Can you point me to that feedback? TC39 did approve the supportedLocalesOf methods in ECMA-402, which return semi-frozen arrays.

I can't provide you with a pointer, unfortunately: It was actually given to me in "real time" during the last TC39 meeting by Annevk. The motivation for getLanguages() was actually supportedLocalesOf() - btw, thanks again for the illuminating discussion about that when we last chatted about ECMA-402.

> > ## The `languages` attribute
> >
> > When getting, the languages attribute returns a read only platform Array
> > [WebIDL] of valid language tags in canonical form [BCP47].
> > The array is ordered
> > from most preferred to least preferred, where the first item is the language tag
> > that represents the user's most preferred language.
>
> The definition of "valid" in BCP 47 section 2.2.9 includes checking subtags against the IANA Language Subtag Registry.

Yikes. I filed a bug on HTML5 also!
https://www.w3.org/Bugs/Public/show_bug.cgi?id=22848

This is used in navigator.language also.

> ECMA-402 uses "structurally well-formed", which is the BCP 47 "valid" except for that check. The working group felt that keeping an up-to-date copy of the registry around is an unnecessary burden on implementations - what really matters to users is whether a language is supported or not. You might use "structurally well-formed" here as well, with a reference to ECMA-402 section 6.2.2.
>
> BCP 47 has several optional steps in its section on canonicalization (4.5). ECMA-402 section 6.2.3 says for most of them whether they should be applied or not - I recommend using that section as a normative reference.

Added citations as recommended above.

> > ## Privacy considerations
> >
> > As with navigator.language, there are privacy implications in exposing the
> > user's language preferences, as it can potentially be used to infer both the
> > physical location (to at least a country level) and potentially the user's
> > ethnic background (in those that choose have explicitly selected more than one
> > language preference). These values can also be exploited, together, with other
> > data to uniquely identify users.
>
> I think applications usually get much better location information from the Geolocation API or from the IP address, so I'd drop that part.

Although true - the user needs to explicitly opt into geolocation - and ip can be spoofed with things like Tor… but this bit of info could still catch people out. I would prefer to keep this in as privacy keeps coming up.
This will eventually have to go to the privacy interest group so would like to have something to show that we've at least given this some thought.

> > However, these values are already shared with servers with every HTTP request,
> > thus this API does not exacerbate the finger-printing situation.
>
> RFC 2616 section 15.1.4 actually describes how a user agent could protect the user's privacy by sensing whether the information would be useful and then prompting the user whether to send it. However, I don't know of any browser that implements this.

Yes, HTML5 says more or less the same thing:
"user agent implementors are encouraged to return "en" unless the user has explicitly indicated that the site in question is allowed access to the information."

No browser follows it, AFAIK :/

> > Regardless, implementors are encouraged to reflect the value of
> > navigator.language unless the user has explicitly indicated that the site in
> > question is allowed access to the information.
>
> I think the current spec of navigator.language makes the wrong recommendation in this respect, and have filed a bug:
> https://www.w3.org/Bugs/Public/show_bug.cgi?id=22835
> If a user agent doesn't want to provide language information, the attribute value should be undefined, not a value that looks like valid information but is wrong for most users.

good catch!

Thanks again, Norbert, for this review! This was super helpful! :D
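
For anyone following the thread, here is a rough sketch of Norbert's point that the `languages` value should reflect Accept-Language, minus the q values and plus canonicalization. It is only an illustration under stated assumptions, not text from the proposal: the function names `formatCase` and `languagesFromAcceptLanguage` are hypothetical, entries with `q=0` and the `*` wildcard are dropped by assumption, and "canonicalization" is reduced here to the case conventions described in BCP 47 (RFC 5646 section 2.1.1) rather than the full set of steps in ECMA-402 section 6.2.3.

```typescript
// Hypothetical helper (not from the proposal): applies only the BCP 47 case
// conventions. Subtags are lowercase, except that two-letter subtags become
// uppercase and four-letter subtags become titlecase when they are neither
// the first subtag nor located after a singleton such as "u" or "x".
function formatCase(tag: string): string {
  let afterSingleton = false;
  return tag
    .split("-")
    .map((subtag, index) => {
      const lower = subtag.toLowerCase();
      let out = lower;
      if (index > 0 && !afterSingleton) {
        if (lower.length === 2) {
          out = lower.toUpperCase();
        } else if (lower.length === 4) {
          out = lower.charAt(0).toUpperCase() + lower.slice(1);
        }
      }
      if (lower.length === 1) {
        afterSingleton = true; // everything after a singleton stays lowercase
      }
      return out;
    })
    .join("-");
}

// Hypothetical helper: turns an Accept-Language header value into an array
// ordered from most preferred to least preferred, without q values.
function languagesFromAcceptLanguage(header: string): string[] {
  const entries: { tag: string; q: number; index: number }[] = [];
  header.split(",").forEach((part, index) => {
    const pieces = part.trim().split(";");
    const tag = pieces[0].trim();
    if (tag === "" || tag === "*") return; // assumption: skip wildcard
    let q = 1; // a missing q value means q=1
    for (const param of pieces.slice(1)) {
      const m = /^\s*q\s*=\s*([0-9.]+)\s*$/i.exec(param);
      if (m) q = parseFloat(m[1]);
    }
    if (q > 0) entries.push({ tag, q, index }); // assumption: drop q=0
  });
  // Highest q first; header order breaks ties.
  entries.sort((a, b) => b.q - a.q || a.index - b.index);
  const result: string[] = [];
  for (const entry of entries) {
    const formatted = formatCase(entry.tag);
    if (result.indexOf(formatted) === -1) result.push(formatted);
  }
  return result;
}

console.log(languagesFromAcceptLanguage("en-us,fr;q=0.8,EN-GB;q=0.9, zh-hant-tw;q=0.5"));
// prints [ 'en-US', 'en-GB', 'fr', 'zh-Hant-TW' ]
```

Note that the sketch never consults the IANA Language Subtag Registry, which is in line with the "structurally well-formed" approach discussed above; it also does not reject malformed tags, which a real implementation would need to handle.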
Received on Wednesday, 31 July 2013 19:40:52 UTC