
Preferences for the I18N model

From: Robin Berjon <robin@berjon.com>
Date: Wed, 29 Apr 2009 17:23:48 +0200
Message-Id: <95FD02A3-23E6-487A-82D7-0EFBE8690DC0@berjon.com>
To: public-webapps WG <public-webapps@w3.org>

Here are my personal preferences for the I18N model[0].

- Folder-based localization: A2 (editor's choice)

I don't feel strongly about this choice. I prefer it because it is  
more flexible and more powerful, but I would understand if  
implementers felt it did a bit more than necessary (I believe it is  
possible if needed to start with A1 and transition to A2 in future).

Caveat: within the A2 option, I do not understand whether the following:


causes x-dahut and x-nessie to "inherit" the content from x. Since
"x-" is used for private-use tags, I don't think that they should
(nor should "i-" tags, tags with a primary subtag in the "qaa" to
"qtz" range, or tags whose primary subtag is four letters long or
longer than eight letters). The BCP47 lookup algorithm handles this
(or at least part of it), but I'm not sure that it's enough for our
purposes; we should ask.
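A minimal sketch of the exclusion rule suggested above. It looks only at the primary subtag. The function name `may_inherit` is hypothetical, and whether such a rule belongs in the spec is exactly the open question.

```python
def may_inherit(tag: str) -> bool:
    """Hypothetical check: may content for `tag` fall back to a shorter tag?

    Returns False when the primary (first) subtag is one of the cases
    listed above; otherwise True.
    """
    primary = tag.lower().split("-", 1)[0]
    if primary in ("x", "i"):
        # private-use ("x-") and grandfathered ("i-") tags
        return False
    if len(primary) == 3 and "qaa" <= primary <= "qtz":
        # private-use language subtags qaa..qtz
        return False
    if len(primary) == 4 or len(primary) > 8:
        # four-letter primary subtags are reserved; over eight is ill-formed
        return False
    return True


for t in ("x-dahut", "x-nessie", "i-klingon", "qab-foo", "en-gb"):
    print(t, may_inherit(t))
```

RFC 4647 lookup already covers part of this. While truncating, it also drops a trailing single-letter subtag, so "x-dahut" and "i-klingon" never fall back to "x" or "i". The "qaa" to "qtz" and length rules are not covered that way.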

- The user agent's locale: B1 (editor's choice)

I *strongly* support this option. It may be slightly more complex, but
it matches how language negotiation works over HTTP, where
Accept-Language lets users list several languages in order of
preference. Applications that act as if users were only able to use
one language are always wrong and badly broken (I'm looking at you,
Apple spell checker, Nokia T9, etc.).
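B1 means the UA exposes an ordered list of languages rather than a single one, much like the HTTP Accept-Language header. Below is a sketch, assuming the preferences are available in that header's syntax. `parse_priority_list` is a hypothetical helper. It is deliberately lenient: malformed q-values are treated as q=0.

```python
def parse_priority_list(header: str) -> list:
    """Turn an Accept-Language-style string into an ordered list of ranges.

    Entries are sorted by descending q-value; ties keep their original
    order because Python's sort is stable. Entries with q=0 are dropped.
    """
    weighted = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        parts = [p.strip() for p in item.split(";")]
        tag, q = parts[0].lower(), 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # lenient: treat a malformed weight as "not acceptable"
        if q > 0:
            weighted.append((tag, q))
    weighted.sort(key=lambda pair: -pair[1])
    return [tag for tag, _ in weighted]


print(parse_priority_list("en-GB, fr;q=0.8, de;q=0"))  # ['en-gb', 'fr']
```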

- Deriving the widget's locale: C1 (editor's choice)

The way in which the BCP47 lookup algorithm is expected to work could
be clarified slightly. For example, if I'm looking for "en-us" and in
document (or directory) order I have "en" then "en-us", "en-us" will
match, because the full range is tried against the entire list first
("en" would only match on a second pass, if "en-us" were not defined).

Note that the BCP47 lookup algorithm specifies that, if nothing
matches, a default value chosen by the application is returned. We
should make it clear that in our case the default value returned for
the locale when nothing matches should be "", as that matches xml:lang
(and sort of makes sense in relation to the locale directories). I
point this out because one of the options is returning "i-default",
which could confusingly lead some implementations to select a
locales/i-default/ directory.
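Both points can be illustrated with a sketch of RFC 4647 Lookup. It takes the user's priority list and returns "" when nothing matches. Assumptions: tags are compared case-insensitively, the "*" wildcard range is not handled, and the function name and signature are illustrative only.

```python
from typing import Iterable


def lookup(priority_list: Iterable[str], available: Iterable[str],
           default: str = "") -> str:
    """Sketch of RFC 4647 Lookup over a language priority list.

    Each range is tried in turn. A range is tested in full against every
    available tag before it is truncated, so the order in which locales
    appear in the package does not matter.
    """
    tags = {t.lower() for t in available}
    for language_range in priority_list:
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in tags:
                return candidate
            subtags.pop()
            # drop a trailing single-letter subtag too (e.g. the "x" in en-x-foo)
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    # Nothing matched: "" (as with xml:lang="") selects the widget root,
    # rather than something like locales/i-default/.
    return default


print(lookup(["en-us"], ["en", "en-us"]))         # en-us
print(repr(lookup(["de"], ["en", "i-default"])))  # ''
```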

- Possible representations of the widget's locale: D2 (editor's choice)

I could live with D3 too, but it seems too complicated for real-world  
use. D1 makes it more painful to produce localised content.

- Dereferencing URIs in Configuration Documents: something else

I don't think that either of the provided "E" options is good. I think  
that the base URI for a widget resource should not include locale  



and three locales en, fr, es, the start content resource would
*always* have a document URL of "widget://UUID/index.html". If the
locale changes (even at runtime) the document URL does not. What that
content is *resolved* to inside the archive changes with the locale,
but not the URL.

The justification behind this approach is that: a) locales should be
transparent, and b) there is no requirement to have the widget URI map
*directly* onto the widget's structure. In fact, it is probably best
if, inside locales/en/index.html, it is not possible to write
"<a href='../fr/index.html'>Frog version</a>".
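To illustrate: if the content of locales/en/index.html is served as widget://UUID/index.html, a relative link inside it is resolved against that locale-free path. This means ".." cannot climb into a sibling locale. The sketch below works on the path component only (no query or fragment). It uses Python's `posixpath.normpath` as a stand-in for RFC 3986 dot-segment removal; unlike RFC 3986, `normpath` also collapses repeated slashes. `resolve_reference` is a hypothetical name.

```python
import posixpath


def resolve_reference(document_path: str, href: str) -> str:
    """Resolve a relative href against a path in the widget's URI space.

    document_path is the locale-free path the document is served under
    (e.g. "/index.html"), not its location inside the archive.
    """
    base_dir = posixpath.dirname(document_path)
    # At the root, normpath discards leading ".." segments, as RFC 3986 does.
    return posixpath.normpath(posixpath.join(base_dir, href))


# Stored as locales/en/index.html, but its URL path is /index.html:
print(resolve_reference("/index.html", "../fr/index.html"))  # /fr/index.html
```

The link therefore asks for /fr/index.html in the widget's own URI space, not for locales/fr/index.html in the archive.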

We can put that in the widget URI document, or somewhere else (I'm not  
sure where it fits).

- Finding missing localized content: something else

I think that the "F" options are wrong for reasons similar to those  
expressed in the previous section. I think that for all intents and  
purposes, the user agent should behave as if content from the most  
specific locale had been copied into less specific locales recursively  
until they are copied to the root, and the locales directory is  

For instance, assuming:


After applying the above algorithm with a UA locale of en-gb-xx, we
would have widget content equivalent to:

     index.html [from locales/en]
     a.svg      [from locales/en-gb-xx]
     b.svg      [from locales/en-gb-xx]
     c.svg      [from locales/en-gb]
     d.svg      [from the root]

In other words, this defines a URI space with the following
resolution:

   widget://UUID/index.html -> locales/en/index.html
   widget://UUID/a.svg      -> locales/en-gb-xx/a.svg
   widget://UUID/b.svg      -> locales/en-gb-xx/b.svg
   widget://UUID/c.svg      -> locales/en-gb/c.svg
   widget://UUID/d.svg      -> d.svg
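For a single request, the resolution above amounts to this: try the UA locale and each truncated form of it under locales/, then fall back to the root. The sketch assumes a single UA locale tag and models the archive as a set of path strings. The function names are hypothetical, and the file list is a hypothetical one chosen to match the table above.

```python
from typing import List, Optional, Set


def locale_chain(locale: str) -> List[str]:
    """Truncate a tag step by step: en-gb-xx -> [en-gb-xx, en-gb, en]."""
    subtags = locale.lower().split("-") if locale else []
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        # as in BCP47 lookup, never keep a trailing single-letter subtag
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


def resolve(path: str, locale: str, archive: Set[str]) -> Optional[str]:
    """Return the archive entry that serves widget://UUID/<path>, if any."""
    for tag in locale_chain(locale):
        candidate = f"locales/{tag}/{path}"
        if candidate in archive:
            return candidate
    # nothing in any matching locale folder: fall back to the root
    return path if path in archive else None


# Hypothetical archive, chosen to be consistent with the table above.
ARCHIVE = {
    "index.html", "d.svg",
    "locales/en/index.html", "locales/en/a.svg",
    "locales/en-gb/c.svg",
    "locales/en-gb-xx/a.svg", "locales/en-gb-xx/b.svg",
}

for name in ("index.html", "a.svg", "b.svg", "c.svg", "d.svg"):
    print(f"widget://UUID/{name} -> {resolve(name, 'en-gb-xx', ARCHIVE)}")
```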

I believe that this approach is both cleaner and more powerful.  
Besides, it makes specifying widget URI resolution easier :) We just  
have to agree on whether that's defined in the widgets URI  
specification or in P+C. I'm happy to put it in the former (but I'll  
need to reference the I18N model from P+C so it may be easier to put  
it there).

- HTML browsing contexts: something else

Same issue, same proposal. The example given in the I18N proposal  
document is:


Based on the algorithm defined above (and the fact that the UA locale  
is "en-us-xx"), this generates the following URI space:

   widget://UUID/index.html  -> index.html
   widget://UUID/a.gif       -> locales/en-us-xx/a.gif
   widget://UUID/b.gif       -> b.gif
   widget://UUID/c.gif       -> locales/en/c.gif
   widget://UUID/hello/d.gif -> hello/d.gif
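The same URI space can be computed for the whole package at once, following the "as if copied" model. First lay down the root files, then overlay each matching locales/ folder from least to most specific. The sketch assumes a single UA locale tag and that nothing under locales/ is addressable directly. `build_uri_space` is a hypothetical name. The file list is invented to be consistent with the mapping above; it includes a locales/fr/ file to show that other locales stay invisible.

```python
from typing import Dict, List


def build_uri_space(files: List[str], locale: str) -> Dict[str, str]:
    """Map each widget://UUID/ path to the archive entry that backs it."""
    # Truncation chain, e.g. en-us-xx -> [en-us-xx, en-us, en].
    subtags = locale.lower().split("-") if locale else []
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()

    # Root content first. Assumption: nothing under locales/ is
    # addressable directly.
    space = {f: f for f in files if not f.startswith("locales/")}
    # Overlay least specific first, so more specific locales win.
    for tag in reversed(chain):
        prefix = f"locales/{tag}/"
        for f in files:
            if f.startswith(prefix):
                space[f[len(prefix):]] = f
    return space


FILES = [
    "index.html", "b.gif", "hello/d.gif",
    "locales/en/a.gif", "locales/en/c.gif",
    "locales/en-us-xx/a.gif",
    "locales/fr/a.gif",
]

for path, source in sorted(build_uri_space(FILES, "en-us-xx").items()):
    print(f"widget://UUID/{path:<12} -> {source}")
```

The resulting mapping agrees with the table above. Here a.gif from locales/en/ is shadowed by the more specific locales/en-us-xx/.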



Robin Berjon - http://berjon.com/
     Feel like hiring me? Go to http://robineko.com/
Received on Wednesday, 29 April 2009 15:24:28 UTC
