Manifest internationalization Model

Hi,   
TL;DR: We need to settle on an i18n model for the manifest format. We have a few options here (based on the membership of the group):  

1. Firefox OS's model [1].  
2. Google's packaged apps model [2].
3. W3C Packaged Web Apps (widgets) i18n model [3].  

Each model has its own pros and cons. Below I describe each model.  

My recommendation is that we use the Firefox OS one, but with a few modifications:  

* Mozilla's i18n model currently supports only language tags with at most two subtags. This should be fixed to allow three or more subtags (I've got confirmation from Mozilla that they are willing to change this: https://bugzilla.mozilla.org/show_bug.cgi?id=846269). However, how much this matters depends on whether actual populations of people are affected (see discussion on www-international: http://lists.w3.org/Archives/Public/www-international/2013JanMar/0305.html).  

* Language tag decomposition should follow BCP47's lookup algorithm (and an "application locale" should be derived from the union of the user agent locale and the manifest's declared locales).  

* The application locale should be exposed in JS through an interface.  
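
To make the second point concrete, here is a minimal Python sketch of the lookup scheme from BCP 47 (RFC 4647, section 3.4): each preferred language range is progressively truncated from the right until it matches one of the manifest's declared locales. The names `lookup` and `manifest_locales` are hypothetical, and treating the first match as the "application locale" is my assumption about how the derivation could work, not something any of the three models specifies.

```python
def lookup(priority_list, available, default=None):
    """Return the best matching tag from `available`, or `default`."""
    # Language tags compare case-insensitively, so index by lowercase form.
    by_lower = {tag.lower(): tag for tag in available}
    for language_range in priority_list:
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            # Truncate from the right, one subtag at a time.
            subtags.pop()
            # RFC 4647 lookup also drops a trailing single-character subtag
            # (such as the "x" that introduces a private-use sequence).
            while subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


# Hypothetical set of locales declared in a manifest.
manifest_locales = ["en", "zh-Hans", "ja"]

print(lookup(["zh-Hans-CN", "en-US"], manifest_locales))
print(lookup(["en-US-x-test"], manifest_locales))
print(lookup(["fr-CA"], manifest_locales, default="en"))
```

With these inputs the three calls resolve to "zh-Hans", "en" and the supplied default "en" respectively.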

Ok… the rest is the long bla bla for each…. Will maybe put this into the Wiki or something.  





# Internationalization model of Firefox OS

A more human readable version of this section is at [1].  

This section lists a set of localization scenarios and describes how Firefox OS handles these different use cases. The section makes some recommendations about how particular use cases could be better addressed.

To simplify the discussion, this document assumes the runtime is running in locale "en-US".  

## CASE 1 - No localisation information.  

*Use case:* The author does not wish to explicitly localize any content.  

```JSON
{
  "name": "foo"
}
```  

*FxOS:* When there is no localized content declared, the user agent just uses what is at the root of the manifest. Hence, the name of the app is "foo".  

## CASE 2 - No default locale

*Use case:* The author provides the required ```name``` member, but chooses to localize the application's name for the "en-US" locale. However, the author neglects to add the ```default_locale``` member.

```JSON
{
  "name": "unknown-locale name",
  "locales": {
    "jp": {
      "name": "jp name"
    },
    "en-US": {
      "name": "en-US name"
    }
  }
}
```

*FxOS:* Despite there not being any ```default_locale```, the user agent still chooses "en-US name" as the name for the application. When none of the localized content matches, the value at the root of the manifest is selected.  

## CASE 3 - No default locale, multiple matching ranges

*Use case:* The author declares the name of the application using a set of variants of the English language. The author omits the ```default_locale``` member.  

```JSON
{
  "name": "unknown-locale name",
  "locales": {
    "en-US-x-test": {
      "name": "en-US-x-test name"
    },
    "en-US": {
      "name": "en-US name"
    },
    "en": {
      "name": "en name"
    }
  }
}
```

*FxOS:* The user agent selects the localized content that exactly matches the user agent's default language (so, in this case, "en-US name" is shown).  

## CASE 4 - No default locale, with catch all

*Use case:* The author declares the name of the application using a set of variants of the English language. However, none of them match the user agent's locale settings exactly. Fortunately, the author has included a catch-all ("en").  

```JSON
{
  "name": "unknown-locale name",
  "locales": {
    "en-AU": {
      "name": "en-AU name"
    },
    "en-GB": {
      "name": "en-GB name"
    },
    "en": {
      "name": "en name"
    }
  }
}
```

*FxOS:* The user agent first checks for "en-US", but failing that, it selects the next best match, which is "en name".  
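
The behaviour observed in cases 1 to 4 can be summarised in a few lines of Python. This is only a sketch of what is described above, not the actual FxOS code; `select_name` is a hypothetical helper, and the exact, case-sensitive key comparison is an assumption.

```python
def select_name(manifest, ua_locale="en-US"):
    """Pick the app name the way cases 1 to 4 describe it."""
    locales = manifest.get("locales", {})
    # Try the exact locale first, then only its first subtag.
    for tag in (ua_locale, ua_locale.split("-")[0]):
        entry = locales.get(tag, {})
        if "name" in entry:
            return entry["name"]
    # Nothing matched: fall back to the value at the root of the manifest.
    return manifest["name"]


case_1 = {"name": "foo"}
case_4 = {
    "name": "unknown-locale name",
    "locales": {
        "en-AU": {"name": "en-AU name"},
        "en-GB": {"name": "en-GB name"},
        "en": {"name": "en name"},
    },
}

print(select_name(case_1))
print(select_name(case_4))
```

For a user agent running in "en-US", this yields "foo" for case 1 and "en name" for case 4.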

## CASE 5 - No default locale, multiple matching ranges

*Use case:* The author wants to localize the name, but does not need to localize the developer information.  

```JSON
{
  "name": "unknown-locale name",
  "developer": {
    "name": "unknown-locale author"
  },
  "locales": {
    "en-US": {
      "name": "en-US name"
    },
    "jp":{
      "name": "jp name"
    }
  }
}
```

*FxOS:* The user agent selects "en-US name" as the name, and "unknown-locale author" as the author.

## CASE 6 - No default locale, multiple matching ranges

*Use case:* The author wants to localize the developer name but not the developer URL.  

```JSON
{
  "name": "unknown-locale name",
  "developer": {
    "name": "unknown-locale author",
    "url": "http://unknown-locale.com/"
  },
  "locales": {
    "en-US": {
      "developer": {
        "name": "localized author"
      }
    }
  }
}
```

*FxOS:* The user agent selects the localized developer name and uses the unknown-locale developer ```url```.  
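
One way to picture cases 5 and 6 is as a member-by-member overlay of the matching locale entry onto the root of the manifest. The Python below is a sketch under that assumption (the `overlay` helper is hypothetical, and I have not checked that FxOS implements it this way); it only reproduces the result described above.

```python
import copy


def overlay(base, localized):
    """Return a copy of `base` with members of `localized` laid on top."""
    result = copy.deepcopy(base)
    for key, value in localized.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Merge nested objects member by member instead of replacing them.
            result[key] = overlay(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


manifest = {
    "name": "unknown-locale name",
    "developer": {
        "name": "unknown-locale author",
        "url": "http://unknown-locale.com/",
    },
    "locales": {"en-US": {"developer": {"name": "localized author"}}},
}

# Everything except "locales" is the unlocalized root content.
root = {key: value for key, value in manifest.items() if key != "locales"}
resolved = overlay(root, manifest["locales"]["en-US"])
print(resolved["name"])
print(resolved["developer"])
```

The resolved manifest keeps the root ```name``` and ```url``` and takes only the developer ```name``` from the "en-US" entry.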

## CASE 7 - Default locale
*Use case:* When the author uses any value for ```default_locale```, but no localized content is given through a ```locales``` member, the author still expects some content to be displayed to the user (even if the user might not be able to understand it).  

```JSON
{
  "name": "unknown-locale name",
  "developer": {
    "name": "unknown-locale author"
  },
  "default_locale": "unknown-locale"
}
```

*FxOS:* The user agent selects "unknown-locale name" as the name of the application.  

## CASE 8 - Language tag decomposition and lookup
*Use case:*   

```JSON
{
  "name": "unknown-locale name",
  "developer": {
    "name": "unknown-locale author"
  },
  "locales": {
    "en-US": {
      "name": "en name"
    },
    "en": {
      "developer": {
        "name": "en developer"
      }
    }
  },
  "default_locale": "unknown-locale"
}
```
The name of the app is "en name".  

*FxOS:* When neither the ```default_locale``` nor the user agent locale matches any localized content, FxOS just uses the first sub-tag of the language range (in this case, just "en"). So, "en-US" becomes "en" ([see code](https://mxr.mozilla.org/mozilla-central/source/dom/apps/src/AppsUtils.jsm#427)).  

*[BUG 846269](https://bugzilla.mozilla.org/show_bug.cgi?id=846269)*: Because FxOS currently takes the first subtag in a language range, this will exclude language ranges initially composed of three or more subtags (e.g., zh-Hans-XQ). This means that if the user's regional preference is expressed as "zh-Hans-XQ", the following will not match any localized content (when zh-Hans would have been a reasonable match):

```JSON
{
  "name": "unknown-locale name",
  "developer": {"name": "unknown-locale author"},
  "locales": {
    "zh-Hans": {
      "name": "zh-Hans name"
    }
  },
  "default_locale": "unknown-locale"
}
```
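
To illustrate the difference, the following Python sketch compares the two fallback strategies for "zh-Hans-XQ". Both helpers are hypothetical: `first_subtag_candidates` models the behaviour described in the bug, and `truncation_candidates` is a simplified form of BCP 47 lookup that ignores the special handling of single-character subtags.

```python
def first_subtag_candidates(tag):
    """Behaviour described in the bug: the full tag, then only the first subtag."""
    parts = tag.split("-")
    return [tag] if len(parts) == 1 else [tag, parts[0]]


def truncation_candidates(tag):
    """Simplified progressive truncation: drop one subtag at a time."""
    parts = tag.split("-")
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]


def first_match(candidates, locales):
    """Return the first candidate that is a key of `locales`, else None."""
    return next((c for c in candidates if c in locales), None)


locales = {"zh-Hans": {"name": "zh-Hans name"}}

print(first_subtag_candidates("zh-Hans-XQ"))
print(truncation_candidates("zh-Hans-XQ"))
print(first_match(first_subtag_candidates("zh-Hans-XQ"), locales))
print(first_match(truncation_candidates("zh-Hans-XQ"), locales))
```

The first strategy only ever tries "zh-Hans-XQ" and "zh", so it finds nothing; the second also tries "zh-Hans" and matches.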

## CASE 9 - Granularity
*Use case:* The author wishes for the name of the application to be localized for a particular locale. However, the author does not want to duplicate the developer information.  

```JSON
{
  "name": "您好!颜色",
  "locales": {
    "en-US": {
      "name": "Hi! Color"
    },
    "en-AU": {
      "name": "G'Day! Colour"
    },
    "en": {
      "developer": {
        "name": "en developer"
      }
    }
  },
  "developer": {
    "name": "中国开发者"
  },
  "default_locale": "zh-Hans"
}
```

*FxOS:* The user agent selects "Hi! Color" as the name, but then selects "中国开发者" as the developer. The expected result is that ```"name": "en developer"``` is selected for the developer. Filed [BUG 846432](https://bugzilla.mozilla.org/show_bug.cgi?id=846432).

*Proposal:* To address the above, the user agent should arrange the user's preferred locales and decompose them in order (removing any duplicates). So, if the user has "en-US, en-AU, jp" as her preferred language settings, those would decompose to "en-US, en-AU, en, jp".   
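
A rough Python sketch of that proposal follows. The helpers `decompose` and `pick` are hypothetical names, and keeping the last occurrence of a duplicate (rather than the first) is an assumption I made so that the output matches the ordering in the example above.

```python
def decompose(preferred):
    """Expand each tag into its truncations, then remove duplicates."""
    expanded = []
    for tag in preferred:
        parts = tag.split("-")
        expanded.extend("-".join(parts[:n]) for n in range(len(parts), 0, -1))
    # Keep only the LAST occurrence of each tag, so that a shared parent
    # such as "en" sorts after every more specific "en-*" variant.
    seen, result = set(), []
    for tag in reversed(expanded):
        if tag not in seen:
            seen.add(tag)
            result.append(tag)
    result.reverse()
    return result


def pick(manifest, chain, *path):
    """Return the first value found at `path`, trying each locale, then the root."""
    sources = [manifest.get("locales", {}).get(tag, {}) for tag in chain]
    for source in sources + [manifest]:
        value = source
        for key in path:
            value = value.get(key) if isinstance(value, dict) else None
        if value is not None:
            return value
    return None


case_9 = {
    "name": "您好!颜色",
    "locales": {
        "en-US": {"name": "Hi! Color"},
        "en-AU": {"name": "G'Day! Colour"},
        "en": {"developer": {"name": "en developer"}},
    },
    "developer": {"name": "中国开发者"},
    "default_locale": "zh-Hans",
}

chain = decompose(["en-US", "en-AU", "jp"])
print(chain)
print(pick(case_9, chain, "name"))
print(pick(case_9, chain, "developer", "name"))
```

Applied to the case 9 manifest, the chain is "en-US, en-AU, en, jp", the name resolves to "Hi! Color", and the developer name resolves to "en developer", which is the result expected above.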


# Chrome i18n Model

Chrome's i18n model [2] for localisation of manifest data differs quite significantly from the Mozilla one and the W3C widgets one [3]. In particular, it's vastly more complicated in that it follows a more traditional software i18n model, and for some fields it is tightly bound to the Chrome apps store.

Instead of allowing manifest content to be localized within the application manifest itself, all localized content is put into "messages.json" files in a special "_locales/language-TAG" directory (where language-TAG is, for example, en-US). The developer is then required to "key" all localised data and then Chrome reconstructs the localised content by matching keys to a magic string (__MSG_*__). For example:

manifest.json
==========
"name": "__MSG_application_title__", "description": "__MSG_application_description__"


_locales/de/messages.json
===============
"application_title": { "message": "Eine lokalisierte gehostete Beispielanwendung" }


Then the developer declares what the "default_locale" is, which works as a catch-all for when the user's locale does not match the locale of the application.
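
The key substitution described above could be sketched in Python roughly as follows. This is not Chrome's implementation: `localize`, `MSG_PATTERN`, the in-memory `catalogs` dictionary (standing in for the per-locale "messages.json" files), the English strings, and the choice to leave unknown keys untouched are all assumptions made for illustration.

```python
import re

# Matches placeholders of the form __MSG_some_key__ and captures the key.
MSG_PATTERN = re.compile(r"__MSG_(\w+?)__")


def localize(manifest, catalogs, locale, default_locale):
    """Replace __MSG_key__ placeholders using `locale`, then `default_locale`."""

    def replace(match):
        key = match.group(1)
        for tag in (locale, default_locale):
            entry = catalogs.get(tag, {}).get(key)
            if entry is not None:
                return entry["message"]
        # Unknown key: leave the placeholder as it is (a choice of this sketch).
        return match.group(0)

    return {
        field: MSG_PATTERN.sub(replace, value) if isinstance(value, str) else value
        for field, value in manifest.items()
    }


manifest = {
    "name": "__MSG_application_title__",
    "description": "__MSG_application_description__",
    "default_locale": "en",
}
catalogs = {
    "de": {
        "application_title": {
            "message": "Eine lokalisierte gehostete Beispielanwendung"
        }
    },
    # Hypothetical default-locale strings, not taken from the Chrome docs.
    "en": {
        "application_title": {"message": "A localized hosted sample app"},
        "application_description": {"message": "A sample description"},
    },
}

print(localize(manifest, catalogs, "de", manifest["default_locale"]))
```

Here the name comes from the "de" catalog, while the description, which has no German entry, falls back to the default locale.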

Chrome then provides an API to access localised strings from within the application itself. Having the API is not a bad thing (as it takes away some of the burden of having to read files from within a package through either XHR or a file reader API), but it does lock the developer into using a particular i18n model.

Other non-standard features include using custom language tags [2]. These language tags are non-standard in that they don't conform to BCP 47. As such, some region, language, and script combinations cannot, theoretically speaking, be adequately expressed. I am unsure whether this affects actual populations of people.


# W3C Packaged Web Apps (widgets) i18n Model
The model is already fully described [3], with plenty of examples. It supports both manifest-level localization and directory-based localisation, the latter in a manner similar to Google's.  


[1] https://gist.github.com/marcoscaceres/5055717
[2] https://developers.google.com/chrome/web-store/docs/i18n
[3] http://www.w3.org/TR/widgets/#internationalization-and-localization
--  
Marcos Caceres
http://datadriven.com.au

Received on Monday, 11 March 2013 17:13:11 UTC