[widgets] i18n

In a recent email, Mike Smith let WAF know that the i18n Core WG had some
concerns about the lack of i18n support in the widget specification
[1]. This email is an attempt to foster discussion about widgets and
i18n with the ultimate aim of reaching a resolution that closes this
issue [2].

As you may or may not know, the Widgets 1.0: Packaging and
Configuration specification [3] is composed of two parts. Firstly, we
define a Zip-based packaging format where authors store resources
(images, HTML, JS, etc.) for their widgets. Secondly, we define a
simple XML-based configuration document format, which authors use to
record metadata and set various runtime configuration parameters.
Please see [3] for details and examples.

As WAF understands it, there are essentially two issues that need to
be addressed for i18n in our widget spec: The first relates to XML
i18n best practices in the Configuration Document, and the second
relates to automatic i18n of Zip packages.

=I18n in the configuration document=

We have attempted to follow Best Practices for XML
Internationalization [4] by including support for xml:lang in our
configuration document format (and have explicitly included the
attribute in our RelaxNG schema). However, contrary to what the
guidelines recommend, we have not included a <span>-like element
because we don't anticipate much use of such an element in our
configuration document format and because it complicates the
processing model. That said, we don't prevent vendors from including
their own <span>-like elements in their own namespace for the purpose
of i18n.

For example:

<widgets xmlns="http://www.w3.org/ns/widgets"
xmlns:ex="http://widgextension.org/" xml:lang="en">
<description>Make Steve say <ex:span
xml:lang="en-au">crikey!</ex:span></description>
</widgets>

Our choice to exclude a <span>-like element from the configuration
document format is not set in stone, and we still have the opportunity
to extend the processing model.

We would appreciate comments and expertise on how we can make the
configuration document better suited for i18n contexts.

=Widget resource (Zip) localization=
Although we have not yet formally specified anything about widget
resource internationalization in the spec, we have been debating it
internally in the working group for over a year (some members have
opposed it, which is the main reason that it has not yet been worked
on).

If an internationalization model is to be included in the
specification, we would like to leverage the common practice of
naming folders using the ISO 639-1 language code and ISO 3166 country
code pattern (e.g. /en-us/), or to rely explicitly on, at least,
RFC 3066. The ISO 639/3166 pattern is used across a variety of widget
engines, including Konfabulator, Dashboard, Google Desktop and Vista
Sidebar Gadgets, as a means of aiding resource localization.

We are considering two localization models, which are presented in
detail below. The first is based on reading name-value string pairs,
inspired by the models that Apple Dashboard and Konfabulator use. The
second is a bit more complicated, as it allows for the localization of
all resources in a widget, and is inspired by the Windows Vista Sidebar.

For the sake of discussion, we present the models by way of what
authors need to do (authoring requirements) and what the widget
engine does for authors (widget user agent processing). We currently
don't prefer either model over the other, and we are completely
open to ideas about other models we should consider. The aim is, as
always, to keep things as simple as possible for both authors and
implementers while retaining industry best practice.

MODEL 1 - Dashboard/Konfabulator-like model

Authoring requirements:
When an author creates a widget, they place a file called
"localized.strings" into directories whose names follow the ISO 639
and 3166 pattern (e.g. en-us). For example:

myWidget.zip
    /en-gb/
        localized.strings
    /en/
        localized.strings
    /fr/
        localized.strings
    /config.xml
    /index.html

The localized.strings file contains name-value pairs, one per line.
Each name is separated from its value by a single "=", and each pair
is terminated by a line break (CR, LF, or CRLF). For example:

    hello = howdy partner!
    good bye = see you later!
    local logo = /en/images/logo.gif
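
To make the file format concrete, here is a rough sketch of how a user
agent might parse localized.strings into a lookup table. Nothing here
is specified yet: the function name parseLocalizedStrings is
hypothetical, and trimming whitespace around names and values,
splitting on the first "=", and skipping lines without an "=" are all
assumptions on our part.

```javascript
// Hypothetical helper, not part of any spec: turns the text of a
// localized.strings file into a Map of name -> value.
function parseLocalizedStrings(text) {
  const table = new Map();
  // Accept CR, LF, or CRLF as the line terminator.
  for (const line of text.split(/\r\n|\r|\n/)) {
    const eq = line.indexOf("=");
    if (eq === -1) continue; // assumption: ignore blank or malformed lines
    // Assumption: split on the first "=" and trim surrounding whitespace.
    const name = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).trim();
    if (name !== "") table.set(name, value);
  }
  return table;
}
```

With the example file above, the table would map "hello" to
"howdy partner!" and "local logo" to "/en/images/logo.gif".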

Widget User Agent Processing:
The user agent gets the user's preferred system locale as an ISO 639
and 3166 pattern and searches for a folder that matches it. It first
searches for the full ISO 639 and 3166 pattern (e.g. "en-us") and
then reduces the search to just the language code ("en"). The search
is case-insensitive. If a match is found, the system reads the
contents of 'localized.strings' and parses it into a lookup table
that is made available to the instantiated widget at runtime.
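
The folder search described above could look roughly like the
following. This is only a sketch: findLocaleFolder is a hypothetical
name, and returning null when nothing matches is an assumption, since
we have not yet decided what should happen in that case.

```javascript
// Hypothetical helper: pick the folder that best matches the user's
// locale, trying the full tag first ("en-us") and then the bare
// language code ("en"). Matching is case-insensitive.
function findLocaleFolder(userLocale, folderNames) {
  // Index folder names by their lowercase form, keeping the original
  // spelling so the caller can open the real folder.
  const byLowerName = new Map();
  for (const name of folderNames) {
    const key = name.toLowerCase();
    if (!byLowerName.has(key)) byLowerName.set(key, name);
  }
  const subtags = userLocale.toLowerCase().split("-");
  while (subtags.length > 0) {
    const candidate = subtags.join("-");
    if (byLowerName.has(candidate)) return byLowerName.get(candidate);
    subtags.pop(); // drop the last subtag and try again
  }
  return null; // assumption: no match means "use the unlocalized widget"
}
```

For the package shown earlier, a user whose locale is "en-US" would
fall back to the /en/ folder, because there is no /en-us/ folder.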

The author is then able to access the localized content via the
following interface:

interface Widget {
     DOMString getLocalizedString(in DOMString key);
}

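For example:

<script>
onload = function(){
    // widget.locale and $() are not defined by the interface above;
    // both are assumed here purely for illustration
    if(widget.locale != "ja"){ // assume Japanese ("ja") is the default
        $("h1").innerHTML = widget.getLocalizedString("hello");
    }
}
</script>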

Pros:
  simplicity: only name-value string pairs need to be defined by the author
  probably addresses most use cases
  easy to process localized.strings
  easy to use API

Cons:
  the config.xml document is not localized
  requires that the whole application be written with script hooks
wherever text needs to be localized
  might not work well when layout needs to be radically different
  relies on a single i18n neutral HTML layout and structure (index.html)

MODEL 2: Windows Sidebar-like model

Authoring requirements:
When an author creates a widget, they can place localized versions of
its content into directories whose names follow the ISO 639-1 language
and ISO 3166 country code pattern. For each localized version, the
author creates a folder and includes the appropriate localized
resources.

myWidget.zip
    /en-gb/
        config.xml
        index_gb.html
        images/flag.png
    /en-au/
        index.html
        images/flag.png
    /fr/
        config.xml
        index.svg
        images/flag.png
    /images/
        logo.png
        flag.png
    /scripts/
        engine.js
    /config.xml
    /index.html

Widget user agent processing:
As with model 1, the user agent gets the user's preferred system
locale as an ISO 639 and 3166 pattern and searches for a folder that
matches it. It first searches for the full ISO 639 and 3166 pattern
(e.g. "en-us") and then reduces the search to just the language code
("en"). The search is case-insensitive.

If a match is found, and the folder contains a config.xml file, then
the system parses the content of config.xml. If no match is made, or
the matched folder does not contain a config.xml, the user agent uses
the config.xml file at the root. However, if the matching folder
contains an index.html, but no config.xml, and the config.xml at the
root says to use "index.html" as the start file, then, for instance,
"/en-au/index.html" will be used.

Once the widget is instantiated, authors can easily get the localized
content they need (including shared resources). For example, say
/en-au/index.html was loaded:

<html>
<!-- use an absolute path for the script, which is shared amongst all localized versions -->
<script src="/scripts/engine.js"></script>
<!-- use relative/localized flag image -->
<body style="background-image: url(images/flag.png); ...">
<!-- use shared logo image -->
<h1><img src="/images/logo.png" /> </h1>
<!-- localized content -->
<h2>G'day Mate</h2>
</body>
</html>

Pros:
   both config and index.html can be localized (localize whole widgets
in one package)
   individual resources can be localized more effectively
   no API for localized strings
Cons:
  more difficult to process than model 1
  behavior can be confusing unless well understood
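
To illustrate the extra processing that model 2 requires, here is a
rough sketch of how a user agent might choose the configuration
document and start file once a locale folder has been matched. The
names resolveLocalizedWidget and getStartFile are hypothetical, paths
are written without a leading slash, and looking for the start file in
the locale folder even when that folder supplies its own config.xml is
an assumption that the description above does not settle.

```javascript
// Hypothetical sketch of model 2 resolution.
// paths:        every file path in the Zip, e.g. "en-au/index.html"
// localeFolder: the matched folder name, or null if none matched
// getStartFile: callback that returns the start file named by a
//               given config.xml (stands in for real config parsing)
function resolveLocalizedWidget(paths, localeFolder, getStartFile) {
  const files = new Set(paths);
  // Return the localized path for a file if it exists, else null.
  const inLocale = (name) =>
    localeFolder !== null && files.has(localeFolder + "/" + name)
      ? localeFolder + "/" + name
      : null;
  // Prefer the localized config.xml; otherwise use the one at the root.
  const configPath = inLocale("config.xml") || "config.xml";
  const startName = getStartFile(configPath);
  // Prefer a localized copy of the start file; otherwise use the root copy.
  const startPath = inLocale(startName) || startName;
  return { configPath: configPath, startPath: startPath };
}
```

For the package above, a match on /en-au/ would yield the root
config.xml together with en-au/index.html, provided the root
config.xml names "index.html" as the start file.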

Hopefully that gives you an idea of the direction in which we are
currently heading. Again, we would appreciate any comments and
expertise that you can share on creating an effective
internationalization model for widget packages.

Kind regards,
Marcos

[1] http://www.w3.org/2008/02/20-core-minutes.html#item05
[2] http://www.w3.org/2005/06/tracker/waf/issues/23
[3] http://www.w3.org/TR/widgets/
[4] http://www.w3.org/TR/xml-i18n-bp/

-- 
Marcos Caceres
http://datadriven.com.au

Received on Tuesday, 15 April 2008 04:13:37 UTC