
FW: Widgets i18n feedback

From: Phillips, Addison <addison@amazon.com>
Date: Thu, 31 Jul 2008 14:21:38 -0700
To: "public-webapps@w3.org" <public-webapps@w3.org>, "WAF WG (public)" <public-appformats@w3.org>
CC: "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-ID: <4D25F22093241741BC1D0EEBC2DBB1DA014A3C1A62@EX-SEA5-D.ant.amazon.com>

Hi,

This note is on behalf of the I18N Core WG. The comments (below) that I sent to members of your WG previously have been endorsed by the I18N Core WG [1], with the following additional comments:

1. In my comment #3 below, we would strongly prefer the second suggested paragraph (MUST rather than SHOULD).

2. In my comment #15 below, we feel that it would be best if the charset parameter were required (MUST) and think that it should at least be strongly recommended (via the SHOULD keyword).

Finally (this comment is personal and not yet endorsed by the I18N Core WG), I note that the text in [2] ("Step 6") still has a few flaws: it is confusing and doesn't really describe the lookup algorithm cleanly. The first two paragraphs are fine, but I would suggest changing the remainder to read as follows (notes on changes are marked with //):

--
The algorithm to determine the base URI and widget locale is as follows:

   1. If the lang-priority-list is empty, or null, or i-default, or contains a single *, or is a sequence of items that only contain the * character, then terminate this algorithm and attempt to locate the configuration document. 
// deleted the 'default' keyword
   2. For each range in the lang-priority-list:
         a. If this range is a single *, then terminate this algorithm and attempt to locate the configuration document. 
// dropped "and this is the first subtag in the l-p-l"
         b. Else if this range begins with the subtag '*', then skip this range: omit the steps below and repeat step 2 with the next range. 
         c. Else if this range contains a subtag "*", remove the "*" and its preceding hyphen and continue. For example, "en-*-US" becomes "en-US".
// skip ranges that start with *, which are indeterminate
         d. Case-insensitively compare the range to each file name field for each file entry that is a folder in the widget resource.
            i. If they match:
               1. Let widget locale be the name of the folder that matched the current range in lowercase form.
               2. Let base URI be an absolute URI reference to this same folder.
               3. Terminate this algorithm and attempt to locate the configuration document.
            ii. Else, remove the last subtag of the range and repeat this step 2d. For example, if the range is currently "en-US", make the range "en".
   3. If no match is made, attempt to locate the configuration document.

For example, if the range is "en-AU" and a match is made on a file entry whose name is "en-au/config.xml", then the base URI would be "widget://f81d4fae-7dec-11d0-a765-00a0c91e6bf6/en-au", and the widget locale would be "en-au".

For example, if the language priority list is "de-CH,fr-CH,it-CH" and the folders include "de", "fr-FR", and "IT", then "de-CH" matches no folder and is truncated to "de", which matches the folder "de". The base URI would therefore be "widget://f81d4fae-7dec-11d0-a765-00a0c91e6bf6/de" and the widget locale would be "de".


--
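For illustration, the proposed lookup could be sketched in Python roughly as below. This is a minimal sketch under stated assumptions, not text from [2]: the function name find_locale_folder, its parameters, and the idea that the caller passes in the list of folder names are all hypothetical. The proposal does not say what happens once step 2d has removed every subtag of a range; the sketch assumes processing moves on to the next range (as RFC 4647-style lookup does). Returning None stands for "terminate and attempt to locate the configuration document" without a localized folder.

```python
from typing import List, Optional, Tuple


def find_locale_folder(
    priority_list: Optional[str], folders: List[str], widget_root: str
) -> Optional[Tuple[str, str]]:
    """Return (widget_locale, base_uri), or None when the algorithm should
    terminate and locate the configuration document without a localized folder."""
    ranges = [r.strip() for r in (priority_list or "").split(",") if r.strip()]

    # Step 1: empty/null list, i-default, or only "*" items.
    if (
        not ranges
        or (len(ranges) == 1 and ranges[0].lower() == "i-default")
        or all(set(r) == {"*"} for r in ranges)
    ):
        return None

    # Map lowercase folder names to their real names for case-insensitive matching.
    by_lower = {name.lower(): name for name in folders}

    for rng in ranges:
        if rng == "*":  # step 2a
            return None
        subtags = rng.split("-")
        if subtags[0] == "*":  # step 2b: indeterminate range, skip it
            continue
        subtags = [s for s in subtags if s != "*"]  # step 2c: "en-*-US" -> "en-US"
        while subtags:  # step 2d
            folder = by_lower.get("-".join(subtags).lower())
            if folder is not None:
                return folder.lower(), f"{widget_root}/{folder}"
            subtags.pop()  # e.g. "en-US" -> "en"
        # Assumption: with no subtags left, continue with the next range.
    return None  # step 3: no match


WIDGET_ROOT = "widget://f81d4fae-7dec-11d0-a765-00a0c91e6bf6"
print(find_locale_folder("en-AU", ["en-au"], WIDGET_ROOT))
print(find_locale_folder("de-CH,fr-CH,it-CH", ["de", "fr-FR", "IT"], WIDGET_ROOT))
```

Run against the two examples in the proposed text, it yields the "en-au" and "de" results described there; the folder's real name is kept in the base URI while the widget locale is lowercased.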



[1] http://www.w3.org/2008/07/23-core-minutes.html#item04

[2] http://dev.w3.org/2006/waf/widgets/#widget12



Addison Phillips
Globalization Architect -- Lab126
Chair -- W3C Internationalization Core WG

Internationalization is not a feature.
It is an architecture.


-----Original Message-----
From: Phillips, Addison 
Sent: Thursday, July 10, 2008 2:02 PM
To: public-i18n-core-comments@w3.org
Cc: Marcos Caceres; Arthur Barstow; Felix Sasaki; mike@w3.org
Subject: RE: Widgets i18n feedback

Hi,

My personal comments on the Widgets specs located at [1] follow. I have copied a few members of the WebApps WG on this message so they can see progress; these comments will be a topic in our next teleconference, for consideration as the Internationalization WG's official comments. 

Comments follow.

First, the requirements document:

1. In the introduction we see the (much better) comment:

--
As argued by the Widget Landscape document, there is currently no formally standardized way to author, package, digitally sign and internationalize a widget resource for distribution and deployment on the Web.
--

The Widget Landscape document focuses on the localization of widgets, which is important. This document should provide a solution to the above, and that solution should be referred to as "localization". Internationalization remains a problem because JavaScript has no locale facet: internationalized formatting and processing are only possible as long as one is happy with the default system locale and formats on the host platform. Note: this is usually less of a problem for widgets than for AJAX-style interactions in a Web page, since most widgets are perceived as applications running locally, whereas most Web properties manage the user's language and locale themselves and need a locale in JavaScript to "do the right thing".

2. [R5] "For example, addressing a resource via an IRI (e.g. new Image('../images/pane.png'))."

The use of IRI here is good to see.

3. [R6] Multilingual Resource Names. The current text is not as strong as I'd like it to be (as you would probably suspect). I'm also not sure that it's quite correct. The current text reads:

--
A conforming specification SHOULD recommend a packaging format that is suitable for multilingual contexts, giving authors the ability to name files and directories using characters from the Unicode character repertoire; in such a case, a conforming specification SHOULD recommend the UTF-8 encoding.
--

Keeping the same "strength" of requirement, I would probably phrase this as:

--
A conforming specification SHOULD recommend a packaging format that allows for non-ASCII characters in file and directory names, allowing authors to create widgets suitable for various cultures and languages, as well as multilingual contexts. The packaging format MUST either provide for a declaration of the character encoding used or specify what it is. The UTF-8 character encoding SHOULD be either the default (if multiple encodings are allowed) or sole encoding used.
--

It would be far better to strongly require encoding support:

--
A conforming specification MUST declare the character encoding of file and directory names used in the packaging format and SHOULD use the UTF-8 character encoding. If the UTF-8 encoding is not used, the specific encoding MUST either be specified by the packaging format or be specifiable in the package itself. Since packaged widgets are widely distributed, variation in character encoding between different platforms or configurations may render a widget with non-ASCII resources inoperable or otherwise degrade the user experience unless a common character encoding is used.
--

4. [R7] Internationalization guidelines. This should really be "localization guidelines", since internationalization support is not dealt with anywhere.

5. [R14] Authorship and Widget Metadata. Note that the author's name, email, and organization can all be non-ASCII values.

Next document: [1a]

6. "File and Folder names". This section contains the following text:

--
Author requirements: The zip relative path must be encoded as either [CP437] or [UTF-8]. Encoding the file name field using [UTF-8] is recommended. If the zip relative path is encoded using [UTF-8], then the general purpose bit 11 of the local file header must  be set to 1, otherwise it must be set to 0.
--

This is good (and addresses my comment 3).
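A quick way to check which encoding a packaged widget declares for each entry name is to look at general purpose bit 11 (mask 0x800) of each entry. The sketch below uses Python's standard zipfile module; the helper names and sample file names are hypothetical. Note that zipfile sets bit 11 only when a name is not pure ASCII, which is harmless here because printable ASCII names are byte-identical in CP437 and UTF-8.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general purpose bit 11: name is encoded as UTF-8


def name_encoding(info: zipfile.ZipInfo) -> str:
    """Report the encoding an entry's name is declared in (bit 11 set or not)."""
    return "utf-8" if info.flag_bits & UTF8_NAME_FLAG else "cp437"


def build_sample_widget() -> bytes:
    """Build a tiny in-memory package with one ASCII and one non-ASCII path."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("config.xml", "<widget/>")
        zf.writestr("日本語/index.html", "<p>こんにちは</p>")
    return buf.getvalue()


with zipfile.ZipFile(io.BytesIO(build_sample_widget())) as zf:
    for info in zf.infolist():
        print(info.filename, name_encoding(info))
```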

7. (same section). The text says:

--
Author requirements: It is recommended that authors keep their path lengths below 255 characters. Having excessively long path names (eg. over 120 characters) can also result in interoperability issues on some operating systems.
--

I'm pretty sure you do not mean "characters" here. You probably mean "bytes", since that is the limitation in some operating environments. In a multibyte encoding such as UTF-8, a 255-byte sequence may hold fewer than 255 characters (as few as 63 in the worst case, since a UTF-8 character can take up to four bytes). Please use the word "bytes" here if that is what is meant and include a note along the lines of: "Note that a character can require more than one byte to encode, making the length of the path in characters less than 255."
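The byte/character distinction is easy to demonstrate. The sketch below (hypothetical helper names, assuming the limit is 255 bytes of UTF-8) counts the encoded length of a path rather than its number of characters. U+1D11E MUSICAL SYMBOL G CLEF takes four bytes in UTF-8, so 63 of them fit in 255 bytes but 64 do not.

```python
MAX_PATH_BYTES = 255  # the draft's recommended limit, read as bytes (assumption)


def utf8_length(path: str) -> int:
    """Length of the path in bytes once encoded as UTF-8."""
    return len(path.encode("utf-8"))


def fits_limit(path: str, limit: int = MAX_PATH_BYTES) -> bool:
    """True if the UTF-8 encoded path is within the byte limit."""
    return utf8_length(path) <= limit


G_CLEF = "\U0001D11E"  # four bytes in UTF-8
for path in ["a" * 255, "\u00e9" * 128, G_CLEF * 63, G_CLEF * 64]:
    print(len(path), "chars,", utf8_length(path), "bytes, fits:", fits_limit(path))
```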

8. [non-i18n] I think you mean to s/260/255 in this note: 

Note for implementers: as this specification does not put a restriction on path length, implementers need to be prepared to deal with path lengths longer than 260 characters.

9. [non-i18n] The "valid zip relative path" ABNF is flawed. It has a production called 'ascii-chars', but both utf8-chars and cp437-chars use 'ascii-range' instead.

10. Section 6. The example has no language attributes, non-ASCII text, or IRI-style values. The example is probably fine as it is, but it is always nice to see examples that use international capabilities.

11. "Attribute Values and Types". Put quotes around the example numbers to make clear where each one begins and ends. Some cultures use a comma as the decimal separator, so quoting makes it clear what your examples are.

12. "URI attribute". You specify BOTH URI and IRI here, and you specify the path as being strictly a URI (RFC 3986), but elsewhere you consistently use the term IRI. Even more confusingly, the word "token" doesn't appear in RFC 3986 at all and appears only once (in another context) in RFC 3987. I think you should specify one specific format, preferably IRI.
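If the spec settles on IRIs, a processor that needs a plain URI can map one to the other as described in RFC 3987, section 3.1: encode each non-ASCII character as UTF-8 and percent-encode the resulting bytes. A minimal sketch follows; the function name is hypothetical, and it ignores the host component (which RFC 3987 also allows converting with IDNA).

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 bytes of every non-ASCII character.

    ASCII characters (including any existing %XX escapes) are left untouched;
    IDNA conversion of the host is not attempted.
    """
    parts = []
    for ch in iri:
        if ord(ch) < 0x80:
            parts.append(ch)
        else:
            parts.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(parts)


print(iri_to_uri("../images/pane.png"))
print(iri_to_uri("../images/café.png"))
```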

13. The "name" and "description" elements (6.5, 6.6). These are human-readable strings with no xml:lang. You should allow an xml:lang on these elements. Ideally, you should permit multiple descriptions (at least) in multiple languages. These elements are often displayed in widget management environments (before the widget is invoked or running).
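To illustrate the suggestion, here is how a widget manager might choose among several language-tagged descriptions. The multiple description elements with xml:lang are hypothetical (the current draft allows neither), as are the function name and sample text; the matching is a simple truncation lookup, falling back to an untagged description. ElementTree exposes xml:lang under the XML namespace URI bound to the predeclared xml: prefix.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical config fragment: the draft allows only one untagged description.
CONFIG = """<widget>
  <description>A clock widget.</description>
  <description xml:lang="fr">Un widget horloge.</description>
  <description xml:lang="ja">時計ウィジェット</description>
</widget>"""


def pick_description(config_xml: str, priority_list: List[str]) -> Optional[str]:
    root = ET.fromstring(config_xml)
    tagged = {}
    untagged = None
    for el in root.findall("description"):
        text = (el.text or "").strip()
        lang = el.get(XML_LANG)
        if lang is None:
            if untagged is None:
                untagged = text
        else:
            tagged.setdefault(lang.lower(), text)
    # Try each preferred language, truncating subtags from the end ("fr-CA" -> "fr").
    for rng in priority_list:
        subtags = rng.lower().split("-")
        while subtags:
            key = "-".join(subtags)
            if key in tagged:
                return tagged[key]
            subtags.pop()
    return untagged  # fall back to the description without xml:lang


print(pick_description(CONFIG, ["fr-CA", "en"]))
```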

14. The "author" element and its attributes (6.7). The uri and email attributes should both be of the IRI flavor (that is, allow non-ASCII values), although non-ASCII email addresses are currently controversial. 

15. The "content" element (6.11). The default 'type' is "text/html" with no charset parameter. It would be better if the default included a charset. The 'src' attribute uses the word "URI", where you should say "IRI".
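As an illustration of why an explicit charset helps, the sketch below resolves the character encoding from a content element's type value with Python's email.message.Message, which parses MIME parameters. The fallback of "utf-8" when no charset is given is an assumption made for the example, not something the current draft specifies; the constant and function names are hypothetical.

```python
from email.message import Message
from typing import Optional

DEFAULT_TYPE = "text/html"  # the draft's default 'type'
ASSUMED_FALLBACK = "utf-8"  # hypothetical: the draft names no default charset


def content_charset(type_attr: Optional[str], fallback: str = ASSUMED_FALLBACK) -> str:
    """Return the (lowercased) charset parameter of a media type, or the fallback."""
    msg = Message()
    msg["Content-Type"] = type_attr or DEFAULT_TYPE
    return msg.get_content_charset(fallback)


print(content_charset("text/html; charset=ISO-8859-1"))
print(content_charset(None))
```

Without a charset in the default, every processor has to supply its own fallback, which is exactly where implementations can disagree.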

16. Section 7. "widget locale". This is specified as:

--
widget Locale
    The system locale as an RFC3066 language code (eg. en-us)
--

This should say "The system locale as a BCP 47 language tag (e.g. en-US or zh-Hant-TW)"
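BCP 47 tags are case-insensitive, but RFC 5646 (section 2.1.1) recommends a conventional case: lowercase language, titlecase four-letter script, uppercase two-letter region, and lowercase for everything from the first singleton onward. The sketch below (hypothetical function name) produces that form, turning the draft's "en-us" into "en-US". It does not validate tags against the IANA registry.

```python
def canonical_case(tag: str) -> str:
    """Apply the RFC 5646 section 2.1.1 case conventions to a language tag."""
    out = []
    after_singleton = False
    for i, sub in enumerate(tag.split("-")):
        low = sub.lower()
        if len(sub) == 1:
            # A singleton (e.g. "x", "u", or the "i" in "i-default"):
            # it and everything after it stay lowercase.
            after_singleton = True
            out.append(low)
        elif i == 0 or after_singleton:
            out.append(low)
        elif len(sub) == 4 and sub.isalpha():
            out.append(low.title())  # script, e.g. "Hant"
        elif len(sub) == 2 and sub.isalpha():
            out.append(low.upper())  # region, e.g. "US"
        else:
            out.append(low)  # variants, numeric regions such as "419"
    return "-".join(out)


for tag in ["en-us", "ZH-HANT-TW", "es-419", "i-default"]:
    print(canonical_case(tag))
```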

17. In this same section, "rules for getting text content": I notice that any bidi markup will be removed. 

18. General observation: there is no way to set base directionality (for bidi text) on any of the description/name/etc. elements or for the widget overall. This may make some bidi languages (Arabic, Hebrew, Urdu, etc.) work poorly in a widget.
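Without a way to declare base direction, a widget manager is left to guess it. One common heuristic, sketched below, takes the direction of the first strong character in the string, in the spirit of rule P2 of the Unicode Bidirectional Algorithm (UAX #9) but ignoring isolates. The function name is hypothetical. The heuristic guesses wrong for, e.g., an Arabic description that begins with a Latin brand name, which is one reason an explicit direction setting would be preferable.

```python
import unicodedata
from typing import Optional


def first_strong_direction(text: str) -> Optional[str]:
    """Return "ltr" or "rtl" from the first strong character, else None.

    Simplified: unlike UAX #9 rule P2, isolate sequences are not skipped.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None  # no strong character (digits, punctuation, ...)


print(first_strong_direction("World Clock"))
print(first_strong_direction("\u05e9\u05e2\u05d5\u05df"))  # Hebrew
```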

===

That's it for now.


Best Regards,

Addison


[1] http://www.w3.org/TR/2008/WD-widgets-reqs-20080625/

[1a] http://www.w3.org/TR/widgets/




> -----Original Message-----
> From: Michael(tm) Smith [mailto:mike@w3.org]
> Sent: Wednesday, July 09, 2008 8:10 PM
> To: Phillips, Addison; Addison Phillips
> Cc: Marcos Caceres; Arthur Barstow; Felix Sasaki
> Subject: Re: Widgets i18n feedback
> 
> Addison,
> 
> We really need to get your input on this by the week of July 28 at
> the latest.
> 
>   --Mike
> 
> Marcos Caceres <marcosscaceres@gmail.com>, 2008-07-01 11:41 +1000:
> 
> > Hi Addison,
> > I'm just wondering if you could give us an ETA on the i18n input
> > for the Widget spec?
> > Kind regards,
> > Marcos
> 
> --
> Michael(tm) Smith
> http://people.w3.org/mike/
> http://sideshowbarker.net/

Received on Thursday, 31 July 2008 21:22:17 GMT
