Re: [widgets] i18n

On Tue, May 20, 2008 at 3:50 AM, Phillips, Addison <addison@amazon.com> wrote:
> (personal comments follow)
>
> Section: http://dev.w3.org/2006/waf/widgets/#localization
>
> 1. The "system locale" example is a little odd looking. Most locale systems are of the language-then-region variety. You should probably replace "Australia, Australian English" with something like "English, Australia".

Done, but I might drop "system locale" in later drafts, as it is not
used anywhere else in the document.

>Perhaps provide a reference to the jargon word "locale". Two good references are:
>
>  http://www.w3.org/TR/2004/NOTE-ws-i18n-scenarios-20040730/#IDARXSO
>  http://www.unicode.org/reports/tr35/#Locale

I've added the following note directly underneath the system locale
definition (with the appropriate links):

Note: see also the Web Services Internationalization Usage Scenarios
and the Unicode Locale Data Markup Language for an informative
discussion of the term "locale".

> 2. The description of "localized resource" seems a bit too complicated. It says:
<snip>
> I would tend to say:
> --
> A localized resource is any resource that has been placed inside a localized folder. Its contents might (or might not) be translated, localized, or altered for the given context. For example, a localized folder "ja" (Japanese) might contain an HTML document translated into Japanese. A widget resource MAY contain zero or more localized resources. All resources, except a digital signature, MAY be localized.
> --

I've replaced the current text with your text.

>
> Note that the original phrase "internationalized context" should be avoided in favor of terms like "localized" or possibly "translated". Internationalization is the "how this works" part of this feature.

Understood. I've removed "internationalized context" from the abstract
also (which was the only other occurrence).

> 3. One of the guidelines says:
>
> --
> At runtime, the widget user agent will set the (HTML or XML) base of the start file to the localized folder (even if the start file does not reside inside the localized folder!).
> --
>
> This implies that ALL resources that are localized must appear in the localized folder. That is, if you have files 'a', 'b', and 'c' required by the widget, all of them must appear in the localized folder.

It depends. My understanding is that when the base is set, all
relative URIs are resolved from the (HTML/XML) base of the start file.
If an author wants to use a shared resource (e.g. some JavaScript),
then they can address that resource with an absolute path (one
starting with "/"). E.g. if this is /es/gatos.html:

<!-- base is automatically set to 'widget://[uuid]/es/' -->
<html>
<!-- resolves to widget://[uuid]/scripts/engine.js -->
<script src="/scripts/engine.js"></script>
<img src="gato.gif" alt="resolved to widget://[uuid]/es/gato.gif" />
<!-- authors can even do this, if they really want;
     resolves to widget://[uuid]/shared/images/logo.gif -->
<img src="../shared/images/logo.gif" />
</html>
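
As a rough illustration of the resolution rules I have in mind, here is
a small Python sketch. It is not taken from the spec or from any widget
engine: the helper name resolve() and the literal "uuid" authority are
made up for the example. Python's urljoin() only resolves relative
references for schemes it knows about, so the sketch temporarily swaps
in "http" and then restores the original scheme.

```python
from urllib.parse import urljoin, urlsplit


def resolve(base: str, ref: str) -> str:
    """Resolve ref against base, keeping the (hypothetical) widget scheme."""
    # A reference that already carries a scheme is absolute; leave it alone.
    if urlsplit(ref).scheme:
        return ref
    # urljoin() ignores the base for unknown schemes such as "widget",
    # so resolve against an "http" stand-in and put the scheme back.
    scheme, rest = base.split("://", 1)
    joined = urljoin("http://" + rest, ref)
    return scheme + "://" + joined.split("://", 1)[1]


if __name__ == "__main__":
    base = "widget://uuid/es/"  # assumed base for /es/gatos.html
    for ref in ("gato.gif", "/scripts/engine.js", "../shared/images/logo.gif"):
        print(ref, "->", resolve(base, ref))
```

The three references correspond to the three cases in the markup above:
a file in the localized folder, a path from the widget root, and a
"../" path that climbs out of the localized folder.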

> 4. In the example, the file 'cats.html' has its name localized to 'gatos.html' in the Spanish version (or one might say that 'gatos.html' has its name localized to 'cats.html', I suppose). The root file is called "cats.html" and is described as being in Korean. Really the files should all have the same name. Otherwise, a widget component that refers to 'cats.html' will not load 'gatos.html'---it has no way of knowing what the localized resource is called.

Which start file is loaded is controlled by the config.xml file (or by
the widget engine matching one of the Default Start Files). Because
the i18n model allows every resource to be localized, a widget can
have multiple configuration files, each pointing to a different start
file. Once the base URI has been established, the widget user agent
looks in the localized folder first to see if it can find a
"config.xml" file. If that fails, it drops to the root and searches
for a "config.xml".

For example, for Spanish the base URI would become "/es/". This
would cause the engine to load "/es/config.xml", which contains a
<content> element pointing to "gatos.html":

<widget ...>
   <content src="gatos.html" />
</widget>

The base URI of the config.xml file is also set to the localized
folder. The effect is that you can have many start files and one (or
more) configuration documents:

/config.xml
/cats.html
/en/cats.html
/es/cats.html
/ja/cats.html

Where config.xml is:

<widget ...>
   <!-- resolved against the localized folder first; otherwise resolved
        against the widget root (/cats.html) -->
   <content src="cats.html" />
</widget>

I'll make this behavior clearer in the spec.
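
To make the "localized folder first, then widget root" behavior a bit
more concrete, here is a minimal Python sketch. The function name
find_resource() and the idea of representing the package as a set of
paths are assumptions for illustration only; the spec does not define
any such API.

```python
from typing import Optional, Set


def find_resource(files: Set[str], localized_folder: str, name: str) -> Optional[str]:
    """Return the path of name, preferring the localized folder over the root."""
    localized = f"/{localized_folder}/{name}"
    if localized in files:
        return localized
    # Nothing in the localized folder: drop to the widget root.
    root = f"/{name}"
    if root in files:
        return root
    return None


if __name__ == "__main__":
    # Hypothetical package: one shared config.xml, several start files.
    package = {"/config.xml", "/cats.html", "/en/cats.html", "/es/cats.html"}
    print(find_resource(package, "es", "config.xml"))  # falls back to the root
    print(find_resource(package, "es", "cats.html"))   # found in /es/
    print(find_resource(package, "fr", "cats.html"))   # no /fr/, so the root copy
```

The same function is used for both config.xml and the start file,
which matches the idea that every resource can be localized in the
same way.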

> Section: http://dev.w3.org/2006/waf/widgets/#step-6
>
> 1. In general, I find this section confused about how lookup is intended to work. Lookup is modeled on how most resource systems in languages such as C, Java, PHP, Perl, C#, etc. work: you start with the user's locale and fall back until you find a match. The elaboration in RFC 4647 is that we allow more than one "locale" in a language priority list and we allow for more complex defaulting behavior.

Hmm. Yeah, I got that... but I obviously did not articulate it
correctly :( The lang-priority-list is supposed to hold multiple ranges
in accordance with RFC 4647.

> 2. I note that you've provided for the use of extended language ranges. RFC 4647 says quite a bit about extended language ranges, ending with this requirement, which is not addressed:
>
>     Applications, protocols, or specifications that accept extended
>   language ranges MUST define which item is returned when more than one
>   item matches the extended language range.
>
> Personally, I would strongly recommend that you NOT allow extended language ranges. Web browsers use basic language ranges today and there is no reason I can see for widgets to use them.

The spec uses basic language ranges. I've searched the document but
could not find any reference to extended language ranges.

> 3. You provide an example with a trailing "*" in a range (en-*). Note that, other than in the first position, if you accept extended language ranges the * outside the first position has no meaning and should just be ignored. Note also that the * matches more than just region. For example, it would match "en-Brai" (that's the Braille script subtag).

The purpose of the example was to highlight to implementers that such
ranges can occur and they should be prepared to treat them in
accordance with 4647.

> 4. "lang-priority-list" is introduced here without any reference or definition.

My bad. Forgot to link to the Configuration Defaults table (fixed and
included the text from your next email).

> 5. The phrase "This step makes use of lookup filtering mechanism..." is incorrect. "Filtering" is the *other* type of matching scheme. The proper terminology should be "This step makes use of the lookup matching scheme..."

Fixed.

> 6. This description is unclear:
>
> --
> The aim is to match, in order, one of the subtags in the lang-priority-list to a localized folder in the widget package.
> --
>
> I would tend to say instead:
>
> --
> The aim is to match, in order, one of the language ranges in the lang-priority-list to a localized folder in the widget package.
> --

Replaced the text with your suggestion.

> 7. Step 1 includes the keyword "default". You should also add the special tag "i-default" to this list.

Done.

> 8. Lookup provides clear guidance on a range of '*' in the middle of a list (i.e. there are ranges that appear after it), when it says:
>
>  If the language range "*" is followed by other
>   language ranges, it is skipped.
>
> But your text says:
>
>  If this subtag is a single * and this is not the
>  first subtag in the lang-priority-list, then
>  terminate this algorithm and attempt to locate
>  the configuration document.

Fixed. However (like you state below), this whole section should
probably just refer to 4647.
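
For reference, my reading of the RFC 4647 lookup scheme, applied to
localized folders, is roughly the Python sketch below. It is only a
sketch under some assumptions: the function name lookup() is mine,
folder names are treated as language tags and compared
case-insensitively, only basic language ranges are handled, and a "*"
range is always skipped so that the caller's default applies when
nothing else matches.

```python
from typing import Iterable, Optional


def lookup(priority_list: Iterable[str], folders: Iterable[str],
           default: Optional[str] = None) -> Optional[str]:
    """Return the first folder matched by progressively truncating each range."""
    available = {folder.lower(): folder for folder in folders}
    for language_range in priority_list:
        if language_range == "*":
            continue  # simplification: "*" never selects a folder here
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in available:
                return available[candidate]
            subtags.pop()  # truncate from the end
            # Single-letter or single-digit subtags are removed together
            # with the subtag that followed them.
            while subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


if __name__ == "__main__":
    print(lookup(["en-AU", "es"], ["en", "es", "ja"]))
    print(lookup(["*", "es-MX"], ["es"]))
    print(lookup(["fr"], ["en"], default=None))
```

Each range is tried in priority order, so a truncated match for an
earlier range wins over an exact match for a later one.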

> 9. Step 2 says "subtag". I think you should either call it "range" or call it "tag" (range is preferred here). I think that you would be better off eliminating most of the instructions here and just referencing the algorithm in RFC 4647.

True. Changed it to range regardless.

> 10. There is a normative bit that says:
>
> --
> During lookup, a widget user agent is not required  to alphabetize, or otherwise arrange in order, the file name fields of file entries in the widget archive.
> --
>
> This can be safely removed, as lookup is not affected by alphabetization.

Removed.

> 12. I think David already pointed out the problems with the "ch" example.

Yep, fixed.

> 13. Note that "language" is misspelled in the first paragraph.

Fixed.

> --
>
> If you were to convert from extended ranges and follow the standard algorithm, then your section would be a lot simpler.
>
> Would you prefer if I were to suggest a complete text for this section?

Sure! If you can spare the time, that would be very helpful.

Kind regards,
Marcos
-- 
Marcos Caceres
http://datadriven.com.au

Received on Tuesday, 20 May 2008 06:28:08 UTC