RE: [widgets] i18n from Phillips, Addison on 2008-05-19 (public-i18n-core@w3.org from April to June 2008)

From: Phillips, Addison <addison@amazon.com>
Date: Mon, 19 May 2008 11:05:47 -0700
To: "Phillips, Addison" <addison@amazon.com>, Marcos Caceres <marcosscaceres@gmail.com>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>, "WAF WG (public)" <public-appformats@w3.org>
Message-ID: <4D25F22093241741BC1D0EEBC2DBB1DA013A219169@EX-SEA5-D.ant.amazon.com>

A bit to add: I note that you do have a definition of lang-priority-list, which I missed the first time. It says:

--
The user's language preferences as a basic language range as defined in [BCP47] where the first item represents the user's most preferred language range [BCP47] (eg. "en, fr, es-*", where English is preferred over French, and French is preferred over Spanish of any region, and Spanish of any region is preferred over the default.). Being an basic language range, the widget user agent is not required to check the items of the lang-priority-list for well-formness or validate the the items against the IANA Language Subtag Registry. Widget user agents may, nevertheless, attempt to populate the lang-priority-list with valid language subtags that are well-formed (which can help with canonicalization and matching obselete and grandfathered subtags to ).
--

This isn't quite right. I would eliminate the reference to the range "es-*". The trailing "-*" is superfluous. The phrase about well-formedness is also not quite correct. What's missing is that language *ranges* do not use the same grammar as language *tags*--they use the simpler syntax used by RFC 3066 and RFC 2616.

Note also that the "MAY" in the last sentence really should be a "SHOULD".

What you should probably say is:

--
The user's language preferences as a comma-separated list of basic language ranges as defined in [BCP47]. The first item represents the user's most preferred language range [BCP47], followed by the next most preferred language range and so forth. For example, "en, fr, es", where English is preferred over French, and French is preferred over Spanish, and Spanish is preferred over the default. Each language range MUST match the 'language-range' production in RFC 4647 (part of BCP 47) [improperly formed ranges are ignored]. Widget user agents are not required to validate the contents of a range against the IANA Language Subtag Registry. Widget user agents SHOULD, nevertheless, attempt to populate the lang-priority-list with valid language subtags that are well-formed (which can help with canonicalization and matching obsolete and grandfathered subtags to ).
--
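
As an illustration of the suggested text, here is a minimal Python sketch of how a widget user agent might build the list. The function name parse_lang_priority_list is hypothetical. The pattern follows the 'language-range' production in RFC 4647, section 2.1, and no validation against the IANA Language Subtag Registry is attempted.

```python
import re

# 'language-range' production from RFC 4647, section 2.1:
#   language-range = (1*8ALPHA *("-" 1*8alphanum)) / "*"
BASIC_RANGE = re.compile(r"(?:[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*|\*)")


def parse_lang_priority_list(value):
    """Split a comma-separated list and ignore improperly formed ranges.

    Hypothetical helper: order is preserved, so the first item returned
    is the user's most preferred language range.
    """
    ranges = []
    for item in value.split(","):
        candidate = item.strip()
        # fullmatch() requires the whole item to fit the production.
        if BASIC_RANGE.fullmatch(candidate):
            ranges.append(candidate)
    return ranges
```

With this sketch, "en, fr, es" yields three ranges in order, while an item such as "es-*" is ignored because it is not a basic language range.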

Best Regards,

Addison

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.


> -----Original Message-----
> From: public-i18n-core-request@w3.org [mailto:public-i18n-core-
> request@w3.org] On Behalf Of Phillips, Addison
> Sent: Monday, May 19, 2008 10:50 AM
> To: Marcos Caceres; public-i18n-core@w3.org; WAF WG (public)
> Subject: RE: [widgets] i18n
>
> (personal comments follow)
>
> Section: http://dev.w3.org/2006/waf/widgets/#localization
>
> 1. The "system locale" example is a little odd looking. Most locale
> systems are of the language-then-region variety. You should probably
> replace "Australia, Australian English" with something like "English,
> Australia". Perhaps provide a reference to the jargon word "locale".
> Two good references are:
>
>   http://www.w3.org/TR/2004/NOTE-ws-i18n-scenarios-20040730/#IDARXSO
>   http://www.unicode.org/reports/tr35/#Locale
>
> 2. The description of "localized resource" seems a bit too complicated.
> It says:
>
> --
> A localized resource is a resource that may have had some aspect
> localized for use in an internationalized context (eg. a HTML document
> written in Japanese) and has been placed inside a localized folder. A
> widget resource may contain zero or more localized resources. All
> resources, except a digital signature, can be localized.
> --
>
> I would tend to say:
>
> --
> A localized resource is any resource that has been placed inside a
> localized folder. Its contents might (or might not) be translated,
> localized, or altered for the given context. For example, a localized
> folder "ja" (Japanese) might contain an HTML document translated into
> Japanese. A widget resource MAY contain zero or more localized
> resources. All resources, except a digital signature, MAY be localized.
> --
>
> Note that the original phrase "internationalized context" should be
> avoided in favor of terms like "localized" or possibly "translated".
> Internationalization is the "how this works" part of this feature.
>
> 3. One of the guidelines says:
>
> --
> At runtime, the widget user agent will set the (HTML or XML) base of
> the start file to the localized folder (even if the start file does not
> reside inside the localized folder!).
> --
>
> This implies that ALL resources that are localized must appear in the
> localized folder. That is, if you have files 'a', 'b', and 'c' required
> by the widget, all of them must appear in the localized folder.
>
> 4. In the example, the file 'cats.html' has its name localized to
> 'gatos.html' in the Spanish version (or one might say that 'gatos.html'
> has its name localized to 'cats.html', I suppose). The root file is
> called "cats.html" and is described as being in Korean. Really the
> files should all have the same name. Otherwise, a widget component that
> refers to 'cats.html' will not load 'gatos.html'---it has no way of
> knowing what the localized resource is called.
>
> Section: http://dev.w3.org/2006/waf/widgets/#step-6
>
> 1. In general, I find this section confused about how lookup is
> intended to work. Lookup is modeled on how most resource systems (in
> languages such as C, Java, PHP, Perl, C#, etc.) work: you start with
> the user's locale and fall back until you find a match. The elaboration
> in RFC 4647 is that we allow more than one "locale" in a language
> priority list and we allow for more complex defaulting behavior.
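
The fallback described in point 1 can be sketched roughly as follows in Python. This is a simplified illustration of the RFC 4647 lookup scheme (section 3.4), not the widget spec's algorithm. The names fallback_chain and lookup, and the idea of passing localized folder names as a list, are assumptions made for the example.

```python
def fallback_chain(language_range):
    """Yield the range, then progressively shorter forms of it.

    Subtags are removed from the end. A single-character subtag
    (such as "x") is removed together with the subtag that followed it.
    """
    subtags = language_range.split("-")
    while subtags:
        yield "-".join(subtags)
        subtags.pop()
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()


def lookup(priority_list, available_folders, default=None):
    """Return the first folder matched by the priority list, else default."""
    # Matching is case-insensitive; keep the original folder spelling.
    folders = {name.lower(): name for name in available_folders}
    for language_range in priority_list:
        if language_range == "*":
            # In lookup, "*" selects nothing by itself, so skip it.
            continue
        for candidate in fallback_chain(language_range.lower()):
            if candidate in folders:
                return folders[candidate]
    return default
```

For example, with folders "en" and "ja" and a priority list of "fr-CA, en-AU", the sketch tries "fr-ca", "fr", "en-au", and then matches "en".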
>
> 2. I note that you've provided for the use of extended language ranges.
> RFC 4647 says quite a bit about extended language ranges, ending with
> this requirement, which is not addressed:
>
>      Applications, protocols, or specifications that accept extended
>    language ranges MUST define which item is returned when more than
>    one item matches the extended language range.
>
> Personally, I would strongly recommend that you NOT allow extended
> language ranges. Web browsers use basic language ranges today and there
> is no reason I can see for widgets to use extended ones.
>
> 3. You provide an example with a trailing "*" in a range (en-*). Note
> that, if you accept extended language ranges, a * outside the first
> position has no meaning and should just be ignored. Note also that the
> * matches more than just region. For example, it would match "en-Brai"
> (that's the Braille script subtag).
>
> 4. "lang-priority-list" is introduced here without any reference or
> definition.
>
> 5. The phrase "This step makes use of lookup filetering mechanism..."
> is incorrect. "Filtering" is the *other* type of matching scheme. The
> proper terminology should be "This step makes use of the lookup
> matching scheme..."
>
> 6. This description is unclear:
>
> --
> The aim is to match, in order, one of the subtags in the lang-priority-
> list to a localized folder in the widget package.
> --
>
> I would tend to say instead:
>
> --
> The aim is to match, in order, one of the language ranges in the lang-
> priority-list to a localized folder in the widget package.
> --
>
> 7. Step 1 includes the keyword "default". You should also add the
> special tag "i-default" to this list.
>
> 8. Lookup provides clear guidance on a range of '*' in the middle of a
> list (i.e. there are ranges that appear after it), when it says:
>
>   If the language range "*" is followed by other
>    language ranges, it is skipped.
>
> But your text says:
>
>   If this subtag is a single * and this is not the
>   first subtag in the lang-priority-list, then
>   terminate this algorithm and attempt to locate
>   the configuration document.
>
> 9. Step 2 says "subtag". I think you should either call it "range" or
> call it "tag" (range is preferred here). I think that you would be
> better off eliminating most of the instructions here and just
> referencing the algorithm in RFC 4647.
>
> 10. There is a normative bit that says:
>
> --
> During lookup, a widget user agent is not required  to alphabetize, or
> otherwise arrange in order, the file name fields of file entries in the
> widget archive.
> --
>
> This can be safely removed, as lookup is not affected by
> alphabetization.
>
> 12. I think David already pointed out the problems with the "ch"
> example.
>
> 13. Note that "language" is misspelled in the first paragraph.
>
> --
>
> If you were to convert from extended ranges and follow the standard
> algorithm, then your section would be a lot simpler.
>
> Would you prefer if I were to suggest a complete text for this section?
>
> Best Regards,
>
> Addison
>
> Addison Phillips
> Globalization Architect -- Lab126
>
> Internationalization is not a feature.
> It is an architecture.
>
> > -----Original Message-----
> > From: Marcos Caceres [mailto:marcosscaceres@gmail.com]
> > Sent: Monday, May 19, 2008 12:10 AM
> > To: Phillips, Addison; public-i18n-core@w3.org; WAF WG (public)
> > Subject: Re: [widgets] i18n
> >
> > Hi Addison, All,
> > I've attempted to make the widget spec conformant with BCP 47. In
> > particular, I've deferred searching for localized folders to RFC4647
> > using "lookup" against a language priority list composed of a basic
> > language range. WAF would be grateful if (i18n) people could take a
> > look at the current draft in the spec and make sure that it is ok (so
> > we can close issue 23 [1]). The i18n text is fairly short (only about
> > 1 page), so it should not take much time to read, and, hopefully,
> > respond to.
> >
> > In the widget spec, the localization definitions and author
> > requirements are described in section:
> > http://dev.w3.org/2006/waf/widgets/#localization
> >
> > The i18n processing model (which makes use of RFC4647) is described in
> > section:
> > http://dev.w3.org/2006/waf/widgets/#step-6
> >
> > Hope it all makes sense.
> >
> > Looking forward to hearing your feedback,
> > Marcos
> >
> > [1] http://www.w3.org/2005/06/tracker/waf/issues/23
> >
> > --
> > Marcos Caceres
> > http://datadriven.com.au
Received on Monday, 19 May 2008 18:06:31 UTC