Re: Widgets 1.0 Packaging and Configuration: I18N comments...

Dear i18n WG,
Inline comments below, plus two quick final questions.

On Thu, Jan 29, 2009 at 10:56 PM, Phillips, Addison <addison@amazon.com> wrote:
> Dear Webapps WG,
>
> The Internationalization Core WG has reviewed the following document: http://www.w3.org/TR/2008/WD-widgets-20081222/
>
> Here are our comments:
>
> 1. In section 7, starting in 7.3 and encompassing each of the text bearing elements that follow, "xml:lang" is defined as an attribute (good!), but the definition refers to "basic language range". I don't believe this is what is intended. The value of xml:lang is a language tag. Ranges are used for selecting which tagged item to display. For example, an item tagged as <name xml:lang="pt"/> might be selected for display in cases where the default locale is "pt-BR". In that example, "pt-BR" is the range and "pt" is the value (tag) matching it.
>

Whoops! Fixed.
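
To make the tag/range distinction in comment 1 concrete, here is a
minimal sketch of lookup-style matching, in which the user's range is
truncated until it equals one of the available tags. The function name
and its arguments are hypothetical, and the truncation rule is a
simplified reading of RFC 4647 lookup; none of this is text from the
widgets spec.

```python
def lookup(language_range, tags):
    """Return the first available tag matched by the range, or None.

    `language_range` is what the user asks for (e.g. "pt-BR");
    `tags` are the xml:lang values actually present (e.g. ["en", "pt"]).
    """
    # Compare case-insensitively, but return the tag as the author wrote it.
    available = {tag.lower(): tag for tag in tags}
    subtags = language_range.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in available:
            return available[candidate]
        # Truncate the range by one subtag and try again.
        subtags.pop()
        # A single-character subtag (such as "x") cannot end a tag,
        # so drop it as well.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return None
```

With this sketch, lookup("pt-BR", ["en", "pt"]) selects the item tagged
"pt", which is the case described in the comment above.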

> 2. Section 7.4 (Widget) The various language bearing elements such as <name>, <description>, etc. are of the zero-or-one type. However, it is typically better to allow any number of these elements to occur, provided that none share the same xml:lang. This allows for localization (which is part of the point in allowing xml:lang on the element).
>

We followed "Best Practice 12: Working with multilingual documents" in
Best Practices for XML Internationalization [1], where it says we
should have different documents for this kind of localization (to
achieve what you propose, we allow multiple configuration documents in
a widget).

Does the i18n Core WG recommend that we drop support for multiple
configuration documents and instead allow multiple elements that
differ by xml:lang, in the manner suggested above? We have built a lot
of infrastructure around the current model in the spec, so if it's all
the same we would prefer to keep it.
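
For reference, the constraint proposed in comment 2 (any number of
elements, provided none share the same xml:lang) could be checked
roughly as follows. This is only a sketch: the function name, the
default set of element names, and the choice to look only at direct
children of the root are all assumptions, not anything the spec says.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def duplicate_localizations(config_xml, names=("name", "description")):
    """Return sorted (element, language) pairs that occur more than once.

    `names` is a hypothetical list of language-bearing elements to check.
    Elements without xml:lang are counted under the empty string.
    """
    root = ET.fromstring(config_xml)
    counts = Counter()
    for child in root:
        # Strip any "{namespace}" prefix so the check works with or
        # without a namespace declaration on the root element.
        local_name = child.tag.rsplit("}", 1)[-1]
        if local_name in names:
            lang = (child.get(XML_LANG) or "").lower()
            counts[(local_name, lang)] += 1
    return sorted(key for key, n in counts.items() if n > 1)
```

An empty result would mean no two checked elements share a language
tag.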

> 3. Section 7.11 (content element). The charset attribute "assumes" UTF-8 if charset is not present. Note that if the encoding isn't UTF-8, this can almost always be detected reliably and an error can be generated (or some other fallback assumed). Probably the best pattern would be:
>
>  - if charset is present, use that encoding
>  - if charset is absent, check if it is UTF-8
>  - if not UTF-8, assume Cp437 (or ISO 8859-1 if that's more appropriate)

Done. The section now reads:

[[
If the charset attribute is used and the widget start file is not
null, then check if the encoding represented by the value of the
charset attribute is supported by the user agent. If the encoding is
supported, then let the start file encoding be the value of the
charset attribute. If the encoding is invalid or unsupported by the
user agent then a user agent may use the default encoding (ISO-8859-1)
or treat the widget as an invalid widget. Otherwise, the attribute is
in error and must be ignored.

If the charset attribute was not used and the widget start file is not
null, check if the first three bytes of the widget start file match
the sequence EF BB BF. If there is a match, then let start file
encoding be the value UTF-8.
]]
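
As a sanity check on the wording above, the two paragraphs can be
sketched as a single function. The function and constant names are
hypothetical, "supported by the user agent" is approximated by whether
Python knows the codec, and the final fallback to ISO-8859-1 when no
BOM is found is an assumption based on the pattern suggested in comment
3; the quoted text itself does not cover that case. Where the text
lets a user agent either fall back or reject the widget, this sketch
takes the fallback option.

```python
import codecs

DEFAULT_ENCODING = "ISO-8859-1"
UTF8_BOM = b"\xef\xbb\xbf"  # the byte sequence EF BB BF


def start_file_encoding(charset, start_file):
    """Pick the start file encoding.

    `charset` is the attribute value, or None if the attribute is absent.
    `start_file` is the start file's bytes, or None if there is no file.
    """
    if start_file is None:
        return None
    if charset is not None:
        try:
            # Stand-in for "is the encoding supported by the user agent".
            codecs.lookup(charset)
        except LookupError:
            # Invalid or unsupported: fall back to the default encoding
            # (a user agent may instead treat the widget as invalid).
            return DEFAULT_ENCODING
        return charset
    # No charset attribute: look for a UTF-8 byte order mark.
    if start_file[:3] == UTF8_BOM:
        return "UTF-8"
    # Assumption: not stated in the quoted text.
    return DEFAULT_ENCODING
```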

Should we recommend that values for the charset attribute come from
the IANA charset registry?
http://www.iana.org/assignments/character-sets

If so, is there anything we should make implementers and authors aware of?

> 4. Section 7.15 (ITS tags). Thank you for including ITS support and support for Bidi in particular.

Thanks to i18n WG, particularly Felix, for helping us integrate ITS
into the spec.

Thanks again for taking the time to review our spec! It has been a
huge help and is very much appreciated. We are currently revising our
i18n model a bit, so we might need your feedback again during the
second Last Call.

Kind regards,
Marcos

[1] http://www.w3.org/TR/xml-i18n-bp/#DevMLDoc

--
Marcos Caceres
http://datadriven.com.au

Received on Sunday, 22 February 2009 21:29:11 UTC