Re: FW: Widgets i18n feedback

Hi Addison, i18n WG,
Firstly, thank you again for taking the time to provide feedback. Your
comments have substantially improved both the Requirements and the
Packaging specification. Please see below for details on how I executed
the changes you recommended. For record-keeping purposes, as required
by the disposition of comments, can you, or a representative from the
i18n WG, please acknowledge that the WG is satisfied with the changes
I have made to the Requirements document?

On Fri, Aug 1, 2008 at 7:21 AM, Phillips, Addison <addison@amazon.com> wrote:
> Hi,
>
> This note is on behalf of the I18N Core WG. The comments (below) that I sent to members of your WG previously have been endorsed by the I18N Core WG [1], with the following additional comments:
>
> 1. In my comment #3 below, we would strongly prefer the second suggested paragraph (MUST rather than SHOULD).

Done, but with some modifications. Please see the inline comments
below and feel free to make additional comments.

> 2. In my comment #15 below, we feel that it would be best if the charset parameter were required (MUST) and think that it should at least be strongly recommended (via the SHOULD keyword).

Added a "charset" attribute with UTF-8 as the default charset. See below.

> Finally, (this comment is personal and not yet endorsed by I18N-WG), I note that the text in [2] still has a few flaws (this is "Step 6"), as it is confusing and doesn't really describe the lookup algorithm cleanly. The first two paragraphs are fine, but the remainder I would suggest changing to read (notes on changes follow //):
>
> --
> The algorithm to determine the base URI and widget locale are as follows:
>
>   1. If the lang-priority-list is empty, or null, or i-default, or contains a single *, or is a sequence of items that only contain the * character, then terminate this algorithm and attempt to locate the configuration document.
> // deleted the 'default' keyword
>   2. For each range in the lang-priority-list:
>         a. If this range is a single *, then terminate this algorithm and attempt to locate the configuration document.
> // dropped "and this is the first subtag in the l-p-l"
>         b. Else if this range begins with the subtag '*', then skip this range and, skipping all the steps below, repeat this step 2.
>         c. Else if this range contains a subtag "*", remove the "*" and its preceding hyphen and continue. For example, "en-*-US" becomes "en-US".
> // skip ranges that start with *, which are indeterminate
>         d. Case-insensitively compare the range to each file name field for each file entry that is a folder in the widget resource.
>            i. If they match:
>               1. Let widget locale be the name of the folder that matched the current range in lowercase form.
>               2. Let base URI be an absolute URI reference to this same folder.
>               3. Terminate this algorithm and attempt to locate the configuration document
>            ii. Else, remove the last subtag of the range and repeat this step 2d. For example, if the range is currently "en-US", make the range "en".
>   3. If no match is made, attempt to locate the configuration document.
>
> For example, if the range is "en-AU" and a match is made on file entry whose name is "en-au/config.xml", then base URI would be "widget://f81d4fae-7dec-11d0-a765-00a0c91e6bf6/en-au", and the widget locale would be "en-au".
>
> For example, if the language priority list is "de-CH,fr-CH,it-CH" and folders included "de", "fr-FR", and "IT", then the folder "de" would be matched, then base URI would be "widget://f81d4fae-7dec-11d0-a765-00a0c91e6bf6/de" and the widget locale would be "de".
>
>
> --

Ok, I used your text but made some tiny stylistic changes (the two
versions are essentially identical). Thank you again for helping us
with this section; it has solved a lot of problems.
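
As a rough illustration of the quoted lookup steps (a sketch only, not
spec text), here is how I read them in Python. The function name
find_locale_folder and its parameters are made up for this example,
and I am assuming that "i-default" only terminates the algorithm when
it is the sole entry in the list.

```python
from typing import Iterable, Optional, Tuple


def find_locale_folder(
    priority_list: Iterable[str],
    folders: Iterable[str],
    widget_root: str,
) -> Optional[Tuple[str, str]]:
    """Return (widget locale, base URI), or None when no folder matches.

    None means "go straight to locating the configuration document".
    """
    ranges = [r.strip() for r in (priority_list or []) if r.strip()]

    # Step 1: empty list, lone "i-default" (assumption), or nothing but "*".
    if not ranges or [r.lower() for r in ranges] == ["i-default"]:
        return None
    if all(r == "*" for r in ranges):
        return None

    # Folder names are compared case-insensitively.
    by_lower = {name.lower(): name for name in folders}

    for rng in ranges:
        if rng == "*":                      # step 2a
            return None
        subtags = rng.split("-")
        if subtags[0] == "*":               # step 2b: skip this range
            continue
        # Step 2c: drop inner wildcards, e.g. "en-*-US" becomes "en-US".
        subtags = [s for s in subtags if s != "*"]
        while subtags:                      # step 2d
            candidate = "-".join(subtags).lower()
            if candidate in by_lower:
                folder = by_lower[candidate]
                return folder.lower(), f"{widget_root}/{folder}"
            subtags.pop()                   # step 2d.ii: truncate and retry

    return None                             # step 3


ROOT = "widget://f81d4fae-7dec-11d0-a765-00a0c91e6bf6"
match = find_locale_folder(["de-CH", "fr-CH", "it-CH"], ["de", "fr-FR", "IT"], ROOT)
```

With the folders from the second quoted example, match holds the
locale "de" and a base URI ending in "/de".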

<snip>
>
> -----Original Message-----
> From: Phillips, Addison
> Sent: Thursday, July 10, 2008 2:02 PM
> To: public-i18n-core-comments@w3.org
> Cc: Marcos Caceres; Arthur Barstow; Felix Sasaki; mike@w3.org
> Subject: RE: Widgets i18n feedback
>
> Hi,
>
> My personal comments on the Widgets specs located at [1] follow. I have copied a few members of the WebApps WG on this message so they can see progress; these comments will be a Topic in our next teleconference, for consideration as the Internationalization WG's official comments.
>
> Comments follow.
>
> First, the requirements document:
>
> 1. In the introduction we see the (much better) comment:
>
> --
> As argued by the Widget Landscape  document, there is currently no formally standardized way to author, package, digitally sign and internationalize a widget resource for distribution and deployment on the Web.
> --
>
> The widget-land document focuses on localization of widgets, which is important. This document should provide a solution to the above and this should be referred to as "localization". Internationalization remains a problem because JavaScript has no locale facet. Internationalized formatting and processing is only possible as long as one is happy with the default system locale and formats on the host platform. Note: this is usually less of a problem for widgets than for AJAX style interactions in a Web page, since most widgets are perceived as applications running locally, whereas most Web properties manage the user language/locale themselves and need a locale in JS to "do the right thing".
>

Agreed. I'll make sure I expand the Landscape document to highlight
the point you make above.

> 2. [R5] "For example, addressing a resource via an IRI (e.g. new Image('../images/pane.png'))."
>
> The use of IRI here is good to see.

Thanks.

> 3. [R6] Multilingual Resource Names. The current text is not really as strong as I'd like it to be (as you probably would suspect). I'm also not sure that it's quite correct. The current text reads:
>
> --
> A conforming specification SHOULD recommend a packaging format that is suitable for multilingual contexts, giving authors the ability to name files and directories using characters from the Unicode character repertoire; in such a case, a conforming specification SHOULD recommend the UTF-8 encoding.
> --
>
> Keeping the same "strength" of requirement, I would probably phrase this as:
>
> --
> A conforming specification SHOULD recommend a packaging format that allows for non-ASCII characters in file and directory names, allowing authors to create widgets suitable for various cultures and languages, as well as multilingual contexts. The packaging format MUST either provide for a declaration of the character encoding used or specify what it is. The UTF-8 character encoding SHOULD be either the default (if multiple encodings are allowed) or sole encoding used.
> --
>
> It would be far better to strongly require encoding support:
>
> --
> A conforming specification MUST declare the character encoding of file and directory names used in the packaging format and SHOULD use the UTF-8 character encoding. If the UTF-8 encoding is not used, the specific encoding MUST either be specified by the packaging format or be specifiable in the package itself. Since packaged widgets are widely distributed, variation in character encoding between different platforms or configurations may render a widget with non-ASCII resources inoperable or otherwise degrade the user experience unless a common character encoding is used.
> --

Ok, I added the second paragraph. I am a bit reluctant to use the
third paragraph as there is a general lack of support for the Unicode
features provided by Zip 6.3 in operating systems and zipping tools. I
don't see this changing any time soon. What I did instead was take the
second part of the suggested third paragraph and add it to the
Rationale. The requirement now reads:

----
R7. Multilingual Resource Names

A conforming specification MUST recommend a packaging format that
allows for non-ASCII characters in file and directory names, allowing
authors to create widgets suitable for various cultures and languages,
as well as multilingual contexts. The packaging format MUST either
provide for a declaration of the character encoding used or specify
what it is. The UTF-8 character encoding SHOULD be either the default
(if multiple encodings are allowed) or sole encoding used.

Rationale:
To allow authors to create files and folders using characters beyond
the ASCII character repertoire. Since packaged widgets are widely
distributed, variation in character encoding between different
platforms or configurations may render a widget with non-ASCII
resources inoperable or otherwise degrade the user experience unless a
common character encoding is used.
----

The reality is that we are writing this requirement around Zip (or,
more precisely, the Zip 6.3 spec). I do, however, think that Zip can
meet these requirements... particularly anything after Zip v6.3, which
supports UTF-8 and the declaration of other encodings (though PKWare's
solution of inserting the name of the encoding into the "extended
language encoding extra field" might cause some problems because of
lack of standardization in the names of encodings).

> 4. [R7] Internationalization guidelines. Really this should be "localization guidelines", since internationalization support is not really dealt with anywhere.

Fixed.

> 5. [R14] Authorship and Widget Metadata. Note that the author's name, email, and organization can all be non-ASCII values.

To address this, I've added the following sentence to R13: "A
conforming specification MUST recommend that configuration documents
be encoded in UTF-8."

> Next document: [1a]
>
> 6. "File and Folder names". This section contains the following text:
>
> --
> Author requirements: The zip relative path must be encoded as either [CP437] or [UTF-8]. Encoding the file name field using [UTF-8] is recommended. If the zip relative path is encoded using [UTF-8], then the general purpose bit 11 of the local file header must  be set to 1, otherwise it must be set to 0.
> --
>
> This is good (and addresses my comment 3).
>

Great.
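
For anyone wanting to check a package against that rule, here is a
minimal sketch using Python's standard zipfile module. The helper name
declared_utf8 is made up for this example. Note that zipfile reports
the flags it reads from the central directory rather than from each
local file header, so this is only an approximation of the check the
spec text describes.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11 (1 << 11)


def declared_utf8(zip_bytes: bytes) -> dict:
    """Map each entry name to whether its UTF-8 flag (bit 11) is set."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {
            info.filename: bool(info.flag_bits & UTF8_FLAG)
            for info in zf.infolist()
        }


# Build a small archive in memory. Python's zipfile sets bit 11 itself
# when an entry name cannot be encoded as ASCII.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("config.xml", "<widget/>")
    zf.writestr("café/index.html", "<p>bonjour</p>")

flags = declared_utf8(buf.getvalue())
```

Here the ASCII-only entry comes back without the flag, and the entry
with the non-ASCII folder name comes back with it set.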

> 7. (same section). The text says:
>
> --
> Author requirements: It is recommended that authors keep their path lengths below 255 characters. Having excessively long path names (eg. over 120 characters) can also result in interoperability issues on some operating systems.
> --
>
> I'm pretty sure you do not mean "characters" here. You probably mean "bytes", since that is the limitation on some operating environments. In a multibyte encoding, such as UTF-8, this means that there may be fewer than 255 characters in a 255 byte sequence (as few as 63 in the worst case). Please use the word "bytes" here if that is what is meant and include a note along the lines of: "Note that a character can require more than one byte to encode, making the length of the path in characters less than 255."
>

That is correct. Changed it to "bytes" and included your note.
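
To make the difference concrete, a small Python sketch (the constant
MAX_PATH_BYTES and both helper names are invented for this example;
255 is the recommended limit discussed above):

```python
MAX_PATH_BYTES = 255  # recommended upper bound, counted in bytes


def path_length_in_bytes(path: str, encoding: str = "utf-8") -> int:
    """Length of the path once encoded, which is what file systems limit."""
    return len(path.encode(encoding))


def within_recommended_length(path: str) -> bool:
    return path_length_in_bytes(path) <= MAX_PATH_BYTES


ascii_path = "images/pane.png"
cjk_path = "画像/パネル.png"

char_count = len(cjk_path)                    # counts characters
byte_count = path_length_in_bytes(cjk_path)   # counts UTF-8 bytes
```

The second path is 10 characters long but takes 20 bytes in UTF-8, so
a limit expressed in characters would overstate what fits.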

> 8. [non-i18n] I think you mean to s/260/255 in this note:
>
> Note for implementers: as this specification does not put a restriction on path length, implementers need to be prepared to deal with path lengths longer than 260 characters.
>

This is intended as a warning for Windows implementers. The actual path
length supported by Windows XP/Vista is 259 characters (255, plus the
"x:\" prefix and the terminator). We did some research about this
a while ago. I've read that this restriction goes away when Windows is
running in Unicode, but I've never seen it first hand.

> 9. [non-i18n] The "valid zip relative path" ABNF is flawed. It has a production called 'ascii-chars', but both utf8-chars and cp437-chars use 'ascii-range' instead.
>

Fixed.

> 10. Section 6. The example has no language attributes, non-ASCII, or IRI-style values. Probably this example is fine, but it is always nice to see examples that use international capabilities.
>

I've added a note to include this in the example. I might get you to
check it once I get around to actually doing it.

> 11. "Attribute Values and Types". Put quotes around the example numbers to make clear where they are. Some cultures use comma as a decimal separator and it will be clearer what your examples are if you do this.
>

Good point. Added quotes to examples.

> 12. "URI attribute" You specify BOTH URI and IRI here. And you specify path as being strictly URI (3986). But elsewhere you consistently use the term IRI. Even more confusingly, the word "token" doesn't even appear in 3986 and appears only once (in another context) in 3987. I think you should specify the specific format, preferably IRI.
>

Ok, I've marked this as an issue in the spec. I will go back and clean
up everything to make it consistent with IRIs. However, I want to
finish the Requirements document before I dive back into the Packaging
spec. I should have the URI/IRI issue addressed by the end of August.

> 13. The "name" and "description" elements (6.5, 6.6). Here are human readable strings with no xml:lang. You should allow an xml:lang on these elements.

Ok, I've added xml:lang as an optional attribute on the name and
description elements. I've also allowed xml:lang on license and on the
widget element. When applied to the widget element, it means that name,
description, and license are in the language specified in xml:lang
unless overridden by an xml:lang declared on the element itself. E.g.:

<widget xml:lang="en-au">
     <license>bla bla</license>
     <name xml:lang="fr">.. </name>
</widget>
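
A minimal sketch of how a parser might resolve the language in effect
for each element, using Python's xml.etree.ElementTree. The function
name effective_langs is made up for this example, and this is only an
illustration of the inheritance rule, not the spec's processing model.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_langs(xml_text: str) -> dict:
    """Map each element tag to the xml:lang value in effect for it."""
    result = {}

    def walk(elem, inherited):
        # An element's own xml:lang wins; otherwise it inherits.
        lang = elem.get(XML_LANG, inherited)
        result[elem.tag] = lang
        for child in elem:
            walk(child, lang)

    walk(ET.fromstring(xml_text), None)
    return result


doc = """<widget xml:lang="en-au">
     <license>bla bla</license>
     <name xml:lang="fr">.. </name>
</widget>"""

langs = effective_langs(doc)
```

For the fragment above, license inherits "en-au" from widget, while
name keeps its own "fr".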

>Ideally, you should permit multiple descriptions (at least) in multiple languages. These elements are often displayed in widget management environments (before the widget is invoked or running).
>

From my reading of "Best Practice 12: Working with multilingual
documents" of the Best Practices for XML Internationalization [1],
what you are suggesting is considered bad practice. We followed the
recommendation of [1] and made authors declare internationalized
content in separate documents.

> 14. The "author" element (and its attributes) (6.7) The uri and email attributes should both be of the IRI-flavor, although non-ASCII email names are currently controversial.
>

I will fix the author's URL to be an IRI once I go through and fix up
all the URI/IRI issues.
I think I will leave the email as a string, rather than requiring
authors to declare them as an IRI... though it might be that emails
are represented as IRIs internally. I've made a note in the spec as a
potential issue.

> 15. The "content" element (6.11) The default 'type' is "text/html" with no charset. It would be better if the default included a charset. The 'src' attribute uses the word "URI", where you should say "IRI".
>

I've added a charset attribute. The attribute defaults to UTF-8.
However, I think we will still recommend that parsers use sniffing to
determine the encoding when possible (as defined in HTML5).

> 16. Section 7. "widget locale". This is specified as:
>
> --
> widget Locale
>    The system locale as an RFC3066 language code (eg. en-us)
> --
>
> This should say "The system locale as a BCP 47 language tag (e.g. en-US or zh-Hant-TW)"

Fixed.

> 17. In this same section, "rules for getting text content": I notice that any bidi markup will be removed.
>
> 18. General observation: there is no way to set base directionality (for bidi text) on any of the description/name/etc. elements or for the widget overall. This may make some bidi languages (Arabic, Hebrew, Urdu, etc.) work poorly in a widget.
>

I've raised this as an issue [2]. I will need to change the parsing
algorithm to introduce a <span dir="ltr|rtl"> element and possibly a
dir attribute on elements that might require it (e.g. description,
name, etc.)... If you have any suggestions on what would be suitable
here, they would be greatly appreciated.


Thanks again!

[1] http://www.w3.org/TR/xml-i18n-bp/#DevSpan
[2] http://lists.w3.org/Archives/Public/public-webapps/2008JulSep/0388.html
-- 
Marcos Caceres
http://datadriven.com.au

Received on Wednesday, 13 August 2008 04:28:59 UTC