I18N issues for Widgets Spec [Was: Re: [Widgets] ASCII File names - request for comments]

Hi Richard - Marcos is the Editor of the Web Application Format WG's  
Widgets spec (NB. section 2.5):

  <http://www.w3.org/TR/widgets/>

We have some i18n-related issues we would like to discuss with you
and the I18N community, e.g. which characters to allow in file names,
maximum file name lengths, etc. Marcos agreed to summarize our issues.

Is www-international a good place to discuss the issues?

Thanks,

Art
---



On Nov 22, 2007, at 4:53 AM, ext Thomas Roessler wrote:

>
> (+cc Richard Ishida; Richard, despite the subject, there's an i18n
> angle to this.)
>
> On 2007-11-22 17:02:44 +1000, Marcos Caceres wrote:
>
>> The zip relative path will represent one of:
>>
>> * the name of a file (e.g. index.html),
>> * the name of a folder (e.g. logs/),
>> * the name of a folder within a hierarchy of folders (e.g.
>> styles/sounds/),
>> * or the name of a file within a hierarchy of folders (e.g.
>> styles/images/background.png).
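A minimal Python sketch of how a user agent might tell the four kinds of zip relative path apart. It assumes "/" is the only separator and that a trailing "/" marks a folder (the usual Zip convention; the excerpt does not define a grammar). `PathKind` and `classify_zip_path` are hypothetical names.

```python
from enum import Enum


class PathKind(Enum):
    FILE = "file"                    # e.g. index.html
    FOLDER = "folder"                # e.g. logs/
    NESTED_FOLDER = "nested folder"  # e.g. styles/sounds/
    NESTED_FILE = "nested file"      # e.g. styles/images/background.png


def classify_zip_path(path: str) -> PathKind:
    # Assumption: "/" is the only separator and a trailing "/" marks a folder.
    if not path or path.startswith("/"):
        raise ValueError(f"not a relative path: {path!r}")
    is_folder = path.endswith("/")
    body = path[:-1] if is_folder else path
    segments = body.split("/")
    # Assumption: empty segments such as "a//b" are rejected.
    if any(segment == "" for segment in segments):
        raise ValueError(f"empty path segment in {path!r}")
    nested = len(segments) > 1
    if is_folder:
        return PathKind.NESTED_FOLDER if nested else PathKind.FOLDER
    return PathKind.NESTED_FILE if nested else PathKind.FILE


for p in ["index.html", "logs/", "styles/sounds/", "styles/images/background.png"]:
    print(p, "->", classify_zip_path(p).name)
```

Rejecting a leading "/" and empty segments is an extra assumption of this sketch, not something the quoted text requires.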
>
> Is there a BNF grammar for the zip relative path?
>
>> For each file name field in a Zip archive, the zip relative path
>> must be encoded as either US-ASCII or UTF-8. Other encodings must
>> not be used and if encountered a widget user agent must treat the
>> zip archive as an invalid Zip archive.
>
>> For interoperability, and where possible, encoding in US-ASCII is
>> preferred.
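A minimal sketch of the encoding rule, assuming the raw bytes of each Zip file name field are available. `decode_zip_relative_path` is a hypothetical helper. Python's own `zipfile` module decodes names as CP437 unless general-purpose bit 11 (the UTF-8 flag) is set, so a real check would need the raw bytes rather than `ZipInfo.filename`.

```python
def decode_zip_relative_path(raw: bytes) -> tuple[str, str]:
    """Return (path, encoding) where encoding is "US-ASCII" or "UTF-8".

    Raises ValueError for any other byte sequence, standing in for
    "treat the Zip archive as an invalid Zip archive".
    """
    try:
        return raw.decode("ascii"), "US-ASCII"
    except UnicodeDecodeError:
        pass
    try:
        # Strict UTF-8: rejects malformed, overlong and surrogate sequences.
        return raw.decode("utf-8"), "UTF-8"
    except UnicodeDecodeError:
        raise ValueError(f"file name is neither US-ASCII nor UTF-8: {raw!r}") from None


print(decode_zip_relative_path(b"index.html"))
print(decode_zip_relative_path("caf\u00e9.html".encode("utf-8")))
```

US-ASCII is a subset of UTF-8, so the ASCII branch only labels the result. A strict UTF-8 decode also cannot prove a name was not written in some other encoding: some legacy byte sequences happen to be valid UTF-8 (the Latin-1 bytes for "Ã©" decode as "é").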
>
> Don't say things like that.  Either say that user agents MUST
> support both kinds of relative paths, or pick one.
>
>> Irrespective of encoding, a zip relative path must be treated as
>> case insensitive. As such, if a Zip archive contains two or more
>> file names in the same folder that map to the same string after
>> normalization for caseless matching as described in [Unicode Case
>> Mapping], then the widget user agent must treat the Zip archive as
>> an invalid Zip archive.
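One possible reading of the collision rule, sketched in Python. It assumes "caseless matching" means Unicode canonical caseless matching, i.e. comparing NFD(casefold(NFD(x))), with `str.casefold()` supplying full default case folding; whether [Unicode Case Mapping] intends this variant, a compatibility variant, or simple folding is an assumption. `caseless_key` and `find_caseless_collisions` are hypothetical names.

```python
import unicodedata
from collections import defaultdict


def caseless_key(text: str) -> str:
    # Canonical caseless matching (assumed): NFD(casefold(NFD(text))).
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", text).casefold())


def find_caseless_collisions(paths: list[str]) -> list[list[str]]:
    """Return groups of paths that name the same entry once case is ignored."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        # Keying on the whole path (trailing "/" dropped) means two names only
        # collide when their folders also match caselessly.
        groups[caseless_key(path.rstrip("/"))].append(path)
    return [members for members in groups.values() if len(members) > 1]


print(find_caseless_collisions(["index.html", "INDEX.HTML", "styles/a.png"]))
print(find_caseless_collisions(["Stra\u00dfe.html", "STRASSE.html"]))
```

With full case folding, "Straße.html" and "STRASSE.html" collide because "ß" folds to "ss"; a precomposed "é" and "E" plus a combining acute accent collide too, because of the NFD steps.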
>
> I seem to recall that case-insensitive comparisons outside the
> US-ASCII range are a can of worms; at least the IDN community punted
> on the issue.  Please consult with the i18n activity before
> mandating UTF-8 case insensitivity.  You might be better off not
> going down that route.
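To make the concern concrete, here is a small Python comparison of ASCII-only case folding (only A-Z map to a-z) against full Unicode case folding via `str.casefold()`. `ascii_fold` and `compare_names` are hypothetical helpers; `str.casefold()` applies the default (non-Turkic) folding rules.

```python
def ascii_fold(name: str) -> str:
    # Map only A-Z to a-z; every other code point is left unchanged.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)


def compare_names(a: str, b: str) -> tuple[bool, bool]:
    """Return (equal under ASCII-only folding, equal under full Unicode folding)."""
    return ascii_fold(a) == ascii_fold(b), a.casefold() == b.casefold()


examples = [
    ("INDEX.HTML", "index.html"),     # plain ASCII: both approaches agree
    ("STRASSE.html", "stra\u00dfe.html"),  # sharp s folds to "ss", changing the length
    ("\u212aey.png", "key.png"),      # KELVIN SIGN folds to ASCII "k"
    ("\u0131.txt", "I.txt"),          # dotless i uppercases to "I" but does not fold to "i"
]

for a, b in examples:
    ascii_eq, unicode_eq = compare_names(a, b)
    print(f"{a!r} vs {b!r}: ascii-only={ascii_eq}, unicode={unicode_eq}")
```

ASCII-only folding is predictable but leaves "É" and "é" distinct. Full folding merges more names, but the outcome depends on which folding and normalization are chosen, and Turkic languages expect different pairings for i and I.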
>
>> ==Rules for validating US-ASCII paths==
>
> Why only for US-ASCII paths, and not also for non-ASCII paths?
>
>> Unless otherwise stated, any violation of the following conformance
>> statements means that the Zip archive is non-conforming and a widget
>> user agent must treat it as an invalid Zip archive.
>>
>> A US-ASCII relative path is the string derived from the zip relative
>> path
>
> *snip*
>
> So how do I derive that string?  By removing all delimiters and
> space characters, and then replacing everything else by the
> character "z"? The result would conform to the grammar.
>
> Cheers,
> -- 
> Thomas Roessler, W3C  <tlr@w3.org>
>

Received on Monday, 26 November 2007 13:18:16 UTC