Re: [Widgets] ASCII File names - request for comments

(+cc Richard Ishida; Richard, despite the subject, there's an i18n
angle to this.)

On 2007-11-22 17:02:44 +1000, Marcos Caceres wrote:

> The zip relative path will represent one of:
> 
> * the name of a file (eg. index.html),
> * the name of a folder (eg. logs/),
> * the name of a folder within a hierarchy of folders (eg. styles/sounds/),
> * or the name of a file within a hierarchy of folders (eg.
> styles/images/background.png).

Is there a BNF grammar for the zip relative path?
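
To make the question concrete, here is a rough Python sketch of the
shape the four examples above seem to imply. The pattern and the name
path_kind are my own assumptions for illustration, not anything taken
from the draft; a real grammar would also have to say which characters
a segment may contain.

```python
import re

# Assumed shape: one or more non-empty segments separated by "/",
# with an optional trailing "/" that marks a folder.
_ZIP_RELATIVE_PATH = re.compile(r"(?:[^/]+/)*[^/]+/?")


def path_kind(path: str) -> str:
    """Return "folder" or "file" for a path of the assumed shape."""
    if not _ZIP_RELATIVE_PATH.fullmatch(path):
        raise ValueError(f"not a zip relative path: {path!r}")
    return "folder" if path.endswith("/") else "file"


for example in ("index.html", "logs/", "styles/sounds/",
                "styles/images/background.png"):
    print(example, "->", path_kind(example))
```

Under this reading, the empty string, a leading "/", and an empty
segment such as "a//b" are all rejected.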

> For each file name field in a Zip archive, the zip relative path
> must be encoded as either US-ASCII or UTF-8. Other encodings must
> not be used and if encountered a widget user agent must treat the
> zip archive as an invalid Zip archive.

> For interoperability, and where possible, encoding in US-ASCII is
> preferred.

Please don't put a soft preference like that into the spec.  Either
say that user agents MUST support both US-ASCII and UTF-8 encoded
relative paths, or pick one.
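
For what it's worth, every US-ASCII byte string is also valid UTF-8, so
a user agent that handles UTF-8 names gets the ASCII case for free. A
minimal Python sketch of the check, where the function name
name_encoding and its return values are hypothetical:

```python
def name_encoding(raw: bytes) -> str:
    """Classify a raw file name field as "us-ascii", "utf-8" or "invalid"."""
    try:
        raw.decode("ascii")
        return "us-ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Neither encoding fits; under the draft text the archive
        # would be treated as invalid.
        return "invalid"


print(name_encoding(b"index.html"))
print(name_encoding("caf\u00e9.html".encode("utf-8")))
print(name_encoding(b"caf\xe9.html"))  # Latin-1 bytes, not UTF-8
```

Note that this only tells you whether the bytes decode; a name in some
other encoding can still happen to be valid UTF-8.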

> Irrespective of encoding, a zip relative path must be treated as
> case insensitive. As such, if a Zip archive contains two or more
> file names in the same folder that map to the same string
> following normalization on caseless matching as described in
> [Unicode Case Mapping], then the widget user agent must treat the
> zip archive as being an invalid Zip archive.

I seem to recall that case-insensitive comparisons outside the
US-ASCII range are a can of worms; at least the IDN community punted
on the issue.  Please consult with the i18n activity before
mandating UTF-8 case insensitivity.  You might be better off not
going down that route.
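
To illustrate the kind of surprise I mean, here is a small Python
sketch. It assumes the draft intends Unicode canonical caseless
matching, i.e. NFD(toCasefold(NFD(X))), and it compares whole paths
rather than names within one folder; the helper names are hypothetical.

```python
import unicodedata
from collections import defaultdict


def caseless_key(name: str) -> str:
    # Canonical caseless matching: NFD(toCasefold(NFD(X))).
    folded = unicodedata.normalize("NFD", name).casefold()
    return unicodedata.normalize("NFD", folded)


def colliding_names(names):
    """Return groups of names that share the same caseless key."""
    groups = defaultdict(list)
    for name in names:
        groups[caseless_key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


# Plain ASCII collides, as everyone expects.
print(colliding_names(["index.html", "INDEX.HTML"]))
# Full case folding maps U+00DF to "ss", so these two collide as well ...
print(colliding_names(["stra\u00dfe.txt", "STRASSE.TXT"]))
# ... although simple lowercasing keeps them distinct.
print("stra\u00dfe.txt".lower() == "STRASSE.TXT".lower())
```

Whether the second pair should invalidate an archive depends entirely
on which folding the spec picks, and locale-specific rules (Turkish
dotted and dotless i, for instance) are not covered by this at all.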

> ==Rules for validating US-ASCII paths==

Why only for US-ASCII paths, and not also for non-ASCII paths?

> Unless otherwise stated, any violation of the following conformance
> statements means that the Zip archive is non-conforming and a widget user
> agent must treat it as an invalid Zip archive.
> 
> A US-ASCII relative path is the string derived from the zip relative path

*snip*

So how do I derive that string?  By removing all delimiters and
space characters, and then replacing everything else by the
character "z"? The result would conform to the grammar.

Cheers,
-- 
Thomas Roessler, W3C  <tlr@w3.org>

Received on Thursday, 22 November 2007 09:53:55 UTC