- From: Thomas Roessler <tlr@w3.org>
- Date: Thu, 22 Nov 2007 10:53:47 +0100
- To: Marcos Caceres <marcosscaceres@gmail.com>
- Cc: "public-appformats@w3.org" <public-appformats@w3.org>, Arve Bersvendsen <arveb@opera.com>, ishida@w3.org
(+cc Richard Ishida; Richard, despite the subject, there's an i18n
angle to this.)

On 2007-11-22 17:02:44 +1000, Marcos Caceres wrote:

> The zip relative path will represent one of:
>
> * the name of a file (eg. index.html),
> * the name of a folder (eg. logs/),
> * the name of a folder within a hierarchy of folders (eg. styles/sounds/),
> * or the name of a file within a hierarchy of folders (eg.
>   styles/images/background.png).

Is there a BNF grammar for the zip relative path?

> For each file name field in a Zip archive, the zip relative path
> must be encoded as either US-ASCII or UTF-8. Other encodings must
> not be used and if encountered a widget user agent must treat the
> zip archive as an invalid Zip archive.
> For interoperability, and where possible, encoding in US-ASCII is
> preferred.

Don't say things like that. Either say that user agents MUST
support both kinds of relative paths, or pick one.

> Irrespective of encoding, a zip relative path must be treated as
> case insensitive. As such, if a Zip archive contains two or more
> file names in the same folder that map to the same string
> following normalization on caseless matching as described in
> [Unicode Case Mapping], then the widget user agent must treat the
> zip archive as being an invalid Zip archive.

I seem to recall that case-insensitive comparisons outside the
US-ASCII range are a can of worms; at least the IDN community punted
on the issue. Please consult with the i18n activity before
mandating UTF-8 case insensitivity. You might be better off not
going down that route.

> ==Rules for validating US-ASCII paths==

Why only for US-ASCII paths, and not also for non-ASCII paths?

> Unless otherwise stated, any violation of the following conformance
> statements means that the Zip archive is non-conforming and a widget user
> agent must treat it as an invalid Zip archive.
>
> A US-ASCII relative path is the string derived from the zip relative path

*snip*

So how do I derive that string? By removing all delimiters and space
characters, and then replacing everything else by the character "z"?
The result would conform to the grammar.

Cheers,
-- 
Thomas Roessler, W3C <tlr@w3.org>
Received on Thursday, 22 November 2007 09:53:55 UTC
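
Illustrative note (not part of the original message): the caseless-matching concern raised above can be made concrete with a small Python sketch. It assumes that "caseless matching" means Unicode canonical caseless matching (NFD, case fold, NFD), which the quoted draft does not spell out. The helper names `caseless_key` and `find_collisions` are hypothetical, and whole paths are compared rather than names within one folder.

```python
import unicodedata
from collections import defaultdict


def caseless_key(path: str) -> str:
    """Key for canonical caseless comparison: NFD, case fold, NFD again.

    Assumption: this is the matching the draft intends; the draft only
    cites [Unicode Case Mapping] without naming a specific algorithm.
    """
    folded = unicodedata.normalize("NFD", path).casefold()
    return unicodedata.normalize("NFD", folded)


def find_collisions(paths):
    """Return groups of paths that become identical under caseless_key."""
    groups = defaultdict(list)
    for p in paths:
        groups[caseless_key(p)].append(p)
    # Only groups with two or more members are collisions.
    return [g for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    names = ["index.html", "INDEX.HTML", "straße.png", "STRASSE.PNG", "logs/"]
    for group in find_collisions(names):
        print("collision:", group)
```

Under full case folding, "straße.png" and "STRASSE.PNG" collide even though they differ in length, which is the kind of non-ASCII surprise the message warns about; a user agent that used simple case folding or locale-specific rules instead could reach a different verdict on the same archive.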