[Widgets] ASCII File names - request for comments

Hi all,
I've drafted some initial text for file and folder naming restrictions for
widgets. I would really appreciate any feedback:

=File and folder names=

For the purpose of this specification, a zip relative path is the
variable-length string value of the file name field of a local file header
of a Zip archive (see [Zip] for definitions and details of the file name
field and local file header). Each file stored in the Zip archive is
assigned its own local file header [Zip]. A zip relative path is said to be
"relative" as it stores the string that represents file and folder names
relative to where the zip archive was created on a file system (e.g.
images/bg.png), as opposed to storing an absolute path on the file system
(e.g. c:\temp\images\bg.png). The value of a zip relative path will generally
match the string value of a name of the file or folder(s) on the device on
which the zip archive was created.

The zip relative path will represent one of:

* the name of a file (e.g. index.html),
* the name of a folder (e.g. logs/),
* the name of a folder within a hierarchy of folders (e.g. styles/sounds/),
* or the name of a file within a hierarchy of folders (e.g.
styles/images/background.png).

For each file name field in a Zip archive, the zip relative path must be
encoded as either US-ASCII or UTF-8. Other encodings must not be used; if
one is encountered, a widget user agent must treat the zip archive as an
invalid Zip archive.

For interoperability, and where possible, encoding in US-ASCII is preferred.


In a Zip archive, when general purpose bit 11 of a local file header is set
to 0, the zip relative path must be processed as US-ASCII in accordance with
the rules for validating US-ASCII paths (below). When general purpose bit 11
of a local file header is set to 1, the zip relative path must be processed
as UTF-8 in accordance with the rules for validating UTF-8 paths.
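
To make the bit 11 rule concrete, here is a rough Python sketch of how a
user agent might pick the decoding. The function name and the choice to
decode strictly (so that bytes outside the declared encoding raise an error)
are my own assumptions for illustration, not part of the draft text.

```python
# General purpose bit 11 of the local file header (1 << 11).
UTF8_FLAG = 0x0800


def decode_zip_relative_path(raw_name: bytes, general_purpose_flags: int) -> str:
    """Decode the raw file name field as US-ASCII or UTF-8 based on bit 11.

    Hypothetical helper: decoding is strict, so a UnicodeDecodeError here
    would be the signal to treat the archive as an invalid Zip archive.
    """
    if general_purpose_flags & UTF8_FLAG:
        encoding = "utf-8"   # bit 11 set to 1
    else:
        encoding = "ascii"   # bit 11 set to 0
    return raw_name.decode(encoding)
```

The decoded string would then be passed to whichever set of validation
rules (US-ASCII or UTF-8) applies.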

Irrespective of encoding, a zip relative path must be treated as case
insensitive. As such, if a Zip archive contains two or more file names in
the same folder that map to the same string after case folding for
caseless matching, as described in [Unicode Case Mapping], then the widget
user agent must treat the zip archive as an invalid Zip archive.
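
As a sketch of the collision check, the following Python groups paths by
their case-folded form. It assumes that str.casefold() is an acceptable
approximation of the caseless matching in [Unicode Case Mapping] and that
folder names are compared case-insensitively as well, so the whole path is
folded; the function name is hypothetical.

```python
from collections import defaultdict


def find_case_collisions(paths):
    """Return groups of paths that are identical after case folding.

    Simplification (an assumption of this sketch): no Unicode
    normalization is applied before or after folding.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    # Any group with more than one member makes the archive invalid.
    return [names for names in groups.values() if len(names) > 1]
```

A non-empty result would mean the widget user agent treats the archive as
invalid.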

==Rules for validating US-ASCII paths==

Unless otherwise stated, any violation of the following conformance
statements means that the Zip archive is non-conforming and a widget user
agent must treat it as an invalid Zip archive.

A US-ASCII relative path is the string derived from the zip relative path
that matches the production for ascii-rel-path in the following ABNF and
conforms to the conformance clauses that follow in this section:

ascii-rel-path     = ( *folder-name [ filename ] )
folder-name        = 1*243allowed-characters delimiter
delimiter          = "/"
filename           = 1*255( *basename [file-extension] )
basename           = allowed-characters
file-extension     = "." 1*allowed-characters
allowed-characters = ALPHA / DIGIT / SP / "$" / "%" / "'" / "-" / "_" / "@"
                     / "~" / "`" / "!" / "(" / ")" / "^" / "#" / "&" / "+"
                     / "," / "." / ";" / "=" / "[" / "]" / %x80-FF

ALPHA, DIGIT, and SP are defined in the [ABNF] specification, but
essentially represent alphanumerical characters and the space (x20)
character.

The first and last characters of a US-ASCII relative path must not be space
characters. A US-ASCII relative path must not be an empty string, meaning
that widget resources must not be created by storing or compressing data
from standard out straight into the zip archive.

The last character of a US-ASCII relative path must not be a "." (x2E).
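
The three positional rules above (not empty, no leading or trailing space,
no trailing ".") could be checked along these lines. This is only a sketch;
the function name and the wording of the messages are made up for
illustration.

```python
def check_path_edges(path: str) -> list:
    """Return a list of problems found at the edges of a relative path."""
    if path == "":
        return ["path is an empty string"]
    problems = []
    if path[0] == " " or path[-1] == " ":
        problems.append("path starts or ends with a space (x20)")
    if path[-1] == ".":
        problems.append('path ends with "." (x2E)')
    return problems
```

An empty list means the path passed these particular checks; the other
rules in this section still apply.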

The following forbidden characters must not appear anywhere in a US-ASCII
relative path:

* < (0x3C)
* > (0x3E)
* : (0x3A)
* " (0x22)
* \ (0x5C)
* | (0x7C)
* ? (0x3F)
* * (0x2A)
* / (0x2F)
* control characters (0x00-0x1F)
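
A possible way to test for the forbidden characters listed above is shown
below. Because "/" also serves as the delimiter in the ABNF, this sketch
assumes the check is applied to a single folder name or file name after the
path has been split on "/"; that reading, and the function name, are my
assumptions.

```python
# The nine listed punctuation characters plus control characters 0x00-0x1F.
FORBIDDEN = set('<>:"\\|?*/') | {chr(code) for code in range(0x20)}


def find_forbidden_characters(name: str) -> list:
    """Return the distinct forbidden characters in one path segment, sorted."""
    return sorted({ch for ch in name if ch in FORBIDDEN})
```

For example, each segment of "styles/images/background.png" would be
checked separately, and any non-empty result would make the archive invalid.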

In addition, the following reserved words must not appear as either a folder
or a basename in a US-ASCII relative path: CON, PRN, AUX, NUL, COM1, COM2,
COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5,
LPT6, LPT7, LPT8, LPT9.

For example, the following file and folder names are allowed: "CON-tact.txt",
"LPT11/", "DCOM1.pdf". The following names are not allowed: "com3.txt",
"Lpt1/", "COM9.gif".
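
The reserved word rule and the examples above can be sketched in Python as
follows. The comparison is case-insensitive, in line with the examples.
Treating the basename as everything before the first "." is an assumption
of this sketch (the ABNF does not pin that down), and the function name is
hypothetical.

```python
RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{n}" for n in range(1, 10)}
    | {f"LPT{n}" for n in range(1, 10)}
)


def is_reserved_name(name: str) -> bool:
    """Check one folder name (e.g. "Lpt1/") or file name (e.g. "com3.txt")."""
    # Drop a trailing delimiter, then keep the part before the first ".".
    stem = name.rstrip("/").split(".", 1)[0]
    return stem.upper() in RESERVED
```

Applied to the examples, "CON-tact.txt", "LPT11/" and "DCOM1.pdf" are
accepted, while "com3.txt", "Lpt1/" and "COM9.gif" are rejected.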

For interoperability, it is preferred that the total number of characters in
a US-ASCII relative path does not exceed 255 characters.

===
Kind regards,
-- 
Marcos Caceres
http://datadriven.com.au

Received on Thursday, 22 November 2007 07:03:00 UTC