Re: [widgets] P&C Last Call comments, zip-rel-path ABNF from Marcos Caceres on 2009-06-01 (public-webapps@w3.org from April to June 2009)

From: Marcos Caceres <marcosc@opera.com>
Date: Mon, 1 Jun 2009 21:19:36 +0200
To: Marcin Hanclik <Marcin.Hanclik@access-company.com>
Cc: "public-webapps@w3.org" <public-webapps@w3.org>
Message-ID: <b21a10670906011219y1dead0c6t82d917a43e48efbd@mail.gmail.com>

On Mon, Jun 1, 2009 at 12:44 AM, Marcin Hanclik
<Marcin.Hanclik@access-company.com> wrote:
> Error in ABNF:
> localized-folder vs. locale-folder

Fixed.

> Error with ABNF
> utf8-chars       = safe-chars / U+0080 and beyond
> "and beyond" does not fit here

right. What should be there is:

utf-8-chars      = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
                  / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
                  / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
                  / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
                  / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
                  / %xD0000-DFFFD / %xE1000-EFFFD

> Section 2. of RFC2279 shows that all UTF-8 characters above U+0080 are encoded with byte values over 0x80.
> So utf-8 production equals to cp437 production on the byte level within the context that is important for us.
>

Correct, I think.
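
To illustrate the byte-level point, here is a minimal Python sketch (not part of the spec; the helper names utf8_bytes and all_bytes_high are made up for this example). It checks that every byte in the UTF-8 encoding of a character at or above U+0080 has a value of 0x80 or higher, which is why such bytes fall inside %x80-FF:

```python
def utf8_bytes(ch: str) -> bytes:
    """Return the UTF-8 encoding of a single character."""
    return ch.encode("utf-8")


def all_bytes_high(ch: str) -> bool:
    """True if every byte of the character's UTF-8 encoding is >= 0x80."""
    return all(b >= 0x80 for b in utf8_bytes(ch))


if __name__ == "__main__":
    # Two-, three- and four-byte sequences, plus one ASCII character.
    for ch in ["\u0080", "\u00e9", "\u20ac", "\U0001F600", "a"]:
        print(f"U+{ord(ch):04X}", utf8_bytes(ch).hex(), all_bytes_high(ch))
```

ASCII characters such as "a" encode to a single byte below 0x80, so they are only matched through safe-char.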

> So both productions can be equalized and removed, since allowed-char may be used.

right.

> I think the problem is similar to this one about encoding (I just had a brief look on it):
> http://lists.w3.org/Archives/Public/public-html/2009May/0643.html

Yes, they are just byte ranges.

> Error with ABNF
> cp437-chars      = safe-chars / x80-FF
> should be according to RFC2234:
> cp437-chars      = safe-chars / %x80-FF

Right, but safe-chars does not cover the whole CP437 range.

> Due to many issues I would rewrite the whole ABNF as follows.
> ABNF issues, additionally to the above, are:
> 1. plural form used for just "one-of" value

Where?

> 2. the zip-rel-path may have problems with existence, since all productions are optional. The below format seems equal and is
> shorter

See below.

> 3. the production of file-name is wrongly specified, since there file-extension could appear up to 254 times in a file name
>

yes, that is wrong.

> 4. I am not sure whether the file extension could be more than 3 chars or not in the existing ABNF?

Yes, it's one or more characters. It's not restricted to 3, and I'm not
sure why you are saying we should restrict it to 3.

> If so, the actual file name shall match 2 rules simultaneously, e.g.:
> file-name1       = 1*allowed-char [ "." 1*allowed-char ]
> file-name2       = 1*254 ( allowed-char )
> Matching of those 2 rules is not expressible in ABNF, so prose would be needed.
>
> New ABNF (problem of file extension length as above still remains):
> **************
> A valid Zip relative path is one that case-insensitively matches the production of Zip-rel-path in the following [ABNF] that
> operates on bytes, not on characters, i.e. after any encoding (CP437 or UTF-8) has been applied:
>
> zip-rel-path     = [ locale-folder ] [ *folder-name ] [ file-name ]

Everything in the above is optional too... so it's the same problem...

> locale-folder    = "locales" "/" Language-Tag "/"
> folder-name      = file-name "/"
> file-name        = base-name [ file-extension ]
> file-extension   = "." 1*3 ( allowed-char )
>
> base-name        = 1*250( allowed-char )
> allowed-char     = safe-char / %x80-FF
> safe-char        = ALPHA / DIGIT / SP / "$" / "%"
>                    / "'" / "-" / "_" / "@"
>                    / "~" / "(" / ")" / "&" / "+"
>                    / "," / "." / "=" / "[" / "]"
> **************
>

Here is another crack at it, taking the bugs you found into
consideration. I also dropped the length restriction:

zip-rel-path   = [ *folder-name ] file-name /
                 [ locale-folder ] 1*folder-name /
                 locale-folder [ *folder-name ] file-name
locale-folder  = "locales" "/" Language-Tag "/"
folder-name    = file-name "/"
file-name      = base-name [ file-extension ]
base-name      = 1*allowed-char
file-extension = "." 1*allowed-char
allowed-char   = safe-char / utf8-char
safe-char      = ALPHA / DIGIT / SP / "$" / "%"
                    / "'" / "-" / "_" / "@"
                    / "~" / "(" / ")" / "&" / "+"
                    / "," / "." / "=" / "[" / "]"
utf8-char      =  %x80-D7FF     / %xF900-FDCF   / %xFDF0-FFEF
                / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
                / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
                / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
                / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
                / %xD0000-DFFFD / %xE1000-EFFFD
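
A rough Python sketch of a checker for the grammar above, working on code points rather than bytes. This is one reading of the ABNF, not a normative implementation, and the names is_allowed_char and is_zip_rel_path are made up for the example. It relies on two observations that should be treated as assumptions: since "." is itself a safe-char, file-name reduces to one or more allowed-chars; and since "locales" and a Language-Tag (letters, digits, "-") consist only of safe-chars, the locale-folder alternatives are already covered by plain folder-names. Under that reading a path is one or more non-empty segments separated by "/", with an optional trailing "/":

```python
import string

# safe-char: ALPHA / DIGIT / SP and the listed punctuation.
SAFE_CHARS = set(string.ascii_letters + string.digits + " $%'-_@~()&+,.=[]")

# utf8-char: code point ranges copied from the production above.
UTF8_CHAR_RANGES = [(0x80, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
UTF8_CHAR_RANGES += [(p << 16, (p << 16) | 0xFFFD) for p in range(0x1, 0xE)]
UTF8_CHAR_RANGES.append((0xE1000, 0xEFFFD))


def is_allowed_char(ch: str) -> bool:
    """allowed-char = safe-char / utf8-char"""
    if ch in SAFE_CHARS:
        return True
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in UTF8_CHAR_RANGES)


def is_zip_rel_path(path: str) -> bool:
    """Check a path against the simplified reading of zip-rel-path."""
    if not path:
        return False
    segments = path.split("/")
    if segments[-1] == "":
        # A trailing "/" means the path ends in a folder-name.
        segments.pop()
    if not segments:
        return False
    # Every remaining segment must be a non-empty run of allowed-chars.
    return all(seg and all(is_allowed_char(c) for c in seg) for seg in segments)


if __name__ == "__main__":
    for p in ["index.html", "locales/en/index.html", "images/", "", "/a", "a//b", "a*b"]:
        print(repr(p), is_zip_rel_path(p))
```

Note that the sketch does not validate Language-Tag separately and does not enforce any length limit, in line with the length restriction having been dropped from the ABNF.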


> Authors need to keep path lengths below 250 bytes. Unicode code points can require more than one byte to encode, which can result in
> a path whose length is less than 250 characters.
> should be
> Authors need to keep path lengths below 250 bytes. Unicode code points may require more than one byte to encode a character, which
> can result in a path whose length is less than 250 characters to be represented in more than 250 bytes.

fixed.
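
For illustration, a small Python sketch of the character-count versus byte-count difference, assuming the path is encoded as UTF-8 (the helper name path_lengths is made up for this example):

```python
def path_lengths(path: str) -> tuple[int, int]:
    """Return (number of characters, number of bytes when UTF-8 encoded)."""
    return len(path), len(path.encode("utf-8"))


if __name__ == "__main__":
    ascii_path = "a" * 200
    accented_path = "\u00e9" * 200  # 200 x "e with acute", 2 bytes each in UTF-8
    print(path_lengths(ascii_path))
    print(path_lengths(accented_path))
```

Both paths are 200 characters long, but the second one needs 400 bytes, so it would exceed a 250-byte limit.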

> UTF8-chars
> should be
> utf8-chars or utf8-char or something new (after the ABNF is updated) .

Fixed.



-- 
Marcos Caceres
http://datadriven.com.au
Received on Monday, 1 June 2009 19:20:45 UTC