- From: Marcos Caceres <marcosc@opera.com>
- Date: Mon, 1 Jun 2009 21:19:36 +0200
- To: Marcin Hanclik <Marcin.Hanclik@access-company.com>
- Cc: "public-webapps@w3.org" <public-webapps@w3.org>
On Mon, Jun 1, 2009 at 12:44 AM, Marcin Hanclik
<Marcin.Hanclik@access-company.com> wrote:
> Error in ABNF:
> localized-folder vs. locale-folder
Fixed.
> Error with ABNF
> utf8-chars = safe-chars / U+0080 and beyond
> "and beyond" does not fit here
Right. What should be there is:
utf-8-chars = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
              / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
              / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
              / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
              / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
              / %xD0000-DFFFD / %xE1000-EFFFD
> Section 2. of RFC2279 shows that all UTF-8 characters above U+0080 are encoded with byte values over 0x80.
> So utf-8 production equals to cp437 production on the byte level within the context that is important for us.
>
Correct, I think.
> So both productions can be equalized and removed, since allowed-char may be used.
Right.
> I think the problem is similar to this one about encoding (I just had a brief look on it):
> http://lists.w3.org/Archives/Public/public-html/2009May/0643.html
Yes, they are just byte ranges.
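
To illustrate the byte-range point, here is a small Python sketch (not part of the spec; the helper name `utf8_bytes_are_high` is made up for this example). It checks that code points at or above U+0080 encode to UTF-8 bytes that all fall in 0x80-0xFF:

```python
def utf8_bytes_are_high(ch: str) -> bool:
    """Return True if every UTF-8 byte of `ch` is 0x80 or above."""
    return all(b >= 0x80 for b in ch.encode("utf-8"))


# A few sample code points from U+0080 upwards (1 sample per encoded length).
samples = ["\u0080", "\u00e9", "\u20ac", "\u65e5", "\U0001D11E"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}", [hex(b) for b in encoded], utf8_bytes_are_high(ch))
```

An ASCII character such as "a" encodes to the single byte 0x61, so the helper returns False for it.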
> Error with ABNF
> cp437-chars = safe-chars / x80-FF
> should be according to RFC2234:
> cp437-chars = safe-chars / %x80-FF
Right, but safe-chars does not cover the whole CP437 range.
> Due to many issues I would rewrite the whole ABNF as follows.
> ABNF issues, additionally to the above, are:
> 1. plural form used for just "one-of" value
Where?
> 2. the zip-rel-path may have problems with existence, since all productions are optional. The below format seems equal and is
> shorter
See below.
> 3. the production of file-name is wrongly specified, since there file-extension could appear up to 254 times in a file name
>
Yes, that is wrong.
> 4. I am not sure whether the file extension could be more than 3 chars or not in the existing ABNF?
Yes, the extension is one or more characters. It's not restricted to 3,
and I'm not sure why you are saying we should restrict it to 3.
> If so, the actual file name shall match 2 rules simultaneously, e.g.:
> file-name1 = 1*allowed-char [ "." 1*allowed-char ]
> file-name2 = 1*254 ( allowed-char )
> Matching of those 2 rules is not expressible in ABNF, so prose would be needed.
>
> New ABNF (problem of file extension length as above still remains):
> **************
> A valid Zip relative path is one that case-insensitively matches the production of Zip-rel-path in the following [ABNF] that
> operates on bytes, not on characters, i.e. after any encoding (CP437 or UTF-8) has been applied:
>
> zip-rel-path = [ locale-folder ] [ *folder-name ] [ file-name ]
Everything in the above is optional too... so it's the same problem...
> locale-folder = "locales" "/" Language-Tag "/"
> folder-name = file-name "/"
> file-name = base-name [ file-extension ]
> file-extension = "." 1*3 ( allowed-char )
>
> base-name = 1*250( allowed-char )
> allowed-char = safe-char / %x80-FF
> safe-char = ALPHA / DIGIT / SP / "$" / "%"
> / "'" / "-" / "_" / "@"
> / "~" / "(" / ")" / "&" / "+"
> / "," / "." / "=" / "[" / "]"
> **************
>
Here is another crack at it, taking the bugs you found into
consideration. I also dropped the length restriction:
zip-rel-path = [ *folder-name ] file-name /
               [ locale-folder ] 1*folder-name /
               locale-folder [ *folder-name ] file-name
locale-folder = "locales" "/" Language-Tag "/"
folder-name = file-name "/"
file-name = base-name [ file-extension ]
base-name = 1*allowed-char
file-extension = "." 1*allowed-char
allowed-char = safe-char / utf8-char
safe-char = ALPHA / DIGIT / SP / "$" / "%"
            / "'" / "-" / "_" / "@"
            / "~" / "(" / ")" / "&" / "+"
            / "," / "." / "=" / "[" / "]"
utf8-char = %x80-D7FF / %xF900-FDCF / %xFDF0-FFEF
            / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
            / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
            / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
            / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
            / %xD0000-DFFFD / %xE1000-EFFFD
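
As a rough way to try paths against this grammar, here is a Python sketch that translates it into a regular expression. It is only an illustration under assumptions, not a reference implementation: `is_valid_zip_rel_path` and the constant names are made up for this example, the ranges are treated as code points (so it works on characters, not bytes), and `LANGUAGE_TAG` is a simplified stand-in for the real Language-Tag production.

```python
import re

# safe-char: ALPHA / DIGIT / SP plus the punctuation listed in the grammar.
SAFE = r"A-Za-z0-9 $%'\-_@~()&+,.=\[\]"

# utf8-char: the code point ranges from the grammar.
RANGES = [(0x80, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
# %x10000-1FFFD through %xD0000-DFFFD (planes 1 to 13).
RANGES += [(p << 16, (p << 16) | 0xFFFD) for p in range(0x1, 0xE)]
RANGES.append((0xE1000, 0xEFFFD))
UTF8 = "".join(f"\\U{lo:08X}-\\U{hi:08X}" for lo, hi in RANGES)

ALLOWED = f"[{SAFE}{UTF8}]"
# file-name = base-name [ file-extension ]
FILE_NAME = rf"{ALLOWED}+(?:\.{ALLOWED}+)?"
# folder-name = file-name "/"
FOLDER_NAME = f"(?:{FILE_NAME})/"
# ASSUMPTION: simplified placeholder, not the full Language-Tag grammar.
LANGUAGE_TAG = r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*"
# ABNF literal strings are case-insensitive, hence the scoped (?i:...) flag.
LOCALE_FOLDER = f"(?i:locales)/{LANGUAGE_TAG}/"

# The three zip-rel-path alternatives, folded into one expression:
#   [locale-folder] ( *folder-name file-name / 1*folder-name )
ZIP_REL_PATH = re.compile(
    f"(?:{LOCALE_FOLDER})?(?:(?:{FOLDER_NAME})*{FILE_NAME}|(?:{FOLDER_NAME})+)"
)


def is_valid_zip_rel_path(path: str) -> bool:
    """Return True if `path` matches the sketched zip-rel-path pattern."""
    return ZIP_REL_PATH.fullmatch(path) is not None


for candidate in ["index.html", "locales/en-us/index.html", "images/", "", "a//b"]:
    print(repr(candidate), is_valid_zip_rel_path(candidate))
```

With this translation the empty path no longer matches, because every alternative needs at least one file-name or folder-name. Note also that "." is a safe-char, so any run of one or more allowed characters already matches base-name on its own.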
> Authors need to keep path lengths below 250 bytes. Unicode code points can require more than one byte to encode, which can result in
> a path whose length is less than 250 characters.
> should be
> Authors need to keep path lengths below 250 bytes. Unicode code points may require more than one byte to encode a character, which
> can result in a path whose length is less than 250 characters to be represented in more than 250 bytes.
Fixed.
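
A quick Python sketch of why the character count and the byte count can differ (the helper name `path_lengths` and the sample path are made up for illustration):

```python
def path_lengths(path: str) -> tuple[int, int]:
    """Return (number of characters, number of UTF-8 bytes) for `path`."""
    return len(path), len(path.encode("utf-8"))


# Each of these CJK characters takes three bytes in UTF-8.
sample = "locales/ja/" + "\u65e5\u672c\u8a9e" * 70 + ".txt"

chars, size = path_lengths(sample)
print(f"{chars} characters, {size} bytes")
```

Here the path stays under 250 characters but needs well over 250 bytes once encoded.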
> UTF8-chars
> should be
> utf8-chars or utf8-char or something new (after the ABNF is updated) .
Fixed.
--
Marcos Caceres
http://datadriven.com.au
Received on Monday, 1 June 2009 19:20:45 UTC