- From: Marcos Caceres <marcosc@opera.com>
- Date: Mon, 1 Jun 2009 21:19:36 +0200
- To: Marcin Hanclik <Marcin.Hanclik@access-company.com>
- Cc: "public-webapps@w3.org" <public-webapps@w3.org>
On Mon, Jun 1, 2009 at 12:44 AM, Marcin Hanclik
<Marcin.Hanclik@access-company.com> wrote:
> Error in ABNF:
> localized-folder vs. locale-folder

Fixed.

> Error with ABNF
> utf8-chars = safe-chars / U+0080 and beyond
> "and beyond" does not fit here right.

What should be there is:

utf-8-chars = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
            / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
            / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
            / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
            / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
            / %xD0000-DFFFD / %xE1000-EFFFD

> Section 2. of RFC2279 shows that all UTF-8 characters above U+0080 are encoded with byte values over 0x80.
> So utf-8 production equals to cp437 production on the byte level within the context that is important for us.

Correct, I think.

> So both productions can be equalized and removed, since allowed-char may be used.

Right.

> I think the problem is similar to this one about encoding (I just had a brief look on it):
> http://lists.w3.org/Archives/Public/public-html/2009May/0643.html

Yes, they are just byte ranges.

> Error with ABNF
> cp437-chars = safe-chars / x80-FF
> should be according to RFC2234:
> cp437-chars = safe-chars / %x80-FF

Right, but safe-chars does not cover the whole CP437 range.

> Due to many issues I would rewrite the whole ABNF as follows.
> ABNF issues, additionally to the above, are:
> 1. plural form used for just "one-of" value

Where?

> 2. the zip-rel-path may have problems with existence, since all productions are optional. The below format seems equal and is shorter

See below.

> 3. the production of file-name is wrongly specified, since there file-extension could appear up to 254 times in a file name

Yes, that is wrong.

> 4. I am not sure whether the file extension could be more than 3 chars or not in the existing ABNF?

Yes, it's at least 1 to many. It's not restricted to 3, and I'm not sure why you are saying we should restrict it to 3?

> If so, the actual file name shall match 2 rules simultaneously, e.g.:
> file-name1 = 1*allowed-char [ "." 1*allowed-char ]
> file-name2 = 1*254 ( allowed-char )
> Matching of those 2 rules is not expressible in ABNF, so prose would be needed.
>
> New ABNF (problem of file extension length as above still remains):
> **************
> A valid Zip relative path is one that case-insensitively matches the production of Zip-rel-path in the following [ABNF] that
> operates on bytes, not on characters, i.e. after any encoding (CP437 or UTF-8) has been applied:
>
> zip-rel-path = [ locale-folder ] [ *folder-name ] [ file-name ]

Everything in the above is optional too... so it's the same problem...

> locale-folder = "locales" "/" Language-Tag "/"
> folder-name = file-name "/"
> file-name = base-name [ file-extension ]
> file-extension = "." 1*3 ( allowed-char )
>
> base-name = 1*250( allowed-char )
> allowed-char = safe-char / %x80-FF
> safe-char = ALPHA / DIGIT / SP / "$" / "%"
>           / "'" / "-" / "_" / "@"
>           / "~" / "(" / ")" / "&" / "+"
>           / "," / "." / "=" / "[" / "]"
> **************

Here is another crack at it, taking the bugs you found into consideration. I also dropped the length restriction:

zip-rel-path   = [ *folder-name ] file-name /
                 [ locale-folder ] 1*folder-name /
                 locale-folder [ *folder-name ] file-name
locale-folder  = "locales" "/" Language-Tag "/"
folder-name    = file-name "/"
file-name      = base-name [ file-extension ]
base-name      = 1*allowed-char
file-extension = "." 1*allowed-char
allowed-char   = safe-char / utf8-char
safe-char      = ALPHA / DIGIT / SP / "$" / "%"
               / "'" / "-" / "_" / "@"
               / "~" / "(" / ")" / "&" / "+"
               / "," / "." / "=" / "[" / "]"
utf8-char      = %x80-D7FF / %xF900-FDCF / %xFDF0-FFEF
               / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
               / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
               / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
               / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
               / %xD0000-DFFFD / %xE1000-EFFFD

> Authors need to keep path lengths below 250 bytes. Unicode code points can require more than one byte to encode, which can result in
> a path whose length is less than 250 characters.
> should be
> Authors need to keep path lengths below 250 bytes. Unicode code points may require more than one byte to encode a character, which
> can result in a path whose length is less than 250 characters to be represented in more than 250 bytes.

Fixed.

> UTF8-chars
> should be
> utf8-chars or utf8-char or something new (after the ABNF is updated) .

Fixed.

-- 
Marcos Caceres
http://datadriven.com.au
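
For anyone who wants to poke at the revised grammar, here is a rough Python sketch of it as a regular expression, together with the byte-length check behind the 250-byte advice. It is only an illustration under some assumptions, not anything from the spec: the names is_valid_zip_rel_path and utf8_byte_length are made up, Language-Tag is replaced by a simplified stand-in pattern, and matching is done on code points (as in the revised utf8-char production) rather than on encoded bytes.

```python
import re
import string

# safe-char = ALPHA / DIGIT / SP / "$" / "%" / "'" / "-" / "_" / "@" / "~"
#           / "(" / ")" / "&" / "+" / "," / "." / "=" / "[" / "]"
SAFE_CHARS = string.ascii_letters + string.digits + " $%'-_@~()&+,.=[]"
safe_class = "".join(re.escape(c) for c in SAFE_CHARS)

# utf8-char ranges copied from the revised ABNF (code points, not bytes).
UTF8_RANGES = (
    [(0x80, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
    + [(p << 16, (p << 16) | 0xFFFD) for p in range(0x1, 0xE)]  # planes 1..D
    + [(0xE1000, 0xEFFFD)]
)
utf8_class = "".join(f"{chr(lo)}-{chr(hi)}" for lo, hi in UTF8_RANGES)

ALLOWED = f"[{safe_class}{utf8_class}]"
# file-name = base-name [ file-extension ]; since "." is itself a safe-char,
# this collapses to one or more allowed characters.
FILE_NAME = f"{ALLOWED}+"
FOLDER = f"(?:{FILE_NAME}/)"
# ASSUMPTION: simplified stand-in for Language-Tag, which is defined elsewhere.
LANG_TAG = r"[A-Za-z]{2,8}(?:-[A-Za-z0-9]{1,8})*"
LOCALE = f"(?:locales/{LANG_TAG}/)"

# The three alternatives of zip-rel-path, in the order given in the ABNF.
ZIP_REL_PATH = re.compile(
    f"(?:{FOLDER}*{FILE_NAME}"
    f"|{LOCALE}?{FOLDER}+"
    f"|{LOCALE}{FOLDER}*{FILE_NAME})",
    re.IGNORECASE,  # the prose asks for a case-insensitive match
)


def is_valid_zip_rel_path(path: str) -> bool:
    """Return True if the whole path matches the sketched grammar."""
    return ZIP_REL_PATH.fullmatch(path) is not None


def utf8_byte_length(path: str) -> int:
    """Length of the path in bytes once encoded as UTF-8."""
    return len(path.encode("utf-8"))


if __name__ == "__main__":
    for candidate in ["index.html", "locales/en-us/index.html", "images/", "", "a//b"]:
        print(repr(candidate), is_valid_zip_rel_path(candidate))
    long_name = "\u00e9" * 200  # 200 characters, two bytes each in UTF-8
    print(len(long_name), utf8_byte_length(long_name))
```

The last two lines show the point of the reworded note: a path well under 250 characters can still exceed 250 bytes once encoded.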
Received on Monday, 1 June 2009 19:20:45 UTC