- From: Marcin Hanclik <Marcin.Hanclik@access-company.com>
- Date: Tue, 29 Sep 2009 21:09:01 +0200
- To: Marcin Hanclik <Marcin.Hanclik@access-company.com>, "marcosc@opera.com" <marcosc@opera.com>, public-webapps <public-webapps@w3.org>
A few corrections to the grammar:
- removal of the dot (safe-char-no-dot rule)
- allowing a file name to end with a dot (file-extension rule) [Is that a correct file extension?]

  file-name = file-name-with-extension / file-name-no-extension
  file-name-with-extension = base-name file-extension
  base-name = *allowed-char
  file-extension = "." [1*allowed-char-no-dot]
  allowed-char-no-dot = safe-char-no-dot / utf8-char
  safe-char-no-dot = ALPHA / DIGIT / SP / "$" / "%" / "'" / "-" / "_" / "@" / "~" / "(" / ")" / "&" / "+" / "," / "=" / "[" / "]"
  file-name-no-extension = base-name-no-ext
  base-name-no-ext = 1*allowed-char-no-dot

________________________________________
From: public-webapps-request@w3.org [public-webapps-request@w3.org] On Behalf Of Marcin Hanclik [Marcin.Hanclik@access-company.com]
Sent: Tuesday, September 29, 2009 7:15 PM
To: marcosc@opera.com; public-webapps
Subject: RE: [widgets] Potential bug in Rule for Identifying the Media Type of a File

Hi Marcos,

Good spot!

>>2. If file has a file-extension, attempt to match the file-extension
>>to one in the file extensions column in the file identification table.
>>If there is a match, then return the media type value. (returns
>>"image/jpeg")

I think file-extension would not be matched, only base-name. I think the grammar is unambiguous about which rules would be matched. The problem is that, at present, ".jpg" would have no file extension at all: a greedy parser would match only base-name and leave file-extension empty, since it is optional. So we need to modify the grammar to state clearly what the extension is.

The current grammar has a further problem: "." is also allowed in file-extension, as part of allowed-char. A parser therefore cannot tell which dot is the "." of the file-extension rule (I am not sure a parser can be written for it at all), so the file-extension rule itself is broken. I assume that file extensions contain no dots and that the dot is the delimiter.
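The corrected grammar at the top of this message can be sketched as a pair of regular expressions. This is a minimal, non-normative Python sketch: split_file_name is a hypothetical helper, allowed-char is assumed to be allowed-char-no-dot plus "." (its definition is not quoted in this thread), and utf8-char is approximated as any non-ASCII code point. Because file-extension can no longer contain a dot, the last dot in a name is always the delimiter.

```python
import re
from typing import Optional, Tuple

# safe-char-no-dot from the corrected rule (no "." in the class).
SAFE_NO_DOT = r"A-Za-z0-9 $%'\-_@~()&+,=\[\]"
# allowed-char-no-dot = safe-char-no-dot / utf8-char
# Assumption: utf8-char is approximated as any non-ASCII code point.
NO_DOT = rf"[{SAFE_NO_DOT}\u0080-\U0010FFFF]"
# Assumption: allowed-char = allowed-char-no-dot / "." (not quoted in this thread).
ALLOWED = rf"(?:{NO_DOT}|\.)"

# file-name-with-extension = base-name file-extension
#   base-name      = *allowed-char
#   file-extension = "." [1*allowed-char-no-dot]
WITH_EXT = re.compile(rf"({ALLOWED}*)(\.{NO_DOT}*)")
# file-name-no-extension = 1*allowed-char-no-dot
NO_EXT = re.compile(rf"{NO_DOT}+")


def split_file_name(name: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: return (base-name, file-extension or None)."""
    if NO_EXT.fullmatch(name):
        return name, None
    match = WITH_EXT.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid file-name: {name!r}")
    # Extensions cannot contain ".", so the regex engine settles on the
    # last dot as the delimiter; base-name may be empty (".jpg").
    return match.group(1), match.group(2)
```

With this sketch, split_file_name(".jpg") returns ("", ".jpg"), split_file_name("archive.tar.gz") returns ("archive.tar", ".gz"), and split_file_name("a...b...c.") returns ("a...b...c", "."), which is the trailing-dot case the bracketed question asks about.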
What about modifying the ABNF to:

  file-name = file-name-with-extension / file-name-no-extension
  file-name-with-extension = base-name file-extension
  base-name = *allowed-char
  file-extension = "." 1*allowed-char-no-dot
  allowed-char-no-dot = safe-char-no-dot / utf8-char
  safe-char-no-dot = ALPHA / DIGIT / SP / "$" / "%" / "'" / "-" / "_" / "@" / "~" / "(" / ")" / "&" / "+" / "," / "." / "=" / "[" / "]"
  file-name-no-extension = base-name-no-ext
  base-name-no-ext = 1*allowed-char-no-dot

This would make base-name optional. ".jpg" is a valid file name, particularly on Linux platforms. With this grammar, ".jpg" would have only a file extension, and the prose of P&C probably would not need to change.

Thanks,
Marcin

Marcin Hanclik
ACCESS Systems Germany GmbH
Tel: +49-208-8290-6452 | Fax: +49-208-8290-6465
Mobile: +49-163-8290-646
E-Mail: marcin.hanclik@access-company.com

-----Original Message-----
From: public-webapps-request@w3.org [mailto:public-webapps-request@w3.org] On Behalf Of Marcos Caceres
Sent: Tuesday, September 29, 2009 4:51 PM
To: public-webapps
Subject: [widgets] Potential bug in Rule for Identifying the Media Type of a File

Hi,
I think I found another bug :(

The current ABNF for a zip relative path allows the first character of a file name to be a ".". So, imagine you have a file in the zip archive called ".jpg" which is actually a text file.

In the Rule for Identifying the Media Type of a File, it reads:

1. Let file be the file to be processed. (in this case, ".jpg")
2. If file has a file-extension, attempt to match the file-extension to one in the file extensions column in the file identification table. If there is a match, then return the media type value. (returns "image/jpeg")
3. If file extension is absent, the media type of a file is determined by using the rules set forth in the [SNIFF] specification.

So the rule incorrectly matches the type and returns "image/jpeg".

Options:

1. Disallow "."
   in the base-name of a file. (This means that files named "a...b...c." will be ignored, as will any file starting with a ".", such as ".foobar".)
2. Modify step 2 above to say: "If file has a file-extension and a base-name, ...", and modify step 3 to say: "Otherwise, the media type of a file is determined by using the rules set forth in the [SNIFF] specification."

However, because of the ambiguity caused by allowing "." in base names, it is basically impossible to determine whether the "file extension" of a file is in fact a file extension or a base name.

I am unsure how to proceed, as it is likely that ".filename"-type files will end up in widget packages... it might be safe for user agents to ignore those files.

--
Marcos Caceres
http://datadriven.com.au

________________________________________

Access Systems Germany GmbH
Essener Strasse 5 | D-46047 Oberhausen
HRB 13548 Amtsgericht Duisburg
Managing Directors: Michel Piquemal, Tomonori Watanabe, Yusuke Kanda
www.access-company.com

CONFIDENTIALITY NOTICE
This e-mail and any attachments hereto may contain information that is privileged or confidential, and is intended for use only by the individual or entity to which it is addressed. Any disclosure, copying or distribution of the information by anyone else is strictly prohibited. If you have received this document in error, please notify us promptly by responding to this e-mail. Thank you.
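Option 2 from Marcos's message can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the P&C algorithm itself: identify_media_type and FILE_IDENTIFICATION_TABLE are hypothetical names, the table is a three-entry excerpt chosen for illustration, the last "." is taken as the extension delimiter (as Marcin proposes), and the caller supplies a sniff function standing in for [SNIFF].

```python
from typing import Callable, Dict

# Hypothetical excerpt of the file identification table (extension -> type);
# the real table lives in the P&C specification.
FILE_IDENTIFICATION_TABLE: Dict[str, str] = {
    "jpg": "image/jpeg",
    "png": "image/png",
    "html": "text/html",
}


def identify_media_type(file_name: str, data: bytes,
                        sniff: Callable[[bytes], str]) -> str:
    """Sketch of the rule with option 2 applied.

    Assumes the last "." delimits the extension; `sniff` stands in for
    the [SNIFF] algorithm and is supplied by the caller.
    """
    base_name, dot, extension = file_name.rpartition(".")
    # Modified step 2: use the table only if there is a file-extension
    # AND a non-empty base-name, so ".jpg" is not treated as a JPEG.
    if dot and base_name and extension in FILE_IDENTIFICATION_TABLE:
        return FILE_IDENTIFICATION_TABLE[extension]
    # Modified step 3: "Otherwise", fall back to content sniffing.
    return sniff(data)
```

With a stand-in sniff that always returns "text/plain", identify_media_type(".jpg", b"hello", sniff) returns "text/plain" rather than "image/jpeg", while "photo.jpg" still maps to "image/jpeg" through the table. Note that this sketch does not resolve the ambiguity Marcos raises; it only encodes the "last dot is the delimiter" assumption.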
Received on Tuesday, 29 September 2009 19:10:09 UTC