- From: Marcos Caceres <marcosc@opera.com>
- Date: Mon, 12 Oct 2009 22:35:30 +0200
- To: Marcin Hanclik <Marcin.Hanclik@access-company.com>
- Cc: public-webapps <public-webapps@w3.org>
> >>>2. If file has a file-extension, attempt to match the file-extension
> >>>to one in the file extensions column in the file identification table.
> >>>If there is a match, then return the media type value. (returns
> >>>"image/jpeg")
>
> I think file-extension would not be matched, but only base-name.
>
> I think the grammar is not ambiguous with regard to which rules would be matched.
> The problem is that at present in case of .jpg, there would be no file extension.
> A greedy parser would only match base-name and leave file-extension empty, since it is optional.
> So we need to modify the grammar to clearly specify what the extension is.
> With the current grammar, there is also a problem that "." is also allowed in the file-extension as part of the allowed-char.
> Therefore any parser may be confused which dot is the "." from the file-extension rule (I am not sure whether a parser can be developed at all).
> And thus, file-extension has problems. I assume that file extensions do not have dots, dot is to be the delimiter.
>
> What about modifying the ABNF to:
>
> file-name = file-name-with-extension | file-name-no-extension
>
> file-name-with-extension = base-name file-extension
>
> base-name = *allowed-char
>
> file-extension = "." 1*allowed-char-no-dot
>
> allowed-char-no-dot = safe-char-no-dot / utf8-char
>
> safe-char-no-dot = ALPHA / DIGIT / SP / "$" / "%"
>                    / "'" / "-" / "_" / "@"
>                    / "~" / "(" / ")" / "&" / "+"
>                    / "," / "." / "=" / "[" / "]"
>
> file-name-no-extension = base-name-no-ext
>
> base-name-no-ext = 1*allowed-char-no-dot
>
> This would make the base-name optional.
> .jpg is a valid file name, specifically on Linux platforms.
> Then, .jpg would have (only) a file extension and probably the prose of P&C would not need to be changed.
>

As part of this discussion I spent some time fine-tuning the ABNF. I merged in all the external refs and pumped out a few thousand test cases for analysis using abnfgen [1]. It works great on Mac OS X.
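The greedy-matching problem raised in the quoted message can be reproduced with a backtracking regular expression that transcribes `file-name = base-name [ file-extension ]` literally, with "." allowed in both parts: the base-name swallows the whole name and the optional extension is left empty. The Python sketch below contrasts that with an explicit last-dot rule that follows the four use cases listed in the reply. Both helpers (`naive_extension`, `file_extension`) are hypothetical illustrations, not anything defined by the spec, and `[^/]` is only a loose stand-in for allowed-char.

```python
import re
from typing import Optional

# Literal transcription of  file-name = base-name [ file-extension ],
# with "." allowed in both parts ([^/] loosely stands in for allowed-char).
NAIVE_FILE_NAME = re.compile(r"(?P<base>[^/]+)(?P<ext>\.[^/]+)?")


def naive_extension(file_name: str) -> Optional[str]:
    """Extension as assigned by the first (greedy) match of the literal grammar."""
    m = NAIVE_FILE_NAME.fullmatch(file_name)
    if m is None or m.group("ext") is None:
        return None
    return m.group("ext")[1:]


def file_extension(file_name: str) -> Optional[str]:
    """Hypothetical last-dot rule following the four use cases in the reply.

    "noextension"    -> None   (send to [SNIFF])
    "some.ext"       -> "ext"
    ".something"     -> None   (a leading dot alone does not start an extension)
    ".something.ext" -> "ext"
    """
    dot = file_name.rfind(".")
    if dot <= 0:  # no dot at all, or the only dot is the leading one
        return None
    # "name." has no extension: file-extension needs 1*allowed-char after ".".
    return file_name[dot + 1:] or None


print(naive_extension("photo.jpg"))  # None: base-name swallowed ".jpg"
print(file_extension("photo.jpg"))   # jpg
```

The `dot <= 0` test is where this rule parts ways with the grammar proposed in the quoted message: there, ".jpg" would consist of a file extension only, whereas use case 3 in the reply treats ".something" as having no extension and sends it to [SNIFF].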
I also updated the spec to cover the following use cases [3]:

1. "noextension": send to the [SNIFF] spec.
2. "some.ext": try to recognize the extension; if that fails, send to the [SNIFF] spec.
3. ".something": send to the [SNIFF] spec.
4. ".something.ext": try to recognize the extension; if that fails, send to the [SNIFF] spec.

New ABNF:

Zip-rel-path   = [locale-folder] [*folder-name] file-name /
                 [locale-folder] 1*folder-name
locale-folder  = %x6C %x6F %x63 %x61 %x6C %x65 %x73 "/" language-range "/"
                 ; %x6C ... %x73 is "locales", matched case-sensitively
folder-name    = file-name "/"
file-name      = base-name [ file-extension ]
base-name      = 1*allowed-char
file-extension = "." 1*allowed-char
allowed-char   = safe-char / zip-UTF8-char
zip-UTF8-char  = UTF8-2 / UTF8-3 / UTF8-4
safe-char      = ALPHA / DIGIT / SP / "$" / "%" / "'" / "-" / "_" / "@"
                 / "~" / "(" / ")" / "&" / "+" / "," / "=" / "[" / "]" / "."
UTF8-2         = %xC2-DF UTF8-tail
UTF8-3         = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
                 %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
UTF8-4         = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
                 %xF4 %x80-8F 2( UTF8-tail )
UTF8-tail      = %x80-BF
language-range = (1*8low-alpha / "*") *("-" (1*8alphanum / "*"))
alphanum       = low-alpha / DIGIT
low-alpha      = %x61-7A ; a-z

[1] http://www.quut.com/abnfgen/ (using: abnfgen path.abnf | xargs mkdir -p)
[SNIFF] http://tools.ietf.org/html/draft-abarth-mime-sniff-03
[3] http://dev.w3.org/2006/waf/widgets/Overview_TSE.html#default-icons-table

--
Marcos Caceres
http://datadriven.com.au
Received on Monday, 12 October 2009 20:36:20 UTC
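Alongside abnfgen, one rough way to exercise the revised grammar from this message is to transcribe zip-rel-path into a Python bytes regular expression and test candidate Zip entry names against it. This is a sketch, not a conformance checker: the constant and function names are hypothetical, it works on raw bytes so the UTF-8 byte ranges can be checked directly, and it assumes low-alpha is meant to cover a-z (%x61-7A). Because "." is itself a safe-char, `base-name [ file-extension ]` accepts exactly the same strings as `1*allowed-char`, so the regex uses that simpler form.

```python
import re

# safe-char: ALPHA / DIGIT / SP / "$" / "%" / "'" / "-" / "_" / "@" / "~"
#            / "(" / ")" / "&" / "+" / "," / "=" / "[" / "]" / "."
SAFE = rb"[A-Za-z0-9 $%'\-_@~()&+,=\[\].]"

# zip-UTF8-char = UTF8-2 / UTF8-3 / UTF8-4, using the byte ranges from the ABNF.
UTF8_2 = rb"[\xc2-\xdf][\x80-\xbf]"
UTF8_3 = (rb"\xe0[\xa0-\xbf][\x80-\xbf]|[\xe1-\xec][\x80-\xbf]{2}"
          rb"|\xed[\x80-\x9f][\x80-\xbf]|[\xee-\xef][\x80-\xbf]{2}")
UTF8_4 = (rb"\xf0[\x90-\xbf][\x80-\xbf]{2}|[\xf1-\xf3][\x80-\xbf]{3}"
          rb"|\xf4[\x80-\x8f][\x80-\xbf]{2}")
ALLOWED_CHAR = rb"(?:" + SAFE + rb"|" + UTF8_2 + rb"|" + UTF8_3 + rb"|" + UTF8_4 + rb")"

# file-name = base-name [ file-extension ] accepts the same strings as 1*allowed-char.
FILE_NAME = ALLOWED_CHAR + rb"+"
FOLDER_NAME = FILE_NAME + rb"/"

# language-range, assuming low-alpha = %x61-7A (a-z).
LANGUAGE_RANGE = rb"(?:[a-z]{1,8}|\*)(?:-(?:[a-z0-9]{1,8}|\*))*"
LOCALE_FOLDER = rb"locales/" + LANGUAGE_RANGE + rb"/"

# zip-rel-path = [locale-folder] *folder-name file-name
#              / [locale-folder] 1*folder-name
ZIP_REL_PATH = re.compile(
    rb"(?:" + LOCALE_FOLDER + rb")?(?:" + FOLDER_NAME + rb")*" + FILE_NAME
    + rb"|(?:" + LOCALE_FOLDER + rb")?(?:" + FOLDER_NAME + rb")+"
)


def is_zip_rel_path(name: bytes) -> bool:
    """Return True if a raw Zip entry name matches the zip-rel-path sketch."""
    return ZIP_REL_PATH.fullmatch(name) is not None


# The last two candidates are rejected: an empty path segment,
# and an overlong (invalid) UTF-8 sequence.
for candidate in (b"index.html", b"locales/en-us/icon.png", b"images/",
                  "caf\u00e9.png".encode("utf-8"), b"a//b", b"bad\xc0\x80.png"):
    print(candidate, is_zip_rel_path(candidate))
```

Collapsing file-name to `1*allowed-char` also makes the ambiguity discussed in this thread concrete: the ABNF by itself does not say where the extension starts, so that has to come from the prose rules (the four use cases) rather than from the grammar.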