- From: Marcos Caceres <marcosc@opera.com>
- Date: Mon, 12 Oct 2009 22:35:30 +0200
- To: Marcin Hanclik <Marcin.Hanclik@access-company.com>
- Cc: public-webapps <public-webapps@w3.org>
>
>>>2. If file has a file-extension, attempt to match the file-extension
>>>to one in the file extensions column in the file identification table.
>>>If there is a match, then return the media type value. (returns
>>>"image/jpeg")
> I think file-extension would not be matched, but only base-name.
>
> I think the grammar is not ambiguous with regard to which rules would be matched.
> The problem is that at present in case of .jpg, there would be no file extension.
> A greedy parser would only match base-name and leave file-extension empty, since it is optional.
> So we need to modify the grammar to clearly specify what the extension is.
> With the current grammar, there is also a problem that "." is also allowed in the file-extension as part of the allowed-char.
> Therefore any parser may be confused which dot is the "." from the file-extension rule (I am not sure whether a parser can be developed at all).
> And thus, file-extension has problems. I assume that file extensions do not have dots, dot is to be the delimiter.
>
> What about modifying the ABNF to:
>
> file-name = file-name-with-extension / file-name-no-extension
>
> file-name-with-extension = base-name file-extension
>
> base-name = *allowed-char
>
> file-extension = "." 1*allowed-char-no-dot
>
> allowed-char-no-dot = safe-char-no-dot / utf8-char
>
> safe-char-no-dot = ALPHA / DIGIT / SP / "$" / "%"
> / "'" / "-" / "_" / "@"
> / "~" / "(" / ")" / "&" / "+"
> / "," / "=" / "[" / "]"
>
> file-name-no-extension = base-name-no-ext
>
> base-name-no-ext = 1*allowed-char-no-dot
>
> This would make the base-name optional.
> .jpg is a valid file name, specifically on Linux platforms.
> Then, .jpg would have (only) a file extension and probably the prose of P&C would not need to be changed.
>
As part of this discussion I spent some time fine-tuning the ABNF. I
merged in all the external refs and pumped out a few thousand test
cases for analysis using abnfgen [1]. It works great on Mac OS X. I also
updated the spec to cover the following use cases [3]:
1. "noextension" > send to [SNIFF] spec.
2. "some.ext" > try to recognize extension. If that fails, send to [SNIFF] spec.
3. ".something" > send to [SNIFF] spec.
4. ".something.ext" > try to recognize extension. If that fails, send to [SNIFF] spec.
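The four cases above can be sketched in Python roughly as follows. This is only an illustration, not spec text: EXTENSION_TABLE is a hypothetical stand-in for the file identification table, SNIFF is a placeholder meaning "hand the file to the [SNIFF] algorithm", and both the case-insensitive match and the rule that a leading dot does not start an extension are my assumptions, based on cases 3 and 4.

```python
# Hypothetical subset of the file identification table (illustrative only).
EXTENSION_TABLE = {
    ".png": "image/png",
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
}

# Placeholder result meaning "pass the file to the [SNIFF] algorithm".
SNIFF = "sniff"


def identify_media_type(file_name: str) -> str:
    """Return a media type from the table, or SNIFF if none applies."""
    # Position of the last dot; -1 means there is no dot at all (case 1).
    dot = file_name.rfind(".")
    # A dot at index 0 is treated as part of the base name (case 3).
    if dot <= 0:
        return SNIFF
    # Assumption: extensions are compared case-insensitively.
    extension = file_name[dot:].lower()
    # Cases 2 and 4: known extension wins, otherwise fall back to sniffing.
    return EXTENSION_TABLE.get(extension, SNIFF)


if __name__ == "__main__":
    for name in ("noextension", "some.jpg", "some.ext",
                 ".something", ".something.png"):
        print(name, "->", identify_media_type(name))
```

Using the last dot as the delimiter means the extension itself never contains a dot, which sidesteps the question of which dot the grammar's file-extension rule refers to.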
New ABNF:
Zip-rel-path = [locale-folder] [*folder-name] file-name /
               [locale-folder] 1*folder-name
locale-folder = %x6C %x6F %x63 %x61 %x6C %x65 %x73
                "/" language-range "/"
folder-name = file-name "/"
file-name = base-name [ file-extension ]
base-name = 1*allowed-char
file-extension = "." 1*allowed-char
allowed-char = safe-char / zip-UTF8-char
zip-UTF8-char = UTF8-2 / UTF8-3 / UTF8-4
safe-char = ALPHA / DIGIT / SP / "$" / "%"
            / "'" / "-" / "_" / "@"
            / "~" / "(" / ")" / "&" / "+"
            / "," / "=" / "[" / "]" / "."
UTF8-2 = %xC2-DF UTF8-tail
UTF8-3 = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
         %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
UTF8-4 = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
         %xF4 %x80-8F 2( UTF8-tail )
UTF8-tail = %x80-BF
language-range = (1*8low-alpha / "*") *("-" (1*8alphanum / "*"))
alphanum = low-alpha / DIGIT
low-alpha = %x61-7A
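For anyone who wants to sanity-check generated names without an ABNF tool, here is a rough Python sketch of the file-name rule. Because "." is itself an allowed-char, file-name reduces to one or more allowed-chars, so that is all the check does. The function names are made up for illustration, and I am assuming the name has already been decoded from UTF-8 into a string, so zip-UTF8-char becomes "any code point at or above U+0080 that is not a surrogate".

```python
import string

# safe-char from the grammar: ALPHA / DIGIT / SP plus the listed punctuation.
SAFE_CHARS = set(string.ascii_letters + string.digits + " $%'-_@~()&+,=[].")


def is_allowed_char(ch: str) -> bool:
    """True if ch matches allowed-char (safe-char / zip-UTF8-char)."""
    if ch in SAFE_CHARS:
        return True
    code = ord(ch)
    # zip-UTF8-char covers multi-byte UTF-8 sequences, i.e. U+0080 and up.
    # Surrogates are excluded, mirroring the %xED %x80-9F branch of UTF8-3.
    return code >= 0x80 and not (0xD800 <= code <= 0xDFFF)


def is_valid_file_name(name: str) -> bool:
    """True if name matches file-name, i.e. 1*allowed-char."""
    return len(name) > 0 and all(is_allowed_char(ch) for ch in name)


if __name__ == "__main__":
    for name in ("icon.png", ".jpg", "my file (1).txt", "a/b", "a*b", ""):
        print(repr(name), is_valid_file_name(name))
```

Note that "/" and "*" are rejected here: "/" only appears as the folder separator in Zip-rel-path, and "*" only in language-range.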
[1] http://www.quut.com/abnfgen/
    (using abnfgen path.abnf | xargs mkdir -p )
[SNIFF] http://tools.ietf.org/html/draft-abarth-mime-sniff-03
[3] http://dev.w3.org/2006/waf/widgets/Overview_TSE.html#default-icons-table
--
Marcos Caceres
http://datadriven.com.au
Received on Monday, 12 October 2009 20:36:20 UTC