Re: [widgets] Potential bug in Rule for Identifying the Media Type of a File from Marcos Caceres on 2009-10-22 (public-webapps@w3.org from October to December 2009)

From: Marcos Caceres <marcosc@opera.com>
Date: Thu, 22 Oct 2009 13:56:17 +0200
To: Marcin Hanclik <Marcin.Hanclik@access-company.com>
Cc: public-webapps <public-webapps@w3.org>
Message-ID: <b21a10670910220456j79fb49f9vde98d04b01709bdc@mail.gmail.com>

On Fri, Oct 16, 2009 at 12:06 PM, Marcin Hanclik
<Marcin.Hanclik@access-company.com> wrote:
> Hi Marcos,
>
> These are my remarks as discussed yesterday on the call.
>
> Comment a)
>
> 6.A.If all characters in the extension are outside the two ranges, then go to step 5 in this algorithm.
>
> Should be
>
> 6.A.If any of the characters in the extension is outside the two ranges, then go to step 5 in this algorithm.
>
> But this is also problematic since it infinitely loops the algorithm in this given case.
> So it should be:
>
> 6.A.If any of the characters in the extension is outside the two ranges, then go to step 7 in this algorithm.

I changed it to:

If any character in the extension is outside the U+0041-U+005A range
and the U+0061-U+007A range, then go to step 7 in this algorithm.
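
To make the reworded check concrete, here is a minimal sketch of the test in step 6.A. The function name is made up for illustration, and it assumes the extension is passed in without its leading U+002E 'FULL STOP'; neither detail comes from the spec.

```python
def has_char_outside_ascii_letters(extension: str) -> bool:
    """Return True if any character lies outside both U+0041-U+005A
    and U+0061-U+007A (hypothetical helper, not spec text)."""
    return any(
        not ("\u0041" <= ch <= "\u005A" or "\u0061" <= ch <= "\u007A")
        for ch in extension
    )


# A True result corresponds to "go to step 7"; False lets the algorithm
# carry on with the table lookup.
for candidate in ("png", "PNG", "mp3", "pñg"):
    print(candidate, has_char_outside_ascii_letters(candidate))
```

With "any" rather than "all", a single stray character is enough to skip the table lookup, which is the point of the correction.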


> Another comment to 6.A:
> It seems that the whole algorithm assumes that the File Identification Table is constant.
> E.g. if any vendor would like to add some extension with a character outside of the given ranges (or we in W3C would like to do this in the future), then we would need to rewrite the algorithm.
>

> So what about this (we do not need the ranges IMHO):
> 6.  Attempt to case-insensitively match the value of extension to one of the values in the file extension column in the file identification table. If there is a match, then return the corresponding value from the media type column and terminate this algorithm.
>

That is not possible because case-insensitive comparison of arbitrary
Unicode text is a nightmare (or so I'm told). This is why we restrict
the check to ASCII letters. I find it highly unlikely that we will see
standardized file extensions outside the ASCII range: none exists to
date and there is no evidence to suggest that any will exist in the
future.
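
As a rough illustration of why the ASCII restriction keeps things simple, the sketch below folds only U+0041-U+005A to lowercase and then does an exact lookup. The table is a made-up two-row excerpt, and the names EXAMPLE_TABLE, ascii_lower and lookup_media_type are hypothetical; the real file identification table is the one in the spec.

```python
from typing import Optional

# Hypothetical excerpt for illustration only; not the spec's table.
EXAMPLE_TABLE = {
    "png": "image/png",
    "css": "text/css",
}


def ascii_lower(text: str) -> str:
    """Lowercase only U+0041-U+005A; every other character is left alone."""
    return "".join(chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch for ch in text)


def lookup_media_type(extension: str) -> Optional[str]:
    """Return the media type for an extension, or None if there is no match."""
    return EXAMPLE_TABLE.get(ascii_lower(extension))


print(lookup_media_type("PNG"))
print(lookup_media_type("xyz"))
```

By contrast, full Unicode case mapping can change the length of a string: in Python 3, "\u0130".lower() (LATIN CAPITAL LETTER I WITH DOT ABOVE) yields two code points. Restricting the comparison to ASCII letters avoids that class of problem.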

> And remove 6.A and 6.B as they were.
>
> *****************
> Comment b)
>
> 4. If the first character of the name is a U+002E 'FULL STOP' character, and the file name contains no other U+002E 'FULL STOP' character then go to step 7 of this algorithm.
>
> What about ".jpg"?
> Do you assume that this is filename and not file extension?

That is correct. This is the behavior on *nix systems (including Mac OS X).

> What about this:
> 4. If the first character of the name is a U+002E 'FULL STOP' character, and the file name contains no other U+002E 'FULL STOP' character then let extension be name and go to step 6 of this algorithm.
>

This is not consistent with the behavior of the operating systems I
have tested.
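
For reference, step 4 as currently worded can be sketched as below. The helper name is made up for illustration. Python's own os.path.splitext follows the same convention (a leading dot belongs to the file name, not to an extension), which I include only as one more data point.

```python
import os.path


def is_dotfile_without_extension(name: str) -> bool:
    """True if name starts with U+002E and contains no other U+002E
    (hypothetical helper mirroring the wording of step 4)."""
    return name.startswith(".") and name.count(".") == 1


for name in (".jpg", "photo.jpg", ".config.xml"):
    print(name, is_dotfile_without_extension(name), os.path.splitext(name))
```

Under this rule ".jpg" is a file name with no extension, so the algorithm goes to step 7 rather than consulting the table.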

> *****************
> Comment c)
>
> Given that the processing model is developed in prose, I think we MUST fix the ambiguity of the grammar anyway.
>
> Thus I suggest the following change from:
>
> file-name      = base-name [ file-extension ]
> base-name      = 1*allowed-char
> file-extension = "." 1*allowed-char
>
> to:
>
> file-name      = 1*allowed-char
>
> (i.e. remove base-name and file-extension).
>
> The removal of ambiguity is motivated by the dependency of the WURI/WUS spec on P&C in this particular detail, so it is better to keep it right, I think.
> File extension does not play any role in WURI/WUS anyway.
> I think either the above change or the one in my mail below has to be implemented in the spec.

Ok, removed it.


-- 
Marcos Caceres
http://datadriven.com.au

Received on Thursday, 22 October 2009 11:56:51 UTC