Re: [widgets] Potential bug in Rule for Identifying the Media Type of a File

2009/10/22 Marcin Hanclik <Marcin.Hanclik@access-company.com>:
> Hi Marcos, All,
>
>>>If any character in the extension is outside the U+0041-U+005A range
>>>and the U+0061-U+007A range, then go to step 7 in this algorithm.
> Unfortunately I disagree with that.
> Motivation:
> a) only ASCII characters are listed
> b) no digits are listed. What about file extensions that include digits, e.g. .p12 (PKCS#12 certificate)?

I don't see that file format in the "File Identification Table".

> c) at present, internationalization is a key topic in many circles, and I do not understand why we should restrict file extensions in the 21st century.
>

Because we are trying to find stuff in the "File Identification Table"
(i.e., the algorithm is limited just to those file names). We are not
writing a general algorithm for extension to MIME mapping! That's what
SNIFF does.
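The range check quoted at the top of this thread can be read as an early exit: assuming, as the rule implies, that every extension in the File Identification Table consists only of ASCII letters, an extension containing any other character cannot match and can go straight to sniffing. A minimal Python sketch, assuming the check applies to the characters after the dot (the function name `extension_is_ascii_letters` is hypothetical):

```python
def extension_is_ascii_letters(extension: str) -> bool:
    """Return True if every character after the leading dot is in
    U+0041-U+005A (A-Z) or U+0061-U+007A (a-z)."""
    # Assumption: the rule is applied to the characters after the dot.
    letters = extension[1:] if extension.startswith(".") else extension
    # An empty extension cannot match any table entry, so treat it as a
    # failure too (an assumption, not stated in the quoted rule).
    return bool(letters) and all(
        "A" <= ch <= "Z" or "a" <= ch <= "z" for ch in letters
    )
```

Under this sketch `.p12` fails the check and goes directly to sniffing, which loses nothing because it is not in the table anyway.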

> d) there are proprietary widget specifications, and it seems none of them restricts file extensions.

I'm not sure what you mean here. We don't restrict anything. We have
the most common types defined, and the ones we don't define are
handled by SNIFF. I don't see the problem.

> Proposed actions:
> Drop ranges and limits.
> Possibly also contact the I18N group and ask for their opinion.

I think you've misunderstood the intention of the specification with
respect to this section.

>>>That is not possible because trying to do Unicode case comparisons is
>>>a nightmare (or so I'm told).
> I think we should distinguish between possibility and difficulty.

Isn't this totally irrelevant to this algorithm?

> In many cases whole file names are compared (as per P&C), and yet suddenly file extensions cannot be compared.
>

This is just for efficiency.

> E.g.
> "A default start file is a reserved start file at the root of the widget package or at the root of a locale folder whose file name case-sensitively and exactly matches a file name given in the file name column of the default start files table, and whose media type matches the media type given in the media type column of the table."
>
>>>That is correct. This is the behavior of *nix systems (including Mac OS X).
>>>This is not consistent with the behavior of the operating systems I
>>>have tested.
> I disagree.
> Could you please publish your tests?

I created the files in the Finder on Mac OS X (Snow Leopard). I prefer
not to send a screenshot to the mailing list.

> In general I think that there is no standard for the term "file extension". P&C actually seems to standardize it.
> On *nix and Linux systems it does not seem to exist; it can only be handled, somewhat artificially, by some applications (the shell, etc.; see below).
> Here is my test (executed on Ubuntu and Debian):
> host:~$ mkdir test
> host:~$ touch test/.jpg
> host:~$ touch test/img.jpg
> host:~$ touch test/.gif
> host:~$ touch test/img.gif
> host:~$ ls -laX test/
> total 8
> drwxr-xr-x 2 user user 4096 2009-10-22 15:33 .
> drwxr-xr-x 5 user user 4096 2009-10-22 15:33 ..
> -rw-r--r-- 1 user user    0 2009-10-22 15:33 .gif
> -rw-r--r-- 1 user user    0 2009-10-22 15:33 img.gif
> -rw-r--r-- 1 user user    0 2009-10-22 15:33 img.jpg
> -rw-r--r-- 1 user user    0 2009-10-22 15:33 .jpg
> //It seems that the shell is confused, right?
> host:~$ cd test/
> host:~/test$ ls -laX
> total 8
> drwxr-xr-x 2 user user 4096 2009-10-22 15:33 .
> drwxr-xr-x 5 user user 4096 2009-10-22 15:33 ..
> -rw-r--r-- 1 user user    0 2009-10-22 15:33 .gif
> -rw-r--r-- 1 user user    0 2009-10-22 15:33 img.gif
> -rw-r--r-- 1 user user    0 2009-10-22 15:33 img.jpg
> -rw-r--r-- 1 user user    0 2009-10-22 15:33 .jpg
> //It seems that the shell is confused, right?
> host:~/test$ basename .jpg
> .jpg
> host:~/test$ cd ..
> host:~$ basename test/.jpg
> .jpg
> host:~$ basename test/.jpg .jpg
> .jpg
> host:~$ basename test/img.jpg .jpg
> img
> host:~$ basename test/img.jpg
> img.jpg
> host:~$ basename test/img.jpg pg
> img.j
> //This test actually shows that basename just looks for the [SUFFIX] string at the end of the file name. The file extension is ARTIFICIAL!!
>

We know this already. basename is no longer in the spec; you made me
take it out, remember? That's why we have the prose.
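The behavior in the transcript follows the POSIX rule for `basename`: SUFFIX is removed only if it is a trailing substring of the final path component and is not identical to the whole component. A small Python emulation (a sketch covering only the cases shown above, not every POSIX edge case; the function is our own, not part of any library) reproduces each result:

```python
def basename(path: str, suffix: str = "") -> str:
    # Drop trailing slashes; a path made only of slashes becomes "/".
    # Empty input is returned as "" here (POSIX allows "." or "").
    stripped = path.rstrip("/")
    if not stripped:
        return "/" if path else ""
    # Keep only the final path component.
    name = stripped.rsplit("/", 1)[-1]
    # Remove SUFFIX only when it is a trailing substring that is not the
    # entire name; this is why "basename test/.jpg .jpg" prints ".jpg".
    if suffix and name != suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    return name
```

Note that `basename("test/img.jpg", "pg")` yields `img.j`: the utility treats SUFFIX as an arbitrary string, with no notion of an extension.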

> host:~$
>
> Further comments:
> [1] gives the following guidelines for media type registration:
> "Various sorts of optional information SHOULD be included in the
> specification of a media type if it is available:
> ...
>   o  File name extension(s) commonly used on one or more platforms to
>      indicate that some file contains a given media type.
>
>   o  Mac OS File Type code(s) (4 octets) used to label files containing
>      a given media type."
> The term file (name) extension is not defined. A Mac OS File Type code does not seem to be equivalent to a file extension (which stems more from the Windows world).
>

Is this even relevant now? Or is it a legacy thing from previous
versions of Mac OS?

> Historically, Windows worked with 3-character extensions and Mac OS with 4-character type codes.
>
> Therefore, in P&C we should assume that a file extension is any sequence of characters following the last dot (U+002E FULL STOP), including that dot.
>
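Taken literally, the definition proposed above is a one-line rule. A Python sketch, assuming the input is a bare file name rather than a path (a dot in a directory name would otherwise be picked up); `file_extension` is a hypothetical helper name:

```python
def file_extension(file_name: str) -> str:
    """Return everything from the last U+002E FULL STOP onward,
    including the dot, or "" if the name contains no dot."""
    # Assumption: file_name has no directory components.
    index = file_name.rfind(".")
    return file_name[index:] if index != -1 else ""
```

Under this definition a dot file such as `.jpg` is entirely extension, which differs from `basename`, which refuses to strip `.jpg` from `.jpg`.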

I really don't understand what you are trying to solve, or what you
think the spec does here.

To be clear: All we want to do is check if the file extension of a
file case-insensitively matches one of the extensions in the File
Identification Table. If you can't match it, then the MIME type gets
resolved with SNIFF.
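The lookup described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the spec's algorithm: the table holds only a small illustrative subset of mappings (not the full File Identification Table), and `sniff` is a hypothetical stand-in for SNIFF, which in reality inspects the file's bytes rather than its name.

```python
from typing import Callable, Optional

# Illustrative subset only; this is NOT the complete File Identification
# Table from the P&C spec.
FILE_IDENTIFICATION_TABLE = {
    ".html": "text/html",
    ".css": "text/css",
    ".js": "application/javascript",
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".png": "image/png",
}


def identify_media_type(
    file_name: str, sniff: Callable[[str], Optional[str]]
) -> Optional[str]:
    """Look the extension up in the table; otherwise defer to sniff()."""
    dot = file_name.rfind(".")
    if dot != -1:
        extension = file_name[dot:]
        letters = extension[1:]
        # Table entries are ASCII letters only, so any other character
        # means there can be no match (just an efficiency shortcut).
        if letters and all("A" <= c <= "Z" or "a" <= c <= "z" for c in letters):
            # ASCII-only case folding: no Unicode case rules are involved.
            media_type = FILE_IDENTIFICATION_TABLE.get(extension.lower())
            if media_type is not None:
                return media_type
    # No table match: hand over to the hypothetical sniffing stand-in.
    return sniff(file_name)
```

Because the case-insensitive comparison only runs after the extension is known to be pure ASCII, `.lower()` is safe here and the Unicode case-comparison problems mentioned earlier in the thread never arise.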




-- 
Marcos Caceres
http://datadriven.com.au

Received on Thursday, 22 October 2009 15:26:00 UTC