Re: [whatwg] How to determine content-type of file: protocol from Nils Dagsson Moskopp on 2014-08-14 (public-whatwg-archive@w3.org from August 2014)

From: Nils Dagsson Moskopp <nils@dieweltistgarnichtso.net>
Date: Thu, 14 Aug 2014 15:23:49 +0200
To: duanyao <duanyao@ustc.edu>, "Gordon P. Hemsley" <me@gphemsley.org>, whatwg@whatwg.org
Message-ID: <87vbpvmcgq.fsf@dieweltistgarnichtso.net>
duanyao <duanyao@ustc.edu> writes:

> On 07/28/2014 22:08, Gordon P. Hemsley wrote:
>> On 07/28/2014 08:01 AM, duanyao wrote:
>>> On 07/28/2014 06:34, Gordon P. Hemsley wrote:
>>>> Sorry for the delay in responding. Your message fell through the
>>>> cracks in my e-mail filters.
>>>>
>>>> On 07/17/2014 08:26 AM, duanyao wrote:
>>>>> Hi,
>>>>>
>>>>> My first question is about a rule in MIME Sniffing specification
>>>>> (http://mimesniff.spec.whatwg.org):
>>>>>
>>>>>     5.1 Interpreting the resource metadata
>>>>>     ...
>>>>>     If the resource is retrieved directly from the file system, set
>>>>>     supplied-type to the MIME type provided by the file system.
>>>>>
>>>>> As far as I know, no mainstream file systems record a MIME type for
>>>>> files. Does the spec actually mean "provided by the operating
>>>>> system" or "provided by the file name extension"?
>>>>
>>>> Yeah, you've hit a known (though apparently unrecorded) bug in the
>>>> spec, originally pointed out to me by Boris Zbarsky via IRC many
>>>> months ago. The intent here is basically just "whatever the computer
>>>> says it is"—whether that be via the file system, the operating system,
>>>> or whatever, and whether it uses magic bytes, file extensions, or
>>>> whatever.
>>>>
>>>> In other words, feel free to read that as "the correct behavior is
>>>> undefined/unknown" at this point.
>>> Thanks for the explanation.
>>>
>>> Recently, the file: protocol has become more and more important due to
>>> the popularity of packaged web applications, including PhoneGap apps,
>>> Chrome apps, Firefox OS apps, Windows 8 HTML apps, etc. (not all of them
>>> use the file: protocol directly, but the underlying mechanisms are similar).
>>> So if we can't specify an interoperable way to determine a local file's
>>> MIME type, porting packaged web applications can be problematic in
>>> some situations (actually my team already hit this).
>>>
>>> I know that currently there is no standard way to determine a local
>>> file's MIME type; this may be one of the reasons that the mimesniff spec
>>> has not defined a behavior here.
>>
>> Well, the most basic reason is that I never delved into how it
>> actually works, because I was primarily concerned with HTTP connections.
>>
>> It's possible that there is no interoperable way to determine a local 
>> file's MIME type, but see below.
>>
>>> I'd like to propose a simple way to resolve this problem:
>>> For MIME types that have already been standardized by IANA and are used
>>> in web standards, determine a local file's supplied-type according to its
>>> file extension.
>>> This list could include htm, html, xhtml, xml, svg, css, js, jpeg, jpg,
>>> png, mp4, webm, woff, etc. Otherwise, UAs can determine supplied-type by
>>> any means.
>>>
>>> I think this rule should resolve most of the interoperability problems,
>>> and largely maintain compatibility with current UAs' implementations.
>>
>> There is already a "standard" in place to detect file types on the 
>> operating system level:
>>
>> http://www.freedesktop.org/wiki/Specifications/shared-mime-info-spec/
>> http://cgit.freedesktop.org/xdg/shared-mime-info/
>>
>> I could just refer to that and be done with it. Do you think that 
>> would work? (That specification has complex rules for detecting files, 
>> including magic bytes and whatnot, and is already used on a number of 
>> Linux distros and probably other operating systems.)
>>
> Maybe not.
> (1) it's a standard of *nix desktops, I doubt MS Windows will adopt it,

I see this as pure speculation.

> and maybe it's a bit heavy for mobile OSes;

Widely used mobile operating systems are based on Unix (e.g. iOS,
Android). Based on your measurements, how long does file(1) take?

> (2) many packaged web apps are ported from (and share code with) normal
> web apps, and most web servers simply deduce the MIME type from the file
> extension, so doing the same thing in UAs probably results in better
> compatibility.

It may not be possible to deduce the media type from the file extension
alone, since there can be parameters to the media type like “charset” or
“codecs”, e.g. “text/html; charset=UTF-8” or “audio/ogg; codecs=vorbis”.
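
To illustrate the point about parameters, here is a minimal sketch in
Python. It assumes the standard-library mimetypes module as a stand-in
for an extension table; nothing in this thread prescribes that module,
and supplied_type_from_extension is a hypothetical helper name. An
extension lookup yields only a bare type/subtype, so a "charset" or
"codecs" parameter would have to come from somewhere else.

```python
import mimetypes


def supplied_type_from_extension(filename):
    """Hypothetical helper: guess a bare media type from the file name alone.

    mimetypes.guess_type() returns (type, encoding). The second value is a
    content encoding such as "gzip", not a charset, so it is ignored here.
    """
    media_type, _content_encoding = mimetypes.guess_type(filename)
    return media_type


if __name__ == "__main__":
    # The results carry no "; charset=..." or "; codecs=..." parameters.
    for name in ("index.html", "movie.mp4"):
        print(name, "->", supplied_type_from_extension(name))
```

The exact mapping can vary with the platform's MIME database, but the
result never includes media type parameters.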

> (3) UAs are already required to do MIME type sniffing, which should be
> enough to correct most wrong supplied-types.

Is this interoperable enough yet for the purpose at hand?
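
For readers unfamiliar with the idea in (3), a greatly simplified
sketch of signature-based sniffing follows. It is not the algorithm
from the MIME Sniffing specification, which has many more patterns and
rules about when sniffing may override the supplied type. The table
holds only a few well-known image signatures, and sniff_or_keep is a
hypothetical helper name.

```python
# A small, assumed subset of well-known image byte signatures.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_or_keep(header, supplied_type):
    """Hypothetical helper: prefer a matching signature over the supplied type.

    `header` is the first few bytes of the resource. If no signature
    matches, the supplied type is returned unchanged.
    """
    for magic, media_type in SIGNATURES:
        if header.startswith(magic):
            return media_type
    return supplied_type


if __name__ == "__main__":
    # A PNG file mislabeled as text/plain is corrected; plain text is kept.
    print(sniff_or_keep(b"\x89PNG\r\n\x1a\n\x00\x00", "text/plain"))
    print(sniff_or_keep(b"hello world", "text/plain"))
```

Two UAs only agree on the outcome if they share both the signature
table and the rules for when a match may override the supplied type.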

-- 
Nils Dagsson Moskopp // erlehmann
<http://dieweltistgarnichtso.net>
Received on Thursday, 14 August 2014 13:24:27 UTC