Re: [widgets] Content-Type Processing Model from Marcos Caceres on 2009-02-18 (public-webapps@w3.org from January to March 2009)

From: Marcos Caceres <marcosc@opera.com>
Date: Wed, 18 Feb 2009 14:11:58 +0100
To: Adam Barth <w3c@adambarth.com>
Cc: Ian Hickson <ian@hixie.ch>, public-webapps <public-webapps@w3.org>
Message-ID: <b21a10670902180511s18f65f1al1f1f90ef4cbf23b7@mail.gmail.com>
Hi Adam,
On Tue, Feb 3, 2009 at 7:22 PM, Adam Barth <w3c@adambarth.com> wrote:
> On Tue, Feb 3, 2009 at 7:39 AM, Marcos Caceres <marcosscaceres@gmail.com> wrote:
>>>> At the moment, [1] reads:
>>>>
>>>> " For resources fetched from the file system, user agents should use
>>>>  platform-specific conventions, e.g. operating system extension/type
>>>>  mappings."
>>>>
>>>> We are concerned that operating system extension/type mappings might
>>>> cause issues for widget engines because those mappings could be
>>>> incorrect, or come from arbitrary sources, etc.
>>>
>>> We have the above text so that users that point their browsers at
>>> their own file systems get types that are consistent with the types
>>> they get by looking at the files in Windows Explorer or the Finder,
>>> etc.  I wouldn't think that widgets would trigger this clause, just as
>>> HTTP resources fetched from a disk cache don't trigger it.
>>
>> Some widget engines work with file://, so there is a risk that the
>> above is happening. We explicitly want to avoid that. However, I
>> understand if that is not a concern for your specification.
>
> I'm happy to change that sentence.  How would you like the sentence changed?

As widgets handle this in their own spec, I think it is fine to leave
the sentence in your spec as is. It is probably good enough as
general guidance.

>>>> So we are looking to
>>>> collaborate to resolve this issue, and are wondering whether this could be
>>>> standardized as part of [1]. Our current approach in [2] has been to
>>>> provide a table of file extension to MIME type mappings.
>>>
>>> Why is the magic number approach insufficient?
>>
>> Personally, I don't know :) Coupling an explicit extension to MIME type
>> mapping with magic numbers as a fallback seemed like a reasonable
>> solution to me. However, some people remain uneasy about the
>> current solution in the widget spec. This is why I thought we could
>> work with you on this, as you have more experience and understand
>> the security problems better. That said, I also understand that this
>> might just be something that widget specs need to deal with on their
>> own.
>
> I'd recommend the following:
>
> 1) Provide a mechanism for the widget author to specify a mime type
> via some sort of metadata (analogous to the HTTP Content-Type header).

OK, I will propose something to the Web Apps Working Group. I think we
will follow Apache's model on this (i.e., 'AddType mime/type ext'),
but also allow per-file MIME type declarations (e.g., <file
path="some/path.ext" type="some/type">).
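For illustration only, here is a rough Python sketch of how Apache-style `AddType mime/type ext [ext ...]` lines could be turned into an extension table. The function name `parse_addtype` is hypothetical, and the widget configuration format has not been defined yet; per-file `<file>` declarations would presumably override whatever this table says. Following Apache's behaviour, extensions are matched case-insensitively and may be written with or without a leading dot.

```python
from typing import Dict, Iterable


def parse_addtype(lines: Iterable[str]) -> Dict[str, str]:
    """Build an extension -> MIME type map from Apache-style AddType lines.

    Hypothetical helper for illustration; not part of any widget spec.
    """
    mapping: Dict[str, str] = {}
    for line in lines:
        parts = line.split()
        # Skip blank lines and anything that is not an AddType directive.
        if len(parts) < 3 or parts[0].lower() != "addtype":
            continue
        mime = parts[1]
        for ext in parts[2:]:
            # Normalise: lower-case, and always store with a leading dot.
            ext = ext.lower()
            if not ext.startswith("."):
                ext = "." + ext
            mapping[ext] = mime
    return mapping
```

For example, `parse_addtype(["AddType image/svg+xml svg svgz"])` maps both `.svg` and `.svgz` to `image/svg+xml`.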

However, do you foresee any security issues with allowing authors to
manually override commonly known MIME types? Should we protect against
that, or honor what an author declares? Are there situations where this
could be dangerous (e.g., treating a .txt file as
'application/javascript' or something else)?

> 2) Use the magic number sniffing algorithm if authors don't specify a mime type.

During the configuration phase, I think we will use both file
extension mapping and magic number sniffing, with the file extension to
MIME type mapping taking priority. Sniffing will only kick in if
the extension is missing. We will define these things as part of the
Widgets Packaging spec.
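A minimal Python sketch of that configuration-phase order, assuming a hypothetical per-file declaration dict (author metadata wins), then the extension table, then magic-number sniffing only for files with no extension. The table and signature list are small illustrative samples, not the Widgets Packaging spec's actual table, and returning `application/octet-stream` for an unrecognized extension is an assumption rather than something the spec states.

```python
import os
from typing import Dict, Optional

# Illustrative sample only; NOT the Widgets Packaging spec's actual table.
EXTENSION_MAP: Dict[str, str] = {
    ".html": "text/html",
    ".css": "text/css",
    ".js": "application/javascript",
    ".txt": "text/plain",
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".png": "image/png",
}

# Leading-byte signatures ("magic numbers") checked when there is no extension.
MAGIC_NUMBERS = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
]

DEFAULT_TYPE = "application/octet-stream"


def sniff(data: bytes) -> Optional[str]:
    """Return the MIME type whose signature prefixes data, or None."""
    for signature, mime in MAGIC_NUMBERS:
        if data.startswith(signature):
            return mime
    return None


def resolve_type(path: str, data: bytes,
                 declared: Optional[Dict[str, str]] = None) -> str:
    """Pick a MIME type for a file inside a widget package (sketch)."""
    # 1. A per-file declaration by the author (hypothetical metadata) wins.
    if declared and path in declared:
        return declared[path]
    # 2. Otherwise use the extension table; unknown extensions are treated
    #    as opaque here (an assumption, not stated by the spec).
    ext = os.path.splitext(path)[1].lower()
    if ext:
        return EXTENSION_MAP.get(ext, DEFAULT_TYPE)
    # 3. Sniff only when the extension is missing.
    return sniff(data) or DEFAULT_TYPE
```

So `resolve_type("icon", b"GIF89a...")` falls through to sniffing and yields `image/gif`, while `resolve_type("icon.txt", b"GIF89a...")` stays `text/plain` because the extension takes priority.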

At runtime, however, documents will rely on [1].

> Looking at the mime types in the widget spec, the only ones that
> aren't readily identifiable using magic numbers are
> application/javascript, text/css, and text/plain.  Those mime types
> only matter if they are displayed in the main content area (e.g., the
> <script> tag doesn't care about mime types).

Right.

> For things like GIF and JPEG, the magic numbers have to be right
> anyway or else the image won't display, so you're more likely to get
> the right answer from the magic number than from the file extension.

Right. That's what I thought.

> The remaining interesting case is image/svg+xml, which we don't have
> in the sniffing spec.  We can probably add it though, if that would be
> helpful.

It would be helpful for us. I guess that would involve parsing the
XML and checking for the SVG namespace?
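A hedged Python sketch of that check: parse the bytes as XML and test whether the root element is `svg` in the `http://www.w3.org/2000/svg` namespace. `looks_like_svg` is a hypothetical name, and fully parsing the resource is a simplification; a real sniffer would probably inspect only a bounded prefix of the bytes instead.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def looks_like_svg(data: bytes) -> bool:
    """True if data is XML whose root element is <svg> in the SVG namespace."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # Not well-formed XML (e.g. a GIF or plain text), so not SVG.
        return False
    # ElementTree reports namespaced tags as "{namespace-uri}localname".
    return root.tag == "{%s}svg" % SVG_NS
```

Note that an `<svg>` root without the namespace declaration is rejected, which matches the "find the SVG namespace" idea rather than just the element name.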

Kind regards,
Marcos

[1] http://tools.ietf.org/html/draft-abarth-mime-sniff-00
-- 
Marcos Caceres
http://datadriven.com.au
Received on Wednesday, 18 February 2009 14:19:57 UTC