RE: MIME types for packaged content

2009/2/13 Larry Masinter <masinter@adobe.com>:

> I think it would be much better to allow content types to be
> derived by the packager and included in the package on
> a file-by-file basis. This was the finding during the
> development of MHTML many years ago, and the situation
> isn't different here.
>
to which Marcos replied:

> Personally, I'm not a big fan of having to get a packager to inspect
> all the media types to derive a manifest every time I build a package

But of course, that wasn't what I was proposing,
and your proposal seems to allow what I was asking
for -- a way for the types of individual pieces of
content to be marked.

-  "allow" is different from "require".
-  The information about content-types need not be
    in the "manifest" -- there are several other possibilities

I think the criteria for evaluating the proposals should
focus primarily on interoperability. If you use file
extensions to denote file types, then you will be faced with
content that has to be renamed. Renaming is itself a
difficulty: for some content it can't be done effectively,
and even when it is possible, it requires rewriting
content that could otherwise be sent intact.

That was the reason why MHTML allowed file-by-file
content-type.

Of course, the packager is itself reading the files in
any case to include them in the package, so I don't
think there's a significant performance impact.


> (I think if there is going to be a media type sniffer then it should
> just be built into the widget engine and should have standardized
> behavior... whatever that may be). 

If the source of a message knows something
about the types of data it is including in the
message, then the source should be able to
communicate that knowledge in an effective way.

The source may have additional information about
file types that requires no "sniffing" at all.

The "sniffing" process is intrinsically unreliable.
There are well-known cases where file types
cannot be accurately "sniffed". 

In some cases, the sender might have locally
configured file extensions or creator codes
or other maintained sources of information, using
conventions only known to the sender and not
to the receiver, or other locally established
ways of mapping files to file types.

Perhaps misconfigured HTTP servers have damaged
the value of Content-Type headers in HTTP (a claim I
will dispute, but not the subject here), but if you
are creating a new packaging system, carrying forward
the breakage makes less sense.

> I also see having a 1 to 1 mapping between file and 
> media types in a separate metadata definition/file as
> fragile because they can easily fall out of sync.

I have trouble imagining a use case where this happens
at all, much less "easily". Can you provide a scenario
where this can happen?

> As an author, I
> should not have to rely on packaging tool for creating widgets. This
> is an explicit design goal for Widgets [2] (see 'ease of use'). For
> this reason, I proposed just using a MIME to file extension mapping
> mechanism, which is loosely based on Apache.

*Allowing* the indication of content-type
for individual files is not the same thing as *requiring*
it. The individual file designation could be
optional, requiring readers to respect such
indications, but not requiring writers to
write them.
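
A rough sketch of what "optional for writers, binding on readers"
could look like on the reading side. Everything named here
(resolve_type, the explicit and by_extension tables, the
application/octet-stream fallback) is made up for illustration
and is not taken from the Widgets drafts.

```python
import posixpath

def resolve_type(path, explicit, by_extension,
                 default="application/octet-stream"):
    """Return the content type a reader would use for one packaged file.

    explicit:     hypothetical table of per-file labels (path -> type)
    by_extension: hypothetical extension mapping (ext -> type)
    """
    # A per-file label, when the writer chose to supply one, always wins.
    if path in explicit:
        return explicit[path]
    # Writers are not required to label anything, so fall back
    # to the extension mapping, and then to a generic default.
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    return by_extension.get(ext, default)
```

A package with no per-file labels at all still resolves through
the extension table, so hand-built packages keep working.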

Certainly if packaging has any normative guidelines
or requirements other than "Random bunch of files ZIP'd
together using anything that claims to be ZIP compatible",
then you will need a tool for creating a package,
in any case.

Apache includes mechanisms for setting the MIME type of
any piece of content, including content delivered by
scripts. So citing Apache as the basis for not
allowing the content-type of individual package components
to be explicitly labeled on a component-by-component
basis is misleading.

> So people don't need to go to [1], the proposed solutions looks
> something like this:

> <widget xmlns="..." >
>   <media ext='php' type='application/html+xml' />
> </widget>

Yes, PHP is clearly a precedent where different .php scripts
might produce different content types.

> For the second part of the proposal [1], I said we should have
>  something like:
> <file path="/some/path.file" type="some/type" charset="name" />

This would be OK, except:

"charset" is an allowed parameter of some Internet Media Types, and
not of others.  Rather than separating the Internet Media Type
from its optional or required parameters in two different attributes,
I would suggest you use   content-type="some/type;param='value'"
or some other syntax.  content-type="text/plain;charset='iso-8859-1'"
for example.

(The data: URI scheme covers the issues with the unnecessary flexibility
of content-type strings pretty well; you might consider using that
here.) http://www.ietf.org/rfc/rfc2397.txt
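
To illustrate that a single attribute loses nothing: a reader can
split the type from its parameters with an ordinary MIME header
parser. The sketch below uses Python's standard email package;
the function name split_content_type is hypothetical, and the
unquoted parameter form (charset=iso-8859-1) is an assumption
about which syntax would end up being chosen.

```python
from email.message import Message

def split_content_type(value):
    """Split a content-type string into (media_type, parameters)."""
    msg = Message()
    msg["Content-Type"] = value
    # get_params() returns the media type first, then the parameters.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params

media_type, params = split_content_type("text/plain; charset=iso-8859-1")
print(media_type, params)
```

Types that take no charset, such as image/png, simply come back
with an empty parameter table.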


I don't have any problem with allowing a combination of
naming schemes, or even pattern directed matching of
path names to content-type.
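
As a sketch of pattern-directed matching (the rule list, its
ordering, and the name type_for_path are all assumptions, not
anything proposed so far), a reader could walk an ordered list of
path patterns and take the first one that matches:

```python
from fnmatch import fnmatchcase

def type_for_path(path, rules, default=None):
    """Return the content type of the first rule whose pattern matches."""
    for pattern, content_type in rules:
        # fnmatchcase does shell-style matching; "*" also matches "/".
        if fnmatchcase(path, pattern):
            return content_type
    return default

# Hypothetical rules: more specific patterns are listed first.
RULES = [
    ("/feeds/*.php", "application/atom+xml"),
    ("*.php", "text/html"),
    ("*.txt", "text/plain;charset=iso-8859-1"),
]
```

Because the first match wins, a specific rule can override a
general extension rule without renaming any file.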

*** RANDOM IDEA ***
(Don't take this too seriously)

One alternative that comes to mind (haven't thought this out)
would be to use file extensions to indicate types, but change
the URI referencing mechanism to allow for renaming, such that
a relative link to
   /some/path/something.php

could be satisfied by

   /some/path/something.php.iso88591txt

where ".iso88591txt" is added at package time, stripped
at interpret time but turned into a suitable content-type
indication. 

Those building packages manually would just rename files
which had file extensions that didn't match their content-type.
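
In the same not-too-serious spirit, the lookup might be sketched
like this. The suffix table and the name find_in_package are
invented for illustration; only the ".iso88591txt" example comes
from the idea above.

```python
# Hypothetical table of type suffixes added at package time.
TYPE_SUFFIXES = {
    ".iso88591txt": "text/plain;charset=iso-8859-1",
}

def find_in_package(link, package_paths):
    """Resolve a link to (stored_path, content_type), or None if absent.

    A content_type of None means no suffix was involved, so the
    reader would fall back to the ordinary extension rules.
    """
    if link in package_paths:
        return link, None
    for suffix, content_type in TYPE_SUFFIXES.items():
        candidate = link + suffix
        # The suffix is invisible to the link but supplies the type.
        if candidate in package_paths:
            return candidate, content_type
    return None
```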


Larry
--
http://larry.masinter.net

Received on Tuesday, 24 February 2009 18:38:31 UTC