- From: Larry Masinter <masinter@adobe.com>
- Date: Tue, 24 Feb 2009 10:37:22 -0800
- To: "marcosc@opera.com" <marcosc@opera.com>, Larry Masinter <masinter@adobe.com>
- CC: Bjoern Hoehrmann <derhoermi@gmx.net>, public-pkg-uri-scheme <public-pkg-uri-scheme@w3.org>
2009/2/13 Larry Masinter <masinter@adobe.com>:
> I think it would be much better to allow content types to be
> derived by the packager and included in the package on
> a file-by-file basis. This was the finding during the
> development of MHTML many years ago, and the situation
> isn't different here.
>

to which Marcos replied:

> Personally, I'm not a big fan of having to get a packager to inspect
> all the media types to derive a manifest every time I build a package

But of course, that wasn't what I was proposing, and your proposal
seems to allow what I was asking for -- a way for individual types
of individual content to be marked.

- "allow" is different from "require".
- The information about content-types need not be in the
  "manifest" -- there are several other possibilities

I think the criteria for evaluation of the proposals should be
primarily focused on interoperability. If you use file extensions
to denote file types, then you will be faced with content which
would have to be renamed, but renaming is itself a difficulty
because it can't be done effectively for some content and even
when possible, requires rewriting content that is otherwise
sent intact. That was the reason why MHTML allowed file-by-file
content-type.

Of course, the packager is itself reading the files in any
case to include them in the package, so I don't think there's
a significant performance impact.

> (I think if there is going to be a media type sniffer then it should
> just be built into the widget engine and should have standardized
> behavior... whatever that may be).

If the source of a message knows something about the types of
data it is including in the message, then the source should be
able to communicate that knowledge in an effective way. The source
may have additional information about file types that requires
no "sniffing" at all.

The "sniffing" process is intrinsically unreliable. There are
well-known cases where file types cannot be accurately "sniffed".

In some cases, the sender might have locally configured file
extensions or creator codes or other maintained sources of
information, using conventions only known to the sender and
not to the receiver, or other locally established ways of
mapping files to file types.

Perhaps misconfigured HTTP servers have damaged the value of
Content-Type headers in HTTP (a claim I will dispute, but not
the subject here), but if you are creating a new packaging
system, carrying forward the breakage makes less sense.

> I also see having a 1 to 1 mapping between file and
> media types in a separate metadata definition/file as
> fragile because they can easily fall out of sync.

I have trouble imagining a use case where this happens at all,
much less "easily". Can you provide a scenario where this can
happen?

> As an author, I
> should not have to rely on packaging tool for creating widgets. This
> is an explicit design goal for Widgets [2] (see 'ease of use'). For
> this reason, I proposed just using a MIME to file extension mapping
> mechanism, which is loosely based on Apache.

*Allowing* the indication of content-type for individual files
is not the same thing as *requiring* it. The individual file
designation could be optional, requiring readers to respect
such indications, but not requiring writers to write them.

Certainly if packaging has any normative guidelines or
requirements other than "Random bunch of files ZIP'd together
using anything that claims to be ZIP compatible", then you will
need a tool for creating a package, in any case.

Apache includes mechanisms for setting the MIME type of any
piece of content, including those delivered by scripts. So
claiming that Apache is the basis for not allowing the
content-type of individual package components to be explicitly
labeled on a component-by-component basis is misleading.

> So people don't need to go to [1], the proposed solutions looks
> something like this:
> <widget xmlns="..." >
>   <media ext='php' type='application/html+xml' />
> </widget>

Yes, PHP is clearly a precedent where different .php scripts
might produce different contents.

> For the second part of the proposal [1], I said we should have
> something like:
> <file path="/some/path.file" type="some/type" charset="name" />

This would be OK, except: "charset" is an allowed parameter of
some Internet Media Types, and not of others. Rather than
separating the Internet Media Type from its optional or required
parameters in two different attributes, I would suggest you use

   content-type="some/type;param='value'"

or some other syntax.

   content-type="text/plain;charset='iso-8859-1'"

for example.

(The data: URI scheme covers the issues with the unnecessary
flexibility of content-type strings pretty well, you might
consider using that here.)

http://www.ietf.org/rfc/rfc2397.txt

I don't have any problem with allowing a combination of
naming schemes, or even pattern directed matching of path names
to content-type.

** RANDOM IDEA ***
(Don't take this too seriously)

One alternative that comes to mind (haven't thought this out)
would be to use file extensions to indicate types, but change
the URI referencing mechanism to allow for renaming, such that
a relative link to

   /some/path/something.php

would be satisfied by

   /some/path/something.php.iso88591txt

where ".iso88591txt" is added at package time, stripped at
interpret time but turned into a suitable content-type
indication.

Those building packages manually would just rename files
which had file extensions that didn't match their content-type.

Larry
--
http://larry.masinter.net
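
Illustrative sketch (not part of the original message or of any proposal text): one way a reader of a package might combine the two mechanisms discussed above, honoring an optional per-file content-type label when the writer supplied one and otherwise falling back to an extension mapping. The table contents, the function names and the default type are all assumptions made up for this example, and the parameter value is written in the unquoted MIME form rather than the single-quoted form shown above.

```python
from email.message import Message
from pathlib import PurePosixPath

# Hypothetical package-wide extension map (the <media ext=... type=...> idea).
EXTENSION_TYPES = {
    "html": "text/html",
    "txt": "text/plain",
}

# Hypothetical optional per-file labels (the <file path=... /> idea),
# with the media type and its parameters kept in one string.
FILE_TYPES = {
    "/some/path.file": "text/plain;charset=iso-8859-1",
}

# Assumed fallback when neither mechanism says anything about a file.
DEFAULT_TYPE = "application/octet-stream"


def resolve_content_type(path):
    """Return the content-type string for one package member.

    A per-file label wins when present; writers are not required
    to provide one, so the extension map is the fallback.
    """
    if path in FILE_TYPES:
        return FILE_TYPES[path]
    ext = PurePosixPath(path).suffix.lstrip(".").lower()
    return EXTENSION_TYPES.get(ext, DEFAULT_TYPE)


def split_content_type(value):
    """Split a content-type string into (media type, parameter dict)."""
    msg = Message()
    msg["Content-Type"] = value
    # get_params() lists the media type first, then the parameters.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params
```

Keeping the parameters inside the single string means a reader does not need to know in advance which media types accept "charset" and which do not.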
Received on Tuesday, 24 February 2009 18:38:31 UTC