- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Fri, 24 Aug 2007 15:50:26 -0500
- To: Robert Burns <rob@robburns.com>
- CC: public-html@w3.org
Robert Burns wrote:

I agree that investigating this might be worthwhile. Some comments:

> 1) what is the source of author error in setting file type information

The main sources I've seen are ignorance and inability to affect the server configuration, with the latter being very common. This combines with web servers that default unknown files to types that the majority of UAs ignore and sniff (e.g. text/plain) to produce a situation where authors have little incentive either to learn or to seek out hosting providers that allow types to be set on the server side. For example, nearly all OS X disk image files I've run into on the web have been served as text/plain, and it seems that the people serving them are OK with this.

The ignorance aspect shows up in two common ways: letting the web server send its default type for the data, and generating all their documents with a server-side solution that sends a default MIME type unless overridden (PHP, for example, often defaults to text/html). This is how stylesheets, scripts, etc. end up served as text/html.

> • For local files, filename extensions have become the nearly universal
> practice for setting file type handling in local file systems.

That is unfortunate, since extensions do not uniquely determine type (the "rpm" and "xml" extensions are good examples). In practice, various operating systems and applications use data other than the extension to decide what to do with the data: generally content sniffing, though OS X 10.4 and later also has a way of tagging files with type information independent of the filename or the raw data bytes.

> It would be useful to determine if authors make any significant number of errors
> in setting filename extensions.

In my experience, insofar as extensions are usable for type identification at all given the issues above, generally no. Of course, that doesn't help dynamically generated content.

> a missing filename extension (needed once the file goes to the
> server or is accessed using HTTP).
For what it's worth, Apache does have a mode where it will use content sniffing instead of filename extensions to determine the types it sends. It's not enabled by default (which is too bad), and I don't know how commonly it's used.

> I suspect the server mapping issue is a big source of the problem.

Absolutely. It's particularly a problem when a "new" extension that the server is not aware of appears or becomes popular.

> Typically the problem may
> occur when new file formats become common where the server has been
> installed and configured long before those formats (and their associated
> filename extensions) came onto the scene.

As a UA developer, I can say that this is in fact the situation that forced Gecko to add text/plain sniffing.

> For example, if we explore these issues, and determine that filename
> extensions nearly universally reflect the authors' intentions, then
> perhaps content sniffing is not the way to go

Filename extensions don't cover dynamically generated content, for which the relevant extensions are typically "php", "asp", "pl", "exe", or "cgi".

> A. identifying how current browsers handle content based on ...
> I think having this information might help focus the conversation better.

Absolutely.

-Boris
Received on Friday, 24 August 2007 21:04:14 UTC