- From: Robert Burns <rob@robburns.com>
- Date: Fri, 24 Aug 2007 07:40:02 -0500
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: public-html@w3.org
I think the discussion over this has been interesting and fruitful, and I wonder if we might draw some productive issues from it.

We have a problem: file types are not labeled properly. Karl and Leif identified one part of this issue as a disconnect between the local practice of filename extensions and the server practice of content headers. Sam suggested we might change those headers by indicating "! important" or something like that. Browsers have tried to solve this problem by sniffing content (which actually contributes to the problem, since content sniffing leaves authors unaware of their errors in setting metadata). In addition to other approaches, I like Roy's suggestion of treating the mismatches as an error (even if we require that the error be handled gracefully).

So I think we all have a good understanding of the general problem. However, the specifics are the essential pieces missing from much of the discussion. I propose we try to focus more on those. In particular: 1) what is the source of author error in setting file type information; and 2) how do browsers currently handle type determination?

So here are some avenues of investigation, building on Karl's point about the mismatch between local and server approaches:

• For local files, filename extensions have become the nearly universal practice for setting file type handling in local file systems. It would be useful to determine whether authors make any significant number of errors in setting filename extensions. If they do, where does this happen, and can we learn anything about why it happens? Is there anything we as the HTML WG (or in cooperation with other groups) can do to address this problem? Mac OS certainly creates opportunities for author error in this respect, in that other type-setting mechanisms can prevent authors from noticing a missing filename extension (needed once the file goes to the server or is accessed using HTTP).
However, modern Mac OS applications make it difficult to create new content that doesn't have a proper filename extension.

• Servers typically try to map filename extensions to content type headers. Servers may also be configured to provide content headers that are not based solely on the filename extension (or not based on it at all). Is this a large source of the problem? I suspect the server mapping issue is a big source of the problem. That is, authors set their filename extensions correctly because they receive immediate local OS feedback when the filename extension is wrong (except that Mac OS provides other file type mapping techniques, but they are seldom used for web-family resource files). It would be useful to know if this is a big source of the problem. For example, it will do no good to specify a new 'important' content header syntax if that too will be misconfigured. Typically the problem may occur when new file formats become common on a server that was installed and configured long before those formats (and their associated filename extensions) came onto the scene.

Investigating these two issues (filename extensions and extension-to-header mapping) might require mere discussion among the WG members. What do we think about mistaken filename extensions? What do we think about mistaken mappings from filename extension to content header? Is there any library research or research from W3C members that might shed some light on the issue?

For example, if we explore these issues and determine that filename extensions nearly universally reflect the authors' intentions, then perhaps content sniffing is not the way to go (this is just a hypothetical; it may not be the case). In that case, browsers that think they're providing greater value to their users by sniffing content are not doing that. It is the browser that treats filename extensions, or filename extensions in combination with content headers, as authoritative that will provide a better experience than content sniffing.
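
To make the header-versus-extension comparison concrete, here is a minimal Python sketch of how a mismatch between the two labels could be detected and reported. It is only an illustration under stated assumptions, not a description of what any browser or server actually does: the function names check_label and extension_type and the status strings are hypothetical, and the extension table is whatever Python's standard mimetypes module provides.

```python
import mimetypes


def extension_type(filename):
    """Media type implied by the filename extension, or None if unknown."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed


def check_label(filename, content_type_header):
    """Compare the declared header with the extension (hypothetical helper).

    Returns a (status, media_type) pair. No content sniffing is done here;
    the "unknown" case is where sniffing might serve as a rare fallback.
    """
    # Drop parameters such as "; charset=utf-8" and normalize case.
    declared = content_type_header.split(";", 1)[0].strip().lower() or None
    implied = extension_type(filename)

    if declared is None and implied is None:
        return ("unknown", None)
    if declared is None:
        return ("extension-only", implied)
    if implied is None:
        return ("header-only", declared)
    if declared == implied:
        return ("match", declared)
    # Assumption: the header wins, but the mismatch is surfaced as an error.
    return ("mismatch", declared)


if __name__ == "__main__":
    for name, header in [
        ("photo.png", "image/png"),
        ("page.html", "text/plain; charset=utf-8"),
        ("README", ""),
    ]:
        print(name, repr(header), check_label(name, header))
```

Letting the header win in the mismatch case is an arbitrary choice made for the sketch; which indicator should take priority is exactly the open question.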
Sure, content sniffing an image may be easy to do, but if the only time an image has a different filename extension is when an author wants it treated as a download (just as a semi-flawed example), then the browser that doesn't sniff provides a better user experience. At other times, a filename extension may be missing or unknown. There it might make sense to turn to sniffing as another (probably rare) fallback mechanism.

Determining the source of author error is one avenue of exploration. The other specific that would help the conversation would be to understand what browsers are doing now. It would be helpful for the WG to conduct some research to find out how the latest browsers treat content whose labeling differs, for all sorts of resources and sub-resources. So I suggest we might investigate:

A. identifying how current browsers handle content based on
 - content-type header
 - filename extension
 - @type attribute
 - sniffed content

B. determining the browser priorities for the content indicators listed in 'A' for:
 - the main resource (including HTML, XML, RSS, ATOM, MPEG, JPEG, PNG, etc.)
 - LINK@href for style sheet data
 - LINK@href for other data
 - SCRIPT@src
 - OBJECT@data
 - IMG@src
 - IMG@longdesc
 - @cite
 - A@href
 - AREA@href
 - etc.

I think having this information might help focus the conversation better.

Take care,
Rob
Received on Friday, 24 August 2007 12:40:39 UTC