- From: Robert Burns <rob@robburns.com>
- Date: Fri, 24 Aug 2007 07:40:02 -0500
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: public-html@w3.org
I think the discussion over this has been interesting and fruitful. I
wonder if we might draw some productive issues from it.
We have a problem that file types are not labeled properly. Karl and
Leif identified one part of this issue as a disjoint between the
local practice of filename and the server practice of content
headers. Sam suggested we might change those headers by indicating
!important or something like that. Browsers have tried to solve this
problem by sniffing content (which actually contributes to the
problem since authors are unaware of their errors in setting metadata
because of the content sniffing). In addition to other approaches, I
like Roy's suggestion of treating the mismatches as an error (even if
we require that error be handled gracefully).
So I think we all have a good understanding of the general problem.
However, the specifics are the essential missing pieces from much of
the discussion. I propose we try to focus more on those. In
particular, 1) what is the source of author error in setting file
type information; and 2) how do browsers currently handle type
determination?
So here are some avenues of investigation:
Building on Karl's point about the mismatch between local and server
approaches:
• For local files, filename extensions have become the nearly
universal practice for setting file type handling in local file
systems. It would be useful to determine if authors make any
significant number of errors in setting filename extensions. If they
do, where does this happen, and can we learn anything about why it
happens? Is there anything we as the HTML WG (or in cooperation with
other groups) can do to address this problem? Mac OS certainly
creates opportunities for author error in this respect in that other
type setting mechanisms can prevent authors from realizing a missing
filename extension (needed once the file goes to the server or is
accessed using HTTP). However, modern Mac OS applications make it
difficult to create new content that doesn't have a proper filename
extension.
• Servers typically try to map filename extensions to content type
headers. Servers may also be configured to provide content headers
that are not based solely on the filename extension (or not at all).
Is this a large source of the problem?
I suspect the server mapping issue is a big source of the problem.
That is, authors set their filename extensions correctly because they
receive immediate local OS feedback when the filename extension is
wrong (except Mac OS provides other file type mapping techniques but
they are seldom used for web-family resource files).
It would be useful to know if this is a big source of the problem.
For example, it will do no good to specify a new 'important' content
header syntax if that too will be mis-configured. Typically the
problem may occur when new file formats become common where the
server has been installed and configured long before those formats
(and their associated filename extensions) came onto the scene.
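
To make the extension-to-header mapping step concrete, here is a minimal Python sketch of how a server might do it. The table contents, the text/plain fallback, and the function name are my own assumptions for illustration, not the behavior of any particular server. It shows how a format that arrived after the table was written ends up mislabeled even though the author named the file correctly.

```python
import os

# Hypothetical mapping table, as a server configured years ago might have it.
# Newer formats (for example ".atom") are simply absent.
EXTENSION_TO_TYPE = {
    ".html": "text/html",
    ".css": "text/css",
    ".png": "image/png",
    ".jpg": "image/jpeg",
}

# Assumed fallback for extensions the table does not know about.
DEFAULT_TYPE = "text/plain"


def content_type_for(filename, table=EXTENSION_TO_TYPE, default=DEFAULT_TYPE):
    """Return the content type a server using `table` would send for `filename`."""
    extension = os.path.splitext(filename)[1].lower()
    return table.get(extension, default)


if __name__ == "__main__":
    for name in ("index.html", "logo.PNG", "feed.atom"):
        print(name, "->", content_type_for(name))
```

In this sketch "feed.atom" is served with the fallback type. The author's extension was right; the stale table is what produced the wrong header.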
Investigating these two issues (filename extensions and extension to
header mapping) might require mere discussion among the WG members.
What do we think about mistaken filename extensions? What do we think
about mistaken filename extension to content header mappings? Is
there any library research or research from W3C members that might
shed some light on the issue?
For example, if we explore these issues, and determine that filename
extensions nearly universally reflect the authors' intentions, then
perhaps content sniffing is not the way to go (this is just a
hypothetical, it may not be the case). In that case, browsers that
think they're providing greater value to their users by sniffing
content are not doing that. It is the browser that treats filename
extensions or filename extensions in combination with content headers
as authoritative that will provide a better experience than content
sniffing. Sure, content sniffing an image may be easy to do, but if
the only time an image has a different filename extension is when
an author wants it treated as a download (just as a semi-flawed
example), then the browser that doesn't sniff provides a better
user experience. At other times, a filename extension may be missing
or unknown. There it might make sense to turn to sniffing as another
(probably rare) fallback mechanism.
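
As a rough illustration of that hypothetical policy, here is a small Python sketch in which the header and the filename extension are authoritative and sniffing is only the last resort. The ordering, the tiny tables, and the names determine_type and sniff are assumptions made up for this example; they do not describe what any existing browser does.

```python
import os

# Hypothetical table of extensions the browser recognizes.
KNOWN_EXTENSIONS = {
    ".html": "text/html",
    ".png": "image/png",
    ".gif": "image/gif",
}

# Leading byte signatures for two image formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff(data):
    """Return a type guessed from the leading bytes, or None if unrecognized."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return None


def determine_type(filename, header_type, data):
    """Return (type, source); sniffing is consulted only when labels are absent."""
    if header_type:
        return header_type, "header"
    extension = os.path.splitext(filename)[1].lower()
    if extension in KNOWN_EXTENSIONS:
        return KNOWN_EXTENSIONS[extension], "extension"
    sniffed = sniff(data)
    if sniffed:
        return sniffed, "sniffed"
    return "application/octet-stream", "default"
```

Under these assumptions, a PNG deliberately served as application/octet-stream stays a download, while a file with no header and no extension still gets identified from its bytes.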
Determining the source of author error is one avenue of exploration.
The other specific that would help the conversation would be to
understand what browsers are doing now. It would be helpful for the
WG to conduct some research to find out how the latest browsers treat
content whose labeling differs, for all sorts of resources and
sub-resources. So I suggest we might investigate:
A. identifying how current browsers handle content based on
- content-type header
- filename extension
- @type attribute
- sniffed content
B. determining the browser priorities for the content indicators
listed in 'A' for:
- the main resource (including HTML, XML, RSS, ATOM, MPEG,
JPEG, PNG, etc.)
- LINK@href for style sheet data
- LINK@href for other data
- SCRIPT@src
- OBJECT@data
- IMG@src
- IMG@longdesc
- @cite
- A@href
- AREA@href
- etc.
I think having this information might help focus the conversation
better.
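
One way to organize that survey is to enumerate the test cases mechanically. The Python sketch below is only a suggestion: it crosses a subset of the contexts from 'B' with every combination of labels for the four indicators in 'A', keeping only the combinations where at least two indicators disagree. The two candidate types and the function name mismatch_cases are placeholders I chose for illustration.

```python
from itertools import product

# The four indicators from list 'A'.
INDICATORS = [
    "content-type header",
    "filename extension",
    "@type attribute",
    "sniffed content",
]

# A subset of the contexts from list 'B'.
CONTEXTS = [
    "main resource",
    "LINK@href (style sheet)",
    "SCRIPT@src",
    "OBJECT@data",
    "IMG@src",
    "A@href",
]

# Placeholder types; a real survey would cover many more.
TYPES = ["text/html", "image/png"]


def mismatch_cases(contexts=CONTEXTS, indicators=INDICATORS, types=TYPES):
    """Yield one test case per context and label combination with a disagreement."""
    for context in contexts:
        for labels in product(types, repeat=len(indicators)):
            if len(set(labels)) > 1:  # skip cases where every indicator agrees
                yield {"context": context, "labels": dict(zip(indicators, labels))}


if __name__ == "__main__":
    cases = list(mismatch_cases())
    print(len(cases), "test cases")
    print(cases[0])
```

Each case could then be served from a test server and the observed behavior of each browser recorded against it.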
Take care,
Rob
Received on Friday, 24 August 2007 12:40:39 UTC