Document sniffing for type information

From: Charles McCathieNevile <charles@w3.org>
Date: Tue, 3 Dec 2002 04:38:15 -0500 (EST)
To: Al Gilman <asgilman@iamdigex.net>
cc: <w3c-wai-ig@w3.org>
Message-ID: <Pine.LNX.4.30.0212030425360.20023-100000@tux.w3.org>

I disagree with this interpretation of the natural order, and of what will

In the Web of today there are a number of different possible interpretations
of a given document. (The same thing applies to trying to open an HTML
document for editing: programs that are too clever don't provide the option
of treating it as a plain file, and those programs are not helpful.)
This means that sniffing the type of the document by looking inside it is
bound to be unreliable in a system where one program can handle content in
two or more ways.

In some operating systems this was the technique for associating a default
piece of code to operate on a file. Others used a naming convention - either
explicit as in the old Windows systems and new MacOS to some extent, or
hidden as in all Mac systems before OS X and newer Windows systems (sort of).

For building interoperability both these approaches have flaws. Looking in
the file restricts the options in a complex world where multiple pieces of
software are useful (this incidentally is also the problem with the
codebase attribute for the HTML 4 object element), and relying on a register
of codes means you have to agree on registering them - it worked OK
internally, but when you deal with the Web it is messy.

MIME types are an attempt to allow a file author to say what they are serving
at a particular URI. The fact that the same bits might be available at
another URI, with a different meaning (a JPEG file containing RDF might be
served at two URIs as the same data but in one case as RDF and another as
JPEG) is not relevant.
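
To make the two-URI case concrete, here is a minimal sketch in Python. The
paths, the byte string and the respond function are all made up for
illustration, and application/rdf+xml is assumed as the media type for RDF.
The point is only that the declared type comes from the server's
configuration for a URI, not from the bits themselves.

```python
# Same bytes, two URIs, two declared meanings.
# Every name and path below is hypothetical.
SHARED_BYTES = b"\xff\xd8\xff\xe0 JPEG data with embedded RDF metadata"

# The author's server configuration decides what is being served at each URI.
ROUTES = {
    "/photo.jpg": "image/jpeg",
    "/photo.rdf": "application/rdf+xml",  # assumed media type for RDF
}


def respond(path):
    """Return (headers, body) for a request path; raises KeyError if unknown."""
    content_type = ROUTES[path]
    return {"Content-Type": content_type}, SHARED_BYTES
```

Both paths return identical bodies; only the Content-Type header differs, and
that header is what tells the client which interpretation the author meant.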

An advanced user agent should give the user the option of sniffing in a
document and rendering it in some other way, but the initial capability
should be there to treat the document as its server intended. There is a real
world problem that people don't know how to configure their servers. We
should solve that with better servers and not by dumbing down the entire
system (which only has a tiny bit of smarts in it anyway, so it really
suffers from being dumbed down).
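
As a rough sketch of that behaviour, in Python: the declared type wins by
default, and sniffing happens only when the user explicitly asks for it. The
function names, the user_override flag and the two crude signature checks
are assumptions for illustration, not how any particular browser works.

```python
def sniff(body):
    """Very crude guess at a type from the content; illustrative only."""
    head = body[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "text/html"
    if body.startswith(b"\xff\xd8\xff"):  # common JPEG signature
        return "image/jpeg"
    return None  # no confident guess


def choose_type(declared, body, user_override=False):
    """Pick the type to render with.

    By default the server's declared type is used as-is. Only when the
    user opts in (user_override=True) is the content inspected, and the
    declared type remains the fallback if sniffing finds nothing.
    """
    fallback = declared or "application/octet-stream"
    if not user_override:
        return fallback
    return sniff(body) or fallback
```

With this, an HTML-looking body served as text/plain is shown as plain text
unless the user chooses otherwise, for example by calling
choose_type("text/plain", body, user_override=True).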

just my 2 bits...


On Thu, 28 Nov 2002, Al Gilman wrote:

>If you request the URL which includes the 'index.asp' explicit "filename"
>the server serves the same page but in the HTTP metadata marks it as
>Strictly spec-compliant browsers will not process the HTML markup in a
>document that the HTTP transport has announced to be of type text/plain.
>Sniffing in the file and rendering as HTML something that has been given you
>as plain text (in terms of the markings in the HTTP metadata) is something
>that some browsers do but the TAG would like to discourage.
>  http://www.w3.org/2001/tag/2002/0129-mime#consistency
>In my personal opinion the TAG is here trying to legislate away the natural
>order of things.  This won't work, and continuing to try will lead the Web
>to more rapid irrelevance, not its highest potential.  But that may be off
>topic for this list.  It does matter to whether there are business benefits
>to be had for investing in design-for-all for *a web presence*.
Received on Tuesday, 3 December 2002 04:38:16 UTC
