Re: Document sniffing for type information from Al Gilman on 2002-12-03 (w3c-wai-ig@w3.org from October to December 2002)

From: Al Gilman <asgilman@iamdigex.net>
Date: Tue, 03 Dec 2002 08:32:20 -0500
To: Charles McCathieNevile <charles@w3.org>
Cc: <w3c-wai-ig@w3.org>
Message-Id: <5.1.0.14.2.20021203081131.02174550@pop.iamdigex.net>

At 04:38 AM 2002-12-03, Charles McCathieNevile wrote:

>An advanced user agent should give the user the option of sniffing in a
>document and rendering it in some other way, but the initial capability
>should be there to treat the document as its server intended.

Exact, flaming agreement.  Thank you for saying in some detail what you
*do* support.  This is good to do and I failed to do so.  [At least on this
post.  See reference below.]

What I am objecting to in the TAG finding is the idea that "thou
shalt not" offer the user a guess when the indicated type clearly
fails to match the content.

<quote
from="TAG finding"
cite="http://www.w3.org/2001/tag/2002/0129-mime#consistency">

2. Consistency of Media Types and Message Contents

    The architecture of the Web depends on applications making dispatching
    and security decisions for resources based on their Internet Media
    Types and other MIME headers. It is a serious error for the response
    body to be inconsistent with the assertions made about it by the MIME
    headers. Web software SHOULD NOT attempt to recover from such errors
    by guessing, but SHOULD report the error to the user to allow
    intelligent corrective action.

</quote>

<quote
from="email discussion"
cite="http://lists.w3.org/Archives/Public/www-tag/2002May/0134.html">

** How to recover from a type error

In terms of how to recover, the user should be in command, but help is helpful
so long as it only helps.  This is territory that we have ploughed and
re-ploughed in the WAI, so let me try to give some context and history.

</quote>

>There is a real
>world problem that people don't know how to configure their servers. We
>should solve that with better servers and not by dumbing down the entire
>system (which only has a tiny bit of smarts in it anyway, so it really
>suffers from being dumbed down).

The problem is with the division of labor.  With the subject-matter
content coming from some people and the MIME headers being administered
by other people (an economically positive division of labor) there is
a natural tendency to fail to get these in lock step, so there is
a persistent need for a recovery plan _in the architecture_ for how
to heal the breach when content arrives with such a disconnect.
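
To make the shape of such a recovery plan concrete, here is a minimal
sketch in Python.  It is an illustration only, not anything the TAG
finding or any browser specifies.  The names sniff_html, RenderPlan and
plan_rendering, the marker list, and the 512-byte window are all
assumptions made up for this example.  The agent treats the document as
the server declared it, and only offers the sniffed type to the user as
an option when the declared type plainly fails to match the content.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical marker list: byte prefixes that suggest an HTML document.
HTML_MARKERS = (b"<!doctype html", b"<html", b"<head", b"<body")


def sniff_html(body: bytes, window: int = 512) -> bool:
    """Crude heuristic: does the start of the body look like HTML?"""
    head = body[:window].lstrip().lower()
    return head.startswith(HTML_MARKERS)


@dataclass
class RenderPlan:
    render_as: str             # default: the type the server declared
    suggestion: Optional[str]  # offered to the user, never applied silently
    warning: Optional[str]     # the error report shown to the user


def plan_rendering(declared_type: str, body: bytes) -> RenderPlan:
    """Honor the declared type; report a mismatch and offer a guess."""
    # Drop parameters such as "; charset=..." before comparing.
    declared = declared_type.split(";", 1)[0].strip().lower()
    if declared == "text/plain" and sniff_html(body):
        return RenderPlan(
            render_as=declared,
            suggestion="text/html",
            warning="Served as text/plain, but the content looks like HTML.",
        )
    return RenderPlan(render_as=declared, suggestion=None, warning=None)
```

Calling plan_rendering("text/plain", page_bytes) on an HTML page served
with the wrong header keeps the default rendering at text/plain, and
fills in the suggestion and warning so that the user, not the software,
decides whether to switch.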


>I disagree with this interpretation of the natural order, and of what will
>work.

... or with this hyper-elliptical allusion to it...

Al

>In the Web of today there are a number of different possible interpretations
>of a given document. (the same thing applies to trying to open an HTML
>document for editing - programs that are too clever don't provide the option
>of treating it as a plain file, and those programs are not helpful).
>This means that sniffing the type of the document by looking inside it is
>bound to be unreliable in a system where one program can handle content in
>two or more ways.
>
>In some operating systems this was the technique for associating a default
>piece of code to operate on a file. Others used a naming convention - either
>explicit as in the old Windows systems and new MacOS to some extent, or
>hidden as in all Mac systems before OS X and newer Windows systems (sort of).
>
>For building interoperability both these approaches have flaws. Looking in
>the file restricts the options in a complex world where there are multiple
>pieces of software useful (this incidentally is also the problem with the
>codebase attribute for the html4 object element), and relying on a register
>of codes means you have to agree on registering them - it worked OK
>internally, but when you deal with the Web it is messy.
>
>MIME types are an attempt to allow a file author to say what they are serving
>at a particular URI. The fact that the same bits might be available at
>another URI, with a different meaning (a JPEG file containing RDF might be
>served at two URIs as the same data but in one case as RDF and another as
>JPEG) is not relevant.
>
>An advanced user agent should give the user the option of sniffing in a
>document and rendering it in some other way, but the initial capability
>should be there to treat the document as its server intended. There is a real
>world problem that people don't know how to configure their servers. We
>should solve that with better servers and not by dumbing down the entire
>system (which only has a tiny bit of smarts in it anyway, so it really
>suffers from being dumbed down).
>
>just my 2 bits...
>
>chaals
>
>On Thu, 28 Nov 2002, Al Gilman wrote:
>
> >If you request the URL which includes the 'index.asp' explicit "filename"
> >the server serves the same page but in the HTTP metadata marks it as
> >text/plain.
> >
> >Strictly spec-compliant browsers will not process the HTML markup in a
> >document that the HTTP transport has announced to be of type text/plain.
> >Sniffing in the file and rendering as HTML something that has been given you
> >as plain text (in terms of the markings in the HTTP metadata) is something
> >that some browsers do but the TAG would like to discourage.
> >
> >  http://www.w3.org/2001/tag/2002/0129-mime#consistency
> >
> >In my personal opinion the TAG is here trying to legislate away the natural
> >order of things.  This won't work, and continuing to try will lead the Web
> >to more rapid irrelevance, not its highest potential.  But that may be off
> >topic for this list.  It does matter to whether there are business benefits
> >to be had for investing in design-for-all for *a web presence*.
> >
> >Al
Received on Tuesday, 3 December 2002 08:29:38 UTC