Re: Why Microsoft's authoritative=true won't work and is a bad idea from Robert J Burns on 2008-07-07 (public-html@w3.org from July 2008)

From: Robert J Burns <rob@robburns.com>
Date: Mon, 7 Jul 2008 15:22:34 +0300
To: HTML WG <public-html@w3.org>
Message-Id: <AF4EF13F-578F-4A22-8807-BB2B8E6C8507@robburns.com>

Microsoft’s proposed solution (authoritative=true) could work as a
stop-gap measure, but I think we need a significantly different
approach. For example, HTML should have its own mechanism for setting
the processing of embedded resources. I've proposed just such a
mechanism in bugzilla[1].

I think we need to look at this with fresh eyes. The HTTP content-type
header was intended to serve double duty. First, it provides access to
MIME types without needing to retrieve the entire resource to perform
sniffing or otherwise examine the resource. Second, it serves as a
mechanism for authors to alter the MIME type treatment of a file.
There are problems with combining these two roles into one. There are
also problems with not including such a mechanism within HTML itself.
Some of those issues are covered in a wiki page[2] on the topic.

Ideally, agents should be able to query the intrinsic type of
resources across the network without needing to retrieve the resource.
Authors should also be able to alter the treatment of a resource while
still using the same resource and the same resource identifier. The
HTTP content type header cannot serve both of these functions at the
same time. It's time to have new headers and other new mechanisms to
address all of these issues. Add to these problems the fact that
content type headers cannot address compound document types (multiple
parts, etc.), and content type headers again cannot meet the needs of
modern resources.

What I think we need is 1) an entirely new http header (and this is  
probably something for the http wg to consider) that can return an  
array of intrinsic content types for each resource (perhaps the  
sniffing code could be moved from the open source browser projects to  
the open source server projects to generate this header) and 2) a  
separate header for author control over the processing of a resource.  
However, this second function should not be needed for HTML since HTML  
should include its own attributes for controlling the processing of  
resources (as proposed in bugzilla).  Together these mechanisms  
address the problems identified in the wiki.
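
To make the division of labour concrete, here is a minimal sketch in
Python of how a user agent might combine these mechanisms. The header
names Intrinsic-Content-Types and Process-As, the html_override
parameter, and the precedence order are all hypothetical, invented
only for illustration; they are not defined in bugzilla, the wiki, or
any HTTP spec.

```python
from typing import Dict, List, Optional


def parse_intrinsic_types(header_value: str) -> List[str]:
    """Split a comma-separated list of media types into a normalized list."""
    return [t.strip().lower() for t in header_value.split(",") if t.strip()]


def choose_processing_type(
    headers: Dict[str, str],
    html_override: Optional[str] = None,
) -> Optional[str]:
    """Pick the type a user agent would process the resource as.

    Assumed precedence (hypothetical):
      1. a type set by the embedding HTML document (html_override)
      2. the author-control header (here called Process-As)
      3. the first entry of the intrinsic-type header
    Header names are looked up exactly as written; a real client would
    match them case-insensitively.
    """
    if html_override:
        return html_override.strip().lower()
    author_choice = headers.get("Process-As")
    if author_choice:
        return author_choice.strip().lower()
    intrinsic = parse_intrinsic_types(headers.get("Intrinsic-Content-Types", ""))
    return intrinsic[0] if intrinsic else None
```

With this arrangement the intrinsic-type header never has to change
when an author wants different treatment, which is the separation of
roles argued for above.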

Finally, consider that Apache still has a long-standing bug that makes
it impossible to configure the server to return no content type header
when the content type of a file is unknown. The bug persists more than
a decade after the spec and the creation of Apache. Certainly Apache
addressed a need to handle files with no filename extension and to
permit administrators to configure the server to send text/plain in
such circumstances (as Roy Fielding has pointed out on numerous
occasions[3]). However, Apache goes further and sends "text/plain" for
every unknown (unmapped) filename extension. Basically httpd's
DefaultType should not even exist; instead there should be a setting
to sniff extension-less files for the text/plain type. Nevertheless,
this long history created some of the need for client UA sniffing in
the first place, and I'm afraid I don't see a way back to no sniffing
given this history. The only way out now is to come up with new
replacement mechanisms to achieve the goals originally set for the
HTTP content type header.
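
The difference between the two server behaviours can be sketched in a
few lines of Python. This is not Apache's code; the KNOWN_TYPES table
and both function names are made up for illustration, and the first
function only imitates the effect described above of a text/plain
default.

```python
import os
from typing import Optional

# Hypothetical extension-to-type table, standing in for a server's mapping.
KNOWN_TYPES = {
    ".html": "text/html",
    ".txt": "text/plain",
    ".png": "image/png",
}


def content_type_with_default(filename: str, default: str = "text/plain") -> str:
    """Imitate a server-wide default: unmapped files are labeled with it."""
    ext = os.path.splitext(filename)[1].lower()
    return KNOWN_TYPES.get(ext, default)


def content_type_or_none(filename: str) -> Optional[str]:
    """Return None for unmapped files so that no header would be sent."""
    ext = os.path.splitext(filename)[1].lower()
    return KNOWN_TYPES.get(ext)
```

In the first function a file such as "notes.xyz" is labeled text/plain
even though the server knows nothing about it, so a client cannot tell
a deliberate label from a fallback. In the second the absence of a
header carries that information.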

In summary, we need:
* an HTTP mechanism for discovery of the intrinsic type of resources,
including an array of multiple types in the case of multipart or
compound documents
* an HTML mechanism for controlling the processing of resources
* perhaps an http mechanism also for controlling the processing of  
resources, but not for use in HTML

Take care,
Rob

[1]: <http://www.w3.org/Bugs/Public/show_bug.cgi?id=5776>
[2]: <http://esw.w3.org/topic/HTML/ContentTypeIssues>
[3]: <http://lists.w3.org/Archives/Public/public-html/2008Jul/0038.html>


On Jul 6, 2008, at 1:40 PM, Julian Reschke wrote:

>
> Ian Hickson wrote:
>> ...
>> If you would like the document to be processed as plain text, then there
>> might not be a good answer for you, sorry. Your use case is incompatible
>> with the use case of the many users who want to see feeds sent as
>> text/plain handled as feeds. Enough people mislabel their feeds as
>> text/plain that in practice documents labeled as text/plain are, in some
>> browsers, sniffed for feeds before being treated as plain text.
>> ...
>
> With the current text in HTML5, there's not only no "good answer" but no
> answer at all (except by telling users to configure their UAs to respect
> mime types).
>
> Sam's use case could be made compatible by making the response
> distinguishable from one sent by a misconfigured server.
>
> At this point it seems to me that you are simply not interested in that
> case. Is this correct?
>
> BR, Julian
>
>
Received on Monday, 7 July 2008 12:23:17 UTC