Re: review of content type rules by IETF/HTTP community from Robert Burns on 2007-08-18 (public-html@w3.org from August 2007)

From: Robert Burns <rob@robburns.com>
Date: Fri, 17 Aug 2007 21:15:46 -0500
To: "public-html@w3.org WG" <public-html@w3.org>
Cc: Sam Ruby <rubys@us.ibm.com>, Dan Connolly <connolly@w3.org>
Message-Id: <2CCDD271-71E2-4D20-BF63-E8173AEE5FBC@robburns.com>
Hi Sam,

On Aug 17, 2007, at 8:47 PM, Robert Burns wrote:

>
> Hi Sam,
>
> On Aug 17, 2007, at 4:09 PM, Sam Ruby wrote:
>
>>
>> Dan Connolly wrote:
>>> The Feed/HTML sniffing review comment reminded me... since
>>> the scope of the HTML 5 spec overlaps with the scope
>>> of the HTTP spec, we should get review by the IETF/HTTP
>>> community (including the W3C TAG).
>>> I just packaged the relevant section
>>>   http://www.w3.org/html/wg/html5/#content-type-sniffing
>>> as an Internet Draft-to-be, with this introduction:
>>> ---8<---
>>> The HTTP specification[HTTP], in section 14.17 Content-Type, says  
>>> The
>>> Content-Type entity-header field indicates the media type of the
>>> entity-body sent to the recipient.
>>> The HTML 5 specification[HTML5] specifies an algorithm for  
>>> determining
>>> content types based on widely deployed practices and software.
>>> These specifications conflict in some cases. (@@ extract a test  
>>> cases
>>> from Step 10 of Feed/HTML sniffing (part of detailed review of
>>> "Determining the type of a new resource in a browsing context"))
>>> According to a straightforward architecture for content types in the
>>> Web[META], the HTTP specification should suffice and the HTML 5
>>> specification need not specify another algorithm. But that  
>>> architecture
>>> assumes that Web publishers (server administrators and content
>>> developers) reliably label content. Observing that labelling by Web
>>> publishers is widely unreliable, and software that works around  
>>> these
>>> problems is widespread, the choices seem to be:
>>>       * Convince Web publishers to fix incorrectly labelled Web  
>>> content
>>>         and label it correctly in the future.
>>>       * Update the HTTP specification to match widely deployed
>>>         conventions captured in the HTML 5 draft.
>>> While the second option is unappealing, the first option seems
>>> infeasible.
>>> The IETF community is invited to review the details of the HTML 5
>>> algorithm in detail.
>>
>> On this subject, I have a request.  I'll phrase it as a mild rant,  
>> but I fully understand why firefox made the change that it did.
>>
>> The following is a test case:
>>
>> http://feedvalidator.org/testcases/atom/1.1/brief-noerror.xml
>>
>> The response contains the content type of application/xml as I  
>> wanted to view the data in an XML parse tree.  Even though what I  
>> sent was per spec, and used to work, firefox decided that the need  
>> to emulate IE's broken behavior was more important than respecting  
>> my expressed wishes.
>>
>> While I don't expect this to be fixed, I would like to request  
>> that there be some parameter (like, "application/xml; damnit")  
>> which indicates that I think I know what I'm doing and would  
>> appreciate being treated like an adult.
>
> I'm getting a response header of text/xml not application/xml. I  
> may not understand what you're trying to accomplish here, but I  
> thought the application/xml MIME type was meant to solve the  
> problem with the text/xml MIME type.

After I sent this, I finally understood what you were saying. I think
this issue is a little more fine-grained than normal content-type
headers. While many XML vocabularies register a MIME type, not all of
them do (as I'm sure you know). So even if the feed has its own MIME
type (does it?), the browser still needs to be able to provide a
specific presentation for that MIME type. This issue might be better
solved if browsers could toggle between a tree view and the normal view
(similar to the way all the browsers allow viewing of the source text).

Having said that, I share the concern about not treating content-type
headers as authoritative. Earlier I had misread the XML spec as not
calling for fatal errors on stray & characters. Now that I have a
better understanding of that issue, I see it does call for fatal-error
handling. However, playing loose with that norm is far less damaging
than playing loose with content types. After all, as you're saying,
overriding the content type actually prevents authors who do want to
deliver content in a specific way from being able to do so.
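
For anyone who wants to see the stray & rule in action, here is a
minimal sketch using Python's standard-library XML parser. The helper
name is_well_formed and the sample strings are my own illustration, not
anything taken from the XML or HTML 5 drafts; the point is only that a
conforming parser stops at the unescaped ampersand rather than guessing.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False on a well-formedness error.

    Hypothetical helper for illustration only.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # A conforming parser treats a stray & as a fatal error
        # and does not attempt to recover.
        return False

stray = "<title>Fish & Chips</title>"        # unescaped ampersand
escaped = "<title>Fish &amp; Chips</title>"  # correctly escaped

print(is_well_formed(stray))
print(is_well_formed(escaped))
```

A lenient processor would have to pick some recovery for the first
string, and different processors may well pick different ones.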

In contrast, trying to recover from mis-specified references in an
XML document doesn't really have any impact on those using the
language correctly. It merely causes interoperability issues among
different processors that recover from the error in different ways.
The content-type issue is much more serious, again, because it prevents
delivery of content by authors who are following the spec correctly.
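
To make the contrast concrete, here is a deliberately simplified sketch
of what I mean by sniffing overriding the header. It is not the HTML 5
algorithm and not what Firefox actually does; the function name
effective_type, the sniff_feeds flag, and the 512-character window are
all assumptions of mine for illustration.

```python
def effective_type(declared, body, sniff_feeds=True):
    """Return the media type a client would act on.

    Hypothetical and simplified: real sniffing rules are far more
    detailed than a substring check.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    base = declared.split(";")[0].strip().lower()
    if sniff_feeds and base in ("text/xml", "application/xml"):
        head = body[:512]
        if "<feed" in head:
            return "application/atom+xml"
        if "<rss" in head:
            return "application/rss+xml"
    # Otherwise the declared type is treated as authoritative.
    return base

atom = '<?xml version="1.0"?><feed xmlns="http://www.w3.org/2005/Atom"></feed>'

# With sniffing on, the author's application/xml label is overridden.
print(effective_type("application/xml", atom))
# With sniffing off, the label is respected and a tree view is possible.
print(effective_type("application/xml", atom, sniff_feeds=False))
```

In the first call, the author who labelled the document correctly has
no way to ask for the generic XML presentation, which is exactly the
complaint.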

Take care,
Rob
Received on Saturday, 18 August 2007 02:16:14 UTC